Re: Session timeout

2016-01-29 Thread Carlos Alonso
I've been in this community and mailing list quite a while now, and it's
hard to find questions without an answer. There are lots of good experts
willing to help here. If you don't see your question answered, I'd advise
you to send it again; it's also true that the mailing list has quite a lot
of activity, and it's easy to miss emails sometimes.

About this session timeout thing, could you please reply to this thread if
you find a solution? I'm curious about it.

Cheers!

Carlos Alonso | Software Engineer | @calonso 

On 29 January 2016 at 14:19, oleg yusim  wrote:

> Not a problem, Carlos, at least you tried :) I have overall a big problem
> with my queries to Cassandra community. Most of them are not getting
> answered.
>
> Oleg
>
> On Fri, Jan 29, 2016 at 8:03 AM, Carlos Alonso  wrote:
>
>> Oh, I thought you meant read/write timeout, not session timeout due to
>> inactivity...
>>
>> Not sure there's such option. Sorry
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 29 January 2016 at 13:35, oleg yusim  wrote:
>>
>>> Carlos,
>>>
>>> I went through Java and Python drivers... didn't find anything like
>>> that. Can you bring me example from your Ruby driver? Let me also make sure
>>> we are on the same page - I'm talking about session timeout due to
>>> inactivity, not read timeout or something like that.
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>> On Fri, Jan 29, 2016 at 7:23 AM, Carlos Alonso 
>>> wrote:
>>>
 I personally don't use the Java but the Ruby driver, but I'm pretty
 sure you'll be able to find it in the docs:
 https://github.com/datastax/java-driver

 Carlos Alonso | Software Engineer | @calonso
 

 On 29 January 2016 at 13:15, oleg yusim  wrote:

> Hi Carlos,
>
> Thanks for your anwer. Can you, please, get me a bit me information?
> What is the driver? JDBC? What is the name of configuration file?
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 5:12 AM, Carlos Alonso 
> wrote:
>
>> Hi Oleg.
>>
>> The drivers have builtin the timeout configurable functionality.
>>
>> Hope it helps.
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 28 January 2016 at 22:18, oleg yusim  wrote:
>>
>>> Greetings,
>>>
>>> Does Cassandra support session timeout? If so, where can I find this
>>> configuration switch? If not, what kind of hook can I use to write my own
>>> code, terminating the session after so many seconds of inactivity?
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>
>>
>

>>>
>>
>
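Since, as Carlos notes above, neither Cassandra nor the drivers seem to expose an inactivity timeout, one possible hook is an application-side wrapper that tracks the last query time and shuts the session down after a configurable idle period. The sketch below is driver-agnostic and hypothetical: `IdleSessionGuard` is an invented name, and `session.execute()`/`session.shutdown()` are stand-ins for whatever your driver actually provides — check your driver's docs.

```python
import threading
import time

class IdleSessionGuard:
    """Close a driver session after idle_seconds without activity.

    Driver-agnostic sketch: any object exposing execute() and
    shutdown() can be wrapped.
    """

    def __init__(self, session, idle_seconds):
        self.session = session
        self.idle_seconds = idle_seconds
        self._last_used = time.monotonic()
        self._lock = threading.Lock()
        self._schedule()

    def _schedule(self):
        self._timer = threading.Timer(self.idle_seconds, self._check)
        self._timer.daemon = True
        self._timer.start()

    def _check(self):
        with self._lock:
            idle = time.monotonic() - self._last_used
            if idle >= self.idle_seconds:
                self.session.shutdown()   # terminate the idle session
            else:
                self._schedule()          # recent activity; re-arm timer

    def execute(self, *args, **kwargs):
        with self._lock:
            self._last_used = time.monotonic()   # record activity
        return self.session.execute(*args, **kwargs)
```

Because the wrapper only needs `execute()` and `shutdown()`, it can sit in front of a real driver session without changes to query code.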


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
Hi Romain,

The RF was set to 2.

I changed it to one.

 CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
'replication_factor' : 1}  AND durable_writes = true;

re-inserted the columns, still missing rows.

Regards,
Arindam

On 29 January 2016 at 15:14, Romain Hardouin  wrote:

> Hi,
>
> I assume a RF > 1. Right?
> What is the consistency level you used? cqlsh use ONE by default.
> Try:
> cqlsh> CONSISTENCY ALL
> And run your query again.
>
> Best,
> Romain
>
>
> On Friday, 29 January 2016 at 13:45, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>
> Hi Kai,
>
> The table schema is:
>
> CREATE TABLE mordor.things_values_meta (
> thing_id text,
> key text,
> bucket_timestamp timestamp,
> total_rows counter,
> PRIMARY KEY ((thing_id, key), bucket_timestamp)
> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '4', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
>
>
> I am just running "select count(*) from things_values_meta ;" to get the
> count.
>
> Regards,
> Arindam
>
> On 29 January 2016 at 13:39, Kai Wang  wrote:
>
> Arindam,
>
> what's the table schema and what does your query to retrieve the rows look
> like?
>
> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
> Hi,
>
> I am importing data to a new cassandra cluster using sstableloader. The
> sstableloader runs without any warning or error. But I am missing around
> 1000 rows.
>
> Any feedback will be highly appreciated.
>
> Kind Regards,
> Arindam Choudhury
>
>
>
>
>
>
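Romain's suggestion boils down to replica overlap: with RF 2, if only one replica holds a row and cqlsh reads at consistency ONE, the read may be served by the other replica and miss the row. A toy model of that rule, not driver code — in the worst case, a read is only guaranteed to see the data when the number of replicas read plus the number written exceeds RF:

```python
def read_sees_write(rf, write_replicas, read_replicas):
    """Worst case: do the replicas a read contacts necessarily
    overlap the replicas that received the write? True iff R + W > RF."""
    return read_replicas + write_replicas > rf

# If only 1 of 2 replicas has the rows and cqlsh reads at ONE,
# rows can appear to be missing:
assert not read_sees_write(rf=2, write_replicas=1, read_replicas=1)
# CONSISTENCY ALL forces the read to contact every replica:
assert read_sees_write(rf=2, write_replicas=1, read_replicas=2)
```

This is why `CONSISTENCY ALL` (or a repair) is a useful diagnostic after a bulk load: it rules out replica divergence as the cause of the missing rows.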


Re: compaction throughput

2016-01-29 Thread Jan Karlsson
Keep in mind that LCS can only run one compaction per level.
Even if it wants to run more compactions in L0, it might be blocked
because a compaction is already running in L0.


BR
Jan
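Jan's constraint can be sketched as a tiny scheduler: even with spare compactor slots, a second task for an already-busy level is refused. This is a toy model for illustration, not Cassandra's actual compaction code:

```python
class LcsScheduler:
    """Toy model of the LCS constraint: at most one running
    compaction per level, regardless of concurrent_compactors."""

    def __init__(self, concurrent_compactors):
        self.slots = concurrent_compactors
        self.running = set()   # levels with an active compaction

    def try_start(self, level):
        if len(self.running) >= self.slots or level in self.running:
            return False       # blocked: no free slot, or level busy
        self.running.add(level)
        return True

    def finish(self, level):
        self.running.discard(level)

sched = LcsScheduler(concurrent_compactors=4)
assert sched.try_start(0)        # first L0 compaction runs
assert not sched.try_start(0)    # second L0 task blocked, slots free
assert sched.try_start(1)        # another level can still proceed
```

So pending L0 work can stall behind a single running L0 compaction even though nodetool shows idle compactor threads.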

On 01/16/2016 01:26 AM, Sebastian Estevez wrote:


LCS is IO intensive, but CPU is also relevant.

On slower disks compaction may not be cpu bound.

If you aren't seeing more than one compaction thread at a time, I 
suspect your system is not compaction bound.


all the best,

Sebastián

On Jan 15, 2016 7:20 PM, "Kai Wang" > wrote:


Sebastian,

Because I have this impression that LCS is IO intensive and it's
recommended only on SSDs. So I am curious to see how far it can
stress those SSDs. But it turns out the most expensive part of
LCS is not IO bound but CPU bound, or more precisely, single-core
speed bound. This is a little surprising.

Of course LCS is still superior in other aspects.

On Jan 15, 2016 6:34 PM, "Sebastian Estevez"
> wrote:

Correct.

Why are you concerned with the raw throughput, are you
accumulating pending compactions? Are you seeing high sstables
per read statistics?

all the best,

Sebastián

On Jan 15, 2016 6:18 PM, "Kai Wang" > wrote:

Jeff & Sebastian,

Thanks for the reply. There are 12 cores but in my case C*
only uses one core most of the time. *nodetool
compactionstats* shows there's only one compactor running.
I can see C* process only uses one core. So I guess I
should've asked the question more clearly:

1. Is ~25 M/s a reasonable compaction throughput for one core?
2. Is there any configuration that affects single core
compaction throughput?
3. Is concurrent_compactors the only option to parallelize
compaction? If so, I guess it's the compaction strategy
itself that decides when to parallelize and when to block
on one core. Then there's not much we can do here.

Thanks.

On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa
> wrote:

With SSDs, the typical recommendation is up to 0.8-1
compactor per core (depending on other load). How many
CPU cores do you have?


From: Kai Wang
Reply-To: "user@cassandra.apache.org
"
Date: Friday, January 15, 2016 at 12:53 PM
To: "user@cassandra.apache.org
"
Subject: compaction throughput

Hi,

I am trying to figure out the bottleneck of compaction
on my node. The node is CentOS 7 and has SSDs
installed. The table is configured to use LCS. Here is
my compaction related configs in cassandra.yaml:

compaction_throughput_mb_per_sec: 160
concurrent_compactors: 4

I insert about 10G of data and start observing compaction.

*nodetool compaction* shows most of time there is one
compaction. Sometimes there are 3-4 (I suppose this is
controlled by concurrent_compactors). During the
compaction, I see one CPU core is 100%. At that point,
disk IO is about 20-25 M/s write which is much lower
than the disk is capable of. Even when there are 4
compactions running, I see CPU go to +400% but disk IO
is still at 20-25M/s write. I use *nodetool
setcompactionthroughput 0* to disable the compaction
throttle but don't see any difference.

Does this mean compaction is CPU bound? If so 20M/s is
kinda low. Is there anyway to improve the throughput?

Thanks.
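Applying Jeff's rule of thumb (up to 0.8-1 compactor per core on SSDs) to a 12-core node would look roughly like the fragment below. The values are illustrative assumptions, not recommendations; note that each individual compaction task remains single-threaded, so per-task throughput is still bound by one core's speed no matter what the throttle is set to.

```yaml
# cassandra.yaml fragment -- illustrative values only
# ~0.8 compactors per core on a 12-core SSD node
concurrent_compactors: 8
# node-wide cap shared by all compactors; 0 disables throttling
compaction_throughput_mb_per_sec: 0
```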






Re: Session timeout

2016-01-29 Thread oleg yusim
Carlos,

I went through the Java and Python drivers... didn't find anything like that.
Can you give me an example from your Ruby driver? Let me also make sure we
are on the same page - I'm talking about session timeout due to inactivity,
not read timeout or anything like that.

Thanks,

Oleg

On Fri, Jan 29, 2016 at 7:23 AM, Carlos Alonso  wrote:

> I personally don't use the Java but the Ruby driver, but I'm pretty sure
> you'll be able to find it in the docs:
> https://github.com/datastax/java-driver
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 29 January 2016 at 13:15, oleg yusim  wrote:
>
>> Hi Carlos,
>>
>> Thanks for your anwer. Can you, please, get me a bit me information? What
>> is the driver? JDBC? What is the name of configuration file?
>>
>> Thanks,
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 5:12 AM, Carlos Alonso 
>> wrote:
>>
>>> Hi Oleg.
>>>
>>> The drivers have builtin the timeout configurable functionality.
>>>
>>> Hope it helps.
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> 
>>>
>>> On 28 January 2016 at 22:18, oleg yusim  wrote:
>>>
 Greetings,

 Does Cassandra support session timeout? If so, where can I find this
 configuration switch? If not, what kind of hook I can use to write my out
 code, terminating session in so many seconds of inactivity?

 Thanks,

 Oleg

>>>
>>>
>>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Romain Hardouin
Hi,
I assume a RF > 1. Right?
What is the consistency level you used? cqlsh uses ONE by default.
Try:
cqlsh> CONSISTENCY ALL
And run your query again.

Best,
Romain

On Friday, 29 January 2016 at 13:45, Arindam Choudhury wrote:
 

 Hi Kai,

The table schema is:

CREATE TABLE mordor.things_values_meta (
    thing_id text,
    key text,
    bucket_timestamp timestamp,
    total_rows counter,
    PRIMARY KEY ((thing_id, key), bucket_timestamp)
) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32'}
    AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';


I am just running "select count(*) from things_values_meta ;" to get the count.

Regards,
Arindam

On 29 January 2016 at 13:39, Kai Wang  wrote:

Arindam,

what's the table schema and what does your query to retrieve the rows look like?
 
On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury 
 wrote:

Hi,

I am importing data to a new cassandra cluster using sstableloader. The 
sstableloader runs without any warning or error. But I am missing around 1000 
rows.

Any feedback will be highly appreciated. 

Kind Regards,
Arindam Choudhury






  

Re: Session timeout

2016-01-29 Thread oleg yusim
Not a problem, Carlos, at least you tried :) I have overall a big problem
with my queries to Cassandra community. Most of them are not getting
answered.

Oleg

On Fri, Jan 29, 2016 at 8:03 AM, Carlos Alonso  wrote:

> Oh, I thought you meant read/write timeout, not session timeout due to
> inactivity...
>
> Not sure there's such option. Sorry
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 29 January 2016 at 13:35, oleg yusim  wrote:
>
>> Carlos,
>>
>> I went through Java and Python drivers... didn't find anything like that.
>> Can you bring me example from your Ruby driver? Let me also make sure we
>> are on the same page - I'm talking about session timeout due to inactivity,
>> not read timeout or something like that.
>>
>> Thanks,
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 7:23 AM, Carlos Alonso 
>> wrote:
>>
>>> I personally don't use the Java but the Ruby driver, but I'm pretty sure
>>> you'll be able to find it in the docs:
>>> https://github.com/datastax/java-driver
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> 
>>>
>>> On 29 January 2016 at 13:15, oleg yusim  wrote:
>>>
 Hi Carlos,

 Thanks for your anwer. Can you, please, get me a bit me information?
 What is the driver? JDBC? What is the name of configuration file?

 Thanks,

 Oleg

 On Fri, Jan 29, 2016 at 5:12 AM, Carlos Alonso 
 wrote:

> Hi Oleg.
>
> The drivers have builtin the timeout configurable functionality.
>
> Hope it helps.
>
> Carlos Alonso | Software Engineer | @calonso
> 
>
> On 28 January 2016 at 22:18, oleg yusim  wrote:
>
>> Greetings,
>>
>> Does Cassandra support session timeout? If so, where can I find this
>> configuration switch? If not, what kind of hook I can use to write my out
>> code, terminating session in so many seconds of inactivity?
>>
>> Thanks,
>>
>> Oleg
>>
>
>

>>>
>>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
I will check the output of nodetool cfstats.

It's from version 2.1.2 to version 2.1.9.

On 29 January 2016 at 16:02, Jack Krupansky 
wrote:

> Are these sstables from an existing Cassandra cluster or generated by a
> program?
>
> If the former, do a nodetool tablestats or cfstats to get the sstable
> count and compare it to both the number of sstables that the loader is
> reading from and the number that end up in the target cluster.
>
> What Cassandra version did the sstables come from and what version are you
> importing into?
>
>
> -- Jack Krupansky
>
> On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> Hi Romain,
>>
>> The RF was set to 2.
>>
>> I changed it to one.
>>
>>  CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
>> 'replication_factor' : 1}  AND durable_writes = true;
>>
>> re-inserted the columns, still missing rows.
>>
>> Regards,
>> Arindam
>>
>> On 29 January 2016 at 15:14, Romain Hardouin  wrote:
>>
>>> Hi,
>>>
>>> I assume a RF > 1. Right?
>>> What is the consistency level you used? cqlsh use ONE by default.
>>> Try:
>>> cqlsh> CONSISTENCY ALL
>>> And run your query again.
>>>
>>> Best,
>>> Romain
>>>
>>>
>>> On Friday, 29 January 2016 at 13:45, Arindam Choudhury <
>>> arindam.choudh...@ackstorm.com> wrote:
>>>
>>>
>>> Hi Kai,
>>>
>>> The table schema is:
>>>
>>> CREATE TABLE mordor.things_values_meta (
>>> thing_id text,
>>> key text,
>>> bucket_timestamp timestamp,
>>> total_rows counter,
>>> PRIMARY KEY ((thing_id, key), bucket_timestamp)
>>> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
>>> AND bloom_filter_fp_chance = 0.01
>>> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>> AND comment = ''
>>> AND compaction = {'min_threshold': '4', 'class':
>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>> 'max_threshold': '32'}
>>> AND compression = {'sstable_compression':
>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>> AND dclocal_read_repair_chance = 0.1
>>> AND default_time_to_live = 0
>>> AND gc_grace_seconds = 864000
>>> AND max_index_interval = 2048
>>> AND memtable_flush_period_in_ms = 0
>>> AND min_index_interval = 128
>>> AND read_repair_chance = 0.0
>>> AND speculative_retry = '99.0PERCENTILE';
>>>
>>>
>>> I am just running "select count(*) from things_values_meta ;" to get the
>>> count.
>>>
>>> Regards,
>>> Arindam
>>>
>>> On 29 January 2016 at 13:39, Kai Wang  wrote:
>>>
>>> Arindam,
>>>
>>> what's the table schema and what does your query to retrieve the rows
>>> look like?
>>>
>>> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
>>> arindam.choudh...@ackstorm.com> wrote:
>>>
>>> Hi,
>>>
>>> I am importing data to a new cassandra cluster using sstableloader. The
>>> sstableloader runs without any warning or error. But I am missing around
>>> 1000 rows.
>>>
>>> Any feedback will be highly appreciated.
>>>
>>> Kind Regards,
>>> Arindam Choudhury
>>>
>>>
>>>
>>>
>>>
>>>
>>
>


Re: Session timeout

2016-01-29 Thread Carlos Alonso
Oh, I thought you meant read/write timeout, not session timeout due to
inactivity...

Not sure there's such option. Sorry

Carlos Alonso | Software Engineer | @calonso 

On 29 January 2016 at 13:35, oleg yusim  wrote:

> Carlos,
>
> I went through Java and Python drivers... didn't find anything like that.
> Can you bring me example from your Ruby driver? Let me also make sure we
> are on the same page - I'm talking about session timeout due to inactivity,
> not read timeout or something like that.
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 7:23 AM, Carlos Alonso  wrote:
>
>> I personally don't use the Java but the Ruby driver, but I'm pretty sure
>> you'll be able to find it in the docs:
>> https://github.com/datastax/java-driver
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 29 January 2016 at 13:15, oleg yusim  wrote:
>>
>>> Hi Carlos,
>>>
>>> Thanks for your anwer. Can you, please, get me a bit me information?
>>> What is the driver? JDBC? What is the name of configuration file?
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>> On Fri, Jan 29, 2016 at 5:12 AM, Carlos Alonso 
>>> wrote:
>>>
 Hi Oleg.

 The drivers have builtin the timeout configurable functionality.

 Hope it helps.

 Carlos Alonso | Software Engineer | @calonso
 

 On 28 January 2016 at 22:18, oleg yusim  wrote:

> Greetings,
>
> Does Cassandra support session timeout? If so, where can I find this
> configuration switch? If not, what kind of hook I can use to write my out
> code, terminating session in so many seconds of inactivity?
>
> Thanks,
>
> Oleg
>


>>>
>>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Jack Krupansky
Are these sstables from an existing Cassandra cluster or generated by a
program?

If the former, do a nodetool tablestats or cfstats to get the sstable count
and compare it to both the number of sstables that the loader is reading
from and the number that end up in the target cluster.

What Cassandra version did the sstables come from and what version are you
importing into?


-- Jack Krupansky

On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> Hi Romain,
>
> The RF was set to 2.
>
> I changed it to one.
>
>  CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
> 'replication_factor' : 1}  AND durable_writes = true;
>
> re-inserted the columns, still missing rows.
>
> Regards,
> Arindam
>
> On 29 January 2016 at 15:14, Romain Hardouin  wrote:
>
>> Hi,
>>
>> I assume a RF > 1. Right?
>> What is the consistency level you used? cqlsh use ONE by default.
>> Try:
>> cqlsh> CONSISTENCY ALL
>> And run your query again.
>>
>> Best,
>> Romain
>>
>>
>> On Friday, 29 January 2016 at 13:45, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> wrote:
>>
>>
>> Hi Kai,
>>
>> The table schema is:
>>
>> CREATE TABLE mordor.things_values_meta (
>> thing_id text,
>> key text,
>> bucket_timestamp timestamp,
>> total_rows counter,
>> PRIMARY KEY ((thing_id, key), bucket_timestamp)
>> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
>> AND bloom_filter_fp_chance = 0.01
>> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>> AND comment = ''
>> AND compaction = {'min_threshold': '4', 'class':
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>> 'max_threshold': '32'}
>> AND compression = {'sstable_compression':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99.0PERCENTILE';
>>
>>
>> I am just running "select count(*) from things_values_meta ;" to get the
>> count.
>>
>> Regards,
>> Arindam
>>
>> On 29 January 2016 at 13:39, Kai Wang  wrote:
>>
>> Arindam,
>>
>> what's the table schema and what does your query to retrieve the rows
>> look like?
>>
>> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> wrote:
>>
>> Hi,
>>
>> I am importing data to a new cassandra cluster using sstableloader. The
>> sstableloader runs without any warning or error. But I am missing around
>> 1000 rows.
>>
>> Any feedback will be highly appreciated.
>>
>> Kind Regards,
>> Arindam Choudhury
>>
>>
>>
>>
>>
>>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
I am counting the rows with "select count(*) from
mordor.things_values_meta;"

I am loading from a one-node cluster to another one-node cluster for testing.

On 29 January 2016 at 16:20, Jack Krupansky 
wrote:

> And how are you counting the rows? With a query? If, so, what is the
> query. Using nodetool cfstats (estimated) key count? Or... what?
>
> Are the tokens for the missing rows is the same range and a distinct range
> from the rest of the data in the original cluster?
>
> How many nodes in the original cluster?
>
> -- Jack Krupansky
>
> On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> I will check the output of nodetool cfstats.
>>
>> Its from version 2.1.2 to version 2.1.9.
>>
>> On 29 January 2016 at 16:02, Jack Krupansky 
>> wrote:
>>
>>> Are these sstables from an existing Cassandra cluster or generated by a
>>> program?
>>>
>>> If the former, do a nodetool tablestats or cfstats to get the sstable
>>> count and compare it to both the number of sstables that the loader is
>>> reading from and the number that end up in the target cluster.
>>>
>>> What Cassandra version did the sstables come from and what version are
>>> you importing into?
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
>>> arindam.choudh...@ackstorm.com> wrote:
>>>
 Hi Romain,

 The RF was set to 2.

 I changed it to one.

  CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
 'replication_factor' : 1}  AND durable_writes = true;

 re-inserted the columns, still missing rows.

 Regards,
 Arindam

 On 29 January 2016 at 15:14, Romain Hardouin 
 wrote:

> Hi,
>
> I assume a RF > 1. Right?
> What is the consistency level you used? cqlsh use ONE by default.
> Try:
> cqlsh> CONSISTENCY ALL
> And run your query again.
>
> Best,
> Romain
>
>
> On Friday, 29 January 2016 at 13:45, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>
> Hi Kai,
>
> The table schema is:
>
> CREATE TABLE mordor.things_values_meta (
> thing_id text,
> key text,
> bucket_timestamp timestamp,
> total_rows counter,
> PRIMARY KEY ((thing_id, key), bucket_timestamp)
> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '4', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
>
>
> I am just running "select count(*) from things_values_meta ;" to get
> the count.
>
> Regards,
> Arindam
>
> On 29 January 2016 at 13:39, Kai Wang  wrote:
>
> Arindam,
>
> what's the table schema and what does your query to retrieve the rows
> look like?
>
> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
> Hi,
>
> I am importing data to a new cassandra cluster using sstableloader.
> The sstableloader runs without any warning or error. But I am missing
> around 1000 rows.
>
> Any feedback will be highly appreciated.
>
> Kind Regards,
> Arindam Choudhury
>
>
>
>
>
>

>>>
>>
>


Re: Cassandra's log is full of messages reset by peers even without traffic

2016-01-29 Thread Anuj Wadehra
Hi Jean,

Please make sure that your firewall is not dropping TCP connections that are
in use. The TCP keepalive interval on all nodes must be less than the
firewall's idle timeout. Please refer to
https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
for details on TCP settings.

Thanks
Anuj

Sent from Yahoo Mail on Android

On Fri, 29 Jan, 2016 at 3:21 pm, Jean Carlo wrote:

Hello guys,

I have a cluster cassandra 2.1.12 with 6 nodes. All the logs of my nodes are 
having this messages marked as INFO

INFO  [SharedPool-Worker-1] 2016-01-29 10:40:57,745 Message.java:532 - 
Unexpected exception during request; channel = [id: 0xff15eb8c, 
/172.16.162.4:9042]
java.io.IOException: Error while read(...): Connection reset by peer
    at io.netty.channel.epoll.Native.readAddress(Native Method) 
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

This happens whether the cluster is stressed or not. By the way, it is not
production. The IP shown there (172.16.162.4) belongs to a node of the
cluster, and it is not the only node that appears; actually all the nodes'
IPs show that reset-by-peer problem.

Our cluster has more reads than writes, around 50 reads per second.

Has anyone had the same problem?


Best regards

Jean Carlo
"The best way to predict the future is to invent it" Alan Kay
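Anuj's advice above (keep the TCP keepalive interval below the firewall's idle timeout) can also be applied per socket from the client side. A minimal sketch using the standard socket module; the `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` options are Linux-specific (hence the guards), and the numeric values are illustrative assumptions, not recommendations:

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=6):
    """Start TCP keepalive probes after `idle` seconds of silence,
    well below a typical firewall idle timeout (often 3600s)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs; guarded for portability
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock

s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
assert s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
s.close()
```

The same effect can be obtained system-wide via the `net.ipv4.tcp_keepalive_*` sysctls described in the DataStax troubleshooting page Anuj linked.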
  


Re: Session timeout

2016-01-29 Thread oleg yusim
Hi Carlos,

Thanks for your answer. Can you please give me a bit more information? What
is the driver? JDBC? What is the name of the configuration file?

Thanks,

Oleg

On Fri, Jan 29, 2016 at 5:12 AM, Carlos Alonso  wrote:

> Hi Oleg.
>
> The drivers have builtin the timeout configurable functionality.
>
> Hope it helps.
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 28 January 2016 at 22:18, oleg yusim  wrote:
>
>> Greetings,
>>
>> Does Cassandra support session timeout? If so, where can I find this
>> configuration switch? If not, what kind of hook I can use to write my out
>> code, terminating session in so many seconds of inactivity?
>>
>> Thanks,
>>
>> Oleg
>>
>
>


Re: Session timeout

2016-01-29 Thread Carlos Alonso
I personally use the Ruby driver rather than the Java one, but I'm pretty
sure you'll be able to find it in the docs:
https://github.com/datastax/java-driver

Carlos Alonso | Software Engineer | @calonso 

On 29 January 2016 at 13:15, oleg yusim  wrote:

> Hi Carlos,
>
> Thanks for your anwer. Can you, please, get me a bit me information? What
> is the driver? JDBC? What is the name of configuration file?
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 5:12 AM, Carlos Alonso  wrote:
>
>> Hi Oleg.
>>
>> The drivers have builtin the timeout configurable functionality.
>>
>> Hope it helps.
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 28 January 2016 at 22:18, oleg yusim  wrote:
>>
>>> Greetings,
>>>
>>> Does Cassandra support session timeout? If so, where can I find this
>>> configuration switch? If not, what kind of hook I can use to write my out
>>> code, terminating session in so many seconds of inactivity?
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>
>>
>


Re: Session timeout

2016-01-29 Thread Carlos Alonso
Hi Oleg.

The drivers have built-in configurable timeout functionality.

Hope it helps.

Carlos Alonso | Software Engineer | @calonso 

On 28 January 2016 at 22:18, oleg yusim  wrote:

> Greetings,
>
> Does Cassandra support session timeout? If so, where can I find this
> configuration switch? If not, what kind of hook can I use to write my own
> code, terminating the session after so many seconds of inactivity?
>
> Thanks,
>
> Oleg
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Kai Wang
Arindam,

what's the table schema and what does your query to retrieve the rows look
like?

On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> Hi,
>
> I am importing data to a new cassandra cluster using sstableloader. The
> sstableloader runs without any warning or error. But I am missing around
> 1000 rows.
>
> Any feedback will be highly appreciated.
>
> Kind Regards,
> Arindam Choudhury
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
Hi Kai,

The table schema is:

CREATE TABLE mordor.things_values_meta (
thing_id text,
key text,
bucket_timestamp timestamp,
total_rows counter,
PRIMARY KEY ((thing_id, key), bucket_timestamp)
) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';


I am just running "select count(*) from things_values_meta ;" to get the
count.

Regards,
Arindam

On 29 January 2016 at 13:39, Kai Wang  wrote:

> Arindam,
>
> what's the table schema and what does your query to retrieve the rows look
> like?
>
> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> Hi,
>>
>> I am importing data to a new cassandra cluster using sstableloader. The
>> sstableloader runs without any warning or error. But I am missing around
>> 1000 rows.
>>
>> Any feedback will be highly appreciated.
>>
>> Kind Regards,
>> Arindam Choudhury
>>
>
>


missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
Hi,

I am importing data to a new cassandra cluster using sstableloader. The
sstableloader runs without any warning or error. But I am missing around
1000 rows.

Any feedback will be highly appreciated.

Kind Regards,
Arindam Choudhury


Re: Cassandra's log is full of messages reset by peers even without traffic

2016-01-29 Thread Jean Carlo
Hi Anuj,

Thanks for your reply. Actually, I pasted below part of the result of a grep
over one log, and I can see only the IP of the local machine:


grep "Unexpected exception during request"
/var/opt/hosting/log/cassandra/system.log

INFO  [SharedPool-Worker-1] 2016-01-29 10:40:47,744 Message.java:532 -
Unexpected exception during request; channel = [id: 0x6ebe93cb, /
172.16.162.4:9042]
INFO  [SharedPool-Worker-1] 2016-01-29 10:40:57,745 Message.java:532 -
Unexpected exception during request; channel = [id: 0xff15eb8c, /
172.16.162.4:9042]
INFO  [SharedPool-Worker-1] 2016-01-29 10:54:33,721 Message.java:532 -
Unexpected exception during request; channel = [id: 0xc42cc7ff, /
172.16.162.2:11436 :> /172.16.162.5:9042]
INFO  [SharedPool-Worker-2] 2016-01-29 10:45:47,761 Message.java:532 -
Unexpected exception during request; channel = [id: 0x603349e4, /
172.16.162.4:9042]
INFO  [SharedPool-Worker-3] 2016-01-29 10:48:17,766 Message.java:532 -
Unexpected exception during request; channel = [id: 0x5bed4eae, /
172.16.162.4:9042]
INFO  [SharedPool-Worker-2] 2016-01-29 10:48:27,767 Message.java:532 -
Unexpected exception during request; channel = [id: 0x6136756b, /
172.16.162.4:9042]
INFO  [SharedPool-Worker-4] 2016-01-29 10:48:27,767 Message.java:532 -
Unexpected exception during request; channel = [id: 0x17c83eb8, /
172.16.162.4:9042]
INFO  [SharedPool-Worker-4] 2016-01-29 10:52:07,778 Message.java:532 -
Unexpected exception during request; channel = [id: 0x1a78b589, /
172.16.162.4:9042]
INFO  [SharedPool-Worker-2] 2016-01-29 10:52:17,779 Message.java:532 -
Unexpected exception during request; channel = [id: 0x017117b3, /
172.16.162.4:9042]
INFO  [SharedPool-Worker-2] 2016-01-29 11:01:37,813 Message.java:532 -
Unexpected exception during request; channel = [id: 0x3efbd844, /
172.16.162.4:9042]

So I don't know whether the firewall has anything to do with this case,
because it is a local connection over the native protocol.

Best regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay
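
Anuj's keepalive advice (quoted below) can also be applied from the client
side. Here is a minimal sketch of enabling TCP keepalive on a socket so that
idle native-protocol connections are not silently dropped by a firewall; the
TCP_KEEP* constants are Linux-specific, and the numbers are illustrative, not
recommendations:

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive on a socket.

    idle/interval are seconds; to be useful they must stay below the
    firewall's idle-connection timeout so the probes keep the conntrack
    entry alive. The TCP_KEEP* options are Linux-only, hence the guards.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    # Non-zero once keepalive is on.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(enable_keepalive(s))
s.close()
```

The kernel-wide defaults can likewise be lowered via the net.ipv4.tcp_keepalive_* sysctls described in the DataStax troubleshooting page linked below.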

On Fri, Jan 29, 2016 at 11:02 AM, Anuj Wadehra 
wrote:

> Hi Jean,
>
> Please make sure that your Firewall is not dropping TCP connections which
> are in use. Tcp keep alive on all nodes must be less than the firewall
> setting. Please refer to
>
> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>  for
> details on TCP settings.
>
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> 
>
> On Fri, 29 Jan, 2016 at 3:21 pm, Jean Carlo
>  wrote:
> Hello guys,
>
> I have a Cassandra 2.1.12 cluster with 6 nodes. All my nodes' logs
> are full of these messages, marked as INFO:
>
> INFO  [SharedPool-Worker-1] 2016-01-29 10:40:57,745 Message.java:532 -
> Unexpected exception during request; channel = [id: 0xff15eb8c, /
> 172.16.162.4:9042]
> java.io.IOException: Error while read(...): Connection reset by peer
> at io.netty.channel.epoll.Native.readAddress(Native Method)
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326)
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264)
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>
> This happens whether the cluster is stressed or not. By the way, it is not
> production. The IP shown there (172.16.162.4) belongs to a node of the
> cluster, and it is not the only node that appears; actually we are seeing
> the reset-by-peer problem for all the nodes' IPs.
>
> Our cluster handles more reads than writes, around 50 reads per second.
>
> Has anyone run into the same problem?
>
>
> Best regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>


Cassandra's log is full of mesages reset by peers even without traffic

2016-01-29 Thread Jean Carlo
Hello guys,

I have a Cassandra 2.1.12 cluster with 6 nodes. All my nodes' logs
are full of these messages, marked as INFO:

INFO  [SharedPool-Worker-1] 2016-01-29 10:40:57,745 Message.java:532 -
Unexpected exception during request; channel = [id: 0xff15eb8c, /
172.16.162.4:9042]
java.io.IOException: Error while read(...): Connection reset by peer
at io.netty.channel.epoll.Native.readAddress(Native Method)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

This happens whether the cluster is stressed or not. By the way, it is not
production. The IP shown there (172.16.162.4) belongs to a node of the
cluster, and it is not the only node that appears; actually we are seeing
the reset-by-peer problem for all the nodes' IPs.

Our cluster handles more reads than writes, around 50 reads per second.

Has anyone run into the same problem?


Best regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


Re: Session timeout

2016-01-29 Thread oleg yusim
Hi Carlos,

Thanks for encouraging me; I had grown a bit desperate. I'm a security
person, not a Cassandra expert, and doing a security assessment of the
Cassandra DB I have to rely on the community heavily. I will put together a
consolidated version of all my previous queries, title it "Security
assessment questions", and post it once again.

As for the session timeout, my understanding is that Cassandra currently
doesn't support it. I didn't find any mention of it in the documentation.
I also ran a simple experiment on my installation (version 2.1.8, default
settings): I opened two SSH sessions on the Linux server hosting the
Cassandra DB. In one I started cqlsh; the other was left as is. Then I
stepped away from the computer and went for breakfast. The results: the
first session, with cqlsh, is still sitting there, 50 minutes in. The
second was terminated after 15 minutes (the SSH session timeout).
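
Since the server offers no such switch, one way to get an inactivity timeout
is to enforce it client-side. The sketch below is purely hypothetical, a
wrapper around any session-like object exposing execute() and close(), not a
real driver API:

```python
import threading

class IdleTimeoutSession:
    """Close the wrapped session after `timeout` seconds of inactivity.

    Hypothetical sketch: the DataStax drivers expose no such knob
    themselves, so the application has to track activity on its own.
    """

    def __init__(self, session, timeout):
        self._session = session
        self._timeout = timeout
        self._lock = threading.Lock()
        self._timer = None
        self.closed = False
        self._rearm()

    def _rearm(self):
        # (Re)start the inactivity countdown.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._expire)
            self._timer.daemon = True
            self._timer.start()

    def _expire(self):
        with self._lock:
            self.closed = True
            self._session.close()

    def execute(self, query):
        if self.closed:
            raise RuntimeError("session timed out due to inactivity")
        self._rearm()  # any activity resets the clock
        return self._session.execute(query)
```

Any query resets the clock; once the timer fires, the underlying session is closed and further execute() calls raise.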

As for the mailing list, a little housekeeping suggestion if I may. Right
now the list is set up so that replies go only to user@cassandra.apache.org.
That leads to a situation where all the emails get filtered into the same
folder on the recipients' end (I have it set up that way, and I'm sure
everybody else has a similar setup). If we introduced a "Reply to All"
convention, replying not only to the mailing list but also to the personal
addresses of the people involved in a particular conversation, those emails
would bypass the filters and land directly in our inboxes. That would help
the correspondents engaged in the conversation notice the emails, understand
that they are targeted at them, and stay engaged until the issue is resolved
one way or another.

Thanks,

Oleg



On Fri, Jan 29, 2016 at 8:27 AM, Carlos Alonso  wrote:

> I've been in this community and mailing list quite a while now and it's
> hard to find questions without an answer. There are lots of good experts
> willing to help here. If you don't see your question answered I'd advise
> you to send it again, because it's also true that the mailing list has
> quite a lot of activity and it's easy sometimes to miss emails.
>
> About this session timeout thing, could you please reply to this thread if
> you find a solution? I'm curious about it.
>
> Cheers!
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 29 January 2016 at 14:19, oleg yusim  wrote:
>
>> Not a problem, Carlos, at least you tried :) I have overall a big problem
>> with my queries to Cassandra community. Most of them are not getting
>> answered.
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 8:03 AM, Carlos Alonso 
>> wrote:
>>
>>> Oh, I thought you meant read/write timeout, not session timeout due to
>>> inactivity...
>>>
>>> Not sure there's such option. Sorry
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> 
>>>
>>> On 29 January 2016 at 13:35, oleg yusim  wrote:
>>>
 Carlos,

 I went through Java and Python drivers... didn't find anything like
 that. Can you bring me example from your Ruby driver? Let me also make sure
 we are on the same page - I'm talking about session timeout due to
 inactivity, not read timeout or something like that.

 Thanks,

 Oleg

 On Fri, Jan 29, 2016 at 7:23 AM, Carlos Alonso 
 wrote:

> I personally don't use the Java but the Ruby driver, but I'm pretty
> sure you'll be able to find it in the docs:
> https://github.com/datastax/java-driver
>
> Carlos Alonso | Software Engineer | @calonso
> 
>
> On 29 January 2016 at 13:15, oleg yusim  wrote:
>
>> Hi Carlos,
>>
>> Thanks for your answer. Can you please give me a bit more information?
>> What is the driver? JDBC? What is the name of the configuration file?
>>
>> Thanks,
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 5:12 AM, Carlos Alonso 
>> wrote:
>>
>>> Hi Oleg.
>>>
>>> The drivers have builtin the timeout configurable functionality.
>>>
>>> Hope it helps.
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> 
>>>
>>> On 28 January 2016 at 22:18, oleg yusim  wrote:
>>>
 Greetings,

 Does Cassandra support session timeout? If so, where can I find
 this configuration switch? If not, what kind of hook can I use to
 write my own code, terminating a session after so many seconds of
 inactivity?

 Thanks,

 Oleg

>>>
>>>
>>
>

>>>
>>
>


Re: Session timeout

2016-01-29 Thread Jonathan Haddad
I think most of your queries aren't being answered because you're asking
questions that most people don't have the answer to.
On the automatic disconnect, anyone using Cassandra in prod doesn't really
need to think about it because we're always running queries, perhaps
millions a second.  Queries are multiplexed over a single connection.
Almost nobody ever actually runs into a case of leaving a socket open for
hours without a query, so to find out if it actually happens, someone would
have to look it up in the source.
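
For readers unfamiliar with the multiplexing mentioned above: the CQL native
protocol tags every request frame with a stream id, and the response carries
the same id back, so many in-flight queries can share one TCP connection. A
toy model of that bookkeeping (not the actual driver code):

```python
class MultiplexedConnection:
    """Toy model of native-protocol multiplexing.

    Each request borrows a stream id; the response echoes the id back,
    letting the client match out-of-order replies to their callers while
    all traffic shares a single socket.
    """

    def __init__(self, max_streams=128):
        self._free = list(range(max_streams))  # available stream ids
        self._pending = {}                     # stream id -> in-flight query

    def send(self, query):
        stream_id = self._free.pop()
        self._pending[stream_id] = query
        return stream_id  # the frame would go on the wire here

    def receive(self, stream_id, result):
        query = self._pending.pop(stream_id)
        self._free.append(stream_id)  # id is reusable immediately
        return query, result

conn = MultiplexedConnection()
a = conn.send("SELECT * FROM t1")
b = conn.send("SELECT * FROM t2")
# Responses may arrive out of order; the stream ids pair them up.
print(conn.receive(b, "rows2"))
print(conn.receive(a, "rows1"))
```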

Your questions about auditing are geared more toward a database built for
multi-user access. Cassandra was built to solve a very different problem.
In most cases, you don't have hundreds of people connecting from a shell,
leaving connections open, casually querying for BI reports. That isn't how
*most* people use Cassandra; it wasn't really built for that. There's
better support for users and roles nowadays, but it's relatively new, and
that's about all you have right now.

I realize you're new to the community, and it can be frustrating to not get
answers to questions that seem completely basic and obvious, but you're
asking about areas that *most* people on this list don't have knowledge
about and zero motivation to learn, because it's not necessary to solve the
problems we face.


On Fri, Jan 29, 2016 at 6:19 AM oleg yusim  wrote:

> Not a problem, Carlos, at least you tried :) I have overall a big problem
> with my queries to Cassandra community. Most of them are not getting
> answered.
>
> Oleg
>
> On Fri, Jan 29, 2016 at 8:03 AM, Carlos Alonso  wrote:
>
>> Oh, I thought you meant read/write timeout, not session timeout due to
>> inactivity...
>>
>> Not sure there's such option. Sorry
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 29 January 2016 at 13:35, oleg yusim  wrote:
>>
>>> Carlos,
>>>
>>> I went through Java and Python drivers... didn't find anything like
>>> that. Can you bring me example from your Ruby driver? Let me also make sure
>>> we are on the same page - I'm talking about session timeout due to
>>> inactivity, not read timeout or something like that.
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>> On Fri, Jan 29, 2016 at 7:23 AM, Carlos Alonso 
>>> wrote:
>>>
 I personally don't use the Java but the Ruby driver, but I'm pretty
 sure you'll be able to find it in the docs:
 https://github.com/datastax/java-driver

 Carlos Alonso | Software Engineer | @calonso
 

 On 29 January 2016 at 13:15, oleg yusim  wrote:

> Hi Carlos,
>
> Thanks for your answer. Can you please give me a bit more information?
> What is the driver? JDBC? What is the name of the configuration file?
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 5:12 AM, Carlos Alonso 
> wrote:
>
>> Hi Oleg.
>>
>> The drivers have builtin the timeout configurable functionality.
>>
>> Hope it helps.
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 28 January 2016 at 22:18, oleg yusim  wrote:
>>
>>> Greetings,
>>>
>>> Does Cassandra support session timeout? If so, where can I find this
>>> configuration switch? If not, what kind of hook can I use to write my
>>> own code, terminating a session after so many seconds of inactivity?
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>
>>
>

>>>
>>
>


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Arindam Choudhury
Why in cqlsh when I query "select count(*) from mordor.things_values_meta
;" it says: 4692

But in nodetool cfstats it says Number of keys (estimate): 4720?

On 29 January 2016 at 16:25, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> I am counting the rows with "select count(*) from
> mordor.things_values_meta;"
>
> I am loading from a one-node cluster to a one-node cluster, for testing.
>
> On 29 January 2016 at 16:20, Jack Krupansky 
> wrote:
>
>> And how are you counting the rows? With a query? If so, what is the
>> query? Using the nodetool cfstats (estimated) key count? Or... what?
>>
>> Are the tokens for the missing rows in the same range, and is that range
>> distinct from the rest of the data in the original cluster?
>>
>> How many nodes in the original cluster?
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> wrote:
>>
>>> I will check the output of nodetool cfstats.
>>>
>>> Its from version 2.1.2 to version 2.1.9.
>>>
>>> On 29 January 2016 at 16:02, Jack Krupansky 
>>> wrote:
>>>
 Are these sstables from an existing Cassandra cluster or generated by a
 program?

 If the former, do a nodetool tablestats or cfstats to get the sstable
 count and compare it to both the number of sstables that the loader is
 reading from and the number that end up in the target cluster.

 What Cassandra version did the sstables come from and what version are
 you importing into?


 -- Jack Krupansky

 On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
 arindam.choudh...@ackstorm.com> wrote:

> Hi Romain,
>
> The RF was set to 2.
>
> I changed it to one.
>
>  CREATE KEYSPACE mordor WITH replication = {'class' :
> 'SimpleStrategy', 'replication_factor' : 1}  AND durable_writes = true;
>
> re-inserted the columns, still missing rows.
>
> Regards,
> Arindam
>
> On 29 January 2016 at 15:14, Romain Hardouin 
> wrote:
>
>> Hi,
>>
>> I assume a RF > 1. Right?
>> What is the consistency level you used? cqlsh use ONE by default.
>> Try:
>> cqlsh> CONSISTENCY ALL
>> And run your query again.
>>
>> Best,
>> Romain
>>
>>
>> Le Vendredi 29 janvier 2016 13h45, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> a écrit :
>>
>>
>> Hi Kai,
>>
>> The table schema is:
>>
>> CREATE TABLE mordor.things_values_meta (
>> thing_id text,
>> key text,
>> bucket_timestamp timestamp,
>> total_rows counter,
>> PRIMARY KEY ((thing_id, key), bucket_timestamp)
>> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
>> AND bloom_filter_fp_chance = 0.01
>> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>> AND comment = ''
>> AND compaction = {'min_threshold': '4', 'class':
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>> 'max_threshold': '32'}
>> AND compression = {'sstable_compression':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99.0PERCENTILE';
>>
>>
>> I am just running "select count(*) from things_values_meta ;" to get
>> the count.
>>
>> Regards,
>> Arindam
>>
>> On 29 January 2016 at 13:39, Kai Wang  wrote:
>>
>> Arindam,
>>
>> what's the table schema and what does your query to retrieve the rows
>> look like?
>>
>> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> wrote:
>>
>> Hi,
>>
>> I am importing data to a new cassandra cluster using sstableloader.
>> The sstableloader runs without any warning or error. But I am missing
>> around 1000 rows.
>>
>> Any feedback will be highly appreciated.
>>
>> Kind Regards,
>> Arindam Choudhury
>>
>>
>>
>>
>>
>>
>

>>>
>>
>


Re: Security labels

2016-01-29 Thread oleg yusim
Jack,

Thanks for your suggestion. I'm familiar with the Cassandra documentation,
and I'm aware of the differences between DSE and Cassandra.

The questions I ask here are the ones I found no mention of in the
documentation. Take security labels, for instance: the Cassandra
documentation is completely silent in this regard, and so is Google. Based
on that, I assume Cassandra doesn't support them. But I can't base a
federal compliance security document for Cassandra solely on my assumptions
and on a lack of information. That is where my questions stem from.

Thanks,

Oleg

On Fri, Jan 29, 2016 at 10:17 AM, Jack Krupansky 
wrote:

> To answer any future questions along these same lines, I suggest that you
> start by simply searching the doc and search the github repo for the source
> code for the relevant keywords. That will give you the definitive answers
> quickly. If something is missing, feel free to propose that it be added (if
> you really need it). And feel free to confirm here if a quick search
> doesn't give you a solid answer.
>
> Here's the root page for security in the Cassandra doc:
>
> https://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/secureTOC.html
>
> Also note that on questions of security, DataStax Enterprise may have
> different answers than pure open source Cassandra.
>
> -- Jack Krupansky
>
> On Thu, Jan 28, 2016 at 8:37 PM, oleg yusim  wrote:
>
>> Patrick,
>>
>> Absolutely. A security label is a mechanism of access control used by the
>> MAC (mandatory access control) model, and not by the DAC
>> (discretionary access control) model we are all used to. In the database
>> context it is illustrated, for instance, here:
>> http://www.postgresql.org/docs/current/static/sql-security-label.html
>>
>> Now, as per my goals, I'm doing a security assessment of the Cassandra DB
>> with the goal of producing a STIG for this product. That is one of the
>> parameters in the database SRG I have to assess against.
>>
>> Thanks,
>>
>> Oleg
>>
>>
>> On Thu, Jan 28, 2016 at 6:32 PM, Patrick McFadin 
>> wrote:
>>
>>> Cassandra has support for authentication security, but I'm not familiar
>>> with a security label. Can you describe what you want to do?
>>>
>>> Patrick
>>>
>>> On Thu, Jan 28, 2016 at 2:26 PM, oleg yusim  wrote:
>>>
 Greetings,

 Does Cassandra support security label concept? If so, where can I read
 on how it should be applied?

 Thanks,

 Oleg

>>>
>>>
>>
>


Security labels

2016-01-29 Thread Dani Traphagen
Hi Oleg,

I understand your frustration, but unfortunately, in the terms of your
security assessment, you have run into a mismatch with Cassandra's
intended use.

The eventuality of having multiple sockets open without query input for
long durations of time isn't something that was architected for, because
Cassandra was built to take massive quantities of queries, in both volume
and velocity.

Your expectation of the database isn't in line with how or why it was
designed. Generally, security solutions are architected around Cassandra,
baked into the data model; many solutions are home-brewed, written into
the application, or provided by another security client.

DSE has different security features rolling out in the next release, as
addressed earlier by Jack, like commit log and hint encryption as well as
unified authentication... but security labels aren't on anyone's radar as
a pressing "need." It's not something I've heard about as a priority
before, anyway.

Hope this helps!

Cheers,
Dani

On Friday, January 29, 2016, oleg yusim wrote:

> Jack,
>
> Thanks for your suggestion. I'm familiar with the Cassandra documentation,
> and I'm aware of the differences between DSE and Cassandra.
>
> The questions I ask here are the ones I found no mention of in the
> documentation. Take security labels, for instance: the Cassandra
> documentation is completely silent in this regard, and so is Google. Based
> on that, I assume Cassandra doesn't support them. But I can't base a
> federal compliance security document for Cassandra solely on my assumptions
> and on a lack of information. That is where my questions stem from.
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 10:17 AM, Jack Krupansky wrote:
>
>> To answer any future questions along these same lines, I suggest that you
>> start by simply searching the doc and search the github repo for the source
>> code for the relevant keywords. That will give you the definitive answers
>> quickly. If something is missing, feel free to propose that it be added (if
>> you really need it). And feel free to confirm here if a quick search
>> doesn't give you a solid answer.
>>
>> Here's the root page for security in the Cassandra doc:
>>
>> https://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/secureTOC.html
>>
>> Also note that on questions of security, DataStax Enterprise may have
>> different answers than pure open source Cassandra.
>>
>> -- Jack Krupansky
>>
>> On Thu, Jan 28, 2016 at 8:37 PM, oleg yusim  wrote:
>>
>>> Patrick,
>>>
>>> Absolutely. A security label is a mechanism of access control used by
>>> the MAC (mandatory access control) model, and not by the DAC
>>> (discretionary access control) model we are all used to. In the database
>>> context it is illustrated, for instance, here:
>>> http://www.postgresql.org/docs/current/static/sql-security-label.html
>>>
>>> Now, as per my goals, I'm doing a security assessment of the Cassandra DB
>>> with the goal of producing a STIG for this product. That is one of the
>>> parameters in the database SRG I have to assess against.
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>>
>>> On Thu, Jan 28, 2016 at 6:32 PM, Patrick McFadin 
>>> wrote:
>>>
 Cassandra has support for authentication security, but I'm not familiar
 with a security label. Can you describe what you want to do?

 Patrick

 On Thu, Jan 28, 2016 at 2:26 PM, oleg yusim 
 wrote:

> Greetings,
>
> Does Cassandra support security label concept? If so, where can I read
> on how it should be applied?
>
> Thanks,
>
> Oleg
>


>>>
>>
>

-- 
Sent from mobile -- apologizes for brevity or errors.


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Jack Krupansky
I agree that there should be more clear doc on exactly how the estimation
is calculated. When I inquired about this recently the response was that it
should be within about 2% of the actual key count. I started looking at the
code, but I ran out of time before I chased down all the subsidiary factors
in the calculation.

It would be nice to have an explicit nodetool option to count actual keys.
Presumably that would be more efficient than a select count(*).
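
For intuition on why the estimate and the real count can drift apart: the
per-table estimate is assembled from per-SSTable cardinality sketches
(HyperLogLog in recent versions), which carry a few percent of error, and
naively summing per-SSTable key counts would overcount keys that live in
more than one SSTable. A toy illustration with plain sets rather than
sketches (not Cassandra's actual code):

```python
# Each set stands in for the distinct partition keys of one SSTable.
sstable_keys = [
    {"a", "b", "c", "d"},  # keys in SSTable 1
    {"c", "d", "e"},       # keys in SSTable 2 (overlaps on c and d)
]

# Summing per-SSTable counts double-counts keys present in both tables.
naive_estimate = sum(len(keys) for keys in sstable_keys)   # 7

# The true key count is the size of the union across SSTables.
actual = len(set.union(*sstable_keys))                     # 5

print(naive_estimate, actual)
```

Cassandra avoids the overcounting by merging the per-SSTable sketches rather than summing counts, but the merged figure is still probabilistic, hence the "within about 2%" answer.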


-- Jack Krupansky

On Fri, Jan 29, 2016 at 11:27 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> Why in cqlsh when I query "select count(*) from mordor.things_values_meta
> ;" it says: 4692
>
> But in nodetool cfstats it says Number of keys (estimate): 4720?
>
> On 29 January 2016 at 16:25, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> I am counting the rows with "select count(*) from
>> mordor.things_values_meta;"
>>
>> I am loading from a one-node cluster to a one-node cluster, for testing.
>>
>> On 29 January 2016 at 16:20, Jack Krupansky 
>> wrote:
>>
>>> And how are you counting the rows? With a query? If so, what is the
>>> query? Using the nodetool cfstats (estimated) key count? Or... what?
>>>
>>> Are the tokens for the missing rows in the same range, and is that range
>>> distinct from the rest of the data in the original cluster?
>>>
>>> How many nodes in the original cluster?
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <
>>> arindam.choudh...@ackstorm.com> wrote:
>>>
 I will check the output of nodetool cfstats.

 Its from version 2.1.2 to version 2.1.9.

 On 29 January 2016 at 16:02, Jack Krupansky 
 wrote:

> Are these sstables from an existing Cassandra cluster or generated by
> a program?
>
> If the former, do a nodetool tablestats or cfstats to get the sstable
> count and compare it to both the number of sstables that the loader is
> reading from and the number that end up in the target cluster.
>
> What Cassandra version did the sstables come from and what version are
> you importing into?
>
>
> -- Jack Krupansky
>
> On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
> arindam.choudh...@ackstorm.com> wrote:
>
>> Hi Romain,
>>
>> The RF was set to 2.
>>
>> I changed it to one.
>>
>>  CREATE KEYSPACE mordor WITH replication = {'class' :
>> 'SimpleStrategy', 'replication_factor' : 1}  AND durable_writes = true;
>>
>> re-inserted the columns, still missing rows.
>>
>> Regards,
>> Arindam
>>
>> On 29 January 2016 at 15:14, Romain Hardouin 
>> wrote:
>>
>>> Hi,
>>>
>>> I assume a RF > 1. Right?
>>> What is the consistency level you used? cqlsh use ONE by default.
>>> Try:
>>> cqlsh> CONSISTENCY ALL
>>> And run your query again.
>>>
>>> Best,
>>> Romain
>>>
>>>
>>> Le Vendredi 29 janvier 2016 13h45, Arindam Choudhury <
>>> arindam.choudh...@ackstorm.com> a écrit :
>>>
>>>
>>> Hi Kai,
>>>
>>> The table schema is:
>>>
>>> CREATE TABLE mordor.things_values_meta (
>>> thing_id text,
>>> key text,
>>> bucket_timestamp timestamp,
>>> total_rows counter,
>>> PRIMARY KEY ((thing_id, key), bucket_timestamp)
>>> ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
>>> AND bloom_filter_fp_chance = 0.01
>>> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>> AND comment = ''
>>> AND compaction = {'min_threshold': '4', 'class':
>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>> 'max_threshold': '32'}
>>> AND compression = {'sstable_compression':
>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>> AND dclocal_read_repair_chance = 0.1
>>> AND default_time_to_live = 0
>>> AND gc_grace_seconds = 864000
>>> AND max_index_interval = 2048
>>> AND memtable_flush_period_in_ms = 0
>>> AND min_index_interval = 128
>>> AND read_repair_chance = 0.0
>>> AND speculative_retry = '99.0PERCENTILE';
>>>
>>>
>>> I am just running "select count(*) from things_values_meta ;" to get
>>> the count.
>>>
>>> Regards,
>>> Arindam
>>>
>>> On 29 January 2016 at 13:39, Kai Wang  wrote:
>>>
>>> Arindam,
>>>
>>> what's the table schema and what does your query to retrieve the
>>> rows look like?
>>>
>>> On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
>>> arindam.choudh...@ackstorm.com> wrote:
>>>
>>> Hi,
>>>
>>> I am importing data to a new cassandra cluster using sstableloader.
>>> The sstableloader runs without any warning or error. But I am missing
>>> around 1000 rows.
>>>
>>> Any feedback will 

Re: missing rows while importing data using sstable loader

2016-01-29 Thread Jack Krupansky
And how are you counting the rows? With a query? If so, what is the query?
Using the nodetool cfstats (estimated) key count? Or... what?

Are the tokens for the missing rows in the same range, and is that range
distinct from the rest of the data in the original cluster?

How many nodes in the original cluster?

-- Jack Krupansky

On Fri, Jan 29, 2016 at 10:12 AM, Arindam Choudhury <
arindam.choudh...@ackstorm.com> wrote:

> I will check the output of nodetool cfstats.
>
> Its from version 2.1.2 to version 2.1.9.
>
> On 29 January 2016 at 16:02, Jack Krupansky 
> wrote:
>
>> Are these sstables from an existing Cassandra cluster or generated by a
>> program?
>>
>> If the former, do a nodetool tablestats or cfstats to get the sstable
>> count and compare it to both the number of sstables that the loader is
>> reading from and the number that end up in the target cluster.
>>
>> What Cassandra version did the sstables come from and what version are
>> you importing into?
>>
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 29, 2016 at 9:34 AM, Arindam Choudhury <
>> arindam.choudh...@ackstorm.com> wrote:
>>
>>> Hi Romain,
>>>
>>> The RF was set to 2.
>>>
>>> I changed it to one.
>>>
>>>  CREATE KEYSPACE mordor WITH replication = {'class' : 'SimpleStrategy',
>>> 'replication_factor' : 1}  AND durable_writes = true;
>>>
>>> re-inserted the columns, still missing rows.
>>>
>>> Regards,
>>> Arindam
>>>
>>> On 29 January 2016 at 15:14, Romain Hardouin 
>>> wrote:
>>>
 Hi,

 I assume a RF > 1. Right?
 What is the consistency level you used? cqlsh use ONE by default.
 Try:
 cqlsh> CONSISTENCY ALL
 And run your query again.

 Best,
 Romain


 Le Vendredi 29 janvier 2016 13h45, Arindam Choudhury <
 arindam.choudh...@ackstorm.com> a écrit :


 Hi Kai,

 The table schema is:

 CREATE TABLE mordor.things_values_meta (
 thing_id text,
 key text,
 bucket_timestamp timestamp,
 total_rows counter,
 PRIMARY KEY ((thing_id, key), bucket_timestamp)
 ) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
 AND bloom_filter_fp_chance = 0.01
 AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
 AND comment = ''
 AND compaction = {'min_threshold': '4', 'class':
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
 'max_threshold': '32'}
 AND compression = {'sstable_compression':
 'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99.0PERCENTILE';


 I am just running "select count(*) from things_values_meta ;" to get
 the count.

 Regards,
 Arindam

 On 29 January 2016 at 13:39, Kai Wang  wrote:

 Arindam,

 what's the table schema and what does your query to retrieve the rows
 look like?

 On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury <
 arindam.choudh...@ackstorm.com> wrote:

 Hi,

 I am importing data to a new cassandra cluster using sstableloader. The
 sstableloader runs without any warning or error. But I am missing around
 1000 rows.

 Any feedback will be highly appreciated.

 Kind Regards,
 Arindam Choudhury






>>>
>>
>


Re: Security labels

2016-01-29 Thread Jack Krupansky
To answer any future questions along these same lines, I suggest that you
start by simply searching the doc and search the github repo for the source
code for the relevant keywords. That will give you the definitive answers
quickly. If something is missing, feel free to propose that it be added (if
you really need it). And feel free to confirm here if a quick search
doesn't give you a solid answer.

Here's the root page for security in the Cassandra doc:
https://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/secureTOC.html

Also note that on questions of security, DataStax Enterprise may have
different answers than pure open source Cassandra.

-- Jack Krupansky

On Thu, Jan 28, 2016 at 8:37 PM, oleg yusim  wrote:

> Patrick,
>
> Absolutely. Security label is mechanism of access control, utilized by MAC
> (mandatory access control) model, and not utilized by DAC (discretionary
> access control) model, we all are used to. In database content it is
> illustrated for instance here:
> http://www.postgresql.org/docs/current/static/sql-security-label.html
>
> Now, as per my goals, I'm making a security assessment of Cassandra DB
> with the goal of producing a STIG for this product. That is one of the parameters
> in the database SRG I have to assess against.
>
> Thanks,
>
> Oleg
>
>
> On Thu, Jan 28, 2016 at 6:32 PM, Patrick McFadin 
> wrote:
>
>> Cassandra has support for authentication security, but I'm not familiar
>> with a security label. Can you describe what you want to do?
>>
>> Patrick
>>
>> On Thu, Jan 28, 2016 at 2:26 PM, oleg yusim  wrote:
>>
>>> Greetings,
>>>
>>> Does Cassandra support security label concept? If so, where can I read
>>> on how it should be applied?
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>
>>
>


Tuning chunk_length_kb in cassandra 2.1.12

2016-01-29 Thread Jean Carlo
Hi guys

I want to set the param chunk_length_kb in order to improve the read
latency of my cassandra_stress's test.

This is the table

CREATE TABLE "Keyspace1".standard1 (
key blob PRIMARY KEY,
"C0" blob,
"C1" blob,
"C2" blob,
"C3" blob,
"C4" blob
) WITH bloom_filter_fp_chance = 0.1
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'sstable_size_in_mb': '160', 'class':
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.SnappyCompressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

I have 6 columns of type blob. This table is filled by cassandra-stress.

admin@cqlsh:Keyspace1> select * from standard1 limit 2;

 key: 0x4b343050393536353531
 C0:  0xe0e3d68ed1536e4d994aa74860270ac91cf7941acb5eefd925815481298f0d558d4f
 C1:  0xa43f78202576f1ccbdf50657792fac06f0ca7c9416ee68a08125c8dce4dfd085131d
 C2:  0xab12b06bf64c73e708d1b96fea9badc678303906e3d5f5f96fae7d8092ee0df0c54c
 C3:  0x428a157cb598487a1b938bdb6c45b09fad3b6408fddc290a6b332b91426b00ddaeb2
 C4:  0x0583038d881ab25be72155bc3aa5cb9ec3aab8e795601abe63a2b35f48ce1e359f5e

I am seeing a read latency of ~500 microseconds, which I think is too high
compared to the write latency of ~30 microseconds.

My first idea is to set chunk_length_kb to a value close to the size of
the rows in KB.

Am I heading in the right direction? If so, how can I compute the size of a
row?

Another question: might the "Compacted partition" values from the nodetool
cfstats command give me a value close to the right chunk_length_kb?
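For reference, a rough sizing heuristic (a sketch, not an official rule): take the mean compacted partition size reported by `nodetool cfstats Keyspace1.standard1` and pick the smallest power-of-two chunk length that covers it (chunk_length_kb must be a power of two; the default is 64). The chosen value can then be applied with something like `ALTER TABLE "Keyspace1".standard1 WITH compression = {'sstable_compression': 'SnappyCompressor', 'chunk_length_kb': '4'};`. The sizing step, with an assumed 4-64 KB range:

```python
def suggest_chunk_length_kb(mean_partition_bytes):
    # Smallest power-of-two chunk (4..64 KB, an assumed range) that
    # covers the mean compacted partition size from `nodetool cfstats`.
    kb = 4
    while kb * 1024 < mean_partition_bytes and kb < 64:
        kb *= 2
    return kb

# The standard1 rows above are roughly a 10-byte key plus five 34-byte
# blobs, i.e. well under 4 KB, so the smallest chunk already holds many rows:
print(suggest_chunk_length_kb(200))    # -> 4
print(suggest_chunk_length_kb(50000))  # -> 64
```

Smaller chunks mean less data decompressed per point read, at the cost of a slightly larger compression-offsets map and a worse compression ratio.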

Best regards

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


Re: Session timeout

2016-01-29 Thread oleg yusim
Jon,

I suspected something like that. I did a bit of learning on Cassandra
before starting my assessment, and I understand that you are right, and it
is generally not used like that.

However (taking off my developer hat and putting on my security architect
hat), from the security point of view the way Cassandra is used now is not
very secure. For instance, the way AAA (authentication, authorization,
audit) is done doesn't allow for centralized account and access control
management, which in reality translates into shared accounts and no
hierarchy. That in turn translates into a situation where one person's
compromised credentials mean complete disaster - administrative access to
the DB has just been given up, with all the consequences. To top it all
off, logging is currently implemented in a poor manner too. It doesn't even
allow logging the username - a basic requirement for any product, which
would allow a DBA or ISSO to figure out who did what on the DB and recover
in case of an attack or crash. In general, logs the way they are today are
targeted toward the developer making changes in the DB, not toward the DBA
using it, and don't make much sense in my opinion.

Now, if you are interested in that subject, this document:
http://iasecontent.disa.mil/stigs/zip/Jan2016/U_Database_V2R3_SRG.zip
covers the security concerns that should be taken into account when
designing a database. It also explains why each of them is important and
what exactly would happen if it were neglected.

Jon, I would also appreciate a suggestion. What I do right now is called
"writing a STIG". That is when somebody takes the concepts from an SRG (the
document I gave you the link to above) and figures out how they apply to a
particular product: what is met (and what configuration on the product
leads to it, exactly), what is not met but can be with a little enhancement
(and again - what those would be, exactly), and what is not met and can't
be met in the current design. All that is combined into one document,
called a STIG, and published by the government (DISA) on the
http://iase.disa.mil/stigs/Pages/a-z.aspx page. Those STIGs mean a great
deal from the security point of view because they:

   - Save a lot of time on re-assessment of the product every single time
   - Make the product's limitations from the security point of view known
   beforehand (so it can be placed correctly in the system, with all the
   right compensating controls implemented around it)
   - Allow automating both configuration checks from the security point of
   view and hardening of the product
   - Give the product a pass into the DoD framework, because if a product
   has a STIG and was configured in accordance with it, it is secure by DoD
   definition

So overall, it is of great benefit for a product to have a STIG written for
it, since it advances the product in the security market quite a bit and,
in the end, improves the product's security posture quite a bit as well. My
initial idea was that I would bring on board my knowledge of security
concepts, and when I lacked understanding of the intricate details of the
DB, I would turn to the Cassandra community for support.

So far that hasn't worked very well, and from what you are saying, it
wouldn't, because of lack of knowledge and lack of motivation to get it.
What would be your suggestion? Who is capable of answering my questions? Is
there another community I should turn to?

Would really appreciate your input on that,

Thanks,

Oleg





On Fri, Jan 29, 2016 at 10:24 AM, Jonathan Haddad  wrote:

> I think the reason why most of your queries aren't being answered is
> because you're asking questions that most people don't have the answer to.
> On the automatic disconnect, anyone using Cassandra in prod doesn't really
> need to think about it because we're always running queries, perhaps
> millions a second.  Queries are multiplexed over a single connection.
> Almost nobody ever actually runs into a case of leaving a socket open for
> hours without a query, so to find out if it actually happens, someone would
> have to look it up in the source.
>
> Your questions about auditing are geared more towards if you're using a
> database that's built for multi user access.  Cassandra was built to solve
> a very different problem.  In most cases, you don't have hundreds of people
> connecting from a shell, leaving connections open, casually querying for BI
> reports.  This isn't how *most* people use Cassandra, it wasn't really
> built for that.  There's better support for users & roles nowadays but it's
> relatively new and that's about all you have right now.
>
> I realize you're new to the community, and it can be frustrating to not
> get answers to questions that seem completely basic and obvious, but you're
> asking about areas that *most* people on this list don't have knowledge
> about and zero motivation to learn, because it's not necessary to solve the
> problems we face.
>
>
> On Fri, Jan 29, 2016 at 6:19 AM oleg yusim  wrote:
>

Re: Security labels

2016-01-29 Thread oleg yusim
Dani,

I really appreciate your response. Actually, session timeouts and security
labels are two different topics (the first is about an attack where
somebody opens, say, an SSH window to the DB, leaves the machine
unattended, and somebody else steals the session; the second is about
enabling the DB to support what is called the MAC access model - short for
mandatory access control. It is widely used in government and the military,
but not outside of them; we are all used to the DAC access control model).
However, I think you are right and I should move all my queries under one
big roof and call the thread "Security". I will do that today.

Now, about what you said: I just answered the same to Jon, in the Session
timeout thread, but will quickly recap here. I understand that Cassandra's
architecture was aimed and tailored at a completely different type of
scenario. Unfortunately, however, that doesn't mean that Cassandra is not
vulnerable to the very same set of attacks a relational database would be
vulnerable to. It just means Cassandra is not protected against those
attacks, because protection against them was not thought of when the
database was created. I already gave the AAA and session timeout examples
in Jon's thread, and those are just two of many.

Now, what I'm trying to do is create a STIG - a federal security
compliance document - which will assess Cassandra against SRG concepts
(federal security compliance recommendations for databases overall) and
will highlight what is not met, and can't be in the current design (i.e.
what system architects should keep in mind and what they need to
compensate for with other controls on different layers of the system
model), and what can be met either with configuration or with a little
enhancement (and how).

That document would be of great help for Cassandra as a product, because it
would allow it to be marketed as a product with an existing security
assessment and guidelines, performed according to DoD standards. It would
also help move the product in the general direction of improving its
security posture. Finally, the document would be posted on the DISA site (
http://iase.disa.mil/stigs/Pages/a-z.aspx), available for every security
architect to utilize, which would greatly reduce the risk of a Cassandra
deployment being hacked in the field.

To clear things up - what I'm asking about are not my expectations. I
really do not expect the developers of Cassandra to run and start
implementing security labels just because I asked about it. :) My questions
are to build my internal knowledge of the DB's current design, so that I
can build my security assessment from it, no more, no less.

I guess, summarizing the above, Cassandra as a product would end up
benefiting quite a bit from what I'm doing. That is why I think it would
make sense for the Cassandra community to help me with my questions, even
if they sound completely off the traditional "grid".

Thanks again, I really appreciate your response and conversation overall.

Oleg

On Fri, Jan 29, 2016 at 11:20 AM, Dani Traphagen <
dani.trapha...@datastax.com> wrote:

> Also -- it looks like you're really asking questions about session
> timeouts and security labels as they relate; it would be more helpful to
> keep them in one thread. :)
>
>
> On Friday, January 29, 2016, Dani Traphagen 
> wrote:
>
>> Hi Oleg,
>>
>> I understand your frustration, but unfortunately, in terms of your
>> security assessment, you have run into a mismatch with Cassandra's
>> utility.
>>
>> The eventuality of having multiple sockets open without query input
>> for long durations of time isn't something that was
>> architected for... because... Cassandra was built to take massive
>> quantities of queries, both in volume and velocity.
>>
>> Your expectation of the database isn't in line with how or why it was
>> designed. Generally, security solutions are architected
>> around Cassandra, baked into the data model; many solutions
>> are home-brewed, written into the application or provided by using another
>> security client.
>>
>> DSE has different security aspects rolling out in the next release,
>> as addressed earlier by Jack, like commit log and hint encryption, as well
>> as unified authentication... but security labels aren't on anyone's radar
>> as a pressing "need." It's not something I've heard about as a
>> priority before, anyway.
>>
>> Hope this helps!
>>
>> Cheers,
>> Dani
>>
>> On Friday, January 29, 2016, oleg yusim  wrote:
>>
>>> Jack,
>>>
>>> Thanks for your suggestion. I'm familiar with Cassandra documentation,
>>> and I'm aware of differences between DSE and Cassandra.
>>>
>>> Questions I ask here are those, I found no mention about in
>>> documentation. Let's take security labels for instance. Cassandra
>>> documentation is completely silent on this regard and so is Google. I
>>> assume, based on it, Cassandra doesn't support it. But I can't create
>>> federal compliance security document for Cassandra basing it of my
>>> 

Re: Security labels

2016-01-29 Thread Dani Traphagen
Also -- it looks like you're really asking questions about session timeouts
and security labels as they relate; it would be more helpful to keep them
in one thread. :)

On Friday, January 29, 2016, Dani Traphagen 
wrote:

> Hi Oleg,
>
> I understand your frustration, but unfortunately, in terms of your
> security assessment, you have run into a mismatch with Cassandra's
> utility.
>
> The eventuality of having multiple sockets open without query input
> for long durations of time isn't something that was
> architected for... because... Cassandra was built to take massive
> quantities of queries, both in volume and velocity.
>
> Your expectation of the database isn't in line with how or why it was
> designed. Generally, security solutions are architected
> around Cassandra, baked into the data model; many solutions
> are home-brewed, written into the application or provided by using another
> security client.
>
> DSE has different security aspects rolling out in the next release,
> as addressed earlier by Jack, like commit log and hint encryption, as well
> as unified authentication... but security labels aren't on anyone's radar
> as a pressing "need." It's not something I've heard about as a
> priority before, anyway.
>
> Hope this helps!
>
> Cheers,
> Dani
>
> On Friday, January 29, 2016, oleg yusim  wrote:
>
>> Jack,
>>
>> Thanks for your suggestion. I'm familiar with Cassandra documentation,
>> and I'm aware of differences between DSE and Cassandra.
>>
>> The questions I ask here are those I found no mention of in the
>> documentation. Let's take security labels, for instance. The Cassandra
>> documentation is completely silent in this regard and so is Google. I
>> assume, based on that, that Cassandra doesn't support them. But I can't
>> create a federal compliance security document for Cassandra based on my
>> assumptions and lack of information alone. That is where my questions stem
>> from.
>>
>> Thanks,
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 10:17 AM, Jack Krupansky <
>> jack.krupan...@gmail.com> wrote:
>>
>>> To answer any future questions along these same lines, I suggest that
>>> you start by simply searching the doc and search the github repo for the
>>> source code for the relevant keywords. That will give you the definitive
>>> answers quickly. If something is missing, feel free to propose that it be
>>> added (if you really need it). And feel free to confirm here if a quick
>>> search doesn't give you a solid answer.
>>>
>>> Here's the root page for security in the Cassandra doc:
>>>
>>> https://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/secureTOC.html
>>>
>>> Also note that on questions of security, DataStax Enterprise may have
>>> different answers than pure open source Cassandra.
>>>
>>> -- Jack Krupansky
>>>
>>> On Thu, Jan 28, 2016 at 8:37 PM, oleg yusim  wrote:
>>>
 Patrick,

 Absolutely. A security label is a mechanism of access control utilized by
 the MAC (mandatory access control) model, and not by the DAC
 (discretionary access control) model we are all used to. In a database
 context it is illustrated, for instance, here:
 http://www.postgresql.org/docs/current/static/sql-security-label.html

 Now, as per my goals, I'm making a security assessment for Cassandra DB
 with a goal to produce STIG on this product. That is one of the parameters
 in database SRG I have to assess against.

 Thanks,

 Oleg


 On Thu, Jan 28, 2016 at 6:32 PM, Patrick McFadin 
 wrote:

> Cassandra has support for authentication security, but I'm not
> familiar with a security label. Can you describe what you want to do?
>
> Patrick
>
> On Thu, Jan 28, 2016 at 2:26 PM, oleg yusim 
> wrote:
>
>> Greetings,
>>
>> Does Cassandra support security label concept? If so, where can I
>> read on how it should be applied?
>>
>> Thanks,
>>
>> Oleg
>>
>
>

>>>
>>
>
> --
> Sent from mobile -- apologies for brevity or errors.
>


-- 
Sent from mobile -- apologies for brevity or errors.


Re: Questions about the replicas selection and remote coordinator

2016-01-29 Thread Steve Robenalt
Hi Jun,

The 2 diagrams you are comparing come from versions of Cassandra that are
significantly different - 1.2 in the first case and 2.1 in the second - so
it's not surprising that there are differences. Since you haven't qualified
your question with the Cassandra version you are asking about, I would
assume that the 2.1 example is more representative of what you would be
likely to see. In any case, it's best to use a consistent version for your
documentation, because Cassandra changes quite rapidly across releases.

As far as choosing the coordinator node, I don't think there's a way to
force it, nor would it be a good idea to do so. In order to make a
reasonable selection of coordinators, you would need a lot of internal
knowledge about load on the nodes in the cluster and you'd need to also
handle certain classes of failures and retries, so you would end up
duplicating what is already being done for you internally.
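For illustration only (this is a toy sketch, not Cassandra's actual implementation; the tokens and node names are made up): the first replica for a key is found by walking the token ring clockwise from the key's token to the first node, and a SimpleStrategy-style placement then takes the next RF distinct nodes. NetworkTopologyStrategy layers rack and DC awareness on top of this walk, which is why the replica choices in the diagrams can look arbitrary.

```python
import bisect

def replicas_for(token, ring, rf):
    """ring: (token, node) pairs sorted by token, forming the hash ring.
    SimpleStrategy-style sketch: walk clockwise from the key's token and
    collect the next rf distinct nodes, wrapping around the ring."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_right(tokens, token) % len(ring)
    out = []
    while len(out) < rf:
        node = ring[i][1]
        if node not in out:
            out.append(node)
        i = (i + 1) % len(ring)
    return out

# Made-up 4-node ring; a key hashing to token 150 lands on the node
# owning the next token (200), then the walk continues clockwise.
ring = [(0, "n1"), (100, "n2"), (200, "n3"), (300, "n4")]
print(replicas_for(150, ring, 3))  # -> ['n3', 'n4', 'n1']
```

The point is that replica choice is a deterministic function of the key's token and the ring layout, not a per-request decision the client can usefully override.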

Steve


On Fri, Jan 29, 2016 at 9:11 AM, Jun Wu  wrote:

> Hi there,
>
> I have some questions about the replicas selection.
>
> Let's say that we have 2 data centers: DC1 and DC2. The figure can also
> be found at the link here:
> https://docs.datastax.com/en/cassandra/1.2/cassandra/images/write_access_multidc_12.png.
> There are 10 nodes in each data center. We set the replication factor to
> 3 in each data center, which means there will be 3 replicas in each data
> center.
>
> (1) My first question is how the 3 nodes to write data to are chosen.
> In the link above, the 3 replicas are nodes 1, 2, and 7. Is there any
> mechanism for selecting these 3?
>
> (2) Another question is about the remote coordinator. The previous
> figure shows that node 10 in DC1 will write data to node 10 in DC2, and
> then node 10 in DC2 will write 3 copies to 3 nodes in DC2.
>
> But another figure from DataStax shows a different method; the figure
> can be found here:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html.
> It shows that node 10 in DC1 will send 3 copies directly to 3 nodes in
> DC2, without using a remote coordinator.
>
> I'm wondering which case is true, because in a multi-data-center setup
> the time these two methods take varies a lot.
>
> Also, is there any mechanism to select which node becomes the remote
> coordinator?
>
> Thanks!
>
> Jun
>



-- 
Steve Robenalt
Software Architect
sroben...@highwire.org 
(office/cell): 916-505-1785

HighWire Press, Inc.
425 Broadway St, Redwood City, CA 94063
www.highwire.org

Technology for Scholarly Communication


RE: Questions about the replicas selection and remote coordinator

2016-01-29 Thread Jun Wu
Hi Steve,
   Thank you so much for your reply.
   Yes, you're right, I'm using version 2.1, so it seems I was looking at
outdated material.
However, this leads to another interesting question: why was this part
changed from version 1 to version 2? As we can see, in version 1 there is a
connection from node 10 in DC1 to node 10 in DC2, and then node 10 in DC2
sends 3 copies to 3 nodes in DC2, which should be more time-saving than
version 2.1, where node 10 in DC1 sends data to 3 nodes in DC2 directly.
 Also, is there any information on how the replicas are chosen? For
example, here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html
Why do we choose nodes 1, 3, 6 as replicas and 4, 8, 11 as the other 3
replicas?
Also, is node 11 working as a remote coordinator here? Or does the concept
of a remote coordinator really exist? As the figure shows, we may not even
need a remote coordinator.
Thanks!
Jun

Date: Fri, 29 Jan 2016 09:55:58 -0800
Subject: Re: Questions about the replicas selection and remote coordinator
From: sroben...@highwire.org
To: user@cassandra.apache.org

Hi Jun,
The 2 diagrams you are comparing come from versions of Cassandra that are 
significantly different - 1.2 in the first case and 2.1 in the second case, so 
it's not surprising that there are differences. since you haven't qualified 
your question with the Cassandra version you are asking about, I would assume 
that the 2.1 example is more representative of what you would be likely to see. 
In any case, it's best to use a consistent version for your documentation 
because Cassandra changes quite rapidly with many of the releases.
As far as choosing the coordinator node, I don't think there's a way to force 
it, nor would it be a good idea to do so. In order to make a reasonable 
selection of coordinators, you would need a lot of internal knowledge about 
load on the nodes in the cluster and you'd need to also handle certain classes 
of failures and retries, so you would end up duplicating what is already being 
done for you internally.
Steve

On Fri, Jan 29, 2016 at 9:11 AM, Jun Wu  wrote:



Hi there,
I have some questions about the replicas selection. 
Let's say that we have 2 data centers: DC1 and DC2, the figure also be got 
from link here: 
https://docs.datastax.com/en/cassandra/1.2/cassandra/images/write_access_multidc_12.png.
 There're 10 nodes in each data center. We set the replication factor to be 3 
and 3 in each data center, which means there'll be 3 and 3 replicas in each 
data center.
(1) My first question is how to choose which 3 nodes to write data to, in 
the link above, the 3 replicas are node 1, 2, 7. But, is there any mechanism to 
select these 3?
(2) Another question is about the remote coordinator, the previous figure 
shows that node 10 in DC1 will write data to node 10  in DC 2, then node 10 in 
DC2 will write 3 copies to 3 nodes in DC2.
But, another figure from datastax shows different method, the figure can be 
found here, 
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html.
 It shows that node 10 in DC 1 will send directly 3 copies to 3 nodes in DC2, 
without using remote coordinator.
I'm wondering which case is true, because in multiple data center, the time 
duration for these two methods varies a lot.
Also, is there any mechanism to select which node to be remote coordinator?
Thanks!
Jun 
  


-- 
Steve Robenalt
Software Architect
sroben...@highwire.org
(office/cell): 916-505-1785

HighWire Press, Inc.
425 Broadway St, Redwood City, CA 94063
www.highwire.org

Technology for Scholarly Communication

  

Questions about the replicas selection and remote coordinator

2016-01-29 Thread Jun Wu
Hi there,
I have some questions about replica selection.
Let's say that we have 2 data centers: DC1 and DC2. The figure can also be
found at the link here:
https://docs.datastax.com/en/cassandra/1.2/cassandra/images/write_access_multidc_12.png.
There are 10 nodes in each data center. We set the replication factor to 3
in each data center, which means there will be 3 replicas in each data
center.
(1) My first question is how the 3 nodes to write data to are chosen. In
the link above, the 3 replicas are nodes 1, 2, and 7. Is there any
mechanism for selecting these 3?
(2) Another question is about the remote coordinator. The previous figure
shows that node 10 in DC1 will write data to node 10 in DC2, and then node
10 in DC2 will write 3 copies to 3 nodes in DC2.
But another figure from DataStax shows a different method; the figure can
be found here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html.
It shows that node 10 in DC1 will send 3 copies directly to 3 nodes in DC2,
without using a remote coordinator.
I'm wondering which case is true, because in a multi-data-center setup the
time these two methods take varies a lot.
Also, is there any mechanism to select which node becomes the remote
coordinator?
Thanks!
Jun 
  

Cassandra driver class

2016-01-29 Thread KAMM, BILL
I'm just getting started with Cassandra, and am trying to integrate it with 
JBoss.  I'm configuring the standalone-ha-full.xml file, but don't know what to 
use for the driver class.  For example, I have this:



com.datastax.driver.core.



What do I replace "" with?

Is "com.datastax.driver.core" even correct, or am I going down the wrong path?  
I am using the DataStax 2.0.2 driver, with Cassandra 2.0.8.

Should I be using  instead of ?

Does anybody have a working example they can share?  Any help to get me going 
would be appreciated.  Thanks.

Bill




Re: Session timeout

2016-01-29 Thread Jeff Jirsa

> For instance, way AAA (authentication, authorization, audit) is done, doesn't 
> allow for centralized account and access control management, which in reality 
> translates into shared accounts and no hierarchy. 

Authentication and Authorization are both pluggable. Any organization can write 
their own, and tie it to any AAA system they currently have. If they were 
feeling generous, they could open source it for the community, and perhaps 
bring it upstream. There's nothing fundamentally preventing your organization 
from writing an Authenticator ( 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/IAuthenticator.java
 ) or Authorizer ( 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/IAuthorizer.java
 ) if it were so inclined.

Audit is something that’s being actively discussed ( 
https://issues.apache.org/jira/browse/CASSANDRA-8844 ).

It’s an open source project with a very small number of commercial vendors. In 
general, that means there are 3 options:
Wait for someone else to write it to fit their need, and hopefully they open 
source it. 
Write it yourself
Pay a vendor (such as Datastax), and let them know in advance it’s a 
requirement to get it on their roadmap. This is really #2 with some polish to 
make it easier to get through your legal/AP systems.
>So far it doesn't work quite well, and from what you are saying, it wouldn't, 
>because of lack of knowledge and lack of motivation to get it. What would be 
>your suggestion? Who is capable of answering my questions? Is there another 
>community, I should turn to?

The cassandra-user and cassandra-dev mailing lists are the primary sources of 
knowledge outside of support contracts. For paid support, companies like 
Datastax and The Last Pickle tend to be well respected options. Both of those 
companies will probably answer some of your questions for free if you post on 
these mailing lists. They’ll likely answer even more if you pay them.



From:  oleg yusim
Reply-To:  "user@cassandra.apache.org"
Date:  Friday, January 29, 2016 at 9:16 AM
To:  "user@cassandra.apache.org"
Subject:  Re: Session timeout

Jon, 

I suspected something like that. I did a bit of learning on Cassandra before 
starting my assessment, and I understand that you are right, and it is 
generally not used like that. 

However (taking off my developer hat and putting on my security architect hat), 
from the security point of view the way Cassandra is used now is not very 
secure. For instance, way AAA (authentication, authorization, audit) is done, 
doesn't allow for centralized account and access control management, which in 
reality translates into shared accounts and no hierarchy. That in turn 
translates into situation when one person compromising credentials means 
complete disaster - administrative access to DB was just given up, with all the 
consequences. To top it all logging currently implemented in horrible manner 
too. It doesn't even allow to log username - basic requirement for any product, 
which would allow DBA or ISSO to figure out who did what on DB and recover in 
case of attack or crash. In general, logs the way they are today are targeted 
toward developer, making changes in DB, not toward the DBA, using it, and 
doesn't make much sense in my opinion.

Now if you are interested in that subject, that document: 
http://iasecontent.disa.mil/stigs/zip/Jan2016/U_Database_V2R3_SRG.zip covers 
security concerns which should be taken in the account, when we are designing 
database. It also explains why each of them is important and what exactly would 
happen if it would be neglected.

Jon, I would also appreciate suggestion. What I do right now is called "writing 
a STIG".That is when somebody takes concepts from SRG (the document I gave you 
link to above) and figures out how those are applied to that particular 
product. What is met (and what configuration on product leads to it, exactly), 
what is not met, but can be with little enhancement (and again - what those 
would be exactly), and what is not met and can't be met at current design. All 
that is combined into one document, called STIG and published by government 
(DISA) on http://iase.disa.mil/stigs/Pages/a-z.aspx page. Those STIGs mean a 
great deal from the security point of view because they:
   - Allow to save a lot of time on re-assessment of the product every
   single time
   - Allow to know what are the products limitations are from the security
   point of view before hands (and as such, place it right on the system,
   implementing all right compensation controls around it)
   - Allow to automate, both configuration checks from the security point
   of view and hardening of the product
   - Give product pass to DoD framework because if product has STIG and was
   configured in accordance to it, it is secure by DoD definition
So overall, it is to the great benefit for the product to have STIG written for 
it, since it advances it on security market quite a 

Re: Cassandra driver class

2016-01-29 Thread Jack Krupansky
From the little reading I did about TEIID, it sounded as if they do have
a connector that uses the Cassandra Java Driver, which is of course a good
thing. But that doesn't make their connector itself a topic for the Java
Driver list. I mean, the folks on the Java list are no more likely to know
about TEIID configuration (the original inquiry of this thread) than folks
here. Sure, no harm in asking, but it seems like a wild goose chase to me.

Here are the mailing lists for TEIID:
http://teiid.jboss.org/mailinglists/

Here's the TEIID project on Jira for Cassandra topics:
https://issues.jboss.org/browse/TEIID-3239?jql=project%20%3D%20TEIID%20AND%20text%20~%20%22cassandra%22

And just for completeness, here's the TEIID Cassandra Connector link that
was offered earlier by Corry, even though it is rather sparse and doesn't
directly address the original inquiry:
https://docs.jboss.org/author/display/TEIID/Cassandra+Data+Sources

In short, best bet is to ping the TEIID user list:
https://lists.jboss.org/mailman/listinfo/teiid-users


-- Jack Krupansky

On Fri, Jan 29, 2016 at 5:38 PM, Corry Opdenakker  wrote:

> Fully correct, Steve, it is a source of confusion, and having a standard
> pool at the app/driver level will work as well as the JEE solution. But
> just as CQL provides an easy developer entry point for Cassandra because
> it is similar to SQL, a JEE datasource could do the same in front of the
> middleware audience.
> If I wanted to convince my customers to give Cassandra a try today, I'm
> sure several tech decision makers would say that the lack of a standard
> JEE datasource implementation is strange; it would raise some concerns
> and influence the first perception and landing of the product. This even
> though, in one way, it makes sense that there is no standard JEE
> datasource solution present.
>
> It does indeed seem to be a topic for the Java driver.
>
> On Friday, 29 January 2016, Steve Robenalt 
> wrote:
>
>> It's probably a source of some confusion that in the JEE world, the
>> driver isn't pooled, but the data source is. Since the Java Driver for
>> Cassandra includes the pooling, there's no need for a JEE data source on
>> top of it. This also means that the Java Driver for Cassandra isn't a
>> one-for-one exchange with a JDBC driver. I'm not sure if this same level of
>> confusion occurs with other language drivers for Cassandra.
>>
>> BTW, as Alex suggested earlier in the thread, this discussion should
>> probably be moved to the Java Driver mailing list.
>>
>> Steve
>>
>> On Fri, Jan 29, 2016 at 1:02 PM, Jack Krupansky > > wrote:
>>
>>> Unfortunately, somebody is likely going to need to educate us in the
>>> Cassandra community as to what a JBOSS VDB and TEIID really are. For now,
>>> our response will probably end up being that you should use the Java Driver
>>> for Cassandra, bypassing any JBOSS/VDB/TEIID support, for now. That TEIID
>>> link above may shed some light. Otherwise, you'll probably have to ping the
>>> TEIID community as far as how to configure JBOSS/TEIID. We're here to
>>> answer questions about Cassandra itself.
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jan 29, 2016 at 3:42 PM, Corry Opdenakker 
>>> wrote:
>>>
 What about this Cassandra-specific howto explained in a recent JBoss
 doc?

 https://docs.jboss.org/author/display/TEIID/Cassandra+Data+Sources?_sscc=t

 I'm also searching for the real recommended way of connecting to a
 Cassandra DB from a JEE server, but I didn't find any standard documented
 solution yet. I was a bit surprised that there is no standard
 JCA/resource archive solution foreseen, given that Cassandra itself is
 Java based. Maybe I overlooked the info somewhere?

 Dbcp could help for a large part, but of course one requires a fully
 reliable production ready solution.
 https://commons.apache.org/proper/commons-dbcp/

 Currently I would go for a standard connection pool at app level, as
 described in the Cassandra Java driver PDF, knowing that middleware
 admins don't like that nonstandard JEE approach.






 On Friday, 29 January 2016, Alex Popescu  wrote:

> I think both of those options expect a JDBC driver, while the DataStax
> Java driver is not one.
>
> As a side note, if you'd provide a more detailed description of the
> setup you want to get and post it to the Java driver mailing list
> https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user,
> chances of getting an answer will be higher.
>
> On Fri, Jan 29, 2016 at 9:56 AM, KAMM, BILL  wrote:
>
>> I’m just getting started with Cassandra, and am trying to integrate
>> it with JBoss.  I’m configuring the standalone-ha-full.xml file, but 
>> don’t
>> know 

Re: Session timeout

2016-01-29 Thread Bryan Cheng
To throw my (unsolicited) 2 cents into the ring, Oleg, you work for a
well-funded and fairly large company. You are certainly free to continue
using the list and asking for community support (I am definitely not in any
position to tell you otherwise, anyway), but that community support is by
definition ad-hoc and best effort. Furthermore, your questions range from
trivial to, as Jonathan has mentioned earlier, concepts that many of us have
no reason to consider at this time (perhaps your work will convince us
otherwise- but you'll need to finish it first ;) )

What I'm getting at here is that perhaps, if you need faster, deeper level,
and more elaborate support than this list can provide, you should look into
the services of a paid Cassandra support company like Datastax.

On Fri, Jan 29, 2016 at 3:34 PM, Robert Coli  wrote:

> On Fri, Jan 29, 2016 at 3:12 PM, Jack Krupansky 
> wrote:
>
>> One last time, I'll simply renew my objection to the way you are abusing
>> this list.
>>
>
> FWIW, while I appreciate that OP (Oleg) is attempting to do a service for
> the community, I agree that the flood of single topic, context-lacking
> posts regarding deep internals of Cassandra is likely to inspire the
> opposite of a helpful response.
>
> This is important work, however, so hopefully we can collectively find a
> way through the meta and can discuss this topic without acrimony! :D
>
> =Rob
>
>


Re: EC2 storage options for C*

2016-01-29 Thread Eric Plowe
Bryan,

Correct, I should have clarified that. I'm evaluating instance types based
on one SSD or two in RAID 0. I'm thinking it's going to be two in RAID 0, but
as I've had no experience running a production C* cluster in EC2, I wanted
to reach out to the list.

Sorry for the half-baked question :)

Eric

On Friday, January 29, 2016, Bryan Cheng  wrote:

> Do you have any idea what kind of disk performance you need?
>
> Cassandra with RAID 0 is a fairly common configuration (Al's awesome
> tuning guide has a blurb on it
> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html), so if
> you feel comfortable with the operational overhead it seems like a solid
> choice.
>
> To clarify, though,  by "just one", do you mean just using one of two
> available ephemeral disks available to the instance, or are you evaluating
> different instance types based on one disk vs two?
>
> On Fri, Jan 29, 2016 at 4:33 PM, Eric Plowe  > wrote:
>
>> My company is planning on rolling out a C* cluster in EC2. We are
>> thinking about going with ephemeral SSDs. The question is this: Should we
>> put two in RAID 0 or just go with one? We currently run a cluster in our
>> data center with 2 250gig Samsung 850 EVO's in RAID 0 and we are happy with
>> the performance we are seeing thus far.
>>
>> Thanks!
>>
>> Eric
>>
>
>


Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-29 Thread Corry Opdenakker
@JC, get the PID of your target Java process (something like "ps -ef | grep
-i cassandra").
Then do a kill -3 <pid> (on Unix/Linux).
Check the stdout logfile of the process;
it should contain the thread dump.
If you found it, then great!
Let that kill -3 run in a loop for about 2 or 3 minutes.
Afterwards, copy, paste, and load the stdout file into one of the mentioned
tools.
If you are not familiar with the Java internals, those thread dumps
will teach you a lot :)
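A minimal sketch of such a loop script, assuming a standard Linux install where the Cassandra process matches the pattern "CassandraDaemon" (adjust the pattern and the count/interval for your setup):

```shell
#!/bin/sh
# send_dumps: send SIGQUIT ("kill -3") to a process repeatedly so a JVM's
# stdout log accumulates a series of thread dumps.
# Arguments: pid [count] [interval_seconds] [signal]
# The signal argument defaults to QUIT; it is overridable only so the loop
# itself can be exercised harmlessly against a non-JVM process.
send_dumps() {
    pid=$1; count=${2:-60}; interval=${3:-2}; sig=${4:-QUIT}
    n=0
    while [ "$n" -lt "$count" ]; do
        kill -s "$sig" "$pid" 2>/dev/null || return 1  # stop if process is gone
        n=$((n + 1))
        sleep "$interval"
    done
}

# Against a running Cassandra node (process pattern is an assumption):
# send_dumps "$(pgrep -f CassandraDaemon | head -n 1)" 60 2
```

SIGQUIT makes a HotSpot JVM print a full thread dump to its stdout without terminating, so this is safe to run against a live node.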



On Friday, 29 January 2016, Jean Carlo  wrote:

> I am having the same issue after upgrading Cassandra to 2.1.12 from 2.0.10.
> I am not well versed in the JVM, so I would like to know how to do what
> @CorryOpdenakker proposes with Cassandra.
>
> :)
>
> I check concurrent_compactors
>
>
> Saludos
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker  > wrote:
>
>> Hi guys,
>> Cassandra is still new for me, but I have a lot of java tuning experience.
>>
>> For root cause detection of performance degradations it's always good to
>> start with collecting a series of Java thread dumps. At problem
>> occurrence, use a loop script to take, for example, 60 thread dumps with
>> an interval of 1 or 2 seconds.
>> Then load those dumps into IBM thread dump analyzer or in "eclipse mat"
>> or any similar tool and see which methods appear to be most active or
>> blocking others.
>>
>> It's really very useful.
>>
>> The same can be done in a normal situation to compare the difference.
>>
>> That should give more insights.
>>
>> Cheers, Corry
>>
>>
>> On Friday, 29 January 2016, Peddi, Praveen 
>> wrote:
>>
>>> Hello,
>>> We have another update on performance on 2.1.11. compression_chunk_size
>>> didn't really help much, but we changed concurrent_compactors from the
>>> default to 64 in 2.1.11 and read latencies improved significantly. However,
>>> 2.1.11 read latencies are still 1.5x slower than 2.0.9. One thing we noticed
>>> in JMX metrics that could affect read latencies is that 2.1.11 is running
>>> ReadRepairedBackground and ReadRepairedBlocking too frequently compared to
>>> 2.0.9, even though our read_repair_chance is the same on both. Could anyone
>>> shed some light on why 2.1.11 could be running read repair 10 to 50 times
>>> more in spite of the same configuration on both clusters?
>>>
>>> dclocal_read_repair_chance=0.10 AND
>>> read_repair_chance=0.00 AND
>>>
>>> Here is the table of read repair metrics for both clusters.
>>>
>>>                                     2.0.9    2.1.11
>>> ReadRepairedBackground   5MinAvg    0.006    0.1
>>>                         15MinAvg    0.009    0.153
>>> ReadRepairedBlocking     5MinAvg    0.002    0.55
>>>                         15MinAvg    0.007    0.91
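For reference, the per-table read repair chances quoted above are ordinary CQL table properties; a sketch of adjusting them via cqlsh (keyspace and table names here are hypothetical placeholders, and the values simply mirror the ones in the message):

```shell
# Lower dclocal_read_repair_chance to reduce how often reads trigger
# background read repair within the local DC; read_repair_chance covers
# cross-DC repair. Values shown match the schema quoted in this thread.
cqlsh -e "ALTER TABLE my_keyspace.my_table
          WITH dclocal_read_repair_chance = 0.10
           AND read_repair_chance = 0.00;"
```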
>>>
>>> Thanks
>>> Praveen
>>>
>>> From: Jeff Jirsa 
>>> Reply-To: 
>>> Date: Thursday, January 14, 2016 at 2:58 PM
>>> To: "user@cassandra.apache.org" 
>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>
>>> Sorry I wasn’t as explicit as I should have been
>>>
>>> The same buffer size is used by compressed reads as well, but tuned with
>>> compression_chunk_size table property. It’s likely true that if you lower
>>> compression_chunk_size, you’ll see improved read performance.
>>>
>>> This was covered in the AWS re:Invent youtube link I sent in my original
>>> reply.
>>>
>>>
>>>
>>> From: "Peddi, Praveen"
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Thursday, January 14, 2016 at 11:36 AM
>>> To: "user@cassandra.apache.org", Zhiyan Shao
>>> Cc: "Agrawal, Pratik"
>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>
>>> Hi,
>>> We will try with reduced “rar_buffer_size” to 4KB. However
>>> CASSANDRA-10249  says
>>> "this only affects users who have 1. disabled compression, 2. switched to
>>> buffered i/o from mmap’d”. None of this is true for us I believe. We use
>>> default disk_access_mode, which should be mmap. We also used
>>> LZ4Compressor when creating the table.
>>>
>>> We will let you know if this property had any effect. We were testing
>>> with 2.1.11 and this was only fixed in 2.1.12 so we need to play with
>>> latest version.
>>>
>>> Praveen
>>>
>>>
>>>
>>> From: Jeff Jirsa 
>>> Reply-To: 
>>> Date: Thursday, January 14, 2016 at 1:29 PM
>>> To: Zhiyan Shao , "user@cassandra.apache.org" <
>>> user@cassandra.apache.org>
>>> Cc: "Agrawal, Pratik" 
>>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>>
>>> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249
>>>  / https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or
>>> not this is really the case depends on how much of your data is in page
>>> 

Re: Session timeout

2016-01-29 Thread Jack Krupansky
No offense, but my suggestion here is that you write up a preliminary list
of your own answers based on your own reading of the doc, specs, and white
papers (and source code) and post that list, like on Google Docs, for
people to review in bulk, rather than force all Cassandra users on this
list to participate in a full security review one item at a time. To
reiterate, you should be treating the doc as the definitive guide to what
is supported - given the importance that the Cassandra and DSE developers
placed on security features over the past couple of years, it really is
truly safe to say that if it isn't in the doc then it is definitively not
supported. Yes, it would be good to review your final list as a courtesy
check, but asking us to confirm what appears to be obvious (i.e., it is not
in the doc) seems more than a bit excessive to me.

If there is any true confusion in the doc, of course let us know (or email
to d...@datastax.com), but there is no need for us to confirm that you did
not find something in the doc.

-- Jack Krupansky

On Fri, Jan 29, 2016 at 5:02 PM, oleg yusim  wrote:

> Jack,
>
> Appreciate the links. As I mentioned, I looked over both DSE and Cassandra
> sets of documentation, and ran some experiments on my Cassandra
> installation. What I'm bringing here is something I couldn't find
> definitive answer for in any of the above-mentioned sources.
>
> For instance, regarding logging, here are questions I have:
>
> 1)  Identity-based logging (we investigated it in another thread and I got
> "not supported" as an answer)
> 2)  Logging source and destinations (server and client IP)
> 3)  Logging connections and disconnections - same
> 4)  Logging hostname
> 5)  Ability to automatically shut down in case if it ran out of space to
> store logs?
> 6)  Ability to automatically overwrite audit logs in case if no more space
> is available (oldest first) ?
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 3:47 PM, Jack Krupansky 
> wrote:
>
>> There is some more detail on DSE Security in this white paper:
>>
>> http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-SOX-Compliance.pdf
>>
>> It mentions auditing, for example. I think you were asking about that
>> earlier.
>>
>> There may be some additional info or discussion related to security on
>> these main web site pages:
>> http://www.datastax.com/products/datastax-enterprise-security
>>
>> Security was given a reasonably high priority for DSE in releases 3.0 and
>> beyond, so that if something is not highlighted in those promotional
>> materials, then it probably isn't in the software.
>>
>> In general, if you see a feature in DSE, just do a keyword search in the
>> Cassandra doc to see if it is supported outside of DSE.
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 29, 2016 at 4:23 PM, oleg yusim  wrote:
>>
>>> Alex,
>>>
>>> No offense taken, your question is absolutely legit. As we used to
>>> joke in security world "putting on my black hat"/"putting on my white hat"
>>> - i.e. same set of questions I would be asking for hacking and protecting
>>> the product. So, I commend you for being careful here.
>>>
>>> Now, at that particular case I'm acting with my "white hat on". :) I'm
>>> hired by VMware, to help them improve security posture for their new
>>> products (vRealize package). I do that as part of the security team on
>>> VMware side, and working in conjunction with DISA (
>>> http://iase.disa.mil/stigs/Pages/a-z.aspx) we are creating STIGs (I
>>> explained this term in detail in this same thread above, in my response to
>>> Jon, so I wouldn't repeat myself here) for all the components vRealize
>>> suite of products has, including Cassandra, which is used in one of the
>>> products. These STIGs would be handed over to DISA, reviewed by their SMEs
>>> and published on their website, creating great opportunity for all the
>>> products covered to improve their security posture and advance on a market
>>> for free.
>>>
>>> For VMware purposes, we would harden our suite of products, based on
>>> STIGs, and create own overall Security Guideline, riding on top of STIGs.
>>>
>>> As I mentioned above, for both Cassandra and DSE, equally, this document
>>> would be very beneficial, since it would enable customers and help them to
>>> run hardening on the product and place it right on the system, surrounded
>>> by the correct set of compensation controls.
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>> On Fri, Jan 29, 2016 at 1:10 PM, Alex Popescu 
>>> wrote:
>>>

 On Fri, Jan 29, 2016 at 8:17 AM, oleg yusim 
 wrote:

> Thanks for encouraging me, I kind of grew a bit desperate. I'm
> security person, not a Cassandra expert, and doing security assessment of
> Cassandra DB, I have to rely on community heavily. I will put together a
> composed version of all my previous queries, will title it 

Re: Session timeout

2016-01-29 Thread oleg yusim
Jack,

I have to note, Cassandra documentation, the way it stands now, is not nearly
detailed enough. For instance:
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLoggingLevels_r.html
is all Cassandra has to say about logging. The reason why I bring my
questions to the mailing list is, once again, that I can't make security
recommendations which would be followed across the US based on a lack of
information. It is really not that difficult to confirm that such a feature
is not present.

Besides, the questions I ask might give some implementation ideas. Even from
this particular discussion, one has been raised already:
https://issues.apache.org/jira/browse/CASSANDRA-11097 With that in mind,
would you please respond with definitive answers to the questions I
raised here? My assumption is that the answer is "not supported" for all 5 not
yet answered, but I need confirmation from the community.

Thanks,

Oleg

On Fri, Jan 29, 2016 at 4:34 PM, Jack Krupansky 
wrote:

> No offense, but my suggestion here is that you write up a preliminary list
> of your own answers based on your own reading of the doc, specs, and white
> papers (and source code) and post that list, like on Google Docs, for
> people to review in bulk, rather than force all Cassandra users on this
> list to participate in a full security review one item at a time. To
> reiterate, you should be treating the doc as the definitive guide to what
> is supported - given the importance that the Cassandra and DSE developers
> placed on security features over the past couple of years, it really is
> truly safe to say that if it isn't in the doc then it is definitively not
> supported. Yes, it would be good to review your final list as a courtesy
> check, but asking us to confirm what appears to be obvious (i.e., it is not
> in the doc) seems more than a bit excessive to me.
>
> If there is any true confusion in the doc, of course let us know (or email
> to d...@datastax.com), but there is no need for us to confirm that you
> did not find something in the doc.
>
> -- Jack Krupansky
>
> On Fri, Jan 29, 2016 at 5:02 PM, oleg yusim  wrote:
>
>> Jack,
>>
>> Appreciate the links. As I mentioned, I looked over both DSE and
>> Cassandra sets of documentation, and ran some experiments on my Cassandra
>> installation. What I'm bringing here is something I couldn't find
>> definitive answer for in any of the above-mentioned sources.
>>
>> For instance, regarding logging, here are questions I have:
>>
>> 1)  Identity-based logging (we investigated it in another thread and I
>> got "not supported" as an answer)
>> 2)  Logging source and destinations (server and client IP)
>> 3)  Logging connections and disconnections - same
>> 4)  Logging hostname
>> 5)  Ability to automatically shut down in case if it ran out of space to
>> store logs?
>> 6)  Ability to automatically overwrite audit logs in case if no more
>> space is available (oldest first) ?
>>
>> Thanks,
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 3:47 PM, Jack Krupansky > > wrote:
>>
>>> There is some more detail on DSE Security in this white paper:
>>>
>>> http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-SOX-Compliance.pdf
>>>
>>> It mentions auditing, for example. I think you were asking about that
>>> earlier.
>>>
>>> There may be some additional info or discussion related to security on
>>> these main web site pages:
>>> http://www.datastax.com/products/datastax-enterprise-security
>>>
>>> Security was given a reasonably high priority for DSE in releases 3.0
>>> and beyond, so that if something is not highlighted in those promotional
>>> materials, then it probably isn't in the software.
>>>
>>> In general, if you see a feature in DSE, just do a keyword search in the
>>> Cassandra doc to see if it is supported outside of DSE.
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jan 29, 2016 at 4:23 PM, oleg yusim  wrote:
>>>
 Alex,

 No offense taken, your question is absolutely legit. As we used to
 joke in security world "putting on my black hat"/"putting on my white hat"
 - i.e. same set of questions I would be asking for hacking and protecting
 the product. So, I commend you for being careful here.

 Now, at that particular case I'm acting with my "white hat on". :) I'm
 hired by VMware, to help them improve security posture for their new
 products (vRealize package). I do that as part of the security team on
 VMware side, and working in conjunction with DISA (
 http://iase.disa.mil/stigs/Pages/a-z.aspx) we are creating STIGs (I
 explained this term in detail in this same thread above, in my response to
 Jon, so I wouldn't repeat myself here) for all the components vRealize
 suite of products has, including Cassandra, which is used in one of the
 products. These STIGs would be handed over to DISA, reviewed by their SMEs

Re: Session timeout

2016-01-29 Thread Robert Coli
On Fri, Jan 29, 2016 at 3:12 PM, Jack Krupansky 
wrote:

> One last time, I'll simply renew my objection to the way you are abusing
> this list.
>

FWIW, while I appreciate that OP (Oleg) is attempting to do a service for
the community, I agree that the flood of single topic, context-lacking
posts regarding deep internals of Cassandra is likely to inspire the
opposite of a helpful response.

This is important work, however, so hopefully we can collectively find a
way through the meta and can discuss this topic without acrimony! :D

=Rob


Re: EC2 storage options for C*

2016-01-29 Thread Eric Plowe
RAID 0 regardless of instance type*

On Friday, January 29, 2016, Eric Plowe  wrote:

> Bryan,
>
> Correct, I should have clarified that. I'm evaluating instance types based
> on one SSD or two in RAID 0. I'm thinking it's going to be two in RAID 0,
> but as I've had no experience running a production C* cluster in EC2, I
> wanted to reach out to the list.
>
> Sorry for the half-baked question :)
>
> Eric
>
> On Friday, January 29, 2016, Bryan Cheng  > wrote:
>
>> Do you have any idea what kind of disk performance you need?
>>
>> Cassandra with RAID 0 is a fairly common configuration (Al's awesome
>> tuning guide has a blurb on it
>> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html), so
>> if you feel comfortable with the operational overhead it seems like a solid
>> choice.
>>
>> To clarify, though,  by "just one", do you mean just using one of two
>> available ephemeral disks available to the instance, or are you evaluating
>> different instance types based on one disk vs two?
>>
>> On Fri, Jan 29, 2016 at 4:33 PM, Eric Plowe  wrote:
>>
>>> My company is planning on rolling out a C* cluster in EC2. We are
>>> thinking about going with ephemeral SSDs. The question is this: Should we
>>> put two in RAID 0 or just go with one? We currently run a cluster in our
>>> data center with 2 250gig Samsung 850 EVO's in RAID 0 and we are happy with
>>> the performance we are seeing thus far.
>>>
>>> Thanks!
>>>
>>> Eric
>>>
>>
>>


Re: Security labels

2016-01-29 Thread oleg yusim
Thanks Dani!

Oleg

On Fri, Jan 29, 2016 at 3:28 PM, Dani Traphagen  wrote:

> ​Hi Oleg,
>
> Thanks that helped clear things up! This sounds like a daunting task. I
> wish you all the best with it.
>
> Cheers,
> Dani​
>
> On Fri, Jan 29, 2016 at 10:03 AM, oleg yusim  wrote:
>
>> Dani,
>>
>> I really appreciate your response. Actually, session timeouts and security
>> labels are two different topics (the first is about an attack where somebody
>> opened, say, an SSH window to the DB, left his machine unattended, and
>> somebody else stole his session; the second is about enabling the DB to
>> support what is called the MAC access model - which stands for mandatory
>> access control. It is widely used in the government and military, but not
>> outside of it; we are all used to the DAC access control model). However, I
>> think you are right and I should move all my queries under one big roof and
>> call this thread "Security". I will do this today.
>>
>> Now, about what you have said, I just answered the same to Jon, in
>> Session Timeout thread, but would quickly re-cap here. I understand that
>> Cassandra's architecture was aimed and tailored for completely different
>> type of scenario. However, unfortunately, that doesn't mean that Cassandra
>> is not vulnerable to the same very set of attacks relational database would
>> be vulnerable to. It just means Cassandra is not protected against those
>> attacks, because protection against them was not thought of, when database
>> was created. I already gave the AAA and session's timeout example in Jon's
>> thread, and those are just one of many.
>>
>> Now what I'm trying to do, I'm trying to create a STIG - security federal
>> compliance document, which will assess Cassandra against SRG concepts
>> (security federal compliance recommendations for databases overall) and
>> will highlight what is not met, and can't be in current design (i.e. what
>> system architects should keep in mind and what they need to compensate for
>> with other controls on different layers of system model) and  what can be
>> met either with configuration or with little enhancement (and how).
>>
>> That document would be of great help for Cassandra as a product because
>> it would allow it to be marketed as a product with existing security
>> assessment and guidelines, performed according to DoD standards. It would
>> also help move the product in the general direction of improving its
>> security posture. Finally, the document would be posted on DISA site (
>> http://iase.disa.mil/stigs/Pages/a-z.aspx) available for every security
>> architect to utilize, which would greatly reduce the risk for Cassandra
>> product to be hacked in a field.
>>
>> To clear things out - what I ask about are not my expectations. I really
>> do not expect developers of Cassandra to run and start implementing
>> security labels, just because I asked about it. :) My questions are to
>> build my internal knowledge of DB current design, so that I can build my
>> security assessment based of it, not more, not less.
>>
>> I guess, summarizing what I said above: from what I'm doing, Cassandra as
>> a product would end up benefiting quite a bit. That is why I think it would
>> make sense for the Cassandra community to help me with my questions even if
>> they sound completely off the traditional "grid".
>>
>> Thanks again, I really appreciate your response and conversation overall.
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 11:20 AM, Dani Traphagen <
>> dani.trapha...@datastax.com> wrote:
>>
>>> Also -- it looks like you're really asking questions about session
>>> timeouts and security labels as they associate, would be more helpful to
>>> keep in one thread. :)
>>>
>>>
>>> On Friday, January 29, 2016, Dani Traphagen 
>>> wrote:
>>>
 Hi Oleg,

 I understand your frustration but unfortunately, in terms of your
 security assessment, you have fallen into a mismatch for Cassandra's
 utility.

 The eventuality of having multiple sockets open without the query input
 for long durations of time isn't something that was
 architected...because...Cassandra was built to take massive quantities
 of queries both in volume and velocity.

 Your expectation of the database isn't in line with how or why it was
 designed. Generally, security solutions are architected
 around Cassandra, baked into the data model, many solutions
 are home-brewed, written into the application or provided by using another
 security client.

 DSE has different security aspects rolling out in the next release
 as addressed earlier by Jack, like commit log and hint encryption, as well
 as unified authentication...but security labels aren't on anyone's radar
 as a pressing "need." It's not something I've heard about as a
 priority before anyway.

 Hope this helps!

 Cheers,
 Dani

 On Friday, January 29, 

EC2 storage options for C*

2016-01-29 Thread Eric Plowe
My company is planning on rolling out a C* cluster in EC2. We are thinking
about going with ephemeral SSDs. The question is this: Should we put two in
RAID 0 or just go with one? We currently run a cluster in our data center
with 2 250gig Samsung 850 EVO's in RAID 0 and we are happy with the
performance we are seeing thus far.

Thanks!

Eric


Re: EC2 storage options for C*

2016-01-29 Thread Bryan Cheng
Do you have any idea what kind of disk performance you need?

Cassandra with RAID 0 is a fairly common configuration (Al's awesome tuning
guide has a blurb on it
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html), so if
you feel comfortable with the operational overhead it seems like a solid
choice.

To clarify, though,  by "just one", do you mean just using one of two
available ephemeral disks available to the instance, or are you evaluating
different instance types based on one disk vs two?
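For what it's worth, assembling two ephemeral disks into RAID 0 is only a few commands; a sketch (the device names are assumptions — check yours with `lsblk` — and the mount point assumes a default package install):

```shell
# Assumed device names for two instance-store SSDs; verify with `lsblk` first.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
mkfs.ext4 /dev/md0                        # or xfs, per your preference
mkdir -p /var/lib/cassandra
mount -o noatime /dev/md0 /var/lib/cassandra
# Instance-store RAID 0 does not survive instance stop/terminate, so rely on
# Cassandra replication and backups for durability.
```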

On Fri, Jan 29, 2016 at 4:33 PM, Eric Plowe  wrote:

> My company is planning on rolling out a C* cluster in EC2. We are thinking
> about going with ephemeral SSDs. The question is this: Should we put two in
> RAID 0 or just go with one? We currently run a cluster in our data center
> with 2 250gig Samsung 850 EVO's in RAID 0 and we are happy with the
> performance we are seeing thus far.
>
> Thanks!
>
> Eric
>


Re: Session timeout

2016-01-29 Thread oleg yusim
Jack,

Appreciate the links. As I mentioned, I looked over both DSE and Cassandra
sets of documentation, and ran some experiments on my Cassandra
installation. What I'm bringing here is something I couldn't find
definitive answer for in any of the above-mentioned sources.

For instance, regarding logging, here are questions I have:

1)  Identity-based logging (we investigated it in another thread and I got
"not supported" as an answer)
2)  Logging source and destinations (server and client IP)
3)  Logging connections and disconnections - same
4)  Logging hostname
5)  Ability to automatically shut down in case if it ran out of space to
store logs?
6)  Ability to automatically overwrite audit logs in case if no more space
is available (oldest first) ?

Thanks,

Oleg

On Fri, Jan 29, 2016 at 3:47 PM, Jack Krupansky 
wrote:

> There is some more detail on DSE Security in this white paper:
>
> http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-SOX-Compliance.pdf
>
> It mentions auditing, for example. I think you were asking about that
> earlier.
>
> There may be some additional info or discussion related to security on
> these main web site pages:
> http://www.datastax.com/products/datastax-enterprise-security
>
> Security was given a reasonably high priority for DSE in releases 3.0 and
> beyond, so that if something is not highlighted in those promotional
> materials, then it probably isn't in the software.
>
> In general, if you see a feature in DSE, just do a keyword search in the
> Cassandra doc to see if it is supported outside of DSE.
>
> -- Jack Krupansky
>
> On Fri, Jan 29, 2016 at 4:23 PM, oleg yusim  wrote:
>
>> Alex,
>>
>> No offense taken, your question is absolutely legit. As we used to
>> joke in security world "putting on my black hat"/"putting on my white hat"
>> - i.e. same set of questions I would be asking for hacking and protecting
>> the product. So, I commend you for being careful here.
>>
>> Now, at that particular case I'm acting with my "white hat on". :) I'm
>> hired by VMware, to help them improve security posture for their new
>> products (vRealize package). I do that as part of the security team on
>> VMware side, and working in conjunction with DISA (
>> http://iase.disa.mil/stigs/Pages/a-z.aspx) we are creating STIGs (I
>> explained this term in detail in this same thread above, in my response to
>> Jon, so I wouldn't repeat myself here) for all the components vRealize
>> suite of products has, including Cassandra, which is used in one of the
>> products. These STIGs would be handed over to DISA, reviewed by their SMEs
>> and published on their website, creating great opportunity for all the
>> products covered to improve their security posture and advance on a market
>> for free.
>>
>> For VMware purposes, we would harden our suite of products, based on
>> STIGs, and create own overall Security Guideline, riding on top of STIGs.
>>
>> As I mentioned above, for both Cassandra and DSE, equally, this document
>> would be very beneficial, since it would enable customers and help them to
>> run hardening on the product and place it right on the system, surrounded
>> by the correct set of compensation controls.
>>
>> Thanks,
>>
>> Oleg
>>
>> On Fri, Jan 29, 2016 at 1:10 PM, Alex Popescu  wrote:
>>
>>>
>>> On Fri, Jan 29, 2016 at 8:17 AM, oleg yusim  wrote:
>>>
 Thanks for encouraging me, I had grown a bit desperate. I'm a security
 person, not a Cassandra expert, and doing a security assessment of Cassandra,
 I have to rely on the community heavily. I will put together a consolidated
 version of all my previous queries, title it "Security assessment
 questions" and post it once again.
>>>
>>>
>>> Oleg,
>>>
>>> I'll apologize in advance if my answer sounds harsh. I've
>>> been following your questions (mostly because I find them interesting), but
>>> I've never jumped in to answer any of them, as I confess that not knowing the
>>> purpose of your research/report makes me cautious (e.g. are you doing this
>>> for your current employer, evaluating future use of the product? are you
>>> doing this for an analyst company? are you planning to sell this report?
>>> etc. etc.).
>>>
>>>
>>> --
>>> Bests,
>>>
>>> Alex Popescu | @al3xandru
>>> Sen. Product Manager @ DataStax
>>>
>>>
>>
>


Re: Cassandra driver class

2016-01-29 Thread Corry Opdenakker
Fully correct, Steve. It is a source of confusion, and having a standard pool
at the app/driver level will work as well as the JEE solution. But just as
CQL provides an easy developer entry point to Cassandra because it is
similar to SQL, a JEE datasource could do the same for the
middleware audience.
If I currently wanted to convince my customers to give Cassandra a
try, I'm sure that several tech decision makers would say that the
lack of a standard JEE datasource implementation is strange, and it would
raise some concerns and influence the first perception and landing of the
product. This even though, in one way, it makes sense that there is no
standard JEE datasource solution present.

Seems to be indeed a topic for the java driver.

On Friday 29 January 2016, Steve Robenalt 
wrote:

> It's probably a source of some confusion that in the JEE world, the driver
> isn't pooled, but the data source is. Since the Java Driver for Cassandra
> includes the pooling, there's no need for a JEE data source on top of it.
> This also means that the Java Driver for Cassandra isn't a one-for-one
> exchange with a JDBC driver. I'm not sure if this same level of confusion
> occurs with other language drivers for Cassandra.
>
> BTW, as Alex suggested earlier in the thread, this discussion should
> probably be moved to the Java Driver mailing list.
>
> Steve
>
> On Fri, Jan 29, 2016 at 1:02 PM, Jack Krupansky  > wrote:
>
>> Unfortunately, somebody is likely going to need to educate us in the
>> Cassandra community as to what a JBOSS VDB and TEIID really are. For now,
>> our response will probably end up being that you should use the Java Driver
>> for Cassandra, bypassing any JBOSS/VDB/TEIID support, for now. That TEIID
>> link above may shed some light. Otherwise, you'll probably have to ping the
>> TEIID community as far as how to configure JBOSS/TEIID. We're here to
>> answer questions about Cassandra itself.
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 29, 2016 at 3:42 PM, Corry Opdenakker > > wrote:
>>
>>> What about this cassandra specific howto explained in a recent jboss doc?
>>>
>>> https://docs.jboss.org/author/display/TEIID/Cassandra+Data+Sources?_sscc=t
>>>
>>> I'm also searching for the real recommended way of connecting to a
>>> Cassandra DB from a JEE server, but I didn't find any standard documented
>>> solution yet. I was a bit surprised that there is no standard
>>> JCA/resource archive solution foreseen, while Cassandra itself is Java
>>> based. Maybe I overlooked the info somewhere?
>>>
>>> Dbcp could help for a large part, but of course one requires a fully
>>> reliable production ready solution.
>>> https://commons.apache.org/proper/commons-dbcp/
>>>
>>> Currently I would go for a standard connection pool at app level, as
>>> described in the Cassandra Java driver PDF, knowing that middleware
>>> admins don't like that nonstandard JEE approach.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Friday 29 January 2016, Alex Popescu wrote:
>>>
 I think both of those options expect a JDBC driver, while the DataStax
 Java driver is not one.

 As a side note, if you'd provide a more detailed description of the
 setup you want to get and post it to the Java driver mailing list
 https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user,
 chances of getting an answer will be higher.

 On Fri, Jan 29, 2016 at 9:56 AM, KAMM, BILL  wrote:

> I’m just getting started with Cassandra, and am trying to integrate it
> with JBoss.  I’m configuring the standalone-ha-full.xml file, but don’t
> know what to use for the driver class.  For example, I have this:
>
>
>
> 
>
> 
>
> com.datastax.driver.core.
>
> 
>
> 
>
>
>
> What do I replace “” with?
>
>
>
> Is “com.datastax.driver.core” even correct, or am I going down the
> wrong path?  I am using the DataStax 2.0.2 driver, with Cassandra 2.0.8.
>
>
>
> Should I be using  instead of ?
>
>
>
> Does anybody have a working example they can share?  Any help to get
> me going would be appreciated.  Thanks.
>
>
>
> Bill
>
>
>
>
>



 --
 Bests,

 Alex Popescu | @al3xandru
 Sen. Product Manager @ DataStax


>>>
>>> --
>>> --
>>> Bestdata.be
>>> Optimised ict
>>> Tel:+32(0)496609576
>>> co...@bestdata.be
>>> --
>>>
>>>
>>
>
>
> --
> Steve Robenalt
> Software Architect
> sroben...@highwire.org
> 

Re: EC2 storage options for C*

2016-01-29 Thread Jeff Jirsa
If you have to ask that question, I strongly recommend m4 or c4 instances with 
GP2 EBS.  When you don’t care about replacing a node because of an instance 
failure, go with i2+ephemerals. Until then, GP2 EBS is capable of amazing 
things, and greatly simplifies life.

We gave a talk on this topic at both Cassandra Summit and AWS re:Invent: 
https://www.youtube.com/watch?v=1R-mgOcOSd4 It’s very much a viable option, 
despite any old documents online that say otherwise.
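
For those who do go the i2/ephemeral route, striping the two instance-store SSDs is typically done at the OS level before starting Cassandra. A hedged sketch only: device names such as /dev/xvdb and /dev/xvdc are assumptions that vary by instance type, and the commands need root.

```shell
# Stripe two ephemeral SSDs into RAID 0 and mount them for Cassandra data.
# Device names (/dev/xvdb, /dev/xvdc) and the mount point are assumptions;
# verify actual devices with `lsblk` on your instance type first.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
mkfs.ext4 /dev/md0
mkdir -p /var/lib/cassandra
mount -o noatime /dev/md0 /var/lib/cassandra
# Record the array layout (the data itself does not survive instance stop/start):
mdadm --detail --scan >> /etc/mdadm.conf
```

Note that with ephemeral storage, losing the instance means rebuilding the node from its replicas, which is exactly the trade-off discussed above.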



From:  Eric Plowe
Reply-To:  "user@cassandra.apache.org"
Date:  Friday, January 29, 2016 at 4:33 PM
To:  "user@cassandra.apache.org"
Subject:  EC2 storage options for C*

My company is planning on rolling out a C* cluster in EC2. We are thinking
about going with ephemeral SSDs. The question is this: should we put two in
RAID 0 or just go with one? We currently run a cluster in our data center with
two 250 GB Samsung 850 EVOs in RAID 0, and we are happy with the performance we
are seeing thus far.

Thanks!

Eric





Problem while migrating a single node cluster from 2.1 to 3.2

2016-01-29 Thread Ajaya Agrawal
Hi,

I am a newbie when it comes to Cassandra administration and operation. We
have a single-node cluster running 2.1 in EC2, and we are planning to move
it to a better single-machine instance and want to run 3.2 on that.

I installed 3.2 on the new machine, created a snapshot of the old
cluster, and then copied all the relevant directories to the new
machine into the appropriate directory. Specifically, I copied
"/var/lib/cassandra/data/{keyspace}" from the old machine to the new machine.
Before that I created the relevant schema in the new cluster. I was hoping that
Cassandra would see the new directories and load up the copied SSTables
automatically. At the least I was hoping to see the snapshot created in the
old cluster when I ran "nodetool listsnapshots" on the new cluster.
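
A hedged sketch of what usually helps in this situation: Cassandra does not scan for newly copied SSTables on its own, so after placing the files in the matching table directory the node has to be told to pick them up. Keyspace/table names and paths below are placeholders, and moving 2.1-format files onto a 3.2 node may additionally require a `nodetool upgradesstables` pass; check the release notes.

```shell
# After copying the SSTable files into the new node's data directory for
# the target table, ask Cassandra to load them without a restart.
# "my_keyspace" and "my_table" are placeholders.
nodetool refresh my_keyspace my_table

# Alternatively, stream the copied files into the cluster (this also
# handles token-ownership differences between old and new nodes):
NEW_NODE_IP=10.0.0.1   # placeholder
sstableloader -d "$NEW_NODE_IP" /var/lib/cassandra/data/my_keyspace/my_table
```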

I have also changed the name of the new cluster.

Please help me and let me know if I forgot to add any detail.

Cheers,
Ajaya


Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-29 Thread Jean Carlo
I am having the same issue after upgrading Cassandra from 2.0.10 to 2.1.12.
I am not good with the JVM, so I would like to know how to do what
@CorryOpdenakker proposes with Cassandra.

:)

I will check concurrent_compactors.


Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker  wrote:

> Hi guys,
> Cassandra is still new for me, but I have a lot of java tuning experience.
>
> For root cause detection of performance degradations it's always good to
> start with collecting a series of Java thread dumps. At problem
> occurrence, use a loop script to take, for example, 60 thread dumps with
> an interval of 1 or 2 seconds.
> Then load those dumps into IBM Thread Dump Analyzer or Eclipse MAT or
> any similar tool and see which methods appear to be most active or blocking
> others.
>
> It's really very useful.
>
> The same can be done in a normal situation to compare the difference.
>
> That should give more insights.
>
> Cheers, Corry
>
>
> Op vrijdag 29 januari 2016 heeft Peddi, Praveen  het
> volgende geschreven:
>
>> Hello,
>> We have another update on performance on 2.1.11. compression_chunk_size
>> didn't really help much, but we changed concurrent_compactors from the
>> default to 64 in 2.1.11 and read latencies improved significantly. However,
>> 2.1.11 read latencies are still 1.5x slower than 2.0.9. One thing we noticed
>> in the JMX metrics that could affect read latencies is that 2.1.11 is running
>> ReadRepairedBackground and ReadRepairedBlocking far more frequently than
>> 2.0.9, even though our read_repair_chance is the same on both. Could anyone
>> shed some light on why 2.1.11 could be running read repair 10 to 50 times
>> more in spite of the same configuration on both clusters?
>>
>> dclocal_read_repair_chance=0.10 AND
>> read_repair_chance=0.00 AND
>>
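As a diagnostic, the read repair chances can be adjusted per table from cqlsh to see whether the extra ReadRepaired* activity tracks the setting. A hedged sketch only; keyspace and table names are placeholders, and the change should be reverted after the test:

```shell
# Temporarily zero out both read repair chances on the affected table,
# then watch the ReadRepairedBackground/Blocking JMX rates.
# "my_keyspace.my_table" is a placeholder.
cqlsh -e "ALTER TABLE my_keyspace.my_table
          WITH dclocal_read_repair_chance = 0.0
          AND read_repair_chance = 0.0;"
```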
>> Here is the table of read repair metrics for both clusters.
>>
>>                                     2.0.9   2.1.11
>> ReadRepairedBackground   5MinAvg    0.006   0.1
>>                          15MinAvg   0.009   0.153
>> ReadRepairedBlocking     5MinAvg    0.002   0.55
>>                          15MinAvg   0.007   0.91
>>
>> Thanks
>> Praveen
>>
>> From: Jeff Jirsa 
>> Reply-To: 
>> Date: Thursday, January 14, 2016 at 2:58 PM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Sorry I wasn’t as explicit as I should have been
>>
>> The same buffer size is used by compressed reads as well, but tuned with
>> compression_chunk_size table property. It’s likely true that if you lower
>> compression_chunk_size, you’ll see improved read performance.
>>
>> This was covered in the AWS re:Invent youtube link I sent in my original
>> reply.
>>
>>
>>
>> From: "Peddi, Praveen"
>> Reply-To: "user@cassandra.apache.org"
>> Date: Thursday, January 14, 2016 at 11:36 AM
>> To: "user@cassandra.apache.org", Zhiyan Shao
>> Cc: "Agrawal, Pratik"
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Hi,
>> We will try with reduced “rar_buffer_size” to 4KB. However
>> CASSANDRA-10249  says
>> "this only affects users who have 1. disabled compression, 2. switched to
>> buffered i/o from mmap’d". None of this is true for us, I believe. We use
>> the default disk_access_mode, which should be mmap. We also used
>> LZ4Compressor when creating the table.
>>
>> We will let you know if this property had any effect. We were testing
>> with 2.1.11 and this was only fixed in 2.1.12 so we need to play with
>> latest version.
>>
>> Praveen
>>
>>
>>
>> From: Jeff Jirsa 
>> Reply-To: 
>> Date: Thursday, January 14, 2016 at 1:29 PM
>> To: Zhiyan Shao , "user@cassandra.apache.org" <
>> user@cassandra.apache.org>
>> Cc: "Agrawal, Pratik" 
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249
>>  / https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not
>> this is really the case depends on how much of your data is in page cache,
>> and whether or not you’re using mmap. Since the original question was asked
>> by someone using small RAM instances, it’s possible.
>>
>> We mitigate this by dropping compression_chunk_size in order to force a
>> smaller buffer on reads, so we don’t over read very small blocks. This has
>> other side effects (lower compression ratio, more garbage during
>> streaming), but significantly speeds up read workloads for us.
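The chunk-size tuning described above is a per-table compression option. A hedged sketch for a 2.1-era cluster; keyspace/table names are placeholders, and existing SSTables must be rewritten before the new chunk size applies to them:

```shell
# Shrink the compression chunk from the 64 KB default to 4 KB so reads
# pull smaller blocks. "my_keyspace.my_table" is a placeholder.
cqlsh -e "ALTER TABLE my_keyspace.my_table
          WITH compression = {'sstable_compression': 'LZ4Compressor',
                              'chunk_length_kb': 4};"
# Rewrite existing SSTables so the new chunk size applies to them too:
nodetool upgradesstables -a my_keyspace my_table
```

The trade-off, as noted above, is a lower compression ratio and more garbage during streaming.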
>>
>>
>> From: Zhiyan Shao
>> Date: Thursday, January 14, 2016 at 9:49 AM
>> To: "user@cassandra.apache.org"
>> Cc: Jeff Jirsa, "Agrawal, Pratik"
>> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>>
>> Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this
>> forum, you can find another thread I sent a while ago. The perf 

Re: Session timeout

2016-01-29 Thread Jack Krupansky
One last time, I'll simply renew my objection to the way you are abusing
this list. You'll hear no further reply from me and I will begin marking
any more of your excessive inquiries as spam. If others in the community
wish to do your security review for you one item at a time, that is their
prerogative and I'll respect their wishes. My suggestions for a superior
approach to getting feedback for your review still stands and requires no
further efforts from me at this stage.

-- Jack Krupansky

On Fri, Jan 29, 2016 at 5:50 PM, oleg yusim  wrote:

> Jack,
>
> I have to note, Cassandra documentation as it stands now is not
> nearly detailed enough. For instance:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLoggingLevels_r.html
> is all Cassandra has to say about logging. The reason I bring my
> questions to the mailing list is, once again, that I can't make security
> recommendations which would be followed across the US based on a lack of
> information. It is really not that difficult to confirm that such a feature
> is not present.
>
> Besides, the questions I ask might give some implementation ideas. Even from
> this particular discussion one has been raised already:
> https://issues.apache.org/jira/browse/CASSANDRA-11097 With that in mind,
> would you please respond with definitive answers to the questions I
> raised here? My assumption is that the answer would be "not supported" for
> all 5 not yet answered, but I need confirmation from the community.
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 4:34 PM, Jack Krupansky 
> wrote:
>
>> No offense, but my suggestion here is that you write up a preliminary
>> list of your own answers based on your own reading of the doc, specs, and
>> white papers (and source code) and post that list, like on Google Docs, for
>> people to review in bulk, rather than force all Cassandra users on this
>> list to participate in a full security review one item at a time. To
>> reiterate, you should be treating the doc as the definitive guide to what
>> is supported - given the importance that the Cassandra and DSE developers
>> placed on security features over the past couple of years, it really is
>> truly safe to say that if it isn't in the doc then it is definitively not
>> supported. Yes, it would be good to review your final list as a courtesy
>> check, but asking us to confirm what appears to be obvious (i.e., it is not
>> in the doc) seems more than a bit excessive to me.
>>
>> If there is any true confusion in the doc, of course let us know (or
>> email to d...@datastax.com), but there is no need for us to confirm that
>> you did not find something in the doc.
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 29, 2016 at 5:02 PM, oleg yusim  wrote:
>>
>>> Jack,
>>>
>>> Appreciate the links. As I mentioned, I looked over both DSE and
>>> Cassandra sets of documentation, and ran some experiments on my Cassandra
>>> installation. What I'm bringing here is something I couldn't find
>>> definitive answer for in any of the above-mentioned sources.
>>>
>>> For instance, regarding logging, here are questions I have:
>>>
>>> 1)  Identity-based logging (we investigated it in another thread and I
>>> got "not supported" as an answer)
>>> 2)  Logging source and destination (server and client IP)
>>> 3)  Logging connections and disconnections - same
>>> 4)  Logging hostname
>>> 5)  Ability to automatically shut down in case it runs out of space
>>> to store logs
>>> 6)  Ability to automatically overwrite audit logs (oldest first) in case
>>> no more space is available
>>>
>>> Thanks,
>>>
>>> Oleg
>>>
>>> On Fri, Jan 29, 2016 at 3:47 PM, Jack Krupansky <
>>> jack.krupan...@gmail.com> wrote:
>>>
 There is some more detail on DSE Security in this white paper:

 http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-SOX-Compliance.pdf

 It mentions auditing, for example. I think you were asking about that
 earlier.

 There may be some additional info or discussion related to security on
 these main web site pages:
 http://www.datastax.com/products/datastax-enterprise-security

 Security was given a reasonably high priority for DSE in releases 3.0
 and beyond, so that if something is not highlighted in those promotional
 materials, then it probably isn't in the software.

 In general, if you see a feature in DSE, just do a keyword search in
 the Cassandra doc to see if it is supported outside of DSE.

 -- Jack Krupansky

 On Fri, Jan 29, 2016 at 4:23 PM, oleg yusim 
 wrote:

> Alex,
>
> No offense taken, your question is absolutely legit. As we used to
> joke in the security world, "putting on my black hat"/"putting on my white hat"
> - i.e. the same set of questions I would ask for hacking and for protecting
> the product. So, I commend you for 

Re: Security labels

2016-01-29 Thread Dani Traphagen
​Hi Oleg,

Thanks that helped clear things up! This sounds like a daunting task. I
wish you all the best with it.

Cheers,
Dani​

On Fri, Jan 29, 2016 at 10:03 AM, oleg yusim  wrote:

> Dani,
>
> I really appreciate your response. Actually, session timeouts and security
> labels are two different topics (the first is about an attack where somebody
> opened, say, an ssh window to the DB, left his machine unattended, and
> somebody else stole his session; the second is about enabling the DB to
> support what is called the MAC access model, which stands for mandatory
> access control. It is widely used in the government and military, but not
> outside of it; we are all used to the DAC access control model). However, I
> think you are right and I should move all my queries under one big roof and
> call the thread "Security". I will do this today.
>
> Now, about what you said: I just answered the same to Jon in the Session
> Timeout thread, but will quickly recap here. I understand that
> Cassandra's architecture was aimed and tailored at a completely different
> type of scenario. However, unfortunately, that doesn't mean that Cassandra
> is not vulnerable to the very same set of attacks a relational database
> would be vulnerable to. It just means Cassandra is not protected against
> those attacks, because protection against them was not thought of when the
> database was created. I already gave the AAA and session timeout examples
> in Jon's thread, and those are just a few of many.
>
> Now, what I'm trying to do is create a STIG - a federal security
> compliance document which will assess Cassandra against SRG concepts
> (federal security compliance recommendations for databases overall) and
> will highlight what is not met and can't be in the current design (i.e. what
> system architects should keep in mind and what they need to compensate for
> with other controls at different layers of the system model), and what can be
> met either with configuration or with a little enhancement (and how).
>
> That document would be of great help for Cassandra as a product because it
> would allow it to be marketed as a product with an existing security
> assessment and guidelines, performed according to DoD standards. It would
> also allow the product to move in the general direction of improving its
> security posture. Finally, the document would be posted on the DISA site (
> http://iase.disa.mil/stigs/Pages/a-z.aspx), available for every security
> architect to utilize, which would greatly reduce the risk of the Cassandra
> product being hacked in the field.
>
> To clear things up: what I ask about are not my expectations. I really
> do not expect the developers of Cassandra to run off and start implementing
> security labels just because I asked about it. :) My questions are to
> build my internal knowledge of the DB's current design, so that I can build
> my security assessment from it, not more, not less.
>
> I guess, summarizing the above: from what I'm doing, Cassandra as
> a product would end up benefiting quite a bit. That is why I think it would
> make sense for the Cassandra community to help me with my questions, even if
> they sound completely off the traditional "grid".
>
> Thanks again, I really appreciate your response and conversation overall.
>
> Oleg
>
> On Fri, Jan 29, 2016 at 11:20 AM, Dani Traphagen <
> dani.trapha...@datastax.com> wrote:
>
>> Also -- it looks like you're really asking questions about session
>> timeouts and security labels as related topics; it would be more helpful to
>> keep them in one thread. :)
>>
>>
>> On Friday, January 29, 2016, Dani Traphagen 
>> wrote:
>>
>>> Hi Oleg,
>>>
>>> I understand your frustration, but unfortunately, in the terms of your
>>> security assessment, you have fallen into a mismatch with Cassandra's
>>> utility.
>>>
>>> The eventuality of having multiple sockets open with no query input
>>> for long durations of time isn't something that was
>>> architected for, because Cassandra was built to take massive quantities
>>> of queries, in both volume and velocity.
>>>
>>> Your expectation of the database isn't in line with how or why it was
>>> designed. Generally, security solutions are architected
>>> around Cassandra, baked into the data model; many solutions
>>> are home-brewed, written into the application or provided by using another
>>> security client.
>>>
>>> DSE has different security aspects rolling out in the next release,
>>> as addressed earlier by Jack, like commit log and hint encryption, as well
>>> as unified authentication... but security labels aren't on anyone's radar
>>> as a pressing "need." It's not something I've heard about as a
>>> priority before, anyway.
>>> Hope this helps!
>>>
>>> Cheers,
>>> Dani
>>>
>>> On Friday, January 29, 2016, oleg yusim  wrote:
>>>
 Jack,

 Thanks for your suggestion. I'm familiar with Cassandra documentation,
 and I'm aware of differences between DSE and Cassandra.

 

Re: Cassandra driver class

2016-01-29 Thread Steve Robenalt
It's probably a source of some confusion that in the JEE world, the driver
isn't pooled, but the data source is. Since the Java Driver for Cassandra
includes the pooling, there's no need for a JEE data source on top of it.
This also means that the Java Driver for Cassandra isn't a one-for-one
exchange with a JDBC driver. I'm not sure if this same level of confusion
occurs with other language drivers for Cassandra.

BTW, as Alex suggested earlier in the thread, this discussion should
probably be moved to the Java Driver mailing list.

Steve

On Fri, Jan 29, 2016 at 1:02 PM, Jack Krupansky 
wrote:

> Unfortunately, somebody is likely going to need to educate us in the
> Cassandra community as to what a JBOSS VDB and TEIID really are. For now,
> our response will probably end up being that you should use the Java Driver
> for Cassandra, bypassing any JBOSS/VDB/TEIID support, for now. That TEIID
> link above may shed some light. Otherwise, you'll probably have to ping the
> TEIID community as far as how to configure JBOSS/TEIID. We're here to
> answer questions about Cassandra itself.
>
> -- Jack Krupansky
>
> On Fri, Jan 29, 2016 at 3:42 PM, Corry Opdenakker 
> wrote:
>
>> What about this cassandra specific howto explained in a recent jboss doc?
>> https://docs.jboss.org/author/display/TEIID/Cassandra+Data+Sources?_sscc=t
>>
>> I'm also searching for the real recommended way of connecting to a
>> Cassandra DB from a JEE server, but I didn't find any standard documented
>> solution yet. I was a bit surprised that there is no standard
>> JCA/resource archive solution foreseen, while Cassandra itself is Java
>> based. Maybe I overlooked the info somewhere?
>>
>> Dbcp could help for a large part, but of course one requires a fully
>> reliable production ready solution.
>> https://commons.apache.org/proper/commons-dbcp/
>>
>> Currently I would go for a standard connection pool at app level, as
>> described in the Cassandra Java driver PDF, knowing that middleware
>> admins don't like that nonstandard JEE approach.
>>
>>
>>
>>
>>
>>
>> On Friday 29 January 2016, Alex Popescu  wrote:
>>
>>> I think both of those options expect a JDBC driver, while the DataStax
>>> Java driver is not one.
>>>
>>> As a side note, if you'd provide a more detailed description of the
>>> setup you want to get and post it to the Java driver mailing list
>>> https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user,
>>> chances of getting an answer will be higher.
>>>
>>> On Fri, Jan 29, 2016 at 9:56 AM, KAMM, BILL  wrote:
>>>
 I’m just getting started with Cassandra, and am trying to integrate it
 with JBoss.  I’m configuring the standalone-ha-full.xml file, but don’t
 know what to use for the driver class.  For example, I have this:



 

 

 com.datastax.driver.core.

 

 



 What do I replace “” with?



 Is “com.datastax.driver.core” even correct, or am I going down the
 wrong path?  I am using the DataStax 2.0.2 driver, with Cassandra 2.0.8.



 Should I be using  instead of ?



 Does anybody have a working example they can share?  Any help to get me
 going would be appreciated.  Thanks.



 Bill





>>>
>>>
>>>
>>> --
>>> Bests,
>>>
>>> Alex Popescu | @al3xandru
>>> Sen. Product Manager @ DataStax
>>>
>>>
>>
>> --
>> --
>> Bestdata.be
>> Optimised ict
>> Tel:+32(0)496609576
>> co...@bestdata.be
>> --
>>
>>
>


-- 
Steve Robenalt
Software Architect
sroben...@highwire.org 
(office/cell): 916-505-1785

HighWire Press, Inc.
425 Broadway St, Redwood City, CA 94063
www.highwire.org

Technology for Scholarly Communication


Re: Session timeout

2016-01-29 Thread Jack Krupansky
There is some more detail on DSE Security in this white paper:
http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-SOX-Compliance.pdf

It mentions auditing, for example. I think you were asking about that
earlier.

There may be some additional info or discussion related to security on
these main web site pages:
http://www.datastax.com/products/datastax-enterprise-security

Security was given a reasonably high priority for DSE in releases 3.0 and
beyond, so that if something is not highlighted in those promotional
materials, then it probably isn't in the software.

In general, if you see a feature in DSE, just do a keyword search in the
Cassandra doc to see if it is supported outside of DSE.

-- Jack Krupansky

On Fri, Jan 29, 2016 at 4:23 PM, oleg yusim  wrote:

> Alex,
>
> No offense taken, your question is absolutely legit. As we used to
> joke in the security world, "putting on my black hat"/"putting on my white hat"
> - i.e. the same set of questions I would ask for hacking and for protecting
> the product. So, I commend you for being careful here.
>
> Now, in this particular case I'm acting with my "white hat" on. :) I'm
> hired by VMware to help them improve the security posture of their new
> products (the vRealize package). I do that as part of the security team on
> the VMware side, and working in conjunction with DISA (
> http://iase.disa.mil/stigs/Pages/a-z.aspx) we are creating STIGs (I
> explained this term in detail in this same thread above, in my response to
> Jon, so I won't repeat myself here) for all the components the vRealize
> suite of products has, including Cassandra, which is used in one of the
> products. These STIGs would be handed over to DISA, reviewed by their SMEs
> and published on their website, creating a great opportunity for all the
> products covered to improve their security posture and advance on the market
> for free.
>
> For VMware purposes, we would harden our suite of products based on the
> STIGs, and create our own overall Security Guideline riding on top of them.
>
> As I mentioned above, for both Cassandra and DSE equally, this document
> would be very beneficial, since it would enable and help customers to
> run hardening on the product and place it correctly in the system, surrounded
> by the right set of compensating controls.
>
> Thanks,
>
> Oleg
>
> On Fri, Jan 29, 2016 at 1:10 PM, Alex Popescu  wrote:
>
>>
>> On Fri, Jan 29, 2016 at 8:17 AM, oleg yusim  wrote:
>>
>>> Thanks for encouraging me, I had grown a bit desperate. I'm a security
>>> person, not a Cassandra expert, and doing a security assessment of
>>> Cassandra, I have to rely on the community heavily. I will put together a
>>> consolidated version of all my previous queries, title it "Security
>>> assessment questions" and post it once again.
>>
>>
>> Oleg,
>>
>> I'll apologize in advance if my answer sounds harsh. I've
>> been following your questions (mostly because I find them interesting), but
>> I've never jumped in to answer any of them, as I confess that not knowing the
>> purpose of your research/report makes me cautious (e.g. are you doing this
>> for your current employer, evaluating future use of the product? are you
>> doing this for an analyst company? are you planning to sell this report?
>> etc. etc.).
>>
>>
>> --
>> Bests,
>>
>> Alex Popescu | @al3xandru
>> Sen. Product Manager @ DataStax
>>
>>
>


Re: Session timeout

2016-01-29 Thread oleg yusim
Jeff,

Understood. Thanks for your response. I will put together my questions in
one thread here and title it "Security". Then I will move whatever is
not answered to the dev list.

Thanks,

Oleg

On Fri, Jan 29, 2016 at 11:42 AM, Jeff Jirsa 
wrote:

>
> > For instance, the way AAA (authentication, authorization, audit) is done
> doesn't allow for centralized account and access control management, which
> in reality translates into shared accounts and no hierarchy.
>
> Authentication and Authorization are both pluggable. Any organization can
> write their own, and tie it to any AAA system they currently have. If they
> were feeling generous, they could open source it for the community, and
> perhaps bring it upstream. There’s nothing fundamentally preventing your
> organization from writing an Authenticator (
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/IAuthenticator.java
>  )
> or Authorizer (
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/IAuthorizer.java
>  )
> if they were so inclined.
>
> Audit is something that’s being actively discussed (
> https://issues.apache.org/jira/browse/CASSANDRA-8844 ).
>
> It’s an open source project with a very small number of commercial
> vendors. In general, that means there are 3 options:
>
>1. Wait for someone else to write it to fit their need, and hopefully
>they open source it.
>2. Write it yourself
>3. Pay a vendor (such as Datastax), and let them know in advance it’s
>a requirement to get it on their roadmap. This is really #2 with some
>polish to make it easier to get through your legal/AP systems.
>
> > So far it doesn't work quite well, and from what you are saying, it
> wouldn't, because of lack of knowledge and lack of motivation to get it.
> What would be your suggestion? Who is capable of answering my questions? Is
> there another community, I should turn to?
>
> The cassandra-user and cassandra-dev mailing lists are the primary sources
> of knowledge outside of support contracts. For paid support, companies like
> Datastax and The Last Pickle tend to be well respected options. Both of
> those companies will probably answer some of your questions for free if you
> post on these mailing lists. They’ll likely answer even more if you pay
> them.
>
>
>
> From: oleg yusim
> Reply-To: "user@cassandra.apache.org"
> Date: Friday, January 29, 2016 at 9:16 AM
> To: "user@cassandra.apache.org"
> Subject: Re: Session timeout
>
> Jon,
>
> I suspected something like that. I did a bit of learning on Cassandra
> before starting my assessment, and I understand that you are right, and it
> is generally not used like that.
>
> However (taking off my developer hat and putting on my security architect
> hat), from the security point of view the way Cassandra is used now is not
> very secure. For instance, the way AAA (authentication, authorization, audit)
> is done doesn't allow for centralized account and access control
> management, which in reality translates into shared accounts and no
> hierarchy. That in turn translates into a situation where one person
> compromising credentials means complete disaster: administrative access to
> the DB has just been given up, with all the consequences. To top it all off,
> logging is currently implemented in a poor manner too. It doesn't even allow
> logging the username - a basic requirement for any product, which would allow
> a DBA or ISSO to figure out who did what on the DB and recover in case of
> attack or crash. In general, logs the way they are today are targeted toward
> the developer making changes in the DB, not toward the DBA using it, and
> don't make much sense in my opinion.
>
> Now, if you are interested in that subject, this document:
> http://iasecontent.disa.mil/stigs/zip/Jan2016/U_Database_V2R3_SRG.zip
> covers security concerns which should be taken into account when we are
> designing a database. It also explains why each of them is important and what
> exactly would happen if it were neglected.
>
> Jon, I would also appreciate a suggestion. What I do right now is called
> "writing a STIG". That is when somebody takes concepts from an SRG (the
> document I gave you the link to above) and figures out how those are applied
> to a particular product. What is met (and what configuration on the product
> leads to it, exactly), what is not met but can be with a little enhancement
> (and again - what those would be, exactly), and what is not met and can't be
> met with the current design. All that is combined into one document, called a
> STIG, and published by the government (DISA) on the
> http://iase.disa.mil/stigs/Pages/a-z.aspx page. Those STIGs mean a great
> deal from the security point of view because they:
>
>    - Save a lot of time on re-assessment of the product every
>    single time
>    - Make the product's limitations from the security point of view known
>    beforehand (and as such, place it right on the
>   

Re: why one of the new added nodes' bootstrap is very slow?

2016-01-29 Thread Alain RODRIGUEZ
Hi Dillon,


> What should I do for this wrong bootstrap?


You should first remove the .184 node (the node with almost no data). The
standard command is *nodetool decommission*, run from the node you want to
remove from the cluster. Yet this would move the data from the node we want to
remove to other nodes, and we don't trust the data on these 2 nodes. Instead
you can stop the node and use *nodetool removenode * from another node
to remove it, letting the other nodes create the new replicas.

Then, when your node is down and out of the cluster, any of the following
should work.

sudo rm -rf /var/lib/cassandra/data/*
sudo rm -rf /var/lib/cassandra/commitlog/*
sudo rm -rf /var/lib/cassandra/saved_caches/*
sudo rm -rf /var/log/cassandra/*

or just

sudo rm -rf /var/lib/cassandra/*

Set auto_bootstrap=true in cassandra.yaml and start cassandra when you're
ready to bootstrap.
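A minimal sketch of the relevant cassandra.yaml line (note: auto_bootstrap defaults to true and is usually not present in the stock file, so you only need to add or flip it if it was previously set to false):

```yaml
# cassandra.yaml - ensure the node streams its ranges when it joins.
# auto_bootstrap defaults to true; add this line explicitly if the
# file currently says "auto_bootstrap: false".
auto_bootstrap: true
```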

Yes, the old nodes with Memory: 64G, Disk: 4 X 1.1T and CPU: 16 cores, the
> old nodes with: Memory 32G, Disk: 1 X 460G and CPU: 32 cores


Not sure which ones are the new and which are the old ones, but some have
more CPU / memory and others have more disk space, so it is hard to say how
you should configure vnodes; the power / space balance depends on your use
case. In general I *wouldn't* advise using heterogeneous hardware
(even if Cassandra allows it), to avoid operational overhead. It is way
easier to treat each node the same way imho.

No particular reason :-), the 512 is from someone's example, and 256 because
> I used different hardware. Can I modify all the numbers after I add these
> new nodes successfully?


About the number of vnodes, once the nodes are in it is too late. I heard
(and did not check) that the default number of vnodes is too high for
most cases, impacting performance of repairs for example. So I wanted to
let you know. Maybe someone else will be able to tell you more about it.
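For reference, the vnode count is a per-node cassandra.yaml setting and only takes effect before the node first joins the ring; a hedged sketch (the value 32 is just an illustration within the 16-64 range mentioned above, not a recommendation for this cluster):

```yaml
# cassandra.yaml - number of vnodes (token ranges) this node owns.
# Must be set BEFORE the node bootstraps; changing it afterwards has
# no effect on an already-joined node.
num_tokens: 32
```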

Under "192.21.0.185 229.2GB", I can directly "rm -rf
> /path_to_cassandra/data/" without changing anything else, and start
> Cassandra again?


I am not sure what's wrong with this node. I would probably dig it a bit
more before removing it. Did you try repairs / cleanup on your nodes ? Is
there any error in your logs ? Do you have snapshots taking some space ?
What is the output of a *nodetool status  *?

Yet the decision is yours; I have no idea if this is a production cluster or
not, nor the environment / things you did, etc.

Good luck

C*heers,

-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com


2016-01-28 2:57 GMT+01:00 土卜皿 :

> Hi Alain,
> Thank you very much!
>
>
>> UJ  192.21.0.185  299.22 GB  256 ?   
>> 84c0dd16-6491-4bfb-b288-d4e410cd8c2a  RAC1
>>> UN  192.21.0.184  670.14 MB  256 ?   
>>> 4041c232-c110-4315-89a1-23ca53b851c2  RAC1
>>>
>>>
>> Obviously .184 didn't bootstrap correctly. When a node is added, it
>> becomes responsible for a range (multiple ranges with vnodes), so it has to
>> receive data from nodes previously responsible for this (these) range(s).
>> So 600 MB looks wrong.
>>
>
> What should I do for this wrong bootstrap?
>
>
>>
>> So .185 is behaving as expected, .184 isn't.
>>
>> Yet .185 having twice the data from other node is weird unless you
>> changed Replication factor or streamed data multiple time (then compaction
>> will eventually fix this).
>>
>
> No, I did not change Replication factor
>
>
>> Plus this node has less tokens than the first 3 nodes.
>> Are you running heterogeneous hardware ?
>>
>
> Yes, the old nodes with Memory: 64G, Disk: 4 X 1.1T and CPU: 16 cores, the
> old nodes with: Memory 32G, Disk: 1 X 460G and CPU: 32 cores
>
>
>
>> Why setting 512 token for the 3 first nodes, and 256 for other nodes ?
>> From what I heard default vnodes is a way too high, you generally want to
>> go with something between 16 and 64 on production (if it is not too late).
>>
>
> No particular reason :-), the 512 is from someone's example, and 256 because
> I used different hardware. Can I modify all the numbers after I add these
> new nodes successfully?
>
>
> So I restarted it and the join continued! I don't know why there is the
>>> difference between the two nodes?
>>>
>> My guess is the join did not continue. Once you bootstrap a node, system
>> keyspace is filled up with some information. If the bootstrap fails, you
>> need to wipe the data directory. I advice you to directly "rm -rf
>> /path_to_cassandra/data/*".
>>
>> If you don't remove system KS, node will behave as he is already part of
>> the ring and so, won't stream anything, it won't bootstrap, just start. So
>> that would be the difference imho.
>>
>> If you just wipe the system keyspace (not your data), it will work, yet
>> you will end up streaming the same data and will need to compact, adding
>> useless work.
>>
>> So I would start from a clean state and run the process again.
>>
> Sorry, I am not so clear for the above description, you mean:
>
> Under "192.21.0.185 229.2GB", I can directly "rm -rf
> 

Re: Questions about the replicas selection and remote coordinator

2016-01-29 Thread Steve Robenalt
Hi Jun,

The replicas are chosen according to factors that are generally more easily
selected internally, as is the case with coordinators. Even if the replicas
were selected in a completely round-robin fashion initially, they could end
up being re-distributed as a result of node failures, additions/removals
to/from the cluster, etc, particularly when vnodes are used. As such, the
diagrams and the nodes they refer to are hypothetical, but accurate in the
sense that they are non-contiguous, and that different sets of replicas are
distributed to various parts of the cluster.

As far as the remote coordinator is concerned, I'm not sure what motivated
the change from 1.2 to 2.1 and would be interested in understanding that
change myself. I do know that improved performance was a big part of the
2.1 release, but I'm not sure if the change in coordinators was part of
that effort or not.

Steve


On Fri, Jan 29, 2016 at 10:13 AM, Jun Wu  wrote:

> Hi Steve,
>
>Thank you so much for your reply.
>
>Yes, you're right, I'm using version 2.1. So based on this, I
> think my understanding is outdated.
>
> However, this raises another interesting question: why did this part change
> from version 1.2 to version 2.1? As we can see, in version 1.2 there is a
> connection from node 10 in DC 1 to node 10 in DC 2, and then node 10
> in DC 2 sends 3 copies to 3 nodes in DC 2, which should be more time-saving
> than version 2.1, where node 10 in DC 1 sends data to 3 nodes in DC 2
> directly.
>
>  Also, is there any information on how the replicas are chosen? Like
> here:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html
> Why do we choose nodes 1, 3, 6 as replicas and 4, 8, 11 as the other 3
> replicas?
>
> Also, is node 11 working as a remote coordinator here? Or does the concept
> of a remote coordinator really exist? As the figure shows, we don't even
> need the remote coordinator.
>
> Thanks!
>
> Jun
>
>
>
>
> --
> Date: Fri, 29 Jan 2016 09:55:58 -0800
> Subject: Re: Questions about the replicas selection and remote coordinator
> From: sroben...@highwire.org
> To: user@cassandra.apache.org
>
>
> Hi Jun,
>
> The 2 diagrams you are comparing come from versions of Cassandra that are
> significantly different - 1.2 in the first case and 2.1 in the second case,
> so it's not surprising that there are differences. Since you haven't
> qualified your question with the Cassandra version you are asking about, I
> would assume that the 2.1 example is more representative of what you would
> be likely to see. In any case, it's best to use a consistent version for
> your documentation because Cassandra changes quite rapidly with many of the
> releases.
>
> As far as choosing the coordinator node, I don't think there's a way to
> force it, nor would it be a good idea to do so. In order to make a
> reasonable selection of coordinators, you would need a lot of internal
> knowledge about load on the nodes in the cluster and you'd need to also
> handle certain classes of failures and retries, so you would end up
> duplicating what is already being done for you internally.
>
> Steve
>
>
> On Fri, Jan 29, 2016 at 9:11 AM, Jun Wu  wrote:
>
> Hi there,
>
> I have some questions about the replicas selection.
>
> Let's say that we have 2 data centers: DC1 and DC2, the figure also be
> got from link here:
> https://docs.datastax.com/en/cassandra/1.2/cassandra/images/write_access_multidc_12.png.
>  There're
> 10 nodes in each data center. We set the replication factor to be 3 and 3
> in each data center, which means there'll be 3 and 3 replicas in each data
> center.
>
> (1) My first question is how the 3 nodes to write the data to are chosen.
> In the link above, the 3 replicas are nodes 1, 2, 7. But is there any
> mechanism to select these 3?
>
> (2) Another question is about the remote coordinator, the previous
> figure shows that node 10 in DC1 will write data to node 10  in DC 2, then
> node 10 in DC2 will write 3 copies to 3 nodes in DC2.
>
> But, another figure from datastax shows different method, the figure
> can be found here,
> https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/architectureClientRequestsMultiDCWrites_c.html.
>  It
> shows that node 10 in DC 1 will send directly 3 copies to 3 nodes in DC2,
> without using remote coordinator.
>
> I'm wondering which case is true, because in multiple data center, the
> time duration for these two methods varies a lot.
>
> Also, is there any mechanism to select which node becomes the remote
> coordinator?
>
> Thanks!
>
> Jun
>
>
>
>
> --
> Steve Robenalt
> Software Architect
> sroben...@highwire.org 
> (office/cell): 916-505-1785
>
> HighWire Press, Inc.
> 425 Broadway St, Redwood City, CA 94063
> www.highwire.org
>
> Technology for Scholarly Communication
>



-- 
Steve Robenalt
Software Architect

Re: Session timeout

2016-01-29 Thread Alex Popescu
On Fri, Jan 29, 2016 at 8:17 AM, oleg yusim  wrote:

> Thanks for encouraging me, I kind of grew a bit desperate. I'm security
> person, not a Cassandra expert, and doing security assessment of Cassandra
> DB, I have to rely on community heavily. I will put together a composed
> version of all my previous queries, will title it "Security assessment
> questions" and will post it once again.


Oleg,

I'll apologize in advance if my answer sounds harsh. I've
been following your questions (mostly because I find them interesting), but
I've never jumped in to answer any of them as, I confess, not knowing the
purpose of your research/report makes me cautious (e.g. are you doing this
for your current employer, evaluating future use of the product? are you
doing this for an analyst company? are you planning to sell this report?
etc. etc.).


-- 
Bests,

Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax


Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-29 Thread Peddi, Praveen
Hello,
We have another update on performance on 2.1.11. compression_chunk_size didn't
really help much, but we changed concurrent_compactors from the default to 64
in 2.1.11 and read latencies improved significantly. However, 2.1.11 read
latencies are still 1.5x slower than 2.0.9. One thing we noticed in JMX metrics
that could affect read latencies is that 2.1.11 is running
ReadRepairedBackground and ReadRepairedBlocking too frequently compared to
2.0.9, even though our read_repair_chance is the same on both. Could anyone
shed some light on why 2.1.11 could be running read repair 10 to 50 times more
in spite of the same configuration on both clusters?

dclocal_read_repair_chance=0.10 AND
read_repair_chance=0.00 AND

Here is the table for read repair metrics for both clusters.
                                  2.0.9    2.1.11
ReadRepairedBackground   5MinAvg  0.006    0.1
                         15MinAvg 0.009    0.153
ReadRepairedBlocking     5MinAvg  0.002    0.55
                         15MinAvg 0.007    0.91
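For context, those two chances are per-table CQL properties; a hedged sketch of how they would be set (the keyspace/table names are hypothetical):

```sql
-- CQL (2.x): per-table read repair probabilities, matching the
-- values quoted above.
ALTER TABLE my_ks.my_table
  WITH dclocal_read_repair_chance = 0.10
   AND read_repair_chance = 0.00;
```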

Thanks
Praveen

From: Jeff Jirsa
Reply-To: "user@cassandra.apache.org"
Date: Thursday, January 14, 2016 at 2:58 PM
To: "user@cassandra.apache.org"
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Sorry I wasn’t as explicit as I should have been

The same buffer size is used by compressed reads as well, but tuned with 
compression_chunk_size table property. It’s likely true that if you lower 
compression_chunk_size, you’ll see improved read performance.

This was covered in the AWS re:Invent youtube link I sent in my original reply.
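For anyone following along, compression_chunk_size is set through the table's compression options; a hedged sketch for 2.1 (the table name is hypothetical, and 4 KB is chosen only as an example of a smaller chunk):

```sql
-- CQL (2.1): shrink the compression chunk (default is 64 KB) so reads
-- pull smaller blocks off disk.
ALTER TABLE my_ks.my_table
  WITH compression = {'sstable_compression': 'LZ4Compressor',
                      'chunk_length_kb': 4};
```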



From: "Peddi, Praveen"
Reply-To: "user@cassandra.apache.org"
Date: Thursday, January 14, 2016 at 11:36 AM
To: "user@cassandra.apache.org", Zhiyan Shao
Cc: "Agrawal, Pratik"
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Hi,
We will try with a reduced "rar_buffer_size" of 4KB. However,
CASSANDRA-10249 says
"this only affects users who have 1. disabled compression, 2. switched to
buffered i/o from mmap'd". Neither of these is true for us, I believe. We use
the default disk_access_mode, which should be mmap. We also used LZ4Compressor
when creating the table.

We will let you know if this property has any effect. We were testing with
2.1.11 and this was only fixed in 2.1.12, so we need to test with the latest
version.

Praveen





From: Jeff Jirsa
Reply-To: "user@cassandra.apache.org"
Date: Thursday, January 14, 2016 at 1:29 PM
To: Zhiyan Shao, "user@cassandra.apache.org"
Cc: "Agrawal, Pratik"
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 / 
https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this is 
really the case depends on how much of your data is in page cache, and whether 
or not you’re using mmap. Since the original question was asked by someone 
using small RAM instances, it’s possible.

We mitigate this by dropping compression_chunk_size in order to force a smaller 
buffer on reads, so we don’t over read very small blocks. This has other side 
effects (lower compression ratio, more garbage during streaming), but 
significantly speeds up read workloads for us.


From: Zhiyan Shao
Date: Thursday, January 14, 2016 at 9:49 AM
To: "user@cassandra.apache.org"
Cc: Jeff Jirsa, "Agrawal, Pratik"
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Praveen, if you search "Read is slower in 2.1.6 than 2.0.14" in this forum, you 
can find another thread I sent a while ago. The perf test I did indicated that 
read is slower for 2.1.6 than 2.0.14 so we stayed with 2.0.14.

On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen 
> wrote:
Thanks Jeff for your reply. Sorry for delayed response. We were running some 
more tests and wanted to wait for the results.

So basically we saw higher CPU with 2.1.11 was higher compared to 2.0.9 (see 
below) for the same exact load test. Memory spikes were also aggressive on 
2.1.11.

So we wanted to rule out any of our custom setting so we ended up doing some 
testing with Cassandra stress test and default Cassandra installation. Here are 
the results we saw between 2.0.9 and 2.1.11. Both are default installations and 
both use Cassandra stress test with same params. This is the closest 
apple-apple comparison we can get. As you can see both read and write 

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-29 Thread Nate McCall
On Fri, Jan 29, 2016 at 12:30 PM, Peddi, Praveen  wrote:
>
> Hello,
> We have another update on performance on 2.1.11. compression_chunk_size
> didn’t really help much but We changed concurrent_compactors from default
> to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11
> read latencies are still 1.5 slower than 2.0.9. One thing we noticed in JMX
> metric that could affect read latencies is that 2.1.11 is running
> ReadRepairedBackground and ReadRepairedBlocking too frequently compared to
> 2.0.9 even though our read_repair_chance is same on both. Could anyone shed
> some light on why 2.1.11 could be running read repair 10 to 50 times more
> in spite of same configuration on both clusters?
>
> dclocal_read_repair_chance=0.10 AND
> read_repair_chance=0.00 AND
>
> Here is the table for read repair metrics for both clusters.
>                                   2.0.9    2.1.11
> ReadRepairedBackground   5MinAvg  0.006    0.1
>                          15MinAvg 0.009    0.153
> ReadRepairedBlocking     5MinAvg  0.002    0.55
>                          15MinAvg 0.007    0.91

The concurrent_compactors setting is not a surprise. The default in 2.0 was
the number of cores and in 2.1 is now:
"the smaller of (number of disks, number of cores), with a minimum of 2 and
a maximum of 8"
https://github.com/apache/cassandra/blob/cassandra-2.1/conf/cassandra.yaml#L567-L568

So in your case this was "8" in 2.0 vs. "2" in 2.1 (assuming these are
still the stock-ish c3.2xl mentioned previously?). Regardless, 64 is way too
high. Set it back to 8.
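The setting in question lives in cassandra.yaml (commented out by default in 2.1); a minimal sketch of pinning it explicitly rather than relying on the version-dependent default:

```yaml
# cassandra.yaml - simultaneous compaction threads. The 2.1 default is
# min(number of disks, number of cores), clamped to the range [2, 8];
# pin it so an upgrade doesn't silently change it.
concurrent_compactors: 8
```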

Note: this got dropped off the "Upgrading" guide for 2.1 in
https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt though, so
lots of folks miss it.

Per said upgrading guide - are you sure the data directory is in the same
place between the two versions and you are not pegging the wrong
disk/partition? The default locations changed for data, cache and commitlog:
https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt#L171-L180

I ask because being really busy on a single disk would cause latency and
potentially dropped messages which could eventually cause a
DigestMismatchException requiring a blocking read repair.

Anything unusual in the node-level IO activity between the two clusters?

That said, the difference in nodetool tpstats output during and after on
both could be insightful.

When we do perf tests internally we usually use a combination of Grafana
and Riemann to monitor Cassandra internals, the JVM and the OS. Otherwise,
it's guess work.

--
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Cassandra driver class

2016-01-29 Thread Alex Popescu
I think both of those options expect a JDBC driver, while the DataStax Java
driver is not one.

As a side note, if you'd provide a more detailed description of the setup
you want to get and post it to the Java driver mailing list
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user,
chances of getting an answer will be higher.

On Fri, Jan 29, 2016 at 9:56 AM, KAMM, BILL  wrote:

> I’m just getting started with Cassandra, and am trying to integrate it
> with JBoss.  I’m configuring the standalone-ha-full.xml file, but don’t
> know what to use for the driver class.  For example, I have this:
>
>
>
> 
>
> 
>
> com.datastax.driver.core.
>
> 
>
> 
>
>
>
> What do I replace “” with?
>
>
>
> Is “com.datastax.driver.core” even correct, or am I going down the wrong
> path?  I am using the DataStax 2.0.2 driver, with Cassandra 2.0.8.
>
>
>
> Should I be using  instead of ?
>
>
>
> Does anybody have a working example they can share?  Any help to get me
> going would be appreciated.  Thanks.
>
>
>
> Bill
>
>
>
>
>



-- 
Bests,

Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax


Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-29 Thread Corry Opdenakker
Hi guys,
Cassandra is still new to me, but I have a lot of Java tuning experience.

For root cause detection of performance degradations it's always good to
start by collecting a series of Java thread dumps. At the moment the problem
occurs, use a loop script to take, for example, 60 thread dumps with an
interval of 1 or 2 seconds.
Then load those dumps into the IBM thread dump analyzer or "Eclipse MAT" or
any similar tool and see which methods appear to be most active or blocking
others.

It's really very useful.

The same can be done in a normal situation to compare the difference.

That should give more insights.
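The loop script mentioned above can be sketched as a small POSIX shell function; the actual dump command is passed in as arguments, so the jstack invocation and the way the Cassandra PID is found below are assumptions, not part of the original suggestion:

```shell
# take_dumps COUNT INTERVAL OUTDIR CMD... - run CMD repeatedly,
# writing each capture to OUTDIR/dump_N.txt, pausing INTERVAL seconds
# between captures.
take_dumps() {
    count=$1; interval=$2; outdir=$3; shift 3
    mkdir -p "$outdir"
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" > "$outdir/dump_$i.txt"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Hypothetical usage against a Cassandra JVM (requires a JDK on the node):
#   take_dumps 60 2 /tmp/tdumps jstack "$(pgrep -f CassandraDaemon)"
```

The dumps can then be fed to any thread-dump analysis tool as described above.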

Cheers, Corry

On Friday, 29 January 2016, Peddi, Praveen wrote:

> Hello,
> We have another update on performance on 2.1.11. compression_chunk_size
>  didn’t really help much but We changed concurrent_compactors from default
> to 64 in 2.1.11 and read latencies improved significantly. However, 2.1.11
> read latencies are still 1.5 slower than 2.0.9. One thing we noticed in JMX
> metric that could affect read latencies is that 2.1.11 is running
> ReadRepairedBackground and ReadRepairedBlocking too frequently compared to
> 2.0.9 even though our read_repair_chance is same on both. Could anyone
> shed some light on why 2.1.11 could be running read repair 10 to 50 times
> more in spite of same configuration on both clusters?
>
> dclocal_read_repair_chance=0.10 AND
> read_repair_chance=0.00 AND
>
> Here is the table for read repair metrics for both clusters.
>                                   2.0.9    2.1.11
> ReadRepairedBackground   5MinAvg  0.006    0.1
>                          15MinAvg 0.009    0.153
> ReadRepairedBlocking     5MinAvg  0.002    0.55
>                          15MinAvg 0.007    0.91
>
> Thanks
> Praveen
>
> From: Jeff Jirsa
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, January 14, 2016 at 2:58 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Sorry I wasn’t as explicit as I should have been
>
> The same buffer size is used by compressed reads as well, but tuned with
> compression_chunk_size table property. It’s likely true that if you lower
> compression_chunk_size, you’ll see improved read performance.
>
> This was covered in the AWS re:Invent youtube link I sent in my original
> reply.
>
>
>
> From: "Peddi, Praveen"
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, January 14, 2016 at 11:36 AM
> To: "user@cassandra.apache.org", Zhiyan Shao
> Cc: "Agrawal, Pratik"
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Hi,
> We will try with reduced “rar_buffer_size” to 4KB. However CASSANDRA-10249
>  says "this only
> affects users who have 1. disabled compression, 2. switched to buffered i/o
> from mmap’d”. None of this is true for us I believe. We use default disk_
> access_mode which should be mmap. We also used LZ4Compressor when created
> table.
>
> We will let you know if this property had any effect. We were testing with
> 2.1.11 and this was only fixed in 2.1.12 so we need to play with latest
> version.
>
> Praveen
>
>
>
> From: Jeff Jirsa
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, January 14, 2016 at 1:29 PM
> To: Zhiyan Shao, "user@cassandra.apache.org"
> Cc: "Agrawal, Pratik"
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249
>  / https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not
> this is really the case depends on how much of your data is in page cache,
> and whether or not you’re using mmap. Since the original question was asked
> by someone using small RAM instances, it’s possible.
>
> We mitigate this by dropping compression_chunk_size in order to force a
> smaller buffer on reads, so we don’t over read very small blocks. This has
> other side effects (lower compression ratio, more garbage during
> streaming), but significantly speeds up read workloads for us.
>
>
> From: Zhiyan Shao
> Date: Thursday, January 14, 2016 at 9:49 AM
> To: "user@cassandra.apache.org
> 

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-29 Thread Peddi, Praveen
Thanks Nate for your quick reply. We will test with different
concurrent_compactors settings. It would save a lot of time for others if the
documentation could be fixed. We spent days coming up with this setting, and
found it by chance.

As far as the data folder and IO are concerned, I confirmed that the data
folders in both cases are the same, and there are hardly any reads in either
case (see below). Can you tell me what could trigger very high read repair
numbers in 2.1.11 compared to 2.0.9 (10 times more in 2.1.11)?

Please find tpstats and iostat for both 2.0.9 and 2.1.11:
Tpstats for 2.0.9
Pool Name                 Active   Pending   Completed   Blocked   All time blocked
MutationStage                  0         0     4352903         0                  0
ReadStage                      0         0    46282140         0                  0
RequestResponseStage           0         0    12779370         0                  0
ReadRepairStage                0         0       18719         0                  0
ReplicateOnWriteStage          0         0           0         0                  0
MiscStage                      0         0           0         0                  0
HintedHandoff                  0         0           5         0                  0
FlushWriter                    0         0       91885         0                 10
MemoryMeter                    0         0       82032         0                  0
GossipStage                    0         0      457802         0                  0
CacheCleanupExecutor           0         0           0         0                  0
InternalResponseStage          0         0           6         0                  0
CompactionExecutor             0         0      993103         0                  0
ValidationExecutor             0         0           0         0                  0
MigrationStage                 0         0          28         0                  0
commitlog_archiver             0         0           0         0                  0
AntiEntropyStage               0         0           0         0                  0
PendingRangeCalculator         0         0           5         0                  0
MemtablePostFlusher            0         0       94496         0                  0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION 0
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0

Tpstats for 2.1.11
Pool Name                 Active   Pending   Completed   Blocked   All time blocked
MutationStage                  0         0     1113428         0                  0
ReadStage                      0         0    23496750         0                  0
RequestResponseStage           0         0    29951269         0                  0
ReadRepairStage                0         0     3848733         0                  0
CounterMutationStage           0         0           0         0                  0
MiscStage                      0         0           0         0                  0
HintedHandoff                  0         0           4         0                  0
GossipStage                    0         0      182727         0                  0
CacheCleanupExecutor           0         0           0         0                  0
InternalResponseStage          0         0           0         0                  0
CommitLogArchiver              0         0           0         0                  0
CompactionExecutor             0         0       89820         0                  0
ValidationExecutor             0         0           0         0                  0
MigrationStage                 0         0          10         0                  0
AntiEntropyStage               0         0           0         0                  0
PendingRangeCalculator         0         0           6         0                  0
Sampler                        0         0           0         0                  0
MemtableFlushWriter            0         0       38222         0                  0
MemtablePostFlush              0         0       39814         0                  0
MemtableReclaimMemory          0         0       38222         0                  0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION 0
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0

IOSTAT for 2.1.11
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  21.21 

Re: Session timeout

2016-01-29 Thread oleg yusim
Alex,

No offense taken, your question is absolutely legit. As we used to joke
in the security world, "putting on my black hat"/"putting on my white hat" -
i.e. the same set of questions I would be asking when hacking and when
protecting the product. So, I commend you for being careful here.

Now, in this particular case I'm acting with my "white hat on". :) I'm
hired by VMware to help them improve the security posture of their new
products (the vRealize package). I do that as part of the security team on
the VMware side, and working in conjunction with DISA (
http://iase.disa.mil/stigs/Pages/a-z.aspx) we are creating STIGs (I
explained this term in detail in this same thread above, in my response to
Jon, so I won't repeat myself here) for all the components the vRealize
suite of products has, including Cassandra, which is used in one of the
products. These STIGs will be handed over to DISA, reviewed by their SMEs
and published on their website, creating a great opportunity for all the
products covered to improve their security posture and advance in the market
for free.

For VMware's purposes, we will harden our suite of products based on the
STIGs and create our own overall security guideline, riding on top of the
STIGs.

As I mentioned above, this document would be very beneficial for both
Cassandra and DSE equally, since it would enable customers and help them
harden the product and place it correctly in the system, surrounded
by the correct set of compensating controls.

Thanks,

Oleg

On Fri, Jan 29, 2016 at 1:10 PM, Alex Popescu  wrote:

>
> On Fri, Jan 29, 2016 at 8:17 AM, oleg yusim  wrote:
>
>> Thanks for encouraging me, I kind of grew a bit desperate. I'm security
>> person, not a Cassandra expert, and doing security assessment of Cassandra
>> DB, I have to rely on community heavily. I will put together a composed
>> version of all my previous queries, will title it "Security assessment
>> questions" and will post it once again.
>
>
> Oleg,
>
> I'll apologize in advance if my answer will sound initially harsh. I've
> been following your questions (mostly because I find them interesting), but
> I've never jumped to answer any of them as I confess not knowing the
> purpose of your research/report makes me caution (e.g. are you doing this
> for your current employer evaluating the future use of the product? are you
> doing this for an analyst company? are you planning to sell this report?
> etc. etc).
>
>
> --
> Bests,
>
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax
>
>


Re: Cassandra driver class

2016-01-29 Thread Corry Opdenakker
What about this Cassandra-specific howto explained in a recent JBoss doc?
https://docs.jboss.org/author/display/TEIID/Cassandra+Data+Sources?_sscc=t

I'm also searching for the really recommended way of connecting to a
Cassandra DB from a JEE server, but I didn't find any standard documented
solution yet. I was a bit surprised that there isn't any standard
JCA / resource archive solution foreseen while Cassandra itself is Java
based. Maybe I overlooked the info somewhere?

DBCP could help for a large part, but of course one requires a fully
reliable production-ready solution.
https://commons.apache.org/proper/commons-dbcp/

Currently I would go for a standard connection pool at app level as described
in the Cassandra Java driver docs, knowing that middleware admins don't like
that non-standard JEE approach.





On Friday, 29 January 2016, Alex Popescu wrote:

> I think both of those options expect a JDBC driver, while the DataStax
> Java driver is not one.
>
> As a side note, if you'd provide a more detailed description of the setup
> you want to get and post it to the Java driver mailing list
> https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user,
> chances of getting an answer will be higher.
>
> On Fri, Jan 29, 2016 at 9:56 AM, KAMM, BILL  > wrote:
>
>> I’m just getting started with Cassandra, and am trying to integrate it
>> with JBoss.  I’m configuring the standalone-ha-full.xml file, but don’t
>> know what to use for the driver class.  For example, I have this:
>>
>>
>>
>> 
>>
>> 
>>
>> com.datastax.driver.core.
>>
>> 
>>
>> 
>>
>>
>>
>> What do I replace “” with?
>>
>>
>>
>> Is “com.datastax.driver.core” even correct, or am I going down the wrong
>> path?  I am using the DataStax 2.0.2 driver, with Cassandra 2.0.8.
>>
>>
>>
>> Should I be using  instead of ?
>>
>>
>>
>> Does anybody have a working example they can share?  Any help to get me
>> going would be appreciated.  Thanks.
>>
>>
>>
>> Bill
>>
>>
>>
>>
>>
>
>
>
> --
> Bests,
>
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax
>
>

-- 
--
Bestdata.be
Optimised ict
Tel:+32(0)496609576
co...@bestdata.be
--


Re: Cassandra driver class

2016-01-29 Thread Jack Krupansky
Unfortunately, somebody is likely going to need to educate us in the
Cassandra community as to what a JBoss VDB and Teiid really are. For now,
our response will probably be that you should use the Java driver for
Cassandra, bypassing any JBoss/VDB/Teiid support. That Teiid link above may
shed some light; otherwise, you'll probably have to ping the Teiid community
about how to configure JBoss/Teiid. We're here to answer questions about
Cassandra itself.
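As a concrete sketch of what "use the Java driver directly" looks like, assuming driver 2.0.x on the classpath and a reachable contact point (the address is illustrative):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class DirectDriverExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // assumption: local node
                .build();
        Session session = cluster.connect();

        // system.local exists on every node, so this query works
        // without creating any schema first.
        ResultSet rs = session.execute(
                "SELECT release_version FROM system.local");
        Row row = rs.one();
        System.out.println("Cassandra version: "
                + row.getString("release_version"));

        cluster.close();
    }
}
```

This bypasses the JBoss datasource subsystem entirely; the driver is not a JDBC driver, so it is not registered in standalone-ha-full.xml at all.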

-- Jack Krupansky

On Fri, Jan 29, 2016 at 3:42 PM, Corry Opdenakker  wrote:

> What about this cassandra specific howto explained in a recent jboss doc?
> https://docs.jboss.org/author/display/TEIID/Cassandra+Data+Sources?_sscc=t
>
> I'm also searching for the real recommended way of connecting to a
> Cassandra db from a JEE server, but I haven't found any standard documented
> solution yet. I was a bit surprised that there isn't any standard
> JCA/resource archive solution foreseen, given that Cassandra itself is Java
> based. Maybe I overlooked the info somewhere?
>
> DBCP could help for a large part, but of course one requires a fully
> reliable, production-ready solution.
> https://commons.apache.org/proper/commons-dbcp/
>
> Currently I would go for a standard connection pool at the app level, as
> described in the Cassandra Java driver PDF, knowing that middleware admins
> don't like that non-standard JEE approach.
>
>
>
>
>
>
> Op vrijdag 29 januari 2016 heeft Alex Popescu  het
> volgende geschreven:
>
>> I think both of those options expect a JDBC driver, while the DataStax
>> Java driver is not one.
>>
>> As a side note, if you'd provide a more detailed description of the setup
>> you want to get and post it to the Java driver mailing list
>> https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user,
>> chances of getting an answer will be higher.
>>
>> On Fri, Jan 29, 2016 at 9:56 AM, KAMM, BILL  wrote:
>>
>>> I’m just getting started with Cassandra, and am trying to integrate it
>>> with JBoss.  I’m configuring the standalone-ha-full.xml file, but don’t
>>> know what to use for the driver class.  For example, I have this:
>>>
>>>
>>>
>>> 
>>>
>>> 
>>>
>>> com.datastax.driver.core.
>>>
>>> 
>>>
>>> 
>>>
>>>
>>>
>>> What do I replace “” with?
>>>
>>>
>>>
>>> Is “com.datastax.driver.core” even correct, or am I going down the wrong
>>> path?  I am using the DataStax 2.0.2 driver, with Cassandra 2.0.8.
>>>
>>>
>>>
>>> Should I be using  instead of ?
>>>
>>>
>>>
>>> Does anybody have a working example they can share?  Any help to get me
>>> going would be appreciated.  Thanks.
>>>
>>>
>>>
>>> Bill
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Bests,
>>
>> Alex Popescu | @al3xandru
>> Sen. Product Manager @ DataStax
>>
>>
>
> --
> --
> Bestdata.be
> Optimised ict
> Tel:+32(0)496609576
> co...@bestdata.be
> --
>
>