Re: Better way to define UDT's in Cassandra

2020-07-02 Thread Check Peck
Following up again on this. Any thoughts on this?


Re: Better way to define UDT's in Cassandra

2020-06-30 Thread Check Peck
Does anyone have any thoughts on this?

On Tue, Jun 30, 2020 at 10:42 AM Check Peck  wrote:

> We are trying to remove two columns from a table with three columns and
> model them as a UDT instead of keeping them as separate columns. So we came
> up with the two options below. I wanted to understand whether there is any
> difference between these two UDT approaches in the Cassandra database?
>
>
> *One option is:*
>
>> CREATE TYPE test_type (
>> cid int,
>> type text,
>> hid int
>> );
>
>
> and then using it like this in a table definition
>
> test_types set<frozen<test_type>>,
>
>
> vs
>
> *Second option is:*
>
> CREATE TYPE test_type (
>> type text,
>> hid int
>> );
>
>
> and then using it like this in a table definition
>
> test_types map<int, frozen<test_type>>
>
>
> So I am just curious: which one is the preferred option here performance-wise,
> or are they both the same?
>


Better way to define UDT's in Cassandra

2020-06-30 Thread Check Peck
We are trying to remove two columns from a table with three columns and
model them as a UDT instead of keeping them as separate columns. So we came
up with the two options below. I wanted to understand whether there is any
difference between these two UDT approaches in the Cassandra database?


*One option is:*

> CREATE TYPE test_type (
> cid int,
> type text,
> hid int
> );


and then using it like this in a table definition

test_types set<frozen<test_type>>,


vs

*Second option is:*

CREATE TYPE test_type (
> type text,
> hid int
> );


and then using it like this in a table definition

test_types map<int, frozen<test_type>>


So I am just curious: which one is the preferred option here performance-wise,
or are they both the same?
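
To make the two options easier to compare, here is how I currently picture the
full definitions. This is just a sketch: the table and key columns are
placeholders, and I am assuming the UDTs need to be frozen inside collections -
please correct me if that is wrong.

CREATE TYPE test_type (
    cid int,
    type text,
    hid int
);

CREATE TABLE option_one (
    id int PRIMARY KEY,
    test_types set<frozen<test_type>>
);

CREATE TYPE test_type_no_cid (
    type text,
    hid int
);

CREATE TABLE option_two (
    id int PRIMARY KEY,
    test_types map<int, frozen<test_type_no_cid>>   -- cid becomes the map key
);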


What does "PER PARTITION LIMIT" means in cql query in cassandra?

2020-05-07 Thread Check Peck
I have a Scylla table as shown below:


cqlsh:sampleks> describe table test;

CREATE TABLE test (
    client_id int,
    when timestamp,
    process_ids list,
    md text,
    PRIMARY KEY (client_id, when)
) WITH CLUSTERING ORDER BY (when DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'TimeWindowCompactionStrategy',
        'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
    AND compression = {'sstable_compression':
        'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 172800
    AND max_index_interval = 1024
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';


And I see this is how we are querying it. It has been a long time since I
worked on Cassandra, so this "PER PARTITION LIMIT" is new to me (it looks like
it was added recently). Can someone explain what it does, with an example, in
layman's terms? I couldn't find any good documentation that explains it
clearly.


SELECT * FROM test WHERE client_id IN ? PER PARTITION LIMIT 1;
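
From the name I am guessing that, unlike a plain LIMIT (which caps the total
number of rows returned), PER PARTITION LIMIT caps the number of rows returned
from each partition - so with our clustering order (when DESC), something like
the query below should return only the newest row for each client_id. Please
correct me if that guess is wrong; the client ids here are made up:

SELECT * FROM test WHERE client_id IN (1, 2, 3) PER PARTITION LIMIT 1;

-- expected (if my guess is right): one row per client_id - the row with the
-- latest "when" value - instead of every row from every partition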


Re: CQL datatype for long?

2016-12-07 Thread Check Peck
And then from the DataStax Java driver, I can use the following. Am I right?

To read:
row.getLong();

To write:
boundStatement.setLong();
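
So the table would presumably become something like the sketch below. The
primary key here is only a guess on my part, since the definition I pasted in
my original mail referenced columns that are not in the table:

CREATE TABLE storage (
    key text,
    clientid int,
    deviceid bigint,              -- bigint is the 64-bit type; the driver reads/writes it as a Java long
    PRIMARY KEY (key, clientid)   -- guessed key columns
);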


On Wed, Dec 7, 2016 at 6:50 PM, Varun Barala 
wrote:

>  use `bigint` for long.
>
>
> Regards,
> Varun Barala
>
> On Thu, Dec 8, 2016 at 10:32 AM, Check Peck 
> wrote:
>
>> What is the CQL data type I should use for long? I have to create a
>> column with long data type. Cassandra version is 2.0.10.
>>
>> CREATE TABLE storage (
>>   key text,
>>   clientid int,
>>   deviceid long, // this is wrong I guess as I don't see long in CQL?
>>   PRIMARY KEY (topic, partition)
>> );
>>
>> I need to have "deviceid" as a long data type, because I am getting deviceid
>> as a long and that's how I want to store it.
>>
>
>


CQL datatype for long?

2016-12-07 Thread Check Peck
What is the CQL data type I should use for long? I have to create a column
with long data type. Cassandra version is 2.0.10.

CREATE TABLE storage (
  key text,
  clientid int,
  deviceid long, // this is wrong I guess as I don't see long in CQL?
  PRIMARY KEY (topic, partition)
);

I need to have "deviceid" as a long data type, because I am getting deviceid as
a long and that's how I want to store it.


Count Number of Users in Cassandra column family?

2015-05-13 Thread Check Peck
I have a table like this in Cassandra-

CREATE TABLE DATA_HOLDER (USER_ID TEXT, RECORD_NAME TEXT, RECORD_VALUE
BLOB, PRIMARY KEY (USER_ID, RECORD_NAME));

I want to count the distinct USER_ID values in the above table. Is there any
way I can do that?

My Cassandra version is:

[cqlsh 4.1.1 | Cassandra 2.0.10.71 | DSE 4.5.2 | CQL spec 3.1.1 |
Thrift protocol 19.39.0]
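
The closest I have found so far is listing the distinct partition keys and
counting them on the client side, since I don't believe COUNT(DISTINCT ...) is
supported in this version - please correct me if there is a better way:

SELECT DISTINCT user_id FROM data_holder;

-- then count the returned rows in the application, or capture the cqlsh
-- output to a file and count the lines there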


Re: How to extract all the user id from a single table in Cassandra?

2015-03-01 Thread Check Peck
Sending again as I didn't get any response on this.

Any thoughts?

On Fri, Feb 27, 2015 at 8:24 PM, Check Peck  wrote:

> I have a Cassandra table like this -
>
> create table user_record (user_id text, record_name text, record_value
> blob, primary key (user_id, record_name));
>
> What is the best way to extract all the user_id values from this table? As of
> now, I cannot change my data model for this exercise, so I need to find a
> way to extract all the user_id values from the above table.
>
> I am using the DataStax Java driver in my project. Is there any other easy
> way, apart from code, to extract all the user_id values from the above table
> through some cqlsh utility and dump them into a file?
>
> I am thinking the code below might time out after some time -
>
> public class TestCassandra {
>
> private Session session = null;
> private Cluster cluster = null;
>
> private static class ConnectionHolder {
> static final TestCassandra connection = new
> TestCassandra();
> }
>
> public static TestCassandra getInstance() {
> return ConnectionHolder.connection;
> }
>
> private TestCassandra() {
> Builder builder = Cluster.builder();
> builder.addContactPoints("127.0.0.1");
>
> PoolingOptions opts = new PoolingOptions();
> opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
> opts.getCoreConnectionsPerHost(HostDistance.LOCAL));
>
> cluster =
> builder.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE).withPoolingOptions(opts)
> .withLoadBalancingPolicy(new TokenAwarePolicy(new
> DCAwareRoundRobinPolicy("PI")))
> .withReconnectionPolicy(new
> ConstantReconnectionPolicy(100L))
> .build();
> session = cluster.connect();
> }
>
> private Set<String> getRandomUsers() {
> Set<String> userList = new HashSet<String>();
>
> String sql = "select user_id from testkeyspace.user_record;";
>
> try {
> SimpleStatement query = new SimpleStatement(sql);
> query.setConsistencyLevel(ConsistencyLevel.ONE);
> ResultSet res = session.execute(query);
>
> Iterator<Row> rows = res.iterator();
> while (rows.hasNext()) {
> Row r = rows.next();
>
> String user_id = r.getString("user_id");
> userList.add(user_id);
> }
> } catch (Exception e) {
> System.out.println("error= " + e);
> }
>
> return userList;
> }
> }
>
> Adding the java-driver group and the Cassandra group as well, to see whether
> there is any better way to do this.
>


How to extract all the user id from a single table in Cassandra?

2015-02-27 Thread Check Peck
I have a Cassandra table like this -

create table user_record (user_id text, record_name text, record_value
blob, primary key (user_id, record_name));

What is the best way to extract all the user_id values from this table? As of
now, I cannot change my data model for this exercise, so I need to find a way
to extract all the user_id values from the above table.

I am using the DataStax Java driver in my project. Is there any other easy way,
apart from code, to extract all the user_id values from the above table through
some cqlsh utility and dump them into a file?

I am thinking the code below might time out after some time -

public class TestCassandra {

    private Session session = null;
    private Cluster cluster = null;

    private static class ConnectionHolder {
        static final TestCassandra connection = new TestCassandra();
    }

    public static TestCassandra getInstance() {
        return ConnectionHolder.connection;
    }

    private TestCassandra() {
        Builder builder = Cluster.builder();
        builder.addContactPoints("127.0.0.1");

        PoolingOptions opts = new PoolingOptions();
        opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
                opts.getCoreConnectionsPerHost(HostDistance.LOCAL));

        cluster = builder.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .withPoolingOptions(opts)
                .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy("PI")))
                .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
                .build();
        session = cluster.connect();
    }

    private Set<String> getRandomUsers() {
        Set<String> userList = new HashSet<String>();

        String sql = "select user_id from testkeyspace.user_record;";

        try {
            SimpleStatement query = new SimpleStatement(sql);
            query.setConsistencyLevel(ConsistencyLevel.ONE);
            ResultSet res = session.execute(query);

            Iterator<Row> rows = res.iterator();
            while (rows.hasNext()) {
                Row r = rows.next();

                String user_id = r.getString("user_id");
                userList.add(user_id);
            }
        } catch (Exception e) {
            System.out.println("error= " + e);
        }

        return userList;
    }
}

Adding the java-driver group and the Cassandra group as well, to see whether
there is any better way to do this.
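
One non-code route I am considering, in case it is relevant: selecting only the
distinct partition keys from cqlsh and capturing the output to a file. I have
not verified how well this behaves on a large table:

-- in cqlsh
CAPTURE '/tmp/user_ids.txt';
SELECT DISTINCT user_id FROM testkeyspace.user_record;
CAPTURE OFF;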


designing table

2015-02-19 Thread Check Peck
I am trying to design a table in Cassandra in which I will have multiple
JSON strings for a particular client id.

abc123 -   jsonA
abc123 -   jsonB
abcd12345   -   jsonC
My query pattern is going to be -

Give me all JSON strings for a particular client id.
Give me all the client ids and JSON strings for a particular date.

What is the best way to design a table for this?
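
To make the query patterns concrete, this is roughly the shape I am imagining
so far - just a sketch, and the column names and the second date-keyed table
are my own guesses:

CREATE TABLE client_json (
    client_id text,
    created_at timeuuid,
    json_value text,
    PRIMARY KEY (client_id, created_at)
);

CREATE TABLE json_by_date (
    created_day text,          -- e.g. '2015-02-19', the partition key for query 2
    client_id text,
    created_at timeuuid,
    json_value text,
    PRIMARY KEY (created_day, client_id, created_at)
);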


Re: How to get data which has changed within x minutes using CQL?

2014-09-23 Thread Check Peck
On Tue, Sep 23, 2014 at 3:41 PM, DuyHai Doan  wrote:

> now - 15 mins



Can I run it like this in CQL using cqlsh?

SELECT * FROM client_data WHERE client_id = 1 and last_modified_date >= now
- 15 mins

When I ran the above query I got an error on my cql client -

Bad Request: line 1:81 no viable alternative at input '-'
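
For now the only workaround I can think of is computing the cutoff timestamp in
the application and passing it as a literal, since date arithmetic does not
seem to be supported here (the timestamp value below is made up):

SELECT * FROM client_data
WHERE client_id = 1
  AND last_modified_date >= '2014-09-23 15:30:00+0000';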


Re: How to get data which has changed within x minutes using CQL?

2014-09-23 Thread Check Peck
Yes, I can provide client_id in my where clause. So now my query pattern
will be -

Give me everything that has changed within the last 15 minutes (or 5 minutes)
where client_id is equal to 1.

How will my query look then?


On Tue, Sep 23, 2014 at 3:26 PM, DuyHai Doan  wrote:

> It is possible to request a "range" of data according to the
> last_modified_date but you still need to provide the client_id , the
> partition key, in any case
>
>
> On Wed, Sep 24, 2014 at 12:23 AM, Check Peck 
> wrote:
>
>> I have a table structure like below -
>>
>> CREATE TABLE client_data (
>>   client_id int,
>>   consumer_id text,
>>   last_modified_date timestamp,
>>   PRIMARY KEY (client_id, last_modified_date, consumer_id)
>> )
>>
>> I have a query pattern like this - give me everything that has
>> changed within the last 15 minutes or 5 minutes. Is this possible in CQL
>> with the above table?
>>
>
>


How to get data which has changed within x minutes using CQL?

2014-09-23 Thread Check Peck
I have a table structure like below -

CREATE TABLE client_data (
  client_id int,
  consumer_id text,
  last_modified_date timestamp,
  PRIMARY KEY (client_id, last_modified_date, consumer_id)
)

I have a query pattern like this - give me everything that has changed
within the last 15 minutes or 5 minutes. Is this possible in CQL with the
above table?


Re: Wide Rows - Data Model Design

2014-09-19 Thread Check Peck
@DuyHai - I have put that because of this condition -

In this table, we can have multiple record_data values for the same client_name.

There can be multiple combinations of client_name and record_data for each
distinct test_id.


On Fri, Sep 19, 2014 at 8:48 AM, DuyHai Doan  wrote:

> "Does my above table falls under the category of wide rows in Cassandra
> or not?" --> It depends on the cardinality. For each distinct test_id, how
> many combinations of client_name/record_data do you have ?
>
>  By the way, why do you put the record_data as part of primary key ?
>
> In your table partiton key = test_id, client_name = first clustering
> column, record_data = second clustering column
>
>
> On Fri, Sep 19, 2014 at 5:41 PM, Check Peck 
> wrote:
>
>> I am trying to use the wide-rows concept in my data modeling design for
>> Cassandra. We are using Cassandra 2.0.6.
>>
>> CREATE TABLE test_data (
>>   test_id int,
>>   client_name text,
>>   record_data text,
>>   creation_date timestamp,
>>   last_modified_date timestamp,
>>   PRIMARY KEY (test_id, client_name, record_data)
>> )
>>
>> So I came up with the above table design. Does my table fall under the
>> category of wide rows in Cassandra or not?
>>
>> And is there any problem if I have three columns in my PRIMARY KEY? I
>> guess the PARTITION KEY will be test_id, right? And what about the other two?
>>
>> In this table, we can have multiple record_data values for the same client_name.
>>
>> Query Pattern will be -
>>
>> select client_name, record_data from test_data where test_id = 1;
>>
>
>


Wide Rows - Data Model Design

2014-09-19 Thread Check Peck
I am trying to use the wide-rows concept in my data modeling design for
Cassandra. We are using Cassandra 2.0.6.

CREATE TABLE test_data (
  test_id int,
  client_name text,
  record_data text,
  creation_date timestamp,
  last_modified_date timestamp,
  PRIMARY KEY (test_id, client_name, record_data)
)

So I came up with the above table design. Does my table fall under the
category of wide rows in Cassandra or not?

And is there any problem if I have three columns in my PRIMARY KEY? I
guess the PARTITION KEY will be test_id, right? And what about the other two?

In this table, we can have multiple record_data values for the same client_name.

Query Pattern will be -

select client_name, record_data from test_data where test_id = 1;
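
If it helps clarify my question, I believe the same definition can also be
written with the partition key called out explicitly - please correct me if
this is not equivalent:

PRIMARY KEY ((test_id), client_name, record_data)
-- test_id is the partition key; client_name and record_data are clustering columns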


Why column with timestamp datatype come in different format?

2014-09-18 Thread Check Peck
I have a Cassandra cluster version as -

cqlsh:dataks> show version;
[cqlsh 2.3.0 | Cassandra 2.0.6 | CQL spec 3.0.0 | Thrift protocol
19.39.0]

And I have a table like this -

CREATE TABLE data_test (
  valid_id int,
  data_id text,
  client_name text,
  creation_date timestamp,
  last_modified_date timestamp,
  PRIMARY KEY (valid_id, data_id)
)

I am inserting the data like this in my above table -

insert into data_test (valid_id, data_id, client_name, creation_date,
last_modified_date) values (1, 'b4b61aa', 'TTLAP', dateOf(now()),
dateOf(now()));

After I do a select on my table, I see creation_date and
last_modified_date coming back in some other format, and I am not sure why.

 valid_id | data_id | client_name | creation_date              | last_modified_date
----------+---------+-------------+----------------------------+----------------------------
        1 | b4b61aa | TTLAP       | \x00\x00\x01H\x89\xf0\xb6I | \x00\x00\x01H\x89\xf0\xb6I

Does anyone know why creation_date and last_modified_date come back like
this, and how we can get the actual timestamps in those columns?
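
One thing I am planning to try, to see whether the problem is on the insert
side or only in how cqlsh displays the value, is inserting explicit timestamp
literals instead of dateOf(now()) (values made up):

insert into data_test (valid_id, data_id, client_name, creation_date, last_modified_date)
values (2, 'c5c72bb', 'TTLAP', '2014-09-18 10:00:00+0000', '2014-09-18 10:00:00+0000');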


Re: Cassandra Data Model design

2014-09-17 Thread Check Peck
It takes more than 50 seconds to return 500 records from the cqlsh
command (not from the code), so that's why I am saying it is pretty slow.

On Wed, Sep 17, 2014 at 3:17 PM, Hao Cheng  wrote:

> How slow is slow? Regardless of the data model question, in my experience
> 500 rows of relatively light content should be lightning fast. Looking at
> my performance results on a test cluster of 3x r3.large AWS instances, we
> reach an op rate on Cassandra's stress test of at least 1000 operations per
> second and on average 7,500 operations per second over the stress test data
> set.
>
> More broadly, it seems like you would benefit from either deltas (only
> retrieve new data) or something like paging (only retrieve currently
> relevant data), although it's really difficult to say without more
> information.
>
> On Wed, Sep 17, 2014 at 1:01 PM, Check Peck 
> wrote:
>
>> I have recently started working with Cassandra. We have a Cassandra cluster
>> which is using DSE 4.0 and has vnodes enabled. We have tables
>> like this -
>>
>> Below is my first table -
>>
>> CREATE TABLE customers (
>>   customer_id int PRIMARY KEY,
>>   last_modified_date timeuuid,
>>   customer_value text
>> )
>>
>> The read query pattern on the above table is like this as of now, since we
>> need to get everything from the above table and load it into our application
>> memory every x minutes.
>>
>> select customer_id, customer_value from datakeyspace.customers;
>>
>> We have second table like this -
>>
>> CREATE TABLE client_data (
>>   client_name text PRIMARY KEY,
>>   client_id text,
>>   creation_date timestamp,
>>   is_valid int,
>>   last_modified_date timestamp
>> )
>>
>> Right now the above table has 500 records, and all of those records have the
>> "is_valid" column set to 1. The read query pattern on this table is like this
>> as of now: we need to get everything from the table and load it into our
>> application memory every x minutes, so the query below will return all 500
>> records, since everything has is_valid set to 1.
>>
>> select client_name, client_id from  datakeyspace.client_data where
>> is_valid=1;
>>
>> Since our cluster has vnodes enabled, my query pattern above is not
>> efficient at all, and it is taking a lot of time to get the data from
>> Cassandra. We are reading from these tables with consistency level QUORUM.
>>
>> Is there any possibility of improving our data model?
>>
>> Any suggestions will be greatly appreciated.
>>
>
>


Cassandra Data Model design

2014-09-17 Thread Check Peck
I have recently started working with Cassandra. We have a Cassandra cluster
which is using DSE 4.0 and has vnodes enabled. We have tables
like this -

Below is my first table -

CREATE TABLE customers (
  customer_id int PRIMARY KEY,
  last_modified_date timeuuid,
  customer_value text
)

The read query pattern on the above table is like this as of now, since we need
to get everything from the above table and load it into our application memory
every x minutes.

select customer_id, customer_value from datakeyspace.customers;

We have second table like this -

CREATE TABLE client_data (
  client_name text PRIMARY KEY,
  client_id text,
  creation_date timestamp,
  is_valid int,
  last_modified_date timestamp
)

Right now the above table has 500 records, and all of those records have the
"is_valid" column set to 1. The read query pattern on this table is like this
as of now: we need to get everything from the above table and load it into our
application memory every x minutes, so the query below will return all 500
records, since everything has is_valid set to 1.

select client_name, client_id from  datakeyspace.client_data where
is_valid=1;

Since our cluster has vnodes enabled, my query pattern above is not
efficient at all, and it is taking a lot of time to get the data from
Cassandra. We are reading from these tables with consistency level QUORUM.

Is there any possibility of improving our data model?

Any suggestions will be greatly appreciated.
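
For completeness, since is_valid is not part of the primary key, I assume a
filter like the one above only works with a secondary index (or ALLOW
FILTERING) - roughly like this:

CREATE INDEX client_data_is_valid_idx ON client_data (is_valid);

-- note: indexing a low-cardinality flag like is_valid has known drawbacks
select client_name, client_id from datakeyspace.client_data where is_valid = 1;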


Cassandra Consistency Level

2014-08-19 Thread Check Peck
We have a Cassandra cluster in three different datacenters (DC1, DC2, and DC3),
with 10 machines in each datacenter. We have a few tables in Cassandra
which hold fewer than 100 records.

What we are seeing: some tables are out of sync between machines in DC3 as
compared to DC1 or DC2 when we do a select count(*) on them.

As an example, we ran select count(*) while connected to one Cassandra
machine in the dc3 datacenter and to one Cassandra machine in the dc1
datacenter, and the results were different.

root@machineA:/home/david/apache-cassandra/bin# python cqlsh
dc3114.dc3.host.com
Connected to TestCluster at dc3114.dc3.host.com:9160.
[cqlsh 2.3.0 | Cassandra 1.2.9 | CQL spec 3.0.0 | Thrift protocol
19.36.0]
Use HELP for help.
cqlsh> use testingkeyspace ;
cqlsh:testingkeyspace> select count(*) from test_metadata ;

count
---
12

cqlsh:testingkeyspace> exit
root@machineA:/home/david/apache-cassandra/bin# python cqlsh
dc18b0c.dc1.host.com
Connected to TestCluster at dc18b0c.dc1.host.com:9160.
[cqlsh 2.3.0 | Cassandra 1.2.9 | CQL spec 3.0.0 | Thrift protocol
19.36.0]
Use HELP for help.
cqlsh> use testingkeyspace ;
cqlsh:testingkeyspace> select count(*) from test_metadata ;

count
---
16

What could be the reason for this sync issue? Can anyone shed some light
on this?

Note that our Java driver code and DataStax C++ driver code are using these
tables with CONSISTENCY LEVEL ONE.
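
One check I am planning to run is repeating the count at a higher consistency
level from cqlsh (assuming this cqlsh version supports the CONSISTENCY
command), to see whether the difference disappears when more replicas are
consulted:

cqlsh:testingkeyspace> CONSISTENCY QUORUM;
cqlsh:testingkeyspace> select count(*) from test_metadata ;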


Cassandra is making cross datacenter call internally to get the data from different datacenters?

2014-06-27 Thread Check Peck
I have our application code deployed in two datacenters, DC1 and DC2, and
the Cassandra nodes are also in DC1 and DC2, forming a single cluster.

Our application servers in DC1 are communicating with DC1 Cassandra nodes,
which I verified with "netstat -a | grep 9042".

But somehow, internally, the DC1 Cassandra nodes are trying to get data from
the DC2 Cassandra nodes, and because of that, calls are timing out at the
application servers.

We are running the DSE 4.0 Cassandra version, and our app servers are using
the DataStax cpp driver.

What could be the issue, and what things can I try out to see what's
happening?


How to replace cluster name without any impact?

2014-04-09 Thread Check Peck
We have around a 36-node Cassandra cluster with three datacenters.
Each datacenter has 12 nodes.

We already have data flowing into Cassandra, and we cannot wipe out all
our data now.

Considering this - what is the right way to rename the cluster with no, or
minimal, impact?


Re: Securing Cassandra database

2014-04-04 Thread Check Peck
Just to add, nobody should be able to read from or write to our Cassandra
database through any API *or any CQL client as well*; only our team should
be able to do that.


On Fri, Apr 4, 2014 at 11:29 PM, Check Peck  wrote:

> Thanks Mark. But what about the Cassandra database itself? I don't want
> anybody to read and write to our Cassandra database through any API; only
> our team should be able to do that.
>
> We are using CQL-based tables, so the data doesn't get shown in OpsCenter.
>
> In our case, we would like to secure the database itself. Is this possible
> to do as well somehow?
>
>
>
>
>
> On Fri, Apr 4, 2014 at 11:24 PM, Mark Reddy wrote:
>
>> Hi,
>>
>> If you want to just secure OpsCenter itself take a look here:
>> http://www.datastax.com/documentation/opscenter/4.1/opsc/configure/opscAssigningAccessRoles_t.html
>>
>>
>> If you want to enable internal authentication and still allow OpsCenter
>> access, you can create an OpsCenter user and once you have auth turned
>> within the cluster update the cluster config with the user name and
>> password for the OpsCenter user.
>>
>> Depending on your installation type you will find the cluster config in
>> one of the following locations:
>> Packaged installs: /etc/opscenter/clusters/.conf
>> Binary installs: /conf/clusters/.conf
>> Windows installs: Program Files (x86)\DataStax
>> Community\opscenter\conf\clusters\.conf
>>
>> Open the file and update the username and password values under the
>> [cassandra] section:
>>
>> [cassandra]
>> username =
>> seed_hosts =
>> api_port =
>> password =
>>
>> After changing properties in this file, restart OpsCenter for the changes
>> to take effect.
>>
>>
>> Mark
>>
>>
>> On Sat, Apr 5, 2014 at 6:54 AM, Check Peck wrote:
>>
>>> Hi All,
>>>
>>> We would like to secure our Cassandra database. We don't want anybody to
>>> read/write on our Cassandra database except our team members.
>>>
>>>
>>>
>>> We are using Cassandra 1.2.9 in Production and we have 36 node Cassandra
>>> cluster. 12 in each colo as we have three datacenters.
>>>
>>>
>>> But we would like to have OPSCENTER working as it is working currently.
>>>
>>>
>>>
>>> Is this possible to do somehow? Are there any settings in the yaml file
>>> which we can enforce?
>>>
>>>
>>>
>>>
>>
>>
>


Re: Securing Cassandra database

2014-04-04 Thread Check Peck
Thanks Mark. But what about the Cassandra database itself? I don't want
anybody to read and write to our Cassandra database through any API; only
our team should be able to do that.

We are using CQL-based tables, so the data doesn't get shown in OpsCenter.

In our case, we would like to secure the database itself. Is this possible to
do as well somehow?




On Fri, Apr 4, 2014 at 11:24 PM, Mark Reddy  wrote:

> Hi,
>
> If you want to just secure OpsCenter itself take a look here:
> http://www.datastax.com/documentation/opscenter/4.1/opsc/configure/opscAssigningAccessRoles_t.html
>
>
> If you want to enable internal authentication and still allow OpsCenter
> access, you can create an OpsCenter user and once you have auth turned
> within the cluster update the cluster config with the user name and
> password for the OpsCenter user.
>
> Depending on your installation type you will find the cluster config in
> one of the following locations:
> Packaged installs: /etc/opscenter/clusters/.conf
> Binary installs: /conf/clusters/.conf
> Windows installs: Program Files (x86)\DataStax
> Community\opscenter\conf\clusters\.conf
>
> Open the file and update the username and password values under the
> [cassandra] section:
>
> [cassandra]
> username =
> seed_hosts =
> api_port =
> password =
>
> After changing properties in this file, restart OpsCenter for the changes
> to take effect.
>
>
> Mark
>
>
> On Sat, Apr 5, 2014 at 6:54 AM, Check Peck wrote:
>
>> Hi All,
>>
>> We would like to secure our Cassandra database. We don't want anybody to
>> read/write on our Cassandra database except our team members.
>>
>>
>>
>> We are using Cassandra 1.2.9 in Production and we have 36 node Cassandra
>> cluster. 12 in each colo as we have three datacenters.
>>
>>
>> But we would like to have OPSCENTER working as it is working currently.
>>
>>
>>
>> Is this possible to do somehow? Are there any settings in the yaml file
>> which we can enforce?
>>
>>
>>
>>
>
>


Securing Cassandra database

2014-04-04 Thread Check Peck
Hi All,

We would like to secure our Cassandra database. We don't want anybody to
read/write on our Cassandra database except our team members.



We are using Cassandra 1.2.9 in Production and we have 36 node Cassandra
cluster. 12 in each colo as we have three datacenters.


But we would like to have OPSCENTER working as it is working currently.



Is this possible to do somehow? Are there any settings in the yaml file which
we can enforce?
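
From what I have read so far, I think this would involve switching the
authenticator and authorizer in cassandra.yaml and then creating users from
cqlsh, roughly as sketched below. The user names are placeholders and I have
not tried this yet, so please correct me if this breaks OpsCenter:

-- in cassandra.yaml on every node (followed by a rolling restart):
--   authenticator: PasswordAuthenticator
--   authorizer: CassandraAuthorizer

-- then, from cqlsh as the default superuser:
CREATE USER team_member WITH PASSWORD 'changeme' NOSUPERUSER;
CREATE USER opscenter_user WITH PASSWORD 'changeme' NOSUPERUSER;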





*Raihan Jamal*


Re: Datastax C++ driver on Windows x64

2014-03-04 Thread Check Peck
Hi Guys,

I have a couple of questions on the DataStax C++ driver. They are not related
to this particular post, but nobody is replying to my original email thread,
and in this email thread I saw people talking about the DataStax C++ driver.

Not sure whether you might be able to help me or not, but I am trying my luck -

We have a 36-node Cassandra cluster: 12 nodes in DC1, 12 nodes in DC2, and 12
nodes in the DC3 datacenter.

And our application code is also in three datacenters: 11 nodes in DC1, 11
nodes in DC2, and 11 nodes in the DC3 datacenter.

So my question is: if the application call is coming from the DC1 datacenter,
will it go to the DC1 Cassandra nodes automatically with the use of the cpp
driver? And the same for DC2 and DC3?

Or do we need to add some config changes in our C++ code while making the
connection to Cassandra, to make sure that if the call is coming from the
DC1 datacenter it goes to the DC1 Cassandra nodes?

If there is any config change that we need to add in our C++ code, could
you please point me to it?


On Tue, Mar 4, 2014 at 3:01 PM, Green, John M (HP Education) <
john.gr...@hp.com> wrote:

> Thanks Michael. This is the "ray of hope" I desperately needed. I'll
> let you know how it goes.
>
> -Original Message-
> From: Michael Shuler [mailto:mshu...@pbandjelly.org] On Behalf Of Michael
> Shuler
> Sent: Tuesday, March 04, 2014 2:58 PM
> To: user@cassandra.apache.org
> Subject: Re: Datastax C++ driver on Windows x64
>
> On 03/04/2014 04:30 PM, Michael Shuler wrote:
> > On 03/04/2014 04:22 PM, Michael Shuler wrote:
> >> On 03/04/2014 04:12 PM, Dwight Smith wrote:
> >>> Second that question
> >>>
> >>> *From:*Green, John M (HP Education) [mailto:john.gr...@hp.com]
> >>> *Sent:* Tuesday, March 04, 2014 2:03 PM
> >>> *To:* user@cassandra.apache.org
> >>> *Subject:* Datastax C++ driver on Windows x64
> >>>
> >>> Has anyone successfully built the Datastax C++ driver for a Windows
> >>> 64-bit platform?
> >>>
> >>> While I've made some progress I'm still not there, and I'm wondering if I
> >>> should give up and use a local socket to another process (e.g., JVM or
> >>> .NET runtime) instead. I'd prefer to use C++ because that's what the
> >>> rest of the application is using. However, my C++ and makefile
> >>> experience is very dated and I've never used cmake before. Still, I'd
> >>> be very interested to know if anyone has had success using the C++
> >>> driver on Windows x64.
> >>
> >> http://cassci.datastax.com/job/y_cpp_driver_win32/lastBuild/consoleFu
> >> ll
> >>
> >> Please, let me know, and I'll dig for some further details, if this
> >> doesn't fully help.  I did not set this particular job up, but
> >> jenkins runs the following batch script after git pull:
> >>
> >> 
> >> @echo off
> >> cd C:\jenkins\workspace
> >> mkdir y_cpp_driver_win32\bin
> >> copy CMakeCache.txt y_cpp_driver_win32\bin cd y_cpp_driver_win32\bin
> >> cmake .
> >> msbuild ALL_BUILD.vcxproj
> >> msbuild UNINSTALL.vcxproj
> >> msbuild INSTALL.vcxproj
> >> 
> >
> > I may have replied a bit too quickly - it does look like this is using
> > all 32-bit libs in the includes, even though it's built on a 64-bit
> > machine.
> >
> > You might be able to touch base with the developers on the freenode
> > #datastax-drivers channel.
> >
>
> I uploaded the CMakeCache.txt that is being copied over so you could peek
> at it, too.
>
> http://cassci.datastax.com/userContent/y_cpp_driver_win32-config/
>
> --
> Michael
>


Re: Cassandra cpp driver call to local cassandra colo

2014-03-04 Thread Check Peck
I guess that is not quite right. Cluster.builder().addContactPoint(...) adds
nodes as contact points for the connection pool, and the driver will discover
all the other nodes in the cluster automatically. To restrict queries to nodes
in the local colo only, we need to use different settings in the Java driver
(a DC-aware load balancing policy).

There should be similar functionality in the cpp driver as well.


On Tue, Mar 4, 2014 at 8:20 AM, Manoj Khangaonkar wrote:

> Hi ,
>
> Your client/application will connect to one of the nodes from the nodes
> you tell it to connect. In the java driver this is done by calling
> Cluster.builder.addContactPoint(...). I suppose the C++ driver will have
> similar class method. For the app in DC1 provide only nodes in DC1 as
> contact points.
>
> regards
>
>
> On Tue, Mar 4, 2014 at 6:47 AM, Check Peck wrote:
>
>> I have a couple of questions on the DataStax C++ driver.
>>
>> We have a 36-node Cassandra cluster: 12 nodes in DC1, 12 nodes in DC2, and
>> 12 nodes in the DC3 datacenter.
>>
>> And our application code is also in three datacenters: 11 nodes in DC1, 11
>> nodes in DC2, and 11 nodes in the DC3 datacenter.
>>
>> So my question is: if the application call is coming from the DC1 datacenter,
>> will it go to the DC1 Cassandra nodes automatically with the use of the cpp
>> driver? And the same for DC2 and DC3?
>>
>> Or do we need to add some config changes in our C++ code while making the
>> connection to Cassandra, to make sure that if the call is coming from the
>> DC1 datacenter it goes to the DC1 Cassandra nodes?
>>
>> If there is any config change that we need to add in our C++ code, could
>> you please point me to it?
>>
>>
>
>
> --
> http://khangaonkar.blogspot.com/
>


Cassandra cpp driver call to local cassandra colo

2014-03-04 Thread Check Peck
I have a couple of questions on the DataStax C++ driver.

We have a 36-node Cassandra cluster: 12 nodes in DC1, 12 nodes in DC2, and 12
nodes in the DC3 datacenter.

And our application code is also in three datacenters: 11 nodes in DC1, 11
nodes in DC2, and 11 nodes in the DC3 datacenter.

So my question is: if the application call is coming from the DC1 datacenter,
will it go to the DC1 Cassandra nodes automatically with the use of the cpp
driver? And the same for DC2 and DC3?

Or do we need to add some config changes in our C++ code while making the
connection to Cassandra, to make sure that if the call is coming from the
DC1 datacenter it goes to the DC1 Cassandra nodes?

If there is any config change that we need to add in our C++ code, could
you please point me to it?


How to retrieve snappy compressed data from Cassandra using Datastax?

2014-01-28 Thread Check Peck
I am working on a project in which I am supposed to store snappy-compressed
data in Cassandra, so that when I retrieve the same data from Cassandra it is
still snappy-compressed in memory, and I will then decompress it using snappy
to get the actual data.

I have a byte array in the `bytesToStore` variable; I snappy-compress it using
Google `Snappy` and store it in Cassandra -

// .. some code here
System.out.println(bytesToStore);

byte[] compressed = Snappy.compress(bytesToStore);

attributesMap.put("e1", compressed);

ICassandraClient client = CassandraFactory.getInstance().getDao();
// write to Cassandra
client.upsertAttributes("0123", attributesMap, "sample_table");

After inserting the data into Cassandra, I went back into CQL mode, queried
it, and I can see this data in my table for the test_id `0123` -

cqlsh:testingks> select * from sample_table where test_id = '0123';

 test_id | name | value
---------+------+------------------------------------------------------------------------------------------
    0123 |   e1 | 0x2cac7fff012c4ebb9555001e42797465204172726179205465737420466f722042696720456e6469616e


Now I am trying to read the same data back from Cassandra, and every time it
gives me an `IllegalArgumentException` -

    public Map<String, byte[]> getDataFromCassandra(final String rowKey,
            final Collection<String> attributeNames) {

        Map<String, byte[]> dataFromCassandra = new ConcurrentHashMap<String, byte[]>();

        try {
            String query = "SELECT test_id, name, value from sample_table where test_id = '" + rowKey + "';";
            // SELECT test_id, name, value from sample_table where test_id = '0123';
            System.out.println(query);

            DatastaxConnection.getInstance();

            ResultSet result = DatastaxConnection.getSession().execute(query);

            Iterator<Row> it = result.iterator();

            while (it.hasNext()) {
                Row r = it.next();
                for (String str : attributeNames) {
                    ByteBuffer bb = r.getBytes(str); // this line is throwing an exception for me
                    byte[] ba = new byte[bb.remaining()];
                    bb.get(ba, 0, ba.length);
                    dataFromCassandra.put(str, ba);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return dataFromCassandra;
    }

This is the Exception I am getting -

java.lang.IllegalArgumentException: e1 is not a column defined in this
metadata

In the above method, I am passing rowKey as `0123`, and `attributeNames`
contains `e1` as a string.

I am expecting snappy-compressed data in the `dataFromCassandra` map. In this
map the key should be `e1` and the value should be the snappy-compressed data,
if I am not wrong. And then I will iterate over this map to snappy-decompress
the data.

I am using the DataStax Java client with Cassandra 1.2.9.

Any thoughts on what I am doing wrong here?