Suggestion: Date as a Partition key

2015-02-03 Thread Asit KAUSHIK
Hi All,
Please excuse me if my queries are those of a novice user. In continuation of
my last table design issue, I am thinking of creating a partition key on the
date (only), as a time frame will always be part of our search criteria.

So my queries are

1) Is this a good idea, given that I don't have any other field to add to the
date key? All the time-series examples point to a combination; here I am
talking only about a date, which I would convert to an int format like
04022015.
2) Also, is there any elaborate doc or write-up on identifying how much data
is on which node, so that I can see the distribution of data across the nodes?

For your reference below is my table structure


CREATE TABLE logentries (
eventDate bigint PRIMARY KEY,
context text,
date_to_hour bigint,
durationinseconds float,
eventtimestamputc timestamp,
ipaddress inet,
logentrytimestamputc timestamp,
loglevel int,
logmessagestring text,
logsequence int,
message text,
modulename text,
productname text,
searchitems map<text, text>,
servername text,
sessionname text,
stacktrace text,
threadname text,
timefinishutc timestamp,
timestartutc timestamp,
urihostname text,
uripathvalue text,
uriquerystring text,
useragentstring text,
username text
);
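For reference, a common alternative shape for this table (an editor's sketch, not from the thread; it reuses the column names above) keeps the date as the partition key and adds the event timestamp as a clustering column, so one day of log entries forms one partition and a time-frame query stays inside it. Note that with eventDate alone as the full primary key, each date can hold only one row, and every insert for that date overwrites the previous one.

CREATE TABLE logentries_by_day (
    eventdate int,                 -- e.g. 20150204; a yyyymmdd int keeps days readable
    eventtimestamputc timestamp,   -- clustering column: orders events within the day
    logsequence int,               -- tie-breaker for events in the same millisecond
    message text,
    -- ... remaining columns as in the table above ...
    PRIMARY KEY (eventdate, eventtimestamputc, logsequence)
);

-- A time-frame query then hits a single partition:
SELECT * FROM logentries_by_day
WHERE eventdate = 20150204
  AND eventtimestamputc >= '2015-02-04 09:00:00'
  AND eventtimestamputc <  '2015-02-04 10:00:00';

One caveat: a date-only partition puts a whole day on one replica set, which can create wide, hot partitions; bucketing by date plus hour (as the date_to_hour column above suggests) is a common refinement.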

Thanks so much all for the help

Cheers
Asit


RE: Smart column searching for a particular rowKey

2015-02-03 Thread Mohammed Guller
Astyanax allows you to execute CQL statements. I don’t remember the details, 
but it is there.

One tip – when you create the column family, use WITH CLUSTERING ORDER BY
(timestamp DESC). Then your query becomes straightforward and C* will do all the
heavy lifting for you.
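A minimal sketch of that option, with illustrative table and column names (not from the thread):

CREATE TABLE trades (
    stockname text,
    tradetime timestamp,
    price decimal,
    size int,
    PRIMARY KEY (stockname, tradetime)
) WITH CLUSTERING ORDER BY (tradetime DESC);

With rows stored newest-first on disk, "latest value" queries need no ORDER BY at read time.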

Mohammed

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Tuesday, February 3, 2015 11:54 AM
To: user@cassandra.apache.org
Subject: RE: Smart column searching for a particular rowKey

I cannot find anything corresponding to a WHERE clause there.

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Tuesday, February 03, 2015 2:44 PM
To: user@cassandra.apache.org
Subject: RE: Smart column searching for a particular rowKey

Thanks, it does.
How about in astyanax?

From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Tuesday, February 03, 2015 1:49 PM
To: user@cassandra.apache.org
Subject: Re: Smart column searching for a particular rowKey

WHERE  + ORDER DESC + LIMIT should be able to accomplish that.
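As a sketch of that pattern (table and column names are illustrative, assuming the timestamp is a clustering column under the stockName partition key):

SELECT tradetime, price, size
FROM trades
WHERE stockname = 'GOOGLE'
  AND tradetime <= '2015-02-03 09:33:00'
ORDER BY tradetime DESC
LIMIT 1;

Cassandra seeks within the sorted partition, so this returns the last row at or before 9:33:00 without iterating the whole row client-side.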

On Tue, Feb 3, 2015 at 11:28 AM, Ravi Agrawal 
ragra...@clearpoolgroup.com wrote:
Hi Guys,
Need help with this.
My rowKey is stockName, like GOOGLE or APPLE.
Columns are sorted by timestamp and include a set of data fields
such as price and size. So the data would look like: 1. 9:31:00, $520, 100 shares;
2. 9:35:09, $530, 1000 shares; 3. 9:45:39, $520, 500 shares.
I want to search this column family by timestamp.
For a rowKey, if I search for data at 9:33:00, which does not
actually exist in the columns, I want to return the last value where data was
present: in this case 9:31:00, $520, 100 shares, since the next timestamp,
9:35:09, is greater than the input value entered.
One obvious way would be iterating through each column and storing the last data;
if the new timestamp is greater than the given timestamp, return the last data
stored.
Is there an optimized way to achieve this, since the columns are already sorted?
Thanks





Re: No schema agreement from live replicas?

2015-02-03 Thread Clint Kelly
FWIW, increasing the threshold for withMaxSchemaAgreementWaitSeconds to
30 seconds was enough to fix my problem. I would still like to understand,
however, whether the cluster has some kind of configuration problem that
made doing so necessary.

Thanks!

On Tue, Feb 3, 2015 at 7:44 AM, Clint Kelly clint.ke...@gmail.com wrote:
 Hi all,

 I have an application that uses the Java driver to create a table and then
 immediately write to it.  I see the following warning in my logs:

 [10.241.17.134] out: 15/02/03 09:32:24 WARN
 com.datastax.driver.core.Cluster: No schema agreement from live replicas
 after 10 s. The schema may not be up to date on some nodes.

 ...this seems to happen after creating a table, and the schema not being up
 to date leads to errors when trying to write to the new tables:

 [10.241.17.134] out: Exception in thread main
 com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured
 columnfamily schema_hash

 Any suggestions on what to do about this (other than increasing
 withMaxSchemaAgreementWaitSeconds)?  This is only a three-node test
 cluster.  I have not gotten this warning before, even on much bigger
 clusters.

 Best regards,
 Clint


No schema agreement from live replicas?

2015-02-03 Thread Clint Kelly
Hi all,

I have an application that uses the Java driver to create a table and then
immediately write to it.  I see the following warning in my logs:

[10.241.17.134] out: 15/02/03 09:32:24 WARN
com.datastax.driver.core.Cluster: No schema agreement from live replicas
after 10 s. The schema may not be up to date on some nodes.

...this seems to happen after creating a table, and the schema not being up
to date leads to errors when trying to write to the new tables:

[10.241.17.134] out: Exception in thread main
com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured
columnfamily schema_hash

Any suggestions on what to do about this (other than increasing
withMaxSchemaAgreementWaitSeconds)?  This is only a three-node test
cluster.  I have not gotten this warning before, even on much bigger
clusters.

Best regards,
Clint


Smart column searching for a particular rowKey

2015-02-03 Thread Ravi Agrawal
Hi Guys,
Need help with this.
My rowKey is stockName, like GOOGLE or APPLE.
Columns are sorted by timestamp and include a set of data fields
such as price and size. So the data would look like: 1. 9:31:00, $520, 100 shares;
2. 9:35:09, $530, 1000 shares; 3. 9:45:39, $520, 500 shares.
I want to search this column family by timestamp.
For a rowKey, if I search for data at 9:33:00, which does not
actually exist in the columns, I want to return the last value where data was
present: in this case 9:31:00, $520, 100 shares, since the next timestamp,
9:35:09, is greater than the input value entered.
One obvious way would be iterating through each column and storing the last data;
if the new timestamp is greater than the given timestamp, return the last data
stored.
Is there an optimized way to achieve this, since the columns are already sorted?
Thanks




Re: No schema agreement from live replicas?

2015-02-03 Thread graham sanderson
What version of C* are you using? You could be seeing
https://issues.apache.org/jira/browse/CASSANDRA-7734, which I think affects
2.0.7 through 2.0.10.

 On Feb 3, 2015, at 9:47 AM, Clint Kelly clint.ke...@gmail.com wrote:
 
 FWIW, increasing the threshold for withMaxSchemaAgreementWaitSeconds to
 30 seconds was enough to fix my problem. I would still like to understand,
 however, whether the cluster has some kind of configuration problem that
 made doing so necessary.
 
 Thanks!
 
 On Tue, Feb 3, 2015 at 7:44 AM, Clint Kelly clint.ke...@gmail.com wrote:
 Hi all,
 
 I have an application that uses the Java driver to create a table and then
 immediately write to it.  I see the following warning in my logs:
 
 [10.241.17.134] out: 15/02/03 09:32:24 WARN
 com.datastax.driver.core.Cluster: No schema agreement from live replicas
 after 10 s. The schema may not be up to date on some nodes.
 
 ...this seems to happen after creating a table, and the schema not being up
 to date leads to errors when trying to write to the new tables:
 
 [10.241.17.134] out: Exception in thread main
 com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured
 columnfamily schema_hash
 
 Any suggestions on what to do about this (other than increasing
 withMaxSchemaAgreementWaitSeconds)?  This is only a three-node test
 cluster.  I have not gotten this warning before, even on much bigger
 clusters.
 
 Best regards,
 Clint





RE: Smart column searching for a particular rowKey

2015-02-03 Thread Ravi Agrawal
I cannot find anything corresponding to a WHERE clause there.

From: Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
Sent: Tuesday, February 03, 2015 2:44 PM
To: user@cassandra.apache.org
Subject: RE: Smart column searching for a particular rowKey

Thanks, it does.
How about in astyanax?

From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Tuesday, February 03, 2015 1:49 PM
To: user@cassandra.apache.org
Subject: Re: Smart column searching for a particular rowKey

WHERE  + ORDER DESC + LIMIT should be able to accomplish that.

On Tue, Feb 3, 2015 at 11:28 AM, Ravi Agrawal 
ragra...@clearpoolgroup.com wrote:
Hi Guys,
Need help with this.
My rowKey is stockName, like GOOGLE or APPLE.
Columns are sorted by timestamp and include a set of data fields
such as price and size. So the data would look like: 1. 9:31:00, $520, 100 shares;
2. 9:35:09, $530, 1000 shares; 3. 9:45:39, $520, 500 shares.
I want to search this column family by timestamp.
For a rowKey, if I search for data at 9:33:00, which does not
actually exist in the columns, I want to return the last value where data was
present: in this case 9:31:00, $520, 100 shares, since the next timestamp,
9:35:09, is greater than the input value entered.
One obvious way would be iterating through each column and storing the last data;
if the new timestamp is greater than the given timestamp, return the last data
stored.
Is there an optimized way to achieve this, since the columns are already sorted?
Thanks





RE: Smart column searching for a particular rowKey

2015-02-03 Thread Ravi Agrawal
Thanks, it does.
How about in astyanax?

From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Tuesday, February 03, 2015 1:49 PM
To: user@cassandra.apache.org
Subject: Re: Smart column searching for a particular rowKey

WHERE  + ORDER DESC + LIMIT should be able to accomplish that.

On Tue, Feb 3, 2015 at 11:28 AM, Ravi Agrawal 
ragra...@clearpoolgroup.com wrote:
Hi Guys,
Need help with this.
My rowKey is stockName, like GOOGLE or APPLE.
Columns are sorted by timestamp and include a set of data fields
such as price and size. So the data would look like: 1. 9:31:00, $520, 100 shares;
2. 9:35:09, $530, 1000 shares; 3. 9:45:39, $520, 500 shares.
I want to search this column family by timestamp.
For a rowKey, if I search for data at 9:33:00, which does not
actually exist in the columns, I want to return the last value where data was
present: in this case 9:31:00, $520, 100 shares, since the next timestamp,
9:35:09, is greater than the input value entered.
One obvious way would be iterating through each column and storing the last data;
if the new timestamp is greater than the given timestamp, return the last data
stored.
Is there an optimized way to achieve this, since the columns are already sorted?
Thanks





Re: Smart column searching for a particular rowKey

2015-02-03 Thread Eric Stevens
WHERE  + ORDER DESC + LIMIT should be able to accomplish that.

On Tue, Feb 3, 2015 at 11:28 AM, Ravi Agrawal ragra...@clearpoolgroup.com
wrote:

  Hi Guys,

 Need help with this.

 My rowKey is stockName, like GOOGLE or APPLE.

 Columns are sorted by timestamp and include a set of data
 fields such as price and size. So the data would look like: 1. 9:31:00, $520,
 100 shares; 2. 9:35:09, $530, 1000 shares; 3. 9:45:39, $520, 500 shares.

 I want to search this column family by timestamp.

 For a rowKey, if I search for data at 9:33:00, which does not
 actually exist in the columns, I want to return the last value where data was
 present: in this case 9:31:00, $520, 100 shares, since the next
 timestamp, 9:35:09, is greater than the input value entered.

 One obvious way would be iterating through each column and storing the last
 data; if the new timestamp is greater than the given timestamp, return the
 last data stored.

 Is there an optimized way to achieve this, since the columns are already
 sorted?

 Thanks







Re: Unable to create a keyspace

2015-02-03 Thread Saurabh Sethi
Thanks Carlos for pointing that out. The clock on one of the nodes was not in 
sync and fixing that solved the issue.

From: Jan cne...@yahoo.com
Reply-To: user@cassandra.apache.org, Jan cne...@yahoo.com
Date: Saturday, January 31, 2015 at 9:59 AM
To: user@cassandra.apache.org
Subject: Re: Unable to create a keyspace

Saurabh;

a)  How exactly are the three nodes hosted?
b)  Can you take down node 2 and create the keyspace from node 1?
c)  Can you take down node 1 and create the keyspace from node 2?
d)  Do the nodes see each other with 'nodetool status'?

cheers
Jan/

C* Architect


On Saturday, January 31, 2015 5:40 AM, Carlos Rolo 
r...@pythian.com wrote:


Something that can cause weird behavior is the machine clocks not being 
properly synced.  I didn't read the thread in full detail, so disregard this if 
it is not the case.


--






Re: Tombstone gc after gc grace seconds

2015-02-03 Thread Alain RODRIGUEZ
Hi, thanks for sharing your tests!

Though, how did you insert the data? Did you try adding columns in an
atomic and random order, with a small memtable size to achieve heavy
sharding (normal in a time-series use case)?

I think running ./md_test against this set of data would be interesting,
taking care to shard enough that parts of each key end up in every SSTable,
or at least in many of them.

This would measure the effectiveness of this parameter in a normal time
series workflow (which is a standard use case of C*).

By the way, this might be done on both LCS
(http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra)
and STCS
(http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/tabProp.html?scroll=tabProp__moreCompaction)
to see how both behave.

Thanks again, it is interesting to hear about tests like yours!

C*heers

Alain

2015-01-30 17:32 GMT+01:00 Ravi Agrawal ragra...@clearpoolgroup.com:

  I did a small test. I wrote data to 4 different column families, 30MB of
 data.

 256 rowkeys and 100K columns on average.

 And then I deleted all the data from all of them.



 1.   md_normal - created using default compaction parameters;
 gc_grace_seconds was 5 seconds. Data was written and then deleted. Compaction
 was run using nodetool compact keyspace columnfamily. I see full disk
 data, but cannot query columns (consistent behavior, since the data was
 deleted) and cannot query rows in cqlsh. Hits the timeout.

 2.   md_test - created with the compaction parameters
 compaction={'tombstone_threshold': '0.01', 'class':
 'SizeTieredCompactionStrategy'}; gc_grace_seconds was 5 seconds.
 Disk size is reduced, and I am able to query rows, which return 0.

 3.   md_test2 - created with the compaction parameters
 compaction={'tombstone_threshold': '0.0', 'class':
 'SizeTieredCompactionStrategy'}. Disk size is reduced; not able to query
 rows using cqlsh. Hits the timeout.

 4.   md_forcecompact - created with the compaction parameters
 compaction={'unchecked_tombstone_compaction': 'true', 'class':
 'SizeTieredCompactionStrategy'}; gc_grace_seconds was 5 seconds. Data
 was written and then deleted. I see full disk data, but cannot query any
 data using mddbreader and cannot query rows in cqlsh. Hits the timeout.



 Next day sizes were -

 30M  ./md_forcecompact

 4.0K ./md_test

 304K ./md_test2

 30M  ./md_normal



 To give a feel of the data we have: 8000 rowkeys per day, with columns added
 throughout the day; 300K columns per rowKey on average.







 *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
 *Sent:* Friday, January 30, 2015 4:26 AM

 *To:* user@cassandra.apache.org
 *Subject:* Re: Tombstone gc after gc grace seconds



 The point is that all the parts or fragments of the row need to be in
 the SSTables involved in the compaction for C* to be able to evict the row
 effectively.



 My understanding of those parameters is that they will trigger a
 compaction on any SSTable that exceeds this ratio. This will work properly
 if you never update a row (by modifying a value or adding a column). If
 your workflow is write-once per partition key, this parameter will do the
 job.



 If you have fragments, you might trigger this compaction for nothing. In
 the case of frequently updated rows (as when using wide rows / time
 series), your only way to get rid of tombstones is a major compaction.



 That's how I understand this.



 Hope this help,



 C*heers,



 Alain



 2015-01-30 1:29 GMT+01:00 Mohammed Guller moham...@glassbeam.com:

  Ravi -



 It may help.



 What version are you running? Do you know if minor compaction is getting
 triggered at all? One way to check would be to see how many sstables the
 data directory has.



 Mohammed



 *From:* Ravi Agrawal [mailto:ragra...@clearpoolgroup.com]
 *Sent:* Thursday, January 29, 2015 1:29 PM
 *To:* user@cassandra.apache.org
 *Subject:* RE: Tombstone gc after gc grace seconds



 Hi,

 I saw there are 2 more interesting parameters -

 a.   tombstone_threshold - A ratio of garbage-collectable tombstones to
 all contained columns which, if exceeded by the SSTable, triggers
 compaction (with no other SSTables) for the purpose of purging the
 tombstones. Default value: 0.2.

 b.  unchecked_tombstone_compaction - True enables more aggressive than
 normal tombstone compactions: a single-SSTable tombstone compaction
 runs without checking the likelihood of success. Cassandra 2.0.9 and later.

 Could I use these to get what I want?

 The problem I am encountering is that even long after gc_grace_seconds I see
 no reduction in disk space until I run compaction manually. I was thinking of
 making tombstone_threshold close to 0 and setting
 unchecked_tombstone_compaction to true.

 Also, we are not running nodetool repair on a weekly basis as of now.
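A sketch of how those two options can be applied to an existing table (the keyspace and table name here are illustrative, not from the thread):

ALTER TABLE mykeyspace.md_data
WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': '0.01',
    'unchecked_tombstone_compaction': 'true'
};

Note that tombstones only become purgeable after gc_grace_seconds has elapsed, and a very short grace period is safe only when repair is not being relied on to propagate deletes.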



 *From:* Eric Stevens [mailto:migh...@gmail.com migh...@gmail.com]
 *Sent:* Monday, January 26, 2015 12:11 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Tombstone