Re: trouble with deleted counter columns

2011-11-30 Thread Sylvain Lebresne
On Wed, Nov 30, 2011 at 8:36 AM, Thorsten von Eicken t...@rightscale.com 
wrote:
 Running a single 1.0.3 node and using counter columns I have a problem.
 I have rows with ~200k counters. I deleted a number of such rows and now
 I can't put counters back in, or really, I can't query what I put back in.

The reason is explained at
http://wiki.apache.org/cassandra/Counters#Technical_limitations,
though it wasn't clear that it was taking your situation into account
(I've just updated it). To rephrase: counter removal is only
supported if it is definitive. You cannot increment after a deletion;
or rather, if you do, the behavior is undefined. This holds for row
deletions too: if you delete a row, you can't increment any counter
that was in it previously. (In truth, if you wait long enough it will
work, but how long is enough depends on things like when compaction
happens and what your gc_grace value is.)

Note that I understand this could be a problem for your use case,
but it is an unfortunate limitation of the current design.
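For illustration, on a single-node setup one way to shorten that window is to
lower gc_grace on the counter CF. A hedged cassandra-cli sketch (the value is
arbitrary, not a recommendation):

  update column family req_word_freq with gc_grace = 3600;

In a multi-node cluster this would be riskier, since gc_grace also protects
against deletes being resurrected by a node that missed the tombstone.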

 Example using the cli:
 [default@rslog_production] get req_word_freq['2024'];
 Returned 0 results.
 Elapsed time: 2089 msec(s).
 [default@rslog_production] incr req_word_freq['2024']['test'];
 Value incremented.
 [default@rslog_production] get req_word_freq['2024'];
 Returned 0 results.
 Elapsed time: 2018 msec(s).

 Note how long it's taking, presumably because it's going through 200K+
 tombstones?

That is likely the reason, yes.


 Here's the same using a fresh row key, note the timings:
 [default@rslog_production] get req_word_freq['test'];
 Returned 0 results.
 Elapsed time: 1 msec(s).
 [default@rslog_production] incr req_word_freq['test']['test'];
 Value incremented.
 [default@rslog_production] get req_word_freq['test'];
 => (counter=test, value=1)
 Returned 1 results.
 Elapsed time: 6 msec(s).

 Incidentally, I then tried out deleting the column and I don't
 understand why the value is 2 at the end:
 [default@rslog_production] del req_word_freq['test'];
 row removed.
 [default@rslog_production] get req_word_freq['test'];
 Returned 0 results.
 Elapsed time: 1 msec(s).
 [default@rslog_production] incr req_word_freq['test']['test'];
 Value incremented.
 [default@rslog_production] get req_word_freq['test'];
 => (counter=test, value=2)
 Returned 1 results.
 Elapsed time: 1 msec(s).

 All this is on a single node system, running the cassandra-cli on the
 system itself. The CF is as follows:
 [default@rslog_production] describe req_word_freq;
    ColumnFamily: req_word_freq
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator:
 org.apache.cassandra.db.marshal.CounterColumnType
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds / keys to save : 0.0/0/all
      Row Cache Provider:
 org.apache.cassandra.cache.SerializingCacheProvider
      Key cache size / save period in seconds: 20.0/14400
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Built indexes: []
      Compaction Strategy:
 org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

 I must be missing something...
 Thorsten



Can't run cleanup

2011-11-30 Thread Viktor Jevdokimov
Cassandra version 0.8.7, after adding new nodes we can't run cleanup on any 
node.

Log reports: Cleanup cannot run before a node has joined the ring

New nodes have joined (one by one); all nodes are up and running, reading and writing...
Not sending/receiving any streams on any node for more than 12 hours.
Nodetool's info/ring/tpstats/netstats for all nodes look fine.

Restarting doesn't help.





Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer, Adform

Cleanup in a write-only environment

2011-11-30 Thread David McNelis
In my understanding, Cleanup is meant to help clear out data that has been
removed. If you have an environment where data is only ever added (the
case for the production system I'm working with), is there a point to
automating cleanup? I understand that if we were to ever purge a segment
of data from our cluster we'd certainly want to run it, or after adding a
new node and adjusting the tokens.

So I want to make sure I'm not missing something here: are there other
reasons to run cleanup regularly?

-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.*


RE: Can't run cleanup

2011-11-30 Thread Viktor Jevdokimov
Nodetool repair also doesn't start on any node; the log reports:
INFO 15:57:51,070 Starting repair command #2, repairing 0 ranges.
INFO 15:57:51,070 Repair command #2 completed successfully
Regular read repairs are working, as are reads and writes.




Best regards / Pagarbiai
Viktor Jevdokimov

[RELEASE] Apache Cassandra 1.0.5 released

2011-11-30 Thread Sylvain Lebresne
As indicated in a preceding mail (http://goo.gl/R1r1V), the release of 1.0.4
unfortunately shipped with two important regressions. The Cassandra team is
pleased to announce the release of Apache Cassandra version 1.0.5, which
fixes those two issues[1] but is otherwise identical to 1.0.4.

Cassandra 1.0.5 can be downloaded in the usual places, i.e.:

 http://cassandra.apache.org/download/

We sincerely apologize for any inconvenience caused by 1.0.4. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Have fun!

[1]: http://goo.gl/Fod0B (CHANGES.txt)
[2]: http://goo.gl/gtUvs (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


RE: [RELEASE] Apache Cassandra 1.0.5 released

2011-11-30 Thread Michael Vaknine
The files are not on the site
The requested URL /apache//cassandra/1.0.5/apache-cassandra-1.0.5-bin.tar.gz
was not found on this server.

Thanks,
Michael




Re: [RELEASE] Apache Cassandra 1.0.5 released

2011-11-30 Thread Brandon Williams
On Wed, Nov 30, 2011 at 1:29 PM, Michael Vaknine micha...@citypath.com wrote:
 The files are not on the site
 The requested URL /apache//cassandra/1.0.5/apache-cassandra-1.0.5-bin.tar.gz
 was not found on this server.

It takes the mirrors some time to sync.

-Brandon


RE: [RELEASE] Apache Cassandra 1.0.5 released

2011-11-30 Thread Michael Vaknine
Thanks,
The files are there already.




Cassandra_Jobs on Twitter

2011-11-30 Thread Jeremy Hanna
For those interested in Apache Cassandra related jobs - either hiring or in
search of - there is now a @Cassandra_Jobs account on Twitter.  You can
either send posts to that account on twitter or send them to me at this
email address with a public link to the job posting and I will tweet them.

Cheers.


RE: [RELEASE] Apache Cassandra 1.0.5 released

2011-11-30 Thread Michael Vaknine
Hi,
Upgrading from 1.0.3 to 1.0.5, I got these errors:
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 java.lang.AssertionError
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:671)
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:745)
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.ColumnFamilyStore.forceBlockingFlush(ColumnFamilyStore.java:750)
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.index.keys.KeysIndex.forceBlockingFlush(KeysIndex.java:119)
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.index.SecondaryIndexManager.flushIndexesBlocking(SecondaryIndexManager.java:258)
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:123)
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:151)
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

Is this another regression?

Thanks
Michael




data modeling question

2011-11-30 Thread Deno Vichas

hey all!

i've started my first project using cassandra and have some data model
questions.  i'm working on an app that fetches stock market data.  i
need to keep track of when i fetch a set of data for any given stock in
any sector;  here's what i think my model should look like:


fetches : {
    sector : {
        quote : {
            timeuuid : {
                symbol : ---
            }
        }
        ticks : {
            timeuuid : {
                symbol : ---
            }
        }
        fundamentals : {
            timeuuid : {
                symbol : ---
            }
        }
    }
}


is there anything less than ideal about doing it this way versus creating
a separate CF per sector?  and how do you create a Super CF inside of a
Super CF via the CLI?




thanks,
deno
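
For context on that last question: a super column family provides exactly one
extra level of nesting, so a Super CF cannot be created inside another Super
CF. A minimal cassandra-cli sketch of a single super CF keyed by sector (names
and validators are illustrative, not from the original post):

  create column family fetches
      with column_type = 'Super'
      and key_validation_class = UTF8Type
      and comparator = UTF8Type
      and subcomparator = TimeUUIDType;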




Re: data modeling question

2011-11-30 Thread David McNelis
Personally, I would create a separate column family for each basic area.
For example:

To organize my sectors and symbols I would create a column family where the
key is the sector name and the column names are the symbols for that
sector, i.e.:

sector : {
    key: sector name
    column names: symbols
    column values: null
}

Then I would have a column family for quotes, where the key is the symbol,
the column name is the timestamp, and the value is the quote:

quote : {
    key: symbol
    column names: timeuuid
    column values: quote at that time for that symbol
}

I would then use the same basic structure for your other column families,
ticks and fundamentals.  In general, people tend to stay away from super
column families when possible for several reasons, the most commonly cited
being that a super column must be deserialized in its entirety to access
any of its subcolumns.  So if you use a bunch of SCFs, you run the risk of
reading in a lot more data than is necessary to get the information you
are looking for.
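
For illustration, a minimal cassandra-cli sketch of the two column families
described above (the validators and comparators are assumptions consistent
with UTF8 keys and TimeUUID column names, not part of the original post):

  create column family sector
      with key_validation_class = UTF8Type
      and comparator = UTF8Type;

  create column family quote
      with key_validation_class = UTF8Type
      and comparator = TimeUUIDType
      and default_validation_class = UTF8Type;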



Re: Cleanup in a write-only environment

2011-11-30 Thread Nick Bailey
I believe you are misunderstanding what cleanup does. Cleanup is used
to remove data from a node that the node no longer owns. For example,
when you move a node in the ring, it changes responsibility and gets
new data, but it does not automatically delete the data it used to be
responsible for but no longer is. In this situation, you run cleanup
to delete all of that old data.

Data that has been deleted/expired will get removed automatically as
compaction runs.
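
For concreteness, a hypothetical sequence (host name and token are
placeholders) showing where cleanup fits:

  # move a node to a new token; responsibility for some ranges shifts
  nodetool -h node1.example.com move 85070591730234615865843651857942052864
  # afterwards, on each node that lost ranges, drop the data it no longer owns
  nodetool -h node1.example.com cleanup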





Re: data modeling question

2011-11-30 Thread Deno Vichas
with the quote CF below, how would one query for all keys whose newest
column has a timeuuid older than x minutes?  i need to be able to find all
symbols that have not been fetched in x minutes, by sector.  i know i can
get the list of symbols by sector from my sector CF.


thanks,
deno

On 11/30/2011 1:07 PM, David McNelis wrote:


Then I would have a column family for quotes where I have the key as 
the symbol, the column name as the timestamp, the value as the quote:


quote : {
key: symbol
column names:  timeuuid
column values:  quote at that time for that symbol
}






Re: data modeling question

2011-11-30 Thread David McNelis
You wouldn't query for all the keys that have a column name x, exactly.
Instead, for sector x you would grab your list of symbols S.  Then you
would get the last column for each of those symbols (which you do in
different ways depending on the API) and test whether that date is within
your threshold.  If not, it goes into your list of symbols to fetch.

Alternatively, you could iterate over the symbols grabbing data where the
date is between range A and B; if you get an empty set / no columns
returned, then you need to re-pull for that symbol.  Does that make sense?

Either way you end up hitting each of the individual symbols.  Maybe
someone else has a better idea of how to structure the data for that
particular use case.
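
A minimal Hector sketch of the "last column" check described above, assuming
the Quote CF from earlier in the thread (the keyspace object, CF name and
threshold parameter are illustrative placeholders, not from the original post):

import java.util.List;
import java.util.UUID;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

// Returns true if the symbol's newest quote column is older than thresholdMs,
// or the row is empty, i.e. the symbol needs a re-fetch.
static boolean needsRefetch(Keyspace keyspace, String symbol, long thresholdMs) {
    SliceQuery<String, UUID, String> query = HFactory.createSliceQuery(
            keyspace, StringSerializer.get(), UUIDSerializer.get(), StringSerializer.get());
    query.setColumnFamily("Quote");       // CF name assumed from this thread
    query.setKey(symbol);
    query.setRange(null, null, true, 1);  // reversed=true: newest TimeUUID column first
    List<HColumn<UUID, String>> cols = query.execute().get().getColumns();
    if (cols.isEmpty())
        return true;                      // never fetched
    // convert the version-1 UUID timestamp (100ns units since 1582-10-15) to epoch millis
    long lastFetchedMs = (cols.get(0).getName().timestamp() - 0x01b21dd213814000L) / 10000;
    return System.currentTimeMillis() - lastFetchedMs > thresholdMs;
}

This still reads one row per symbol, matching the per-symbol lookup described above.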



Re: [RELEASE] Apache Cassandra 1.0.5 released

2011-11-30 Thread Jonathan Ellis
I don't think so.  That code hasn't changed in a long time.  Is it reproducible?





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


can not create a column family named 'index'

2011-11-30 Thread Shu Zhang
Hi, just wondering if this is intentional:

[default@test] create column family index;
Syntax error at position 21: mismatched input 'index' expecting set null
[default@test] create column family idx;
b9aae960-1bb2-11e1--bf27a177f2f6
Waiting for schema agreement...
... schemas agree across the cluster

Thanks,
Shu


Re: Cleanup in a write-only environment

2011-11-30 Thread Edward Capriolo
Your understanding of nodetool cleanup is not correct. Cleanup is used only
after cluster balancing, like adding or removing nodes. It removes data that
no longer belongs on the node (in older versions it removed hints as well).

The real question is whether you need to run compaction. In a write-only
workload you should let cassandra do its normal compaction (in most cases).



Re: Cleanup in a write-only environment

2011-11-30 Thread David McNelis
Thanks, folks.

I think I must have read compaction, thought cleanup, and gotten muddled
from there.

David


Re: data modeling question

2011-11-30 Thread Deno Vichas

here's what i ended up with; this seems to work for me.


@Test
public void readAndWriteSettingTTL() throws Exception {
    int ttl = 2;
    String columnFamily = "Quote";
    Set<String> symbols = new HashSet<String>() {{
        add("appl");
        add("goog");
        add("ibm");
        add("csco");
    }};

    UUID timeUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();

    Mutator<String> mutator = HFactory.createMutator(_keyspace, _stringSerializer);
    for (String symbol : symbols)
        addInsertionToMutator(columnFamily, timeUUID, mutator, symbol, ttl);

    mutator.execute();

    RangeSlicesQuery<String, UUID, String> rangeSlicesQuery =
            HFactory.createRangeSlicesQuery(_keyspace, _stringSerializer,
                    _uuidSerializer, _stringSerializer);

    rangeSlicesQuery.setColumnFamily(columnFamily);
    rangeSlicesQuery.setKeys("", "");
    rangeSlicesQuery.setRange(null, null, false, 1);
    QueryResult<OrderedRows<String, UUID, String>> result = rangeSlicesQuery.execute();

    UUID uuid =
            result.get().getList().get(0).getColumnSlice().getColumns().get(0).getName();

    Assert.assertEquals("UUID should be the same", timeUUID, uuid);
    Assert.assertEquals("We should have 4 records", 4,
            result.get().getList().size());

    Thread.sleep(5000); // wait till TTL hits to make sure columns are getting flushed

    QueryResult<OrderedRows<String, UUID, String>> result2 = rangeSlicesQuery.execute();

    for (Row<String, UUID, String> row : result2.get().getList()) {
        Assert.assertEquals("We should have no records", 0,
                row.getColumnSlice().getColumns().size());
    }
}

private void addInsertionToMutator(String columnFamily, UUID columnName,
        Mutator<String> mutator, String symbol, int ttl) {
    mutator.addInsertion(symbol, columnFamily,
            HFactory.createColumn(columnName, "", ttl, _uuidSerializer, _stringSerializer));
}



read repair and column range queries

2011-11-30 Thread Thorsten von Eicken
Looking at the docs, I can't conclusively answer this question:

Suppose I make this CQL query at consistency level ONE with read repair
chance 100%:

select 'a'..'z' from cf where key = 'xyz' limit 5;

Suppose the node I connect to has the key and responds with (improvised
syntax):

['a' -> 0, 'c' -> 2, 'e' -> 4, 'g' -> 6, 'i' -> 8]

Suppose another node has a column 'b' -> 1; would this be caught by the
read repair?

The question really boils down to whether the digest query being sent is
the same as the one above, or whether it is more of the form "select 'a',
'c', 'e', 'g', 'i' from cf where key = 'xyz'" and thus only checks whether
the returned column values are in agreement.

Thanks!
Thorsten