Re: Unbalanced ring in Cassandra 0.8.4

2012-06-19 Thread Nick Bailey
No. Cleanup will scan each sstable and remove data that is no longer
owned by that specific node. It won't compact the sstables together,
however.
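As a toy model of the difference — plain Python with made-up data, not Cassandra's implementation — cleanup rewrites each sstable independently, dropping rows whose tokens the node no longer owns, while compaction merges sstables into one:

```python
# "sstables" are modeled as sorted lists of (token, value) pairs.
sstables = [[(5, "a"), (80, "b")], [(10, "c"), (95, "d")]]

def cleanup(sstables, owned_tokens):
    # Rewrite each sstable separately, keeping only rows this node owns;
    # the number of sstables is unchanged.
    return [[(t, v) for (t, v) in s if t in owned_tokens] for s in sstables]

def compact(sstables):
    # Merge every sstable into a single new one.
    return [sorted(row for s in sstables for row in s)]

print(cleanup(sstables, range(0, 50)))  # still two sstables, unowned rows gone
print(compact(sstables))                # one merged sstable
```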

On Tue, Jun 19, 2012 at 11:11 PM, Raj N  wrote:
> But won't that also run a major compaction, which is not recommended anymore?
>
> -Raj
>
>
> On Sun, Jun 17, 2012 at 11:58 PM, aaron morton 
> wrote:
>>
>> Assuming you have been running repair, it can't hurt.
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 17/06/2012, at 4:06 AM, Raj N wrote:
>>
>> Nick, do you think I should still run cleanup on the first node.
>>
>> -Rajesh
>>
>> On Fri, Jun 15, 2012 at 3:47 PM, Raj N  wrote:
>>>
>>> I did run nodetool move. But that was when I was setting up the cluster
>>> which means I didn't have any data at that time.
>>>
>>> -Raj
>>>
>>>
>>> On Fri, Jun 15, 2012 at 1:29 PM, Nick Bailey  wrote:

 Did you start all your nodes at the correct tokens or did you balance
 by moving them? Moving nodes around won't delete unneeded data after
 the move is done.

 Try running 'nodetool cleanup' on all of your nodes.

 On Fri, Jun 15, 2012 at 12:24 PM, Raj N  wrote:
 > Actually I am not worried about the percentage. It's the data I am
 > concerned
 > about. Look at the first node. It has 102.07 GB of data. And the other
 > nodes
 > have around 60 GB (one has 69, but let's ignore that one). I am not
 > understanding why the first node has almost double the data.
 >
 > Thanks
 > -Raj
 >
 >
 > On Fri, Jun 15, 2012 at 11:06 AM, Nick Bailey 
 > wrote:
 >>
 >> This is just a known problem with the nodetool output and multiple
 >> DCs. Your configuration is correct. The problem with nodetool is
 >> fixed
 >> in 1.1.1
 >>
 >> https://issues.apache.org/jira/browse/CASSANDRA-3412
 >>
 >> On Fri, Jun 15, 2012 at 9:59 AM, Raj N 
 >> wrote:
 >> > Hi experts,
 >> >     I have a 6 node cluster across 2 DCs(DC1:3, DC2:3). I have
 >> > assigned
 >> > tokens using the first strategy(adding 1) mentioned here -
 >> >
 >> > http://wiki.apache.org/cassandra/Operations?#Token_selection
 >> >
 >> > But when I run nodetool ring on my cluster, this is the result I
 >> > get -
 >> >
 >> > Address         DC  Rack  Status State   Load        Owns    Token
 >> >
 >> >  113427455640312814857969558651062452225
 >> > 172.17.72.91    DC1 RAC13 Up     Normal  102.07 GB   33.33%  0
 >> > 45.10.80.144    DC2 RAC5  Up     Normal  59.1 GB     0.00%   1
 >> > 172.17.72.93    DC1 RAC18 Up     Normal  59.57 GB    33.33%
 >> >  56713727820156407428984779325531226112
 >> > 45.10.80.146    DC2 RAC7  Up     Normal  59.64 GB    0.00%
 >> > 56713727820156407428984779325531226113
 >> > 172.17.72.95    DC1 RAC19 Up     Normal  69.58 GB    33.33%
 >> >  113427455640312814857969558651062452224
 >> > 45.10.80.148    DC2 RAC9  Up     Normal  59.31 GB    0.00%
 >> > 113427455640312814857969558651062452225
 >> >
 >> >
 >> > As you can see the first node has considerably more load than the
 >> > others(almost double) which is surprising since all these are
 >> > replicas
 >> > of
 >> > each other. I am running Cassandra 0.8.4. Is there an explanation
 >> > for
 >> > this
 >> > behaviour?
 >> > Could https://issues.apache.org/jira/browse/CASSANDRA-2433 be
 >> > the
 >> > cause for this?
 >> >
 >> > Thanks
 >> > -Raj
 >
 >
>>>
>>>
>>
>>
>
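For reference, the token layout discussed above — evenly spaced tokens in each DC, with the second DC offset by one — can be computed as below. This is a sketch in plain Python (the helper name is made up), and older token-generator scripts used floating-point arithmetic, so their exact values can differ slightly from these:

```python
RING = 2 ** 127  # RandomPartitioner token space

def tokens_for_dc(nodes_in_dc, dc_offset):
    # Evenly spaced tokens; each DC gets its own small offset so that
    # no two nodes in the cluster share a token.
    return [i * RING // nodes_in_dc + dc_offset for i in range(nodes_in_dc)]

dc1 = tokens_for_dc(3, 0)
dc2 = tokens_for_dc(3, 1)  # each DC1 token plus one, as in the ring above
```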




Re: GCInspector works every 10 seconds!

2012-06-19 Thread Rob Coli
On Mon, Jun 18, 2012 at 12:07 AM, Jason Tang  wrote:
> After I enabled the key cache and row cache, the problem is gone. I guess
> it's because we have lots of data in SSTables, and it takes more time,
> memory and CPU to search the data.

The Key Cache is usually a win if added like this. The Row cache is
less likely to be. If I were you I would check your row cache hit
rates to make sure you are actually getting a win. :)

=Rob

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Snapshot failing on JSON files in 1.1.0

2012-06-19 Thread Rob Coli
On Tue, Jun 19, 2012 at 8:55 PM, Rob Coli  wrote:
> On Tue, Jun 19, 2012 at 2:55 AM, Alain RODRIGUEZ  wrote:
>> Unable to create hard link from
>> /raid0/cassandra/data/cassa_teads/stats_product-hc-233-Data.db to
>> /raid0/cassandra/data/cassa_teads/snapshots/1340099026781/stats_product-hc-233-Data.db
>
> Are you able to create this hard link via the filesystem? I am conjecturing 
> not.

FWIW, the errno given by the OS and passed through Java is "1":

http://freespace.sourceforge.net/errno/linux.html
"
 1  EPERM   Operation not permitted
"
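For a quick check without the lookup table, Python's standard errno module carries the same constants:

```python
import errno
import os

# errno 1 is EPERM ("Operation not permitted"); a hard link attempted
# across filesystems would instead fail with EXDEV (18) on Linux.
print(errno.EPERM, os.strerror(errno.EPERM))
print(errno.EXDEV, os.strerror(errno.EXDEV))
```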

=Rob

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Snapshot failing on JSON files in 1.1.0

2012-06-19 Thread Rob Coli
On Tue, Jun 19, 2012 at 2:55 AM, Alain RODRIGUEZ  wrote:
> Unable to create hard link from
> /raid0/cassandra/data/cassa_teads/stats_product-hc-233-Data.db to
> /raid0/cassandra/data/cassa_teads/snapshots/1340099026781/stats_product-hc-233-Data.db

Are you able to create this hard link via the filesystem? I am conjecturing not.

Is "snapshots" perhaps on a different mountpoint than the directory
you are trying to snapshot via hard links?

=Rob
PS - boy, 9 emails in the thread.. full of log output, sure don't miss
them not being bottom-quoted to every email... :)

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: cassandra secondary index with

2012-06-19 Thread Yuhan Zhang
Hi Jonathan, thanks for the reference. will read up on it.

Yuhan


Re: cassandra secondary index with

2012-06-19 Thread Jonathan Ellis
Because this would get you *worse* performance than just doing a seq scan.

Details as to why are here:
http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

On Tue, Jun 19, 2012 at 2:48 PM, Yuhan Zhang  wrote:
> To answer my own question:
>
> There should be at least one "equal" expression in the indexed query to
> combine with a "gte", so I just added a trivial column that stays constant
> for equal comparison, and it works.
>
> I'm not sure why this requirement exists.
>
> Thank you.
>
> Yuhan
>
>
> On Tue, Jun 19, 2012 at 12:23 PM, Yuhan Zhang  wrote:
>>
>> Hi all,
>>
>> I'm trying to search by the secondary index of cassandra with "greater
>> than or equal", but reached an exception stating:
>> me.prettyprint.hector.api.exceptions.HInvalidRequestException:
>> InvalidRequestException(why:No indexed columns present in index clause with
>> operator EQ)
>>
>> However, the same column family with the same column works when the search
>> expression is an "equal". I'm using the Hector Java client.
>> The secondary index type has been set to: {column_name: sport,
>> validation_class: DoubleType, index_type: KEYS}
>>
>> here's the code reaching the exception:
>>
>> public QueryResult<OrderedRows<String, String, Double>>
>> getIndexedSlicesGTE(String columnFamily, String columnName, double value,
>>                     String... columns) {
>>     Keyspace keyspace = getKeyspace();
>>     StringSerializer se = CassandraStorage.getStringExtractor();
>>
>>     IndexedSlicesQuery<String, String, Double> indexedSlicesQuery =
>>         createIndexedSlicesQuery(keyspace, se, se, DoubleSerializer.get());
>>     indexedSlicesQuery.setColumnFamily(columnFamily);
>>     indexedSlicesQuery.setStartKey("");
>>     if (columns != null)
>>         indexedSlicesQuery.setColumnNames(columns);
>>     else {
>>         indexedSlicesQuery.setRange("", "", true, MAX_RECORD_NUMBER);
>>     }
>>     indexedSlicesQuery.setRowCount(CassandraStorage.MAX_RECORD_NUMBER);
>>     indexedSlicesQuery.addGteExpression(columnName, value);      // this doesn't work :(
>>     //indexedSlicesQuery.addEqualsExpression(columnName, value); // this works!
>>     QueryResult<OrderedRows<String, String, Double>> result =
>>         indexedSlicesQuery.execute();
>>
>>     return result;
>> }
>>
>>
>> Is there any column_meta setting that is required in order to make GTE
>> comparisons work on a secondary index?
>>
>> Thank you.
>>
>> Yuhan Zhang
>>
>>
>>
>
>
>
> --
> Yuhan Zhang
> Application Developer
> OneScreen Inc.
> yzh...@onescreen.com
> www.onescreen.com
>
> The information contained in this e-mail is for the exclusive use of the
> intended recipient(s) and may be confidential, proprietary, and/or legally
> privileged. Inadvertent disclosure of this message does not constitute a
> waiver of any privilege.  If you receive this message in error, please do
> not directly or indirectly print, copy, retransmit, disseminate, or
> otherwise use the information. In addition, please delete this e-mail and
> all copies and notify the sender.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Row caching in Cassandra 1.1 by column family

2012-06-19 Thread Jonathan Ellis
rows_cached is actually obsolete in 1.1.  New hotness explained here:
http://www.datastax.com/dev/blog/caching-in-cassandra-1-1
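As a sketch of the new per-CF setting (the CF name is made up; see the post above for the authoritative syntax): if I read it right, caching in 1.1 is controlled by a single `caching` attribute with values `all`, `keys_only`, `rows_only`, or `none`, e.g. in cassandra-cli:

```
update column family MyCF with caching = 'rows_only';
```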

On Mon, Jun 18, 2012 at 7:43 PM, Chris Burroughs
 wrote:
> Check out the "rows_cached" CF attribute.
>
> On 06/18/2012 06:01 PM, Oleg Dulin wrote:
>> Dear distinguished colleagues:
>>
>> I don't want all of my CFs cached, but one in particular I do.
>>
>> How can I configure that ?
>>
>> Thanks,
>> Oleg
>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Rules for Major Compaction

2012-06-19 Thread Jonathan Ellis
On Tue, Jun 19, 2012 at 2:30 PM, Edward Capriolo  wrote:
> Your final two sentences are good ground rules. In our case we have
> some column families that have high churn, for example a gc_grace
> period of 4 days but the data is re-written completely every day.
> Write activity over time will eventually cause tombstone removal but
> we can expedite the process by forcing a major at night. Because the
> tables are not really growing the **warning** below does not apply.

Note that Cassandra 1.2 will automatically compact sstables that have
more than a configurable amount of expired data (default 20%).  So you
won't have to force a major for this use case anymore.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Rules for Major Compaction

2012-06-19 Thread Raj N
Thanks Ed. I am on 0.8.4, so I don't have the Leveled option, only SizeTiered.
I have a strange problem. I have a 6 node cluster (DC1=3, DC2=3). One of the
nodes has 105 GB of data whereas every other node has 60 GB, in spite of each
one being a replica of the other. And I am contemplating whether I should
be running compact/cleanup on the node with 105 GB. Btw, side question: does
it make sense to run it for just 1 node, or is it advisable to run it for
all? This node has also been giving me some issues lately. Last night during
some heavy load, I got a lot of TimedOutExceptions from this node. The node
was also flapping. I could see in the logs that it could see the peers dying
and coming back up, ultimately throwing UnavailableException (and sometimes
TimedOutException) on my requests. I use JNA mlockAll, so the JVM is
definitely not swapping. I see a full GC running (according to GCInspector)
for 15 seconds around the same time. But even after the GC, requests were
timing out. Cassandra runs with Xmx8G, Xmn800M. Total RAM on the machine is
62 GB. I don't use any meaningful key cache or row cache and rely on the OS
file cache. Top shows VIRT as 116G (which makes sense since I have 105 GB of
data). Have you seen any issues with data this size on a node?

-Raj

On Tue, Jun 19, 2012 at 3:30 PM, Edward Capriolo wrote:

> Hey my favorite question! It is a loaded question and it depends on
> your workload. The answer has evolved over time.
>
> In the old days <0.6.5 the only way to remove tombstones was major
> compaction. This is not true in any modern version.
>
> (Also in the old days you had to run cleanup to clear hints)
>
> Cassandra now has two compaction strategies SizeTiered and Leveled.
> Leveled DB can not be manually compacted.
>
>
> Your final two sentences are good ground rules. In our case we have
> some column families that have high churn, for example a gc_grace
> period of 4 days but the data is re-written completely every day.
> Write activity over time will eventually cause tombstone removal but
> we can expedite the process by forcing a major at night. Because the
> tables are not really growing the **warning** below does not apply.
>
> **Warning** this creates one large sstable. Which is not always
> desirable, because it fiddles with the heuristics of SizeTiered
> (having one big table and other smaller ones).
>
> The updated answer is "You probably do not want to run major
> compactions, but some use cases could see some benefits"
>
> On Tue, Jun 19, 2012 at 10:51 AM, Raj N  wrote:
> > DataStax recommends not to run major compactions. Edward Capriolo's
> > Cassandra High Performance book suggests that major compaction is a good
> > thing. And should be run on a regular basis. Are there any ground rules
> > about running major compactions? For example, if you have write-once
> kind of
> > data that is never updated  then it probably makes sense to not run major
> > compaction. But if you have data which can be deleted or overwritten
> does it
> > make sense to run major compaction on a regular basis?
> >
> > Thanks
> > -Raj
>


Re: release of cassandra-unit 1.1.0.1

2012-06-19 Thread Yuhan Zhang
Hi Jeremy,

Glad to see the update. It would be nice if secondary indexes in
cassandra-unit supported DoubleType.

Yuhan

On Wed, Jun 13, 2012 at 1:32 PM, Jérémy SEVELLEC wrote:

> Hi all,
>
> cassandra-unit 1.1.0.1 is now released. cassandra-unit helps you write
> isolated JUnit tests using Cassandra (starting an embedded Cassandra
> instance, loading data from a dataset, ...) in a Test Driven Development
> style or not :-).
>
> The artifact is published on the public maven repo.
>
> Main new features are:
> - update to Hector 1.1-0 and cassandra-all 1.1.1
> - allow setting more options on column families
>
> You can see all the detailed content of this release here :
> https://github.com/jsevellec/cassandra-unit/wiki/changelog
>
> cassandra-unit documentation :
> https://github.com/jsevellec/cassandra-unit/wiki
> cassandra-unit examples :
> https://github.com/jsevellec/cassandra-unit-examples
>
> This can perhaps help...
>
> Regards,
>
> --
> Jérémy
>



-- 
Yuhan Zhang
Application Developer
OneScreen Inc.
yzh...@onescreen.com 
www.onescreen.com



Re: cassandra secondary index with

2012-06-19 Thread Yuhan Zhang
To answer my own question:

There should be at least one "equal" expression in the indexed query to
combine with a "gte", so I just added a trivial column that stays constant
for equal comparison, and it works.

I'm not sure why this requirement exists.

Thank you.

Yuhan
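A toy illustration of why the planner wants at least one EQ clause — plain Python with made-up data, not Cassandra's code: the EQ-indexed column gives the query a candidate row set to read from, and the GTE clause is then applied as a filter; with only a GTE there is no index entry to start from, so the query would degenerate to scanning every row.

```python
rows = {"r1": {"dummy": 0, "sport": 1.5},
        "r2": {"dummy": 0, "sport": 3.0},
        "r3": {"dummy": 0, "sport": 2.0}}

# Secondary index on the constant "dummy" column: value -> set of row keys.
index = {}
for key, cols in rows.items():
    index.setdefault(cols["dummy"], set()).add(key)

def query_gte(eq_value, gte_col, bound):
    # Fetch candidates via the EQ index, then filter on the GTE clause.
    candidates = index.get(eq_value, set())
    return sorted(k for k in candidates if rows[k][gte_col] >= bound)

print(query_gte(0, "sport", 2.0))  # rows whose sport >= 2.0
```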

On Tue, Jun 19, 2012 at 12:23 PM, Yuhan Zhang  wrote:

> Hi all,
>
> I'm trying to search by the secondary index of cassandra with "greater
> than or equal", but reached an exception stating:
> me.prettyprint.hector.api.exceptions.HInvalidRequestException:
> InvalidRequestException(why:No indexed columns present in index clause with
> operator EQ)
>
> However, the same column family with the same column works when the search
> expression is an "equal". I'm using the Hector Java client.
> The secondary index type has been set to: {column_name: sport,
> validation_class: DoubleType, index_type: KEYS}
>
> here's the code reaching the exception:
>
> public QueryResult<OrderedRows<String, String, Double>>
> getIndexedSlicesGTE(String columnFamily, String columnName, double value,
>                     String... columns) {
>     Keyspace keyspace = getKeyspace();
>     StringSerializer se = CassandraStorage.getStringExtractor();
>
>     IndexedSlicesQuery<String, String, Double> indexedSlicesQuery =
>         createIndexedSlicesQuery(keyspace, se, se, DoubleSerializer.get());
>     indexedSlicesQuery.setColumnFamily(columnFamily);
>     indexedSlicesQuery.setStartKey("");
>     if (columns != null)
>         indexedSlicesQuery.setColumnNames(columns);
>     else {
>         indexedSlicesQuery.setRange("", "", true, MAX_RECORD_NUMBER);
>     }
>     indexedSlicesQuery.setRowCount(CassandraStorage.MAX_RECORD_NUMBER);
>     indexedSlicesQuery.addGteExpression(columnName, value);      // this doesn't work :(
>     //indexedSlicesQuery.addEqualsExpression(columnName, value); // this works!
>     QueryResult<OrderedRows<String, String, Double>> result =
>         indexedSlicesQuery.execute();
>
>     return result;
> }
>
>
> Is there any column_meta setting that is required in order to make GTE
> comparisons work on a secondary index?
>
> Thank you.
>
> Yuhan Zhang
>
>
>
>


-- 
Yuhan Zhang
Application Developer
OneScreen Inc.
yzh...@onescreen.com 
www.onescreen.com



Re: Rules for Major Compaction

2012-06-19 Thread Edward Capriolo
Hey my favorite question! It is a loaded question and it depends on
your workload. The answer has evolved over time.

In the old days <0.6.5 the only way to remove tombstones was major
compaction. This is not true in any modern version.

(Also in the old days you had to run cleanup to clear hints)

Cassandra now has two compaction strategies, SizeTiered and Leveled.
Leveled cannot be manually compacted.


Your final two sentences are good ground rules. In our case we have
some column families that have high churn, for example a gc_grace
period of 4 days but the data is re-written completely every day.
Write activity over time will eventually cause tombstone removal but
we can expedite the process by forcing a major at night. Because the
tables are not really growing the **warning** below does not apply.

**Warning**: this creates one large sstable, which is not always
desirable because it fiddles with the heuristics of SizeTiered
(one big sstable alongside other, smaller ones).

The updated answer is "You probably do not want to run major
compactions, but some use cases could see some benefits"
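The gc_grace interaction described above can be sketched in a toy model (plain Python with a made-up cell layout, not the real storage engine): during a compaction, a tombstone may only be dropped once it is older than gc_grace_seconds.

```python
GC_GRACE = 4 * 24 * 3600  # a 4-day gc_grace_seconds, as in the example above

def purge_tombstones(cells, now):
    # Cells are (name, value, write_time); value None marks a tombstone.
    # Compaction may drop a tombstone only after gc_grace has elapsed.
    return [(n, v, ts) for (n, v, ts) in cells
            if not (v is None and now - ts > GC_GRACE)]

now = 1_000_000_000
cells = [("a", "live", now - 10),            # live cell: kept
         ("b", None, now - GC_GRACE - 1),    # expired tombstone: dropped
         ("c", None, now - 60)]              # fresh tombstone: kept
print(purge_tombstones(cells, now))
```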

On Tue, Jun 19, 2012 at 10:51 AM, Raj N  wrote:
> DataStax recommends not to run major compactions. Edward Capriolo's
> Cassandra High Performance book suggests that major compaction is a good
> thing. And should be run on a regular basis. Are there any ground rules
> about running major compactions? For example, if you have write-once kind of
> data that is never updated  then it probably makes sense to not run major
> compaction. But if you have data which can be deleted or overwritten does it
> make sense to run major compaction on a regular basis?
>
> Thanks
> -Raj


cassandra secondary index with

2012-06-19 Thread Yuhan Zhang
Hi all,

I'm trying to search by the secondary index of cassandra with "greater than
or equal", but reached an exception stating:
me.prettyprint.hector.api.exceptions.HInvalidRequestException:
InvalidRequestException(why:No indexed columns present in index clause with
operator EQ)

However, the same column family with the same column works when the search
expression is an "equal". I'm using the Hector Java client.
The secondary index type has been set to: {column_name: sport,
validation_class: DoubleType, index_type: KEYS}

here's the code reaching the exception:

public QueryResult<OrderedRows<String, String, Double>>
getIndexedSlicesGTE(String columnFamily, String columnName, double value,
                    String... columns) {
    Keyspace keyspace = getKeyspace();
    StringSerializer se = CassandraStorage.getStringExtractor();

    IndexedSlicesQuery<String, String, Double> indexedSlicesQuery =
        createIndexedSlicesQuery(keyspace, se, se, DoubleSerializer.get());
    indexedSlicesQuery.setColumnFamily(columnFamily);
    indexedSlicesQuery.setStartKey("");
    if (columns != null)
        indexedSlicesQuery.setColumnNames(columns);
    else {
        indexedSlicesQuery.setRange("", "", true, MAX_RECORD_NUMBER);
    }
    indexedSlicesQuery.setRowCount(CassandraStorage.MAX_RECORD_NUMBER);
    indexedSlicesQuery.addGteExpression(columnName, value);      // this doesn't work :(
    //indexedSlicesQuery.addEqualsExpression(columnName, value); // this works!
    QueryResult<OrderedRows<String, String, Double>> result =
        indexedSlicesQuery.execute();

    return result;
}


Is there any column_meta setting that is required in order to make GTE
comparisons work on a secondary index?

Thank you.

Yuhan Zhang


Unable to update CFs with duplicate index names

2012-06-19 Thread Wenjun Che
Hello

We started using Cassandra at version 0.7, which allowed duplicate
names for indexes. We upgraded to version 0.8.10 a while ago and
everything has been working fine. Now I am not able to run 'update
column family' on a CF whose index names duplicate those of other CFs.

If I update the CF with the same index names, I get "Duplicate
index name userId". If I update with different index names or without
index names, I get "Cannot modify index name".

The only info I can find is
https://issues.apache.org/jira/browse/CASSANDRA-2903, but it does not
say anything about existing duplicate indexes.

Thanks


Rules for Major Compaction

2012-06-19 Thread Raj N
DataStax recommends not to run major compactions. Edward Capriolo's
Cassandra High Performance book suggests that major compaction is a good
thing and should be run on a regular basis. Are there any ground rules
about running major compactions? For example, if you have write-once kind
of data that is never updated, then it probably makes sense not to run
major compaction. But if you have data which can be deleted or overwritten,
does it make sense to run major compaction on a regular basis?

Thanks
-Raj


Re: Snapshot failing on JSON files in 1.1.0

2012-06-19 Thread Alain RODRIGUEZ
Hi again,

apt-get install libjna-java installed nothing; I was already up to date.

I made the symbolic link jna.jar point at jna-3.4.1.jar (downloaded at
the given link) instead of jna-3.2.4.jar.

I could restart with the 'JNA mlockall successful' message.

I am still unable to snapshot my data.

I got the following output:

Exception in thread "main" java.io.IOError: java.io.IOException:
Unable to create hard link from
/raid0/cassandra/data/cassa_teads/stats_product-hc-233-Data.db to
/raid0/cassandra/data/cassa_teads/snapshots/1340099026781/stats_product-hc-233-Data.db
(errno 1)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
at org.apache.cassandra.db.Table.snapshot(Table.java:210)
at 
org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:1710)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Unable to create hard link from
/raid0/cassandra/data/cassa_teads/stats_product-hc-233-Data.db to
/raid0/cassandra/data/cassa_teads/snapshots/1340099026781/stats_product-hc-233-Data.db
(errno 1)
at org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:158)
at 
org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:857)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1412)
... 32 more

Logs tell me this :

ERROR 09:43:46,840 Unable to create hard link
com.sun.jna.LastErrorException: [1]
at org.apache.cassandra.utils.CLibrary.link(Native Method)
at org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:145)
at 
org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:857)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1412)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
at org.apache.cassandra.db.Table.snapshot(Table.java:210)
at 
org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:1710)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)

Re: Problem with streaming with sstableloader into ubuntu node

2012-06-19 Thread aaron morton
The code is processing the file name without the path, and appears to be
correct.

Can you show the full error (including any other output) and the directory /
files you are running the bulk load against when on Windows?

Bulk load expects
keyspace/
column_family/
sstable-file

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/06/2012, at 3:13 AM, Nury Redjepow wrote:

> Okay, we investigated the problem and found its source in package
> org.apache.cassandra.io.sstable:
> 
> public class Descriptor
> 
> public static Pair<Descriptor, String> fromFilename(File directory, String name)
> {
>     // tokenize the filename
>     StringTokenizer st = new StringTokenizer(name, String.valueOf(separator));
>     String nexttok;
> 
> If the bulk loader is running from Windows and Cassandra is running under
> Ubuntu, directory is
> "KeySpaceName\\ColumnFamilyName\\KeySpaceName-ColumnFamilyName-hc-177-Data.db",
> 
> so at the next rows
> String ksname = st.nextToken();
> String cfname = st.nextToken();
> 
> ksname becomes "KeySpaceName\\ColumnFamilyName\\KeySpaceName"
> 
> 
> Sincerely, Nury.
> 
> 
> 
> 
> Mon, 18 Jun 2012 15:40:17 +1200 от aaron morton :
> Cross platform clusters are not really supported. 
> 
> That said it sounds like a bug. If you can create some steps to reproduce it 
> please create a ticket here https://issues.apache.org/jira/browse/CASSANDRA 
> it may get looked it. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/06/2012, at 12:41 AM, Nury Redjepow wrote:
> 
>> Good day, everyone
>> 
>> We are using sstableloader to bulk insert data into cassandra. 
>> 
>> Script is executed on developers machine with Windows to Single Node 
>> Cassandra. 
>> 
>> "%JAVA_HOME%\bin\java" -ea -cp %CASSANDRA_CLASSPATH% -Xmx256M 
>> -Dlog4j.configuration=log4j-tools.properties 
>> org.apache.cassandra.tools.BulkLoader -d 10.0.3.37 --debug -v 
>> "DestinationPrices/PricesByHotel" 
>> 
>> This works fine if destination cassandra is working under windows, but 
>> doesn't work with ubuntu instance. Cli is able to connect, but sstable seem 
>> to have problem with keyspace name. Logs in ubuntu instance show error 
>> messages like:
>> 
>> ERROR [Thread-41] 2012-06-15 16:05:47,620 AbstractCassandraDaemon.java (line 
>> 134) Exception in thread Thread[Thread-41,5,main]
>> java.lang.AssertionError: Unknown keyspace 
>> DestinationPrices\PricesByHotel\DestinationPrices
>> 
>> 
>> In our schema we have keyspace DestinationPrices, and column family 
>> PricesByHotel. Somehow it's not accepted properly.
>> 
>> So my question is, how should I specify keyspace name in command, to make it 
>> work correctly with Ubuntu?
>> 
> 
> 



Re: Change of behaviour in multiget_slice query for unknown keys between 0.7 and 1.1?

2012-06-19 Thread aaron morton
Nothing has changed in the server; try the Hector user group. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/06/2012, at 12:02 PM, Edward Sargisson wrote:

> Hi all,
> Was there a change of behaviour in multiget_slice query in Cassandra or 
> Hector between 0.7 and 1.1 when dealing with a key that doesn't exist?
> 
> We've just upgraded and our in memory unit test is failing (although just on 
> my machine). The test code is looking for a key that doesn't exist and 
> expects to get null. Instead it gets a ColumnSlice with a single column 
> called val. If there were something there then we'd expect columns with names 
> like bytes, int or string. Other rows in the column family have those columns 
> as well as val.
> 
> Is there a reason for this behaviour?
> I'd like to see if there was an explanation before I change the unit test for 
> it.
> 
> Many thanks in advance,
> Edward
> 
> -- 
> Edward Sargisson
> senior java developer
> Global Relay
> 
> edward.sargis...@globalrelay.net
> 
> 
> 866.484.6630 
> New York | Chicago | Vancouver  |  London  (+44.0800.032.9829)  |  Singapore  
> (+65.3158.1301)
> 
> Global Relay Archive supports email, instant messaging, BlackBerry, 
> Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter, Facebook 
> and more.   
> 
> Ask about Global Relay Message — The Future of Collaboration in the Financial 
> Services World
> 
> All email sent to or from this address will be retained by Global Relay’s 
> email archiving system. This message is intended only for the use of the 
> individual or entity to which it is addressed, and may contain information 
> that is privileged, confidential, and exempt from disclosure under applicable 
> law.  Global Relay will not be liable for any compliance or technical 
> information provided herein.  All trademarks are the property of their 
> respective owners.