Re: SimpleAuthenticator / SimpleAuthorization missing

2011-10-20 Thread Yi Yang
See:
https://issues.apache.org/jira/browse/CASSANDRA-2922


On Thu, Oct 20, 2011 at 4:08 AM, Pierre Chalamet pie...@chalamet.net wrote:

 Hello,

 SimpleAuthenticator & SimpleAuthorization just disappeared in release
 1.0.0...

 Will this stay like this, or is it a release bug?

 Thanks,
 - Pierre



Re: ebs or ephemeral

2011-10-10 Thread Yi Yang
Agreed. EBS volumes are not so good for Cassandra, and in previous
conversations on this mailing list people have tended to prefer ephemeral storage.

Sent from my BlackBerry® wireless device

-Original Message-
From: Sasha Dolgy sdo...@gmail.com
Date: Mon, 10 Oct 2011 10:03:26 
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Re: ebs or ephemeral

just catching the tail end of this discussion.  Aaron, in your previous
email, you said "And an explanation of why we normally avoid ephemeral."
Shouldn't this be "avoiding EBS"?  EBS was a nightmare for us in terms
of performance.

On Mon, Oct 10, 2011 at 9:23 AM, aaron morton aa...@thelastpickle.com wrote:

 6 nodes and RF3 will mean you can handle between 1 and 2 failed nodes.

 see http://thelastpickle.com/2011/06/13/Down-For-Me/
 Cheers

  -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
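
(Editorial note, not part of the original thread: the "between 1 and 2 failed
nodes" figure follows from the quorum arithmetic below.)

    quorum(RF)        = floor(RF / 2) + 1 = floor(3 / 2) + 1 = 2
    per-row tolerance = RF - quorum       = 3 - 2            = 1

At QUORUM, any single row therefore survives one dead replica; on a 6-node
cluster a second node can also be down, provided the two dead nodes do not
both hold replicas of the same row, hence "between 1 and 2".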

 On 7/10/2011, at 9:37 PM, Madalina Matei wrote:

 Hi Aaron,

 For a 6-node cluster, what RF can we use in order to support 2 failed
 nodes?
 From the article that you sent I understood: avoid EBS and use ephemeral.
 Am I missing anything?

 Thank you so much for your help,
 Madalina
 On Fri, Oct 7, 2011 at 9:15 AM, aaron morton aa...@thelastpickle.com wrote:

 DataStax has pre-built AMIs here
 http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami


 And an explanation of why we normally avoid ephemeral.

 Also, I would go with 6 nodes. You will then be able to handle up to 2
 failed nodes.

 Hope that helps.





Re: ebs or ephemeral

2011-10-07 Thread Yi Yang
Obviously ephemeral. It has higher IO availability, will not affect your
Ethernet IO performance, it is free (included in the instance price),
and redundancy is provided by Cassandra itself.

Sent from my BlackBerry® wireless device

-Original Message-
From: Madalina Matei madalinaima...@gmail.com
Date: Fri, 7 Oct 2011 09:02:06 
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: ebs or ephemeral

Hi,

 I'm looking to deploy a 5-node cluster in EC2 with RF 3 and QUORUM CL.

 Could you please advise me on EBS vs ephemeral storage?

Cheers,
Madalina



Re: Why is mutation stage increasing ??

2011-10-05 Thread Yi Yang
Well, what client are you using? And can you give a hint about your node hardware?

Sent from my BlackBerry® wireless device

-Original Message-
From: Philippe watche...@gmail.com
Date: Wed, 5 Oct 2011 10:33:21 
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Why is mutation stage increasing ??

Hello,
I have my 3-node, RF=3 cluster acting strangely. Can someone shed some light on
what is going on?
It was stuck for a couple of hours (all clients TimedOut). nodetool tpstats
showed huge increasing MutationStages (in the hundreds of thousands).
I restarted one node and it took a while to replay GBs of commitlog. I've
shut down all clients that write to the cluster and it's just weird.

All nodes are still showing huge MutationStages including the new one and
it's either increasing or stable. The pending count is stuck at 32.
Compactionstats shows no compaction on 2 nodes and dozens of Scrub
compactions (all at 100%) on the 3rd one. This is a scrub I did last week
when I encountered assertion errors.
Netstats shows no streams being exchanged at any node but each one is
expecting a few Responses.

Any ideas ?
Thanks

For example (increased to 567062 while I was writing this email)
Pool Name              Active   Pending    Completed  Blocked  All time blocked
ReadStage                   0         0  18372664517        0                 0
RequestResponseStage        0         0  10731370183        0                 0
MutationStage              32    565879    295492216        0                 0
ReadRepairStage             0         0        23654        0                 0
ReplicateOnWriteStage       0         0      7733659        0                 0
GossipStage                 0         0      3502922        0                 0
AntiEntropyStage            0         0         1631        0                 0
MigrationStage              0         0            0        0                 0
MemtablePostFlusher         0         0         5716        0                 0
StreamStage                 0         0           10        0                 0
FlushWriter                 0         0         5714        0               499
FILEUTILS-DELETE-POOL       0         0          773        0                 0
MiscStage                   0         0         1266        0                 0
FlushSorter                 0         0            0        0                 0
AntiEntropySessions         0         0           18        0                 0
InternalResponseStage       0         0            0        0                 0
HintedHandoff               0         0         1798        0                 0


Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name              Active   Pending    Completed
Commands                  n/a         0   1223769753
Responses                 n/a         4   1627481305



Re: Cassandra JVM heap size

2011-10-03 Thread Yi Yang
Someone recently talked about heap size on this mailing list and said that a bigger
heap will result in longer GC pauses; that is probably one of the
reasons not to use a larger heap.

But I really have heard of others using Cassandra with some 60 gigabytes
of heap.

Sent from my BlackBerry® wireless device

-Original Message-
From: Ramesh Natarajan rames...@gmail.com
Date: Mon, 3 Oct 2011 21:47:08 
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Cassandra JVM heap size

I was reading an article @
http://www.acunu.com/products/choosing-cassandra/ and it mentions that
Cassandra cannot benefit from more than 8GB allocated to the JVM
heap.  Is this true?  Are there Cassandra installations with larger heap
sizes? We are planning to have a cluster of 6 nodes, with each node running
with about 100 GB or so of RAM. Will this be a problem?

thanks
Ramesh

from http://www.acunu.com/products/choosing-cassandra/

Memory Ceiling

Cassandra typically cannot benefit from more than 8GB of RAM allocated to
the Java heap, imposing a hard limit on data size. Taking advantage of big
servers with lots of memory or many disks is no problem for Acunu. There's
no memory ceiling for Acunu and as a result, no data ceiling either. Need to
use larger servers? Go ahead.
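
(Editorial sketch, not from the thread: in the 0.7/0.8/1.0 line the heap is set in
conf/cassandra-env.sh; the variable names below come from that script, and the
values are only illustrative, not a recommendation.)

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="8G"     # total JVM heap; the article above argues against going much larger
    HEAP_NEWSIZE="800M"    # young generation; commonly sized at roughly 100MB per CPU core

Memory beyond the heap is not wasted: the OS page cache and mmapped SSTables
will happily use the rest of a 100 GB box.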



Re: release mmap memory through jconsole?

2011-09-30 Thread Yi Yang
There is no point in releasing that memory. The figure includes the SSTable data
you have touched; that data lives on your hard drive, so it is not
RAM you have actually used.

-Y.
--Original Message--
From: Yang
To: user@cassandra.apache.org
ReplyTo: user@cassandra.apache.org
Subject: release mmap memory through jconsole?
Sent: Oct 1, 2011 12:40 AM

I gave -Xmx50G to my Cassandra Java process, and now top shows its
virtual memory address space is 82G. Is there
a way to release that memory through JMX?

Thanks
Yang

Sent from my BlackBerry® wireless device
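
(Editorial note, an assumption beyond what the thread states: the large virtual
size comes from Cassandra mmapping its SSTables, which in the 0.7/0.8 line is
controlled by disk_access_mode in cassandra.yaml, roughly as sketched here.)

    # cassandra.yaml
    # auto            - on 64-bit JVMs, mmap both data and index files (the usual default)
    # mmap_index_only - mmap only the index files
    # standard        - plain buffered I/O; keeps the virtual size small
    disk_access_mode: standard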

Re: release mmap memory through jconsole?

2011-09-30 Thread Yi Yang
Is it? I heard that Twitter uses 60G, if I remember correctly.

--Original Message--
From: Norman Maurer
To: user@cassandra.apache.org
To: i...@iyyang.com
Subject: Re: release mmap memory through jconsole?
Sent: Oct 1, 2011 12:55 AM

I would also not use such a big heap. I think most people will tell
you that 12G-16G is the max to use.

Bye,
Norman

2011/9/30 Yi Yang i...@iyyang.com:
 There is no point in releasing that memory. The figure includes the SSTable data
 you have touched; that data lives on your hard drive, so it is not
 RAM you have actually used.

 -Y.
 --Original Message--
 From: Yang
 To: user@cassandra.apache.org
 ReplyTo: user@cassandra.apache.org
 Subject: release mmap memory through jconsole?
 Sent: Oct 1, 2011 12:40 AM

 I gave -Xmx50G to my Cassandra Java process, and now top shows its
 virtual memory address space is 82G. Is there
 a way to release that memory through JMX?

 Thanks
 Yang

 Sent from my BlackBerry(R) wireless device

Sent from my BlackBerry® wireless device

Re: Is LexicalUUID a good option for generating Ids

2011-09-29 Thread Yi Yang
I don't know if I understand correctly, but UUIDs are good unless you have a
specific read pattern.   In the latter case you can design a better
compound row key.

Yi
Sent from my BlackBerry® wireless device

-Original Message-
From: Ramesh S investt...@gmail.com
Date: Thu, 29 Sep 2011 16:26:05 
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Re: Is LexicalUUID a good option for generating Ids

Thanks Aaron.
Appreciate your valuable input/advice.

regards,
Ramesh

On Thu, Sep 29, 2011 at 4:21 PM, aaron morton aa...@thelastpickle.com wrote:

 UUIDs will be fine. LexicalUUID should be used for version 2, 3, 4 and 5
 UUIDs; TimeUUID for version 1.

 A

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
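
(Editorial sketch in plain Java, not from the thread, to illustrate the version
distinction Aaron describes: java.util.UUID.randomUUID() only produces random,
version-4 values, which pair with LexicalUUIDType; time-based version-1 values,
which TimeUUIDType requires, need a helper library such as the one bundled with
most Cassandra clients.)

    import java.util.UUID;

    public class UuidVersionDemo {
        public static void main(String[] args) {
            // Random (version 4) UUID - fine for LexicalUUID-style keys.
            UUID random = UUID.randomUUID();
            System.out.println(random + " is version " + random.version()); // prints 4
        }
    }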

 On 30/09/2011, at 5:48 AM, Ramesh S wrote:

  We have to assign Id for each item in our database. Item is linked to geo
 location and hence would need hundreds of millions of Ids.
  So is LexicalUUID a good option ?
 
  regards,
  Ramesh





Re: create super column family for

2011-09-29 Thread Yi Yang
Which version are you using? As I remember, 0.8.3 could not do it correctly, but
later versions fixed the bug.


Sent from my BlackBerry® wireless device

-Original Message-
From: Ramesh S investt...@gmail.com
Date: Thu, 29 Sep 2011 15:23:29 
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: create super column family for

I am trying to create a super column family using a CLI command,
but I am not getting it to work.

The structure is

SCFProductCategory
SuperColumnName#ProductType
RowKey#productCatId
+subProdName
+lenght
+width

I have tried many ways but I can't find the right way to get this done.
Something like this gives me the error: mismatched input 'column' expecting
Identifier
create column family ProductCategory
with column_type = 'Super'
and comparator = UTF8Type
with column family productCatId
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND column_metadata = [
{column_name: subProdName, validation_class: UTF8Type}
{column_name: lenght, validation_class: UTF8Type}
{column_name: width, validation_class: UTF8Type}
];

Appreciate any help

regards
Ramesh
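
(Editorial sketch of a statement the 0.8 CLI grammar will accept, modelled on the
super column family definitions that appear later in this archive. The nested
"with column family productCatId" clause is what triggers "mismatched input
'column'", and the metadata entries also need commas between them; the column
names are kept exactly as in the question, including 'lenght'.)

    create column family ProductCategory
        with column_type = 'Super'
        and comparator = UTF8Type
        and subcomparator = UTF8Type
        and key_validation_class = UTF8Type
        and column_metadata = [
            {column_name: subProdName, validation_class: UTF8Type},
            {column_name: lenght, validation_class: UTF8Type},
            {column_name: width, validation_class: UTF8Type}
        ];

Note that the Schema Disagreement thread later in this archive (CASSANDRA-2984)
suggests column_metadata on a super column family could itself trigger schema
disagreement in 0.8.1, so dropping the metadata block is the safer variant there.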



Re: How can I patch a single issue

2011-08-23 Thread Yi Yang
Thanks Jonathan, and thanks Peter.

How do you guys use the mailing list? I'm using a mail client and this e-mail didn't
get grouped into its thread until I found it today...

On Aug 19, 2011, at 12:27 PM, Jonathan Ellis wrote:

 I think this is what you want:
 https://github.com/stuhood/cassandra/tree/file-format-and-promotion
 
 On Fri, Aug 19, 2011 at 1:28 PM, Peter Schuller
 peter.schul...@infidyne.com wrote:
 https://issues.apache.org/jira/browse/CASSANDRA-674
 But when I downloaded the patch file I can't find the correct trunk to
 patch...
 
 Check it out from git (or svn) and apply to trunk. I'm not sure
 whether it still applies cleanly; given the size of the patch I
 wouldn't be surprised if some rebasing is necessary. You might try a
 trunk from further back in time (around the time Stu submitted the
 patch).
 
 I'm not quite sure what your actual problem is though, if it's
 source code access then the easiest route is probably to check it out
 from https://github.com/apache/cassandra
 
 --
 / Peter Schuller (@scode on twitter)
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
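
(Editorial sketch of the "check it out from git and apply to trunk" step; the
commands are standard git/patch usage and the patch file name is made up, not
something given in the thread.)

    git clone https://github.com/apache/cassandra.git
    cd cassandra
    # pick a trunk revision from roughly when the patch was posted
    git checkout $(git log --before="2011-06-01" -1 --format=%H)
    # dry-run first, then apply (file name is illustrative only)
    git apply --check CASSANDRA-674-example.patch
    git apply CASSANDRA-674-example.patch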



Re: How can I patch a single issue

2011-08-23 Thread Yi Yang
@Jonathan:
I patched CASSANDRA-2530 onto this version and tested it for our
financial-related case.   It really improved disk consumption a lot, using only 20% of
the original space for the financing-related data storage.   The performance is better
than MySQL, and it consumes only about 1x more space than MySQL, much better than
previous versions.

On Aug 19, 2011, at 12:27 PM, Jonathan Ellis wrote:

 I think this is what you want:
 https://github.com/stuhood/cassandra/tree/file-format-and-promotion
 
 On Fri, Aug 19, 2011 at 1:28 PM, Peter Schuller
 peter.schul...@infidyne.com wrote:
 https://issues.apache.org/jira/browse/CASSANDRA-674
 But when I downloaded the patch file I can't find the correct trunk to
 patch...
 
 Check it out from git (or svn) and apply to trunk. I'm not sure
 whether it still applies cleanly; given the size of the patch I
 wouldn't be surprised if some rebasing is necessary. You might try a
 trunk from further back in time (around the time Stu submitted the
 patch).
 
 I'm not quite sure what your actual problem is though, if it's
 source code access then the easiest route is probably to check it out
 from https://github.com/apache/cassandra
 
 --
 / Peter Schuller (@scode on twitter)
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



How can I patch a single issue

2011-08-18 Thread Yi Yang
Hi

I'm trying to test a single issue:
https://issues.apache.org/jira/browse/CASSANDRA-674

But when I downloaded the patch file I can't find the correct trunk to patch...

Can anyone help me with it?  Thanks

Steve


Re: Cassandra for numerical data set

2011-08-16 Thread Yi Yang

Thanks Aaron.

 2)
 I'm doing batch writes to the database (pulling data from multiple sources
 and putting them together).   I wish to know if there are better methods to
 improve write efficiency, since it's just about the same speed as
 MySQL when writing sequentially.   It seems the commitlog requires a huge
 amount of disk IO compared with what my test machine can afford.
 Have a look at http://www.datastax.com/dev/blog/bulk-loading
This is a great tool for me.   I'll try it, since it will require much
lower bandwidth cost and disk IO.
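
(Editorial note: the tool behind that blog post is bin/sstableloader; as I
understand the 0.8-era usage it is pointed at a directory named after the target
keyspace that contains the generated SSTables. The path below is made up.)

    # stream pre-built SSTables into the ring, bypassing the commitlog
    bin/sstableloader /tmp/bulkload/MyKeyspace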

 
 3)
 In my case, each row is read randomly with equal probability.   I have around
 0.5M rows in total.   Can you provide some practical advice on optimizing
 the row cache and key cache?   I can use up to 8 gig of memory on test
 machines.
 Is your data set small enough to fit in memory? You may also be interested
 in the row_cache_provider setting for column families; see the CLI help for
 create column family and the IRowCacheProvider interface. You can replace the
 caching strategy if you want to.
The dataset is about 150 GB stored as CSV and an estimated 1.3 TB stored as
SSTables.   Hence I don't think it can fit into memory.   I'll try the caching
strategy, but I think it will only improve my case a little bit.
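
(Editorial sketch of the 0.8-era CLI attributes for sizing the caches; the column
family name and the numbers are made up, not a recommendation.)

    update column family MyCF
        with keys_cached = 500000
        and rows_cached = 10000;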

I'm now looking into native compression of SSTables; I just applied the
CASSANDRA-47 patch and found there is a huge performance penalty in my use case, and
I haven't figured out the reason yet.   I suppose CASSANDRA-674 will solve it
better; however, I see there are a number of tickets working on a similar issue,
including CASSANDRA-1608 etc.   Is that because Cassandra really costs a huge
amount of disk space?

Well my target is simply to get the 1.3 TB compressed down to 700 GB so that I can
fit it onto a single server, while keeping the same level of performance.
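
(Editorial note, beyond what the thread states: the CASSANDRA-47 work eventually
shipped in 1.0 as per-column-family compression; in the 1.0 CLI it is enabled
roughly like this, with a made-up column family name.)

    update column family MyCF
        with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};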

Best,
Steve


On Aug 16, 2011, at 2:27 PM, aaron morton wrote:

 
 
 Hope that helps. 
 
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 16/08/2011, at 12:44 PM, Yi Yang wrote:
 
 Dear all,
 
 I wanna report my use case, and have a discussion with you guys.
 
 I'm currently working on my second Cassandra project.   I got into a
 somewhat unique use case: storing a traditional, relational data set in
 Cassandra. It's a dataset of int and float numbers, no strings and no
 other data, and the column names are much longer than the values themselves.
 Besides, the row key is an MD5-hashed version-3 UUID of some other data.
 
 1)
 I did some workarounds to save disk space, however it still takes
 approximately 12-15x more disk space than MySQL.   I looked into the Cassandra
 SSTable internals, did some optimizing by selecting a better data serializer
 and also hashed the column names into one byte.   That brought the current
 database down to ~6x disk-space overhead compared with MySQL, which I
 think might be acceptable.
 
 I'm currently interested in CASSANDRA-674 and will also test CASSANDRA-47
 in the coming days.   I'll keep you updated on my testing.   But I'd be glad
 to hear your ideas on saving disk space.
 
 2)
 I'm doing batch writes to the database (pulling data from multiple sources
 and putting them together).   I wish to know if there are better methods to
 improve write efficiency, since it's just about the same speed as
 MySQL when writing sequentially.   It seems the commitlog requires a huge
 amount of disk IO compared with what my test machine can afford.
 
 3)
 In my case, each row is read randomly with equal probability.   I have around
 0.5M rows in total.   Can you provide some practical advice on optimizing
 the row cache and key cache?   I can use up to 8 gig of memory on test
 machines.
 
 Thanks for your help.
 
 
 Best,
 
 Steve
 
 
 



Re: Cassandra adding 500K + Super Column Family

2011-08-16 Thread Yi Yang
Sounds like a similar case to mine.   The files are definitely extremely
big; a 10x space overhead would be a good case if you are just putting values
into it.

I'm currently testing CASSANDRA-674 and hope the better SSTable format can solve the
space overhead problem.   Please follow my e-mails; I'll continue to
work on it today.

If your values are integers and floats, with column names of ~4
characters, then extrapolating from my case it will cost you 1~2TB of disk space.

Best,
Steve

On Aug 16, 2011, at 4:20 PM, aaron morton wrote:

 Are you planning to create 500,000 Super Column Families or 500,000 rows in a 
 single Super Column Family ? 
 
 The former is somewhat crazy. Cassandra schemas typically have up to a few 
 tens of Column Families. Each column family involves a certain amount of 
 memory overhead, this is now automatically managed in Cassandra 0.8 (see 
 http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/)
 
 if I understand correctly you have 500K entities with 6k columns each. A 
 simple first approach to modelling this would be to use a Standard CF with a 
 row for each entity. However the best model is the one that serves your read 
 requests best. 
 
 Also, for background, the sub-columns in a super column are not indexed; see
 http://wiki.apache.org/cassandra/CassandraLimitations . You would probably
 run into this problem if you had 6000 sub columns in a super column.
 
 Hope that helps. 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
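
(Editorial sketch of the "Standard CF with a row per entity" model Aaron
suggests; all names here are made up.)

    create column family Entity
        with comparator = UTF8Type
        and key_validation_class = UTF8Type
        and default_validation_class = UTF8Type;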
 
 On 17/08/2011, at 12:53 AM, Renato Bacelar da Silveira wrote:
 
 I am wondering about a certain volume situation.
 
 I currently load a Keyspace with a certain amount of SCFs.
 
 Each SCF (Super Column Family) represents an entity.
 
 Each Entity may have up to 6000 values.
 
 I am planning to have 500,000 Entities (SCF) with
 6000 Columns (within Super Columns - number of Super Columns
 unknown), and was wondering how much resources something
 like this would require?
 
 I am struggling to have 10,000 SCF with 30 Columns (within SuperColumns),
 I get very large files, and reach a 4Gb heapspace limit very quickly on
 a single node. I use Garbage Collection where needed.
 
 Is there some secret to load 500,000 Super Column Families?
 
 Regards.
 -- 
 Renato da Silveira
 Senior Developer
 



Re: Cassandra for numerical data set

2011-08-16 Thread Yi Yang
BTW,
If I'm going to insert a SCF row with ~400 columns and ~50 subcolumns under 
each column, how often should I do a mutation? per column or per row?


On Aug 16, 2011, at 3:24 PM, Yi Yang wrote:

 
 Thanks Aaron.
 
 2)
 I'm doing batch writes to the database (pulling data from multiple
 sources and putting them together).   I wish to know if there are better
 methods to improve write efficiency, since it's just about the same
 speed as MySQL when writing sequentially.   It seems the commitlog
 requires a huge amount of disk IO compared with what my test machine can afford.
 Have a look at http://www.datastax.com/dev/blog/bulk-loading
 This is a great tool for me.   I'll try it, since it will require
 much lower bandwidth cost and disk IO.
 
 
 3)
 In my case, each row is read randomly with equal probability.   I have around
 0.5M rows in total.   Can you provide some practical advice on optimizing
 the row cache and key cache?   I can use up to 8 gig of memory on test
 machines.
 Is your data set small enough to fit in memory? You may also be
 interested in the row_cache_provider setting for column families; see the
 CLI help for create column family and the IRowCacheProvider interface. You
 can replace the caching strategy if you want to.
 The dataset is about 150 GB stored as CSV and an estimated 1.3 TB stored as
 SSTables.   Hence I don't think it can fit into memory.   I'll try the
 caching strategy, but I think it will only improve my case a little bit.
 
 I'm now looking into native compression of SSTables; I just applied the
 CASSANDRA-47 patch and found there is a huge performance penalty in my use case,
 and I haven't figured out the reason yet.   I suppose CASSANDRA-674 will
 solve it better; however, I see there are a number of tickets working on a
 similar issue, including CASSANDRA-1608 etc.   Is that because Cassandra
 really costs a huge amount of disk space?
 
 Well my target is simply to get the 1.3 TB compressed down to 700 GB so that I can
 fit it onto a single server, while keeping the same level of performance.
 
 Best,
 Steve
 
 
 On Aug 16, 2011, at 2:27 PM, aaron morton wrote:
 
 
 
 Hope that helps. 
 
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 16/08/2011, at 12:44 PM, Yi Yang wrote:
 
 Dear all,
 
 I wanna report my use case, and have a discussion with you guys.
 
 I'm currently working on my second Cassandra project.   I got into a
 somewhat unique use case: storing a traditional, relational data set in
 Cassandra. It's a dataset of int and float numbers, no strings and no
 other data, and the column names are much longer than the values themselves.
 Besides, the row key is an MD5-hashed version-3 UUID of some other data.
 
 1)
 I did some workarounds to save disk space, however it still
 takes approximately 12-15x more disk space than MySQL.   I looked into the
 Cassandra SSTable internals, did some optimizing by selecting a better data
 serializer, and also hashed the column names into one byte.   That brought the
 current database down to ~6x disk-space overhead compared with MySQL,
 which I think might be acceptable.
 
 I'm currently interested in CASSANDRA-674 and will also test CASSANDRA-47
 in the coming days.   I'll keep you updated on my testing.   But I'd be
 glad to hear your ideas on saving disk space.
 
 2)
 I'm doing batch writes to the database (pulling data from multiple
 sources and putting them together).   I wish to know if there are better
 methods to improve write efficiency, since it's just about the same
 speed as MySQL when writing sequentially.   It seems the commitlog
 requires a huge amount of disk IO compared with what my test machine can afford.
 
 3)
 In my case, each row is read randomly with equal probability.   I have around
 0.5M rows in total.   Can you provide some practical advice on optimizing
 the row cache and key cache?   I can use up to 8 gig of memory on test
 machines.
 
 Thanks for your help.
 
 
 Best,
 
 Steve
 
 
 
 



Cassandra for numerical data set

2011-08-15 Thread Yi Yang
Dear all,

I wanna report my use case, and have a discussion with you guys.

I'm currently working on my second Cassandra project.   I got into a somewhat
unique use case: storing a traditional, relational data set in Cassandra.
It's a dataset of int and float numbers, no strings and no other data, and
the column names are much longer than the values themselves.
Besides, the row key is an MD5-hashed version-3 UUID of some other data.

1)
I did some workarounds to save disk space, however it still takes
approximately 12-15x more disk space than MySQL.   I looked into the Cassandra
SSTable internals, did some optimizing by selecting a better data serializer
and also hashed the column names into one byte.   That brought the current
database down to ~6x disk-space overhead compared with MySQL, which I
think might be acceptable.

I'm currently interested in CASSANDRA-674 and will also test CASSANDRA-47 in
the coming days.   I'll keep you updated on my testing.   But I'd be glad
to hear your ideas on saving disk space.

2)
I'm doing batch writes to the database (pulling data from multiple sources
and putting them together).   I wish to know if there are better methods to
improve write efficiency, since it's just about the same speed as MySQL
when writing sequentially.   It seems the commitlog requires a huge amount of
disk IO compared with what my test machine can afford.

3)
In my case, each row is read randomly with equal probability.   I have around
0.5M rows in total.   Can you provide some practical advice on optimizing the
row cache and key cache?   I can use up to 8 gig of memory on test machines.

Thanks for your help.


Best,

Steve




Re: column metadata and sstable

2011-08-11 Thread Yi Yang
Thanks Aaron,

This is the same as I thought. But I'm wondering if there are some triggers
that can hash the column name in order to save disk space. Or do you think it
would be better to have this feature?

Best,
Steve



On Aug 6, 2011, at 7:06 PM, aaron morton wrote:

 AFAIK it just makes it easier for client API's to understand what data type 
 to use. e.g. it can give your code a long rather than a str / byte array . 
 
 Personally I'm on the fence about using it. It has some advantages to the 
 client, but given the server does not really need the information it feels a 
 little like additional coupling that's not needed . 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 6 Aug 2011, at 11:58, Yi Yang wrote:
 
 Dear all,
 
 I'm wondering what's the advantage of assigning column metadata when NOT 
 using secondary indices.
 
 I've gone through the SSTable internals and found out it won't do such a
 conversion.   Thus I think the only advantage we get via column metadata is
 a data validation type, am I correct?
 
 Thanks.
 Steve
 



column metadata and sstable

2011-08-05 Thread Yi Yang
Dear all,

I'm wondering what's the advantage of assigning column metadata when NOT using 
secondary indices.

I've gone through the SSTable internals and found out it won't do such a
conversion.   Thus I think the only advantage we get via column metadata is a
data validation type, am I correct?

Thanks.
Steve


Re: Schema Disagreement

2011-08-05 Thread Yi Yang
Thanks Aaron.
On Aug 2, 2011, at 3:04 AM, aaron morton wrote:

 Hang on, using brain now. 
 
 That is triggering a small bug in the code see 
 https://issues.apache.org/jira/browse/CASSANDRA-2984
 
 For now, just remove the column metadata. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 2 Aug 2011, at 21:19, aaron morton wrote:
 
 What do you see when you run 'describe cluster;' in the cassandra-cli? What's
 the exact error you get, and is there anything in the server-side logs?
 
 Have you added other CFs before adding this one? Did the schema agree
 before running this statement?
 
 I ran the statement below on the current trunk and it worked. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 2 Aug 2011, at 12:08, Dikang Gu wrote:
 
 I thought the schema disagreement problem was already solved in 0.8.1...
 
 One possible solution is to decommission the disagreeing node and rejoin it.
 
 
 On Tue, Aug 2, 2011 at 8:01 AM, Yi Yang yy...@me.com wrote:
 Dear all,
 
 I keep running into schema disagreement problems while trying to create
 a column family like this, using cassandra-cli:
 
 create column family sd
with column_type = 'Super'
and key_validation_class = 'UUIDType'
and comparator = 'LongType'
and subcomparator = 'UTF8Type'
and column_metadata = [
{
column_name: 'time',
validation_class : 'LongType'
},{
column_name: 'open',
validation_class : 'FloatType'
},{
column_name: 'high',
validation_class : 'FloatType'
},{
column_name: 'low',
validation_class : 'FloatType'
},{
column_name: 'close',
validation_class : 'FloatType'
},{
column_name: 'volumn',
validation_class : 'LongType'
},{
column_name: 'splitopen',
validation_class : 'FloatType'
},{
column_name: 'splithigh',
validation_class : 'FloatType'
},{
column_name: 'splitlow',
validation_class : 'FloatType'
},{
column_name: 'splitclose',
validation_class : 'FloatType'
},{
column_name: 'splitvolume',
validation_class : 'LongType'
},{
column_name: 'splitclose',
validation_class : 'FloatType'
}
]
 ;
 
 I've tried to erase everything and restart Cassandra but this still
 happens.   But when I remove the column_metadata section there is no more
 disagreement error.   Do you have any idea why this happens?
 
 Environment: 2 VMs, using the same harddrive, Cassandra 0.8.1, Ubuntu 10.04
 This is for testing only.   We'll move to dedicated servers later.
 
 Best regards,
 Yi
 
 
 
 -- 
 Dikang Gu
 
 0086 - 18611140205
 
 
 



Schema Disagreement

2011-08-01 Thread Yi Yang
Dear all,

I keep running into schema disagreement problems while trying to create a
column family like this, using cassandra-cli:

create column family sd
with column_type = 'Super' 
and key_validation_class = 'UUIDType'
and comparator = 'LongType'
and subcomparator = 'UTF8Type'
and column_metadata = [
{
column_name: 'time', 
validation_class : 'LongType'
},{
column_name: 'open', 
validation_class : 'FloatType'
},{
column_name: 'high', 
validation_class : 'FloatType'
},{
column_name: 'low', 
validation_class : 'FloatType'
},{
column_name: 'close', 
validation_class : 'FloatType'
},{
column_name: 'volumn', 
validation_class : 'LongType'
},{
column_name: 'splitopen', 
validation_class : 'FloatType'
},{
column_name: 'splithigh', 
validation_class : 'FloatType'
},{
column_name: 'splitlow', 
validation_class : 'FloatType'
},{
column_name: 'splitclose', 
validation_class : 'FloatType'
},{
column_name: 'splitvolume',
validation_class : 'LongType'
},{
column_name: 'splitclose',
validation_class : 'FloatType'
}
]
;

I've tried to erase everything and restart Cassandra but this still happens.
But when I remove the column_metadata section there is no more disagreement error.
Do you have any idea why this happens?

Environment: 2 VMs, using the same harddrive, Cassandra 0.8.1, Ubuntu 10.04
This is for testing only.   We'll move to dedicated servers later.

Best regards,
Yi


Re: [RELEASE] 0.8.0

2011-06-08 Thread Yi Yang
Is there anyone willing to upgrade libcassandra (the C++ client) to support the new
features in 0.8.0?
Or has anyone started to work on it?

Thanks


On Jun 3, 2011, at 7:36 AM, Eric Evans wrote:

 
 I am very pleased to announce the official release of Cassandra 0.8.0.
 
 If you haven't been paying attention to this release, this is your last
 chance, because by this time tomorrow all your friends are going to be
 raving, and you don't want to look silly.
 
 So why am I resorting to hyperbole?  Well, for one because this is the
 release that debuts the Cassandra Query Language (CQL).  In one fell
 swoop Cassandra has become more than NoSQL, it's MoSQL.
 
 Cassandra also has distributed counters now.  With counters, you can
 count stuff, and counting stuff rocks.
 
 A kickass use-case for Cassandra is spanning data-centers for
 fault-tolerance and locality, but doing so has always meant sending data
 in the clear, or tunneling over a VPN.   New for 0.8.0, encryption of
 intranode traffic.
 
 If you're not motivated to go upgrade your clusters right now, you're
 either not easily impressed, or you're very lazy.  If it's the latter,
 would it help knowing that rolling upgrades between releases is now
 supported?  Yeah.  You can upgrade your 0.7 cluster to 0.8 without
 shutting it down.
 
 You see what I mean?  Then go read the release notes[1] to learn about
 the full range of awesomeness, then grab a copy[2] and become a
 (fashionably) early adopter.
 
 Drivers for CQL are available in Python[3], Java[3], and Node.js[4].
 
 As usual, a Debian package is available from the project's APT
 repository[5].
 
 Enjoy!
 
 
 [1]: http://goo.gl/CrJqJ (NEWS.txt)
 [2]: http://cassandra.debian.org/download
 [3]: http://www.apache.org/dist/cassandra/drivers
 [4]: https://github.com/racker/node-cassandra-client
 [5]: http://wiki.apache.org/cassandra/DebianPackaging
 
 -- 
 Eric Evans
 eev...@rackspace.com