Re: Data Modeling- another question

2012-08-28 Thread samal
yes, you are right, it depends on the use case.
I suggested it as a better choice, not the only choice. JSON will be better if,
on any field change, you re-write the whole value without reading it first.
I tend to use JSON more where my data does not change or changes very rarely, like
storing denormalized JSON data for analytics purposes.
I prefer a CF and the [:scoped] method for frequently updated fields.
{
this.user.cart.category.p1.name:''
this.user.cart.category.p1.unit:''
this.user.cart.category.p1.desc:''
this.user.cart.category.p2.name:''
this.user.cart.category.p2.unit:''
this.user.cart.category.p2.desc:''
}
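
For example, a minimal Hector sketch of the write-only update this enables (the
CF name "UserData", keyspace "Shop" and the scoped key are my illustrative
assumptions, not from the thread):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
Keyspace ks = HFactory.createKeyspace("Shop", cluster);
Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
// Overwrite a single scoped field; no read-modify-write of a JSON blob needed.
m.insert("user1", "UserData",
    HFactory.createStringColumn("this.user.cart.category.p1.unit", "5"));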

Yes, you are right. It's really about understanding the app's data and its behavior,
not JSON vs. columns, and designing the DM accordingly.

On Tue, Aug 28, 2012 at 12:20 PM, Guy Incognito dnd1...@gmail.com wrote:

  i would respectfully disagree, what you have said is true but it really
 depends on the use case.

 1) do you expect to be doing updates to individual fields of an item, or
 will you always update all fields at once?  if you are doing separate
 updates then the first is definitely easier to handle updates.
 2) do you expect to do paging of the list?  this will be easier with the
 json approach, as in the first your item may span across a page boundary -
 not an insurmountable problem by any means, but more complicated
 nonetheless.  this is not
 an issue obviously if all your items have the same number of fields.
 3) do you expect to read or delete multiple items individually?  you may
 have to do multiple reads/deletes of a row if the items are not adjacent to
 each other as you cannot do 'disjoint' slices of columns at the moment.
 with the json approach you can just specify individual columns and you're
 done.  again this is less of an issue if items have a known set of fields,
 but your list of columns to read/delete may get quite large fairly quickly

 the first is definitely better if you want to update individual fields,
 read-then-write is not a good idea in cassandra.  but it is more
 complicated for most usage scenarios, so you have to work out if you really
 need the extra flexibility.


 On 24/08/2012 13:54, samal wrote:

 The first is the better choice; each field can be updated separately (write only).
 With the second you have to manage the JSON yourself (read first, modify, then write).

 On Fri, Aug 24, 2012 at 5:45 PM, Roshni Rajagopal 
 roshni.rajago...@wal-mart.com wrote:

 Hi,

 Suppose I have a column family to associate a user to a dynamic list of
 items. I want to store 5-10 key pieces of information about the item; there
 are no specific sorting requirements.
 I have two options

 A) use composite columns
 UserId1 : {
  itemid1:Name = Betty Crocker,
  itemid1:Descr = Cake
 itemid1:Qty = 5
  itemid2:Name = Nutella,
  itemid2:Descr = Choc spread
 itemid2:Qty = 15
 }

 B) use a json with the data
 UserId1 : {
  itemid1 = {name: Betty Crocker,descr: Cake, Qty: 5},
  itemid2 ={name: Nutella,descr: Choc spread, Qty: 15}
 }

 Which do you suggest would be better?


 Regards,
 Roshni







Re: Data Modeling- another question

2012-08-24 Thread samal
The first is the better choice; each field can be updated separately (write only).
With the second you have to manage the JSON yourself (read first, modify, then write).

On Fri, Aug 24, 2012 at 5:45 PM, Roshni Rajagopal 
roshni.rajago...@wal-mart.com wrote:

 Hi,

 Suppose I have a column family to associate a user to a dynamic list of
 items. I want to store 5-10 key pieces of information about the item; there
 are no specific sorting requirements.
 I have two options

 A) use composite columns
 UserId1 : {
  itemid1:Name = Betty Crocker,
  itemid1:Descr = Cake
 itemid1:Qty = 5
  itemid2:Name = Nutella,
  itemid2:Descr = Choc spread
 itemid2:Qty = 15
 }

 B) use a json with the data
 UserId1 : {
  itemid1 = {name: Betty Crocker,descr: Cake, Qty: 5},
  itemid2 ={name: Nutella,descr: Choc spread, Qty: 15}
 }

 Which do you suggest would be better?


 Regards,
 Roshni




Re: Effect of rangequeries with RandomPartitioner

2012-07-09 Thread samal
inline resp.

On Mon, Jul 9, 2012 at 10:18 AM, prasenjit mukherjee
prasen@gmail.com wrote:

 Thanks Aaron for your response. Some follow up
 questions/assumptions/clarifications :

 1. With RandomPartitioner, on a given node, are the keys  sorted by
 their hash_values or original/unhashed keys  ?


hash value,


 2. With RandomPartitioner, on a given node, are the columns (for a
 given key)   always sorted by their column_names ?


yes, depends on comparator.


 3. From what I understand,  token = hash(key) for a RandomPartitioner,
 and hence any key-range queries will return bogus results.


correct.


 Although I
 believe column-range-queries should succeed even in RP if they are
 always sorted by column_names.

 correct, depends on comparator.
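
As a hedged illustration, a Hector column-slice sketch (the Keyspace ks and CF
"Events" are my assumptions, not from this thread); it works under
RandomPartitioner because columns within a row are always comparator-sorted:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;

// Given a Keyspace ks built via HFactory.createKeyspace(...):
SliceQuery<String, String, String> q = HFactory.createSliceQuery(
    ks, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
q.setColumnFamily("Events").setKey("row1");
q.setRange("a", "m", false, 100); // slice by column name within one row
QueryResult<ColumnSlice<String, String>> r = q.execute();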


-Thanks,
 Prasenjit

 On Mon, Jul 9, 2012 at 12:17 AM, aaron morton aa...@thelastpickle.com
 wrote:
  for background
  http://wiki.apache.org/cassandra/FAQ#range_rp
 
  It maps the start key to a token, and then scans X rows from there on CL
  number of nodes. Rows are stored in token order.
 
  Cheers
 
  -
  Aaron Morton
  Freelance Developer
  @aaronmorton
  http://www.thelastpickle.com
 
  On 7/07/2012, at 11:52 PM, prasenjit mukherjee wrote:
 
  Wondering how a rangequery request is handled if RP is used.  Will the
  receiving node do a fan-out to all the nodes in the ring or it will
  just execute the rangequery on its own local partition ?
 
  --
  Sent from my mobile device
 
 



Re: Supercolumn behavior on writes

2012-06-13 Thread samal
 You can't 'invent' columns on the fly, everything has

 to be declared when you declare the column family.


 That's incorrect. You can define column names on the fly. Validation must be defined
when declaring the CF.


Re: Supercolumn behavior on writes

2012-06-13 Thread samal
I have just checked on the DataStax blog; CQL3 does not support it, which I was
not aware of.

But as a whole we can still do it via a client library over Thrift.
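
For instance, a hedged Hector (Thrift-based) sketch of inventing a column name
at write time (the dynamic CF name "at_event_dyn" is an illustrative assumption,
echoing Greg's 'wingy' example below):

// Given a Keyspace ks and a Mutator m as in the Hector docs:
m.insert("7690254", "at_event_dyn",
    HFactory.createStringColumn("wingy", "toto")); // "wingy" was never declared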

On Thu, Jun 14, 2012 at 9:12 AM, Dave Brosius dbros...@mebigfatguy.com wrote:

  Via thrift, or a high level client on thrift, see as an example

 http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1


 On 06/13/2012 11:08 PM, Greg Fausak wrote:

 Interesting.

 How do you do it?

 I have a version 2 CF, that works fine.
 A version 3 table won't let me invent columns that
 don't exist yet. (for composite tables).  What's the trick?

 cqlsh -3 cas1
 use onplus;
 cqlsh:onplus select * from at_event where ac_event_id = 7690254;
  ac_event_id | ac_creation           | ac_event_type | ac_id | ev_sev
 -------------+-----------------------+---------------+-------+--------
      7690254 | 2011-07-23 00:11:47+  | SERV.CPE.CONN |    \N |      5
 cqlsh:onplus update at_event set wingy = 'toto' where ac_event_id = 7690254;
 Bad Request: Unknown identifier wingy

 This is what I used to create it:
 //
 // create the event column family, this contains the static
 // part of the definition.  many additional columns can be specified
 // in the port from relational, these would be mainly the at_event table
 //

 use onplus;

 create columnfamily
 at_event
 (
 ac_event_id int PRIMARY KEY,
 ac_event_type text,
 ev_sev int,
 ac_id text,
 ac_creation timestamp
 ) with compression_parameters:sstable_compression = ''
 ;

 -g




 On Wed, Jun 13, 2012 at 9:36 PM, samal samalgo...@gmail.com wrote:

   You can't 'invent' columns on the fly, everything has

  to be declared when you declare the column family.


   That's incorrect. You can define column names on the fly. Validation must be defined
  when declaring the CF.






Re: about multitenant datamodel

2012-06-05 Thread samal
why do you think so? I'll let users create restricted CFs, and limit the
number of CFs which users create.

 is it still a bad one?

 OK, got it. You want to limit the CFs a user can create to, say, 2. But what
about 10k shared users creating 2 CFs each = 20k CFs, roughly 20GB of memory
used with no data in them. Do you think that is a good one?

 I think of your data model like S3 or shared hosting: limit keyspaces and CFs
to a fixed number.
In Cassandra the key and column name are very powerful; you can do anything you
want and design the DM any way you want.

Here is the approach I probably will take.

   - Limit the user to keys; the user cannot create/delete CFs.
   - All users will share the same CFs.
   - Give a unique signature (which MUST NOT clash) to each user, like
*username==anyothermarker::[[actual
   key name].n]*, utf8 only.
   - Each user will always prefix this signature in all CFs when inserting
   and reading data.
   - Like an S3 bucket, check the signature before creating a new one for a new user.
   - Each key for a user will be like a bucket; all columns can be bucket data.

Eg
1)
profileCF{

  *user1==123456::*profile{
 /* user1 profile*/
  } ,
  *user2==444::*profile{
 /* user2 profile*/
  } ,
}

2)
actvityCF{

  *user1==123456::*activity{
 /* user1 activity columns here*/
  } ,
  *user2==**444**::*activity{
 /* user2 activity columns here*/
  } ,
}

A marker CF will keep all unique signatures for users, so it can be
queried while creating a new one.

bucketMarkerCF{
 *user2==**444*:{
username:
 }
 *user1==2323*:{
username:
 }

}

The problem with this approach is that users may not have the liberty to define
their own data model. It is good for fixed-pattern data: logging, hits, geodata.
A small sketch of the key scheme follows.
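
A small plain-Java sketch of the signature scheme (the marker format follows the
*username==marker::key* pattern above; the concrete values are illustrative):

// Signature assigned once per user; check bucketMarkerCF for clashes first.
String signature = "user1" + "==" + "123456";
String profileKey  = signature + "::" + "profile";  // row key in profileCF
String activityKey = signature + "::" + "activity"; // row key in actvityCF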

/Samal





 On Thu, 31 May 2012 06:44:05 +0900, aaron morton aa...@thelastpickle.com
 wrote:

  - Do a lot of keyspaces cause some problems? (If I have 1,000 users,
 cassandra creates 1,000 keyspaces…)

 It's not keyspaces, but the number of column families.

 Without storing any data each CF uses about 1MB of ram. When they start
 storing and reading data they use more.

 IMHO a model that allows external users to create CF's is a bad one.

 Hope that helps.
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 25/05/2012, at 12:52 PM, Toru Inoko wrote:

  Hi, all.

 I'm designing a data api service (like cassandra.io, but not using a
 dedicated server for each user) on Cassandra 1.1, on which users can run
 DML/DDL methods like CQL.
 The following are the APIs which users can use (almost the same as the Cassandra API).
 - create/read/delete ColumnFamilies/Rows/Columns

 Now I'm thinking about multitenant datamodel on that.
 My data model like the following.
 I'm going to prepare a keyspace for each user as a user's tenant space.

 | keyspace1 | --- | column family |
 |(for user1)|  |
  ...

 | keyspace2 | --- | column family |
 |(for user2)|  |
  ...

 The following are my questions:
 - Is this data model good for multitenancy?
 - Do a lot of keyspaces cause some problems? (If I have 1,000 users,
 cassandra creates 1,000 keyspaces...)

 please, help.
 thank you in advance.

 Toru Inoko.




 --
 ---
 SCSK Corporation
 Technology, Quality & Information Group, Technology Development Department
 Advanced Technology Section

 Toru Inoko
 tel   : 03-6438-3544
 mail  : in...@ms.scsk.jp
 ---




 --
 With kind regards,

 Robin Verlangen
 *Software engineer*

 W http://www.robinverlangen.nl
 E ro...@us2.nl





Re: How to include two nodes in Java code using Hector

2012-06-05 Thread samal
I don't use hector, don't know much about internals, this may help

  Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
      "host1:9160,host2:9160,host3:9160");

If you have a 2-node cluster with RF=2, your data will be present on both
nodes. And if consistency level 2 is used, both nodes must be UP to read and
write.

It doesn't matter which node you connect to; if your data is present in the
cluster it will be read directly or through a coordinator node.

Read hector doc-
http://hector-client.github.com/hector/build/html/documentation.html
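
A hedged variant using CassandraHostConfigurator with auto-discovery (the
cluster name and hosts are illustrative assumptions):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

CassandraHostConfigurator conf =
    new CassandraHostConfigurator("host1:9160,host2:9160");
conf.setAutoDiscoverHosts(true); // let Hector find the other ring members
Cluster cluster = HFactory.getOrCreateCluster("TestCluster", conf);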

/Samal

On Wed, Jun 6, 2012 at 8:35 AM, Prakrati Agrawal 
prakrati.agra...@mu-sigma.com wrote:

  But the data is distributed on the nodes (meaning 50% of data is on one
 node and 50% of data is on another node) so I need to specify the node ip
 address somewhere in the code. But where do I specify that is what I am
 clueless about. Please help me

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

 *From:* Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
 *Sent:* Tuesday, June 05, 2012 5:51 PM
 *To:* user@cassandra.apache.org
 *Subject:* RE: How to include two nodes in Java code using Hector

 Use Consistency Level = 2.

 Regards

 Harsh

 *From:* Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
 *Sent:* Tuesday, June 05, 2012 4:08 PM
 *To:* user@cassandra.apache.org
 *Subject:* How to include two nodes in Java code using Hector

 Dear all

 I am using a two node Cassandra cluster. How do I code in Java using
 Hector to get data from both the nodes. Please help

 Thanks and Regards

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com



Re: Adding a new node to Cassandra cluster

2012-06-04 Thread samal
If you use the Thrift API, you have to maintain a lot of low-level code
yourself which is already polished by high-level clients (HLCs) like Hector and
pycassa. Also, with an HLC you can easily switch between Thrift and the growing CQL.

On Mon, Jun 4, 2012 at 3:00 PM, R. Verlangen ro...@us2.nl wrote:

 You might consider using a higher level client (like Hector indeed). If
 you don't want this you will have to write your own connection pool. For a
 start, take a look at Hector. But keep in mind that you might be
 reinventing the wheel.


 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com

  Hi,

 I am using the Thrift API and I am not able to find anything on the internet
 about how to configure it for multiple nodes. I am not using any proper
 client like Hector.

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

 *From:* R. Verlangen [mailto:ro...@us2.nl]
 *Sent:* Monday, June 04, 2012 2:44 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Adding a new node to Cassandra cluster

 Hi there,

 When you speak to one node it will internally redirect the request to the
 proper node (local / external): but you won't be able to failover on a
 crash of the localhost.

 For adding another node to the connection pool you should take a look at
 the documentation of your java client.

 Good luck!

 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Dear all

 I successfully added a new node to my cluster so now it’s a 2 node
 cluster. But how do I mention it in my Java code as when I am retrieving
 data its retrieving only for one node that I am specifying in the
 localhost. How do I specify more than one node in the localhost.

 Please help me

 Thanks and Regards

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

 --
 With kind regards,

 Robin Verlangen
 *Software engineer*

 W www.robinverlangen.nl
 E ro...@us2.nl





 --
 With kind regards,

 Robin Verlangen
 *Software engineer*

 W www.robinverlangen.nl
 E ro...@us2.nl





Re: Cassandra Data Archiving

2012-05-31 Thread samal
I believe you are talking about HDD space consumed by user-generated data which
is no longer required after 15 days (or may be required later).
The first option is to use TTL, which you don't want to use. The second, as Aaron
pointed out, is snapshotting data, but the data still exists in the cluster; the
snapshot is only used for backup.

I would think of using column family buckets: 15 days a bucket, 2 buckets a
month.

Create a new CF every 15th day with a time-stamp marker, trip_offer_cf_[ts
- ts%(86400*15)], caching the CF name in the app for 15 days. After the 15th day
the old CF bucket will be read-only (no writes go into it); snapshot that old
CF bucket's data and delete the CF a few days later. This keeps the CF count
fixed (see the sketch below).

current CF count = n
bucket CF count = b*n

Use a separate cluster for old-data analytics.
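
A tiny plain-Java sketch of the bucket-name marker above (the CF-name prefix is
from the example; the rest is an assumption):

long ts = System.currentTimeMillis() / 1000;    // seconds since epoch
long bucketStart = ts - (ts % (86400L * 15));   // start of the 15-day window
String cfName = "trip_offer_cf_" + bucketStart; // cache in the app for 15 days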

/Samal

On Fri, Jun 1, 2012 at 9:58 AM, Harshvardhan Ojha 
harshvardhan.o...@makemytrip.com wrote:

  Problem statement:

 We are keeping daily generated data (user generated content) in
 Cassandra, but our application is using only 15 days old data. So how can
 we archive data older than 15 days so that we can reduce load on the
 Cassandra ring.

 Note: we can’t apply TTL, as this data may be needed in future.

 *From:* aaron morton [mailto:aa...@thelastpickle.com]
 *Sent:* Friday, June 01, 2012 6:57 AM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Cassandra Data Archiving

 I'm not sure on your needs, but the simplest thing to consider is
 snapshotting and copying off node.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 1/06/2012, at 12:23 AM, Shubham Srivastava wrote:

 I need to archive my Cassandra data into another permanent storage.

 Two intents:

 1. To shed the unused data from the Live data.

 2. To use the archived data for getting some analytics out, or as a
 potential source of a DataWarehouse.

 Any recommendations for the same in terms of strategies or tools to use.

 Regards,

 *Shubham Srivastava* *|* Technical Lead - Technology Development

 +91 124 4910 548 | MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase
 1, Gurgaon, Haryana - 122 016, India



Re: Query on how to count the total number of rowkeys and columns in them

2012-05-24 Thread samal
The default count is 100; set this to some max value, but this won't guarantee
the actual count.

Something like paging can help in counting: get the last key of one query and
use it as the start of the next, with end as empty and count as some value. But
this pulls data to the client whereas we only need the count (see the sketch
below).

Another solution (if the count is really necessary) is having a separate counter
CF, and incrementing it whenever a key is inserted in the other CF.

I would not use the raw Thrift API; the client libraries are very mature [1] and
CQL is also very good.

[1]
http://pycassa.github.com/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.get_range
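
Or, in Java, a hedged Hector paging sketch (CF "Data" and the page size are
assumptions; ks is a Keyspace). Note this can over-count rows that only contain
tombstones (range ghosts):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

RangeSlicesQuery<String, String, String> q = HFactory.createRangeSlicesQuery(
    ks, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
q.setColumnFamily("Data").setReturnKeysOnly().setRowCount(1000);
String start = "";
int total = 0;
while (true) {
    OrderedRows<String, String, String> rows =
        q.setKeys(start, "").execute().get();
    total += rows.getCount();
    if (rows.getCount() < 1000) break;   // last page reached
    start = rows.peekLast().getKey();    // last key is re-read as next start,
    total -= 1;                          // so subtract the overlap
}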

/Samal

On Thu, May 24, 2012 at 11:52 AM, Prakrati Agrawal 
prakrati.agra...@mu-sigma.com wrote:

  Hi

 I am trying to learn Cassandra and I have one doubt. I am using the Thrift
 API; to count the number of row keys I am using KeyRange to specify the row
 keys. To count all of them, I specify the start and end as “new byte[0]”.
 But the count is set to 100 by default. How do I use this method to count
 the keys if I don’t know the actual number of keys in my Cassandra
 database? Please help me

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com




Re: RE Ordering counters in Cassandra

2012-05-22 Thread samal
In some cases Cassandra is really good and in some cases it is not.

The way I see your approach, you are recording all of your events under a
single key, is it? Not recommended. The row can grow really big; also, if you
have a cluster of servers, it will hit only one server all the time and
overwhelm it, while the rest sit idle and take a nap.

What I would do is figure out which similar events are occurring, and then
bucket by those events.

E.g.: if an event occurred from iOS or Android, I would bucket by an IOS and an
ANDROID key, so the counter will give me all events that occurred from iOS or
Android.

Key concatenation can also be used to filter more deeply: IOS#safari,
android#chrome.

A smaller number of columns will help to reverse-index more efficiently. A
small sketch follows.
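
A hedged one-liner with Hector (the counter CF "EventCounters" and the column
name are assumptions; m is a Mutator from HFactory.createMutator):

// Increment the "clicks" counter in the IOS#safari bucket row.
m.incrementCounter("IOS#safari", "EventCounters", "clicks", 1L);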

/Samal

On Mon, May 21, 2012 at 11:53 PM, Tamar Fraenkel ta...@tok-media.com wrote:

 Indeed I took the not delete approach. If time bucket rows are not that
 big, this is a good temporary solution.
 I just finished implementation and testing now on a small staging
 environment. So far so good.
 Tamar

 Sent from my iPod

 On May 21, 2012, at 9:11 PM, Filippo Diotalevi fili...@ntoklo.com wrote:

  Hi Tamar,
 the solution you propose is indeed a temporary solution, but it might be
 the best one.

 Which approach did you follow?
 I'm a bit concerned about the deletion approach, since in case of
 concurrent writes on the same counter you might lose the pointer to the
 column to delete.

 --
 Filippo Diotalevi


 On Monday, 21 May 2012 at 18:51, Tamar Fraenkel wrote:

 I also had a similar problem. I have a temporary solution, which is not
 best, but may be of help.
 I have the coutner cf to count events, but apart from that I hold leaders
 CF:

 leaders = {
   // key is time bucket
   // values are composites(rank, event) ordered by
   // descending order of the rank
   // set relevant TTL on columns
   time_bucket1 : {
 composite(1000,event1) : 
 composite(999, event2) : 
   },
   ...
 }

 Whenever I increment counter for a specific event, I add a column in the
 time bucket row of the leaders CF, with the new value of the counter and
 the event name.
 There are two ways to go here, either delete the old column(s) for that
 event (with lower counters) from leaders CF. Or let them be.
 If you choose to delete, there is the complication of not having getAndSet for
 counters, so you may end up not deleting all the old columns.
 If you choose not to delete old columns, and live with duplicate columns
 for events (each with a different count), it will make your query to
 retrieve the leaders run longer.
 Anyway, when you need to retrieve the leaders, you can do a slice query on the
 leaders CF and ignore duplicate events using the client (I use Java). This
 will happen less if you do delete old columns.

 Another option is not to use Cassandra for that purpose, http://redis.io/ is
 a nice tool

 Will be happy to hear you comments.
 Thanks,

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media



 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





 On Mon, May 21, 2012 at 8:05 PM, Filippo Diotalevi fili...@ntoklo.com wrote:

 Hi Romain,
 thanks for your suggestion.

 When you say  build every day a ranking in a dedicated CF by iterating
 over events: do you mean
 - load all the columns for the specified row key
 - iterate over each column, and write a new column in the inversed index
 ?

 That's my current approach, but since I have many of these wide rows (1
 per day), the process is extremely slow as it involves moving an entire row
 from Cassandra to client, inverting every column, and sending the data back
 to create the inversed index.

 --
 Filippo Diotalevi


 On Monday, 21 May 2012 at 17:19, Romain HARDOUIN wrote:


 If I understand you've got a data model which looks like this:

 CF Events:
 row1: { event1: 1050, event2: 1200, event3: 830, ... }

 You can't query on column values but you can build every day a ranking in
 a dedicated CF by iterating over events:

 create column family Ranking
 with comparator = 'LongType(reversed=true)'
 ...

 CF Ranking:
 rank: { 1200: event2, 1050: event1, 830: event3, ... }

 Then you can make a top ten or whatever you want because counter values
 will be sorted.


 Filippo Diotalevi fili...@ntoklo.com wrote on 21/05/2012 16:59:43:

  Hi,
  I'm trying to understand what's the best design for a simple
  ranking use cases.
  I have, in a row, a good number (10k - a few 100K) of counters; each
  one is counting the occurrence of an event. At the end of day, I
  want to create a ranking of the most occurred event.
 
  What's the best approach to perform this task?
  The brute force approach of retrieving the row and ordering it
  doesn't work well (the call usually times out, especially if
  Cassandra is also under load); I also don't know in advance the full
  set of event names (column names), so it's difficult to slice the get
 call.
 
  Is there any trick

Re: Number of keyspaces

2012-05-22 Thread samal
Not ideal. Cassandra now has global memtable tuning, but each CF still
corresponds to memory in RAM. A year-wise CF means it will be in a read-only
state for the next year, yet its memtable will still consume RAM.
On 22-May-2012 5:01 PM, Franc Carter franc.car...@sirca.org.au wrote:

 On Tue, May 22, 2012 at 9:19 PM, aaron morton aa...@thelastpickle.com wrote:

 It's more the number of CF's than keyspaces.


 Oh - does increasing the number of Column Families affect performance ?

 The design we are working on at the moment is considering using a Column
 Family per year. We were thinking this would isolate compactions to a more
 manageable size as we don't update previous years.

 cheers



 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 22/05/2012, at 6:58 PM, R. Verlangen wrote:

 Yes, it does. However there's no real answer what's the limit: it depends
 on your hardware and cluster configuration.

 You might even want to search the archives of this mailinglist, I
 remember this has been asked before.

 Cheers!

 2012/5/21 Luís Ferreira zamith...@gmail.com

 Hi,

 Does the number of keyspaces affect the overall cassandra performance?


 Cumprimentos,
 Luís Ferreira






 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215




Re: Astyanax Error

2012-05-22 Thread samal
No host was found by the client: the connection pool has no usable hosts.
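
A minimal Astyanax context sketch with explicit seeds (the cluster/keyspace
names are my assumptions); host=None usually means the pool was given no seeds:

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

AstyanaxContext<Keyspace> ctx = new AstyanaxContext.Builder()
    .forCluster("TestCluster")
    .forKeyspace("Demo")
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl())
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("pool")
        .setPort(9160)
        .setMaxConnsPerHost(1)
        .setSeeds("host1:9160,host2:9160")) // must list reachable nodes
    .buildKeyspace(ThriftFamilyFactory.getInstance());
ctx.start();
Keyspace keyspace = ctx.getEntity();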
On 22-May-2012 4:34 PM, Abhijit Chanda abhijit.chan...@gmail.com wrote:

 Hi All,

 Can any one suggest me why i am getting this error in Astyanax
 NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0),
 attempts=0] No hosts to borrow from


 Thanks In Advance
 Abhijit



Re: RE Ordering counters in Cassandra

2012-05-22 Thread samal
Secondary indexes are not supported for counters; plus, you must know the
column name to use a secondary index on a regular column.
On 22-May-2012 5:34 PM, Filippo Diotalevi fili...@ntoklo.com wrote:

  Thanks for all the answers, they definitely helped.

 Just out of curiosity, is there any underlying architectural reason why
 it's not possible to order a row based on its counters values? or is it
 something that might be in the roadmap in the future?

 --
 Filippo Diotalevi

 On Tuesday, 22 May 2012 at 08:48, Romain HARDOUIN wrote:


 I mean iterate over each column -- more precisely: *bunches of columns*
 using slices -- and write new columns in the inversed index.
 Tamar's data model is made for real time analysis. It's maybe overdesigned
 for a daily ranking.
 I agree with Samal, you should split your data across the space of tokens.
 Only CF Ranking feeding would be affected, not the top N queries.

 Filippo Diotalevi fili...@ntoklo.com wrote on 21/05/2012 19:05:28:

  Hi Romain,
  thanks for your suggestion.
 
  When you say  build every day a ranking in a dedicated CF by
  iterating over events: do you mean
  - load all the columns for the specified row key
  - iterate over each column, and write a new column in the inversed index
  ?
 
  That's my current approach, but since I have many of these wide rows
  (1 per day), the process is extremely slow as it involves moving an
  entire row from Cassandra to client, inverting every column, and
  sending the data back to create the inversed index.





Re: supercolumns with TTL columns not being compacted correctly

2012-05-22 Thread samal
Data will remain until the next compaction but won't be available. Compaction
will delete the old SSTable and create a new one.
On 22-May-2012 5:47 PM, Pieter Callewaert pieter.callewa...@be-mobile.be
wrote:

  Hi,

 I’ve had my suspicions for some months, but I think I am sure about it.

 Data is being written by the SSTableSimpleUnsortedWriter and loaded by the
 sstableloader.

 The data should be alive for 31 days, so I use the following logic:

 int ttl = 2678400;
 long timestamp = System.currentTimeMillis() * 1000;
 long expirationTimestampMS = (long) ((timestamp / 1000) + ((long) ttl * 1000));

 And using this to write it:

 sstableWriter.newRow(bytes(entry.id));
 sstableWriter.newSuperColumn(bytes(superColumn));
 sstableWriter.addExpiringColumn(nameTT, bytes(entry.aggregatedTTMs), timestamp, ttl, expirationTimestampMS);
 sstableWriter.addExpiringColumn(nameCov, bytes(entry.observationCoverage), timestamp, ttl, expirationTimestampMS);
 sstableWriter.addExpiringColumn(nameSpd, bytes(entry.speed), timestamp, ttl, expirationTimestampMS);

 This works perfectly: data can be queried until 31 days have passed, then
 no results are given, as expected.

 But the data is still on disk until the sstables are recompacted:

 One of our nodes (we got 6 total) has the following sstables:

 [cassandra@bemobile-cass3 ~]$ ls -hal /data/MapData007/HOS-* | grep G

 -rw-rw-r--. 1 cassandra cassandra 103G May  3 03:19 /data/MapData007/HOS-hc-125620-Data.db
 -rw-rw-r--. 1 cassandra cassandra 103G May 12 21:17 /data/MapData007/HOS-hc-163141-Data.db
 -rw-rw-r--. 1 cassandra cassandra  25G May 15 06:17 /data/MapData007/HOS-hc-172106-Data.db
 -rw-rw-r--. 1 cassandra cassandra  25G May 17 19:50 /data/MapData007/HOS-hc-181902-Data.db
 -rw-rw-r--. 1 cassandra cassandra  21G May 21 07:37 /data/MapData007/HOS-hc-191448-Data.db
 -rw-rw-r--. 1 cassandra cassandra 6.5G May 21 17:41 /data/MapData007/HOS-hc-193842-Data.db
 -rw-rw-r--. 1 cassandra cassandra 5.8G May 22 11:03 /data/MapData007/HOS-hc-196210-Data.db
 -rw-rw-r--. 1 cassandra cassandra 1.4G May 22 13:20 /data/MapData007/HOS-hc-196779-Data.db
 -rw-rw-r--. 1 cassandra cassandra 401G Apr 16 08:33 /data/MapData007/HOS-hc-58572-Data.db
 -rw-rw-r--. 1 cassandra cassandra 169G Apr 16 17:59 /data/MapData007/HOS-hc-61630-Data.db
 -rw-rw-r--. 1 cassandra cassandra 173G Apr 17 03:46 /data/MapData007/HOS-hc-63857-Data.db
 -rw-rw-r--. 1 cassandra cassandra 105G Apr 23 06:41 /data/MapData007/HOS-hc-87900-Data.db

 As you can see, the following files should be invalid:

 /data/MapData007/HOS-hc-58572-Data.db
 /data/MapData007/HOS-hc-61630-Data.db
 /data/MapData007/HOS-hc-63857-Data.db

 Because they were all written more than a month ago. gc_grace is 0, so this
 should also not be a problem.

 As a test, I used forceUserDefinedCompaction on HOS-hc-61630-Data.db.

 Expected behavior should be an empty file being written, because all
 data in the sstable should be invalid:

 Compactionstats gives:

 compaction type | keyspace   | column family | bytes compacted | bytes total  | progress
 Compaction      | MapData007 | HOS           | 11518215662     | 532355279724 | 2.16%

 And when I ls the directory I find this:

 -rw-rw-r--. 1 cassandra cassandra 3.9G May 22 14:12 /data/MapData007/HOS-tmp-hc-196898-Data.db

 The sstable is being copied 1-on-1 to a new one. What am I missing here?

 TTL works perfectly, but is it giving a problem because it is in a super
 column, and so never to be deleted from disk?

 Kind regards

 Pieter Callewaert | Web & IT engineer
 Be-Mobile NV http://www.be-mobile.be/ | TouringMobilis http://www.touringmobilis.be/
 Technologiepark 12b - 9052 Ghent - Belgium
 Tel + 32 9 330 51 80 | Fax + 32 9 330 51 81 | Cell + 32 473 777 121



Re: Astyanax Error

2012-05-22 Thread samal
Are you able to connect through cli?
Can you share your client code?
On 22-May-2012 5:59 PM, Abhijit Chanda abhijit.chan...@gmail.com wrote:

 Samal,


 But I am setting up the Host.

 On Tue, May 22, 2012 at 5:30 PM, samal samalgo...@gmail.com wrote:

 Host not found in client.
 On 22-May-2012 4:34 PM, Abhijit Chanda abhijit.chan...@gmail.com
 wrote:

 Hi All,

 Can any one suggest me why i am getting this error in Astyanax
 NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0),
 attempts=0] No hosts to borrow from


 Thanks In Advance
 Abhijit




 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395




Re: Cassandra 0.8.5: Column name mystery in create column family command

2012-05-22 Thread samal
Change your comparator to UTF8Type. With BytesType the CLI renders column names
as raw hex bytes, which is why they show up mangled.
On 22-May-2012 4:32 PM, Roshan Dawrani roshandawr...@gmail.com wrote:

 Hi,

 I use Cassandra 0.8.5 and am suddenly noticing some strange behavior. I
 run a create column family command with some column meta-data and it runs
 fine, but when I do describe keyspace, it shows me different column names
 for those index columns.

 a) Here is what I run:
 create column family UserTemplate with comparator=BytesType and
 column_metadata=[{*column_name: userid*, validation_class: UTF8Type,
 index_type: KEYS, index_name: TemplateUserIdIdx}, {*column_name: type*,
 validation_class: UTF8Type, index_type: KEYS, index_name:
 TemplateTypeIdx}];

 b) This is what describe keyspace shows:
 ColumnFamily: UserTemplate
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   ...
   ...
   Built indexes: [UserTemplate.TemplateTypeIdx,
 UserTemplate.TemplateUserIdIdx]
   Column Metadata:
 *Column Name: ff*
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: TemplateUserIdIdx
   Index Type: KEYS
 *Column Name: 0dfffaff*
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: TemplateTypeIdx
   Index Type: KEYS

 Does anyone see why this must be happening? I have created many such
 column families before and never run into this issue.

 --
 Roshan
 http://roshandawrani.wordpress.com/




Re: Cassandra 0.8.5: Column name mystery in create column family command

2012-05-22 Thread samal
I am not able to reproduce this in the CLI.
On 22-May-2012 8:12 PM, Roshan Dawrani roshandawr...@gmail.com wrote:

 Can you please let me know why? Because I have created very similar column
 families many times with comparator = BytesType, and never run into this
 issue before.

 Here is an example:

 
 ColumnFamily: Client
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: org.apache.cassandra.db.marshal.BytesType
   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
   ...
   ...
   Built indexes: [Client.ACUserIdIdx]
   Column Metadata:
 Column Name: userid (757365726964)
   Validation Class: org.apache.cassandra.db.marshal.LexicalUUIDType
   Index Name: ACUserIdIdx
   Index Type: KEYS
 

 On Tue, May 22, 2012 at 6:16 PM, samal samalgo...@gmail.com wrote:

 Change your comparator to utf8type.
 On 22-May-2012 4:32 PM, Roshan Dawrani roshandawr...@gmail.com wrote:

 Hi,

 I use Cassandra 0.8.5 and am suddenly noticing some strange behavior. I
 run a create column family command with some column meta-data and it runs
 fine, but when I do describe keyspace, it shows me different column names
 for those index columns.

 a) Here is what I run:
 create column family UserTemplate with comparator=BytesType and
 column_metadata=[{*column_name: userid*, validation_class: UTF8Type,
 index_type: KEYS, index_name: TemplateUserIdIdx}, {*column_name: type*,
 validation_class: UTF8Type, index_type: KEYS, index_name:
 TemplateTypeIdx}];

 b) This is what describe keyspace shows:
 ColumnFamily: UserTemplate
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   ...
   ...
   Built indexes: [UserTemplate.TemplateTypeIdx,
 UserTemplate.TemplateUserIdIdx]
   Column Metadata:
 *Column Name: ff*
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: TemplateUserIdIdx
   Index Type: KEYS
 *Column Name: 0dfffaff*
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: TemplateTypeIdx
   Index Type: KEYS

 Does anyone see why this must be happening? I have created many such
 column families before and never run into this issue.

 --
 Roshan
 http://roshandawrani.wordpress.com/




 --
 Roshan
 http://roshandawrani.wordpress.com/




Re: supercolumns with TTL columns not being compacted correctly

2012-05-22 Thread samal
Thanks, I didn't know about the two-stage removal process.
On 23-May-2012 2:20 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Correction: the first compaction after expiration + gcgs can remove
 it, even if it hasn't been turned into a tombstone previously.

 On Tue, May 22, 2012 at 9:37 AM, Jonathan Ellis jbel...@gmail.com wrote:
  Additionally, it will always take at least two compaction passes to
  purge an expired column: one to turn it into a tombstone, and a second
  (after gcgs) to remove it.
 
  On Tue, May 22, 2012 at 9:21 AM, Yuki Morishita mor.y...@gmail.com
 wrote:
  Data will not be deleted when those keys appear in other sstables outside of
  the compaction. This is to prevent obsolete data from appearing again.
 
  yuki
 
  On Tuesday, May 22, 2012 at 7:37 AM, Pieter Callewaert wrote:
 
  Hi Samal,
 
 
 
  Thanks for your time looking into this.
 
 
 
  I force the compaction by using forceUserDefinedCompaction on only that
  particular sstable. This guarantees me that the new sstable being written
  only contains the data from the old sstable.
 
  The data in the sstable is more than 31 days old and gc_grace is 0, but
  still the data from the sstable is being written to the new one, while
 I am
  100% sure all the data is invalid.
 
 
 
  Kind regards,
 
  Pieter Callewaert
 
 
 
  From: samal [mailto:samalgo...@gmail.com]
  Sent: dinsdag 22 mei 2012 14:33
  To: user@cassandra.apache.org
  Subject: Re: supercolumns with TTL columns not being compacted correctly
 
 
 
  Data will remain until the next compaction but won't be available. Compaction
  will delete the old sstable and create a new one.
 

Re: Composite Column

2012-05-17 Thread samal
It is like folding your super column name into the column name.

empKey{
  employee1+name:XX,
  employee1+addr:X,
  employee2+name:X,
  employee2+addr:X
}

Here all of your employee details are attached to one domain, i.e. all of
employee1's details will be *employee1+[anything ... n numbers of
columns]*

comparator=CompositeType(UTF8Type, UTF8Type, ..., n)
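
A hedged Hector sketch of writing one such composite-named column (the CF name
"Employees" and values are illustrative; ks is a Keyspace):

import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

Composite name = new Composite();
name.addComponent("employee1", StringSerializer.get()); // first component
name.addComponent("name", StringSerializer.get());      // second component
Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
m.insert("empKey", "Employees", HFactory.createColumn(
    name, "XX", new CompositeSerializer(), StringSerializer.get()));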

/Samal
On Thu, May 17, 2012 at 10:40 AM, Abhijit Chanda
abhijit.chan...@gmail.com wrote:

 Aaron,

 Actually Aaron i am looking for a scenario on super columns being replaced
 by composite column.
 Say this is a data model using super column
 rowKey{
   superKey1 {
 Name,
 Address,
 City,.
   }
 }

 Actually i am having confusion how exactly the data model will look if we
 use composite column instead of super column.

 Thanks,
 Abhijit



 On Wed, May 16, 2012 at 2:56 PM, aaron morton aa...@thelastpickle.com wrote:

 Abhijit,
 Can you explain the data model a bit more.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/05/2012, at 10:32 PM, samal wrote:

 It is just column with JSON value

 On Tue, May 15, 2012 at 4:00 PM, samal samalgo...@gmail.com wrote:

 I have not used CC but yes you can.
  Below is not a composite column; it is just a column with a JSON hash
  value. The column value can be anything you like.
  Data inside the value is not indexed.


 On Tue, May 15, 2012 at 9:27 AM, Abhijit Chanda 
 abhijit.chan...@gmail.com wrote:

 Is it possible to create this data model with the help of composite
 column.

 User_Keys_By_Last_Name = {
   Engineering : {anderson, 1 : ac1263, anderson, 2 : 724f02, ... 
 },
   Sales : { adams, 1 : b32704, alden, 1 : 1553bd, ... },
 }

 I am using Astyanax. Please suggest...
 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395







 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395




Re: Composite Column

2012-05-17 Thread samal
The Cassandra data model is best known for denormalized data. You should keep
an entry for the column name in another column family (an index).

The example is just for illustration; you should not put everything in a single
row, as that is very inefficient.

I do not use composite columns. The way I and others do it is to use complex
column names.

rowkey = username
columns = subsets of user

employee1={
   name:
.
.
   previous_job_xxx_company:null
   previous_job_yyy_company:null
}
employee2={
   name:
.
.
   previous_job::aaa_company:null
   previous_job::yyy_company:null
}

Here you get the entire row (since the row size is small) and filter the similar
details by marker. *previous_job::* is the marker here and *xxx_company* is the
real value which we need; the column value is not required (that depends on the
requirement).

There is a very good presentation by the DataStax folks,
http://www.datastax.com/2011/07/video-data-modeling-workshop-from-cassandra-sf-2011,
and one by Joe, http://www.youtube.com/watch?v=EBjWlH4NPMA; they will help you
understand the data model.

@samalgorai

On Thu, May 17, 2012 at 12:29 PM, Abhijit Chanda
abhijit.chan...@gmail.com wrote:

 Samal,

 Thanks buddy for interpreting. Now suppose I am inserting data in a column
 family using this data model dynamically; as a result the column names will
 be dynamic. Now consider there is an entry for *employee1* *name*d Smith,
 and I want to retrieve that value?

 Regards,
 Abhijit

 On Thu, May 17, 2012 at 12:03 PM, samal samalgo...@gmail.com wrote:

 It is like using your super column inside columns name.

 empKey{
   employee1+name:XX,
   employee1+addr:X,
   employee2+name:X,
   employee2+addr:X
 }

 Here all of your employee details are attached to one domain i.e. all of
 employee1 details will be *employee1+[anytihng.n numbers of
 column]*

 comaprator=CompositeType(UTF8Type1,UTF8Type2,...,n)

 /Samal

 On Thu, May 17, 2012 at 10:40 AM, Abhijit Chanda 
 abhijit.chan...@gmail.com wrote:

 Aaron,

 Actually Aaron i am looking for a scenario on super columns being
 replaced by composite column.
 Say this is a data model using super column
 rowKey{
   superKey1 {
 Name,
 Address,
 City,.
   }
 }

 Actually i am having confusion how exactly the data model will look if
 we use composite column instead of super column.

 Thanks,
 Abhijit



 On Wed, May 16, 2012 at 2:56 PM, aaron morton 
 aa...@thelastpickle.com wrote:

 Abhijit,
 Can you explain the data model a bit more.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/05/2012, at 10:32 PM, samal wrote:

 It is just column with JSON value

 On Tue, May 15, 2012 at 4:00 PM, samal samalgo...@gmail.com wrote:

 I have not used CC but yes you can.
  Below is not a composite column; it is just a column with a JSON hash
  value. The column value can be anything you like.
  Data inside the value is not indexed.


 On Tue, May 15, 2012 at 9:27 AM, Abhijit Chanda 
 abhijit.chan...@gmail.com wrote:

 Is it possible to create this data model with the help of composite
 column.

 User_Keys_By_Last_Name = {
   Engineering : {anderson, 1 : ac1263, anderson, 2 : 724f02, 
 ... },
   Sales : { adams, 1 : b32704, alden, 1 : 1553bd, ... },
 }

 I am using Astyanax. Please suggest...
 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395







 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395





 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395




Re: How can I implement 'LIKE operation in SQL' on values while querying a column family in Cassandra

2012-05-15 Thread samal
You cannot fetch by a partial column value.
A lookup by value is only possible if the column has a secondary index, and
even then the exact column value needs to match.

As Tamar suggested, you can put the value in the column name and use a UTF8 comparator.

{
'name_abhijit'='abhijit'
'name_abhishek'='abhiskek'
'name_atul'='atul'
}

Here you can do a slice query on the column names and get the desired result; a
sketch follows.
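
A hedged Hector sketch (the CF "Users", row key and Keyspace ks are illustrative
assumptions; the slice end is the prefix with its last character bumped, per
Tamar's trick):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

SliceQuery<String, String, String> q = HFactory.createSliceQuery(
    ks, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
q.setColumnFamily("Users").setKey("row1");
q.setRange("name_abhi", "name_abhj", false, 100); // matches name_abhijit, name_abhishek
q.execute();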

/samal
On Tue, May 15, 2012 at 3:29 PM, selam selam...@gmail.com wrote:

 Mapreduce jobs may solve your problem  for batch processing


 On Tue, May 15, 2012 at 12:49 PM, Abhijit Chanda 
 abhijit.chan...@gmail.com wrote:

 Tamar,

 Can you please illustrate little bit with some sample code. It highly
 appreciable.

 Thanks,


 On Tue, May 15, 2012 at 10:48 AM, Tamar Fraenkel ta...@tok-media.com wrote:

 I don't think this is possible, the best you can do is prefix, if your
 order is alphabetical. For example I have a CF with comparator UTF8Type,
 and then I can do slice query and bring all columns that start with the
 prefix, and end with the prefix where you replace the last char with
 the next one in order (i.e. aaa-aab).

 Hope that helps.

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





 On Tue, May 15, 2012 at 7:56 AM, Abhijit Chanda 
 abhijit.chan...@gmail.com wrote:

 I don't know the exact value of a column, but I want to do a partial
 match to find all available values that match.
 I want to do similar kind of operation that LIKE operator in SQL do.
 Any help is highly appreciated.

 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395





 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395




 --
 Regards & have a good day,
 Timu EREN ( a.k.a selam )


Re: Composite Column

2012-05-15 Thread samal
I have not used composite columns (CC), but yes, you can.
Below is not a composite column; it is just a column with a JSON hash value.
The column value can be anything you like.
Data inside the value is not indexed.

On Tue, May 15, 2012 at 9:27 AM, Abhijit Chanda
abhijit.chan...@gmail.com wrote:

 Is it possible to create this data model with the help of composite column.

 User_Keys_By_Last_Name = {
   Engineering : {anderson, 1 : ac1263, anderson, 2 : 724f02, ... },
   Sales : { adams, 1 : b32704, alden, 1 : 1553bd, ... },
 }

 I am using Astyanax. Please suggest...
 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395




Re: Composite Column

2012-05-15 Thread samal
It is just a column with a JSON value.

On Tue, May 15, 2012 at 4:00 PM, samal samalgo...@gmail.com wrote:

 I have not used composite columns (CC), but yes, you can.
 Below is not a composite column; it is just a column with a JSON hash value.
 The column value can be anything you like.
 Data inside the value is not indexed.


 On Tue, May 15, 2012 at 9:27 AM, Abhijit Chanda abhijit.chan...@gmail.com
  wrote:

 Is it possible to create this data model with the help of composite
 column.

 User_Keys_By_Last_Name = {
   Engineering : {anderson, 1 : ac1263, anderson, 2 : 724f02, ... },
   Sales : { adams, 1 : b32704, alden, 1 : 1553bd, ... },
 }

 I am using Astyanax. Please suggest...
 --
 Abhijit Chanda
 Software Developer
 VeHere Interactive Pvt. Ltd.
 +91-974395





Re: timezone time series data model

2012-05-02 Thread samal
This will work. I have tried both; each gave a unique one-day bucket.

I just realized that if I sync all clients to one zone, then the date will
remain the same for all.

A one-zone date will give a materialized view of the row. A small sketch follows.
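
A small plain-Java sketch of the timezone-independent one-day bucket (per
Tyler's note, convert milliseconds to seconds first; userUuid is illustrative):

long seconds = System.currentTimeMillis() / 1000; // ms -> s
long dayBucket = seconds - (seconds % 86400);     // same worldwide for a UTC day
String rowKey = userUuid + "_" + dayBucket;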

On Mon, Apr 30, 2012 at 11:43 PM, samal samalgo...@gmail.com wrote:

 hhmm. I will try both. thanks


 On Mon, Apr 30, 2012 at 11:29 PM, Tyler Hobbs ty...@datastax.com wrote:

 Err, sorry, I should have said ts - (ts % 86400).  Integer division does
 something similar.


 On Mon, Apr 30, 2012 at 12:39 PM, samal samalgo...@gmail.com wrote:

 Thanks, I hadn't noticed.
 I ran the script for 5 minutes; divide seems to produce a stable result, while
 modulo keeps changing. If divide is OK it will do the trick.
 I will run this script on the Singapore, East Coast, and New Delhi
 servers all night today.

 ==
 unix  =   1335806983422
 unix /1000=   1335806983.422
 Divid i/86400 =   15460.728969907408
 Divid i/86400 INT =   15460
 Modulo i%86400=   62983
 ==
 ==
 unix  =   1335806985421
 unix /1000=   1335806985.421
 Divid i/86400 =   15460.72899306
 Divid i/86400 INT =   15460
 Modulo i%86400=   62985
 ==
 ==
 unix  =   1335806987422
 unix /1000=   1335806987.422
 Divid i/86400 =   15460.729016203704
 Divid i/86400 INT =   15460
 Modulo i%86400=   62987
 ==
 ==
 unix  =   1335806989422
 unix /1000=   1335806989.422
 Divid i/86400 =   15460.729039351852
 Divid i/86400 INT =   15460
 Modulo i%86400=   62989
 ==
 ==
 unix  =   1335806991421
 unix /1000=   1335806991.421
 Divid i/86400 =   15460.7290625
 Divid i/86400 INT =   15460
 Modulo i%86400=   62991
 ==
 ==
 unix  =   1335806993422
 unix /1000=   1335806993.422
 Divid i/86400 =   15460.729085648149
 Divid i/86400 INT =   15460
 Modulo i%86400=   62993
 ==
 ==
 unix  =   1335806995422
 unix /1000=   1335806995.422
 Divid i/86400 =   15460.729108796297
 Divid i/86400 INT =   15460
 Modulo i%86400=   62995
 ==
 ==
 unix  =   1335806997421
 unix /1000=   1335806997.421
 Divid i/86400 =   15460.72913195
 Divid i/86400 INT =   15460
 Modulo i%86400=   62997
 ==
 ==
 unix  =   1335806999422
 unix /1000=   1335806999.422
 Divid i/86400 =   15460.729155092593
 Divid i/86400 INT =   15460
 Modulo i%86400=   62999
 ==


 On Mon, Apr 30, 2012 at 10:44 PM, Tyler Hobbs ty...@datastax.comwrote:

 getTime() returns the number of milliseconds since the epoch, not the
 number of seconds: http://www.w3schools.com/jsref/jsref_gettime.asp

 If you divide that number by 1000, it should work.


 On Mon, Apr 30, 2012 at 11:28 AM, samal samalgo...@gmail.com wrote:

 I did it with node.js, but it is changing after some interval.

 code
 setInterval(function(){
   var d = new Date().getTime();
   console.log("==");
   console.log("unix = ", d);
   i = parseInt(d);
   console.log("Divid i/86400 = ", i/86400);
   console.log("Modulo i%86400 = ", i%86400);
   console.log("==");
 },2000);

 /code
 Am I doing something wrong?


 On Mon, Apr 30, 2012 at 9:54 PM, Tyler Hobbs ty...@datastax.comwrote:

 Correct, that's exactly what I'm saying.


 On Mon, Apr 30, 2012 at 10:37 AM, samal samalgo...@gmail.com wrote:

 Thanks Tyler for the reply.

 Are you saying user1uuid_*{ts%86400}* would lead to a unique day
 bucket that is timezone (NZ to US) independent? I will try.


 On Mon, Apr 30, 2012 at 8:25 PM, Tyler Hobbs ty...@datastax.comwrote:

 Don't use dates or datestamps as the buckets for your row keys, use
 a unix timestamp modulo whatever size you want your bucket to be 
 instead.
 Timestamps don't involve time zones or any of that nonsense.

 So, instead of having keys like user1uuid_30042012, the second
  half would be replaced by the current unix timestamp mod 86400 (the
  number of seconds in a day).


 On Mon, Apr 30, 2012 at 1:46 AM, samal samalgo...@gmail.comwrote:

 Hello List,

 I need suggestion/ recommendation on time series data.

 I have requirement where users belongs to different timezone and
 they can subscribe to global group.
 When users at specific timezone send update to group it is
 available to every user in different timezone.

 I am using GroupSubscribedUsers CF where all update to group are
 push to Each User time line, and key is timelined by 
 useruuid_date(one
 day update of all groups) and columns are group updates.

 GroupSubscribedUsers ={
 user1uuid_30042012:{//this user belongs to same timezone
  timeuuid1:JSON[group1update1

timezone time series data model

2012-04-30 Thread samal
Hello List,

I need suggestion/ recommendation on time series data.

I have a requirement where users belong to different timezones and they can
subscribe to a global group.
When a user in a specific timezone sends an update to the group, it becomes
available to every user in the other timezones.

I am using a GroupSubscribedUsers CF where all updates to a group are pushed
to each user's timeline; the key is timelined by useruuid_date (one day of
updates across all groups) and the columns are group updates.

GroupSubscribedUsers ={
user1uuid_30042012:{//this user belongs to same timezone
 timeuuid1:JSON[group1update1]
 timeuuid2:JSON[group2update2]
 timeuuid3:JSON[group1update2]
timeuuid4:JSON[group4update1]
   },
  user2uuid_30042012:{//this user belongs to a different timezone where the
date has already changed to 1 May, but 30 April is still getting updates
 timeuuid1:JSON[group1update1]
 timeuuid2:JSON[group2update2]
 timeuuid3:JSON[group1update2]
timeuuid4:JSON[group4update1]
timeuuid5:JSON[groupNupdate1]
   },

}

I have noticed this approach is good for a single timezone; when different
timezones come into the picture, it breaks.

I am thinking of something like: when a user pushes an update to a group ->
get the users subscribed to the group -> check each user's timezone -> push
the time series in the user's timezone. So for one user the update will be on
30 April, whereas others may have it on 29 April or 1 May; using timestamps I
can work out how many hours ago an update came.

Is there any better approach?


Thanks,

Samal


Re: Data model question, storing Queue Message

2012-04-30 Thread samal
On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis msega...@gmail.com wrote:

 Hi Aaron,

 Thank you for your answer, I was beginning to think that my question would
 never be answered ;-)

 Actually, this is what I was going for, except for one thing: instead of
 partitioning rows per month, I thought about partitioning per day, so that
 every day I launch the cleaning tool and it deletes the day from X
 months earlier.


Use the TTL feature of columns, as it will remove a column once its TTL is
over (no need for a manual job).
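
For example (a sketch under assumptions: a 3-month retention window, which the
thread does not specify, and CQL-2-style syntax):

var retentionMonths = 3;                 // assumed retention window
var ttl = retentionMonths * 30 * 86400;  // 7776000 seconds, roughly 3 months
// e.g. UPDATE UserMessages USING TTL 7776000 SET ... WHERE KEY = 'receiver1';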

I guess that will reduce the workload drastically; does it have any
 downside compared to month partitioning?


A key belongs to a particular node, so depending on the size of your data,
day- or month-wise partitioning matters. Otherwise it can lead to a fat row,
which will cause system problems.



 At one point I was going to do something like the twissandra example:
 having a CF per user's queue, and another CF per day storing every
 message ID of the day. That way, if I want to delete them, I only look
 into this one row and use the IDs to delete them in the user's
 queue CF… Is that a good way to do it? Or should I stick with the first
 implementation?

 Best regards,

 Morgan.

 Le 30 avr. 2012 à 05:52, aaron morton a écrit :

 Message Queue is often not a great use case for Cassandra. For information
 on how to handle high delete workloads see
 http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

 It's hard to create a model without some idea of the data load, but I would
 suggest you start with:

 CF: UserMessages
 Key: ReceiverID
 Columns : column name = TimeUUID ; column value = message ID and Body

 That will order the messages by time.

 Depending on load (and to support deleting a previous month's messages) you
 may want to partition the rows by month:

 CF: UserMessagesMonth
 Key: ReceiverID+MM
 Columns : column name = TimeUUID ; column value = message ID and Body

 Everything the same as before. But now a user has a row for each month and
 which you can delete as a whole. This also helps avoid very big rows.
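
 A sketch of building that monthly key (the exact month format is assumed,
 and receiverId is a hypothetical variable):

 var d = new Date();
 var yyyymm = d.getFullYear() * 100 + (d.getMonth() + 1); // e.g. 201204
 var rowKey = receiverId + '_' + yyyymm;  // one row per user per month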

 I really don't think that storage will be an issue, I have 2TB per nodes,
 messages are 1KB limited.

 I would suggest you keep the per node limit to 300 to 400 GB. It can take
 a long time to compact, repair and move the data when it gets above 400GB.

 Hope that helps.

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:

 Hi everyone !

 I'm fairly new to Cassandra and not yet quite familiar with the
 column-oriented NoSQL model.
 I have worked on it a while, but I can't seem to find the best model for
 what I'm looking for.

 I have Erlang software that lets users connect and communicate with
 each other; when a user (A) sends
 a message to a disconnected user (B), it stores the message in the database
 and waits for user (B) to connect and retrieve
 the message queue, then deletes it.

 Here are some key points:
 - Users are identified by integer IDs
 - Each message is unique by the combination: Sender ID - Receiver ID -
 Message ID - time

 I have a message queue, and here are the operations I need to do as
 fast as possible:

 - Store from 1 to X messages per registered user
 - Get the number of stored messages per user (can be an incremental
 variable updated at each store // this is often retrieved)
 - Retrieve all messages from a user at once.
 - Delete all messages from a user at once.
 - Delete all messages that are older than Y months (from all users).

 I really don't think that storage will be an issue, I have 2TB per nodes,
 messages are 1KB limited.
 I'm really looking for speed rather than storage optimization.

 My configuration is 2 dedicated server which are both :
 - 4 x Intel i7 2.66 Ghz
 - 64 bits
 - 24 Go
 - 2 TB

 Thank you all.






Re: Data model question, storing Queue Message

2012-04-30 Thread samal
On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis msega...@gmail.com wrote:

 Hi Samal,

 Thanks for the TTL feature, I wasn't aware of its existence.

 Day partitioning will be much narrower than month partitioning (about 30
 times narrower, give or take ;-) )
 Per day there should be something like 100 000 messages stored; most of them
 would be retrieved, and so deleted, before the TTL feature has to do its
 work.


TTL is the longest a column can exist in the Cassandra world; after that it
is deleted. Deleting before the TTL expires is fine.
Have you considered KAFKA? http://incubator.apache.org/kafka/




 Le 30 avr. 2012 à 13:16, samal a écrit :



 On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis msega...@gmail.comwrote:

 Hi Aaron,

 Thank you for your answer, I was beginning to think that my question
 would never be answered ;-)

 Actually, this is what I was going for, except for one thing: instead of
 partitioning rows per month, I thought about partitioning per day, so that
 every day I launch the cleaning tool and it deletes the day from X
 months earlier.


 USE TTL feature of column as it will remove column after TTL is over (no
 need for manual job).

  I guess that will reduce the workload drastically, does it have any
 downside comparing to month partitioning?


 key belongs to particular node , so depending on size of your data day or
 month wise partitioning matters. Other wise it can lead to Fat row which
 will cause system problem.



 At one point I was going to do something like the twissandra example,
 Having a CF per User's queue, and another CF per day storing every
 message's ID of the day, in that way If I want to delete them, I only look
 into this row, and delete them using ID's for deleting them in the User's
 queue CF… Is that a good way to do ? Or should I stick with the first
 implementation ?

 Best regards,

 Morgan.

 Le 30 avr. 2012 à 05:52, aaron morton a écrit :

 Message Queue is often not a great use case for Cassandra. For
 information on how to handle high delete workloads see
 http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

 It's hard to create a model without some idea of the data load, but I would
 suggest you start with:

 CF: UserMessages
 Key: ReceiverID
 Columns : column name = TimeUUID ; column value = message ID and Body

 That will order the messages by time.

 Depending on load (and to support deleting a previous months messages)
 you may want to partition the rows by month:

 CF: UserMessagesMonth
 Key: ReceiverID+MM
 Columns : column name = TimeUUID ; column value = message ID and Body

 Everything the same as before. But now a user has a row for each month
 and which you can delete as a whole. This also helps avoid very big rows.

 I really don't think that storage will be an issue, I have 2TB per nodes,
 messages are 1KB limited.

 I would suggest you keep the per node limit to 300 to 400 GB. It can take
 a long time to compact, repair and move the data when it gets above 400GB.

 Hope that helps.

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:

 Hi everyone !

 I'm fairly new to cassandra and I'm not quite yet familiarized with
 column oriented NoSQL model.
 I have worked a while on it, but I can't seems to find the best model for
 what I'm looking for.

 I have a Erlang software that let user connecting and communicate with
 each others, when an user (A) sends
 a message to a disconnected user (B), it stores it on the database and
 wait for the user (B) to connect and retrieve
 the message queue, and deletes it.

 Here's some key point :
 - Users are identified by integer IDs
 - Each message are unique by combination of : Sender ID - Receiver ID -
 Message ID - time

 I have a queue Message, and here's the operations I would need to do as
 fast as possible :

 - Store from 1 to X messages per registered user
 - Get the number of stored messages per user (Can be a incremental
 variable updated at each store // this is often retrieved)
 - retrieve all messages from an user at once.
 - delete all messages from an user at once.
 - delete all messages that are older than Y months (from all users).

 I really don't think that storage will be an issue, I have 2TB per nodes,
 messages are 1KB limited.
 I'm really looking for speed rather than storage optimization.

 My configuration is 2 dedicated server which are both :
 - 4 x Intel i7 2.66 Ghz
 - 64 bits
 - 24 Go
 - 2 TB

 Thank you all.








Re: timezone time series data model

2012-04-30 Thread samal
Thanks Tyler for the reply.

Are you saying user1uuid_*{ts%86400}* would lead to a unique day bucket
that is timezone (NZ to US) independent? I will try.

On Mon, Apr 30, 2012 at 8:25 PM, Tyler Hobbs ty...@datastax.com wrote:

 Don't use dates or datestamps as the buckets for your row keys, use a unix
 timestamp modulo whatever size you want your bucket to be instead.
 Timestamps don't involve time zones or any of that nonsense.

 So, instead of having keys like user1uuid_30042012, the second half
 would be replaced by the current unix timestamp mod 86400 (the number of
 seconds in a day).


 On Mon, Apr 30, 2012 at 1:46 AM, samal samalgo...@gmail.com wrote:

 Hello List,

 I need suggestion/ recommendation on time series data.

 I have requirement where users belongs to different timezone and they can
 subscribe to global group.
 When users at specific timezone send update to group it is available to
 every user in different timezone.

 I am using GroupSubscribedUsers CF where all update to group are push to
 Each User time line, and key is timelined by useruuid_date(one day update
 of all groups) and columns are group updates.

 GroupSubscribedUsers ={
 user1uuid_30042012:{//this user belongs to same timezone
  timeuuid1:JSON[group1update1]
  timeuuid2:JSON[group2update2]
  timeuuid3:JSON[group1update2]
 timeuuid4:JSON[group4update1]
},
   user2uuid_30042012:{//this user belongs to different timezone where
 date has changed already  to 1may but  30 april is getting update
  timeuuid1:JSON[group1update1]
  timeuuid2:JSON[group2update2]
  timeuuid3:JSON[group1update2]
 timeuuid4:JSON[group4update1]
 timeuuid5:JSON[groupNupdate1]
},

 }

 I have noticed  this approach is good for single time zone when different
 timezone come into picture it breaks.

 I am thinking of like when user pushed update to group -get user who is
 subscribed to group-check user timezone-push time series in user time
 zone. So for one user update will be on 30april where as other may have on
 29april and 1may, using timestamps i can find out hours ago update came.

 Is there any better approach?


 Thanks,

 Samal





 --
 Tyler Hobbs
 DataStax http://datastax.com/




Re: timezone time series data model

2012-04-30 Thread samal
I did it with node.js, but it is changing after some interval.

code
setInterval(function(){
  var d = new Date().getTime(); // note: milliseconds since the epoch
  console.log("==");
  console.log("unix = ", d);
  i = parseInt(d);
  console.log("Divid i/86400 = ", i/86400);
  console.log("Modulo i%86400 = ", i%86400);
  console.log("==");
},2000);

/code
Am I doing something wrong?

On Mon, Apr 30, 2012 at 9:54 PM, Tyler Hobbs ty...@datastax.com wrote:

 Correct, that's exactly what I'm saying.


 On Mon, Apr 30, 2012 at 10:37 AM, samal samalgo...@gmail.com wrote:

 Thanks Tyler for the reply.

 Are you saying user1uuid_*{ts%86400}* would lead to a unique day bucket
 that is timezone (NZ to US) independent? I will try.


 On Mon, Apr 30, 2012 at 8:25 PM, Tyler Hobbs ty...@datastax.com wrote:

 Don't use dates or datestamps as the buckets for your row keys, use a
 unix timestamp modulo whatever size you want your bucket to be instead.
 Timestamps don't involve time zones or any of that nonsense.

 So, instead of having keys like user1uuid_30042012, the second half
 would be replaced by the current unix timestamp mod 86400 (the number of
 seconds in a day).


 On Mon, Apr 30, 2012 at 1:46 AM, samal samalgo...@gmail.com wrote:

 Hello List,

 I need suggestion/ recommendation on time series data.

 I have requirement where users belongs to different timezone and they
 can subscribe to global group.
 When users at specific timezone send update to group it is available to
 every user in different timezone.

 I am using GroupSubscribedUsers CF where all update to group are push
 to Each User time line, and key is timelined by useruuid_date(one day
 update of all groups) and columns are group updates.

 GroupSubscribedUsers ={
 user1uuid_30042012:{//this user belongs to same timezone
  timeuuid1:JSON[group1update1]
  timeuuid2:JSON[group2update2]
  timeuuid3:JSON[group1update2]
 timeuuid4:JSON[group4update1]
},
   user2uuid_30042012:{//this user belongs to different timezone where
 date has changed already  to 1may but  30 april is getting update
  timeuuid1:JSON[group1update1]
  timeuuid2:JSON[group2update2]
  timeuuid3:JSON[group1update2]
 timeuuid4:JSON[group4update1]
 timeuuid5:JSON[groupNupdate1]
},

 }

 I have noticed  this approach is good for single time zone when
 different timezone come into picture it breaks.

 I am thinking of like when user pushed update to group -get user who
 is subscribed to group-check user timezone-push time series in user time
 zone. So for one user update will be on 30april where as other may have on
 29april and 1may, using timestamps i can find out hours ago update came.

 Is there any better approach?


 Thanks,

 Samal





 --
 Tyler Hobbs
 DataStax http://datastax.com/





 --
 Tyler Hobbs
 DataStax http://datastax.com/




Re: timezone time series data model

2012-04-30 Thread samal
Thanks, I hadn't noticed.
I ran the script for 5 minutes: divide seems to produce a stable result,
while modulo is still changing. If divide is OK, it will do the trick.
I will run this script on the Singapore, East Coast, and New Delhi
servers all night today.

==
unix  =   1335806983422
unix /1000=   1335806983.422
Divid i/86400 =   15460.728969907408
Divid i/86400 INT =   15460
Modulo i%86400=   62983
==
==
unix  =   1335806985421
unix /1000=   1335806985.421
Divid i/86400 =   15460.72899306
Divid i/86400 INT =   15460
Modulo i%86400=   62985
==
==
unix  =   1335806987422
unix /1000=   1335806987.422
Divid i/86400 =   15460.729016203704
Divid i/86400 INT =   15460
Modulo i%86400=   62987
==
==
unix  =   1335806989422
unix /1000=   1335806989.422
Divid i/86400 =   15460.729039351852
Divid i/86400 INT =   15460
Modulo i%86400=   62989
==
==
unix  =   1335806991421
unix /1000=   1335806991.421
Divid i/86400 =   15460.7290625
Divid i/86400 INT =   15460
Modulo i%86400=   62991
==
==
unix  =   1335806993422
unix /1000=   1335806993.422
Divid i/86400 =   15460.729085648149
Divid i/86400 INT =   15460
Modulo i%86400=   62993
==
==
unix  =   1335806995422
unix /1000=   1335806995.422
Divid i/86400 =   15460.729108796297
Divid i/86400 INT =   15460
Modulo i%86400=   62995
==
==
unix  =   1335806997421
unix /1000=   1335806997.421
Divid i/86400 =   15460.72913195
Divid i/86400 INT =   15460
Modulo i%86400=   62997
==
==
unix  =   1335806999422
unix /1000=   1335806999.422
Divid i/86400 =   15460.729155092593
Divid i/86400 INT =   15460
Modulo i%86400=   62999
==
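
Reading the log above (a sketch, with values taken from the first block):

var ms = 1335806983422;          // "unix" in the log: milliseconds
var ts = Math.floor(ms / 1000);  // 1335806983 seconds since the epoch
Math.floor(ts / 86400);          // 15460 -> day index, stable for a whole UTC day
ts % 86400;                      // 62983 -> seconds elapsed within the day, always moving
// So the integer quotient is the day bucket; the bare modulo is not.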


On Mon, Apr 30, 2012 at 10:44 PM, Tyler Hobbs ty...@datastax.com wrote:

 getTime() returns the number of milliseconds since the epoch, not the
 number of seconds: http://www.w3schools.com/jsref/jsref_gettime.asp

 If you divide that number by 1000, it should work.


 On Mon, Apr 30, 2012 at 11:28 AM, samal samalgo...@gmail.com wrote:

 I did it with node.js, but it is changing after some interval.

 code
 setInterval(function(){
   var d = new Date().getTime();
   console.log("==");
   console.log("unix = ", d);
   i = parseInt(d);
   console.log("Divid i/86400 = ", i/86400);
   console.log("Modulo i%86400 = ", i%86400);
   console.log("==");
 },2000);

 /code
 Am I doing something wrong?


 On Mon, Apr 30, 2012 at 9:54 PM, Tyler Hobbs ty...@datastax.com wrote:

 Correct, that's exactly what I'm saying.


 On Mon, Apr 30, 2012 at 10:37 AM, samal samalgo...@gmail.com wrote:

 Thanks Tyler for the reply.

 Are you saying user1uuid_*{ts%86400}* would lead to a unique day bucket
 that is timezone (NZ to US) independent? I will try.


 On Mon, Apr 30, 2012 at 8:25 PM, Tyler Hobbs ty...@datastax.comwrote:

 Don't use dates or datestamps as the buckets for your row keys, use a
 unix timestamp modulo whatever size you want your bucket to be instead.
 Timestamps don't involve time zones or any of that nonsense.

 So, instead of having keys like user1uuid_30042012, the second half
 would be replaced by the current unix timestamp mod 86400 (the number of
 seconds in a day).


 On Mon, Apr 30, 2012 at 1:46 AM, samal samalgo...@gmail.com wrote:

 Hello List,

 I need suggestion/ recommendation on time series data.

 I have requirement where users belongs to different timezone and they
 can subscribe to global group.
 When users at specific timezone send update to group it is available
 to every user in different timezone.

 I am using GroupSubscribedUsers CF where all update to group are push
 to Each User time line, and key is timelined by useruuid_date(one day
 update of all groups) and columns are group updates.

 GroupSubscribedUsers ={
 user1uuid_30042012:{//this user belongs to same timezone
  timeuuid1:JSON[group1update1]
  timeuuid2:JSON[group2update2]
  timeuuid3:JSON[group1update2]
 timeuuid4:JSON[group4update1]
},
   user2uuid_30042012:{//this user belongs to a different timezone where the
 date has already changed to 1 May, but 30 April is still getting updates
  timeuuid1:JSON[group1update1]
  timeuuid2:JSON[group2update2]
  timeuuid3:JSON[group1update2]
 timeuuid4:JSON[group4update1]
 timeuuid5:JSON[groupNupdate1]
},

 }

 I have noticed  this approach is good for single time zone when
 different timezone come into picture it breaks.

 I am thinking of like when user

Re: timezone time series data model

2012-04-30 Thread samal
hhmm. I will try both. thanks

On Mon, Apr 30, 2012 at 11:29 PM, Tyler Hobbs ty...@datastax.com wrote:

 Err, sorry, I should have said ts - (ts % 86400).  Integer division does
 something similar.
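
A quick sketch of why the two forms are interchangeable as bucket keys:

var ts = 1335806983;      // seconds since the epoch
ts - (ts % 86400);        // 1335744000 -> first second of that UTC day
Math.floor(ts / 86400);   // 15460      -> the same day, as an index
// Both stay constant across one UTC day, so either works as the key suffix.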


 On Mon, Apr 30, 2012 at 12:39 PM, samal samalgo...@gmail.com wrote:

 Thanks, I hadn't noticed.
 I ran the script for 5 minutes: divide seems to produce a stable result,
 while modulo is still changing. If divide is OK, it will do the trick.
 I will run this script on the Singapore, East Coast, and New Delhi
 servers all night today.

 ==
 unix  =   1335806983422
 unix /1000=   1335806983.422
 Divid i/86400 =   15460.728969907408
 Divid i/86400 INT =   15460
 Modulo i%86400=   62983
 ==
 ==
 unix  =   1335806985421
 unix /1000=   1335806985.421
 Divid i/86400 =   15460.72899306
 Divid i/86400 INT =   15460
 Modulo i%86400=   62985
 ==
 ==
 unix  =   1335806987422
 unix /1000=   1335806987.422
 Divid i/86400 =   15460.729016203704
 Divid i/86400 INT =   15460
 Modulo i%86400=   62987
 ==
 ==
 unix  =   1335806989422
 unix /1000=   1335806989.422
 Divid i/86400 =   15460.729039351852
 Divid i/86400 INT =   15460
 Modulo i%86400=   62989
 ==
 ==
 unix  =   1335806991421
 unix /1000=   1335806991.421
 Divid i/86400 =   15460.7290625
 Divid i/86400 INT =   15460
 Modulo i%86400=   62991
 ==
 ==
 unix  =   1335806993422
 unix /1000=   1335806993.422
 Divid i/86400 =   15460.729085648149
 Divid i/86400 INT =   15460
 Modulo i%86400=   62993
 ==
 ==
 unix  =   1335806995422
 unix /1000=   1335806995.422
 Divid i/86400 =   15460.729108796297
 Divid i/86400 INT =   15460
 Modulo i%86400=   62995
 ==
 ==
 unix  =   1335806997421
 unix /1000=   1335806997.421
 Divid i/86400 =   15460.72913195
 Divid i/86400 INT =   15460
 Modulo i%86400=   62997
 ==
 ==
 unix  =   1335806999422
 unix /1000=   1335806999.422
 Divid i/86400 =   15460.729155092593
 Divid i/86400 INT =   15460
 Modulo i%86400=   62999
 ==


 On Mon, Apr 30, 2012 at 10:44 PM, Tyler Hobbs ty...@datastax.com wrote:

 getTime() returns the number of milliseconds since the epoch, not the
 number of seconds: http://www.w3schools.com/jsref/jsref_gettime.asp

 If you divide that number by 1000, it should work.


 On Mon, Apr 30, 2012 at 11:28 AM, samal samalgo...@gmail.com wrote:

 I did it with node.js, but it is changing after some interval.

 code
 setInterval(function(){
   var d = new Date().getTime();
   console.log("==");
   console.log("unix = ", d);
   i = parseInt(d);
   console.log("Divid i/86400 = ", i/86400);
   console.log("Modulo i%86400 = ", i%86400);
   console.log("==");
 },2000);

 /code
 Am I doing something wrong?


 On Mon, Apr 30, 2012 at 9:54 PM, Tyler Hobbs ty...@datastax.comwrote:

 Correct, that's exactly what I'm saying.


 On Mon, Apr 30, 2012 at 10:37 AM, samal samalgo...@gmail.com wrote:

 Thanks Tyler for the reply.

 Are you saying user1uuid_*{ts%86400}* would lead to a unique day
 bucket that is timezone (NZ to US) independent? I will try.


 On Mon, Apr 30, 2012 at 8:25 PM, Tyler Hobbs ty...@datastax.comwrote:

 Don't use dates or datestamps as the buckets for your row keys, use
 a unix timestamp modulo whatever size you want your bucket to be 
 instead.
 Timestamps don't involve time zones or any of that nonsense.

 So, instead of having keys like user1uuid_30042012, the second
  half would be replaced by the current unix timestamp mod 86400 (the
  number of seconds in a day).


 On Mon, Apr 30, 2012 at 1:46 AM, samal samalgo...@gmail.com wrote:

 Hello List,

 I need suggestion/ recommendation on time series data.

 I have requirement where users belongs to different timezone and
 they can subscribe to global group.
 When users at specific timezone send update to group it is
 available to every user in different timezone.

 I am using GroupSubscribedUsers CF where all update to group are
 push to Each User time line, and key is timelined by 
 useruuid_date(one
 day update of all groups) and columns are group updates.

 GroupSubscribedUsers ={
 user1uuid_30042012:{//this user belongs to same timezone
  timeuuid1:JSON[group1update1]
  timeuuid2:JSON[group2update2]
  timeuuid3:JSON[group1update2]
 timeuuid4:JSON[group4update1]
},
   user2uuid_30042012:{//this user belongs to a different timezone where the
 date has already changed to 1 May, but 30 April is still getting updates
  timeuuid1

Re: Cassandra and harddrives

2012-04-25 Thread samal
Each node needs its own HDD for its copies of the data; you can't share one
drive among the nodes.

On Thu, Apr 26, 2012 at 8:52 AM, Benny Rönnhager 
benny.ronnha...@thrutherockies.com wrote:

 Hi!

 I am building a database with several hundred thousand images. I
 have just learned that HAProxy is a very good frontend to a couple of
 Cassandra nodes. I understand how that works but...

 Must every single node (mac mini) have its own external hard drive with
 the same data (images), or can I just use one hard drive that can be
 accessed by all nodes?

 What is the recommended way to do this?

 Thanks in advance.

 Benny



Re: How to store a list of values?

2012-03-27 Thread samal
Yeah! Agreed, it only matters for time-bucketed data.

On Tue, Mar 27, 2012 at 12:31 PM, R. Verlangen ro...@us2.nl wrote:

 That's true, but it does not sound like a real problem to me... Maybe
 someone else can shed some light on this.


 2012/3/27 samal samalgo...@gmail.com



 On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote:

 but any schema change will break it 

 How do you mean? You don't have to specify the columns in Cassandra, so it
 should work perfectly. Except that the skill~ prefix is reserved for your list.


 In case skill~ is decided to change to skill::, it needs to be handled at the
 app level. Otherwise you have to update every row: read it first, modify it,
 insert the new version, and delete the old version.




 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl




Re: How to store a list of values?

2012-03-26 Thread samal
I would take a simple approach: create another CF, UserSkill, with the same
row key as the profile CF's key.
In the UserSkill CF, add each skill as a column name with a null value.
Columns can be added or removed.

UserProfile={
  '*ben*'={
   blah :blah
   blah :blah
   blah :blah
 }
}

UserSkill={
  '*ben*'={
'java':''
'cassandra':''
  .
  .
  .
  'linux':''
  'skill':'infinity'
 }

}


On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote:

 I have a profile column family and want to store a list of skills in each
 profile.  In BigTable I could store a Protocol 
 Bufferhttp://code.google.com/apis/protocolbuffers/docs/overview.htmlwith a 
 repeated field, but I'm not sure how this is typically accomplished
 in Cassandra.  One option would be to store a serialized 
 Thrifthttp://thrift.apache.org/or protobuf, but I'd prefer not to do this 
 as I believe Cassandra doesn't
 have knowledge of these formats, and so the data in the datastore would not
 be human readable in CQL queries from the command line.  The other
 solution I thought of would be to use a super column and put a random UUID
 as the key for each skill:

 skills: {
   '4b27c2b3ac48e8df': 'java',
   '84bf94ea7bc92018': 'c++',
   '9103b9a93ce9d18': 'cobol'
 }

 Is this a good way of handling lists in Cassandra?  I imagine there's some
 idiom I'm not aware of.  I'm using the 
 Astyanaxhttps://github.com/Netflix/astyanax/wikiclient library, which only 
 supports composite columns instead of super
 columns, and so the solution I proposed above would seem quite awkward in
 that case.  Though I'm still having some trouble understanding composite
 columns as they seem not to be completely documented yet.  Would this
 solution work with composite columns?

 Thanks,
 Ben




Re: How to store a list of values?

2012-03-26 Thread samal
plus it is fully compatible with CQL.
SELECT * FROM UserSkill WHERE KEY='ben';

On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote:

 I would take a simple approach: create another CF, UserSkill, with the same
 row key as the profile CF's key.
 In the UserSkill CF, add each skill as a column name with a null value.
 Columns can be added or removed.

 UserProfile={
   '*ben*'={
blah :blah
blah :blah
blah :blah
  }
 }

 UserSkill={
   '*ben*'={
 'java':''
 'cassandra':''
   .
   .
   .
   'linux':''
   'skill':'infinity'

  }

 }


 On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote:

 I have a profile column family and want to store a list of skills in each
 profile.  In BigTable I could store a Protocol 
 Bufferhttp://code.google.com/apis/protocolbuffers/docs/overview.htmlwith a 
 repeated field, but I'm not sure how this is typically accomplished
 in Cassandra.  One option would be to store a serialized 
 Thrifthttp://thrift.apache.org/or protobuf, but I'd prefer not to do this 
 as I believe Cassandra doesn't
 have knowledge of these formats, and so the data in the datastore would not
 be human readable in CQL queries from the command line.  The other
 solution I thought of would be to use a super column and put a random UUID
 as the key for each skill:

 skills: {
   '4b27c2b3ac48e8df': 'java',
   '84bf94ea7bc92018': 'c++',
   '9103b9a93ce9d18': 'cobol'
 }

 Is this a good way of handling lists in Cassandra?  I imagine there's
 some idiom I'm not aware of.  I'm using the 
 Astyanaxhttps://github.com/Netflix/astyanax/wikiclient library, which only 
 supports composite columns instead of super
 columns, and so the solution I proposed above would seem quite awkward in
 that case.  Though I'm still having some trouble understanding composite
 columns as they seem not to be completely documented yet.  Would this
 solution work with composite columns?

 Thanks,
 Ben





Re: How to store a list of values?

2012-03-26 Thread samal
On Mon, Mar 26, 2012 at 9:20 PM, Ben McCann b...@benmccann.com wrote:

 Thanks for the reply Samal.



  I did not realize that you could store a column with null value.

Values can be null or anything at all, like:
[default@node] set hus['test']['wowq']='\{de\'.de\;\}\+\^anything';
Value inserted.
Elapsed time: 4 msec(s).
[default@node]
[default@node]
[default@node] get hus['test'];
= (column=wow, value={de.de;}, timestamp=133222503000)
= (column=wowq, value={de'.de;}+^anything, timestamp=133267425000)
Returned 2 results.
Elapsed time: 65 msec(s).
[default@node]


  Do you know if this solution would work with composite columns?  It seems
 super columns are being phased out in favor of composites, but I do not
 understand composites very well yet.

Personally, I phased out super columns a year back; I haven't dug much into
composite columns, but I know both the key and the column name can be composite.

'ben'+'task1'={
   utf8+ascii:''
}


  I'm trying to figure out if there's any way to accomplish what you've
 suggested using Astyanax https://github.com/Netflix/astyanax.

 This is the simplest approach and should work with every client available,
since it is an independent CF; two calls are required here.


 Thanks for the help,
 Ben


 On Mon, Mar 26, 2012 at 8:46 AM, samal samalgo...@gmail.com wrote:

 plus it is fully compatible with CQL.
 SELECT * FROM UserSkill WHERE KEY='ben';


 On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote:

 I would take simple approach. create one other CF UserSkill  with row
 key same as profile_cf key,
 In user_skill cf will add skill as column name and value null. Columns
 can be added or removed.

 UserProfile={
   '*ben*'={
blah :blah
blah :blah
blah :blah
  }
 }

 UserSkill={
   '*ben*'={
 'java':''
 'cassandra':''
   .
   .
   .
   'linux':''
   'skill':'infinity'

  }

 }


 On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote:

 I have a profile column family and want to store a list of skills in
 each profile.  In BigTable I could store a Protocol 
 Bufferhttp://code.google.com/apis/protocolbuffers/docs/overview.htmlwith 
 a repeated field, but I'm not sure how this is typically accomplished
 in Cassandra.  One option would be to store a serialized 
 Thrifthttp://thrift.apache.org/or protobuf, but I'd prefer not to do 
 this as I believe Cassandra doesn't
 have knowledge of these formats, and so the data in the datastore would not
 be human readable in CQL queries from the command line.  The other
 solution I thought of would be to use a super column and put a random UUID
 as the key for each skill:

 skills: {
   '4b27c2b3ac48e8df': 'java',
   '84bf94ea7bc92018': 'c++',
   '9103b9a93ce9d18': 'cobol'
 }

 Is this a good way of handling lists in Cassandra?  I imagine there's
 some idiom I'm not aware of.  I'm using the 
 Astyanaxhttps://github.com/Netflix/astyanax/wikiclient library, which 
 only supports composite columns instead of super
 columns, and so the solution I proposed above would seem quite awkward in
 that case.  Though I'm still having some trouble understanding composite
 columns as they seem not to be completely documented yet.  Would this
 solution work with composite columns?

 Thanks,
 Ben







Re: How to store a list of values?

2012-03-26 Thread samal
 Save the skills in a single column in json format.  Job done.

Good if it has a fixed set of skills; any add or delete change then needs to
be handled in the app: read the column first, reformat the JSON, update the
column (2 thrift calls).

 skill~Java: null,
 skill~Cassandra: null
This is also a good option, but any schema change will break it.


On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote:

 True.  But I don't need the skills to be searchable, so I'd rather embed
 them in the user than add another top-level CF.  I was thinking of doing
 something along the lines of adding a skills super column to the User table:

 skills: {
   'java': null,
   'c++': null,
   'cobol': null
 }

 However, I'm still not sure yet how to accomplish this with Astyanax.
  I've only figured out how to make composite columns with predefined column
 names with it and not dynamic column names like this.



 On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote:

 In this case you only need the columns for values. You don't need the
 column values to hold multiple columns (the super-column principle). So a
 normal CF would work.


 2012/3/26 Ben McCann b...@benmccann.com

 Thanks for the reply Samal.  I did not realize that you could store a
 column with null value.  Do you know if this solution would work with
 composite columns?  It seems super columns are being phased out in favor of
 composites, but I do not understand composites very well yet.  I'm trying
 to figure out if there's any way to accomplish what you've suggested using
 Astyanax https://github.com/Netflix/astyanax.

 Thanks for the help,
 Ben


 On Mon, Mar 26, 2012 at 8:46 AM, samal samalgo...@gmail.com wrote:

 plus it is fully compatible with CQL.
 SELECT * FROM UserSkill WHERE KEY='ben';


 On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote:

 I would take simple approach. create one other CF UserSkill  with
 row key same as profile_cf key,
 In user_skill cf will add skill as column name and value null.
 Columns can be added or removed.

 UserProfile={
   '*ben*'={
blah :blah
blah :blah
blah :blah
  }
 }

 UserSkill={
   '*ben*'={
 'java':''
 'cassandra':''
   .
   .
   .
   'linux':''
   'skill':'infinity'

  }

 }


 On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.comwrote:

 I have a profile column family and want to store a list of skills in
 each profile.  In BigTable I could store a Protocol 
 Bufferhttp://code.google.com/apis/protocolbuffers/docs/overview.htmlwith
  a repeated field, but I'm not sure how this is typically accomplished
 in Cassandra.  One option would be to store a serialized 
 Thrifthttp://thrift.apache.org/or protobuf, but I'd prefer not to do 
 this as I believe Cassandra doesn't
  have knowledge of these formats, and so the data in the datastore would not
  be human readable in CQL queries from the command line.  The other
 solution I thought of would be to use a super column and put a random 
 UUID
 as the key for each skill:

 skills: {
   '4b27c2b3ac48e8df': 'java',
   '84bf94ea7bc92018': 'c++',
   '9103b9a93ce9d18': 'cobol'
 }

 Is this a good way of handling lists in Cassandra?  I imagine
 there's some idiom I'm not aware of.  I'm using the 
 Astyanaxhttps://github.com/Netflix/astyanax/wikiclient library, which 
 only supports composite columns instead of super
 columns, and so the solution I proposed above would seem quite awkward 
 in
 that case.  Though I'm still having some trouble understanding composite
 columns as they seem not to be completely documented yet.  Would this
 solution work with composite columns?

 Thanks,
 Ben







 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





Re: How to store a list of values?

2012-03-26 Thread samal
On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote:

  but any schema change will break it 

 How do you mean? You don't have to specify the columns in Cassandra, so it
 should work perfectly. Except that the skill~ prefix is reserved for your list.


 In case skill~ is decided to change to skill::, it needs to be handled at the
app level. Otherwise you have to update every row: read it first, modify it,
insert the new version, and delete the old version.


Re: Re: Cassandra DataModeling recommendations

2011-12-05 Thread samal
On Mon, Dec 5, 2011 at 3:06 PM, pco...@cegetel.net wrote:

 Hi
 Thanks for the answer, as I read the book on Cassandra, I was not aware at
 that time on Composite Key which I recently discovered.


*Composite types are useful for handling data versions.*

 You mentioned a TTL and letting the database remove the data for me. I
 never read about that. Is it possible without an external batch?


*Yes: TTL, if set on a column, auto-deletes the column for you.*


 I will try to rephrase in any case my goal:

 Storage:
 - I would like to store for a user (identified by its id) several carts
 (BLOB).



 - Associated to these carts, I would like to attach metadata like
 expiration date and possibly others.

 Queries/tasks:
 - I would like to be able to retrieve all the carts of a given userId.


*I would use a timeline with TTL for the carts as a separate CF, and a cart_id
reverse index in the userId CF with TTL set on the columns.*

- I would like to have a means to remove expired carts.

 *Set a TTL on each column.*

1.
cartCF = {
  cart1_uuidkey: {
    metadata_column: ttl
  }
  cart2_uuidkey: {
    metadata_column: ttl
  }
  .
  .
  .
  cartN_uuidkey: {
    metadata_column: ttl
  }
}

2.
userIdCF = {
  user1: {
    id: user1  // hack: to prevent unwanted behavior, one column with no TTL
    cart1: cart1_uuidkey, ttl
    cart2: cart2_uuidkey, ttl
    cart3: cart3_uuidkey, ttl
  }
  user2: {
    id: user2
    cart1: cartX_uuidkey, ttl
    cart2: cart4_uuidkey, ttl
    cart3: cartM_uuidkey, ttl
  }
}

/Samal


Re: node.js library?

2011-12-05 Thread samal
On Mon, Dec 5, 2011 at 7:59 PM, Norman Maurer
norman.mau...@googlemail.comwrote:

 As far as I know it's the library that was developed by Rackspace.

 See
 https://github.com/racker/node-cassandra-client



*No longer maintained; it has moved to a separate project on apache-extras.*


2011/12/5 Joe Stein crypt...@gmail.com

 Hey folks, so I have been noodling on using node.js as a new front end for
 the system I built for doing real time aggregate metrics within our
 distributed systems.

 Does anyone have experience or background story on this lib?
 http://code.google.com/a/apache-extras.org/p/cassandra-node/ it seems to
 be the most up to date one supporting CQL only (which should not be an
 issue) but was not sure if it is maintained or what the background story is
 on it and such?

 Any other experiences/horror stories/over-the-rainbow type stories with
 node.js and C* would be nice to hear.

 /*
 Joe Stein
 http://www.linkedin.com/in/charmalloc
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
 */



Re: Setting Key Validation Class

2011-12-05 Thread samal
key_validation_class is different from validation_class;
validations are BytesType by default.

key_validation_class = validates the row key
default_validation_class = validates column values
comparator = validates and orders column names

default_validation_class is the global scope of validation_class.


On Mon, Dec 5, 2011 at 10:10 PM, Dinusha Dilrukshi
sdddilruk...@gmail.comwrote:

 Hi,

 I am using apache-cassandra-1.0.0 and I tried to insert/retrieve data in a
 column family using cassandra-jdbc program.
 Here is how I created 'USER' column family using cassandra-cli.

 create column family USER with comparator=UTF8Type
 and column_metadata=[{column_name: user_id, validation_class: UTF8Type,
 index_type: KEYS},
 {column_name: username, validation_class: UTF8Type, index_type: KEYS},
 {column_name: password, validation_class: UTF8Type}];

 But, when i try to insert data to USER column family it gives the error
 java.sql.SQLException: Mismatched types: java.lang.String cannot be cast
 to java.nio.ByteBuffer.

 Since I have set user_id as a KEY and it's validation_class as UTF8Type, I
 was expected Key Validation Class as UTF8Type.
 But when I look at the meta-data of USER column family it shows as Key
 Validation Class: org.apache.cassandra.db.marshal.BytesType which has
 cause for the above error.

 When I created USER column family as follows, it solves the above issue.

 create column family USER with comparator=UTF8Type and
 key_validation_class=UTF8Type
 and column_metadata=[{column_name: user_id, validation_class: UTF8Type,
 index_type: KEYS},
 {column_name: username, validation_class: UTF8Type, index_type: KEYS},
 {column_name: password, validation_class: UTF8Type}];

 Do we always need to define *key_validation_class* as in the above query
 ? Isn't it not enough to add validation classes for each column ?

 Regards,
 ~Dinusha~




Re: OutOfMemory Exception during bootstrap

2011-12-04 Thread samal
Lower your heap size if you are testing multiple instances on a single
machine.

https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh#L64


On Sun, Dec 4, 2011 at 11:08 PM, Harald Falzberger
h.falzber...@gmail.comwrote:

 Hi,

 I'm trying to set up a test environment with 2 nodes on one physical
 machine with two IPs. I configured both as advised in the
 documentation:

 cluster_name: 'MyDemoCluster'
 initial_token: 0
 seed_provider:
 - seeds: IP1
 listen_address: IP1
 rpc_address: IP1

 cluster_name: 'MyDemoCluster'
 initial_token: 85070591730234615865843651857942052864
 seed_provider:
 - seeds: IP1
 listen_address: IP2
 rpc_address: IP2

 Node1 uses 7199 as JMX port, Node2 7198 because JMX by default is
 listening on all interfaces.

 When I bootstrap node2, on node1 following exception is thrown and
 node1 terminates. the same error occurs again if I try to restart
 node1 and node2 is still running.

 Does anyone of you have an idea why this happens? I'm starting each
 cassandra instance with 16GB RAM and my database is empty.

 Exception on Node1
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:597)
 at
 java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
 at
 java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1384)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:77)
 at
 org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.<init>(JMXEnabledThreadPoolExecutor.java:65)
 at
 org.apache.cassandra.concurrent.StageManager.multiThreadedStage(StageManager.java:58)
 at
 org.apache.cassandra.concurrent.StageManager.<clinit>(StageManager.java:44)
 at
 org.apache.cassandra.net.MessagingService.receive(MessagingService.java:512)
 at
 org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:159)



Re: Seeking advice on Schema and Caching

2011-11-16 Thread samal
 Edanuff + Beautiful People

I think the row cache could be the best fit, but it can take resources
depending on row size. It will only touch disk once (the first time) for an
SSTable read; the rest of the requests for that row will be served from
memory. Try increasing the row cache size and decreasing the save period to
an appropriate value, e.g. *row cache size / save period in seconds:* 200/30.
One catch: this is only good for small rows. Since each of your rows contains
all entries sharing the same first 3 chars, one row could become very large
while others remain very thin.
e.g. many people can have the name aditya:
adi{
{tya,1}
.
.
}

but only a few people will have a name starting with x or y.


On Thu, Nov 17, 2011 at 3:29 AM, Aditya ady...@gmail.com wrote:

 Thanks to samal, who pointed me to composite columns. I am now
 using composite column names containing username+userId and a valueless
 column. Column names are now unique even for users with the same name, as
 the userId is attached to the composite column name as well. Thus the
 supercolumn issue is resolved.
 But I am still seeking some advice on the caching strategy for these rows.
 While a user is doing a search, the DB will be queried multiple
 times, because I'm not keeping the retrieved columns in the application
 layer. Thus I am thinking of caching this row so that the further queries
 are served through the cache. The important point here is that I am
 using very few resources for this cache, so that rows remain in cache
 only for a very short time, serving the needs of a single search
 interval, like max 30 seconds. Is this approach correct? That way I
 won't be putting unnecessary data in the cache for a long time, saving
 resources for other needs.


 On Wed, Nov 16, 2011 at 11:20 AM, samal samalgo...@gmail.com wrote:

 I think you can, but I am not sure; I haven't tried that yet. There is no
 harm in keeping the value too: it will be read in the single query anyway.

 In the 2nd case, yes, 2 or more queries are required to get a specific
 user's details, as the username maps to the user_id key (unique, like a
 UUID) and the user_id key stores the actual details.


 On Wed, Nov 16, 2011 at 11:10 AM, Aditya Narayan ady...@gmail.comwrote:

 Regarding the first option that you suggested through composite columns,
 can I store the username  id both in the column name and keep the column
 valueless?
 Will I be able to retrieve both the username and id from the composite
 col name ?

 Thanks a lot

 On Wed, Nov 16, 2011 at 10:56 AM, Aditya Narayan ady...@gmail.comwrote:

 Got the first option that you suggested.

 However, In the second one, are you suggested to use, for e.g,
 key='Marcos'  store cols, for all users of that name, containing userId
 inside that row. That way it would have to read multiple rows while user is
 doing a single search.


 On Wed, Nov 16, 2011 at 10:47 AM, samal samalgo...@gmail.com wrote:


   I need to add 'search users' functionality to my application. (The
 trigger for fetching searched items(like google instant search) is made
 when 3 letters have been typed in).
 
  For this, I make a CF with String type keys. Each such key is made
 of first 3 letters of a user's name.
 
  Thus all names starting with 'Mar-' are stored in single row (with
 key=Mar).
  The column names are framed as remaining letters of the names.
 Thus, a name 'Marcos' will be stored within rowkey Mar  col name 
 cos.
 The id will be stored as column value. Since there could be many users 
 with
 same name. Thus I would have multple userIds(of users named Marcos) 
 to be
 stored inside columnname cos under key Mar. Thus,
 
  1. Supercolumn seems to be a better fit for my use case(so that
 ids of users with same name may fit as sub-columns inside a 
 super-column)
 but since supercolumns are not encouraged thus I want to use an 
 alternative
 schema for this usecase if possible. Could you suggest some ideas on 
 this ?
 


 Aditya,

 Have you any given thought on Composite columns [1]. I think it can
 help you solve your problem of multiple user with same name.

 mar:{
   {cos,unique_user_id}:unique_user_id,
   {cos,1}:1,
   {cos,2}:2,
   {cos,3}:3,

 //  {utf8,timeUUID}:timeUUID,
 }
 OR
 you can try wide rows indexing user name to ID's

 marcos{
user1:' ',
user2:' ',
user3:' '
 }

 [1]http://www.slideshare.net/edanuff/indexing-in-cassandra








Re: Seeking advice on Schema and Caching

2011-11-15 Thread samal
   I need to add 'search users' functionality to my application. (The
 trigger for fetching searched items(like google instant search) is made
 when 3 letters have been typed in).
 
  For this, I make a CF with String type keys. Each such key is made of
 first 3 letters of a user's name.
 
  Thus all names starting with 'Mar-' are stored in single row (with
 key=Mar).
  The column names are framed as remaining letters of the names. Thus, a
 name 'Marcos' will be stored within rowkey Mar  col name cos. The id
 will be stored as column value. Since there could be many users with same
  name. Thus I would have multiple userIds (of users named Marcos) to be
 stored inside columnname cos under key Mar. Thus,
 
  1. Supercolumn seems to be a better fit for my use case(so that ids of
 users with same name may fit as sub-columns inside a super-column) but
 since supercolumns are not encouraged thus I want to use an alternative
 schema for this usecase if possible. Could you suggest some ideas on this ?
 


Aditya,

Have you given any thought to composite columns [1]? I think they can help
you solve your problem of multiple users with the same name.

mar:{
  {cos,unique_user_id}:unique_user_id,
  {cos,1}:1,
  {cos,2}:2,
  {cos,3}:3,

//  {utf8,timeUUID}:timeUUID,
}
OR
you can try wide rows indexing user name to ID's

marcos{
   user1:' ',
   user2:' ',
   user3:' '
}

[1]http://www.slideshare.net/edanuff/indexing-in-cassandra


Re: Seeking advice on Schema and Caching

2011-11-15 Thread samal
I think you can, but I am not sure; I haven't tried that yet. There is no
harm in keeping the value too: it will be read in the single query anyway.

In the 2nd case, yes, 2 or more queries are required to get a specific user's
details, as the username maps to the user_id key (unique, like a UUID) and
the user_id key stores the actual details.

On Wed, Nov 16, 2011 at 11:10 AM, Aditya Narayan ady...@gmail.com wrote:

 Regarding the first option that you suggested through composite columns,
 can I store the username  id both in the column name and keep the column
 valueless?
 Will I be able to retrieve both the username and id from the composite col
 name ?

 Thanks a lot

 On Wed, Nov 16, 2011 at 10:56 AM, Aditya Narayan ady...@gmail.com wrote:

 Got the first option that you suggested.

 However, In the second one, are you suggested to use, for e.g,
 key='Marcos'  store cols, for all users of that name, containing userId
 inside that row. That way it would have to read multiple rows while user is
 doing a single search.


 On Wed, Nov 16, 2011 at 10:47 AM, samal samalgo...@gmail.com wrote:


   I need to add 'search users' functionality to my application. (The
 trigger for fetching searched items(like google instant search) is made
 when 3 letters have been typed in).
 
  For this, I make a CF with String type keys. Each such key is made
 of first 3 letters of a user's name.
 
  Thus all names starting with 'Mar-' are stored in single row (with
 key=Mar).
  The column names are framed as remaining letters of the names. Thus,
 a name 'Marcos' will be stored within rowkey Mar  col name cos. The 
 id
 will be stored as column value. Since there could be many users with same
 name. Thus I would have multple userIds(of users named Marcos) to be
 stored inside columnname cos under key Mar. Thus,
 
  1. Supercolumn seems to be a better fit for my use case(so that ids
 of users with same name may fit as sub-columns inside a super-column) but
 since supercolumns are not encouraged thus I want to use an alternative
 schema for this usecase if possible. Could you suggest some ideas on this 
 ?
 


 Aditya,

 Have you any given thought on Composite columns [1]. I think it can help
 you solve your problem of multiple user with same name.

 mar:{
   {cos,unique_user_id}:unique_user_id,
   {cos,1}:1,
   {cos,2}:2,
   {cos,3}:3,

 //  {utf8,timeUUID}:timeUUID,
 }
 OR
 you can try wide rows indexing user name to ID's

 marcos{
user1:' ',
user2:' ',
user3:' '
 }

 [1]http://www.slideshare.net/edanuff/indexing-in-cassandra






Re: Apache Cassandra Hangout in Mumbai-Pune area (India)

2011-11-13 Thread samal
Let's catch up. I am available in Mumbai.
Using C* in a dev env. Would love to share or hear experiences.

On Fri, Nov 11, 2011 at 10:25 PM, Adi adi.pan...@gmail.com wrote:

 Hey GeekTalks/any other cassandra users around Mumbai/Pune,

 I will be around Mumbai from last week of Nov through Third week of
 December. I have actively used/deployed a couple of cassandra clusters
 and a bunch of hadoop projects over the past year. I am keenly
 interested in meeting any cassandra/hadoop users and sharing my
 experience.
 Do get in touch with me if any of you would like to host a meetup/user
 group meeting.

 -Adi



 On Mon, Mar 21, 2011 at 9:02 AM, Geek Talks geektalks@gmail.com
 wrote:
  Hi,
 
  Anyone interested joining in Apache Cassandra hangout/meetup nearby
  mumbai-pune area.
 
   Share/teach your exp with Apache Cassandra, problems/issue you faced
 during
  deployment.
   Excited and heard about its buzz, want to learn more about NoSQL
 cassandra.
 
  Regards,
  GeekTalks
 



Re: Cassandra Certification

2011-08-14 Thread samal
Does it really make sense?
If yes, I think the Apache Cassandra project (ASF) should offer an open
certification. Other entities can offer courses and training materials.


Re: 5 node cluster - Recommended seed configuration.

2011-08-08 Thread samal
It is recommended that the seed list be the same on all servers, so all
servers share the same view of the cluster.
Seeds should be LAN IPs, not loopback IPs.
On a seed node, auto bootstrap should be false.
2 seeds should be enough.

In your case it should be like:
node1:
seeds: node1, autobootstrap=false
node2:
seeds: node1,autobootstrap=true
node3:
seeds: node1, autobootstrap=true
node4:
seeds: node1, autobootstrap=true
node5:
seeds: node1, autobootstrap=true

or

node1:
seeds: node1,node2, autobootstrap=false
node2:
seeds: node1,node2,autobootstrap=false (set it false after bootstrap)
node3:
seeds: node1,node2, autobootstrap=true
node4:
seeds: node1, node2,autobootstrap=true
node5:
seeds: node1,node2, autobootstrap=true

/Samal


On Tue, Aug 9, 2011 at 9:16 AM, Selva Kumar wwgse...@yahoo.com wrote:

 We have a 5 node Cassandra cluster. We use version 0.7.4. What is the
 recommended seed configuration? Here are some configurations I have
 noticed.

 Example 1:
 ---
 One node being seed to itself.
 node1:
 seeds: node1, autobootstrap=false
 node2:
 seeds: node1, node3, autobootstrap=true
 node3:
 seeds: node2, node4, autobootstrap=true
 node4:
 seeds: node3, node5, autobootstrap=true
 node5:
 seeds: node1, node2, autobootstrap=true

 Example 2:
 ---
 node1:
  seeds: node5,node2, autobootstrap=true
 node2:
 seeds: node1, node3, autobootstrap=true
 node3:
 seeds: node2, node4, autobootstrap=true
 node4:
 seeds: node3, node5, autobootstrap=true
 node5:
 seeds: node1, node2, autobootstrap=true

 Thanks
 Selva





Re: Sample Cassandra project in Tomcat

2011-08-03 Thread samal
I don't know much about this, but these may help you:

http://www.codefreun.de/apolloUI/
http://www.codefreun.de/apollo/



On Wed, Aug 3, 2011 at 3:36 PM, CASSANDRA learner 
cassandralear...@gmail.com wrote:

 Hi,

 Can anyone please send me a sample application (.war) implemented in
 Java/JSP with Cassandra as the DB (Tomcat)?



Re: Installation Exception

2011-08-03 Thread samal
Did you compile the source code? :)
You have downloaded the source code, not the binary.

Try the binary instead.
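
For example (same mirror as in your steps below; the binary tarball just
swaps -src for -bin, assuming the mirror carries it):

# sudo wget http://apache.mivzakim.net//cassandra/0.8.2/apache-cassandra-0.8.2-bin.tar.gz
# sudo tar xvfz apache-cassandra-0.8.2-bin.tar.gz
# cd apache-cassandra-0.8.2
# bin/cassandra -f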

On Wed, Aug 3, 2011 at 9:14 PM, Eldad Yamin elda...@gmail.com wrote:

 Hi,
 I'm trying to install Cassandra on Amazon EC2 without success, this is what
 I did:

1. Created new Small EC2 instance (this is just for testing), running
Ubuntu OS - custom AIM (ami-596f3c1c) from:
http://uec-images.ubuntu.com/releases/11.04/release/
2. Installed Java:
# sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
# sudo apt-get update
# sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts
openjdk-6-jre
3. Upgraded:
# sudo apt-get upgrade
4. Downloaded Cassandra:
# cd /usr/src/
# sudo wget

 http://apache.mivzakim.net//cassandra/0.8.2/apache-cassandra-0.8.2-src.tar.gz

# sudo tar xvfz apache-cassandra-*
# cd apache-cassandra-*
5. Config (according to README.txt)
# sudo mkdir -p /var/log/cassandra
# sudo chown -R `whoami` /var/log/cassandra
# sudo mkdir -p /var/lib/cassandra
# sudo chown -R `whoami` /var/lib/cassandra
6. RUN CASSANDRA
# bin/cassandra -f

 Then I got an exception:
 ubuntu@ip-10-170-31-128:/usr/src/apache-cassandra-0.8.2-src$
 bin/cassandra -f
 Exception in thread main java.lang.NoClassDefFoundError:
 org/apache/cassandra/thrift/CassandraDaemon
 Caused by: java.lang.ClassNotFoundException:
 org.apache.cassandra.thrift.CassandraDaemon
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
 Could not find the main class: org.apache.cassandra.thrift.CassandraDaemon.
 Program will exit.


 Any idea what is wrong?
 Thanks!



Re: Nodetool ring not showing all nodes in cluster

2011-08-02 Thread samal
 ERROR 08:53:47,678 Internal error processing batch_mutate
 java.lang.IllegalStateException: replication factor (3) exceeds number
 of endpoints (1)

You already answered it yourself:
"It always keeps showing only one node and mentions that it is handling 100%
of the load."



 On Tue, Aug 2, 2011 at 7:21 AM, Aishwarya Venkataraman
 cyberai...@gmail.com wrote:
  Replies inline.
 
  Thanks,
  Aishwarya
 
  On Tue, Aug 2, 2011 at 7:12 AM, Sorin Julean sorin.jul...@gmail.com
 wrote:
  Hi,
 
   Until someone answers with more details, a few questions:
   1. did you move the system keyspace as well?
  Yes. But I deleted the LocationInfo* files under the system folder.
  Shall I go ahead and delete the entire system folder?
 
   2. is the gossip IP of the new nodes the same as the old ones?
  No. The IP is different.
 
   3. which cassandra version are you running?
  I am using 0.8.1
 
 
  If 1. is yes and 2. is no, for a quick fix: take down the cluster,
 remove
  system keyspace, bring the cluster up and bootstrap the nodes.
 
 
  Kind regards,
  Sorin
 
 
  On Tue, Aug 2, 2011 at 2:53 PM, Aishwarya Venkataraman
  cyberai...@gmail.com wrote:
 
  Hello,
 
  I recently migrated 400 GB of data that was on a different cassandra
  cluster (3 node with RF= 3) to a new cluster. I have a 3 node  cluster
  with replication factor set to three. When I run nodetool ring, it
  does not show me all the nodes in the cluster. It always keeps showing
  only one node and mentions that it is handling 100% of the load. But
  when I look at the logs, the nodes are able to talk to each other via
  the gossip protocol. Why does this happen ? Can you tell me what I am
  doing wrong ?
 
  Thanks,
  Aishwarya
 
 
 



Re: Nodetool ring not showing all nodes in cluster

2011-08-02 Thread samal
ERROR 08:53:47,678 Internal error processing batch_mutate
 java.lang.IllegalStateException: replication factor (3) exceeds number
 of endpoints (1)

 You already answered it yourself:
 "It always keeps showing only one node and mentions that it is handling
 100% of the load."


The cluster thinks only one node is present in the ring, so it does not
accept RF=3; it is expecting RF=1.

Original Q: I am not exactly sure what the problem is, but:

Does nodetool ring show all the hosts?
What is your seed list?
Does the bootstrapped node have its own IP in its seed list?

AFAIK gossip works even without actively joining a ring.







Re: Read process

2011-07-27 Thread samal
From the ROW CACHE first (if enabled); on a miss, the MEMTABLE and SSTABLEs
are read and merged, with the KEY CACHE speeding up SSTable index lookups.
The commit log is never read during normal operation; it is only replayed
after a crash.
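
A rough pseudocode sketch of that order (my own simplification, not actual
Cassandra internals):

def read(key, column):
    if row_cache_enabled and key in row_cache:
        return row_cache[key][column]           # whole row served from memory
    results = [memtable.get(key, column)]       # recent, unflushed writes
    for sstable in sstables:
        # the key cache remembers the row's position in the SSTable index
        pos = key_cache.get((sstable, key)) or sstable.index_lookup(key)
        results.append(sstable.read_at(pos, column))
    return newest_by_timestamp(results)         # the commit log is never read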

On Wed, Jul 27, 2011 at 1:19 PM, CASSANDRA learner 
cassandralear...@gmail.com wrote:

 Hi,

  I have one doubt regarding reads. The data will be stored in the
 commitlog, memtable, and SSTables, right? While reading, the data may be
 available in all three, so where does the read happen from? The commit log,
 the memtable, or the SSTables? Please explain, friends.

 Thnks



Re: Cassandra training in Bangalore, India

2011-07-21 Thread samal
To my knowledge, there is no such expert training available in India as of
now.
As Sameer said, there is enough online material available from which you can
learn. I have been playing with Cassandra since the beginning. We can plan a
meetup/learning session in the Mumbai/Pune region.


Re: Memtables stored in which location

2011-07-21 Thread samal
SSTables are stored on disk, not memtables.

A memtable is the in-memory representation of data; it is flushed to create
an SSTable on disk.

This is the location where SSTables are stored:
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L71

The commit log, which is the backup (log) used to replay memtables after a
crash, is stored in the location configured at
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L75

Once all memtables are flushed to disk, a new commit log segment is created.

On Thu, Jul 21, 2011 at 1:12 PM, Abdul Haq Shaik 
abdulsk.cassan...@gmail.com wrote:

 Hi,

 Can you please let me know where exactly the memtables are getting stored.
 I wanted to know the physical location



Re: Memtables stored in which location

2011-07-21 Thread samal
Anyway, the memtable has to be stored somewhere, right? Like we say, memtable
data is flushed to create an SSTable on disk.

 Exactly which location or memory does it come from? Is it like an object
 stream, or is it storing the values in the commitlog?


A Memtable is Cassandra's in-memory representation of key/value pairs.


 My next question is: data is written to the commit log, so all the data is
 available there, and the SSTables are getting created on disk. Then where
 and when do these memtables come into the picture?


The commit log is an append-only file which records writes sequentially
(more at [2]); it can be thought of as a recovery file, used to rebuild the
data in the memtables in case of a crash.
A write first hits the *CommitLog*, then Cassandra stores/writes the values
to in-memory data structures called Memtables. The Memtables are flushed to
disk whenever one of the configurable thresholds is met. [3]
http://wiki.apache.org/cassandra/MemtableThresholds
For each column family there is a corresponding memtable.
There is generally one commit log for all CFs.

SSTables are immutable: once written to disk they cannot be modified. An
SSTable is only replaced by a new SSTable after compaction.


[1]http://wiki.apache.org/cassandra/ArchitectureOverview
[2]http://wiki.apache.org/cassandra/ArchitectureCommitLog
[3]http://wiki.apache.org/cassandra/MemtableThresholds
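
A rough pseudocode sketch of the write path described above (my own
simplification, not actual Cassandra internals):

def write(cf, key, column, value):
    commitlog.append(cf, key, column, value)     # sequential, durable append
    memtables[cf].put(key, column, value)        # one memtable per column family
    if memtables[cf].exceeds_threshold():        # see MemtableThresholds [3]
        sstables[cf].add(memtables[cf].flush())  # new immutable SSTable on disk
        memtables[cf] = Memtable()               # start a fresh memtable
        commitlog.discard_flushed_segments()     # old segments no longer needed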


Re: Range query ordering with CQL JDBC

2011-07-18 Thread samal
I haven't used the CQL functionality much, but I have used the Thrift client.

I think what I encounter is exactly this problem!

If you want to query over keys, you can index the keys in another CF, get the
column names (each of which is a key of the actual CF), and then query the
actual CF with those keys.

switch away from the random partitioner.

Switching away is not a good choice; RP is very good for load distribution.
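
A minimal pycassa sketch of that indexing idea (the keyspace and the CF
names 'Items' and 'ItemsByKey' are made up):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
data = pycassa.ColumnFamily(pool, 'Items')
index = pycassa.ColumnFamily(pool, 'ItemsByKey')

def put_item(key, columns):
    data.insert(key, columns)
    index.insert('all_items', {key: ''})   # column name = key of the data CF

def range_query(start, finish):
    # columns come back sorted by the comparator, so the slice is ordered
    keys = index.get('all_items', column_start=start,
                     column_finish=finish, column_count=1000).keys()
    return data.multiget(list(keys))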


Re: gossiper problem

2011-07-14 Thread samal
Well, I am not a JVM guru, but it seems the server has a memory problem.


 13 10:44:57,748 Gossiper.java (line 579) InetAddress /10.63.61.74 is now
 UP

  INFO [Timer-0] 2011-07-13 15:56:44,630 Gossiper.java (line 181)
 InetAddress /10.63.61.74 is now dead.

  INFO [GMFD:1] 2011-07-13 15:56:44,653 Gossiper.java (line 579) InetAddress
 /10.63.61.74 is now UP

  INFO [Timer-0] 2011-07-13 16:03:24,391 Gossiper.java (line 181)
 InetAddress /10.63.61.72 is now dead.


It is swapping due to memory pressure. Recommended: disable swap. It is
better to die with an OOM than to keep swapping.
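
To turn swap off on Linux (standard commands, nothing Cassandra-specific):

sudo swapoff -a
# and comment out the swap entry in /etc/fstab so it stays off after reboot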


INFO [GC inspection] 2011-07-13 03:12:06,153 GCInspector.java (line 110) GC
 for ConcurrentMarkSweep: 1097 ms, 371528920 reclaimed leaving 17677528 used;
 max is 118784

  INFO [GC inspection] 2011-07-13 03:12:07,351 GCInspector.java (line 110)
 GC for ParNew: 466 ms, 20619976 reclaimed leaving 157240232 used; max is
 118784

  INFO [GC inspection] 2011-07-13 03:25:54,378 GCInspector.java (line 110)
 GC for ParNew: 283 ms, 26850072 reclaimed leaving 154180424 used; max is
 118784

  INFO [GC inspection] 2011-07-13 06:29:58,092 GCInspector.java (line 110)
 GC for ParNew: 538 ms, 17358792 reclaimed leaving




My cassandra version is 0.6.3, and the configuration about gc in
 storage_conf.xml is

 <GCGraceSeconds>864000</GCGraceSeconds>




 JVM configuration is as following:

 JVM_OPTS= \

 -ea \

 -Xms256M \

 -Xmx1G \

 -XX:+UseParNewGC \



 Can I decrease the JVM_OPTS to -Xms128M -Xmx512M to avoid swapping? The
 data saved in cassandra is small; I do not need so much memory.


Reducing the max heap size won't solve the problem; I think it will cause
more swapping.
Data size is not the only factor in the memory requirement: the number of
memtables matters too (each CF has a separate memtable with its own size),
plus compaction, caching and reads.


You should upgrade to 0.7 or later.


/samal


Re: Key_Cache @ Row_Cache

2011-07-13 Thread samal

 Can you give me a bit of an idea of how key_cache and row_cache affect the
 performance of Cassandra? How do these things work in different scenarios
 depending on the data size?

  While reading, if the row cache is enabled, Cassandra checks the row cache
first, then the key cache, memtable and disk.

The row cache stores entire rows in memory and needs tuning; a lower value is
generally preferred.
The key cache stores only the key and the location of the row in memory; a
higher value is preferred.

If a row is frequently read, it is good to cache it, but row size matters:
large rows can eat too much memory.
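
For example, in cassandra-cli (attribute names as of 0.7/0.8; the CF name is
made up):

[default@test] update column family Users with keys_cached=200000 and rows_cached=1000;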

Also this may help
http://www.datastax.com/docs/0.8/operations/cache_tuning#configuring-key-and-row-caches

/Samal


Re: One node down but it thinks its fine...

2011-07-13 Thread samal
Check that the seed IP is the same on all nodes and that it is not a loopback
IP anywhere in the cluster.

On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski ray.slakin...@gmail.comwrote:

 One of our nodes, which happens to be the seed, thinks it's Up and all the
 other nodes are down. However, all the other nodes think the seed is down
 instead. The logs for the seed node show everything is running as it should
 be. I've tried restarting the node, turning on/off gossip and thrift, and
 nothing seems to get the node to see the rest of its ring as up and running.
 I have also tried restarting one of the other nodes, which had no effect on
 the situation. Below are the ring outputs for the seed and one other node in
 the ring, plus a ping to show that the seed can ping the other node.

 # bin/nodetool -h 0.0.0.0 ring
 Address Status State Load Owns Token
  141784319550391026443072753096570088105
 127.0.0.1 Up Normal 4.61 GB 16.67% 0
 xx.xxx.30.210 Down Normal ? 16.67% 28356863910078205288614550619314017621
 xx.xx.90.87 Down Normal ? 16.67% 56713727820156410577229101238628035242
 xx.xx.22.236 Down Normal ? 16.67% 85070591730234615865843651857942052863
 xx.xx.97.96 Down Normal ? 16.67% 113427455640312821154458202477256070484
 xx.xxx.17.122 Down Normal ? 16.67% 141784319550391026443072753096570088105


 # ping xx.xxx.30.210
 PING xx.xxx.30.210 (xx.xxx.30.210) 56(84) bytes of data.
 64 bytes from xx.xxx.30.210: icmp_req=1 ttl=61 time=0.299 ms
 64 bytes from xx.xxx.30.210: icmp_req=2 ttl=61 time=0.287 ms
 ^C
 --- xx.xxx.30.210 ping statistics ---
 2 packets transmitted, 2 received, 0% packet loss, time 999ms
 rtt min/avg/max/mdev = 0.287/0.293/0.299/0.006 ms


 # bin/nodetool -h xx.xxx.30.210 ring
 Address Status State Load Owns Token
  141784319550391026443072753096570088105
 xx.xxx.23.40 Down Normal ? 16.67% 0
 xx.xxx.30.210 Up Normal 10.58 GB 16.67%
 28356863910078205288614550619314017621
 xx.xx.90.87 Up Normal 10.47 GB 16.67%
 56713727820156410577229101238628035242
 xx.xx.22.236 Up Normal 9.63 GB 16.67%
 85070591730234615865843651857942052863
 xx.xx.97.96 Up Normal 10.68 GB 16.67%
 113427455640312821154458202477256070484
 xx.xxx.17.122 Up Normal 10.18 GB 16.67%
 141784319550391026443072753096570088105

 --
 Ray Slakinski





Re: CQL + Counters = bad request

2011-07-13 Thread samal

  cqlsh> UPDATE RouterAggWeekly SET 1310367600 = 1310367600 + 17 WHERE
  KEY = '1_20110728_ifoutmulticastpkts';
  Bad Request: line 1:51 no viable alternative at character '+'


I am able to insert it.
___

cqlsh
cqlsh> UPDATE counts SET 1310367600 = 1310367600 + 17 WHERE KEY =
'1_20110728_ifoutmulticastpkts';
cqlsh> UPDATE counts SET 1310367600 = 1310367600 + 17 WHERE KEY =
'1_20110728_ifoutmulticastpkts';
cqlsh>
_
[default@test] list counts;
Using default limit of 100
---
RowKey: 1_20110728_ifoutmulticastpkts
=> (counter=12, value=16)
=> (counter=1310367600, value=34)
---
RowKey: 1
=> (counter=1, value=10)

2 Rows Returned.
[default@test]


Re: Storing counters in the standard column families along with non-counter columns ?

2011-07-10 Thread samal
Yes, maybe in 0.8.2.

The current version needs the specific validation class CounterColumnType for
a counter CF, which only counts [+, -, never replaces], whereas a normal CF
simply adds or replaces values.
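
For example, creating a counter CF in cassandra-cli (0.8 syntax, from
memory):

[default@test] create column family counts with
default_validation_class=CounterColumnType and replicate_on_write=true;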


On Sun, Jul 10, 2011 at 10:39 PM, Aditya Narayan ady...@gmail.com wrote:

 Thanks for info.

 Is there any target version in near future for which this has been promised
 ?


 On Sun, Jul 10, 2011 at 9:12 PM, Sasha Dolgy sdo...@gmail.com wrote:

 No, it's not possible.

 To achieve it, there are two options ... contribute to the issue or
 wait for it to be resolved ...

 https://issues.apache.org/jira/browse/CASSANDRA-2614

 -sd

 On Sun, Jul 10, 2011 at 5:04 PM, Aditya Narayan ady...@gmail.com wrote:
  Is it now possible to store counters in the standard column families
 along
  with non counter type columns ? How to achieve this ?





Re: 4k keyspaces... Maybe we're doing it wrong?

2010-09-11 Thread samal gorai
Lots of memtables means lots of SSTables, which means lots of disk IO.

On 9/7/10, Benjamin Black b...@b3k.us wrote:
 On Mon, Sep 6, 2010 at 12:41 AM, Janne Jalkanen
 janne.jalka...@ecyrd.com wrote:

 So if I read this right, using lots of CF's is also a Bad Idea(tm)?


 Yes, "lots of memtables is bad" means lots of CFs is also bad.


-- 
Sent from my mobile device


Re: servers for cassandra

2010-09-04 Thread samal gorai
As of now I think only rackspace.com supports Cassandra in their cloud web
hosting, which will cost around $150 to $200 a month. There is no cheap
option with Cassandra because data is distributed across multiple servers.
I advise you to test on your LAN only. You can do benchmark testing to
simulate real conditions.
I use 64-bit Linux (Ubuntu) with 4 GB RAM, which is more than sufficient to
play around.
___
Samal Gorai

On Sat, Sep 4, 2010 at 12:05 PM, vineet daniel vineetdan...@gmail.comwrote:

 Hi

 I am just curious to know if there is any hosting company that provides
 servers at a very low cost, wherein I can install Cassandra on a WAN. I have
 a Cassandra setup on my LAN and want to test it in real conditions; taking
 dedicated servers just for testing purposes is not at all feasible for me,
 not even the pay-as-you-go types. I'd really appreciate it if anybody can
 share information on such hosting providers.

 Vineet Daniel
 Cell  : +918106217121
 Websites :
 Blog http://vinetedaniel.blogspot.com  |  LinkedIn
 http://in.linkedin.com/in/vineetdaniel  |  Twitter
 https://twitter.com/vineetdaniel






Re: Riptano Cassandra training in Denver

2010-09-01 Thread samal gorai
It will be great.

Samal Gorai

On Thu, Sep 2, 2010 at 10:46 AM, vineet daniel vineetdan...@gmail.comwrote:

 Hi Jonathan

 Any plans of coming to India in future ?

 ___
 Regards
 Vineet Daniel
 +918106217121
 ___

 Let your email find you


 On Thu, Sep 2, 2010 at 1:52 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Riptano is going to be in Denver next Friday (Sept 10) for a full-day
 Cassandra training (taught by yours truly).  The training is broken
 into two parts: the first covers application design and modeling in
 Cassandra, with exercises using the Pycassa library; the second covers
 operations, troubleshooting, and performance tuning.

 For more details or to register for the training, see
 http://www.eventbrite.com/event/756085472

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com