Cassandra JDBC

2011-11-16 Thread Jone Lura
Hi,

I downloaded the cassandra-jdbc and built with maven.

And when I am trying to use it in my application I get an exception on the 
following code:

java.sql.Connection conn =
    DriverManager.getConnection("jdbc:cassandra://localhost:9160/MyKeyspace");

java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
…
Does anyone have an idea what it might be?

Best regards,

Jone





Re: memory problems still post- CASSANDRA-3492

2011-11-16 Thread Radim Kolar

Dne 15.11.2011 22:04, Mick Semb Wever napsal(a):

But another node (on the same machine but different cluster), even after
an upgrade to the staging 1.0.3 and a `nodetool scrub`, always soaks all
available memory (up to and plateau at 30G). In fact no cf there use
compression anymore.

I had a similar problem yesterday when running nodetool scrub on 1.0.3
while I was trying to convert -g- tables to the current format. There is
a memory leak in scrub. I do not use compression either.



  HintedHandoff  (active)1(pending)2 and it just seems to stay like that.

Is there a way to more closely monitor that active hinted handoff?

You can count the columns in the system table holding hints, but I got an OOM
every time I tried.



Can one hinted handoff be responsible for such heap?

No. It is scrub, because the heap increases after each sstable is processed.


Re: Cassandra JDBC

2011-11-16 Thread Nilabja Banerjee
Try this; it should work:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class InsertData {
    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // Load the Cassandra JDBC driver class.
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        // Connect to the keyspace "test" on localhost:9160.
        Connection jdbcConn = DriverManager
                .getConnection("jdbc:cassandra:/@localhost:9160/test");
        Statement stmt = jdbcConn.createStatement();
        ResultSet reset = stmt.executeQuery("select * from users");

        jdbcConn.close();
    }
}




Re: Fast lookups for userId to username and vice versa

2011-11-16 Thread Konstantin Naryshkin
Or just have two column families to do it: A CF idToName that has the
userIds as keys and the userName as the only column and a CF nameToId
that has the userNames as keys and the userId as the only column
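
The two-column-family idea above can be modelled in plain Java as a rough sketch. This is an in-memory stand-in, not Cassandra client code: the class name UserDirectory and its methods are hypothetical, and it assumes user names are unique, since nameToId holds a single userId per name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-memory stand-in for the two column families; not a Cassandra client.
final class UserDirectory {
    // Models CF idToName: row key = userId, single column = userName.
    private final Map<String, String> idToName = new HashMap<>();
    // Models CF nameToId: row key = userName, single column = userId.
    private final Map<String, String> nameToId = new HashMap<>();

    // Write both mappings together so the two lookups stay consistent.
    void put(String userId, String userName) {
        idToName.put(userId, userName);
        nameToId.put(userName, userId);
    }

    // Lookup by id: one read against the idToName model.
    Optional<String> nameFor(String userId) {
        return Optional.ofNullable(idToName.get(userId));
    }

    // Lookup by name: one read against the nameToId model.
    Optional<String> idFor(String userName) {
        return Optional.ofNullable(nameToId.get(userName));
    }

    public static void main(String[] args) {
        UserDirectory dir = new UserDirectory();
        dir.put("42", "marcos");
        System.out.println(dir.nameFor("42").orElse("?"));
        System.out.println(dir.idFor("marcos").orElse("?"));
    }
}
```

Each direction is a single key lookup, at the cost of writing every user twice.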

On Mon, Nov 14, 2011 at 03:50, chovatia jaydeep
chovatia_jayd...@yahoo.co.in wrote:
 Check if Cassandra secondary index meets your requirement.

 Thank you,
 Jaydeep
 
 From: Aklin_81 asdk...@gmail.com
 To: user user@cassandra.apache.org
 Sent: Sunday, 13 November 2011 12:32 PM
 Subject: Fast lookups for userId to username and vice versa

 I need to create a mapping from userId(s) to username(s) that provides
 fast lookups.
 I also need a mapping from username to userId in order to
 implement search functionality in my application.

 What could be a good strategy to implement this ? (I would welcome
 suggestions to use any new technologies if they are really worth for my
 case.)




Re: Fast lookups for userId to username and vice versa

2011-11-16 Thread Boris Yen
I think a secondary index could do the trick.

However, if you need to provide pagination, I would go for
Konstantin's solution.




CQL and subcolumns

2011-11-16 Thread Jone Lura
Hi,

I am trying to find out how to use CQL so that I can use cassandra-jdbc in my
application, and I have some questions.

I looked in the Cassandra Query Language (CQL) v2.0 documentation,
but I could not find answers to the following questions.

How do I create a column family with a sub column?

How do I insert values into a column with a sub column?

I am using Cassandra 1.0.2.

Best regards

Jone

Thanks for CQL

2011-11-16 Thread Peter Lin
I just wanted to say thanks to the entire Cassandra Team and Hector
client team for CQL.

I've been using it this week and it makes life easier. At first I had
mixed feelings on CQL, but after using it the last few days, the user
friendly factor makes a huge difference.

peter


Re: Thanks for CQL

2011-11-16 Thread Cass Costello
+1



sstableloader issue

2011-11-16 Thread mike.li
Hello ,

I need to load external SSTables into a cluster with 4 nodes. So I shut down
one of the nodes, created a separate folder on this node as a temporary
staging place for the external sstables, and ran the sstableloader command like:

./bin/sstableloader /cassandra/bulk_load/Timeseries

Starting client (and waiting 30 seconds for gossip) ...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 
Connection refused

Has anyone had the same issue before? How can I get around it?

Thank you,
Mike



Re: BulkLoader

2011-11-16 Thread Brandon Williams
On Mon, Nov 14, 2011 at 2:49 PM, Giannis Neokleous
gian...@generalsentiment.com wrote:
 Hello everyone,

 We're using the bulk loader to load data into Cassandra every day. The
 machines that run the bulk loader are different every day, so their IP
 addresses change. When I do describe cluster I see all the unreachable
 nodes that have been piling up over the past few days. Is there a way to remove
 those IP addresses without shutting down the whole cluster at the same time
 and restarting it?

 The unreachable nodes cause issues when we want to make schema changes to
 all the nodes or when we want to truncate a CF.

 Any suggestions?


It sounds like you're running into
https://issues.apache.org/jira/browse/CASSANDRA-3351 so the first step
would be to upgrade to a version that has it fixed.

Unfortunately, this won't solve the problem, just prevent it from
happening in the future.  To remove the old nodes, you can apply
https://issues.apache.org/jira/browse/CASSANDRA-3337 on one node and
call the JMX method for the unreachable endpoints.

-Brandon


Efficiency of Cross Data Center Replication...?

2011-11-16 Thread Brian Fleming
Hi All,

I have a question about inter-data centre replication : if you have 2 Data
Centers, each with a local RF of 2 (i.e. total RF of 4) and write to a node
in DC1, how efficient is the replication to DC2 - i.e. is that data :
 - replicated over to a single node in DC2 once and internally replicated
 or
 - replicated explicitly to two separate nodes?

Obviously from a LAN resource utilisation perspective, the former would be
preferable.

Many thanks,

Brian


Re: Efficiency of Cross Data Center Replication...?

2011-11-16 Thread Sylvain Lebresne
To be complete, https://issues.apache.org/jira/browse/CASSANDRA-3472
is relevant.

--
Sylvain

On Wed, Nov 16, 2011 at 9:40 PM, Jake Luciani jak...@gmail.com wrote:
 the former




Re: Efficiency of Cross Data Center Replication...?

2011-11-16 Thread Brian Fleming
Great - thanks Jake

B.




Re: Efficiency of Cross Data Center Replication...?

2011-11-16 Thread ehers...@gmail.com
On a related note - assuming there are available resources across the board
(cpu and memory on every node, low network latency, non-saturated
nics/circuits/disks), what's a reasonable expectation for timing on
replication? Sub-second? Less than five seconds?

Ernie






Re: Seeking advice on Schema and Caching

2011-11-16 Thread Aditya
Thanks to samal, who pointed me to composite columns. I am now
using composite column names containing username+userId, with a valueless
column. Column names are thus unique even for users with the same name, since
the userId is part of the composite column name. That resolves the
supercolumn issue.
But I am still seeking advice on the caching strategy for these rows.
While a user is doing a search, the DB will be queried multiple
times because I am not keeping the retrieved columns in the application
layer. So I am thinking of caching this row so that further queries
are served from the cache. The important point here is that I want to
use very few resources for this cache, so that rows remain in the cache
only for a very short time, just long enough to serve a single search,
say 30 seconds at most. Is this approach correct? That way I
won't be keeping unnecessary data in the cache for a long time, which saves
resources for other needs.
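
The composite-column layout described above can be sketched in plain Java as follows. This is only an in-memory model, not Cassandra client code: the class name NameIndexRow, the NUL separator, and the search method are hypothetical, chosen for illustration. It assumes columns inside a row are kept sorted, so that all names sharing the typed letters form one contiguous slice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// In-memory model of one row keyed by a 3-letter prefix; not a Cassandra client.
final class NameIndexRow {
    // Separator between the two components of the modelled composite name.
    private static final char SEP = '\u0000';
    // Each entry models a valueless column named (rest-of-name, userId).
    // The TreeSet keeps names sorted, mimicking sorted columns in a row.
    private final NavigableSet<String> columns = new TreeSet<>();

    void add(String restOfName, String userId) {
        columns.add(restOfName + SEP + userId);
    }

    // Return "restOfName:userId" for every column whose first component
    // starts with the letters typed after the 3-letter row key.
    List<String> search(String typedSuffix) {
        List<String> hits = new ArrayList<>();
        for (String col : columns.tailSet(typedSuffix, true)) {
            if (!col.startsWith(typedSuffix)) {
                break; // past the end of the matching slice
            }
            int i = col.indexOf(SEP);
            hits.add(col.substring(0, i) + ":" + col.substring(i + 1));
        }
        return hits;
    }

    public static void main(String[] args) {
        NameIndexRow mar = new NameIndexRow(); // models the row with key "mar"
        mar.add("cos", "1");
        mar.add("cos", "2");
        mar.add("cus", "9");
        System.out.println(mar.search("co")); // prints [cos:1, cos:2]
    }
}
```

Both users named Marcos come back from a single slice of one row, and the userId is recovered from the column name itself, so no column value is needed.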

On Wed, Nov 16, 2011 at 11:20 AM, samal samalgo...@gmail.com wrote:

 I think you can, but I am not sure; I haven't tried that yet. There is no
 harm in keeping the value as well; it will be read in a single query anyway.

 In the 2nd case, yes, 2 or more queries are required to get specific user
 details, since the username maps to the user_id key (unique, like a UUID)
 and the user_id key stores the actual details.


 On Wed, Nov 16, 2011 at 11:10 AM, Aditya Narayan ady...@gmail.com wrote:

 Regarding the first option that you suggested using composite columns,
 can I store both the username and the id in the column name and keep the
 column valueless?
 Will I be able to retrieve both the username and the id from the composite
 col name?

 Thanks a lot

 On Wed, Nov 16, 2011 at 10:56 AM, Aditya Narayan ady...@gmail.com wrote:

 Got the first option that you suggested.

 However, in the second one, are you suggesting to use, e.g.,
 key='Marcos' and store cols, for all users of that name, containing the userId
 inside that row? That way it would have to read multiple rows while the user is
 doing a single search.


 On Wed, Nov 16, 2011 at 10:47 AM, samal samalgo...@gmail.com wrote:


 I need to add 'search users' functionality to my application. (The
 trigger for fetching searched items, like Google instant search, fires
 when 3 letters have been typed in.)

 For this, I make a CF with String type keys. Each such key is made
 of the first 3 letters of a user's name.

 Thus all names starting with 'Mar-' are stored in a single row (with
 key=Mar).
 The column names are formed from the remaining letters of the names.
 Thus, a name 'Marcos' will be stored under rowkey Mar and col name cos.
 The id will be stored as the column value. Since there could be many users
 with the same name, I would have multiple userIds (of users named Marcos)
 to be stored inside column name cos under key Mar. Thus,

 1. A supercolumn seems to be a better fit for my use case (so that ids
 of users with the same name may fit as sub-columns inside a super-column),
 but since supercolumns are not encouraged I want to use an alternative
 schema for this use case if possible. Could you suggest some ideas on
 this?
 


 Aditya,

 Have you given any thought to composite columns [1]? I think they can
 help you solve your problem of multiple users with the same name.

 mar:{
   {cos,unique_user_id}:unique_user_id,
   {cos,1}:1,
   {cos,2}:2,
   {cos,3}:3,

 //  {utf8,timeUUID}:timeUUID,
 }
 OR
 you can try wide rows indexing user name to ID's

 marcos{
user1:' ',
user2:' ',
user3:' '
 }

 [1]http://www.slideshare.net/edanuff/indexing-in-cassandra







Re: Network traffic patterns

2011-11-16 Thread Todd Burruss
Do all of your machines have the same hardware? Since those machines are sending
data somewhere, maybe they are behind in replicating and are continuously
catching up?

Use a tool like tcpdump to find out where the data is going.

From: Philippe watche...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Tue, 15 Nov 2011 13:22:38 -0800
To: user user@cassandra.apache.org
Subject: Re: Network traffic patterns

Sorry about the previous message, I've enabled keyboard shortcuts on 
gmail...*sigh*...

Hello,
I'm trying to understand the network usage I am seeing in my cluster, can 
anyone shed some light?
It's an RF=3, 12-node, cassandra 0.8.6 cluster. repair is performed on each 
node once a week, with a rolling schedule.
The nodes are p13,p14,p15...p24 and are consecutive in that order on the ring. 
Each node is only a cassandra database. I am hitting the cluster from another 
server (p4).

p4 is doing this with 20 threads in parallel

 1.  read a lot of data (some columns for hundreds to tens of thousands of 
keys, split into 512-key multigets)
 2.  process the data
 3.  write back a byte array to cassandra (average size is 400 bytes)
 4.  go back to 1

According to my munin graphs, network usage is roughly as follows. I am not
surprised at the bias towards p13-p15, as p4 is getting and storing data mainly
for keys located on one of those nodes.

 *   p4 : 1.5Mb/s in and out
 *   p13-p15 : 15Mb/s in and 80Mb/s out
 *   p16-p24 : 45Mb/s in and 5Mb/s out

What I don't understand is why p4 is only seeing 1.5Mb/s while I see 80Mb/s on
p13-p15.

The way I understand this:

 *   p4 sends a multiget to the cluster, using any node in the
cluster as coordinator (IN traffic describing the query)
 *   coordinator node replays the query on all 3 replicas (so 3 servers each 
get the IN traffic, mostly p13-p15)
 *   each server replies to coordinator
 *   coordinator chooses matching values and sends back data to p4

So if p13-p15 are outputting 80Mb/s why am I not seeing 80Mb/s coming into p4 
which is on the receiving end ?

Thanks

2011/11/15 Philippe watche...@gmail.com
Hello,
I'm trying to understand the network usage I am seeing in my cluster, can 
anyone shed some light?
It's an RF=3, 12-node, cassandra 0.8.6 cluster. The nodes are p13,p14,p15...p24 
and are consecutive in that order on the ring.
Each node is only a cassandra database. I am hitting the cluster from another 
server (p4).

The pattern on p4 is to

 1.  read a lot of data (some columns for hundreds to tens of thousands of 
keys, split into 512-key multigets)
 2.  process the data
 3.  write back a byte array to cassandra (average size is 400 bytes)

p4 reads as



Re: Efficiency of Cross Data Center Replication...?

2011-11-16 Thread Jeremiah Jordan
I'm pretty sure data is sent to the coordinating node in DC2 at the same time it
is sent to the replicas in DC1, so I would expect tens of milliseconds on top of
the transport time to DC2.




Re: Seeking advice on Schema and Caching

2011-11-16 Thread samal
 Edanuff + Beautiful People

I think the row cache could be the best fit, but it can consume resources
depending on row size. It will only touch the disk (the SSTable) once, the
first time; the rest of the requests for that row will be served from memory.
Try increasing the row cache size and decreasing the save period to an
appropriate value.
*Row cache size / save period in seconds: *200/30
One catch: this is only good for small rows. Since each of your rows contains
all entries sharing the same first 3 characters, one row could
become very large while others remain very thin.
e.g.:
 many people can have the name aditya
adi{
{tya,1}
.
.
}

but only a few people will have names starting with x or y.



Re: Seeking advice on Schema and Caching

2011-11-16 Thread Aditya
On Thu, Nov 17, 2011 at 10:25 AM, samal samalgo...@gmail.com wrote:

  Edanuff + Beautiful People

 I think row cache could be the best fit but it can take resource
 depending on row size. It will only touch disk once (first time) in case of
 SST, rest of the req for that row will be served from memory. Try
 increasing row cache size and decreasing save period to appropriate value
 *Row cache size / save period in seconds: *200/30


Very nice. I didn't know that we could set the save period
as well. This makes the job easier. Now I can reduce the period to 30 sec and
set the row cache size to a good enough limit. Thanks :)

Yes, there may be rows that will be very wide. I'll need to figure out whether
I can do something better for that, but even this won't be a problem as long as
my cache period is reasonable and the cache size is set to a good limit, right?


About compile YCSB with Cassandra 1.02

2011-11-16 Thread Matsumoto, Miki | DU
Hi, 

I want to measure the performance of Cassandra 1.02 using YCSB.

But YCSB only supports Cassandra 0.7.

If anyone knows how to compile YCSB against Cassandra 1.02, or has any
tips, could you please share them with me? Thank you very much.

Regards
Miki


mmap I/O and shared memory

2011-11-16 Thread Jaesung Lee
I am running a 7-node Cassandra (v1.0.2) cluster.
I am writing 20K rows per second to the cluster.
This cluster has 1 KS and 3 CFs.
Each CF has 4-5 secondary indices.

After running for 1 week, the nodes started using swap.
I changed disk-access-mode to index_only or standard.
I got strange memory results.

 using mmap:
 VIRT: 566g  RES: 36g  SHR:12g
 standard disk access mode
 VIRT:24.7g  RES: 24g  SHR:68m

I allocated 24g memory for JVM heap.

I have some questions about mmap.
The memory result for standard disk access mode is easy to analyze.

I know Cassandra uses a huge amount of virtual memory for mmap I/O, and that
each mmapped address is mapped to an indexed file, not to swap memory.

But I don't understand why Cassandra uses shared memory when using mmap I/O.

Are there any documents that explain this situation?
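
For reference, memory-mapped file I/O on the JVM can be sketched as below. This is a minimal, hedged illustration of the mechanism, not Cassandra's own code: the class name MmapDemo and the roundTrip helper are hypothetical. The mapped region lives outside the Java heap and is backed by the file itself. On Linux, resident pages of file-backed mappings are generally counted in the SHR column of top, which may be what the numbers above reflect; treat that as an assumption to verify on your own system.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapDemo {
    // Map a file into memory, write bytes through the mapping, read them back.
    static String roundTrip(Path file, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping is file-backed and sits outside the Java heap.
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
            buf.put(bytes);  // writes go to the mapped pages
            buf.force();     // ask the OS to flush dirty pages to the file
            buf.flip();      // rewind to read what was just written
            byte[] back = new byte[bytes.length];
            buf.get(back);
            return new String(back, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        try {
            System.out.println(roundTrip(tmp, "hello mmap"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```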

-- 
Jaesung Lee



Dropped request...

2011-11-16 Thread Jeesoo Shin
Hello.
I'm using Cassandra 0.8.6.
nodetool tpstats shows statistics for dropped requests.

When drops happen, what can I do?
Is there a way to turn on debug messages, or something else to look into?

thanks.