Re: Cassandra and harddrives

2012-04-25 Thread Maki Watanabe
If your shared disk is fast enough to handle I/O requests from
multiple Cassandra nodes, it can work in theory. But the disk then becomes
a single point of failure in your system.
For optimal performance, each node should have at least 2 HDDs: one for
the commitlog and one for data.
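In cassandra.yaml that separation is just two settings pointing at different disks (the paths below are illustrative, not from the original post):

```yaml
# Commit log on its own disk: writes are sequential appends and latency-sensitive
commitlog_directory: /mnt/disk1/cassandra/commitlog

# Data files on a separate disk so compaction and read IO
# don't contend with commitlog appends
data_file_directories:
    - /mnt/disk2/cassandra/data
```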

maki

2012/4/26 samal samalgo...@gmail.com:
 Each node needs its own HDD for its copies of the data; it can't be shared with
 other nodes.


 On Thu, Apr 26, 2012 at 8:52 AM, Benny Rönnhager
 benny.ronnha...@thrutherockies.com wrote:

 Hi!

 I am building a database with several hundred thousand images. I
 have just learned that HAProxy is a very good frontend to a couple of
 Cassandra nodes. I understand how that works but...

 Must every single node (mac mini) have its own external harddrive with
 the same data (images) or can I just use one hard drive that can be
 accessed by all nodes?

 What is the recommended way to do this?

 Thanks in advance.

 Benny




Re: High latency of cassandra

2012-04-24 Thread Maki Watanabe
If you set the trace level for IncomingTcpConnection, the message "Version
is now ..." will be printed for every inter-Cassandra message received
by the node, including Gossip.
Enabling this log under high traffic will saturate the IO of your log disk by itself.

You should inspect nodetool tpstats and compactionstats first.
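For reference, the suggested checks look like this against a local node (host and default ports assumed):

```
# Thread pool stats: watch for non-zero Pending/Blocked counts on
# ReadStage, MutationStage, FlushWriter, etc.
nodetool -h localhost tpstats

# Compaction backlog: a steadily growing "pending tasks" count
# means compaction can't keep up with the write load
nodetool -h localhost compactionstats
```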

maki

2012/4/24 马超 hossc...@gmail.com:
 Does any one have idea for this? Thanks~


 2012/4/24 马超 hossc...@gmail.com

 Hi all,

 I have some troubles of cassandra in my production:

 I built an RPC server which uses the Hector client to manipulate
 Cassandra. Weird things happen nowadays: the RPC latency sometimes
 becomes very high (10~70 seconds) for several minutes, then drops
 back to the normal level (30 ms on average). I investigated the
 debug log of Cassandra. During the high-latency period, Cassandra outputs
 lots of messages like:
 "IncomingTcpConnection.java(116) Version is now 3."
 Everything seems to be blocked during that time.

 Our settings as following:
 The version of Cassandra is 1.0.1 and the Hector version is 0.7.0, for
 compatibility with the Thrift version we use (0.5.0).
 The cluster contains 4 nodes and all of them are seeds.
 gc_grace_seconds is 0 since we needn't delete the data.

 p.s. It worked well for a long time (3 months) but became unstable these
 days after we pushed the new RPC server, which saves bigger data
 items (2 MB on average). I'm not sure if this is the reason.

 Hope to get your reply~~

 Thanks,

 Chao.




Re: super column

2012-04-12 Thread Maki Watanabe
http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1

Hope this helps.
maki
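For context, a composite comparator packs what a super column family models as a (supercolumn, subcolumn) pair into a single column name. A hypothetical cassandra-cli sketch (the column family and types are made up for illustration):

```
create column family UserEvents
  with comparator = 'CompositeType(UTF8Type, UTF8Type)'
  and key_validation_class = UTF8Type
  and default_validation_class = UTF8Type;
```

Each column name is then a (category, event) pair, sorted first by category, which covers the common super-column use case without nesting.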

2012/4/12 puneet loya puneetl...@gmail.com:
 What are composite columns?

 A super column can just contain multiple columns under it.. will composite
 columns be useful?


 On Thu, Apr 12, 2012 at 3:50 PM, Paolo Bernardi berna...@gmail.com wrote:

 No.
 Maybe a super column can contain composite columns, but I'm not sure.

 Paolo

 On Apr 12, 2012 12:15 PM, puneet loya puneetl...@gmail.com wrote:

 Can a super column contain a super column?

 I mean is nesting of supercolumns possible?

 Regards,

 Puneet :)




Re: Why so many SSTables?

2012-04-10 Thread Maki Watanabe
You can configure the sstable size with the sstable_size_in_mb parameter for LCS.
The default value is 5 MB.
You should also check that you don't have many pending compaction tasks,
with nodetool tpstats and compactionstats.
If you have enough IO throughput, you can increase
compaction_throughput_mb_per_sec
in cassandra.yaml to reduce pending compactions.
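If you decide to raise the sstable size, a cassandra-cli sketch might look like this (the exact option syntax can vary between 1.0.x releases, so verify against your version before running it):

```
update column family Documents
  with compaction_strategy = 'LeveledCompactionStrategy'
  and compaction_strategy_options = {sstable_size_in_mb: 10};
```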

maki

2012/4/10 Romain HARDOUIN romain.hardo...@urssaf.fr:

 Hi,

 We are surprised by the number of files generated by Cassandra.
 Our cluster consists of 9 nodes and each node handles about 35 GB.
 We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
 We have 30 CF.

 We've got roughly 45,000 files under the keyspace directory on each node:
 ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
 44372

 The biggest CF is spread over 38,000 files:
 ls -l Documents* | wc -l
 37870

 ls -l Documents*-Data.db | wc -l
 7586

 Many SSTable are about 4 MB:

 19 MB - 1 SSTable
 12 MB - 2 SSTables
 11 MB - 2 SSTables
 9.2 MB - 1 SSTable
 7.0 MB to 7.9 MB - 6 SSTables
 6.0 MB to 6.4 MB - 6 SSTables
 5.0 MB to 5.4 MB - 4 SSTables
 4.0 MB to 4.7 MB - 7139 SSTables
 3.0 MB to 3.9 MB - 258 SSTables
 2.0 MB to 2.9 MB - 35 SSTables
 1.0 MB to 1.9 MB - 13 SSTables
 87 KB to  994 KB - 87 SSTables
 0 KB - 32 SSTables

 FYI here is CF information:

 ColumnFamily: Documents
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: org.apache.cassandra.db.marshal.BytesType
   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
   Row cache size / save period in seconds / keys to save : 0.0/0/all
   Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
   Key cache size / save period in seconds: 20.0/14400
   GC grace seconds: 1728000
   Compaction min/max thresholds: 4/32
   Read repair chance: 1.0
   Replicate on write: true
   Column Metadata:
     Column Name: refUUID (7265664944)
       Validation Class: org.apache.cassandra.db.marshal.BytesType
       Index Name: refUUID_idx
       Index Type: KEYS
   Compaction Strategy:
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy
   Compression Options:
     sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

 Is it a bug? If not, how can we tune Cassandra to avoid this?

 Regards,

 Romain


Re: leveled compaction - improve log message

2012-04-09 Thread Maki Watanabe
For details, open conf/log4j-server.properties and add the following configuration:

log4j.logger.org.apache.cassandra.db.compaction.LeveledManifest=DEBUG

fyi.

maki


2012/4/10 Jonathan Ellis jbel...@gmail.com:
 CompactionExecutor doesn't have level information available to it; it
 just compacts the sstables it's told to.  But if you enable debug
 logging on LeveledManifest you'd see what you want.  (Compaction
 candidates for L{} are {})

 2012/4/5 Radim Kolar h...@filez.com:
 it would be really helpfull if leveled compaction prints level into syslog.

 Demo:

 INFO [CompactionExecutor:891] 2012-04-05 22:39:27,043 CompactionTask.java
 (line 113) Compacting ***LEVEL 1***
 [SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19690-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19688-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19691-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19700-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19686-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19696-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19687-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19695-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19689-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19694-Data.db'),
 SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19693-Data.db')]

  INFO [CompactionExecutor:891] 2012-04-05 22:39:57,299 CompactionTask.java
 (line 221) *** LEVEL 1 *** Compacted to
 [/var/lib/cassandra/data/rapidshare/querycache-hc-19701-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19702-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19703-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19704-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19705-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19706-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19707-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19708-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19709-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19710-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19711-Data.db,].
  59,643,011 to 57,564,216 (~96% of original) bytes for 590,909 keys at
 1.814434MB/s.  Time: 30,256ms.





 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


Re: cassandra and .net

2012-04-09 Thread Maki Watanabe
Check your Cassandra log.
If you can't find anything interesting in it, set the Cassandra log level
to DEBUG and run your program again.

maki

2012/4/10 puneet loya puneetl...@gmail.com:
 hi,

 sorry, I posted the port as 7000. I'm using 9160 but still get the same
 error:

 Cannot read, Remote side has closed.

 Can you guess what's happening??

 On Tue, Apr 10, 2012 at 11:00 AM, Pierre Chalamet pie...@chalamet.net
 wrote:

 hello,

 9160 is probably the port to use if you use the default config.

 - Pierre

 On Apr 10, 2012, at 7:26 AM, puneet loya puneetl...@gmail.com wrote:

  using System;
  using System.Collections.Generic;
  using System.Linq;
  using System.Text;
  using Thrift.Collections;
  using Thrift.Protocol;
  using Thrift.Transport;
  using Apache.Cassandra;
 
  namespace ConsoleApplication1
  {
      class Program
      {
          static void Main(string[] args)
          {
              TTransport transport=null;
              try
              {
                  transport = new TBufferedTransport(new
  TSocket("127.0.0.1", 7000));
 
 
                  //if(buffered)
                  //            trans = new TBufferedTransport(trans as
  TStreamTransport);
                  //if (framed)
                  //    trans = new TFramedTransport(trans);
 
                  TProtocol protocol = new TBinaryProtocol(transport);
                  Cassandra.Client client = new
  Cassandra.Client(protocol);
 
                  Console.WriteLine("Opening connection");
 
                  if (!transport.IsOpen)
                      transport.Open();
 
                  client.describe_keyspace("abc");             //
  Crashing at this point
 
            }
              catch (Exception ex)
              {
                  Console.WriteLine(ex.Message);
              }
              finally
              { if(transport!=null)
                  transport.Close(); }
              Console.ReadLine();
          }
      }
  }
 
 I'm trying to interact with the Cassandra server (database) from .NET. For
 that I have referenced two libraries, i.e. apacheCassandra08.dll and
 thrift.dll. In the above piece of code the connection is getting opened,
 but when I use the client object it gives an error stating "Cannot read,
 Remote side has closed."

 Can anyone help me out with this? Has anyone faced the same problem?
 
 




Re: cqlsh

2012-03-27 Thread Maki Watanabe
You can find it in the bin directory of the binary distribution.

maki


2012/3/27 puneet loya puneetl...@gmail.com:
 How do I use the cqlsh command line utility??

 I'm using cassandra 1.0.8.. Does the cqlsh command line utility come with the
 download of cassandra 1.0.8 or do we have to install it separately?

 Suggest a way to go further

 plz reply :)


Re: cqlsh

2012-03-27 Thread Maki Watanabe
cqlsh is a Python script.
You need Python, and the CQL driver for Python.

maki
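On Windows this typically means invoking the script through Python explicitly, something like the following (the install path and driver install command are assumptions, not from the original thread):

```
rem Assumes Python 2.x on PATH and the cql driver installed (e.g. easy_install cql)
cd C:\apache-cassandra-1.0.8
python bin\cqlsh localhost 9160
```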

2012/3/28 puneet loya puneetl...@gmail.com:
 I know cqlsh is in the bin directory.. how do I run it on Windows? It is not
 a batch file that can be run directly.

 Should we use python to execute it??

 plz show a way :)

 Reply


 On Wed, Mar 28, 2012 at 8:34 AM, Maki Watanabe watanabe.m...@gmail.com
 wrote:

 You can find it in the bin directory of the binary distribution.

 maki


 2012/3/27 puneet loya puneetl...@gmail.com:
  How do i use the cqlsh comand line utility??
 
  I m using cassandra 1.0.8.. Does cqlsh command line utility comes with
  the
  download of cassandra 1.0.8 or we have to do it separately.
 
  Suggest a way to go further
 
  plz reply :)




Re: Fwd: information on cassandra

2012-03-26 Thread Maki Watanabe
auto_bootstrap has been removed from cassandra.yaml and is always enabled
since 1.0.
fyi.

maki

2012/3/26 R. Verlangen ro...@us2.nl:
 Yes, you can add nodes to a running cluster. It's very simple: configure
 the cluster name and seed node(s) in cassandra.yaml, set auto_bootstrap to
 true and start the node.


 2012/3/26 puneet loya puneetl...@gmail.com

 Fine.. consider I'm starting on a single node. Can I add nodes later?? plz
 reply :)


 On Sun, Mar 25, 2012 at 7:41 PM, Ertio Lew ertio...@gmail.com wrote:

 I guess 2 node cluster with RF=2 might also be a starting point. Isn't it
 ? Are there any issues with this ?

 On Sun, Mar 25, 2012 at 12:20 AM, samal samalgo...@gmail.com wrote:

 Cassandra has a distributed architecture, so 1 node does not fit into it.
 It can be used, but you lose its benefits; OK if you are just playing
 around. Use VMs to learn how the cluster communicates and handles requests.

 To get full tolerance, redundancy and consistency, a minimum of 3 nodes is
 required.

 Imp read here:
 http://wiki.apache.org/cassandra/
 http://www.datastax.com/docs/1.0/index
 http://thelastpickle.com/
 http://www.acunu.com/blogs/all/



 On Sat, Mar 24, 2012 at 11:37 PM, Garvita Mehta garvita.me...@tcs.com
 wrote:

 It's not advisable to use cassandra on a single node, as its basic
 definition says that if a node fails, the data still remains in the system; at
 least 3 nodes must be there while setting up a cassandra cluster.


 Garvita Mehta
 CEG - Open Source Technology Group
 Tata Consultancy Services
 Ph:- +91 22 67324756
 Mailto: garvita.me...@tcs.com
 Website: http://www.tcs.com
 
 

 -puneet loya wrote: -

 To: user@cassandra.apache.org
 From: puneet loya puneetl...@gmail.com
 Date: 03/24/2012 06:36PM
 Subject: Fwd: information on cassandra




 hi,

 I m puneet, an engineering student. I would like to know that, is
 cassandra useful considering we just have a single node(rather a single
 system) having all the information.
 I m looking for decent response time for the database. can you please
 respond?

 Thank you ,

 Regards,

 Puneet Loya








 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl



Re: unbalanced ring

2012-03-26 Thread Maki Watanabe
What version are you using?
Anyway, try nodetool repair and compact.

maki
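Note that token ownership in the quoted ring output is already even (33.33% each), which you can sanity-check: balanced initial tokens for RandomPartitioner are just the 2**127 token space divided evenly. A quick sketch:

```python
def balanced_tokens(node_count):
    """Evenly spaced initial_token values for RandomPartitioner (2**127 token space)."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]

# For 3 nodes these match the tokens in the nodetool ring output below:
# 0, 56713727820156410577229101238628035242, 113427455640312821154458202477256070485
for token in balanced_tokens(3):
    print(token)
```

Since the tokens are evenly spaced, the uneven load likely reflects data distribution or pending compactions rather than token assignment, which is why repair and compact are worth trying first.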

2012/3/26 Tamar Fraenkel ta...@tok-media.com

 Hi!
 I created Amazon ring using datastax image and started filling the db.
 The cluster seems un-balanced.

 nodetool ring returns:
 Address DC  RackStatus State   Load
  OwnsToken

  113427455640312821154458202477256070485
 10.34.158.33us-east 1c  Up Normal  514.29 KB
 33.33%  0
 10.38.175.131   us-east 1c  Up Normal  1.5 MB
  33.33%  56713727820156410577229101238628035242
 10.116.83.10us-east 1c  Up Normal  1.5 MB
  33.33%  113427455640312821154458202477256070485

 [default@tok] describe;
 Keyspace: tok:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:2]

 [default@tok] describe cluster;
 Cluster Information:
Snitch: org.apache.cassandra.locator.Ec2Snitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
 4687d620-7664-11e1--1bcb936807ff: [10.38.175.131,
 10.34.158.33, 10.116.83.10]


 Any idea what is the cause?
 I am running similar code on local ring and it is balanced.

 How can I fix this?

 Thanks,

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





Re: Cassandra as Database for Role Based Access Control System

2012-03-20 Thread Maki Watanabe
 user ---n:m--- role ---n:m--- resource

It can work, but Cassandra is not an RDBMS as you know, so RDBMS-ish data
modeling may not fit well in production. (It depends on your performance
requirements; I'm not sure.)
In general you should design your schema from your access patterns.
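A minimal sketch of what "design from your access patterns" can mean here, with plain Python dicts standing in for column families (all names are hypothetical, not an established schema):

```python
# Dicts stand in for column families: one CF per query path, denormalized on write.
user_roles = {}      # user_id -> set of role_ids      ("which roles does user X have?")
role_resources = {}  # role_id -> set of resource_ids  ("which resources does role Y grant?")
resource_owner = {}  # resource_id -> owner user_id    (must never be ambiguous)

def grant_role(user_id, role_id):
    user_roles.setdefault(user_id, set()).add(role_id)

def attach_resource(role_id, resource_id):
    role_resources.setdefault(role_id, set()).add(resource_id)

def set_owner(resource_id, user_id):
    # Ownership is written to its own CF so it is a single-column read,
    # never derived from the m:n edges (where inconsistency may be tolerable).
    resource_owner[resource_id] = user_id

def can_access(user_id, resource_id):
    # Two lookups instead of a relational join across the m:n tables.
    return any(resource_id in role_resources.get(role, set())
               for role in user_roles.get(user_id, set()))
```

The point is that each read path gets its own materialized lookup, and the one relation that must never be mixed up (ownership) lives in its own single-valued column family.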

maki


2012/3/20 Maciej Miklas mac.mik...@googlemail.com:
 Hi *,

 I would like to know your opinion about using Cassandra to implement a
 RBAC-like authentication  authorization model. We have simplified the
 central relationship of the general model
 (http://en.wikipedia.org/wiki/Role-based_access_control) to:

 user ---n:m--- role ---n:m--- resource

 user(s) and resource(s) are indexed with externally visible identifiers.
 These identifiers need to be re-ownable (think: mail aliases), too.

 The main reason to consider Cassandra is the availability, scalability and
 (global) geo-redundancy. This is hard to achieve with an RDBMS.

 On the other side, RBAC has many m:n relations. While some inconsistencies
 may be acceptable, resource ownership (i.e. role=owner) must never ever be
 mixed up.

 What do you think? Is such relational model an antipattern for Cassandra
 usage? Do you know similar solutions based on Cassandra?


 Regards,

 Maciej


 ps. I've posted this question also on stackoverflow, but I would like to
 also get feedback from Cassandra community.






Re: snapshot files locked

2012-03-14 Thread Maki Watanabe
Snapshot files are hard links to the original sstables.
As you know, on Windows you can't delete files opened by another process.
When you try to delete the hard link, Windows considers it an attempt to
delete the sstables in production.

maki
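The hard-link semantics can be seen with a short sketch (plain files here, not real sstables):

```python
import os
import tempfile

# A snapshot entry is just another directory entry for the same file on disk,
# which is what makes taking snapshots cheap and instantaneous.
workdir = tempfile.mkdtemp()
sstable = os.path.join(workdir, "Data.db")
snapshot = os.path.join(workdir, "snapshot-Data.db")

with open(sstable, "w") as f:
    f.write("sstable contents")

os.link(sstable, snapshot)              # roughly what taking a snapshot does
assert os.stat(sstable).st_nlink == 2   # two names, one file

os.remove(snapshot)                     # deleting the snapshot link...
assert os.path.exists(sstable)          # ...leaves the live sstable intact
print(os.stat(sstable).st_nlink)        # -> 1
```

(On Windows the delete of the link fails while Cassandra holds the file open, which is the behavior described in the question.)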

2012/3/14 Jim Newsham jnews...@referentia.com:

 Hi,

 I'm using Cassandra 1.0.8, on Windows 7.  When I take a snapshot of the
 database, I find that I am unable to delete the snapshot directory (i.e.,
 the dir named {datadir}\{keyspacename}\snapshots\{snapshottag}) while
 Cassandra is running: "The action can't be completed because the folder or
 a file in it is open in another program.  Close the folder or file and try
 again."  If I terminate Cassandra, then I can delete the directory with no
 problem.  Is there a reason why Cassandra must hold onto these files?

 Thanks,
 Jim



Re: [Windows] How to configure simple authentication and authorization ?

2012-03-14 Thread Maki Watanabe
Have you built and installed SimpleAuthenticator from the source repository?
It is not included in the binary distribution.

maki

2012/3/14 Sabbiolina sabbiol...@gmail.com:
 HI. I followed this:



 To set up simple authentication and authorization
 1. Edit cassandra.yaml, setting
 org.apache.cassandra.auth.SimpleAuthenticator as the
 authenticator value. The default value of AllowAllAuthenticator is
 equivalent to no authentication.
 2. Edit access.properties, adding entries for users and their permissions to
 read and write to specified
 keyspaces and column families. See access.properties below for details on
 the correct format.
 3. Make sure that users specified in access.properties have corresponding
 entries in passwd.properties.
 See passwd.properties below for details and examples.
 4. After making the required configuration changes, you must specify the
 properties files when starting Cassandra
 with the flags -Dpasswd.properties and -Daccess.properties. For example:
 cd $CASSANDRA_HOME
 sh bin/cassandra -f -Dpasswd.properties=conf/passwd.properties
 -Daccess.properties=conf/access.properties


 I started the services with the additional parameters, but no result, no log,
 nothing.

 I use datastax 1.0.8 communiti edition on win 7 64 bit


 Tnxs



Re: Adding nodes to cluster (Cassandra 1.0.8)

2012-03-14 Thread Maki Watanabe
Do you use the same storage_port across all 3 nodes?
Can you access the storage_port of the seed node from the last (failed) node?

2012/3/14 Rishabh Agrawal rishabh.agra...@impetus.co.in:
 I was able to successfully join a node to an already existing one-node cluster
 (without giving any initial_token), but when I add another machine with
 identical settings (with changes to the listen, broadcast and rpc addresses),
 I am unable to join it to the cluster and it gives me the following error:



 INFO 17:50:35,555 JOINING: schema complete, ready to bootstrap

 INFO 17:50:35,556 JOINING: getting bootstrap token

 ERROR 17:50:35,557 Exception encountered during startup

 java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.If you
 intended to start a single-node cluster, you should make sure your
 broadcast_address (or listen_address) is listed as a seed.  Otherwise, you
 need to determine why the seed being contacted has no knowledge of the rest
 of the cluster.  Usually, this can be solved by giving all nodes the same
 seed list.

     at
 org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)

     at
 org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)

     at
 org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)

     at
 org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)

     at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)

     at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)

     at
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)

     at
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)

     at
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)

 java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.If you
 intended to start a single-node cluster, you should make sure your
 broadcast_address (or listen_address) is listed as a seed.  Otherwise, you
 need to determine why the seed being contacted has no knowledge of the rest
 of the cluster.  Usually, this can be solved by giving all nodes the same
 seed list.

     at
 org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)

     at
 org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)

     at
 org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)

     at
 org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)

     at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)

     at
 org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)

     at
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)

     at
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)

     at
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)

 Exception encountered during startup: No other nodes seen!  Unable to
 bootstrap.If you intended to start a single-node cluster, you should make
 sure your broadcast_address (or listen_address) is listed as a seed.
 Otherwise, you need to determine why the seed being contacted has no
 knowledge of the rest of the cluster.  Usually, this can be solved by giving
 all nodes the same seed list.

 INFO 17:50:35,571 Waiting for messaging service to quiesce

 INFO 17:50:35,571 MessagingService shutting down server thread.





 Now when I put some integer value into initial_token, Cassandra starts
 working but is not able to connect to the main cluster, which became evident
 from the command nodetool -h ip of this node ring. It displayed itself
 with 100% ownership.





 Kindly help me with it asap.



 Regards

 Rishabh




 



Re: cleanup crashing with java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 8

2012-03-14 Thread Maki Watanabe
Fixed in 1.0.9 and 1.1.0:
https://issues.apache.org/jira/browse/CASSANDRA-3989

You should avoid using cleanup/scrub/upgradesstables on 1.0.7 if you
can, though it will not corrupt sstables.

2012/3/14 Thomas van Neerijnen t...@bossastudios.com:
 Hi all

 I am trying to run a cleanup on a column family and am getting the following
 error returned after about 15 seconds. A cleanup on a slightly smaller
 column family completes in about 21 minutes. This is on the Apache packaged
 version of Cassandra on Ubuntu 11.10, version 1.0.7.

 ~# nodetool -h localhost cleanup Player PlayerDetail
 Error occured during cleanup
 java.util.concurrent.ExecutionException:
 java.lang.ArrayIndexOutOfBoundsException: 8
     at
 java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
     at
 org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:203)
     at
 org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:237)
     at
 org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:984)
     at
 org.apache.cassandra.service.StorageService.forceTableCleanup(StorageService.java:1635)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
     at
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
     at
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
     at
 com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
     at
 com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
     at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
     at
 com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
     at
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
     at
 javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
     at
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
     at
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
     at
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
     at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
     at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
     at sun.rmi.transport.Transport$1.run(Transport.java:159)
     at java.security.AccessController.doPrivileged(Native Method)
     at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
     at
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
     at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
     at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
     at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
     at
 org.apache.cassandra.db.compaction.LeveledManifest.add(LeveledManifest.java:298)
     at
 org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:186)
     at
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141)
     at
 org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:494)
     at
 org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:234)
     at
 org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:1006)
     at
 org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:791)
     at
 org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:63)
     at
 org.apache.cassandra.db.compaction.CompactionManager$5.perform(CompactionManager.java:241)
     at
 org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:182)
     at
 

Re: Adding nodes to cluster (Cassandra 1.0.8)

2012-03-14 Thread Maki Watanabe
Try:
  % telnet seed_node 7000
on the 3rd node and see what it says.

maki


2012/3/14 Rishabh Agrawal rishabh.agra...@impetus.co.in:
 I am using storage port 7000 (default) across all nodes.

 -Original Message-
 From: Maki Watanabe [mailto:watanabe.m...@gmail.com]
 Sent: Wednesday, March 14, 2012 3:42 PM
 To: user@cassandra.apache.org
 Subject: Re: Adding nodes to cluster (Cassandra 1.0.8)

 Do you use same storage_port across 3 nodes?
 Can you access to the storage_port of the seed node from the last (failed) 
 node?

 2012/3/14 Rishabh Agrawal rishabh.agra...@impetus.co.in:
 I was able to successfully join a node to already existing one-node
 cluster (without giving any intital_token), but when I add another
 machine with identical settings (with changes in listen broadcast and
 rpc address). I am unable to join it to the cluster and it gives me 
 following error:



 INFO 17:50:35,555 JOINING: schema complete, ready to bootstrap

 INFO 17:50:35,556 JOINING: getting bootstrap token

 ERROR 17:50:35,557 Exception encountered during startup

 java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.
 If you intended to start a single-node cluster, you should make sure
 your broadcast_address (or listen_address) is listed as a seed.
 Otherwise, you need to determine why the seed being contacted has no
 knowledge of the rest of the cluster.  Usually, this can be solved by
 giving all nodes the same seed list.

         at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)
         at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)
         at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)
         at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)
         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)
         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)
         at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)
         at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
         at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)

 java.lang.RuntimeException: No other nodes seen!  Unable to bootstrap.
 If you intended to start a single-node cluster, you should make sure
 your broadcast_address (or listen_address) is listed as a seed.
 Otherwise, you need to determine why the seed being contacted has no
 knowledge of the rest of the cluster.  Usually, this can be solved by
 giving all nodes the same seed list.

         at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:168)
         at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:150)
         at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:145)
         at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:565)
         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:484)
         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:395)
         at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:234)
         at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356)
         at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)

 Exception encountered during startup: No other nodes seen!  Unable to
 bootstrap. If you intended to start a single-node cluster, you should
 make sure your broadcast_address (or listen_address) is listed as a seed.
 Otherwise, you need to determine why the seed being contacted has no
 knowledge of the rest of the cluster.  Usually, this can be solved by
 giving all nodes the same seed list.

 INFO 17:50:35,571 Waiting for messaging service to quiesce

 INFO 17:50:35,571 MessagingService shutting down server thread.





 Now when I put some integer value into initial_token, Cassandra starts
 working but is not able to connect to the main cluster, which became
 evident from the command `nodetool -h <ip of this node> ring`: it
 displayed itself with 100% ownership.





 Kindly help me with it asap.



 Regards

 Rishabh




 


Re: avoid log spam with 0 HH rows delivered

2012-03-02 Thread Maki Watanabe
Fixed in 1.0?
https://issues.apache.org/jira/browse/CASSANDRA-3176


2012/3/2 Radim Kolar h...@sendmail.cz:
 Can something be done to remove these empty delivery attempts from the log?

 Its just tombstoned row.

 [default@system] list HintsColumnFamily;
 Using default limit of 100
 ---
 RowKey: 00

 1 Row Returned.
 Elapsed time: 234 msec(s).


  INFO [HintedHandoff:1] 2012-03-02 05:44:32,359 HintedHandOffManager.java
 (line 296) Started hinted handoff for token: 0 with IP: /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 05:44:32,362 HintedHandOffManager.java
 (line 373) Finished hinted handoff of 0 rows to endpoint /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 05:54:31,641 HintedHandOffManager.java
 (line 296) Started hinted handoff for token: 0 with IP: /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 05:54:31,644 HintedHandOffManager.java
 (line 373) Finished hinted handoff of 0 rows to endpoint /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 06:04:25,253 HintedHandOffManager.java
 (line 296) Started hinted handoff for token: 0 with IP: /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 06:04:25,255 HintedHandOffManager.java
 (line 373) Finished hinted handoff of 0 rows to endpoint /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 06:14:57,984 HintedHandOffManager.java
 (line 296) Started hinted handoff for token: 0 with IP: /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 06:14:58,013 HintedHandOffManager.java
 (line 373) Finished hinted handoff of 0 rows to endpoint /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 06:24:15,206 HintedHandOffManager.java
 (line 296) Started hinted handoff for token: 0 with IP: /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 06:24:15,208 HintedHandOffManager.java
 (line 373) Finished hinted handoff of 0 rows to endpoint /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 06:34:43,108 HintedHandOffManager.java
 (line 296) Started hinted handoff for token: 0 with IP: /64.6.104.18
  INFO [HintedHandoff:1] 2012-03-02 06:34:43,110 HintedHandOffManager.java
 (line 373) Finished hinted handoff of 0 rows to endpoint /64.6.104.18




-- 
w3m


Re: Using cassandra at minimal expenditures

2012-02-29 Thread Maki Watanabe
Depends on your traffic :-)

cassandra-env.sh will try to allocate the heap with the following formula if
you don't specify MAX_HEAP_SIZE:
1. calculate 1/2 of the RAM on your system and cap it at 1024MB
2. calculate 1/4 of the RAM on your system and cap it at 8192MB
3. pick the larger value

So how about starting with the default? You will need to monitor the
heap usage at first.
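The sizing rule above can be sketched in a few lines. This is an illustrative Python translation of the cassandra-env.sh logic described here; the real script computes it in shell:

```python
def default_max_heap_mb(system_ram_mb: int) -> int:
    """Pick the default Cassandra heap size (MB) from system RAM,
    following the cassandra-env.sh formula described above."""
    half_capped = min(system_ram_mb // 2, 1024)     # 1/2 of RAM, capped at 1024MB
    quarter_capped = min(system_ram_mb // 4, 8192)  # 1/4 of RAM, capped at 8192MB
    return max(half_capped, quarter_capped)         # pick the larger value

if __name__ == "__main__":
    for ram in (2048, 8192, 65536):
        print(f"{ram}MB RAM -> {default_max_heap_mb(ram)}MB heap")
```

So a 2GB box defaults to a 1GB heap, while an 8GB box gets only 2GB, which is why monitoring actual heap usage matters before relying on the default.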

2012/2/29 Ertio Lew ertio...@gmail.com:
 Thanks, I think I don't need high consistency(as per my app requirements) so
 I might be fine with CL.ONE instead of quorum, so I think  I'm probably
 going to be ok with a 2 node cluster initially..

 Could you guys also recommend some minimum memory to start with ? Of course
 that would depend on my workload as well, but that's why I am asking for the
 min


 On Wed, Feb 29, 2012 at 7:40 AM, Maki Watanabe watanabe.m...@gmail.com
 wrote:

  If you run your service with 2 node and RF=2, your data will be
  replicated but
  your service will not be redundant. ( You can't stop both of nodes )

 If your service doesn't need strong consistency ( allow cassandra returns
 old data after write, and possible write lost ), you can use CL=ONE
 for read and write
 to keep availability.

 maki





-- 
w3m


Re: Few Clarifications on Major Compactions

2012-02-29 Thread Maki Watanabe
DataStax does not recommend running major compactions now:
  http://www.datastax.com/docs/1.0/operations/tuning
But if you can afford it, a major compaction will improve read latency, as you have seen.

Major compaction is expensive, so you will not want to run it during
high-traffic hours. And you should not run it on more than one node of a
replica set at the same time. You should also not run repair and major
compaction at the same time on the same (affected) node, because both
tasks require massive IO.
Within these constraints, the more often you run major compaction, the
better read latency you will get.

2012/3/1 Eran Chinthaka Withana eran.chinth...@gmail.com:
 Hi,

 I have two questions on major compactions (the ones user initiate using
 nodetool) and I really appreciate if someone can help.

 1. I've noticed that when I run compactions the read latency improves even
 more than I expected (which is good :) ) The improvement is so tempting that
 I'd like to run this almost every week :). I understand after a compaction
 Cassandra will create one giant SSTable and if something happens to it
 things can go little bit crazy. So from your experience how often should we
 be running compactions? What parameters will influence this frequency?

 2. I'm thinking scheduling compactions using a cron job. But the issue is I
 scheduled repairs also using a cronjob to run once in GC Period (of default
 10 days). Now the obvious question is what will happen if a node is running
 both the compactions AND the repair at the same time? Is this something we
 should avoid at all costs? What will be the implications?

 Thanks,
 Eran Chinthaka Withana




-- 
w3m


Re: Using cassandra at minimal expenditures

2012-02-28 Thread Maki Watanabe
If you have 3 nodes with RF=3, you can continue the service on cassandra even if
one of the nodes fails (by hardware or software failure).
Another benefit is that you can shut down one node for maintenance or patching
without service interruption.

If you run your service with 2 node and RF=2, your data will be replicated but
your service will not be redundant. ( You can't stop both of nodes )

2012/2/29 Ertio Lew ertio...@gmail.com:
 @Aaron: Are you suggesting 3 nodes (rather than 2) to allow quorum
 operations even with the temporary loss of 1 node from the cluster's reach? I
 understand this, but another question just popped up in my mind: probably
 since I'm not much experienced in managing cassandra, I'm unaware whether it
 may be a usual case that some of the n nodes of my cluster are
 down/unresponsive or out of the cluster's reach? (I actually considered this
 situation an exceptional circumstance, not a normal one!?)


 On Tue, Feb 28, 2012 at 2:34 AM, aaron morton aa...@thelastpickle.com
 wrote:

 1. I am wondering what is the minimum recommended cluster size to start
 with?

 IMHO 3
 http://thelastpickle.com/2011/06/13/Down-For-Me/

 A

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 28/02/2012, at 8:17 AM, Ertio Lew wrote:

 Hi

 I'm creating a networking site using cassandra. I want to host this
 application initially with the lowest possible resources, then slowly
 increase the resources as the service's demand and need grow.

 1. I am wondering what is the minimum recommended cluster size to start
 with?
 Are there any issues if I start with as little as 2 nodes in the cluster?
 In that case I guess I would have a replication factor of 2.
 (this way I would require at min. 3 VPSs: 1 as a web server and 2 for the
 cassandra cluster, right?)

 2. Anyone using cassandra with such minimal resources in
 production environments ? Any experiences or difficulties encountered ?

 3. In case, you would like to recommend some hosting service suitable for
 me ? or if you would like to suggest some other ways to minimize the
 resources (actually the hosting expenses).






-- 
maki


Re: Using cassandra at minimal expenditures

2012-02-28 Thread Maki Watanabe
 If you run your service with 2 node and RF=2, your data will be replicated but
 your service will not be redundant. ( You can't stop both of nodes )

If your service doesn't need strong consistency ( allow cassandra returns
old data after write, and possible write lost ), you can use CL=ONE
for read and write
to keep availability.

maki


Re: hinted handoff 16 s delay

2012-02-23 Thread Maki Watanabe
I've verified it in the source: deliverHintsToEndpointInternal in
HintedHandOffManager.java.
Yes, it adds a random delay before HH delivery.
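The idea can be illustrated with a small sketch. This is NOT Cassandra's actual implementation: each hint-delivering node simply waits a random interval before pushing its stored hints, so a just-restarted node is not hit by all peers at once. MAX_STAGGER_SECONDS is an invented bound for the example:

```python
import random
import time

MAX_STAGGER_SECONDS = 30  # hypothetical bound; Cassandra's real value differs

def deliver_hints(endpoint, hints, send, sleep=time.sleep):
    """Sleep a random stagger interval, then push each stored hint
    to the endpoint. Returns the delay actually used."""
    delay = random.uniform(0, MAX_STAGGER_SECONDS)
    sleep(delay)                 # random back-off before delivery
    for hint in hints:
        send(endpoint, hint)     # deliver hints in order
    return delay
```

The `send` and `sleep` callables are injected so the sketch stays self-contained and testable; the point is only the random back-off before delivery, which explains the ~16 s gap observed in the log below.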

2012/2/24 Todd Burruss bburr...@expedia.com:
 if I remember correctly, cassandra has a random delay in it so hint
 delivery is staggered and does not overwhelm the just-restarted node.

 On 2/23/12 1:46 PM, Hontvári József Levente hontv...@flyordie.com
 wrote:

I have played with a test cluster, stopping cassandra on one node and
updating a row on another. I noticed a delay in delivering hinted
handoffs for which I don't know the rationale. After the node which
originally received the update noticed that the other server is up, it
waited 16 s before it started pushing the hints.

Here is the log:

  INFO [GossipStage:1] 2012-02-23 20:05:32,516 StorageService.java (line
988) Node /192.0.2.1 state jump to normal
  INFO [HintedHandoff:1] 2012-02-23 20:05:49,766
HintedHandOffManager.java (line 296) Started hinted handoff for token: 1
with IP: /192.0.2.1
  INFO [HintedHandoff:1] 2012-02-23 20:05:50,048 ColumnFamilyStore.java
(line 704) Enqueuing flush of
Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live bytes, 2
ops)
  INFO [FlushWriter:31] 2012-02-23 20:05:50,049 Memtable.java (line 246)
Writing Memtable-HintsColumnFamily@1352140719(205/1639 serialized/live
bytes, 2 ops)
  INFO [FlushWriter:31] 2012-02-23 20:05:50,192 Memtable.java (line 283)
Completed flushing
/media/data/cassandra/data/system/HintsColumnFamily-hc-10-Data.db (290
bytes)
  INFO [CompactionExecutor:70] 2012-02-23 20:05:50,193
CompactionTask.java (line 113) Compacting
[SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-h
c-10-Data.db'),
SSTableReader(path='/media/data/cassandra/data/system/HintsColumnFamily-hc
-9-Data.db')]
  INFO [HintedHandoff:1] 2012-02-23 20:05:50,195
HintedHandOffManager.java (line 373) Finished hinted handoff of 1 rows
to endpoint /192.0.2.1





-- 
w3m


Re: [BETA RELEASE] Apache Cassandra 1.1.0-beta1 released

2012-02-23 Thread Maki Watanabe
No, I couldn't download the beta with the first link; the mirrors
returned 404 for it.
After exploring the link, I found that the latter URI worked.
So I don't think we need to wait.

2012/2/22 Sylvain Lebresne sylv...@datastax.com:
 Arf, you're right, sorry.
 I've fixed it (but it could take ~1 to get propagated to all apache mirrors).

 --
 SYlvain

 On Wed, Feb 22, 2012 at 2:46 AM, Maki Watanabe watanabe.m...@gmail.com 
 wrote:
 The link is wrong.
 http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
 Should be:
 http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.1.0-beta1/apache-cassandra-1.1.0-beta1-bin.tar.gz


 2012/2/21 Sylvain Lebresne sylv...@datastax.com:
 The Cassandra team is pleased to announce the release of the first beta for
 the future Apache Cassandra 1.1.

 Let me first stress that this is beta software and as such is *not* ready 
 for
 production use.

 The goal of this release is to give a preview of what will become Cassandra
 1.1 and to get wider testing before the final release. All help in testing
 this release would be therefore greatly appreciated and please report any
 problem you may encounter[3,4]. Have a look at the change log[1] and the
 release notes[2] to see where Cassandra 1.1 differs from the previous 
 series.

 Apache Cassandra 1.1.0-beta1[5] is available as usual from the cassandra
 website (http://cassandra.apache.org/download/) and a debian package is
 available using the 11x branch (see
 http://wiki.apache.org/cassandra/DebianPackaging).

 Thank you for your help in testing and have fun with it.

 [1]: http://goo.gl/6iURu (CHANGES.txt)
 [2]: http://goo.gl/hWilW (NEWS.txt)
 [3]: https://issues.apache.org/jira/browse/CASSANDRA
 [4]: user@cassandra.apache.org
 [5]: 
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-beta1



 --
 w3m



-- 
w3m


Re: [BETA RELEASE] Apache Cassandra 1.1.0-beta1 released

2012-02-21 Thread Maki Watanabe
The link is wrong.
http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.1.0/apache-cassandra-1.1.0-beta1-bin.tar.gz
Should be:
http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.1.0-beta1/apache-cassandra-1.1.0-beta1-bin.tar.gz


2012/2/21 Sylvain Lebresne sylv...@datastax.com:
 The Cassandra team is pleased to announce the release of the first beta for
 the future Apache Cassandra 1.1.

 Let me first stress that this is beta software and as such is *not* ready for
 production use.

 The goal of this release is to give a preview of what will become Cassandra
 1.1 and to get wider testing before the final release. All help in testing
 this release would be therefore greatly appreciated and please report any
 problem you may encounter[3,4]. Have a look at the change log[1] and the
 release notes[2] to see where Cassandra 1.1 differs from the previous series.

 Apache Cassandra 1.1.0-beta1[5] is available as usual from the cassandra
 website (http://cassandra.apache.org/download/) and a debian package is
 available using the 11x branch (see
 http://wiki.apache.org/cassandra/DebianPackaging).

 Thank you for your help in testing and have fun with it.

 [1]: http://goo.gl/6iURu (CHANGES.txt)
 [2]: http://goo.gl/hWilW (NEWS.txt)
 [3]: https://issues.apache.org/jira/browse/CASSANDRA
 [4]: user@cassandra.apache.org
 [5]: 
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-beta1



-- 
w3m


Re: How to find a commit for specific release on git?

2012-02-12 Thread Maki Watanabe
I found I can get the info with `git tag`.
I should learn git better before switching...

2012/2/13 Maki Watanabe watanabe.m...@gmail.com:
 Perfect! Thanks.

 2012/2/13 Dave Brosius dbros...@mebigfatguy.com:
 Based on the tags listed here:
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git

 I would look here

 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=9d4c0d9a37c7d77a05607b85611c3abdaf75be94



 On 02/12/2012 10:39 PM, Maki Watanabe wrote:

 Hello,

 How to find the right commit SHA for specific cassandra release?
 For example, how to checkout 0.8.9 release on git repository?
 With git log --grep=0.8.9, I found the latest commit mentioned about 0.8.9
 was
 ---
 commit 1f92277c4bf9f5f71303ecc5592e27603bc9dec1
 Author: Sylvain Lebresne slebre...@apache.org
 Date:   Sun Dec 11 00:02:14 2011 +

     prepare for release 0.8.9

     git-svn-id:
 https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8@1212938
 13f79535-47bb-0310-9956-ffa450edef68
 ---

 However I don't think it's a reliable way.  I've also checked
 CHANGES.txt and NEWS.txt but those say nothing about commit SHAs.

 regards,





 --
 w3m



-- 
w3m


Re: How to find a commit for specific release on git?

2012-02-12 Thread Maki Watanabe
Updated http://wiki.apache.org/cassandra/HowToBuild .

2012/2/13 Maki Watanabe watanabe.m...@gmail.com:
 I found I can get the info with `git tag`.
 I should learn git better before switching...

 2012/2/13 Maki Watanabe watanabe.m...@gmail.com:
 Perfect! Thanks.

 2012/2/13 Dave Brosius dbros...@mebigfatguy.com:
 Based on the tags listed here:
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git

 I would look here

 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=9d4c0d9a37c7d77a05607b85611c3abdaf75be94



 On 02/12/2012 10:39 PM, Maki Watanabe wrote:

 Hello,

 How to find the right commit SHA for specific cassandra release?
 For example, how to checkout 0.8.9 release on git repository?
 With git log --grep=0.8.9, I found the latest commit mentioned about 0.8.9
 was
 ---
 commit 1f92277c4bf9f5f71303ecc5592e27603bc9dec1
 Author: Sylvain Lebresne slebre...@apache.org
 Date:   Sun Dec 11 00:02:14 2011 +

     prepare for release 0.8.9

     git-svn-id:
 https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8@1212938
 13f79535-47bb-0310-9956-ffa450edef68
 ---

 However I don't think it's a reliable way.  I've also checked
 CHANGES.txt and NEWS.txt but those say nothing about commit SHAs.

 regards,





 --
 w3m



 --
 w3m



-- 
w3m


Proposal to lower the minimum limit for phi_convict_threshold

2012-01-21 Thread Maki Watanabe
Hello,
The current trunk limits the value of phi_convict_threshold to between 5 and 16
in DatabaseDescriptor.java.
The phi value is calculated in FailureDetector.java as

  PHI_FACTOR x time_since_last_gossip / mean_heartbeat_interval

And PHI_FACTOR is a predefined value:
PHI_FACTOR = 1 / Log(10) =~ 0.43

So if you use the default phi_convict_threshold = 8, it means the FailureDetector
waits for 8 / 0.43 = 18.6 missed heartbeats before it detects a node
failure.

Even with the minimum value of 5, it takes 5 / 0.43 = 11.6 heartbeat
misses for failure detection.
I think that is a bit much for a cassandra ring built on a reliable network
in a single datacenter.
If DatabaseDescriptor.java accepted a smaller phi_convict_threshold,
we would be able to configure cassandra to detect failures rapidly (or
shoot ourselves in the foot).
Anyway, I can't find any reason to limit the minimum value of phi_convict_threshold
to 5.
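The arithmetic above is easy to check. This is illustrative only, not the FailureDetector code; PHI_FACTOR mirrors the 1/log(10) constant described, and the 18.6 quoted in the text comes from using the rounded 0.43:

```python
import math

PHI_FACTOR = 1 / math.log(10)  # ~0.434, as defined in FailureDetector.java

def heartbeat_misses_before_conviction(phi_convict_threshold: float) -> float:
    """phi = PHI_FACTOR * time_since_last_gossip / mean_heartbeat_interval,
    so conviction requires roughly threshold / PHI_FACTOR missed intervals."""
    return phi_convict_threshold / PHI_FACTOR

if __name__ == "__main__":
    for phi in (5, 8, 16):
        print(phi, "->", round(heartbeat_misses_before_conviction(phi), 1), "missed heartbeats")
```

Even the floor of 5 tolerates about 11.5 missed heartbeat intervals, which supports the argument for allowing smaller thresholds on reliable single-datacenter networks.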

maki


Re: Unbalanced cluster with RandomPartitioner

2012-01-17 Thread Maki Watanabe
Are there any significant differences in the number of sstables on each node?

2012/1/18 Marcel Steinbach marcel.steinb...@chors.de:
 We are running regular repairs, so I don't think that's the problem.
 And the data dir sizes match approx. the load from nodetool.
 Thanks for the advice, though.

 Our keys are digits only, and all contain a few zeros at the same
 offsets. I'm not that familiar with the md5 algorithm, but I doubt that it
 would generate 'hotspots' for those kinds of keys, right?
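That intuition is easy to sanity-check: MD5, which RandomPartitioner is based on, spreads even near-identical digit-only keys across the token space. This is an illustrative computation, not Cassandra's exact token derivation:

```python
import hashlib

def token(key: str) -> int:
    """Map a row key to a 128-bit integer via MD5 (the hash
    RandomPartitioner's tokens are derived from)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

# Similar, digit-only keys with zeros at the same offsets...
keys = ["1000100", "1000200", "1000300"]
tokens = [token(k) for k in keys]
# ...still land at unrelated points in the 2**128 token space.
```

So key shape alone shouldn't produce hotspots under RandomPartitioner; imbalance like the ring below usually comes from row-count or row-size skew per token range, or from sstables awaiting compaction.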

 On 17.01.2012, at 17:34, Mohit Anchlia wrote:

 Have you tried running repair first on each node? Also, verify using
 df -h on the data dirs

 On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
 marcel.steinb...@chors.de wrote:

 Hi,


 we're using RP and have each node assigned the same amount of the token
 space. The cluster looks like that:


 Address         Status State   Load            Owns    Token


 205648943402372032879374446248852460236

 1       Up     Normal  310.83 GB       12.50%
  56775407874461455114148055497453867724

 2       Up     Normal  470.24 GB       12.50%
  78043055807020109080608968461939380940

 3       Up     Normal  271.57 GB       12.50%
  99310703739578763047069881426424894156

 4       Up     Normal  282.61 GB       12.50%
  120578351672137417013530794390910407372

 5       Up     Normal  248.76 GB       12.50%
  141845999604696070979991707355395920588

 6       Up     Normal  164.12 GB       12.50%
  163113647537254724946452620319881433804

 7       Up     Normal  76.23 GB        12.50%
  184381295469813378912913533284366947020

 8       Up     Normal  19.79 GB        12.50%
  205648943402372032879374446248852460236


 I was under the impression, the RP would distribute the load more evenly.

 Our row sizes are 0.5-1 KB, hence, we don't store huge rows on a single
 node. Should we just move the nodes so that the load is more evenly
 distributed, or is there something off that needs to be fixed first?


 Thanks

 Marcel






-- 
w3m


Re: AW: How to control location of data?

2012-01-10 Thread Maki Watanabe
Small correction:
The token range for each node is (Previous_token, My_Token].
( means exclusive and ] means inclusive.
So N1 is responsible from X+1 to A in following case.

maki

2012/1/11 Roland Gude roland.g...@yoochoose.com:


 Each node in the cluster is assigned a token (this can be done automatically –
 but usually should not be)

 The token of a node is the start token of the partition it is responsible
 for (and the token of the next node is the end token of the current
 token's partition)



 Assume you have the following nodes/tokens (which are usually numbers but
 for the example I will use letters)



 N1/A

 N2/D

 N3/M

 N4/X



 This means that N1 is responsible (primary) for [A-D)

    N2 for [D-M)

        N3 for [M-X)

 And N4 for [X-A)



 If you have a replication factor of 1 data will go on the nodes like this:



 B - N1

 E-N2

 X-N4



 And so on

 If you have a higher replication factor, the placement strategy decides
 which node will take replicas of which partition (becoming secondary node
 for that partition)

 Simple strategy will just put the replica on the next node in the ring

 So same example as above but RF of 2 and simple strategy:



 B- N1 and N2

 E - N2 and N3

 X - N4 and N1



 Other strategies can factor in things like “put  data in another datacenter”
 or “put data in another rack” or such things.
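The placement walkthrough above can be sketched as a small model of the lettered ring with SimpleStrategy-style replica selection. This is illustrative only: real Cassandra first hashes keys to tokens via the partitioner, and (as noted elsewhere in the thread) its ranges are actually (previous_token, my_token]:

```python
from bisect import bisect_right

# (token, node) pairs, sorted by token -- the example ring above.
RING = [("A", "N1"), ("D", "N2"), ("M", "N3"), ("X", "N4")]

def replicas(key_token: str, rf: int) -> list[str]:
    """Primary owner is the node whose token starts the key's range;
    each extra replica goes to the next node clockwise (SimpleStrategy)."""
    tokens = [t for t, _ in RING]
    i = (bisect_right(tokens, key_token) - 1) % len(RING)  # last token <= key, wrapping
    return [RING[(i + k) % len(RING)][1] for k in range(rf)]
```

With rf=1 this reproduces B→N1, E→N2, X→N4, and with rf=2 it reproduces B→N1,N2 / E→N2,N3 / X→N4,N1 from the example.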



 Even though the terms primary and secondary imply some difference in quality or
 consistency, this is not the case. If a node is responsible for a piece of
 data, it will store it.





 But placement of the replicas is usually only relevant for availability
 reasons (i.e. disaster recovery etc.)

 Actual location should mean nothing to most applications as you can ask any
 node for the data you want and it will provide it to you (fetching it from
 the responsible nodes).

 This should be sufficient in almost all cases.



 So in the above example again, you can ask N3 “what data is available” and
 it will tell you: B, E and X, or you could ask it “give me X” and it will
 fetch it from N4 or N1 or both of them depending on consistency
 configuration and return the data to you.





 So actually if you use Cassandra – for the application the actual storage
 location of the data should not matter. It will be available anywhere in the
 cluster if it is stored on any reachable node.



 Von: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
 Gesendet: Dienstag, 10. Januar 2012 15:06
 An: user@cassandra.apache.org
 Betreff: Re: AW: How to control location of data?



 Hi!



 Thank you for your last reply. I'm still wondering if I got you right...



 ...

 A partitioner decides into which partition a piece of data belongs

 Does your statement imply that the partitioner does not take any decisions
 at all on the (physical) storage location? Or put another way: What do you
 mean with partition?



 To quote http://wiki.apache.org/cassandra/ArchitectureInternals:
 ... AbstractReplicationStrategy controls what nodes get secondary,
 tertiary, etc. replicas of each key range. Primary replica is always
 determined by the token ring (...)



 ...

 You can select different placement strategies and partitioners for different
 keyspaces, thereby choosing known data to be stored on known hosts.

 This is however discouraged for various reasons – e.g. you need a lot of
 knowledge about your data to keep the cluster balanced. What is your use case
 for this requirement? There is probably a more suitable solution.



 What we want is to partition the cluster with respect to key spaces.

 That is we want to establish an association between nodes and key spaces so
 that a node of the cluster holds data from a key space if and only if that
 node is a *member* of that key space.



 To our knowledge Cassandra has no built-in way to specify such a
 membership-relation. Therefore we thought of implementing our own replica
 placement strategy until we started to assume that the partitioner had to be
 replaced, too, to accomplish the task.



 Do you have any ideas?





 Von: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
 Gesendet: Dienstag, 10. Januar 2012 09:53
 An: user@cassandra.apache.org
 Betreff: How to control location of data?



 Hi!



 We're evaluating Cassandra for our storage needs. One of the key benefits we
 see is the online replication of the data, that is an easy way to share data
 across nodes. But we have the need to precisely control on what node group
 specific parts of a key space (columns/column families) are stored on. Now
 we're having trouble understanding the documentation. Could anyone help us
 with to find some answers to our questions?

 ·  What does the term replica mean: If a key is stored on exactly three
 nodes in a cluster, is it correct then to say that there are three replicas
 of that key or are there just two replicas (copies) and one original?

 ·  What is the relation between the Cassandra concepts Partitioner and
 Replica 

Re: Integration Error between Cassandra and Eclipse

2012-01-09 Thread Maki Watanabe
The binary package doesn't include the source code, which you need to run
cassandra in Eclipse.
If you want to run cassandra in Eclipse, you need to download the source package OR
clone the git repository, and then integrate it with Eclipse.
If you just want to run cassandra, you don't need to check out the source.

To clone cassandra git repository:
  % git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
then git will create a cassandra directory in the current working directory.

To checkout cassandra-1.0.6 release:
  % cd cassandra
  % git checkout cassandra-1.0.6

Before loading the tree as an Eclipse project:
  % ant build
  % ant generate-eclipse-files

good luck!
maki

2012/1/8 dir dir sikerasa...@gmail.com:
 I am also a beginner user of cassandra. Honestly I wonder: if we can download
 the binary package from http://cassandra.apache.org/download/, why do we
 have to
 check out from svn http://svn.apache.org/repos/asf/cassandra/trunk/
 cassandra-trunk??

 What is the difference between them? Can we perform the cassandra installation
 process
 using the binary package from http://cassandra.apache.org/download/ without
 having to
 check out from svn?? Can we integrate cassandra and the eclipse IDE without
 having to
 perform ant generate-eclipse-files??

 If the cassandra project has already moved from svn to git, would you tell me
 how to
 check out the cassandra project from git, please??

 Thank you.


 On Fri, Jan 6, 2012 at 11:03 AM, Yuki Morishita mor.y...@gmail.com wrote:

 Also note that Cassandra project switched to git from svn.
 See Source control section of http://cassandra.apache.org/download/ .

 Regards,

 Yuki

 --
 Yuki Morishita

 On Thursday, January 5, 2012 at 7:59 PM, Maki Watanabe wrote:

 Sorry, ignore my reply.
 I had the same result with the import (1 error in unit test code and many
 warnings).

 2012/1/6 Maki Watanabe watanabe.m...@gmail.com:

 How about to use File-Import... rather than File-New Java Project?

 After extracting the source, ant build, and ant generate-eclipse-files:
 1. File-Import...
 2. Choose Existing Project into workspace...
 3. Choose your source directory as root directory and then push Finish


 2012/1/6 bobby saputra zaibat...@gmail.com:

 Hi There,

 I am a beginner user of Cassandra. I have heard from many people that
 Cassandra is
 powerful database software, used by Facebook, Twitter, Digg,
 etc.
 So I am interested in studying Cassandra further.

 When I performed the integration process between Cassandra and the Eclipse
 IDE (in
 this case I use Java as the programming language), I got into trouble and had
 many problems.
 I have already followed all instruction from
 http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but this
 tutorial was not working properly. I got a lot of errors and warnings
 while
 creating Java project in eclipse.

 These are the errors and warnings:

 Error(X) (1 item):
 Description Resource  Location
 The method rangeSet(RangeT...) in the type Range is not applicable for
 the
 arguments (Range[]) RangeTest.java line 178

 Warnings(!) (100 of 2916 items):
 Description Resource Location
 AbstractType is a raw type. References to generic type AbstractTypeT
 should be parameterized AbstractColumnContainer.java line 72
 (and many same warnings)

 These are what i've done:
 1. I checked out cassandra-trunk from given link using SlikSvn as svn
 client.
 2. I moved to cassandra-trunk folder, and build with ant using ant build
 command.
 3. I generate eclipse files with ant using ant generate-eclipse-files
 command.
 4. I create new java project on eclipse, insert project name with
 cassandra-trunk, browse the location into cassandra-trunk folder.

 Did I make any mistakes? Or is there something wrong with the tutorial
 at
 http://wiki.apache.org/cassandra/RunningCassandraInEclipse ??

 I have already googling to find the solution to solve this problem, but
 unfortunately
 I found no results. Would you want to help me by giving me a guide how to
 solve
 this problem? Please

 Thank you very much for your help.

 Best Regards,
 Wira Saputra




 --
 w3m




 --
 w3m






-- 
w3m


Re: Integration Error between Cassandra and Eclipse

2012-01-05 Thread Maki Watanabe
How about using File → Import... rather than File → New Java Project?

After extracting the source, ant build, and ant generate-eclipse-files:
1. File-Import...
2. Choose Existing Projects into Workspace...
3. Choose your source directory as root directory and then press Finish


2012/1/6 bobby saputra zaibat...@gmail.com:
 Hi There,

 I am a beginner user in Cassandra. I hear from many people said Cassandra is
 a powerful database software which is used by Facebook, Twitter, Digg, etc.
 So I feel interesting to study more about Cassandra.

 When I performed integration process between Cassandra with Eclipse IDE (in
 this case I use Java as computer language), I get trouble and have many
 problem.
 I have already followed all instruction from
 http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but this
 tutorial was not working properly. I got a lot of errors and warnings while
 creating Java project in eclipse.

 These are the errors and warnings:

 Error(X) (1 item):
 Description Resource  Location
 The method rangeSet(RangeT...) in the type Range is not applicable for the
 arguments (Range[]) RangeTest.java line 178

 Warnings(!) (100 of 2916 items):
 Description Resource Location
 AbstractType is a raw type. References to generic type AbstractTypeT
 should be parameterized AbstractColumnContainer.java line 72
 (and many same warnings)

 These are what i've done:
 1. I checked out cassandra-trunk from given link using SlikSvn as svn
 client.
 2. I moved to cassandra-trunk folder, and build with ant using ant build
 command.
 3. I generate eclipse files with ant using ant generate-eclipse-files
 command.
 4. I create new java project on eclipse, insert project name with
 cassandra-trunk, browse the location into cassandra-trunk folder.

 Do I perform any mistakes? Or there are something wrong with the tutorial in
 http://wiki.apache.org/cassandra/RunningCassandraInEclipse ??

 I have already googling to find the solution to solve this problem, but
 unfortunately
 I found no results. Would you want to help me by giving me a guide how to
 solve
 this problem? Please

 Thank you very much for your help.

 Best Regards,
 Wira Saputra



-- 
w3m


Re: Integration Error between Cassandra and Eclipse

2012-01-05 Thread Maki Watanabe
Sorry, ignore my reply.
I had the same result with import (1 error in unit test code and many warnings).

2012/1/6 Maki Watanabe watanabe.m...@gmail.com:
 How about using File → Import... rather than File → New Java Project?

 After extracting the source, ant build, and ant generate-eclipse-files:
 1. File-Import...
 2. Choose Existing Projects into Workspace...
 3. Choose your source directory as root directory and then press Finish


 2012/1/6 bobby saputra zaibat...@gmail.com:
 Hi There,

 I am a beginner user in Cassandra. I hear from many people said Cassandra is
 a powerful database software which is used by Facebook, Twitter, Digg, etc.
 So I feel interesting to study more about Cassandra.

 When I performed integration process between Cassandra with Eclipse IDE (in
 this case I use Java as computer language), I get trouble and have many
 problem.
 I have already followed all instruction from
 http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but this
 tutorial was not working properly. I got a lot of errors and warnings while
 creating Java project in eclipse.

 These are the errors and warnings:

 Error(X) (1 item):
 Description Resource  Location
 The method rangeSet(RangeT...) in the type Range is not applicable for the
 arguments (Range[]) RangeTest.java line 178

 Warnings(!) (100 of 2916 items):
 Description Resource Location
 AbstractType is a raw type. References to generic type AbstractTypeT
 should be parameterized AbstractColumnContainer.java line 72
 (and many same warnings)

 These are what i've done:
 1. I checked out cassandra-trunk from given link using SlikSvn as svn
 client.
 2. I moved to cassandra-trunk folder, and build with ant using ant build
 command.
 3. I generate eclipse files with ant using ant generate-eclipse-files
 command.
 4. I create new java project on eclipse, insert project name with
 cassandra-trunk, browse the location into cassandra-trunk folder.

 Do I perform any mistakes? Or there are something wrong with the tutorial in
 http://wiki.apache.org/cassandra/RunningCassandraInEclipse ??

 I have already googling to find the solution to solve this problem, but
 unfortunately
 I found no results. Would you want to help me by giving me a guide how to
 solve
 this problem? Please

 Thank you very much for your help.

 Best Regards,
 Wira Saputra



 --
 w3m



-- 
w3m


Re: Moving experiences ?

2011-11-09 Thread Maki Watanabe
I missed the news.
How does nodetool move work in recent versions (0.8.x or later)?
Does it just stream the appropriate range of data between nodes?

2011/11/10 Peter Schuller peter.schul...@infidyne.com:
 Keep in mind that if you're using an older version of Cassandra a move
 is actually a decommission followed by bootstrap - so neighboring
 nodes will temporarily own a larger part of the ring while a node is
 being moved.

-- 
w3m


cassandra-1.0.0 schema-sample.txt problems

2011-10-24 Thread Maki Watanabe
Hello, I'm writing a draft of the CassandraCli wiki page (sorry to be late,
aaron), and found 2 problems in the schema-sample.txt shipped with the 1.0.0
release.
cassandra-cli prints the following warning and error on loading the schema.

WARNING: [{}] strategy_options syntax is deprecated, please use {}
a0f41dc0-feba-11e0--915a0242929f
Waiting for schema agreement...
... schemas agree across the cluster
Authenticated to keyspace: Keyspace1
Line 3 = No enum const class
org.apache.cassandra.cli.CliClient$ColumnFamilyArgument.MEMTABLE_FLUSH_AFTER

The schema uses the deprecated syntax for strategy_options:
strategy_options=[{replication_factor:1}]
which should be:
strategy_options={replication_factor:1}

It also uses the obsolete column family argument
memtable_flush_after, which prevents the schema from loading.

These should be fixed in the next release.

-- 
w3m


Re: Changing the replication factor of a keyspace

2011-10-24 Thread Maki Watanabe
Konstantin,

You can modify the RF of the keyspace with the following command in cassandra-cli:

  update keyspace KEYSPACE_NAME with strategy_options = {replication_factor:N};

When you decrease RF, you need to run nodetool cleanup on each node.
When you increase RF, you need to run nodetool repair on each node.

Please refer to:
http://wiki.apache.org/cassandra/Operations#Replication
for more information.


2011/10/25 Konstantin Naryshkin konstant...@a-bb.net:
 We are setting up my application around Cassandra .8.0 (will move to
 Cassandra 1.0 in the near future). In production the application will
 be running in a two (or more) node cluster with RF 2. In development,
 we do not always have 2 machines to test on, so we may have to run a
 Cassandra cluster consisting of only a single node. In this case, we
 have to reduce the RF of the column family to 1. We may need to switch
 machines from one schema to the other and back again. This is not
 something that will occur often and not in live production. I noticed
 that there is an alter keyspace command in the cassandra-cli and
 PyCassa, which should allow me to change the strategy options. So, it
 is possible to switch a Cassandra instance from RF 1 to RF 2 and back
 again? Will we have something horrible happen if we try to do it on a
 node with data? Is there anything in particular that we have to do
 before/after the transition to make the transition safe?

 I assume that we have to first alter the keyspace and then
 decommission one of the nodes and this will prevent us from losing
 half of our data.




-- 
w3m


schema-sample.txt issues in Cassandra-1.0.0

2011-10-24 Thread Maki Watanabe
Hello, I'm writing a draft of the CassandraCli wiki page (sorry to be late,
aaron), and found 2 problems in the schema-sample.txt shipped with the 1.0.0
release.
cassandra-cli prints the following warning and error on loading the schema.

WARNING: [{}] strategy_options syntax is deprecated, please use {}
a0f41dc0-feba-11e0--915a0242929f
Waiting for schema agreement...
... schemas agree across the cluster
Authenticated to keyspace: Keyspace1
Line 3 = No enum const class
org.apache.cassandra.cli.CliClient$ColumnFamilyArgument.MEMTABLE_FLUSH_AFTER

The schema uses the deprecated syntax for strategy_options:
   strategy_options=[{replication_factor:1}]
which should be:
   strategy_options={replication_factor:1}

It also uses the obsolete column family argument
memtable_flush_after, which prevents the schema from loading.

These should be fixed in the next release.

-- 
maki


Re: Volunteers needed - Wiki

2011-10-10 Thread Maki Watanabe
Hello aaron,
I raise my hand too.
If you have a to-do list for the wiki, please let us know.

maki


2011/10/10 aaron morton aa...@thelastpickle.com:
 Hi there,
 The dev's have been very busy and Cassandra 1.0 is just around the corner
 and full of new features. To celebrate I'm trying to give the wiki some
 loving to make things a little more welcoming for new users.
 To keep things manageable I'd like to focus on completeness and correctness
 for now, and worry about being super awesome later. For example the nodetool
 page is incomplete http://wiki.apache.org/cassandra/NodeTool , we do not
 have anything about CQL and config page is from
 0.7 http://wiki.apache.org/cassandra/StorageConfiguration
 As a starting point I've created a draft home
 page http://wiki.apache.org/cassandra/FrontPage_draft_aaron/ . I also hope
 to use this as a planning tool where we can mark off what's in progress or
 has been completed.
 The guidelines I think we should follow are:
 * ensure coverage of 1.0, a best effort for 0.8 and leave any content from
 previous versions.
 * where appropriate include examples from CQL and RPC as both are still
 supported.
 If you would like to contribute to this effort please let me know via the
 email list. It's a great way to contribute to the project and learn how
 Cassandra works, and I'll do my best to help with any questions you may
 have. Or if you have something you've already written that you feel may be
 of use let me know, and we'll see about linking to it.
 Thanks.
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com




-- 
w3m


Re: Cassandra yaml configuration

2011-09-22 Thread Maki Watanabe
The book is a bit outdated now.
You should use cassandra-cli to define your application schema instead.
Please refer to conf/schema-sample.txt and the help in cassandra-cli.

% cassandra-cli
[default@unknown] help;
[default@unknown] help create keyspace;
[default@unknown] help create column family;

You can load a schema defined in a text file with cassandra-cli:
% cassandra-cli -h host -p port --file your-schema-definition.txt
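For example, the book's Hotelier schema (quoted below) could be rewritten in
cassandra-cli syntax roughly like this. This is only a sketch assuming 0.8-era
CLI options, using SimpleStrategy in place of the old RackUnawareStrategy name;
check "help create keyspace;" for the exact option names:

```
create keyspace Hotelier
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor:1};
use Hotelier;
create column family Hotel with comparator = UTF8Type;
create column family HotelByCity with comparator = UTF8Type;
create column family Guest with comparator = BytesType;
create column family Reservation with comparator = TimeUUIDType;
create column family PointOfInterest with column_type = Super
    and comparator = UTF8Type and subcomparator = UTF8Type;
```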

maki



2011/9/22 Sajith Kariyawasam saj...@gmail.com:
 Hi all,
 Im refering to the book authored by Eben Hewitt, named Cassandra The
 Definitive Guide

 There, in the Sample Application chapter (Chapter 4), example 4.1, a sample
 schema is given in a file named cassandra.yaml. (I have mentioned it
 below)
 I'm using Cassandra 0.8.6 version.
 My question is that, whether the given format of the file is correct? if so,
 where i have to mention in the cassandra configuration to load this yaml
 file?
 because the format of the sample schema  comes with Cassandra
 (conf/schema-sample.txt) is bit different from the sample given in the book.

 Schema definition in cassandra.yaml
 
 keyspaces:
 - name: Hotelier
 replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
 replication_factor: 1
 column_families:
 - name: Hotel
 compare_with: UTF8Type
 - name: HotelByCity
 compare_with: UTF8Type
 - name: Guest
 compare_with: BytesType
 - name: Reservation
 compare_with: TimeUUIDType
 - name: PointOfInterest
 column_type: Super
 compare_with: UTF8Type
 compare_subcolumns_with: UTF8Type
 - name: Room
 compare_with: BytesType
 compare_subcolumns_with: BytesType
 - name: RoomAvailability
 column_type: Super
 compare_with: BytesType
 compare_subcolumns_with: BytesType

 --
 Best Regards
 Sajith




-- 
w3m


Re: Cassandra yaml configuration

2011-09-22 Thread Maki Watanabe
You could write one yourself. I'll buy a copy :-)

maki

2011/9/22 Sajith Kariyawasam saj...@gmail.com:
 Thanks Maki.
 If you came across with any other book supporting latest Cassandara
 versions, pls let me know.

 On Thu, Sep 22, 2011 at 12:03 PM, Maki Watanabe watanabe.m...@gmail.com
 wrote:

 The book is a bit out dated now.
 You should better to use cassandra-cli to define your application schema.
 Please refer to conf/schema-sample.txt and help in cassandra-cli.

 % cassandra-cli
 [default@unknown] help;
 [default@unknown] help create keyspace;
 [default@unknown] help create column family;

 You can load schema defined in text file by cassandra-cli:
 % cassandra-cli -h host -p port --file your-schema-definition.txt

 maki



 2011/9/22 Sajith Kariyawasam saj...@gmail.com:
  Hi all,
  Im refering to the book authored by Eben Hewitt, named Cassandra The
  Definitive Guide
 
  There, in the Sample Application chapter (Chapter 4), example 4.1, a
  sample
  schema is given in a file named cassandra.yaml. (I have mentioned it
  below)
  I'm using Cassandra 0.8.6 version.
  My question is that, whether the given format of the file is correct? if
  so,
  where i have to mention in the cassandra configuration to load this yaml
  file?
  because the format of the sample schema  comes with Cassandra
  (conf/schema-sample.txt) is bit different from the sample given in the
  book.
 
  Schema definition in cassandra.yaml
  
  keyspaces:
  - name: Hotelier
  replica_placement_strategy:
  org.apache.cassandra.locator.RackUnawareStrategy
  replication_factor: 1
  column_families:
  - name: Hotel
  compare_with: UTF8Type
  - name: HotelByCity
  compare_with: UTF8Type
  - name: Guest
  compare_with: BytesType
  - name: Reservation
  compare_with: TimeUUIDType
  - name: PointOfInterest
  column_type: Super
  compare_with: UTF8Type
  compare_subcolumns_with: UTF8Type
  - name: Room
  compare_with: BytesType
  compare_subcolumns_with: BytesType
  - name: RoomAvailability
  column_type: Super
  compare_with: BytesType
  compare_subcolumns_with: BytesType
 
  --
  Best Regards
  Sajith
 



 --
 w3m



 --
 Best Regards
 Sajith




-- 
w3m


Re: Recovering from a multi-node cluster failure caused by OOM on repairs

2011-07-27 Thread Maki Watanabe
This kind of information is very helpful.
Thank you for sharing your experience.

maki


2011/7/27 Teijo Holzer thol...@wetafx.co.nz:
 Hi,

 I thought I'd share the following with this mailing list as a number of other
 users seem to have had similar problems.

 We have the following set-up:

 OS: CentOS 5.5
 RAM: 16GB
 JVM heap size: 8GB (also tested with 14GB)
 Cassandra version: 0.7.6-2 (also tested with 0.7.7)
 Oracle JDK version: 1.6.0_26
 Number of nodes: 5
 Load per node: ~40GB
 Replication factor: 3
 Number of requests/day: 2.5 Million (95% inserts)
 Total net insert data/day: 1GB
 Default TTL for most of the data: 10 days

 This set-up has been operating successfully for a few months, however
 recently
 we started seeing multi-node failures, usually triggered by a repair, but
 occasionally also under normal operation. A repair on node 3,4 and 5 would
 always cause the cluster as whole to fail, whereas node 1  2 completed
 their
 repair cycles successfully.

 These failures would usually result in 2 or 3 nodes becoming unresponsive
 and
 dropping out of the cluster, resulting in client failure rates to spike up
 to
 ~10%. We normally operate with a failure rate of 0.1%.

 The relevant log entries showed a complete heap memory exhaustion within 1
 minute (see log lines below where we experimented with a larger heap size of
 14GB). Also of interest was a number of huge SliceQueryFilter collections
 running concurrently on the nodes in question (see log lines below).

 The way we ended recovering from this situation was as follows. Remember
 these
 steps were taken to get an unstable cluster back under control, so you might
 want to revert some of the changes once the cluster is stable again.

 Set disk_access_mode: standard in cassandra.yaml
 This allowed us to prevent the JVM blowing out the hard limit of 8GB via
 large
 mmaps. Heap size was set to 8GB (RAM/2). That meant the JVM was never using
 more than 8GB total. mlockall didn't seem to make a difference for our
 particular problem.

 Turn off all row and key caches via cassandra-cli, e.g.
 update column family Example with rows_cached=0;
 update column family Example with keys_cached=0;
 We were seeing compacted row maximum sizes of ~800MB from cfstats, that's
 why
 we turned them all off. Again, we saw a significant drop in the actual
 memory
 used from the available maximum of 8GB. Obviously, this will affect reads,
 but
 as 95% of our requests are inserts, it didn't matter so much for us.

 Bootstrap problematic node:
 Kill Cassandra
 Change auto_bootstrap: true in cassandra.yaml, remove own IP address from
 list of seeds (important)
 Delete all data directories (i.e. commit-log, data, saved-caches)
 Start Cassandra
 Wait for bootstrap to finish (see log and nodetool)
 Change auto_bootstrap: false
 (Run repair)

 The first bootstrap completed very quickly, so we decided to bootstrap every
 node in the cluster (not just the problematic ones). This resulted in some
 data
 loss. The next time we will follow the bootstrap by a repair before
 bootstrapping  repairing the next node to minimize data loss.

 After this procedure, the cluster was operating normally again.

 We now run a continuous rolling repair, followed by a (major) compaction and
 a
 manual garbage collection. As the repairs a required anyway, we decided to
 run
 them all the time in a continuous fashion. Therefore, potential problems can
 be identified earlier.

 The major compaction followed by a manual GC allows us to keep the disk
 usage low on each node. The manual GC is necessary as the unused files on
 disk are only really deleted when the reference is garbage collected inside
 the JVM (a restart would achieve the same).

 We also collected some statistics in regards to the duration of some of the
 operations:

 cleanup/compact: ~1 min/GB
 repair: ~2-3 min/GB
 bootstrap: ~1 min/GB

 This means that if you have a node with 60GB of data, it will take ~1hr to
 compact and ~2-3hrs to repair. Therefore, it is advisable to keep the data per
 node below ~120GB. We achieve this by using an aggressive TTL on most of our
 writes.
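The per-GB rates quoted above can be turned into a quick estimator. A sketch
only, with the minutes-per-GB figures taken from the numbers reported in this
message (using the worst case of 3 min/GB for repair):

```python
# Rough operation-duration estimates from the rule-of-thumb rates above:
# compact/cleanup ~1 min/GB, repair ~2-3 min/GB, bootstrap ~1 min/GB.
RATES_MIN_PER_GB = {"compact": 1, "repair": 3, "bootstrap": 1}

def estimate_minutes(operation, data_gb):
    """Estimated duration in minutes for `operation` on `data_gb` GB of data."""
    return RATES_MIN_PER_GB[operation] * data_gb

# A 60GB node: ~1hr to compact, up to ~3hrs to repair, as stated above.
print(estimate_minutes("compact", 60))   # 60
print(estimate_minutes("repair", 60))    # 180
```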

 Cheers,

   Teijo

 Here are the relevant log entries showing the OOM conditions:


 [2011-07-21 11:12:11,059] INFO: GC for ParNew: 1141 ms, 509843976 reclaimed
 leaving 1469443752 used; max is 14675869696 (ScheduledTasks:1
 GCInspector.java:128)
 [2011-07-21 11:12:15,226] INFO: GC for ParNew: 1149 ms, 564409392 reclaimed
 leaving 2247228920 used; max is 14675869696 (ScheduledTasks:1
 GCInspector.java:128)
 ...
 [2011-07-21 11:12:55,062] INFO: GC for ParNew: 1110 ms, 564365792 reclaimed
 leaving 12901974704 used; max is 14675869696 (ScheduledTasks:1
 GCInspector.java:128)

 [2011-07-21 10:57:23,548] DEBUG: collecting 4354206 of 2147483647:
 940657e5b3b0d759eb4a14a7228ae365:false:41@1311102443362542 (ReadStage:27
 SliceQueryFilter.java:123)




-- 
w3m


Re: Interpreting the output of cfhistograms

2011-07-25 Thread Maki Watanabe
Offset represents a different unit for each column.
In the SSTables column, you can see the following histogram:

20  4291637
24  28680590
29  3876198

It means 4291637 of your read operations required 20 SSTables to read,
28680590 ops required 24, and so on.
In the Write/Read Latency columns, Offset represents microseconds:
3711340 read operations completed
within the 2-microsecond bucket.
Most of your rows are between 925 ~ 1331 bytes.
Most of your rows have 925 ~ 1331 columns.
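The bucket arithmetic above can be sketched in code, for example finding the
bucket that contains the median operation. The sample data here is the
SSTables column quoted in this message; the helper name is illustrative:

```python
# Each (offset, count) pair means `count` operations fell into the bucket
# labelled `offset` (e.g. reads that touched `offset` sstables).
sstables_histogram = [(20, 4291637), (24, 28680590), (29, 3876198)]

def median_bucket(histogram):
    """Return the offset of the bucket containing the median operation."""
    total = sum(count for _, count in histogram)
    seen = 0
    for offset, count in sorted(histogram):
        seen += count
        if seen * 2 >= total:
            return offset

print(median_bucket(sstables_histogram))  # 24: the typical read touched ~24 sstables
```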

maki


2011/7/26 Aishwarya Venkataraman cyberai...@gmail.com

 Hello,

 I need help understanding the output of cfhistograms option provided as part 
 of nodetool.
 When I run cfhistograms on one node of a 3 node cluster, I get the following :

 Offset SSTables Write Latency Read Latency Row Size Column Count
 1 0 0 458457 0 0
 2 0 0 3711340 0 0
 3 0 0 12159992 0 0
 4 0 0 14350840 0 0
 5 0 0 7866204 0 0
 6 0 0 3427977 0 0
 7 0 0 2407296 0 0
 8 0 0 2516075 0 0
 10 0 0 5392567 0 0
 12 0 0 4239979 0 0
 14 0 0 2415529 0 0
 17 0 0 1406153 0 0
 20 4291637 0 380625 0 0
 24 28680590 0 191431 0 0
 29 3876198 0 141841 0 0
 35 0 0 57855 0 0
 42 0 0 15403 0 0
 50 0 0 4291 0 0
 60 0 0 2118 0 0
 72 0 0 1096 0 0
 86 0 0 662 0 0
 179 0 0 115 173 173
 215 0 0 70 35 35
 258 0 0 48 0 0
 310 0 0 41 404 404
 372 0 0 37 0 0
 446 0 0 22 975 975
 770 0 0 12 3668 3668
 924 0 0 4 10142 10142
 1331 0 0 4 256983543 256983543
 What do these numbers mean ? How can I interpret the above data ? I found 
 some explanation here 
 http://narendrasharma.blogspot.com/2011/04/cassandra-07x-understanding-output-of.html,
  but I did no understand this completely.

 Thanks,
 Aishwarya


Re: Question about compaction

2011-07-14 Thread Maki Watanabe
These 0-byte files with the -Compacted suffix indicate that the
associated sstables can be removed.
In the current version, Cassandra deletes compacted sstables at full GC
and on startup.

maki


2011/7/14 Sameer Farooqui cassandral...@gmail.com:
 Running Cassandra 0.8.1. Ran major compaction via:

 sudo /home/ubuntu/brisk/resources/cassandra/bin/nodetool -h localhost
 compact 

 From what I'd read about Cassandra, I thought that after compaction all of
 the different SSTables on disk for a Column Family would be merged into one
 new file.

 However, there are now a bunch of 0-sized Compacted files and a bunch of
 Data files. Any ideas about why there are still so many files left?

 Also, is a minor compaction the same thing as a read-only compaction in 0.7?


 ubuntu@domU-12-31-39-0E-x-x:/raiddrive/data/DemoKS$ ls -l
 total 270527136
 -rw-r--r-- 1 root root    0 2011-07-13 03:07 DemoCF-g-5670-Compacted
 -rw-r--r-- 1 root root  89457447799 2011-07-10 00:26 DemoCF-g-5670-Data.db
 -rw-r--r-- 1 root root   193456 2011-07-10 00:26 DemoCF-g-5670-Filter.db
 -rw-r--r-- 1 root root  2081159 2011-07-10 00:26 DemoCF-g-5670-Index.db
 -rw-r--r-- 1 root root 4276 2011-07-10 00:26
 DemoCF-g-5670-Statistics.db
 -rw-r--r-- 1 root root    0 2011-07-13 03:07 DemoCF-g-5686-Compacted
 -rw-r--r-- 1 root root    920521489 2011-07-09 22:03 DemoCF-g-5686-Data.db
 -rw-r--r-- 1 root root    11776 2011-07-09 22:03 DemoCF-g-5686-Filter.db
 -rw-r--r-- 1 root root   126725 2011-07-09 22:03 DemoCF-g-5686-Index.db
 -rw-r--r-- 1 root root 4276 2011-07-09 22:03
 DemoCF-g-5686-Statistics.db
 -rw-r--r-- 1 root root    0 2011-07-13 03:07 DemoCF-g-5781-Compacted
 -rw-r--r-- 1 root root    223970446 2011-07-09 22:38 DemoCF-g-5781-Data.db
 -rw-r--r-- 1 root root 7216 2011-07-09 22:38 DemoCF-g-5781-Filter.db
 -rw-r--r-- 1 root root    32750 2011-07-09 22:38 DemoCF-g-5781-Index.db
 -rw-r--r-- 1 root root 4276 2011-07-09 22:38
 DemoCF-g-5781-Statistics.db
 -rw-r--r-- 1 root root    0 2011-07-13 03:07 DemoCF-g-5874-Compacted
 -rw-r--r-- 1 root root    156284248 2011-07-09 23:20 DemoCF-g-5874-Data.db
 -rw-r--r-- 1 root root 5056 2011-07-09 23:20 DemoCF-g-5874-Filter.db
 -rw-r--r-- 1 root root    10400 2011-07-09 23:20 DemoCF-g-5874-Index.db
 -rw-r--r-- 1 root root 4276 2011-07-09 23:20
 DemoCF-g-5874-Statistics.db
 -rw-r--r-- 1 root root    0 2011-07-13 03:07 DemoCF-g-6938-Compacted
 -rw-r--r-- 1 root root  22947541446 2011-07-10 11:43 DemoCF-g-6938-Data.db
 -rw-r--r-- 1 root root    49936 2011-07-10 11:43 DemoCF-g-6938-Filter.db
 -rw-r--r-- 1 root root   563550 2011-07-10 11:43 DemoCF-g-6938-Index.db
 -rw-r--r-- 1 root root 4276 2011-07-10 11:43
 DemoCF-g-6938-Statistics.db
 -rw-r--r-- 1 root root    0 2011-07-13 03:07 DemoCF-g-6996-Compacted
 -rw-r--r-- 1 root root    224253930 2011-07-10 11:28 DemoCF-g-6996-Data.db
 -rw-r--r-- 1 root root 7216 2011-07-10 11:27 DemoCF-g-6996-Filter.db
 -rw-r--r-- 1 root root    26250 2011-07-10 11:28 DemoCF-g-6996-Index.db
 -rw-r--r-- 1 root root 4276 2011-07-10 11:28
 DemoCF-g-6996-Statistics.db
 -rw-r--r-- 1 root root    0 2011-07-13 03:07 DemoCF-g-8324-Compacted





-- 
w3m


Re: Re: Re: AntiEntropy?

2011-07-13 Thread Maki Watanabe
I'll write a FAQ for this topic :-)

maki

2011/7/13 Peter Schuller peter.schul...@infidyne.com:
 To be sure that I didn't misunderstand (English is not my mother tongue) here
 is what the entire repair paragraph says ...

 Read it, I maintain my position - the book is wrong or at the very
 least strongly misleading.

 You *definitely* need to run nodetool repair periodically for the
 reasons documented in the link I sent before, unless you have specific
 reasons not to and know what you're doing.

 --
 / Peter Schuller




-- 
w3m


Re: Replicating to all nodes

2011-07-13 Thread Maki Watanabe
Consistency and Availability trade off against each other.
If you use RF=7 + CL=ONE, your reads and writes will succeed as long as
one node is alive, while the data is replicated to 7 nodes.
Of course you will have a chance to read stale data in this case.
If you need strong consistency, you must use CL=QUORUM.
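A small sketch of the consistency-level arithmetic behind this advice: how
many of the RF replicas must answer at each level, and how many replica
failures each level can therefore tolerate. Strong consistency holds when
read and write replica sets overlap, i.e. R + W > RF:

```python
def required_replicas(rf, level):
    """Replicas that must respond for an operation at this consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError("unknown consistency level: %s" % level)

rf = 7
for level in ("ONE", "QUORUM", "ALL"):
    need = required_replicas(rf, level)
    print("%-6s needs %d replicas, tolerates %d down" % (level, need, rf - need))

# QUORUM reads + QUORUM writes always overlap: R + W > RF.
q = required_replicas(rf, "QUORUM")
assert q + q > rf
```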

maki


2011/7/14 Kyle Gibson kyle.gib...@frozenonline.com:
 Thanks for the reply Peter.

 The goal is to configure a cluster in which reads and writes can
 complete successfully even if only 1 node is online. For this to work,
 each node would need the entire dataset. Your example of a 3 node ring
 with RF=3 would satisfy this requirement. However, if two nodes are
 offline, CL.QUORUM would not work, I would need to use CL.ONE. If all
 3 nodes are online, CL.ONE is undershooting, I would want to use
 CL.QUORUM (or maybe CL.ALL). Or does CL.ONE actually function this
 way, somewhat?

 A complication occurs when you want to add another node. Now there's a
 4 node ring, but only 3 replicas, so each node isn't guaranteed to
 have all of the data, so the cluster can't completely function when
 N-1 nodes are offline. So this is why I would like to have the RF
 scale relative to the size of the cluster. Am I mistaken?

 Thanks!

 On Wed, Jul 13, 2011 at 6:41 PM, Peter Schuller
 peter.schul...@infidyne.com wrote:
 Read and write operations should succeed even if only 1 node is online.

 When a read is performed, it is performed against all active nodes.

 Using QUORUM is the closest thing you get for reads without modifying
 Cassandra. You can't make it wait for all nodes that happen to be up.

 When a write is performed, it is performed against all active nodes,
 inactive/offline nodes are updated when they come back online.

 Writes always go to all nodes that are up, but if you want to wait for
 them before returning OK to the client than no - except CL.ALL
 (which means you don't survive one being down) and CL.QUORUM (which
 means you don't wait for all if all are up).

 I don't believe it does. Currently the replication factor is hard
 coded based on key space, not a function of the number of nodes in the
 cluster. You could say, if N = 7, configure replication factor = 7,
 but then if only 6 nodes are online, writes would fail. Is this
 correct?

 No. Reads/write fail according to the consistency level. The RF +
 consistency level tells how many nodes must be up and successfully
 service the request in order for the operation to succeed. RF just
 tells you the number of total nodes int he replicate set for a key;
 whether an operation fails is up to the consistency level.

 I would ask: Why are you trying to do this? It really seems you're
 trying to do the wrong thing. Why would you ever want to replicate
 to all? If you want 3 copies in total, then do RF=3 and keep a 3 node
 ring. If you need more capacity, you add nodes and retain RF. If you
 need more redundancy, you have to increase RF. Those are two very
 different axis along which to scale. I cannot think of any reason why
 you would want to tie RF to the total number of nodes.

 What is the goal you're trying to achieve?

 --
 / Peter Schuller (@scode on twitter)





-- 
w3m


Re: Limit what nodes are writeable

2011-07-11 Thread Maki Watanabe
Cassandra has an authentication interface, but doesn't have authorization.
So you need to implement authorization in your application layer.
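One way to do that is a thin read-only wrapper around the client object. A
minimal sketch, assuming a client with get/insert-style methods; the class
and method names here are illustrative, not a real driver API:

```python
class ReadOnlyClient:
    """Application-layer guard that rejects write-style calls."""

    WRITE_METHODS = {"insert", "remove", "batch_mutate", "truncate"}

    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        # Deny write-style calls before they ever reach the cluster.
        if name in self.WRITE_METHODS:
            raise PermissionError("read-only connection: %r denied" % name)
        return getattr(self._client, name)
```

Analytics users would be handed a ReadOnlyClient instead of the raw client,
so reads pass through unchanged while mutations fail fast in the application.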

maki


2011/7/11 David McNelis dmcne...@agentisenergy.com:
 I've been looking in the documentation and haven't found anything about
 this...  but is there support for making a node  read-only?
 For example, you have a cluster set up in two different data centers / racks
 / whatever, with your replication strategy set up so that the data is
 redundant between the two places.  In one of the places all of the incoming
 data will be  processed and inserted into your cluster.  In the other data
 center you plan to allow people to run analytics, but you want to restrict
 the permissions so that the people running analytics can connect to
 Cassandra in whatever way makes the most sense for them, but you don't want
 those people to be able to edit/update data.
 Is it currently possible to configure your cluster in this manner?  Or would
 it only be possible through a third-party solution like wrapping one of the
 access libraries in a way that does not support write operations.

 --
 David McNelis
 Lead Software Engineer
 Agentis Energy
 www.agentisenergy.com
 o: 630.359.6395
 c: 219.384.5143
 A Smart Grid technology company focused on helping consumers of energy
 control an often under-managed resource.





-- 
w3m


Re: Decorator Algorithm

2011-06-23 Thread Maki Watanabe
A little addendum

Key := Your data to identify a row
Token := Index on the ring calculated from the Key. The calculation is
defined by the partitioner.

You can look up the responsible nodes (endpoints) for a specific key with
the JMX getNaturalEndpoints interface.
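For RandomPartitioner, the key-to-token calculation can be sketched as
follows: the md5 digest of the key is read as a signed 128-bit integer and
its absolute value becomes the token, so tokens fall in [0, 2**127]. A
sketch mirroring that behavior, not the actual Java implementation:

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    """Token for a key under RandomPartitioner-style hashing: |md5(key)|."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))

# The same key always maps to the same position on the ring.
token = random_partitioner_token(b"some-row-key")
print(token)
```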

maki


2011/6/24 aaron morton aa...@thelastpickle.com:
 Various places in the code call IPartitioner.decorateKey() which returns a 
 DecoratedKeyT which contains both the original key and the TokenT

 The RandomPartitioner md5 to hash the key ByteBuffer and create a BigInteger. 
 OPP converts the key into utf8 encoded String.

 Using the token to find which endpoints contain replicas is done by the 
 AbstractReplicationStrategy.calculateNaturalEndpoints() implementations.

 Does that help?

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 23 Jun 2011, at 19:58, Jonathan Colby wrote:

 Hi -

 I'd like to understand more how the token is hashed with the key to 
 determine on which node the data is stored - called decorating in cassandra 
 speak.

 Can anyone share any documentation on this or describe this more in detail?  
  Yes, I could look at the code, but I was hoping to be able to read more 
 about how it works first.

 thanks.





-- 
w3m


Re: insufficient space to compact even the two smallest files, aborting

2011-06-10 Thread Maki Watanabe
But decreasing min_compaction_threshold will affect minor
compaction frequency, won't it?

maki


2011/6/10 Terje Marthinussen tmarthinus...@gmail.com:
 bug in the 0.8.0 release version.
 Cassandra splits the sstables depending on size and tries to find (by
 default) at least 4 files of similar size.
 If it cannot find 4 files of similar size, it logs that message in 0.8.0.
 You can try to reduce the minimum required  files for compaction and it will
 work.
 Terje
 2011/6/10 Héctor Izquierdo Seliva izquie...@strands.com

 Hi, I'm running a test node with 0.8, and everytime I try to do a major
 compaction on one of the column families this message pops up. I have
 plenty of space on disk for it and the sum of all the sstables is
 smaller than the free capacity. Is there any way to force the
 compaction?






-- 
w3m


Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread Maki Watanabe
You can find useful information in:
http://www.datastax.com/docs/0.8/operations/scheduled_tasks

sstables are immutable. Once written to disk, they won't be updated.
When you take a snapshot, the tool makes hard links to the sstable files.
After some time, you will have several memtable flushes, your sstable
files will be compacted together, and obsolete sstable files will be
removed. But the snapshot set remains on your disk, for backup.

Assume you have sstables: A B C D E F.
When you take a snapshot, you will have hard links to A B C D E F under
the snapshots subdirectory.
These hard links (files) will not be removed even after you run a
major/minor compaction.
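The hard-link behavior described above is plain filesystem semantics; a small Python sketch (the file names are made up, not Cassandra's real layout) shows why a snapshot survives compaction deleting the original sstable:

```python
import os
import tempfile

# Hypothetical file names; the point is the filesystem semantics.
tmp = tempfile.mkdtemp()
sstable = os.path.join(tmp, "Standard1-1-Data.db")
snapshot = os.path.join(tmp, "snapshots", "Standard1-1-Data.db")

with open(sstable, "w") as f:
    f.write("immutable sstable contents")

os.makedirs(os.path.dirname(snapshot))
os.link(sstable, snapshot)   # what taking a snapshot does
os.remove(sstable)           # what post-compaction cleanup does

with open(snapshot) as f:    # the data survives via the hard link
    print(f.read())
```

Because both directory entries point at the same inode, the snapshot costs almost no extra disk space until compaction removes the live copy.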

maki

2011/6/7 AJ a...@dude.podzone.net:
 On 6/6/2011 11:25 PM, Benjamin Coverston wrote:

 Currently, my data dir has about 16 sets.  I thought that compaction
 (with nodetool) would clean-up these files, but it doesn't.  Neither does
 cleanup or repair.

 You're not even talking about snapshots using nodetool snapshot yet. Also
 nodetool compact does compact all of the live files, however the compacted
 SSTables will not be cleaned up until a garbage collection is triggered, or
 a capacity threshold is met.

 Ok, so after a compaction, Cass is still not done with the older sets of .db
 files and I should let Cass delete them?  But, I thought one of the main
 purposes of compaction was to reclaim disk storage resources.  I'm only
 playing around with a small data set so I can't tell how fast the data
 grows.  I'm trying to plan my storage requirements.  Is each newly-generated
 set as large in size as the previous?

 The reason I ask is it seems a snapshot is...

 Q1: Should the files with the lower index #'s (under the data/{keyspace}
 directory) be manually deleted?  Or, do ALL of the files in this directory
 need to be backed-up?

 Do not ever delete files in your data directory if you care about data on
 that replica, unless they are from a column family that no longer exists on
 that server. There may be some duplicate data in the files, but if the files
 are in the data directory, as a general rule, they are there because they
 contain some set of data that is in none of the other SSTables.

 ... It seems a snapshot is implemented, unsurprisingly,  as just a link to
 the latest (highest indexed) set; not the previous sets.  So, obviously,
 only the latest *.db files will get backed-up.  Therefore, the previous sets
 must be worthless.







Re: Direct control over where data is stored?

2011-06-05 Thread Maki Watanabe
getNaturalEndpoints tells you which nodes a given key will be stored
on, but we can't force Cassandra to store a given key on specific nodes.

maki

2011/6/6 mcasandra mohitanch...@gmail.com:

 Khanh Nguyen wrote:

 Is there a way to tell where a piece of data is stored in a cluster?
 For example, can I tell if LastNameColumn['A'] is stored at node 1 in
 the ring?


 I have not used it but you can see getNaturalEndpoints in jmx. It will tell
 you which nodes are responsible for a given row *key*

 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Direct-control-over-where-data-is-stored-tp6441048p6443571.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Direct control over where data is stored?

2011-06-04 Thread Maki Watanabe
You may be able to do it with the Order Preserving Partitioner by
building a key-to-node mapping before storing data, or you may need a
custom Partitioner. Please note that you are responsible for
distributing load between the nodes in this case.
From an application design perspective, it is not clear to me why you
need to store user A and his friends on the same box.

maki
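For illustration, here is a sketch of how an order-preserving ring maps keys to nodes (hypothetical node names and tokens, single replica, no replication strategy): the primary replica is the first node whose token is >= the key, wrapping around the ring.

```python
from bisect import bisect_left

def opp_primary_replica(key, ring):
    """ring maps token -> node; under OPP the tokens are strings
    in key order. The primary replica owns the range ending at the
    first token >= the key, wrapping around the ring."""
    tokens = sorted(ring)
    i = bisect_left(tokens, key)
    return ring[tokens[i % len(tokens)]]

ring = {"g": "node1", "n": "node2", "t": "node3"}
print(opp_primary_replica("apple", ring))   # -> node1
print(opp_primary_replica("horse", ring))   # -> node2
print(opp_primary_replica("zebra", ring))   # wraps -> node1
```

To pin user A and his friends to one box, you would have to choose keys (for example, a shared prefix) that fall in the same token range, and accept responsibility for the resulting load skew.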


2011/6/5 Khanh Nguyen nguyen.h.kh...@gmail.com:
 Hi everyone,

 Is it possible to have direct control over where objects are stored in
 Cassandra? For example, I have a Cassandra cluster of 4 machines and 4
 objects A, B, C, D; I want to store A at machine 1, B at machine 2, C
 at machine 3 and D at machine 4. My guess is that I need to intervene
 they way Cassandra hashes an object into the keyspace? If so, how
 complicated the task will be?

 I'm new to the list and Cassandra. The reason I am asking is that my
 current project is related to social locality of data: if A and B are
 Facebook friends, I want to store their data as close as possible,
 preferably in the same machine in a cluster.

 Thank you.

 Regards,

 -k






Re: starting with PHPcassa

2011-06-01 Thread Maki Watanabe
Amrita,
I recommend you take a bit more time to investigate, think, and
struggle with the problems yourself before posting questions.
It will improve your technical skill, and it will help you a lot when
you face a really serious problem in the future.

For the current problem, if I were you, I'd make a small PHP script
without the Cassandra parts and verify it works fine first.

maki

2011/6/1 Amrita Jayakumar amritajayakuma...@gmail.com:
 yeh i tried restarting... but i get to see the following

  /etc/init.d/apache2 restart
  * Restarting web server apache2
    apache2: Could not reliably determine the server's fully qualified domain
 name, using 127.0.0.1 for ServerName
  ... waiting apache2: Could not reliably determine the server's fully
 qualified domain name, using 127.0.0.1 for ServerName

 Thanks and Regards,
 Amrita

 On Wed, Jun 1, 2011 at 11:50 AM, Marcus Bointon mar...@synchromedia.co.uk
 wrote:

 On 1 Jun 2011, at 08:12, Amrita Jayakumar wrote:

 I have deployed this code into a php file phcass.php in the ubuntu machine
 in the location /var/www/vishnu/. But nothing happens when i execute the
 file through the browser. Neither can i find the data inserted in the column
 family 'Users'.
 can anyone help???

 Did you restart apache after changing that PHP config?
 Are you certain it's running at all? instead of that echo, try
 var_dump($column_family);
 Marcus



Re: sync commitlog in batch mode lose data

2011-05-31 Thread Maki Watanabe
What replication factor did you set for the keyspace?
If the RF is 2, your data should be replicated to both nodes. If the
RF is 1, you will lose access to half of the data while node A is down.

maki


2011/5/31 Preston Chang zhangyf2...@gmail.com:
 Hi,
 I have a cluster with two nodes (node A and node B) and make a test as
 follows:
 1). set commitlog sync in batch mode and the sync batch window in 0 ms
 2). one client wrote random keys in infinite loop with consistency level
 QUORUM and record the keys in file after the insert() method return normally
 3). unplug one server (node A) power cord
 4). restart the server and cassandra service
 5). read the key list generated in step 2) with consistency level ONE
 I thought the result of test is all the key in list can be read normally,
 but actually there are some NotFound keys.
 My question is why there are NotFound keys. In my opinion server would not
 ack the client before finishing syncing the commitlog if I set commitlog
 sync in batch mode and the sync batch window in 0 ms. So if the insert()
 method return normally it means the mutation had been written in commitlog
 and the commitlog had been synced to the disk. Am I right?
 My Cassandra version is 0.7.3.
 Thanks for your help very much.
 --
 by Preston Chang
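The durability guarantee being tested above (in batch sync mode with a 0 ms window, a write is acknowledged only after the commitlog reaches disk) can be sketched in miniature. This is illustrative only, not Cassandra's code; the file name is made up:

```python
import os
import tempfile

COMMITLOG = os.path.join(tempfile.mkdtemp(), "commitlog")

def append_mutation(path, mutation):
    """Append a mutation and fsync before returning: the caller is
    'acked' only once the bytes are durable, mimicking commitlog
    sync mode "batch" with a 0 ms window."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, mutation)
        os.fsync(fd)  # durability barrier before acking the client
    finally:
        os.close(fd)

append_mutation(COMMITLOG, b"row-key:value\n")
```

Note that even with this guarantee on every node, a CL=QUORUM write only promises durability on a quorum of replicas, so a CL=ONE read can still miss it.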







Re: starting with PHPcassa

2011-05-31 Thread Maki Watanabe
http://thobbs.github.com/phpcassa/installation.html

They also have mailing list and irc channel.
http://thobbs.github.com/phpcassa/

maki


2011/5/31 Amrita Jayakumar amritajayakuma...@gmail.com:
 I have log files of the format id key value. I want to load these
 files into cassandra using PHPcassa.
 I have installed Cassandra 7. Can anyone please guide me with the exact
 procedures as in how to install PHPcassa and take things forward?

 Thanks and Regards,
 Amrita






Re: No space left on device problem when starting Cassandra

2011-05-31 Thread Maki Watanabe
at org.apache.log4j.Category.info(Category.java:666)

It seems that your Cassandra can't write its log because the device is full.
Check where your Cassandra log is written to. The log file path is
configured by the log4j.appender.R.File property
in conf/log4j-server.properties.

maki

2011/6/1 Bryce Godfrey bryce.godf...@azaleos.com:
 Hi there, I’m a bit new to Linux and Cassandra so I’m hoping someone can
 help me with this.



 I’ve been evaluating Cassandra for the last few days and I’m now having a
 problem starting up the service.   I receive this error below and I’m unsure
 on where I’m out of space at, and how to free up more.



 azadmin@cassandra-01: $ sudo /usr/tmp/apache-cassandra-0.7.6-2/bin/cassandra
 -f

 INFO 18:21:46,830 Logging initialized

 log4j:ERROR Failed to flush writer,

 java.io.IOException: No space left on device

     at java.io.FileOutputStream.writeBytes(Native Method)

     at java.io.FileOutputStream.write(FileOutputStream.java:297)

     at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)

     at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:290)

     at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:294)

     at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)

     at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)

     at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)

     at
 org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)

     at
 org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)

     at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)

     at
 org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)

     at
 org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)

     at org.apache.log4j.Category.callAppenders(Category.java:206)

     at org.apache.log4j.Category.forcedLog(Category.java:391)

     at org.apache.log4j.Category.info(Category.java:666)

     at
 org.apache.cassandra.service.AbstractCassandraDaemon.clinit(AbstractCassandraDaemon.java:79)

 INFO 18:21:46,841 Heap size: 16818110464/16819159040

 #

 # A fatal error has been detected by the Java Runtime Environment:

 #

 #  SIGBUS (0x7) at pc=0x7f35b493f571, pid=1234, tid=139869156091648

 #

 # JRE version: 6.0_22-b22

 # Java VM: OpenJDK 64-Bit Server VM (20.0-b11 mixed mode linux-amd64
 compressed oops)

 # Derivative: IcedTea6 1.10.1

 # Distribution: Ubuntu Natty (development branch), package
 6b22-1.10.1-0ubuntu1

 # Problematic frame:

 # C  [libffi.so.5+0x2571]  ffi_prep_java_raw_closure+0x541

 #

 # An error report file with more information is saved as:

 # /media/commitlogs/hs_err_pid1234.log

 #

 # If you would like to submit a bug report, please include

 # instructions how to reproduce the bug and visit:

 #   https://bugs.launchpad.net/ubuntu/+source/openjdk-6/

 # The crash happened outside the Java Virtual Machine in native code.

 # See problematic frame for where to report the bug.

 #



 I seem to have enough space, except in the /dev/mapper/Cassandra—01-root,
 and I’m unsure of that anyway:

 azadmin@cassandra-01:/$ df -h

 Filesystem    Size  Used Avail Use% Mounted on

 /dev/mapper/cassandra--01-root

   1.2G  1.2G 0 100% /

 none   16G  236K   16G   1% /dev

 none   16G 0   16G   0% /dev/shm

 none   16G   36K   16G   1% /var/run

 none   16G 0   16G   0% /var/lock

 /dev/sdb1  33G  176M   33G   1% /media/commitlogs

 /dev/sdc1  66G  180M   66G   1% /media/data

 /dev/sda1 228M   23M  193M  11% /boot



 Thanks,

 ~Bryce


Re: problem in starting the cassandra single node setup

2011-05-30 Thread Maki Watanabe
Did you read Jonathan's reply?
If you can't understand what the README says, please let us know where
you are stuck.

maki


2011/5/31 Amrita Jayakumar amritajayakuma...@gmail.com:
 can anyone help me how to start with cassandra??? starting from the
 basics???

 Thanks and Regards,
 Amrita

 On Mon, May 30, 2011 at 6:41 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Here's what README says:

  * bin/cassandra -f

 Running the startup script with the -f argument will cause Cassandra to
 remain in the foreground and log to standard out.

 Now let's try to read and write some data using the command line client.

  * bin/cassandra-cli --host localhost

 The command line client is interactive so if everything worked you should
 be sitting in front of a prompt...

  Connected to: Test Cluster on localhost/9160
  Welcome to cassandra CLI.

  Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
  [default@unknown]


 On Mon, May 30, 2011 at 4:09 AM, Amrita Jayakumar
 amritajayakuma...@gmail.com wrote:
  Marcus,
      Can u please tell me how to do that??? I was just following
  the
  instructions in the README file that came with the package.
 
  Thanks and Regards,
  Amrita
 
  On Mon, May 30, 2011 at 2:36 PM, Marcus Bointon
  mar...@synchromedia.co.uk
  wrote:
 
  On 30 May 2011, at 10:59, Amrita Jayakumar wrote:
 
   I am new to cassandra. I am trying to start the Cassandra single node
   setup using the command
   bin/cassandra -f. But there is no response from the prompt.. this is
   what it shows
 
  I'm new to this too, but I think you're looking at the wrong thing.
  cassandra -f is not an interactive mode, it just runs the server in the
  foreground and shows you log output, which is useful for debugging
  server
  config. The output you have looks like it's working fine. You need to
  start
  another terminal and connect to it from a cassandra client.
 
  Marcus
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com







Re: problem in starting the cassandra single node setup

2011-05-30 Thread Maki Watanabe
You can just start
bin/cassandra -f
.

Readme.txt  says:
 Now that we're ready, let's start it up!

   * bin/cassandra -f

 Running the startup script with the -f argument will cause Cassandra to
 remain in the foreground and log to standard out.

So, you need another terminal to run cassandra-cli. Open another
terminal window and then:

Readme.txt says:
 Now let's try to read and write some data using the command line client.

  * bin/cassandra-cli --host localhost

 The command line client is interactive so if everything worked you should
 be sitting in front of a prompt...

 Connected to: Test Cluster on localhost/9160
 Welcome to cassandra CLI.

You will see the cassandra-cli prompt on your terminal like:

  Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
  [default@unknown]


maki



2011/5/31 Amrita Jayakumar amritajayakuma...@gmail.com:
 Hi Maki,

 I am trying to install apache-cassandra-0.7.6-2.
 Here are the steps i followed as per the readme file.

    tar -zxvf apache-cassandra-0.7.6-2.tar.gz
    cd apache-cassandra-0.7.6-2
    sudo mkdir -p /var/log/cassandra
    sudo chown -R `whoami` /var/log/cassandra
    sudo mkdir -p /var/lib/cassandra
    sudo chown -R `whoami` /var/lib/cassandra

 Now is there any configuration settings to be made in
 apache-cassandra-0.7.6-2/conf/ before i fire

    bin/cassandra -f ???

 If so then which all are the that i should change???

 Thanks and Regards,
 Amrita


 On Tue, May 31, 2011 at 10:00 AM, Maki Watanabe watanabe.m...@gmail.com
 wrote:

 Did you read Jonathan's reply?
 If you can't understand what README says, please let us know where you
 are stack on.

 maki


 2011/5/31 Amrita Jayakumar amritajayakuma...@gmail.com:
  can anyone help me how to start with cassandra??? starting from the
  basics???
 
  Thanks and Regards,
  Amrita
 
  On Mon, May 30, 2011 at 6:41 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
 
  Here's what README says:
 
   * bin/cassandra -f
 
  Running the startup script with the -f argument will cause Cassandra to
  remain in the foreground and log to standard out.
 
  Now let's try to read and write some data using the command line
  client.
 
   * bin/cassandra-cli --host localhost
 
  The command line client is interactive so if everything worked you
  should
  be sitting in front of a prompt...
 
   Connected to: Test Cluster on localhost/9160
   Welcome to cassandra CLI.
 
   Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
   [default@unknown]
 
 
  On Mon, May 30, 2011 at 4:09 AM, Amrita Jayakumar
  amritajayakuma...@gmail.com wrote:
   Marcus,
       Can u please tell me how to do that??? I was just
   following
   the
   instructions in the README file that came with the package.
  
   Thanks and Regards,
   Amrita
  
   On Mon, May 30, 2011 at 2:36 PM, Marcus Bointon
   mar...@synchromedia.co.uk
   wrote:
  
   On 30 May 2011, at 10:59, Amrita Jayakumar wrote:
  
I am new to cassandra. I am trying to start the Cassandra single
node
setup using the command
bin/cassandra -f. But there is no response from the prompt.. this
is
what it shows
  
   I'm new to this too, but I think you're looking at the wrong thing.
   cassandra -f is not an interactive mode, it just runs the server in
   the
   foreground and shows you log output, which is useful for debugging
   server
   config. The output you have looks like it's working fine. You need
   to
   start
   another terminal and connect to it from a cassandra client.
  
   Marcus
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 
 










Re: problem in starting the cassandra single node setup

2011-05-30 Thread Maki Watanabe
2011/5/31 Amrita Jayakumar amritajayakuma...@gmail.com:
 Thank You so much Maki :) Its working now... I dont know what went wrong
 yesterday...

 BTW bin/cassandra-cli --host localhost is to read and write data using the
 command line client right???

Yes.

 So what if i need to load data from a file into cassandra???

 i.e.i have a log file with so many lines... lines may be in the format
 sessionid key value

 so how can i load data from this log file into cassandra?

 Thanks and Regards,
 Amrita

You can read and write Cassandra from various programming languages:
- Java
- Python
- Ruby
etc.
Refer to http://wiki.apache.org/cassandra/ClientOptions for more details.
Of course you need to design and define your database schema before
writing code.
You should also learn about Cassandra concepts and architecture, including:
  - Keyspace
  - Column Family
  - Replication Factor
  - Consistency Level
  - Distributed Delete
  - Compaction
before going further.

maki




 On Tue, May 31, 2011 at 10:42 AM, Maki Watanabe watanabe.m...@gmail.com
 wrote:

 You can just start
 bin/cassandra -f
 .

 Readme.txt  says:
  Now that we're ready, let's start it up!
 
    * bin/cassandra -f
 
  Running the startup script with the -f argument will cause Cassandra to
  remain in the foreground and log to standard out.

 So, you need another terminal to run cassandra-cli. Open another
 terminal window and then:

 Readme.txt says:
  Now let's try to read and write some data using the command line client.
 
   * bin/cassandra-cli --host localhost
 
  The command line client is interactive so if everything worked you
  should
  be sitting in front of a prompt...
 
  Connected to: Test Cluster on localhost/9160
  Welcome to cassandra CLI.

 You will see the cassandra-cli pormpt on your terminal like:

  Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
  [default@unknown]


 maki



 2011/5/31 Amrita Jayakumar amritajayakuma...@gmail.com:
  Hi Maki,
 
  I am trying to install apache-cassandra-0.7.6-2.
  Here are the steps i followed as per the readme file.
 
     tar -zxvf apache-cassandra-0.7.6-2.tar.gz
     cd apache-cassandra-0.7.6-2
     sudo mkdir -p /var/log/cassandra
     sudo chown -R `whoami` /var/log/cassandra
     sudo mkdir -p /var/lib/cassandra
     sudo chown -R `whoami` /var/lib/cassandra
 
  Now is there any configuration settings to be made in
  apache-cassandra-0.7.6-2/conf/ before i fire
 
     bin/cassandra -f ???
 
  If so then which all are the that i should change???
 
  Thanks and Regards,
  Amrita
 
 
  On Tue, May 31, 2011 at 10:00 AM, Maki Watanabe
  watanabe.m...@gmail.com
  wrote:
 
  Did you read Jonathan's reply?
  If you can't understand what README says, please let us know where you
  are stack on.
 
  maki
 
 
  2011/5/31 Amrita Jayakumar amritajayakuma...@gmail.com:
   can anyone help me how to start with cassandra??? starting from the
   basics???
  
   Thanks and Regards,
   Amrita
  
   On Mon, May 30, 2011 at 6:41 PM, Jonathan Ellis jbel...@gmail.com
   wrote:
  
   Here's what README says:
  
    * bin/cassandra -f
  
   Running the startup script with the -f argument will cause Cassandra
   to
   remain in the foreground and log to standard out.
  
   Now let's try to read and write some data using the command line
   client.
  
    * bin/cassandra-cli --host localhost
  
   The command line client is interactive so if everything worked you
   should
   be sitting in front of a prompt...
  
    Connected to: Test Cluster on localhost/9160
    Welcome to cassandra CLI.
  
    Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
    [default@unknown]
  
  
   On Mon, May 30, 2011 at 4:09 AM, Amrita Jayakumar
   amritajayakuma...@gmail.com wrote:
Marcus,
    Can u please tell me how to do that??? I was just
following
the
instructions in the README file that came with the package.
   
Thanks and Regards,
Amrita
   
On Mon, May 30, 2011 at 2:36 PM, Marcus Bointon
mar...@synchromedia.co.uk
wrote:
   
On 30 May 2011, at 10:59, Amrita Jayakumar wrote:
   
 I am new to cassandra. I am trying to start the Cassandra
 single
 node
 setup using the command
 bin/cassandra -f. But there is no response from the prompt..
 this
 is
 what it shows
   
I'm new to this too, but I think you're looking at the wrong
thing.
cassandra -f is not an interactive mode, it just runs the server
in
the
foreground and shows you log output, which is useful for
debugging
server
config. The output you have looks like it's working fine. You
need
to
start
another terminal and connect to it from a cassandra client.
   
Marcus
   
  
  
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder of DataStax, the source for professional Cassandra
   support
   http://www.datastax.com
  
  
 
 
 
 
 










Re: Consistency Level throughput

2011-05-26 Thread Maki Watanabe
I assume your question is how CL affects throughput.

In theory, CL will not affect the throughput of the Cassandra system:
at any CL, the coordinator node needs to submit write/read requests to
all RF replicas specified for the keyspace.
CL does affect latency, though: a stronger CL causes larger latency.
In the real world it will depend on the system configuration,
application design, data, and the whole environment.
However, if you found shorter latency with a stronger CL, there must
be some reason that explains the behavior.
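A toy model of the latency point (illustrative only; it ignores queueing, retries, and read repair): the coordinator sends to all RF replicas regardless of CL, but only waits for CL acks, so its latency is the CL-th fastest replica.

```python
def coordinator_latency(replica_latencies, cl):
    # Wait for the cl fastest acks: latency is the cl-th smallest value.
    return sorted(replica_latencies)[cl - 1]

replica_ms = [5, 20, 90]                   # RF = 3 replicas, uneven speeds
print(coordinator_latency(replica_ms, 1))  # ONE    -> 5
print(coordinator_latency(replica_ms, 2))  # QUORUM -> 20
print(coordinator_latency(replica_ms, 3))  # ALL    -> 90
```

This also explains why ONE, TWO, and THREE can look similar when all replicas are equally fast, while ALL degrades sharply as soon as one replica is slow.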

maki

2011/5/27 Ryu Kobayashi beter@gmail.com:
 Hi,

 Question of Consistency Level throughput.

 Environment:
 6 nodes. Replication factor is 3.

 ONE and QUORUM it was not for the throughput difference.
 ALL just extremely slow.
 Not ONE had only half the throughput.
 ONE, TWO and THREE were similar results.

 Is there any difference between 2 nodes and 3 nodes?

 --
  beter@gmail.com
 twitter:@ryu_kobayashi






Re: Inconsistent data issues when running nodetool move.

2011-05-14 Thread Maki Watanabe
It depends on what you really use which CL for your operations.
Your RF is 2, so if you read/write with CL=ALL, your r/w will be
always consistent. If your read is CL=ONE, you have chance to read old
data anytime, decommission is not matter. CL=QUORUM on RF=2 is
semantically identical with CL=ALL.
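The equivalence noted here follows from the overlap rule R + W > RF (counting replicas rather than CL names); a quick sanity check:

```python
def overlaps(rf, w, r):
    """True when every read is guaranteed to touch at least one
    replica holding the latest acknowledged write: R + W > RF."""
    return w + r > rf

def quorum(rf):
    return rf // 2 + 1

assert quorum(2) == 2                      # QUORUM on RF=2 is all replicas
assert overlaps(2, quorum(2), quorum(2))   # hence identical to ALL
assert not overlaps(2, 1, 1)               # ONE write + ONE read can be stale
assert overlaps(2, 1, 2)                   # ONE write + ALL read is safe
```

With RF=3, QUORUM is 2 of 3, so QUORUM writes plus QUORUM reads stay consistent without requiring all replicas to be up.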

maki

2011/5/13 Ryan Hadley r...@sgizmo.com:
 Hi,

 I'm running Cassandra (0.7.4) on a 4 node ring.  It was a 3 node ring, but we 
 ended up expanding it to 4... So then I followed the many suggestions to 
 rebalance the ring.  I found a script that suggested I use:

 # ~/nodes_calc.py
 How many nodes are in your cluster? 4
 node 0: 0
 node 1: 42535295865117307932921825928971026432
 node 2: 85070591730234615865843651857942052864
 node 3: 127605887595351923798765477786913079296

 So I started to migrate each node to those tokens.

 I have my replication factor set to 2, so I guess I was expecting to be able 
 to continue to use this ring without issues.  But it seems that the node 
 still accepts writes while it's decommissioning?  I say this because if I 
 interrupt the decommission by stopping Cassandra and starting it again, it 
 appears to run through several commit logs.  And as soon as it's through with 
 those commit logs, I no longer get consistency issues.

 The issue I'm seeing is that writes to this ring will succeed, but it's 
 possible for a subsequent read to return an older object.  For several 
 minutes even.

 I'm not sure if I did something wrong... learning as I go here and this list 
 archive has been very useful.  But, is there anyway I can rebalance the node 
 and get better consistency?

 Thanks,
 Ryan
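For reference, the token values in the nodes_calc.py output quoted above are just evenly spaced points on the RandomPartitioner's 0..2**127 ring; a sketch:

```python
def balanced_tokens(n):
    # i-th initial token for an n-node RandomPartitioner ring
    return [i * (2 ** 127) // n for i in range(n)]

for i, t in enumerate(balanced_tokens(4)):
    print("node %d: %d" % (i, t))
```

Running this for n=4 reproduces the four token values in the quoted script output.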


JMX access sample by jython (Was: How to invoke getNaturalEndpoints with jconsole?)

2011-05-14 Thread Maki Watanabe
Just FYI for beginners like me: I've also written it in Jython.
Getting attributes is even easier than invoking operations. I feel
Jython will be a good option for creating custom monitoring/management
tools.
#!/usr/bin/jython
#
# *** This is a JYTHON script. You can't run it on CPython. ***

import sys

import java.net.InetAddress
import javax.management.MBeanServerConnection
import javax.management.ObjectName
import javax.management.openmbean.CompositeDataSupport
import javax.management.openmbean.CompositeType
import javax.management.remote.JMXConnector
import javax.management.remote.JMXConnectorFactory
import javax.management.remote.JMXServiceURL

if len(sys.argv) < 5:
    print "usage: getNaturalEndpoints.py host port keyspace key"
    sys.exit(1)

(pname, host, port, keyspace, key) = sys.argv
url_spec = "service:jmx:rmi:///jndi/rmi://%s:%s/jmxrmi" % (host, port)


# Create a JMX connection with the URL spec and extract the
# MBeanServerConnection
jmxurl = javax.management.remote.JMXServiceURL(url_spec)
jmxc = javax.management.remote.JMXConnectorFactory.connect(jmxurl)
mbsc = jmxc.getMBeanServerConnection()

mbname = "org.apache.cassandra.db:type=StorageService"

# It's a bit tricky:
# To invoke a JMX method, we need to build an argument array and a
# signature array for the method.
# The signature array gives the type of each argument.
# For getNaturalEndpoints, the 1st arg is java.lang.String, and the
# 2nd arg is a byte array. "[B" is the JVM signature for a Java byte array.

opArgs = [keyspace, java.lang.String(key).getBytes("UTF-8")]
opSignature = ["java.lang.String", "[B"]

# Invoke the MBean operation
nodes = mbsc.invoke(javax.management.ObjectName(mbname),
                    "getNaturalEndpoints", opArgs, opSignature)

for node in nodes:
    print node.getHostAddress()

# Close the JMX connection
jmxc.close()


Re: Inconsistent data issues when running nodetool move.

2011-05-14 Thread Maki Watanabe
If you do CL=ONE write + CL=ALL read, then it seems OK...
You should stay in this thread until one of the experts answers your
question.

2011/5/14 Ryan Hadley r...@sgizmo.com:
 Thanks Maki,

 That makes sense with my symptoms...  I was doing a CL=ONE for write and a 
 CL=ALL for read, expecting that to be sufficient.

 I will try both set to ALL and see if I get better consistency.

 -Ryan

 On May 14, 2011, at 4:41 AM, Maki Watanabe wrote:

 It depends on what you really use which CL for your operations.
 Your RF is 2, so if you read/write with CL=ALL, your r/w will be
 always consistent. If your read is CL=ONE, you have chance to read old
 data anytime, decommission is not matter. CL=QUORUM on RF=2 is
 semantically identical with CL=ALL.

 maki

 2011/5/13 Ryan Hadley r...@sgizmo.com:
 Hi,

 I'm running Cassandra (0.7.4) on a 4 node ring.  It was a 3 node ring, but 
 we ended up expanding it to 4... So then I followed the many suggestions to 
 rebalance the ring.  I found a script that suggested I use:

 # ~/nodes_calc.py
 How many nodes are in your cluster? 4
 node 0: 0
 node 1: 42535295865117307932921825928971026432
 node 2: 85070591730234615865843651857942052864
 node 3: 127605887595351923798765477786913079296

 So I started to migrate each node to those tokens.

 I have my replication factor set to 2, so I guess I was expecting to be 
 able to continue to use this ring without issues.  But it seems that the 
 node still accepts writes while it's decommissioning?  I say this because 
 if I interrupt the decommission by stopping Cassandra and starting it 
 again, it appears to run through several commit logs.  And as soon as it's 
 through with those commit logs, I no longer get consistency issues.

 The issue I'm seeing is that writes to this ring will succeed, but it's 
 possible for a subsequent read to return an older object.  For several 
 minutes even.

 I'm not sure if I did something wrong... learning as I go here and this 
 list archive has been very useful.  But, is there anyway I can rebalance 
 the node and get better consistency?

 Thanks,
 Ryan







Re: Hinted Handoff

2011-05-13 Thread Maki Watanabe
HH will be stored on one of the live replica nodes. It is just a hint,
rather than data to be replicated.

maki

2011/5/12 Anurag Gujral anurag.guj...@gmail.com:
 Hi All,
            I have two questions:
 a) Is there  a way to turn on and off hinted handoff per keyspace rather
 than for multiple keyspaces.
 b)It looks like cassandra stores hinted handoff data in one row.Is it true?
 .Does having one row for hinted handoff implies
 if nodes are down for longer period of time not all the data which needs to
 be replicated will be on the node which is alive.
 Thanks
 Anurag


Re: How to invoke getNaturalEndpoints with jconsole?

2011-05-13 Thread Maki Watanabe
I wrote a small JMX client to invoke getNaturalEndpoints.
It works fine in my test environment, but it throws an NPE for the
keyspace we will use for our application (both 0.7.5).
Does anyone know a quick resolution before I set up Cassandra in
Eclipse to inspect what happens? :)

thanks

Exception in thread main javax.management.RuntimeMBeanException:
java.lang.NullPointerException
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:877)
[snip]
at 
javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:993)
at my.test.getNaturalEndpoints.main(getNaturalEndpoints.java:32)
Caused by: java.lang.NullPointerException
at 
org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:266)
at org.apache.cassandra.db.Table.init(Table.java:212)
at org.apache.cassandra.db.Table.open(Table.java:106)
at 
org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1497)
[snip]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)


2011/5/11 Jonathan Ellis jbel...@gmail.com:
 Thanks!

 On Wed, May 11, 2011 at 10:20 AM, Maki Watanabe watanabe.m...@gmail.com 
 wrote:
 Add a new faq:
 http://wiki.apache.org/cassandra/FAQ#jconsole_array_arg

 2011/5/11 Nick Bailey n...@datastax.com:
 Yes.

 On Wed, May 11, 2011 at 8:25 AM, Maki Watanabe watanabe.m...@gmail.com 
 wrote:
 Thanks,

 So my options are:
 1. Write a thrift client code to call describe_ring with hashed key
 or
 2. Write a JMX client code to call getNaturalEndpoints

 right?

 2011/5/11 Nick Bailey n...@datastax.com:
 As far as I know you can not call getNaturalEndpoints from jconsole
 because it takes a byte array as a parameter and jconsole doesn't
 provide a way for inputting a byte array. You might be able to use the
 thrift call 'describe_ring' to do what you want though. You will have
 to manually hash your key to see what range it falls in however.

 On Wed, May 11, 2011 at 6:14 AM, Maki Watanabe watanabe.m...@gmail.com 
 wrote:
 Hello,
 It's a question on jconsole rather than cassandra, how can I invoke
 getNaturalEndpoints with jconsole?

 org.apache.cassandra.service.StorageService.Operations.getNaturalEndpoints

 I want to run this method to find nodes which are responsible to store
 data for specific row key.
 I can find this method in jconsole, but I can't invoke it because the
 button is grayed out and doesn't accept
 clicks.

 Thanks,
 --
 maki





 --
 w3m





 --
 w3m




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




-- 
w3m

Re: How to invoke getNaturalEndpoints with jconsole?

2011-05-13 Thread Maki Watanabe
I did not drop the keyspace, but your comment led me to the resolution.
I found that cassandra-cli is not case-sensitive about keyspace names. I used
the keyspace name FooBar in cassandra-cli, but the correct name was Foobar.
cassandra-cli didn't complain about my mistake, but the JMX interface is
less tolerant.
With the correct name, the tool runs fine.

Thanks.

2011/5/13 Alex Araujo cassandra-us...@alex.otherinbox.com:
 On 5/13/11 10:08 AM, Maki Watanabe wrote:

 I wrote a small JMX client to invoke getNaturalEndpoints.
 It works fine in my test environment, but throws an NPE for the keyspace we
 will use for our application (both are 0.7.5).
 Does anyone know a quick resolution before I set up
 cassandra in eclipse to inspect what happens :)

 thanks

 Exception in thread main javax.management.RuntimeMBeanException:
 java.lang.NullPointerException
        at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:877)
 [snip]
        at
 javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:993)
        at my.test.getNaturalEndpoints.main(getNaturalEndpoints.java:32)
 Caused by: java.lang.NullPointerException
        at
 org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:266)
        at org.apache.cassandra.db.Table.init(Table.java:212)
        at org.apache.cassandra.db.Table.open(Table.java:106)
        at
 org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:1497)
 [snip]
        at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)

 Did you by chance see this after dropping the keyspace?  I believe I've seen
 this as well.  If so (and if I'm interpreting the stack trace and code
 correctly) it might be related to queuing an op for a keyspace that's been
 dropped without checking if its metadata is null rather than your code.




-- 
w3m


How to invoke getNaturalEndpoints with jconsole?

2011-05-11 Thread Maki Watanabe
Hello,
It's a question about jconsole rather than Cassandra: how can I invoke
getNaturalEndpoints from jconsole?

org.apache.cassandra.service.StorageService.Operations.getNaturalEndpoints

I want to run this method to find nodes which are responsible to store
data for specific row key.
I can find this method in jconsole, but I can't invoke it because the
button is grayed out and doesn't accept
clicks.

Thanks,
-- 
maki


Re: How to invoke getNaturalEndpoints with jconsole?

2011-05-11 Thread Maki Watanabe
Thanks,

So my options are:
1. Write a thrift client code to call describe_ring with hashed key
or
2. Write a JMX client code to call getNaturalEndpoints

right?

2011/5/11 Nick Bailey n...@datastax.com:
 As far as I know you can not call getNaturalEndpoints from jconsole
 because it takes a byte array as a parameter and jconsole doesn't
 provide a way for inputting a byte array. You might be able to use the
 thrift call 'describe_ring' to do what you want though. You will have
 to manually hash your key to see what range it falls in however.
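The "manually hash your key" step can be sketched as below: computing a RandomPartitioner-style token (the absolute value of the signed 128-bit MD5 digest, an assumption based on how 0.7's FBUtilities derives BigInteger tokens) and locating the node token whose range covers it. The key is hypothetical; the ring tokens are the ones from a two-node `nodetool ring` output elsewhere in this list.

```python
import hashlib

def token(key: bytes) -> int:
    # RandomPartitioner-style token: absolute value of the MD5 digest
    # interpreted as a signed big-endian integer (assumption based on
    # FBUtilities.hashToBigInteger in 0.7).
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))

def owner_token(key_token: int, ring_tokens: list) -> int:
    # A node owns the range (previous_token, its_token], so the key goes
    # to the first node token >= the key's token, wrapping around the ring.
    for node_token in sorted(ring_tokens):
        if key_token <= node_token:
            return node_token
    return min(ring_tokens)  # wrapped past the highest token

# Tokens as reported by a two-node "nodetool ring"; the key is hypothetical.
ring = [4871825541058236750403047111542070004,
        31247585259092561925693111230676487333]
owner = owner_token(token(b"mykey"), ring)
```

describe_ring returns the token range to endpoint mapping, so the owner token found this way can then be mapped back to a host.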

 On Wed, May 11, 2011 at 6:14 AM, Maki Watanabe watanabe.m...@gmail.com 
 wrote:
 Hello,
 It's a question on jconsole rather than cassandra, how can I invoke
 getNaturalEndpoints with jconsole?

 org.apache.cassandra.service.StorageService.Operations.getNaturalEndpoints

 I want to run this method to find nodes which are responsible to store
 data for specific row key.
 I can find this method in jconsole, but I can't invoke it because the
 button is grayed out and doesn't accept
 clicks.

 Thanks,
 --
 maki





-- 
w3m


Re: How to invoke getNaturalEndpoints with jconsole?

2011-05-11 Thread Maki Watanabe
Add a new faq:
http://wiki.apache.org/cassandra/FAQ#jconsole_array_arg

2011/5/11 Nick Bailey n...@datastax.com:
 Yes.

 On Wed, May 11, 2011 at 8:25 AM, Maki Watanabe watanabe.m...@gmail.com 
 wrote:
 Thanks,

 So my options are:
 1. Write a thrift client code to call describe_ring with hashed key
 or
 2. Write a JMX client code to call getNaturalEndpoints

 right?

 2011/5/11 Nick Bailey n...@datastax.com:
 As far as I know you can not call getNaturalEndpoints from jconsole
 because it takes a byte array as a parameter and jconsole doesn't
 provide a way for inputting a byte array. You might be able to use the
 thrift call 'describe_ring' to do what you want though. You will have
 to manually hash your key to see what range it falls in however.

 On Wed, May 11, 2011 at 6:14 AM, Maki Watanabe watanabe.m...@gmail.com 
 wrote:
 Hello,
 It's a question on jconsole rather than cassandra, how can I invoke
 getNaturalEndpoints with jconsole?

 org.apache.cassandra.service.StorageService.Operations.getNaturalEndpoints

 I want to run this method to find nodes which are responsible to store
 data for specific row key.
 I can find this method in jconsole, but I can't invoke it because the
 button is grayed out and doesn't accept
 clicks.

 Thanks,
 --
 maki





 --
 w3m





-- 
w3m


Re: seed faq

2011-04-24 Thread Maki Watanabe
Done. Thank you for your comment.

maki

2011/4/24 aaron morton aa...@thelastpickle.com:
 May also want to add that seed nodes do not auto bootstrap.

 Thanks
 Aaron


Re: multiple nodes sharing the same IP

2011-04-23 Thread Maki Watanabe
storage_port: Used for Gossip and data exchange. So in your words, it
is the port for the seeds.

You CAN change the storage_port, but all nodes in your ring need to
use the same storage_port number.
That's why you need a different IP address for each node.

rpc_port: Used for Thrift which the Cassandra clients connect to.

I can't understand why you can't change rpc_port.

maki


2011/4/23 Tomas Vondra t...@fuzzy.cz:
 Dne 23.4.2011 03:08, Jonathan Ellis napsal(a):
 You really need different IPs.

 OK, thanks. Is there some reason for that? Because if you can't specify
 the port for the seeds (which seems like the reason why different IPs
 are needed), then you actually can't change the port at all. So the rpc
 port is actually fixed and there's no point in changing it ...

 Tomas



Concern on HeartBeatState Generation

2011-04-22 Thread Maki Watanabe
Hello,
I found that the Gossiper is initialized with seconds since the Unix epoch
(= System.currentTimeMillis() / 1000)
for the HeartBeatState generation.
Do we get the same generation value on a very quick restart? Is there any risk here?

regards,

maki
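The concern can be sketched in a few lines: if the generation is just whole seconds since the epoch, two startups inside the same wall-clock second get equal generations. This is a hypothetical model of the behavior described above, not the actual Gossiper code.

```python
def heartbeat_generation(now_seconds: float) -> int:
    # Model of the described scheme: generation = seconds since the Unix
    # epoch, i.e. System.currentTimeMillis() / 1000 truncated to an int.
    return int(now_seconds)

# Two "startups" 0.7 seconds apart, within the same wall-clock second:
gen_first = heartbeat_generation(1303500000.2)
gen_restart = heartbeat_generation(1303500000.9)
# gen_first == gen_restart: peers comparing generations cannot tell the
# new incarnation from the old one until the clock ticks over.
```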


Re: seed faq

2011-04-21 Thread Maki Watanabe
Thank you, Naren.
I'll add some more details and upload it to FAQ wiki page.

maki

2011/4/21 Narendra Sharma narendra.sha...@gmail.com:
 Here are some more details that might help:
 1. You are right that Seeds are referred on startup to learn about the ring.
 2. It is a good idea to have more than 1 seed. Seed is not SPoF. Remember
 Gossip also provides eventual consistency. So if seed is missing, the new
 node may not have the correct view of the ring. However, after talking to
 other nodes it will eventually have the uptodate state of the ring.
 3. In an iteration Gossiper on a node sends gossip message
  - To a known live node (picked randomly)
  - To a known dead node (based on some probability)
  - To a seed node (based on some probability)

 Thanks,
 Naren
 On Wed, Apr 20, 2011 at 7:13 PM, Maki Watanabe watanabe.m...@gmail.com
 wrote:

 I made self answered faqs on seed after reading the wiki and code.
 If I misunderstand something, please point out to me.

 == What are seeds? ==

 Seeds, or seed nodes, are the nodes which new nodes refer to on
 bootstrap to learn the ring information.
 When you add a new node to the ring, you need to specify at least one live
 seed to contact. Once a node joins the ring, it learns about the other
 nodes, so it doesn't need a seed on subsequent boots.

 There is no special configuration for a seed node itself. In a stable and
 static ring, you can point to a non-seed node as a seed on bootstrap, though
 it is not recommended.
 Nodes in the ring tend to send Gossip messages to seeds more often by
 design, so it is probable that seeds have the most recent and updated
 information about the ring. ( Refer to [[ArchitectureGossip]] for more
 details )

 == Does single seed mean single point of failure? ==

 If you are using a replicated CF on the ring, a single seed in the ring
 doesn't mean a single point of failure. The ring can operate or boot
 without the seed. But it is recommended to have multiple seeds in a
 production system to maintain the ring.



 Thanks
 --
 maki



 --
 Narendra Sharma
 Solution Architect
 http://www.persistentsys.com
 http://narendrasharma.blogspot.com/




seed faq

2011-04-20 Thread Maki Watanabe
I made self answered faqs on seed after reading the wiki and code.
If I misunderstand something, please point out to me.

== What are seeds? ==

Seeds, or seed nodes, are the nodes which new nodes refer to on
bootstrap to learn the ring information.
When you add a new node to the ring, you need to specify at least one live
seed to contact. Once a node joins the ring, it learns about the other
nodes, so it doesn't need a seed on subsequent boots.

There is no special configuration for a seed node itself. In a stable and
static ring, you can point to a non-seed node as a seed on bootstrap, though
it is not recommended.
Nodes in the ring tend to send Gossip messages to seeds more often by
design, so it is probable that seeds have the most recent and updated
information about the ring. ( Refer to [[ArchitectureGossip]] for more
details )

== Does single seed mean single point of failure? ==

If you are using a replicated CF on the ring, a single seed in the ring
doesn't mean a single point of failure. The ring can operate or boot
without the seed. But it is recommended to have multiple seeds in a
production system to maintain the ring.
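For reference, a 0.7-style cassandra.yaml seed list matching the FAQ above; the addresses are hypothetical, and every node in the ring should normally carry the same list:

```yaml
# cassandra.yaml (0.7): new nodes contact these addresses on bootstrap.
# Two or more seeds are recommended in production; note that seed nodes
# do not auto-bootstrap themselves.
seeds:
    - 10.0.0.1
    - 10.0.0.2
```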



Thanks
-- 
maki


Re: cluster IP question and Jconsole?

2011-04-15 Thread Maki Watanabe
127.0.0.2 to 127.0.0.5 are valid IP addresses. Those are just alias
addresses for your loopback interface.
Verify:
  % ifconfig -a

127.0.0.0/8 is for loopback, so you can't connect to this address from
remote machines.
You may be able to configure SSH port forwarding from your monitoring
host to the cassandra node, though I haven't tried it.

maki

2011/4/16 tinhuty he tinh...@hotmail.com:
 I have followed the description here
 http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/lauching_5_node_cassandra_clusters
 to created 5 instances of cassandra in one CentOS 5.5 machine. using
 nodetool shows the 5 nodes are all running fine.

 Note the 5 nodes are using IPs 127.0.0.1 to 127.0.0.5. I understand 127.0.0.1
 points to the local server, but what about 127.0.0.2 to 127.0.0.5? It looks to
 me that they are not valid IPs. How come all 5 nodes are working OK?

 Another question: I have installed MX4J in instance 127.0.0.1 on port 8081.
 I am able to connect to http://server:8081/ from the browser. However, how do
 I connect using JConsole installed on another Windows
 machine? (since my CentOS 5.5 doesn't have X installed, only SSH is allowed)

 Thanks.


Re: problems getting started with Cassandra Ruby

2011-04-12 Thread Maki Watanabe
Hello Mark,

Disable verbose mode (-w or $VERBOSE) of ruby.
Or you can clean up the ruby thrift library yourself.


2011/4/12 Mark Lilback mlilb...@stat.wvu.edu:
 I'm trying to connect to Cassandra from a Ruby script. I'm using rvm, and 
 made a clean install of Ruby 1.9.2 and then did gem install cassandra. When 
 I run a script that just contains require 'cassandra/0.7', I get the output 
 below. Any suggestion on what I need to do to get rid of these warnings?


 /Users/admin/.rvm/gems/ruby-1.9.2-p180/gems/thrift-0.5.0/lib/thrift/server/nonblocking_server.rb:80:
  warning: `' interpreted as argument prefix
 /Users/admin/.rvm/gems/ruby-1.9.2-p180/gems/thrift-0.5.0/lib/thrift_native.bundle:
  warning: method redefined; discarding old skip
 /Users/admin/.rvm/gems/ruby-1.9.2-p180/gems/thrift-0.5.0/lib/thrift/protocol/base_protocol.rb:235:
  warning: previous definition of skip was here
(snip)

 --
 Mark Lilback
 West Virginia University Department of Statistics
 mlilb...@stat.wvu.edu






-- 
w3m


Re: nodetool repair compact

2011-04-05 Thread Maki Watanabe
Thanks Sylvain, it's very clear.
But do I still need to force a major compaction regularly to clear tombstones?
I know that minor compactions clear tombstones since 0.7, but
maximumCompactionThreshold limits the maximum number of sstables that
will be merged at once, so to GC all tombstones in all sstables within
gc_grace_period, it is safe to run nodetool compact at least once per
gc_grace_period, isn't it?

maki

2011/4/6 Sylvain Lebresne sylv...@datastax.com:
 On Tue, Apr 5, 2011 at 12:01 AM, Maki Watanabe watanabe.m...@gmail.com 
 wrote:
 Hello,
 On reading O'Reilly's Cassandra book and wiki, I'm a bit confused about
 nodetool repair and compact.
 I believe we need to run nodetool repair regularly, and it synchronize
 all replica nodes at the end.
 According to the documents the repair invokes major compaction also
 (as side effect?).

 Those documents are wrong then. A repair does not trigger a major
 compaction. The only thing that makes it similar to a major compaction is
 that it will iterate over all the sstables. But for instance, you won't end
 up with one big sstable at the end of repair as you would with a major
 compaction.

 Will this major compaction apply on replica nodes too?

 If I have 3 node ring and CF of RF=3, what should I do periodically on
 this system is:
 - nodetool repair on one of the nodes
 or
 - nodetool repair on one of the nodes, and nodetool compact on 2 of the nodes
 ?

 So as said, repair and compact are independent. You should
 periodically run nodetool
 repair (on one of your nodes in your case as you said). However, it is
 not advised anymore
 to run nodetool compact regularly unless you have a good reason to.

 --
 Sylvain



nodetool repair compact

2011-04-04 Thread Maki Watanabe
Hello,
On reading O'Reilly's Cassandra book and wiki, I'm a bit confused about
nodetool repair and compact.
I believe we need to run nodetool repair regularly, and it synchronize
all replica nodes at the end.
According to the documents the repair invokes major compaction also
(as side effect?).
Will this major compaction apply on replica nodes too?

If I have 3 node ring and CF of RF=3, what should I do periodically on
this system is:
- nodetool repair on one of the nodes
or
- nodetool repair on one of the nodes, and nodetool compact on 2 of the nodes
?

Thanks
maki


Re: Does anyone build 0.7.4 on IDEA?

2011-03-31 Thread Maki Watanabe
ant on my command line completed without error.
Next I tried to build cassandra 0.7.4 in eclipse, and it worked.
So I'll explore the cassandra code with eclipse, rather than IDEA.

maki

2011/3/31 Maki Watanabe watanabe.m...@gmail.com:
 Not yet. I'll try.

 maki

 2011/3/31 Tommy Tynjä to...@diabol.se:
 Have you assured you are able to build Cassandra outside
 of IDEA, e.g. on command line?

 Best regards,
 Tommy
 @tommysdk

 On Thu, Mar 31, 2011 at 3:56 AM, Maki Watanabe watanabe.m...@gmail.com 
 wrote:
 Hello,

 I'm trying to build and run cassandra 0.7.4-src on IntelliJ IDEA 10 CE
 on OSX with reading
 http://wiki.apache.org/cassandra/RunningCassandraInIDEA.
 Though I needed to omit interface/avro/gen-java, exclude
 java/org/apache/cassandra/hadoop, and
 download and add jna.jar to the library path, I could kill most of the errors.

 However, now it complains on compiling
 java/org/apache/cassandra/db/ReadResponse.java, because of:
  Error:(93, 27) can't access to org.apache.cassandra.db.RowSerializer
                      can't find class file of
 org.apache.cassandra.db.RowSerializer

 I found the class RowSerializer in Row.java, with package
 scope in org.apache.cassandra.db,
 but ReadResponse.java is in the same package. So I can't understand
 why IDEA can't find the class.

 Any suggestion?

 maki






-- 
w3m


Naming issue on nodetool repair command

2011-03-30 Thread Maki Watanabe
Would the cassandra team consider adding an alias for the nodetool
repair command?
I mean, the word repair scares some people.
When I say we need to run nodetool repair regularly on cassandra
nodes, they think OH... Those are broken so often!.
So if I could say it in a softer word, e.g. sync, tune, or
harmonize, it would make a better impression.

maki


Re: stress.py bug?

2011-03-22 Thread Maki Watanabe
A client thread needs to wait for the response, while the server can
handle multiple requests simultaneously.
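A back-of-the-envelope sketch of why one synchronous client thread caps throughput while the server still has idle capacity; the 30 ms round trip is an assumed figure, not a measurement from this thread:

```python
def max_ops_per_sec(threads: int, round_trip_seconds: float) -> float:
    # Each synchronous thread has at most one request outstanding, so it
    # completes at most 1/RTT operations per second; threads add up until
    # the server or the network saturates.
    return threads / round_trip_seconds

rtt = 0.03  # assumed 30 ms request round trip
single = max_ops_per_sec(1, rtt)     # one thread: ~33 ops/sec
hundred = max_ops_per_sec(100, rtt)  # 100 threads: ~3333 ops/sec
```

This is why the stress tools default to many threads: the bottleneck at low concurrency is the per-thread wait in the client, not the server.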

2011/3/22 Sheng Chen chensheng2...@gmail.com:
 I am just wondering, why the stress test tools (python, java) need more
 threads ?
 Is the bottleneck of a single thread in the client, or in the server?
 Thanks.
 Sean

 2011/3/22 Ryan King r...@twitter.com

 On Mon, Mar 21, 2011 at 4:02 AM, pob peterob...@gmail.com wrote:
  Hi,
  I'm inserting data from client node with stress.py to cluster of 6
  nodes.
  They are all on 1Gbps network, max real throughput of network is 930Mbps
  (after measurement).
  python stress.py -c 1 -S 17  -d{6nodes}  -l3 -e QUORUM
   --operation=insert -i 1 -n 50 -t100
  The problem is that stress.py shows it doing avg ~750 ops/sec, which is
  127MB/s,
  but the real throughput of the network is ~116MB/s.

 You may need more concurrency in order to saturate your network.

 -ryan





-- 
w3m


Re: Error connection to remote JMX agent! on nodetool

2011-03-22 Thread Maki Watanabe
How do you define your Keyspace?
As you may know, in Cassandra, replication (factor) is defined as the
attribute of Keyspace.
And what do you mean:
 However replication never happened.
 I can't get data I set at other node.

What did you do on cassandra, and what did you get in response?

maki


2011/3/23 ko...@vivinavi.com ko...@vivinavi.com:
 Hi Sasha
 Thank you so much for your advice.
 I changed JMX_PORT from 10036 to 8080 in cassandra-env.sh.
 Now nodetool ring is working as follows.

 # nodetool --host **.**.254.54 ring
 Address Status   State Load    Owns    Token

           31247585259092561925693111230676487333
 **.**.254.53    Up Normal  51.3 KB 84.50%
 4871825541058236750403047111542070004
 **.**.254.54    Up Normal  66.71 KB   15.50%
 31247585259092561925693111230676487333

 Then it seems I could set data on the other node via cassandra-cli --host other
 node IP --port 9160. (Currently only 2 nodes.)
 However, replication never happened.
 I can't get the data I set on the other node.
 I don't know what's wrong.
 (I thought replication starts when cassandra -p restart)
 Please advise me on how to start replication.
 Thank you for your advice in advance.


 (2011/03/18 23:38), Sasha Dolgy wrote:

 You need to specify the -jmxport with nodetool

 On Mar 19, 2011 2:48 AM, ko...@vivinavi.com ko...@vivinavi.com wrote:
 Hi everyone

 I am still new to Cassandra, Thrift.
 But anyway Cassandra 0.7.4, Thrift 0.5.0 are working on java 1.6.0.18 of
 Debian 5.0.7.at single node.
 Then I had to try and check multi node on 2 servers.
 (JVM_PORT=10036 on /etc/cassandra-env.sh)
 I modified /etc/cassandra/cassandra.yaml as following.
 auto_bootstrap:false -true
 seeds: -127.0.0.1 - add Global IP addres of 2 servers(incl.own server)
 listen_address:localhost - Own Global IP address(or own host name on
 /etc/hosts)
 rpc_address:localhost -0.0.0.0
 I run master server and then slave server.
 netstat -nl is as following. on both servers.
 Proto Recv-Q Send-Q Local Address Foreign Address State
 tcp 0 0 0.0.0.0:9160 0.0.0.0:* LISTEN
 tcp 0 0 0.0.0.0:10036 0.0.0.0:* LISTEN
 tcp 0 0 **.**.**.**:7000 0.0.0.0:* LISTEN

 However it seems Cassandra doesn't work.
 Because I can't get any data from Cluster (always null, data is broken?)
 So I checked the nodetool (nodetool --host IP ring).
 The nodetool had errors as following.
 Error connection to remote JMX agent!
 java.io.IOException: Failed to retrieve RMIServer stub:
 javax.naming.ServiceUnavailableException [Root exception is
 java.rmi.ConnectException: Connection refused to host: **.**.**.**;
 nested exception is:
 java.net.ConnectException: Connection refused]
 at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:342)
 at

 javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:267)
 at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:137)
 at org.apache.cassandra.tools.NodeProbe.init(NodeProbe.java:107)
 at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:511)
 Caused by: javax.naming.ServiceUnavailableException [Root exception is
 java.rmi.ConnectException: Connection refused to host: **.**.**.**;
 nested exception is:
 java.net.ConnectException: Connection refused]
 at
 com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:118)
 at

 com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:203)
 at javax.naming.InitialContext.lookup(InitialContext.java:409)
 at

 javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1902)
 at

 javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1871)
 at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:276)
 ... 4 more
 Caused by: java.rmi.ConnectException: Connection refused to host:
 **.**.**.**; nested exception is:
 java.net.ConnectException: Connection refused
 at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
 at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
 at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
 at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:340)
 at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
 at
 com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:114)
 ... 9 more
 Caused by: java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at

 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
 at

 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
 at
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
 at java.net.Socket.connect(Socket.java:546)
 at java.net.Socket.connect(Socket.java:495)
 at java.net.Socket.init(Socket.java:392)
 at java.net.Socket.init(Socket.java:206)
 at

 

Re: moving data from single node cassandra

2011-03-17 Thread Maki Watanabe
Refer to:
http://wiki.apache.org/cassandra/StorageConfiguration

You can specify the data directories with the following parameters in
storage-config.xml (or cassandra.yaml in 0.7+):

commitlog_directory : where the commitlog will be written
data_file_directories : data files
saved_caches_directory : saved row cache
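A 0.7-style cassandra.yaml sketch pointing those directories at a larger partition; the D:/ paths are hypothetical examples, not defaults:

```yaml
# cassandra.yaml (0.7): move storage off the full C: drive.
data_file_directories:
    - D:/cassandra/data
commitlog_directory: D:/cassandra/commitlog
saved_caches_directory: D:/cassandra/saved_caches
```

Stop the node, move the existing files from the old directories into the new ones, then start it again with the updated configuration.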

maki


2011/3/17 Komal Goyal ko...@ensarm.com:
 Hi,
 I have a single-node cassandra setup on a Windows machine.
 Very soon I ran out of space on this machine, so I have increased the
 hard disk capacity of the machine.
 Now I want to know how to configure cassandra to start storing data on these
 high-space partitions.
 Also, how can the existing data stored in this single-node cassandra be moved
 from the C drive to the other drives?
 Is there any documentation as to how these configurations can be done?
 Some supporting links would be very helpful.


 Thanks,

 Komal Goyal



Re: swap setting on linux

2011-03-16 Thread Maki Watanabe
According to the Cassandra Wiki, the best strategy is no swap at all.
http://wiki.apache.org/cassandra/MemtableThresholds#Virtual_Memory_and_Swap

2011/3/16 ruslan usifov ruslan.usi...@gmail.com:
 Dear community!

 Please share you settings for swap on linux box

-- 
w3m


Re: Cassandra still won't start - in-use ports block it

2011-03-12 Thread Maki Watanabe
Hello Bob,

1. What does lsof says on TCP:9160 port?

$ lsof -i TCP:9160

2. Have you try to change rpc_port in conf/cassandra.yaml?
ex. rpc_port: 19160

maki

2011/3/12 Jeremy Hanna jeremy.hanna1...@gmail.com:
 I don't know if others have asked this but do you have a firewall running 
 that would prevent access to those ports or something like that?

 On Mar 11, 2011, at 10:40 PM, Bob Futrelle wrote:

 My frustration continues, especially exasperating because so many people 
 just seem to download Cassandra and run it with no problems.
 All my efforts have been stymied by one port-in-use problem after another.
 People on this list have helped and their suggestions got me a little bit 
 further, but no further.

 Platform,  MacBook Pro, OS 10.6.6

 Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
 Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02-334, mixed mode)

 Highlights of the problems:
 ...
 WARN 07:48:28,482 Could not start register mbean in JMX
 ...
 Caused by: java.net.BindException: Address already in use
 ...
 ERROR 07:48:28,511 Exception encountered during startup.
 java.lang.RuntimeException: Unable to create thrift socket to 
 localhost/10.0.1.3:9160
 ...
 Caused by: org.apache.thrift.transport.TTransportException: Could not create 
 ServerSocket on address localhost/10.0.1.3:9160.
 ...

   - Bob Futrelle


auto_bootstrap setting after bootstrapping

2011-03-08 Thread Maki Watanabe
Hello,
According to the Wiki/StorageConfiguration page, auto_bootstrap is
described as below:

auto_bootstrap
Set to 'true' to make new [non-seed] nodes automatically migrate the
right data to themselves. (If no InitialToken is specified, they will
pick one such that they will get half the range of the most-loaded
node.) If a node starts up without bootstrapping, it will mark itself
bootstrapped so that you can't subsequently accidently bootstrap a
node with data on it. (You can reset this by wiping your data and
commitlog directories.)


Does it mean the auto_bootstrap setting takes effect only at the initial boot?
Will Cassandra just ignore this setting on subsequent boots?

Thanks,

maki


Re: auto_bootstrap setting after bootstrapping

2011-03-08 Thread Maki Watanabe
Thx!

2011/3/8 aaron morton aa...@thelastpickle.com:
 AFAIK yes. The node marks itself as bootstrapped whenever it starts, and
 will not re-bootstrap once that is set.
 More info here
 http://wiki.apache.org/cassandra/Operations#Bootstrap
 Hope that helps.
 Aaron
 On 8/03/2011, at 9:35 PM, Maki Watanabe wrote:

 Hello,
 According to the Wiki/StorageConfiguration page, auto_bootstrap is
 described as below:
 
 auto_bootstrap
 Set to 'true' to make new [non-seed] nodes automatically migrate the
 right data to themselves. (If no InitialToken is specified, they will
 pick one such that they will get half the range of the most-loaded
 node.) If a node starts up without bootstrapping, it will mark itself
 bootstrapped so that you can't subsequently accidently bootstrap a
 node with data on it. (You can reset this by wiping your data and
 commitlog directories.)
 

 Does it mean the auto_bootstrap setting takes effect only at the initial boot?
 Will Cassandra just ignore this setting on subsequent boots?

 Thanks,

 maki





-- 
w3m


Broken image links in Wiki:MemtableThresholds

2011-02-20 Thread Maki Watanabe
Hello folks,
I'm translating the Wiki pages to Japanese now.
I found all of images in MemtableThresholds are broken:
http://wiki.apache.org/cassandra/MemtableThresholds

Can anyone fix the links?

Thanks
-- 
maki


Re: Broken image links in Wiki:MemtableThresholds

2011-02-20 Thread Maki Watanabe
Ok, I got it.

2011/2/21 13:37 morishita.y...@future.co.jp:

 I think apache infra team is working on the issue...

 https://issues.apache.org/jira/browse/INFRA-3352

 -Original Message-
 From: Maki Watanabe [mailto:watanabe.m...@gmail.com]
 Sent: Monday, February 21, 2011 1:20 PM
 To: user@cassandra.apache.org
 Subject: Broken image links in Wiki:MemtableThresholds

 Hello folks,
 I'm translating the Wiki pages to Japanese now.
 I found all of images in MemtableThresholds are broken:
 http://wiki.apache.org/cassandra/MemtableThresholds

 Can anyone fix the links?

 Thanks
 --
 maki




-- 
w3m