Re: Any experience of 20 node mini-itx cassandra cluster

2013-04-15 Thread Jabbar Azam
I know the SSDs are a bit small but they should be enough for our
application. Our test data is 1.6 TB (including replication at RF=3).
Can't we use LCS? This will give us more space at the expense of more
I/O, but SSDs have plenty of IOPS.
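
For reference, switching a table to leveled compaction in CQL3 looks
roughly like this; a sketch only, where the keyspace/table names are made
up and the sstable size is illustrative rather than a recommendation:

    ALTER TABLE myks.mytable
      WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                          'sstable_size_in_mb' : 10 };

LCS needs much less free-disk headroom than size-tiered (roughly 10%
rather than up to 50%), which is the space-for-I/O trade described above.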





Thanks

Jabbar Azam


On 14 April 2013 20:20, Jabbar Azam aja...@gmail.com wrote:

 Thanks Aaron.

 Thanks

 Jabbar Azam


 On 14 April 2013 19:39, aaron morton aa...@thelastpickle.com wrote:

 That's better.

 The SSD size is a bit small, and be warned that you will want to leave
 50GB to 100GB free to allow room for compaction (using the default
 size-tiered strategy).

 On the RAM side you will want to run about a 4GB heap (assuming
 Cassandra 1.2) for the JVM; the rest can be used by off-heap Cassandra
 structures. This may not leave much free space for the OS page cache,
 but the SSD may help there.
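
 As a concrete sketch, the heap can be pinned in cassandra-env.sh. The
 variable names below are the real ones; the values just follow the ~4GB
 suggestion above and are not tuned recommendations:

     MAX_HEAP_SIZE="4G"
     HEAP_NEWSIZE="400M"   # commonly sized around 100MB per physical core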

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 13/04/2013, at 4:47 PM, Jabbar Azam aja...@gmail.com wrote:

 What about using a quad-core Athlon X4 740 3.2 GHz with 8GB of RAM and
 256GB SSDs?

 I know it will depend on our workload, but it will be better than a
 dual-core CPU, I think.

 Jabbar Azam
 On 13 Apr 2013 01:05, Edward Capriolo edlinuxg...@gmail.com wrote:

 Dual core is not the greatest; you might run into GC issues before you
 run out of I/O from your SSD devices. Also, Cassandra has other
 concurrency settings that are tuned roughly around the number of
 processors/cores. It is not uncommon to see 4-6 cores of CPU (600% in
 top) dealing with young-gen garbage, managing lots of sockets, whatever.


 On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam aja...@gmail.com wrote:

 That's my guess. My colleague is still looking at CPUs, so I'm hoping
 he can get quad-core CPUs for the servers.

 Thanks

 Jabbar Azam


 On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote:

  If you have not seen it already, check out the Netflix blog post on
 their performance testing of AWS SSD instances.


 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

 My guess, based on very little experience, is that you will be CPU
 bound.


 On 04/12/2013 03:05 AM, Jabbar Azam wrote:

   Hello,

  I'm going to be building a 20-node Cassandra cluster in one
 datacentre. The spec of the servers will roughly be: dual-core Celeron
 CPU, 256GB SSD, 16GB RAM and two NICs.


  Has anybody done any performance testing with this setup, or are there
 any gotchas I should be aware of with regard to the hardware?

  I do realise the CPU has fairly low computational power, but I'm going
 to assume the system will be I/O bound, hence the RAM and SSDs.


  Thanks

 Jabbar Azam



 --
  Colin Blower
 Software Engineer
 Barracuda Networks Inc.
 +1 408-342-5576 (o)








Re: running cassandra on 8 GB servers

2013-04-15 Thread Nikolay Mihaylov
Just a small update here:
currently running on one node with a 7 GB heap and no JNA,
all defaults except the heap, and everything looks OK.

On Sun, Apr 14, 2013 at 9:10 PM, aaron morton aa...@thelastpickle.com wrote:

 Hmmm, what is the recommendation for a 10G network if 1G was 300GB to
 500GB... I am guessing I can't do 10 times that, correct?  But maybe I
 could squeak out 600GB to 1TB?

 Best thing to do would be run a test on how long it takes to repair or
 bootstrap a node. The 300GB to 500GB was just a guideline.
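
 As a rough sketch of such a test (the host and keyspace names are made
 up):

     time nodetool -h 10.0.0.1 repair my_keyspace

 For bootstrap, time how long a new node takes to go from Joining to
 Normal in nodetool ring after it starts streaming.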

 Cheers

-
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 13/04/2013, at 12:02 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 Hmmm, what is the recommendation for a 10G network if 1G was 300GB to
 500GB... I am guessing I can't do 10 times that, correct?  But maybe I
 could squeak out 600GB to 1TB?

 Thanks,
 Dean

 On 4/11/13 2:26 PM, aaron morton aa...@thelastpickle.com wrote:

 The data will be huge, I am estimating 4-6 TB per server. I know this
 is not ideal, but those are my resources.

 You will have a very unhappy time.

 The general rule of thumb / guideline for an HDD-based system with 1G
 networking is 300GB to 500GB per node. See previous discussions on this
 topic for reasons.

 ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
 164) Exception in thread Thread[Thrift:641,5,main]
 ...
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
 ThriftServer.java (line 116) Stop listening to thrift clients

 What was the error ?

 What version are you using?
 If you have changed any defaults for memory in cassandra-env.sh or
 cassandra.yaml, revert them. Generally C* will do the right thing and not
 OOM, unless you are trying to store a lot of data on a node that does not
 have enough memory. See this thread for background:
 http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
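
 As a checklist sketch, these are the memory-related settings to compare
 against the shipped defaults (not exhaustive, and availability varies by
 version):

     # cassandra-env.sh: MAX_HEAP_SIZE, HEAP_NEWSIZE (leave auto-calculated)
     # cassandra.yaml:   memtable_total_space_in_mb (blank = 1/3 of heap),
     #                   flush_largest_memtables_at, reduce_cache_sizes_at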

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 12/04/2013, at 7:35 AM, Nikolay Mihaylov n...@nmmm.nu wrote:

 For one project I will need to run cassandra on following dedicated
 servers:

 Single-CPU Xeon, 4 cores, no hyper-threading; 8 GB RAM; 12 TB of locally
 attached HDDs in some kind of RAID, visible as a single HDD.

 I can do a cluster of 20-30 such servers, maybe even more.

 The data will be huge, I am estimating 4-6 TB per server. I know this
 is not ideal, but those are my resources.

 Currently I am testing with one such server, except the HDD is 300 GB.
 Every 15-20 hours, I run out of heap memory, e.g. something like:

 ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
 164) Exception in thread Thread[Thrift:641,5,main]
 ...
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
 ThriftServer.java (line 116) Stop listening to thrift clients
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,943
 Gossiper.java (line 1077) Announcing shutdown
 INFO [StorageServiceShutdownHook] 2013-04-11 11:26:08,613
 MessagingService.java (line 682) Waiting for messaging service to quiesce
 INFO [ACCEPT-/208.94.232.37] 2013-04-11 11:26:08,655
 MessagingService.java (line 888) MessagingService shutting down server
 thread.
 ERROR [Thrift:721] 2013-04-11 11:26:37,709 CustomTThreadPoolServer.java
 (line 217) Error occurred during processing of message.
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
 shut down

 Does anyone have advice about better utilization of such servers?

 Nick.







Re: CQL3 And ReversedTypes Question

2013-04-15 Thread Gareth Collins
Added:

https://issues.apache.org/jira/browse/CASSANDRA-5472

thanks,
Gareth


On Sun, Apr 14, 2013 at 2:33 PM, aaron morton aa...@thelastpickle.com wrote:

 Bad Request: Type error:
 org.apache.cassandra.cql3.statements.Selection$SimpleSelector@1e7318
 cannot be passed as argument 0 of function dateof of type timeuuid

 Is there something I am missing here or should I open a new ticket?

 Yes please.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 13/04/2013, at 4:40 PM, Gareth Collins gareth.o.coll...@gmail.com
 wrote:

 OK, trying out 1.2.4. The previous issue seems to be fine, but I am
 experiencing a new one:

 cqlsh:location> create table test_y (message_id timeuuid, name text,
 PRIMARY KEY (name,message_id));
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> select dateOf(message_id) from test_y;

  dateOf(message_id)
 --
  2013-04-13 00:33:42-0400
  2013-04-13 00:33:43-0400
  2013-04-13 00:33:43-0400
  2013-04-13 00:33:44-0400

 cqlsh:location> create table test_x (message_id timeuuid, name text,
 PRIMARY KEY (name,message_id)) WITH CLUSTERING ORDER BY (message_id DESC);
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> select dateOf(message_id) from test_x;
 Bad Request: Type error:
 org.apache.cassandra.cql3.statements.Selection$SimpleSelector@1e7318
 cannot be passed as argument 0 of function dateof of type timeuuid

 Is there something I am missing here or should I open a new ticket?

 thanks in advance,
 Gareth


 On Tue, Mar 26, 2013 at 3:30 PM, Gareth Collins 
 gareth.o.coll...@gmail.com wrote:

 Added:

 https://issues.apache.org/jira/browse/CASSANDRA-5386

 Thanks very much for the quick answer!

 regards,
 Gareth

 On Tue, Mar 26, 2013 at 3:55 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:
  You aren't missing anything obvious. That's a bug really. Would you mind
  opening a ticket on https://issues.apache.org/jira/browse/CASSANDRA?
 
  --
  Sylvain
 
 
  On Tue, Mar 26, 2013 at 2:48 AM, Gareth Collins 
 gareth.o.coll...@gmail.com
  wrote:
 
  Hi,
 
  I created a table with the following structure in cqlsh (Cassandra
  1.2.3 - cql 3):
 
  CREATE TABLE mytable ( column1 text,
column2 text,
messageId timeuuid,
message blob,
PRIMARY KEY ((column1, column2), messageId));
 
  I can quite happily add values to this table. e.g:
 
  insert into client_queue (column1,column2,messageId,message) VALUES
  ('string1','string2',now(),'ABCCDCC123');
 
  Yet if I decide I want to set the clustering order on messageId DESC:
 
  CREATE TABLE mytable ( column1 text,
column2 text,
messageId timeuuid,
message blob,
PRIMARY KEY ((column1, column2), messageId)) WITH CLUSTERING
  ORDER BY (messageId DESC);
 
  and try to do an insert:
 
  insert into client_queue2 (column1,column2,messageId,message) VALUES
  ('string1','string2',now(),'ABCCDCC123');
 
  I get the following error:
 
  Bad Request: Type error: cannot assign result of function now (type
  timeuuid) to messageid (type
 
 
 'org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimeUUIDType)')
 
  I am sure I am missing something obvious here, but I don't understand.
  Why am I getting an error? What do I need
  to do to be able to add an entry to this table?
 
  thanks in advance,
  Gareth
 
 






Re: Problems with shuffle

2013-04-15 Thread Richard Low
On 14 April 2013 00:56, Rustam Aliyev rustam.li...@code.az wrote:

  Just a follow-up on this issue. Due to the cost of shuffle, we decided
 not to do it. Recently, we added a new node and ended up with a not very
 well balanced cluster:

 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address    Load      Tokens  Owns   Host ID                               Rack
 UN  10.0.1.8   52.28 GB  260     18.3%  d28df6a6-c888-4658-9be1-f9e286368dce  rack1
 UN  10.0.1.11  55.21 GB  256     9.4%   7b0cf3c8-0c42-4443-9b0c-68f794299443  rack1
 UN  10.0.1.2   49.03 GB  259     17.9%  2d308bc3-1fd7-4fa4-b33f-cbbbdc557b2f  rack1
 UN  10.0.1.4   48.51 GB  255     18.4%  c253dcdf-3e93-495c-baf1-e4d2a033bce3  rack1
 UN  10.0.1.1   67.14 GB  253     17.9%  4f77fd70-b134-486b-9c25-cfea96b6d412  rack1
 UN  10.0.1.3   47.65 GB  253     18.0%  4d03690d-5363-42c1-85c2-5084596e09fc  rack1

 It looks like the new node took an equal number of vnodes from each of
 the other nodes, which is good. However, it's not clear why it ended up
 owning half as much as the other nodes.


I think this is expected behaviour when adding a node to a cluster that has
been upgraded to vnodes without shuffling.  The old nodes have equally
spaced contiguous tokens.  The new node will choose 256 random new tokens,
which will on average bisect the old ranges.  This means each token the new
node has will on average cover only half the range of the old ones.

However, the thing that really matters is the load, which is surprisingly
balanced at 55 GB.  This isn't guaranteed though - it could be about half
or it could be significantly more.  The problem with not doing the shuffle
is the vnode after all the contiguous vnodes for a certain node will be the
target for the second replica of *all* the vnodes for that node.  E.g. if
node A has tokens 10, 20, 30, 40, node B has tokens 50, 60, 70, 80 and node
C (the new node) chooses token 45, it will store a replica for all data
stored in A's tokens.  This is exactly the same reason as why tokens in a
multi-DC deployment need to be interleaved rather than be contiguous.

If shuffle isn't going to work, you could instead decommission each node
then bootstrap it back in. In principle that should copy about twice as
much data as required (shuffle is optimal in terms of data transfer), but
some implementation details might make it more efficient.
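
A sketch of that per-node cycle (the host name is made up, and the exact
steps depend on version):

    nodetool -h node1 decommission   # streams node1's data to the rest
    # then on node1: stop cassandra, clear its data directories, make sure
    # num_tokens is set in cassandra.yaml, and start it again so it
    # bootstraps back in with fresh vnode tokens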

Richard.


Re: Extracting data from SSTable files with MapReduce

2013-04-15 Thread Jasper K.
Hi Aaron,

I did try to upgrade to 1.2 but it did not work out. Maybe too many versions
in between.

Why do you think later formats would make this easier?

Jasper



2013/4/14 aaron morton aa...@thelastpickle.com

 The SSTable files are in the -f- format from 0.8.10.

 If you can upgrade to the latest version it will make things easier.
 Start a node and use nodetool upgradesstables.

 The org.apache.cassandra.tools.SSTableExport class provides a blueprint
 for reading rows from disk.
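
 A sketch of both routes (paths and names are illustrative; sstable2json
 is the command-line wrapper around SSTableExport, and a newer release may
 not read 0.8-era -f- files until they have been upgraded):

     nodetool -h localhost upgradesstables
     bin/sstable2json /var/lib/cassandra/data/MyKeyspace/MyCF-f-1-Data.db \
         > rows.json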

 hope that helps.

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 13/04/2013, at 7:58 PM, Jasper K. jasper.knu...@incentro.com wrote:

 Hi,

 Does anyone have any experience with running a MapReduce directly against
 a CF's SSTable files?

 I have a use case where this seems to be an option. I want to export all
 data from a CF to a flat file format for statistical analysis.

 Some factors that make it (more) doable in my case:
 - The Cassandra instance is not 'on-line' (no writes, no reads)
 - The .db files were exported from another instance. I have them all in
 one place now.

 The SSTable files are in the -f- format from 0.8.10.

 Looking at this: http://wiki.apache.org/cassandra/ArchitectureSSTable it
 should be possible to write a Hadoop RecordReader for Cassandra row keys.

 But maybe I am not fully aware of what I am up to.
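
 A minimal sketch of the Hadoop side, since the mapreduce.RecordReader
 contract is stable. SSTableRowSource below is a hypothetical local
 interface standing in for code that would wrap SSTableExport-style row
 iteration; it is an assumption, not a tested implementation:

     import java.io.IOException;
     import org.apache.hadoop.io.BytesWritable;
     import org.apache.hadoop.mapreduce.InputSplit;
     import org.apache.hadoop.mapreduce.RecordReader;
     import org.apache.hadoop.mapreduce.TaskAttemptContext;
     import org.apache.hadoop.mapreduce.lib.input.FileSplit;

     // Hypothetical wrapper over an SSTable scanner, to be implemented
     // against the Cassandra version in use.
     interface SSTableRowSource {
         boolean advance() throws IOException; // position on the next row
         byte[] currentKey();                  // row key bytes
         byte[] currentRow();                  // serialized row contents
         float progress();                     // 0.0 .. 1.0 through the file
         void close() throws IOException;
     }

     public class SSTableRecordReader
             extends RecordReader<BytesWritable, BytesWritable> {
         private SSTableRowSource rows;
         private final BytesWritable key = new BytesWritable();
         private final BytesWritable value = new BytesWritable();

         @Override
         public void initialize(InputSplit split, TaskAttemptContext ctx)
                 throws IOException {
             String dataFile = ((FileSplit) split).getPath().toString();
             rows = open(dataFile);
         }

         // Placeholder factory: wire the real SSTable scanner in here.
         private static SSTableRowSource open(String dataFile) {
             throw new UnsupportedOperationException(dataFile);
         }

         @Override
         public boolean nextKeyValue() throws IOException {
             if (!rows.advance()) return false;
             byte[] k = rows.currentKey();
             byte[] v = rows.currentRow();
             key.set(k, 0, k.length);
             value.set(v, 0, v.length);
             return true;
         }

         @Override public BytesWritable getCurrentKey()   { return key; }
         @Override public BytesWritable getCurrentValue() { return value; }
         @Override public float getProgress() { return rows.progress(); }
         @Override public void close() throws IOException {
             if (rows != null) rows.close();
         }
     }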

 --

 Jasper







Re: StatusLogger format?

2013-04-15 Thread William Oberman
99% sure it's in bytes.
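
If so, the quoted example line below parses as memtable ops = 4963 and
data = 1,799,916 bytes (about 1.7 MB) for the civicscience.sessions CF;
that is a reading of the log format, not something confirmed from the
source.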


On Mon, Apr 15, 2013 at 11:25 AM, William Oberman
ober...@civicscience.com wrote:

 Mainly the:
 ColumnFamily                Memtable ops,data
 section.

 Is data in bytes/kb/mb/etc?

 Example line:
 StatusLogger.java (line 116) civicscience.sessions          4963,1799916

 Thanks!





Re: Cassandra 1.2.2 cluster + raspberry

2013-04-15 Thread murat migdisoglu
Hi Aaron,

Thank you for your support. It was my mistake indeed. The second node was
still configured to compress internode communication.

After I fixed it, I was able to start my cluster.
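
For reference, the relevant cassandra.yaml line, which has to match on
every node (valid values are all, dc and none):

    internode_compression: none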

Cheers



On Thu, Apr 11, 2013 at 12:40 PM, aaron morton aa...@thelastpickle.com wrote:

 I've already tried to set internode_compression: none in my yaml files.

 What version are you on?

 Have you set internode_compression to none and restarted? Can you
 double-check?
 The code stack shows Cassandra deciding that the connection should be
 compressed.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 10/04/2013, at 12:54 PM, murat migdisoglu murat.migdiso...@gmail.com
 wrote:

 Hi,

 I'm trying to set up a Cassandra cluster for some experiments on my
 Raspberry Pis, but I'm still having trouble joining my nodes to the
 cluster.

 I started with two nodes (192.168.2.3 and 192.168.2.7), and when I start
 Cassandra, I see the following exception on node 192.168.2.7:
 ERROR [WRITE-/192.168.2.3] 2013-04-10 02:10:24,524 CassandraDaemon.java
 (line 132) Exception in thread Thread[WRITE-/192.168.2.3,5,main]
 java.lang.NoClassDefFoundError: Could not initialize class
 org.xerial.snappy.Snappy
 at
 org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
 at
 org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:66)
 at
 org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:322)
 at
 org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:143)

 I suspect that the lack of native Snappy libraries is causing this
 exception during internode communication.
 I have not tried to compile native Snappy for ARM yet, but I wonder
 whether it is possible to use Cassandra without Snappy.

 I've already tried to set internode_compression: none in my yaml files.

 nodetool outputs:

 nodetool -h pi1 ring

 Datacenter: dc1
 ==
 Replicas: 1

 Address      Rack  Status  State   Load      Owns     Token
 192.168.2.7  RAC1  Up      Normal  92.35 KB  100.00%  0

 nodetool -h pi2 ring

 Datacenter: dc1
 ==
 Replicas: 1

 Address      Rack  Status  State   Load      Owns     Token
 192.168.2.3  RAC1  Up      Normal  92.42 KB  100.00%  85070591730234615865843651857942052864



 Kind Regards









Re: Vnodes - HUNDRED of MapReduce jobs

2013-04-15 Thread Alicia Leong
Hi cem cayiro...@gmail.com,

In your previous reply, you mentioned that you have a simple solution.
Can you share with us :)

Thanks in advance.


On Sat, Mar 30, 2013 at 2:33 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 It should be easy to control the number of map tasks.
 http://wiki.apache.org/hadoop/HowManyMapsAndReduces. In standard HDFS you
 might run into a directory with 10,000 small files, and you do not want
 10,000 map tasks. This is what the combined input formats do: they help you
 control the number of map tasks a job will generate. For example, imagine I
 have a multi-tenant cluster. If a job kicks up 10,000 map tasks, all those
 tasks can starve out other jobs. Being able to say I only want 4 map tasks
 per C* node, regardless of the number of vnodes, would be a meaningful and
 useful feature.
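
 For what exists today, the knob is the split size on the Cassandra input
 format: one split is created per roughly cassandra.input.split.size rows
 (64k by default), so raising it shrinks the task count, though it cannot
 merge ranges belonging to different vnodes. A hedged sketch of the job
 setup, with made-up keyspace/CF names and addresses:

     import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
     import org.apache.cassandra.hadoop.ConfigHelper;
     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.mapreduce.Job;

     public class FewerSplitsJob {
         public static void main(String[] args) throws Exception {
             Job job = new Job(new Configuration(), "fewer-splits");
             job.setInputFormatClass(ColumnFamilyInputFormat.class);
             Configuration conf = job.getConfiguration();
             ConfigHelper.setInputInitialAddress(conf, "10.0.0.1");
             ConfigHelper.setInputRpcPort(conf, "9160");
             ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
             ConfigHelper.setInputColumnFamily(conf, "myks", "mycf");
             // 1M rows per split instead of the 64k default
             ConfigHelper.setInputSplitSize(conf, 1024 * 1024);
             // slice predicate, mapper/reducer and output setup elided
         }
     }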


 On Fri, Mar 29, 2013 at 2:17 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Yes, but my point is: with 50 map slots you can only be processing 50 at
 once. So it will take 1000/50 = 20 waves of mappers to complete the job.


 On Fri, Mar 29, 2013 at 11:46 AM, Jonathan Ellis jbel...@gmail.com wrote:

 My point is that if you have over 16MB of data per node, you're going
 to get thousands of map tasks (that is: hundreds per node) with or
 without vnodes.

 On Fri, Mar 29, 2013 at 9:42 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
  Every map reduce task typically has a minimum Xmx of 256MB memory. See
  mapred.child.java.opts...
  So if you have a 10 node cluster with 256 vnodes... You will need to
  spawn 2,560 map tasks to complete a job.
  And a 10 node Hadoop cluster with 5 map slots a node... You have 50 map
  slots.
 
  Wouldn't it be better if the input format spawned 10 map tasks instead
  of 2,560?
 
 
  On Fri, Mar 29, 2013 at 10:28 AM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  I still don't see the hole in the following reasoning:
 
  - Input splits are 64k by default.  At this size, map processing time
  dominates job creation.
  - Therefore, if job creation time dominates, you have a toy data set
  (< 64K * 256 vnodes = 16 MB)
 
  Adding complexity to our inputformat to improve performance for this
  niche does not sound like a good idea to me.
 
  On Thu, Mar 28, 2013 at 8:40 AM, cem cayiro...@gmail.com wrote:
   Hi Alicia ,
  
   The Cassandra input format creates as many mappers as vnodes. It is a
   known issue. You need to lower the number of vnodes :(
  
   I have a simple solution for that and am ready to write a patch.
   Should I create a ticket about it? I don't know the procedure.
  
Regards,
   Cem
  
  
   On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong lccali...@gmail.com
   wrote:
  
   Hi All,
  
   I have 3 nodes of Cassandra 1.2.3 and edited cassandra.yaml for
   vnodes.

   When I execute a M/R job, the console showed HUNDREDS of map tasks.

   May I know, is this normal with vnodes? If yes, it slows the M/R job's
   completion.
  
  
   Thanks
  
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced






Re: Does Memtable resides in Heap?

2013-04-15 Thread Jay Svc
Thanks Viktor,

So, as per the recommendation, it's only efficient when the heap size is
below 8GB. When we have more RAM, can the rest of the RAM be left for the
OS to make use of?

How about the bloom filters and index samples, are they off-heap?

Thank you for your response.

Regards,
Jay


On Thu, Apr 11, 2013 at 10:35 PM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

 Memtables reside in the heap; write rate impacts GC: more writes mean
 more frequent and longer ParNew GC pauses.
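
 As a sketch, the cassandra.yaml knobs that bound that pressure in 1.1
 (values are illustrative, not recommendations):

     # total space allowed for memtables; defaults to 1/3 of the heap
     memtable_total_space_in_mb: 1024
     # extra flush writers can drain memtables faster under heavy writes
     memtable_flush_writers: 1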


 From: Jay Svc [mailto:jaytechg...@gmail.com]
 Sent: Friday, April 12, 2013 01:03
 To: user@cassandra.apache.org
 Subject: Does Memtable resides in Heap?

 Hi Team,

 I have got 8GB of RAM, out of which 4GB is allocated to the Java heap. My
 question is: does the size of the Memtable contribute to heap size, or is
 it off-heap?

 Would a bigger Memtable have an impact on GC and overall memory management?

 I am using DSE 3.0 / Cassandra 1.1.9.

 Thanks,
 Jay
