Re: Implement details of Protocol v5 framing

2024-08-15 Thread Jeremy Hanna
Also I just wanted to point out that there is a cassandra-drivers slack room on the ASF slack where folks that work on the different drivers interact. (added to both user and dev list thread) > On Aug 14, 2024, at 5:59 PM, Dinesh Joshi wrote: > > Hi Vincent, > > This is the Cassandra user's m

Re: apache cassandra development process and future

2018-07-24 Thread Jeremy Hanna
For full disclosure, I've been in the Apache Cassandra community since 2010 and at DataStax since 2012. So DataStax moved on to focus on things for their customers, effectively putting most development effort into DataStax Enterprise. However, there have been a lot of fixes and improvements co

Re: Partition size

2016-09-12 Thread Jeremy Hanna
Generally if you foresee the partitions getting out of control in terms of size, a method often employed is to bucket according to some criteria. For example, if I have a time series use case, I might bucket by month or week. That presumes you can foresee it though. As far as limiting that ca
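A minimal Java sketch of the month/week bucketing idea above. It assumes a hypothetical time-series table whose partition key is (series id, bucket) rather than the series id alone, so one logical series is spread over many bounded partitions. The class and method names are illustrative only. Buckets are computed in UTC so every client agrees on where a boundary falls.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.IsoFields;

/** Hypothetical helper: derives a time bucket to combine with a series id in the partition key. */
final class TimeBucket {
    private TimeBucket() {}

    /** Month bucket such as "2016-09", computed in UTC. */
    static String monthBucket(Instant ts) {
        LocalDate d = ts.atZone(ZoneOffset.UTC).toLocalDate();
        return String.format("%04d-%02d", d.getYear(), d.getMonthValue());
    }

    /** ISO-8601 week bucket such as "2016-W37"; the week-based year keeps a week in one bucket. */
    static String weekBucket(Instant ts) {
        LocalDate d = ts.atZone(ZoneOffset.UTC).toLocalDate();
        int weekYear = d.get(IsoFields.WEEK_BASED_YEAR);
        int week = d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return String.format("%04d-W%02d", weekYear, week);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2016-09-12T10:15:30Z");
        // The partition key would be (seriesId, bucket) instead of just seriesId.
        System.out.println("sensor-42 / " + monthBucket(ts));
        System.out.println("sensor-42 / " + weekBucket(ts));
    }
}
```

Week buckets keep partitions smaller than month buckets, at the cost of touching more partitions for a long-range read.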

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Jeremy Hanna
Of the 16 active committers, 8 are not at DataStax. See http://wiki.apache.org/cassandra/Committers. That said, active involvement varies and there are other contributors inside DataStax and in the community. You can look at the dev mailing list as well to look for involvement in more detail

Re: Pig 0.12.0 and Cassandra 2.0.2

2013-12-13 Thread Jeremy Hanna
I need to update those to be current with the Cassandra source download. You’re right, you would just use what’s in the examples directory now for Pig. You should be able to run the examples, but generally you need to specify the partitioner of the cluster, the host name of a node in the clust

Re: Snappy Load Error

2013-11-29 Thread Jeremy Hanna
With RHEL, there is a problem with snappy 1.0.5. You’d need to use 1.0.4.1 which works fine but you need to download it separately and put it in your lib directory. You can find the 1.0.4.1 file from https://github.com/apache/cassandra/tree/cassandra-1.1.12/lib Jeremy On 29 Nov 2013, at 10:1

Re: Pig-cassandra Scripts and Oozie

2013-11-28 Thread Jeremy Hanna
rt#Oozie > > I am using Cassandra 1.2.10, Oozie 4.0.0 and pig 0.11.1. > > I try to test these options and see if it works- > > Thanks in advance > > 2013/11/28 Jeremy Hanna > >> If I rememb

Re: Pig-cassandra Scripts and Oozie

2013-11-28 Thread Jeremy Hanna
If I remember correctly, when I configured pig, cassandra, and oozie to work together, I just used vanilla pig but gave it the jars it needed. What problem are you experiencing that prevents you from doing this? Jeremy On 28 Nov 2013, at 12:56, Miguel Angel Martin junquera wrote: > hi all;

Cassandra Summit EU 2013

2013-09-30 Thread Jeremy Hanna
For those in Europe, there will be a Cassandra Summit EU 2013 in London in October. The main conference sessions will be on 17 October, and Cassandra workshops will be on the 16th and 18th. http://www.datastax.com/cassandraeurope2013 The speakers have been announ

Re: Security?

2013-09-05 Thread Jeremy Hanna
1/security/security_features On 5 Sep 2013, at 17:51, "Hartzman, Leslie" wrote: > Thanks for the info. > > So open-source Cassandra does not provide for auditing? > > -Original Message- > From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] > Sent: Thursd

Re: Security?

2013-09-05 Thread Jeremy Hanna
For open-source Cassandra, there is a framework for security (see the security book-thing in the sidebar): http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html For those wanting additional things like auditing and other features, there's DataStax Enterprise: http://www.datasta

Re: bug in Pig LOAD with cqlStorage and param columns? - cassandra 1.2.8 - pig 0.11.1

2013-08-21 Thread Jeremy Hanna
In order to narrow down the problem, I would start without the request parameters and see if that works. Then I would add the request parameters one at a time to see what breaks things. Often pig is not very helpful with its error messages, so I've had to use this method a lot. On 21 Aug 2013

Re: C* 1.0.6 to 1.1.12: upgradesstables or scrub?

2013-08-13 Thread Jeremy Hanna
If you were using leveled compaction on any column families in 1.0, you'll need to run offline scrub on those column families. On 13 Aug 2013, at 15:38, Romain HARDOUIN wrote: > Hi all, > > We are migrating from C* 1.0.6 to 1.1.12 and after reading DataStax > documentation (http://www.datast

Re: [RELEASE] Apache Cassandra 1.2.8

2013-07-29 Thread Jeremy Hanna
The CHANGES and NEWS links pointed to the 1.2.8-tentative. The 1.2.8 links are: CHANGES.txt: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/1.2.8 NEWS.txt: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags

Re: Too many open files and stopped compaction with many pending compaction tasks

2013-06-27 Thread Jeremy Hanna
Are you on SSDs? On 27 Jun 2013, at 14:24, "Desimpel, Ignace" wrote: > On a test with 3 cassandra servers version 1.2.5 with replication factor 1 > and leveled compaction, I did a store last night and I did not see any > problem with Cassandra. On all 3 machine the compaction is stopped alread

Re: Cassandra as storage for cache data

2013-06-25 Thread Jeremy Hanna
If you have rapidly expiring data, then tombstones are probably filling your disk and your heap (depending on how you order the data on disk). To check whether your queries are affected by tombstones, you might try the query tracing that's built into 1.2. See: http://www.datastax.com/d

Re: chunk length

2013-03-09 Thread Jeremy Hanna
These pages may have some helpful background for you: http://www.datastax.com/docs/1.1/configuration/storage_configuration#compression-options http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression Cheers, Jeremy On Mar 9, 2013, at 9:27 PM, Kanwar Sangha wrote: > Hi – Can some

Re: Authentication and Authorization with Cassandra 1.2.2.

2013-02-26 Thread Jeremy Hanna
does this help? Links at the bottom show the cql statements to add/modify users: http://www.datastax.com/docs/1.2/security/native_authentication On Feb 26, 2013, at 4:06 PM, C.F.Scheidecker Antunes wrote: > Hello all, > > Cassandra has changed and now has a default authentication and authori

Re: Cassandra 1.20 with Cloudera Hadoop (CDH4) Compatibility Issue

2013-02-16 Thread Jeremy Hanna
Fwiw - here are some changes that a friend said should make C*'s Hadoop support work with CDH4 - for ColumnFamilyRecordReader. https://gist.github.com/jeromatron/4967799 On Feb 16, 2013, at 8:23 AM, Edward Capriolo wrote: > Here is the deal. > > http://wiki.apache.org/hadoop/Defining%20Hado

Re: Start token sorts after end token

2013-02-01 Thread Jeremy Hanna
See https://issues.apache.org/jira/browse/CASSANDRA-5168 - should be fixed in 1.1.10 and 1.2.2. On Jan 30, 2013, at 9:18 AM, Tejas Patil wrote: > While reading data from Cassandra in map-reduce, I am getting > "InvalidRequestException(why:Start token sorts after end token)" > > Below is the c

Re: Hybrid Hadoop Cassandra Cluster

2013-01-18 Thread Jeremy Hanna
Hi Naveen, You can start with http://wiki.apache.org/cassandra/HadoopSupport but there's also a commercial product that you can use, DataStax Enterprise: http://www.datastax.com/docs/datastax_enterprise2.2/solutions/hadoop_index which makes things more streamlined, but it's a commercial product

Re: progress of cleanup operations

2012-11-29 Thread Jeremy Hanna
You can check nodetool compactionstats to see progress for current cleanup operations. Cleanup essentially traverses all of your sstables and removes data that the node isn't responsible for. That's the overall operation, so you would estimate in terms of how long it would take to go through

Re: leveled compaction and tombstoned data

2012-11-08 Thread Jeremy Hanna
LCS works well in specific circumstances; this blog post gives some good considerations: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction On Nov 8, 2012, at 1:33 PM, Aaron Turner wrote: > "kill performance" is relative. Leveled Compaction basically costs 2x disk > IO. Look at

Re: hadoop consistency level

2012-10-18 Thread Jeremy Hanna
On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh wrote: > On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman > wrote: >> Not sure I understand your question (if there is one..) >> >> You are more than welcome to do CL ONE and assuming you have hadoop nodes >> in the right places on your ring things

Re: cassandra + pig

2012-10-11 Thread Jeremy Hanna
t; On Thu, Oct 11, 2012 at 11:25 AM, Jeremy Hanna > wrote: > The Dachis Group (where I just came from, now at DataStax) uses pig with > cassandra for a lot of things. However, we weren't using the widerow > implementation yet since wide row support is new to 1.1.x and we were on 0

Re: cassandra + pig

2012-10-11 Thread Jeremy Hanna
The Dachis Group (where I just came from, now at DataStax) uses pig with cassandra for a lot of things. However, we weren't using the widerow implementation yet since wide row support is new to 1.1.x and we were on 0.7, then 0.8, then 1.0.x. I think since it's new to 1.1's hadoop support, it s

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
It's always had data locality (since hadoop support was added in 0.6). You don't need to specify a partition, you specify the input predicate with ConfigHelper or the cassandra.input.predicate property. On Oct 2, 2012, at 2:26 PM, "Hiller, Dean" wrote: > So you're saying that you can access th

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job. What you might do is add a field to the column family that represents which virtual column family the row is part of. Then, when doing mapreduce jobs, you

Re: is multithreaded_compaction stable?

2012-09-15 Thread Jeremy Hanna
Generally the main knob for compaction performance is compaction_throughput_mb_per_sec in cassandra.yaml. It defaults to 16. You can use 'nodetool setcompactionthroughput' to set it on a running server, but the next time the Cassandra server starts it will use what's in the yaml again. You might try usin

Re: cassandra/hadoop BulkOutputFormat failures

2012-09-14 Thread Jeremy Hanna
A couple of guesses:
- are you mixing versions of Cassandra? Streaming differences between versions might throw this error. That is, are you bulk loading with one version of Cassandra into a cluster that's a different version?
- (shot in the dark) is your cluster overwhelmed for some reason? I

Re: Differences in row iteration behavior

2012-09-14 Thread Jeremy Hanna
Are there any deletions in your data? The Hadoop support doesn't filter out tombstones, though you may not be filtering them out in your code either. I've used the hadoop support for doing a lot of data validation in the past and as long as you're sure that the code is sound, I'm pretty confid

Re: Cassandra 1.1.1 on Java 7

2012-09-09 Thread Jeremy Hanna
Starting with Java 1.6.0_34, you'll need -Xss set to 180k. The setting is updated in the forthcoming 1.1.5 as well as the next minor rev of 1.0.x (1.0.12). https://issues.apache.org/jira/browse/CASSANDRA-4631 See also the comments on https://issues.apache.org/jira/browse/CASSANDRA-4602 for the reference to wh

Re: Index build status

2012-08-20 Thread Jeremy Hanna
For an individual node, you can check the status of building indexes using nodetool compactionstats. Similarly, if you want to speed up building the indexes (and you have the extra IO), you can increase or unthrottle your compaction throughput temporarily - nodetool setcompactionthroughput 0 to

Re: Dynamic CF

2012-07-06 Thread Jeremy Hanna
rote: > Thanks Jeremy, but this doesn't work for me. I am using cql3, because I need > new features like composite keys. The manual you pointed to is for 2.0. > I have suspicion that cql3 does not support dynamic tables at all. Is there a > manual for cql3? > > -----Orig

Re: Dynamic CF

2012-07-06 Thread Jeremy Hanna
you can use the cqlsh help but it will eventually refer you to a cql reference such as this one that says what the options are. Looks like you need just 'default_validation'. http://www.datastax.com/docs/1.0/references/cql/index#cql-column-family-storage-parameters On Jul 6, 2012, at 2:13 PM,

Re: Matthew Dennis's "Cassandra On EC2"

2012-05-17 Thread Jeremy Hanna
Sorry - it was at the austin cassandra meetup and we didn't record the presentation. I wonder if this would be a popular topic to have at the upcoming Cassandra SF event which would be recorded... On May 17, 2012, at 6:51 AM, Tamar Fraenkel wrote: > Hi! > > I found the slides of the lecture

Re: Exception when truncate

2012-05-17 Thread Jeremy Hanna
When doing a truncate, Cassandra has to talk to all of the nodes in the ring to perform the operation. Judging by the error, it looks like one of the nodes was unreachable for some reason. You might run nodetool ring, or do a 'describe cluster;' in the cli, and see if your ring is okay. So I think the operation

Re: Source for Cassandra Pig and Hive

2012-05-02 Thread Jeremy Hanna
The hive support is going to be integrated into the main source tree with this ticket: https://issues.apache.org/jira/browse/CASSANDRA-4131 You can go to https://github.com/riptano/hive to find the CassandraStorageHandler right now though. For 1.0.8, the CassandraStorage class for the Pig suppor

Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Jeremy Hanna
I backported this to 0.8.4 and it didn't fix the problem we were seeing (as I outlined in my parallel post) but if it fixes it for you, then beautiful. Just wanted to let you know our experience with similar symptoms. On Apr 11, 2012, at 11:56 AM, Thibaut Britz wrote: > Fixed in https://issue

Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Jeremy Hanna
fwiw - we had a similar problem reading at quorum with 0.8.4 when reading with hadoop. The symptom we see is that when reading a column family with hadoop at quorum on 0.8.4, we have lots of minor compactions as a result of heavy writes. When we read at CL.ONE or move to 1.0.8, the problem is

cassandra_jobs on twitter

2012-04-10 Thread Jeremy Hanna
some time back, I created the account cassandra_jobs on twitter. if you email the user list or better yet just cc cassandra_jobs on twitter, I'll retweet it there so that the information can get out to more people. https://twitter.com/#!/cassandra_jobs cheers, Jeremy

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Jeremy Hanna
you may be running into this - https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it really affects the execution of the job itself though. On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote: > Hi, > > I was recently trying Hadoop job + cassandra-all 0.8.10 again and the > Ti

Re: hadoop map join with ColumnFamilyInputFormat

2012-03-01 Thread Jeremy Hanna
I haven't used that in particular, but it's pretty trivial to do that with Pig and I would imagine it would just do the right thing under the covers. It's a simple join with Pig. We use pygmalion to get data from the Cassandra bag. A simple example would be: DEFINE FromCassandraBag org.pygmal

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-24 Thread Jeremy Hanna
By chance are you in EC2? On Feb 24, 2012, at 8:33 AM, Patrik Modesto wrote: > Hi Jeremy, > > I've seen the page and tried the values but to no help. > > Here goes tcpdump of one failed TCP connection: > > 15:06:20.231421 IP 10.0.18.87.9160 > 10.0.18.87.39396: Flags [P.], seq > 137891735:13790

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-24 Thread Jeremy Hanna
Check out the troubleshooting section of the hadoop support - we ran into the same thing and tried to update that with some info on how to get around it: http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting On Feb 24, 2012, at 7:20 AM, Patrik Modesto wrote: > Hi, > > I can see some st

Re: General questions about Cassandra

2012-02-17 Thread Jeremy Hanna
MapReduce and Hadoop generally are pluggable so you can do queries over HDFS, over HBase, or over Cassandra. Cassandra has good Hadoop support as outlined here: http://wiki.apache.org/cassandra/HadoopSupport. If you're looking for a simpler solution, there is DataStax's enterprise product whic

Re: Hive + Cassandra tutorial

2012-01-23 Thread Jeremy Hanna
Take a look at http://wiki.apache.org/cassandra/HadoopSupport and in the source download of cassandra there's a contrib/pig section that has a wordcount example. On Jan 23, 2012, at 1:16 PM, Tharindu Mathew wrote: > Hi, > > I'm trying to experiment with Hive using Data in Cassandra. Brisk look

Re: Installing C* on EC2

2012-01-13 Thread Jeremy Hanna
On Jan 12, 2012, at 6:36 PM, Mohit Anchlia wrote: > What's the best way to install C*? Any good links? http://www.slideshare.net/mattdennis/cassandra-on-ec2 has some interesting points that aren't immediately obvious - it's mdennis in the cassandra irc channel if you had any questions about th

Re: Hadoop + Cassandra

2012-01-06 Thread Jeremy Hanna
I would first look at http://wiki.apache.org/cassandra/HadoopSupport - you'll want to look in the section on cluster configuration. DataStax also has a product that makes it pretty simple to use Hadoop with Cassandra if you don't mind paying for it - http://www.datastax.com/products/enterprise

Re: Cassandra performance question

2011-12-30 Thread Jeremy Hanna
This might be helpful: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html On Dec 30, 2011, at 1:59 PM, Dom Wong wrote: > Hi, could anyone tell me whether this is possible with Cassandra using an > appropriately sized EC2 cluster. > > 100,000 clients writing 50k each

Re: cassandra data to hadoop.

2011-12-24 Thread Jeremy Hanna
to achieve this. > > -R > > On Fri, Dec 23, 2011 at 9:28 AM, Jeremy Hanna > wrote: > We do this all the time. Take a look at > http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use > mapreduce or pig to get data out of cassandra. If it

Re: Best way to determine how a Cassandra cluster is doing

2011-12-23 Thread Jeremy Hanna
One way to get a good bird's eye view of the cluster would be to install DataStax Opscenter - the community edition is free. You can do a lot of checks from a web interface that are based on the jmx hooks that are in Cassandra. We use it and it's helped us a lot. Hope it helps for what you're

Re: cassandra data to hadoop.

2011-12-23 Thread Jeremy Hanna
33 AM, Praveen Sadhu wrote: > Have you tried Brisk? > > > > On Dec 23, 2011, at 9:30 AM, "Jeremy Hanna" > wrote: > >> We do this all the time. Take a look at >> http://wiki.apache.org/cassandra/HadoopSupport for some details - you can >> u

Re: cassandra data to hadoop.

2011-12-23 Thread Jeremy Hanna
We do this all the time. Take a look at http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use mapreduce or pig to get data out of cassandra. If it's going to a separate hadoop cluster, I don't think you'd need to co-locate task trackers or data nodes on your cassandra

Re: Using Cassandra in Rails App

2011-12-16 Thread Jeremy Hanna
Traditionally there are two places to go. Twitter's ruby client at https://github.com/twitter/cassandra or the newer cql driver at http://code.google.com/a/apache-extras.org/p/cassandra-ruby/. The latter might be nice for green field applications but CQL is still gaining features. Some peopl

Re: Cassandra not suitable?

2011-12-07 Thread Jeremy Hanna
If you're getting lots of timeout exceptions with mapreduce, you might take a look at http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting We saw that and tweaked a variety of things - all of which are listed there. Ultimately, we also boosted hadoop's tolerance for them as well and it

Cassandra_Jobs on Twitter

2011-11-30 Thread Jeremy Hanna
For those interested in Apache Cassandra related jobs - either hiring or in search of - there is now a @Cassandra_Jobs account on Twitter. You can either send posts to that account on twitter or send them to me at this email address with a public link to the job posting and I will tweet them. Che

Re: User Survey

2011-11-29 Thread Jeremy Hanna
On Nov 29, 2011, at 12:25 PM, Don Smith wrote: > cli's "show keyspaces" command shows way too much information by default. > > I think by default it should show just one line per keyspace. A "-v" option > could show more info. If you are using 1.x, there is a describe command for specific ke

Re: Help with Pig Script

2011-11-17 Thread Jeremy Hanna
On Nov 17, 2011, at 1:44 PM, Aaron Griffith wrote: > Jeremy Hanna gmail.com> writes: > >> >> If you are only interested in loading one row, why do you need to use Pig? >> Is > it an extremely wide row? >> >> Unless you are using an ordered

Re: Help with Pig Script

2011-11-17 Thread Jeremy Hanna
If you are only interested in loading one row, why do you need to use Pig? Is it an extremely wide row? Unless you are using an ordered partitioner, you can't limit the rows you mapreduce over currently - you have to mapreduce over the whole column family. That will probably change in 1.1. H

Re: secondary indexes streaming building - when there are none

2011-11-13 Thread Jeremy Hanna
https://issues.apache.org/jira/browse/CASSANDRA-3488 On Nov 12, 2011, at 9:52 AM, Jeremy Hanna wrote: > It sounds like that's just a message in compactionstats that's a no-op. This > is reporting for about an hour that it's building a secondary index on a > specific

Re: secondary indexes streaming building - when there are none

2011-11-12 Thread Jeremy Hanna
> On Fri, Nov 11, 2011 at 9:10 PM, Jeremy Hanna > wrote: >> We're using 0.8.4 in our cluster and two nodes needed rebuilding. When >> building and streaming data to the nodes, there were multiple instances of >> building secondary indexes. We haven't had seco

secondary indexes streaming building - when there are none

2011-11-11 Thread Jeremy Hanna
We're using 0.8.4 in our cluster and two nodes needed rebuilding. When building and streaming data to the nodes, there were multiple instances of building secondary indexes. We haven't had secondary indexes in that keyspace since like mid-August. Is that a bug? Thanks, Jeremy

Re: Efficient map reduce over ranges of Cassandra data

2011-11-11 Thread Jeremy Hanna
Nice! Thanks Ed. On Nov 10, 2011, at 11:20 PM, Edward Capriolo wrote: > Hey all, > > I know there are several tickets in the pipe that should make it possible do > use secondary indexes to run map reduce jobs that do not have to ingest the > entire dataset such as: > > https://issues.apache.

Re: Massive writes when only reading from Cassandra

2011-10-17 Thread Jeremy Hanna
cable rock in our backpack and hopefully clears up where that setting is actually used. I'll update the storage configuration wiki to include that caveat as well. On Sep 10, 2011, at 5:14 PM, Jeremy Hanna wrote: > Thanks for the insights. I may first try disabling hinted handoff for

Re: pig_cassandra problem - "Incompatible field schema" error

2011-10-11 Thread Jeremy Hanna
Just for informational purposes, Pete and I tried to troubleshoot it via twitter. I was able to do the following with Cassandra 0.8.1 and Pig 0.9.1. He's going to dig in to see if there's something else going on. // Cassandra-cli stuff // bin/cassandra-cli -h localhost -p 9160 create keyspace

Hadoop settings if running into blacklisted task trackers with Cassandra

2011-09-24 Thread Jeremy Hanna
I thought I would share something valuable that Jacob Perkins (who recently started with us) shared. We were seeing blacklisted task trackers and occasionally failed jobs. These were almost always based on TimedOutExceptions from Cassandra. We've been fixing underlying reasons for those excep

Re: Tool for SQL -> Cassandra data movement

2011-09-22 Thread Jeremy Hanna
Take a look at http://www.datastax.com/dev/blog/bulk-loading I'm sure there is a way to make it more seamless for what you want to do and it could be built on, but the recent bulk loading additions will provide the best foundation. On Sep 22, 2011, at 12:25 PM, Nehal Mehta wrote: > We are tryi

Re: Replace Live Node

2011-09-12 Thread Jeremy Hanna
> So to move data from node with token 0, the new node needs to have > initial token set to 170141183460469231731687303715884105727 ? I would do this route. > Another idea: could I move token to 1, and then use token 0 on the new node? nodetool move prior to 0.8 is a very heavy operation.

Re: Replace Live Node

2011-09-12 Thread Jeremy Hanna
I believe you'd need 2^127 - 1, which is 170141183460469231731687303715884105727 On Sep 12, 2011, at 2:30 PM, Kyle Gibson wrote: > What could you do if the initial_token is 0? > > On Mon, Sep 12, 2011 at 1:09 PM, Jeremy Hanna > wrote: >> Yeah - I would bootstrap at

Re: Replace Live Node

2011-09-12 Thread Jeremy Hanna
Yeah - I would bootstrap the new node at an initial_token of the current one minus 1. Once that has bootstrapped, decommission the old one. Avoid trying to use removetoken on anything before 0.8.3. If you're dealing with a live node, use decommission if you can. On Sep 12, 2011, at 10:42 AM, Kyle Gibson
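A small Java sketch of the token arithmetic in the Replace Live Node replies above: the new node's initial_token is the old node's token minus 1, and for token 0 that wraps around to 2^127 - 1. It assumes RandomPartitioner-style tokens treated as wrapping modulo 2^127, as those replies do. TokenMath is a hypothetical helper, not a Cassandra class.

```java
import java.math.BigInteger;

/** Hypothetical helper for choosing a replacement node's initial_token. */
final class TokenMath {
    // Assumption: tokens wrap modulo 2^127, so the token "before" 0 is 2^127 - 1.
    static final BigInteger RING = BigInteger.ONE.shiftLeft(127);

    private TokenMath() {}

    /** Token immediately before `current` on the ring (BigInteger.mod is never negative). */
    static BigInteger predecessor(BigInteger current) {
        return current.subtract(BigInteger.ONE).mod(RING);
    }

    public static void main(String[] args) {
        System.out.println(predecessor(BigInteger.ZERO));
        System.out.println(predecessor(new BigInteger("85070591730234615865843651857942052864")));
    }
}
```

Using BigInteger avoids overflow, since these tokens do not fit in a long.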

Re: Disabling hinted handoff doesn't work in 0.8.4?

2011-09-10 Thread Jeremy Hanna
Turned out that wasn't a problem - I put some notes on the ticket. On Sep 10, 2011, at 6:22 PM, Jeremy Hanna wrote: > I tried looking through the source to see if the log statements would happen > regardless but it doesn't look like it. Also I looked at one of the nodes >

Re: Disabling hinted handoff doesn't work in 0.8.4?

2011-09-10 Thread Jeremy Hanna
rowse/CASSANDRA-3176 On Sep 10, 2011, at 5:50 PM, Jeremy Hanna wrote: > INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java > (line 323) Started hinted handoff for endpoint /10.1.2.3 > INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java > (l

Disabling hinted handoff doesn't work in 0.8.4?

2011-09-10 Thread Jeremy Hanna
We just tried to disable hinted handoff by setting: hinted_handoff_enabled: false in all the nodes of our cluster and restarting them. When they come back up, we continue to see things like this: INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java (line 323) Started hinted h

Re: Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
t; 2) You have something doing writes that you're not aware of, I guess > you could track that down using wireshark to see where the write > messages are coming from > > On Sat, Sep 10, 2011 at 3:56 PM, Jeremy Hanna > wrote: > > Oh and we're running 0.8.4 and the RF

Re: Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
Oh and we're running 0.8.4 and the RF is 3. On Sep 10, 2011, at 3:49 PM, Jeremy Hanna wrote: > In addition, the mutation stage and the read stage are backed up like: > > Pool NameActive Pending Blocked > ReadStage32

Re: Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
0 0
InternalResponseStage 0 0 0
HintedHandoff 0 0 0
CompactionManager n/a 29
MessagingService n/a 0,34
On Sep 10, 2011, at 3:38 PM, Jeremy Hanna wrote: > We are experiencing mass

Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
We are experiencing massive writes to column families when only doing reads from Cassandra. A set of 5 hadoop jobs are reading from Cassandra and then writing out to hdfs. That is the only thing operating on the cluster. We are reading at CL.QUORUM with hadoop and have written with CL.QUORUM.
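To make the CL.QUORUM arithmetic concrete, here is a minimal Java sketch assuming RF 3, the value given in the follow-up message in this thread. QUORUM is a strict majority of the replicas, and a read is only guaranteed to overlap the most recent write when read replicas plus write replicas exceed RF. QuorumMath is an illustrative name, not a Cassandra API.

```java
/** Illustrative quorum arithmetic for reading and writing at CL.QUORUM. */
final class QuorumMath {
    private QuorumMath() {}

    /** Replicas that must respond at QUORUM: a strict majority of the replication factor. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** True when every read replica set must intersect every write replica set. */
    static boolean overlaps(int readReplicas, int writeReplicas, int rf) {
        return readReplicas + writeReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM at RF=" + rf + " is " + q);
        System.out.println("QUORUM read / QUORUM write overlap: " + overlaps(q, q, rf));
        System.out.println("ONE read / QUORUM write overlap: " + overlaps(1, q, rf));
    }
}
```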

Re: Anybody out there using 0.8 in production

2011-09-08 Thread Jeremy Hanna
We run 0.8 in production and it's been working well for us. There are some new settings that we had to tune for - for example, the default concurrent compaction is the number of cores. We had to tune that down because we also run hadoop jobs on our nodes. On Sep 8, 2011, at 4:44 PM, Anand Som

Re: Any tentative date for 0.8.5 release?

2011-09-07 Thread Jeremy Hanna
The vote started on Monday and runs for 72 hours. So if people don't find any problems, it should be released sometime Thursday (8 September). On Sep 7, 2011, at 10:41 AM, Roshan Dawrani wrote: > Hi, > > Quick check: is there a tentative date for release of Cassandra 0.8.5? > >

Re: cassandra 0.8.4 + pig (using cloudera rpms)

2011-09-04 Thread Jeremy Hanna
Thanks William - so you were able to get everything running correctly, right? FWIW, we're in the process of upgrading to 0.8.4 and found that all we needed was that first link you mentioned - the VersionedValue modification. It's running fine on our staging cluster and we're in the process of m

Re: need help setting up production environment

2011-09-03 Thread Jeremy Hanna
I dont remember setting up snitch. > > The servers are all in a VPC, the only thing I did was configure the seed IP > so all the nodes can see each other. > > Ben > > On Sat, Sep 3, 2011 at 11:13 PM, Jeremy Hanna > wrote: > I would look at http://www.slideshar

Re: need help setting up production environment

2011-09-03 Thread Jeremy Hanna
I would look at http://www.slideshare.net/mattdennis/cassandra-on-ec2 Also, people generally do raid0 on the ephemerals. EBS is a bad fit for cassandra - see the presentation above. However, that means you'll need to have a backup strategy, which is also mentioned in the presentation. Also ar

Re: Cassandra prod environment

2011-09-02 Thread Jeremy Hanna
We moved off of ubuntu because of kernel issues in the AMIs we found in 10.04 and 10.10 in ec2. So we're now on debian squeeze with ext4. It's been great for us. One thing that bit us is we'd been using property file snitch and the availability zones as racks and had an equal number of nodes

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
rive the current time in > nano seconds though? > > On Tue, Aug 30, 2011 at 2:39 PM, Jeremy Hanna > wrote: >> Yes - the reason why internally Cassandra uses milliseconds * 1000 is >> because System.nanoTime javadoc says "This method can only be used to >> mea

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Jeremy Hanna
/repos/asf/cassandra/trunk/contrib/pig. Are there any > other resource that you can point me to? There seems to be a lack of samples > on this subject. > > On Tue, Aug 30, 2011 at 10:56 PM, Jeremy Hanna > wrote: > FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
0. > > Anyone sees problem with this approach? > > On Tue, Aug 30, 2011 at 2:20 PM, Edward Capriolo > wrote: >> >> >> On Tue, Aug 30, 2011 at 1:41 PM, Jeremy Hanna >> wrote: >>> >>> I would not use nano time with cassandra. Internall

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
Ed - you're right: milliseconds * 1000, so microseconds. The other stuff about nano time still stands. Sorry about that. On Aug 30, 2011, at 1:20 PM, Edward Capriolo wrote: > > > On Tue, Aug 30, 2011 at 1:41 PM, Jeremy Hanna > wr

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
I would not use nano time with cassandra. Internally and throughout the clients, milliseconds is pretty much a standard. You can get into trouble because when comparing nanoseconds with milliseconds as long numbers, nanoseconds will always win. That bit us a while back when we deleted someth
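A minimal Java sketch of the timestamp point in this thread, assuming the client supplies its own write timestamps. As the follow-up correction in the thread notes, the convention is microseconds (milliseconds * 1000). A nanosecond-scale value is roughly 1000x larger, so under last-write-wins it beats any microsecond timestamp even when it is older. WriteTimestamps and its methods are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing microsecond write timestamps and the nanosecond pitfall. */
final class WriteTimestamps {
    private WriteTimestamps() {}

    /** Microseconds since the Unix epoch, i.e. wall-clock milliseconds * 1000. */
    static long nowMicros() {
        return TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis());
    }

    /** Last-write-wins: the larger timestamp is kept. */
    static long winner(long a, long b) {
        return Math.max(a, b);
    }

    public static void main(String[] args) {
        long micros = nowMicros();
        // An epoch value in nanoseconds from one minute *earlier* is still ~1000x larger.
        // (System.nanoTime() is worse: its origin is arbitrary, so it is not a wall-clock time.)
        long olderNanos = TimeUnit.MILLISECONDS.toNanos(System.currentTimeMillis() - 60_000);
        System.out.println("micros=" + micros + " olderNanos=" + olderNanos);
        System.out.println("older nanosecond write wins: " + (winner(micros, olderNanos) == olderNanos));
    }
}
```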

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Jeremy Hanna
FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to potentially move to Brisk because of the simplicity of operations there. Not sure what you mean about the true power of Hadoop. In my mind the true power of Hadoop is the ability to parallelize jobs and send each task to wher

Matt Dennis' presentation on Cassandra best practices on EC2

2011-08-29 Thread Jeremy Hanna
Just wanted to let people know about a great presentation that Matt Dennis did here at the Cassandra Austin meetup. It's on Cassandra best practices on EC2. We found the presentation extremely helpful. http://www.slideshare.net/mattdennis/cassandra-on-ec2

minor compaction of secondary index that no longer exists?

2011-08-28 Thread Jeremy Hanna
I was watching compactionstats via opscenter and saw one of my nodes was minor compacting a secondary index column family. Problem is I removed all of my secondary indexes on Friday and just double checked on the CLI with 'show keyspaces;' and sure enough, no secondary indexes. Is this a bug?

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-28 Thread Jeremy Hanna
in token order, that can lead to serious hotspots. For more on this with ec2, see: http://www.slideshare.net/mattdennis/cassandra-on-ec2/5 where he talks about alternating zones. On Aug 25, 2011, at 10:45 AM, mcasandra wrote: > Thanks for the update > > Jeremy Hanna wrote: >&

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-25 Thread Jeremy Hanna
ext token of a different rack (depending on which it is looking for). So that is why alternating by rack is important. That might be able to be smarter in the future which would be nice - to not have to care and let Cassandra spread the replication around intelligently. On Aug 23, 2011, at 6:02 A
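A simplified Java model of why alternating racks in token order matters. It assumes, loosely following the description above, that each node's cross-rack replica goes to the first node clockwise in a different rack; this is an illustration only, not Cassandra's replication strategy source. With racks grouped as [a, a, b, b], the first node of each rack absorbs every cross-rack replica, while alternating [a, b, a, b] spreads them evenly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model: ring.get(i) is the rack of the node holding the i-th token. */
final class RackOrder {
    private RackOrder() {}

    /** Counts how often each node is picked as the first different-rack node clockwise. */
    static int[] crossRackPicks(List<String> ring) {
        int n = ring.size();
        int[] picks = new int[n];
        for (int i = 0; i < n; i++) {
            for (int step = 1; step < n; step++) {
                int j = (i + step) % n;
                if (!ring.get(j).equals(ring.get(i))) {
                    picks[j]++;
                    break;
                }
            }
        }
        return picks;
    }

    /** Round-robin layout so consecutive tokens alternate racks. */
    static List<String> alternate(List<String> racks, int nodesPerRack) {
        List<String> ring = new ArrayList<>();
        for (int k = 0; k < nodesPerRack; k++) {
            ring.addAll(racks);
        }
        return ring;
    }

    public static void main(String[] args) {
        List<String> grouped = Arrays.asList("a", "a", "b", "b");
        List<String> alternating = alternate(Arrays.asList("a", "b"), 2);
        System.out.println(Arrays.toString(crossRackPicks(grouped)));     // [2, 0, 2, 0]
        System.out.println(Arrays.toString(crossRackPicks(alternating))); // [1, 1, 1, 1]
    }
}
```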

Re: Memory overhead of vector clocks…. how often are they pruned?

2011-08-24 Thread Jeremy Hanna
At the point that book was written (about a year ago it was finalized), vector clocks were planned. In August or September of last year, they were removed. 0.7 was released in January. The ticket for vector clocks is here and you can see the reasoning for not using them at the bottom. https

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna
m unreasonable - about a MB. I turned up logging to DEBUG for that class and I get plenty of dropped READ_REPAIR messages, but nothing coming out of DEBUG in the logs to indicate the time taken that I can see. > > Cheers > > - > Aaron Morton > Freelance Cass

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna
On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote: >> We've been having issues where as soon as we start doing heavy writes (via >> hadoop) recently, it really hammers 4 nodes out of 20. We're using random >> partitioner and we've set the initial tokens for our 20 nodes according to >> the ge

4/20 nodes get disproportionate amount of mutations

2011-08-22 Thread Jeremy Hanna
We've been having issues where as soon as we start doing heavy writes (via hadoop) recently, it really hammers 4 nodes out of 20. We're using random partitioner and we've set the initial tokens for our 20 nodes according to the general spacing formula, except for a few token offsets as we've re
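For reference, the general spacing formula for RandomPartitioner initial tokens is token_i = i * 2^127 / N for nodes i = 0 .. N-1. Below is a minimal Java sketch using BigInteger, since the values exceed a long. InitialTokens is a hypothetical helper, and it does not model the token offsets mentioned above.

```java
import java.math.BigInteger;

/** Hypothetical helper: evenly spaced RandomPartitioner initial tokens. */
final class InitialTokens {
    private InitialTokens() {}

    static BigInteger[] evenlySpaced(int nodeCount) {
        BigInteger range = BigInteger.ONE.shiftLeft(127);
        BigInteger n = BigInteger.valueOf(nodeCount);
        BigInteger[] tokens = new BigInteger[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            // Multiply before dividing so integer division truncates only once.
            tokens[i] = range.multiply(BigInteger.valueOf(i)).divide(n);
        }
        return tokens;
    }

    public static void main(String[] args) {
        BigInteger[] tokens = evenlySpaced(20);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println("node " + i + ": initial_token=" + tokens[i]);
        }
    }
}
```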

Re: hints system CF getting out of control

2011-08-18 Thread Jeremy Hanna
: > I would assume it's because it thinks some node is down and is > creating hints for it. > > On Thu, Aug 18, 2011 at 6:31 PM, Jeremy Hanna > wrote: >> We're trying to bootstrap some new nodes and it appears when adding a new >> node that there is a lot

hints system CF getting out of control

2011-08-18 Thread Jeremy Hanna
We're trying to bootstrap some new nodes and it appears when adding a new node that there is a lot of logging on hints being flushed and compacted. It's been taking about 75 minutes thus far to bootstrap for only about 10 GB of data. It's ballooned up to over 40 GB on the new node. I do 'ls -
