Re: apache cassandra development process and future

2018-07-24 Thread Jeremy Hanna
For full disclosure, I've been in the Apache Cassandra community since 2010 and at DataStax since 2012. So DataStax moved on to focus on things for their customers, effectively putting most development effort into DataStax Enterprise. However, there have been a lot of fixes and improvements

Re: Partition size

2016-09-12 Thread Jeremy Hanna
Generally if you foresee the partitions getting out of control in terms of size, a method often employed is to bucket according to some criteria. For example, if I have a time series use case, I might bucket by month or week. That presumes you can foresee it though. As far as limiting that
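A minimal sketch of the bucketing idea in the snippet above, assuming a time series identified by a series id. The helper names `bucket_for` and `partition_key` and the label formats are hypothetical illustrations, not something taken from the thread.

```python
from datetime import datetime, timezone


def bucket_for(ts: datetime, granularity: str = "month") -> str:
    """Return a bucket label for a timestamp (hypothetical helper)."""
    if granularity == "month":
        return ts.strftime("%Y-%m")
    if granularity == "week":
        # ISO week numbering; the ISO year can differ from ts.year near New Year.
        iso_year, iso_week, _ = ts.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    raise ValueError(f"unsupported granularity: {granularity}")


def partition_key(series_id: str, ts: datetime, granularity: str = "month") -> tuple:
    """Compose a partition key so each series gets one partition per bucket."""
    return (series_id, bucket_for(ts, granularity))


ts = datetime(2016, 9, 12, tzinfo=timezone.utc)
print(partition_key("sensor-1", ts))          # monthly bucket
print(partition_key("sensor-1", ts, "week"))  # weekly bucket
```

Choosing a finer granularity keeps each partition smaller, at the cost of reading more partitions when a query spans a long time range.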

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Jeremy Hanna
Of the 16 active committers, 8 are not at DataStax. See http://wiki.apache.org/cassandra/Committers. That said, active involvement varies and there are other contributors inside DataStax and in the community. You can look at the dev mailing list as well to look for involvement in more

Re: Pig 0.12.0 and Cassandra 2.0.2

2013-12-13 Thread Jeremy Hanna
I need to update those to be current with the Cassandra source download. You’re right, you would just use what’s in the examples directory now for Pig. You should be able to run the examples, but generally you need to specify the partitioner of the cluster, the host name of a node in the

Re: Pig-cassandra Scritps and Oozie

2013-11-28 Thread Jeremy Hanna
If I remember correctly when I configured pig, cassandra, and oozie to work together, I just used vanilla pig but gave it the jars it needed. What is the problem you’re experiencing that you are unable to do this? Jeremy On 28 Nov 2013, at 12:56, Miguel Angel Martin junquera

Re: Pig-cassandra Scritps and Oozie

2013-11-28 Thread Jeremy Hanna
0.11.1. I try to test these options and see if it works- Thanks in advance 2013/11/28 Jeremy Hanna jeremy.hanna1...@gmail.com If I remember correctly when I configured pig, cassandra, and oozie to work together, I just used vanilla pig but gave it the jars it needed

Cassandra Summit EU 2013

2013-09-30 Thread Jeremy Hanna
For those in Europe, there will be a Cassandra Summit EU 2013 in London in October. The main conference sessions will be on 17 October, and there will be Cassandra workshops on the 16th and 18th. http://www.datastax.com/cassandraeurope2013 The speakers have been

Re: Security?

2013-09-05 Thread Jeremy Hanna
For open-source Cassandra, there is a framework for security (see the security book-thing in the sidebar): http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html For those wanting additional things like auditing and other features, there's DataStax Enterprise:

Re: Security?

2013-09-05 Thread Jeremy Hanna
/security_features On 5 Sep 2013, at 17:51, Hartzman, Leslie leslie.d.hartz...@medtronic.com wrote: Thanks for the info. So open-source Cassandra does not provide for auditing? -Original Message- From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] Sent: Thursday, September 05, 2013 9

Re: bug in Pig LOAD with cqlStorage and param columns? - cassandra 1.2.8 - pig 0.11.1

2013-08-21 Thread Jeremy Hanna
In order to narrow down the problem, I would start without the request parameters and see if that works. Then I would add the request parameters one at a time to see what breaks things. Often pig is not very helpful with its error messages, so I've had to use this method a lot. On 21 Aug

Re: C* 1.0.6 to 1.1.12: upgradesstables or scrub?

2013-08-13 Thread Jeremy Hanna
If you were using leveled compaction on any column families in 1.0, you'll need to run offline scrub on those column families. On 13 Aug 2013, at 15:38, Romain HARDOUIN romain.hardo...@urssaf.fr wrote: Hi all, We are migrating from C* 1.0.6 to 1.1.12 and after reading DataStax

Re: Too many open files and stopped compaction with many pending compaction tasks

2013-06-27 Thread Jeremy Hanna
Are you on SSDs? On 27 Jun 2013, at 14:24, Desimpel, Ignace ignace.desim...@nuance.com wrote: On a test with 3 cassandra servers version 1.2.5 with replication factor 1 and leveled compaction, I did a store last night and I did not see any problem with Cassandra. On all 3 machine the

Re: Cassandra as storage for cache data

2013-06-25 Thread Jeremy Hanna
If you have rapidly expiring data, then tombstones are probably filling your disk and your heap (depending on how you order the data on disk). To check to see if your queries are affected by tombstones, you might try using the query tracing that's built-in to 1.2. See:

Re: chunk lenght

2013-03-09 Thread Jeremy Hanna
These pages may have some helpful background for you: http://www.datastax.com/docs/1.1/configuration/storage_configuration#compression-options http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression Cheers, Jeremy On Mar 9, 2013, at 9:27 PM, Kanwar Sangha kan...@mavenir.com

Re: Authentication and Authorization with Cassandra 1.2.2.

2013-02-26 Thread Jeremy Hanna
does this help? Links at the bottom show the cql statements to add/modify users: http://www.datastax.com/docs/1.2/security/native_authentication On Feb 26, 2013, at 4:06 PM, C.F.Scheidecker Antunes cf.antu...@gmail.com wrote: Hello all, Cassandra has changed and now has a default

Re: Cassandra 1.20 with Cloudera Hadoop (CDH4) Compatibility Issue

2013-02-16 Thread Jeremy Hanna
Fwiw - here are some changes that a friend said should make C*'s Hadoop support work with CDH4 - for ColumnFamilyRecordReader. https://gist.github.com/jeromatron/4967799 On Feb 16, 2013, at 8:23 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Here is the deal.

Re: Start token sorts after end token

2013-02-01 Thread Jeremy Hanna
See https://issues.apache.org/jira/browse/CASSANDRA-5168 - should be fixed in 1.1.10 and 1.2.2. On Jan 30, 2013, at 9:18 AM, Tejas Patil tejas.patil...@gmail.com wrote: While reading data from Cassandra in map-reduce, I am getting InvalidRequestException(why:Start token sorts after end

Re: Hybrid Hadoop Cassandra Cluster

2013-01-18 Thread Jeremy Hanna
Hi Naveen, You can start with http://wiki.apache.org/cassandra/HadoopSupport but there's also a commercial product that you can use, DataStax Enterprise: http://www.datastax.com/docs/datastax_enterprise2.2/solutions/hadoop_index which makes things more streamlined, but it's a commercial

Re: progress of cleanup operations

2012-11-29 Thread Jeremy Hanna
You can check nodetool compactionstats to see progress for current cleanup operations. Cleanup essentially traverses all of your sstables and removes data that the node isn't responsible for. That's the overall operation, so you would estimate in terms of how long it would take to go through

Re: leveled compaction and tombstoned data

2012-11-08 Thread Jeremy Hanna
LCS works well in specific circumstances; this blog post gives some good considerations: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction On Nov 8, 2012, at 1:33 PM, Aaron Turner synfina...@gmail.com wrote: kill performance is relative. Leveled Compaction basically costs 2x

Re: hadoop consistency level

2012-10-18 Thread Jeremy Hanna
On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh ailin...@gmail.com wrote: On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman mkjell...@barracuda.com wrote: Not sure I understand your question (if there is one..) You are more than welcome to do CL ONE and assuming you have hadoop nodes in the

Re: cassandra + pig

2012-10-11 Thread Jeremy Hanna
The Dachis Group (where I just came from, now at DataStax) uses pig with cassandra for a lot of things. However, we weren't using the widerow implementation yet since wide row support is new to 1.1.x and we were on 0.7, then 0.8, then 1.0.x. I think since it's new to 1.1's hadoop support, it

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job. What you might do is add a field to the column family that represents which virtual column family that it is part of. Then when doing mapreduce jobs,

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
It's always had data locality (since hadoop support was added in 0.6). You don't need to specify a partition, you specify the input predicate with ConfigHelper or the cassandra.input.predicate property. On Oct 2, 2012, at 2:26 PM, Hiller, Dean dean.hil...@nrel.gov wrote: So you're saying that

Re: is multithreaded_compaction stable?

2012-09-15 Thread Jeremy Hanna
Generally the main knob for compaction performance is compaction_throughput_in_mb in cassandra.yaml. It defaults to 16. You can use 'nodetool setcompactionthroughput' to set it on a running server. The next time the Cassandra server starts it will use what's in the yaml again. You might try

Re: Differences in row iteration behavior

2012-09-14 Thread Jeremy Hanna
Are there any deletions in your data? The Hadoop support doesn't filter out tombstones, though you may not be filtering them out in your code either. I've used the hadoop support for doing a lot of data validation in the past and as long as you're sure that the code is sound, I'm pretty

Re: cassandra/hadoop BulkOutputFormat failures

2012-09-14 Thread Jeremy Hanna
A couple of guesses: - are you mixing versions of Cassandra? Streaming differences between versions might throw this error. That is, are you bulk loading with one version of Cassandra into a cluster that's a different version? - (shot in the dark) is your cluster overwhelmed for some reason?

Re: Cassandra 1.1.1 on Java 7

2012-09-09 Thread Jeremy Hanna
Starting with 1.6.0_34, you'll need xss set to 180k. It's updated with the forthcoming 1.1.5 as well as the next minor rev of 1.0.x (1.0.12). https://issues.apache.org/jira/browse/CASSANDRA-4631 See also the comments on https://issues.apache.org/jira/browse/CASSANDRA-4602 for the reference to

Re: Index build status

2012-08-20 Thread Jeremy Hanna
For an individual node, you can check the status of building indexes using nodetool compactionstats. And similarly, if you want to speed up building the indexes (and you have the extra IO) you can increase or unthrottle your compaction throughput temporarily - nodetool setcompactionthroughput 0

Re: Dynamic CF

2012-07-06 Thread Jeremy Hanna
you can use the cqlsh help but it will eventually refer you to a cql reference such as this one that says what the options are. Looks like you need just 'default_validation'. http://www.datastax.com/docs/1.0/references/cql/index#cql-column-family-storage-parameters On Jul 6, 2012, at 2:13 PM,

Re: Dynamic CF

2012-07-06 Thread Jeremy Hanna
Jeremy, but this doesn't work for me. I am using cql3, because I need new features like composite keys. The manual you pointed to is for 2.0. I have suspicion that cql3 does not support dynamic tables at all. Is there a manual for cql3? -Original Message- From: Jeremy Hanna

Re: Exception when truncate

2012-05-17 Thread Jeremy Hanna
when doing a truncate, it has to talk to all of the nodes in the ring to perform the operation. by the error, it looks like one of the nodes was unreachable for some reason. you might do a nodetool ring, and in the cli do a 'describe cluster;', and see if your ring is okay. So I think the

Re: Matthew Dennis's Cassandra On EC2

2012-05-17 Thread Jeremy Hanna
Sorry - it was at the austin cassandra meetup and we didn't record the presentation. I wonder if this would be a popular topic to have at the upcoming Cassandra SF event which would be recorded... On May 17, 2012, at 6:51 AM, Tamar Fraenkel wrote: Hi! I found the slides of the lecture

Re: Source for Cassandra Pig and Hive

2012-05-02 Thread Jeremy Hanna
The hive support is going to be integrated into the main source tree with this ticket: https://issues.apache.org/jira/browse/CASSANDRA-4131 You can go to https://github.com/riptano/hive to find the CassandraStorageHandler right now though. For 1.0.8, the CassandraStorage class for the Pig

Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Jeremy Hanna
fwiw - we had a similar problem reading at quorum with 0.8.4 when reading with hadoop. The symptom we see is when reading a column family with hadoop using quorum using 0.8.4, we have lots of minor compactions as a result of heavy writes. When we read at CL.ONE or move to 1.0.8 the problem is

Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Jeremy Hanna
I backported this to 0.8.4 and it didn't fix the problem we were seeing (as I outlined in my parallel post) but if it fixes it for you, then beautiful. Just wanted to let you know our experience with similar symptoms. On Apr 11, 2012, at 11:56 AM, Thibaut Britz wrote: Fixed in

cassandra_jobs on twitter

2012-04-10 Thread Jeremy Hanna
some time back, I created the account cassandra_jobs on twitter. if you email the user list or better yet just cc cassandra_jobs on twitter, I'll retweet it there so that the information can get out to more people. https://twitter.com/#!/cassandra_jobs cheers, Jeremy

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Jeremy Hanna
you may be running into this - https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it really affects the execution of the job itself though. On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote: Hi, I was recently trying Hadoop job + cassandra-all 0.8.10 again and the

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-24 Thread Jeremy Hanna
Check out the troubleshooting section of the hadoop support - we ran into the same thing and tried to update that with some info on how to get around it: http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting On Feb 24, 2012, at 7:20 AM, Patrik Modesto wrote: Hi, I can see some

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-24 Thread Jeremy Hanna
By chance are you in EC2? On Feb 24, 2012, at 8:33 AM, Patrik Modesto wrote: Hi Jeremy, I've seen the page and tried the values but to no help. Here goes tcpdump of one failed TCP connection: 15:06:20.231421 IP 10.0.18.87.9160 > 10.0.18.87.39396: Flags [P.], seq 137891735:137904068,

Re: General questions about Cassandra

2012-02-17 Thread Jeremy Hanna
MapReduce and Hadoop generally are pluggable so you can do queries over HDFS, over HBase, or over Cassandra. Cassandra has good Hadoop support as outlined here: http://wiki.apache.org/cassandra/HadoopSupport. If you're looking for a simpler solution, there is DataStax's enterprise product

Re: Hive + Cassandra tutorial

2012-01-23 Thread Jeremy Hanna
Take a look at http://wiki.apache.org/cassandra/HadoopSupport and in the source download of cassandra there's a contrib/pig section that has a wordcount example. On Jan 23, 2012, at 1:16 PM, Tharindu Mathew wrote: Hi, I'm trying to experiment with Hive using Data in Cassandra. Brisk looks

Re: Installing C* on EC2

2012-01-13 Thread Jeremy Hanna
On Jan 12, 2012, at 6:36 PM, Mohit Anchlia wrote: What's the best way to install C*? Any good links? http://www.slideshare.net/mattdennis/cassandra-on-ec2 has some interesting points that aren't immediately obvious - it's mdennis in the cassandra irc channel if you had any questions about

Re: Hadoop + Cassandra

2012-01-06 Thread Jeremy Hanna
I would first look at http://wiki.apache.org/cassandra/HadoopSupport - you'll want to look in the section on cluster configuration. DataStax also has a product that makes it pretty simple to use Hadoop with Cassandra if you don't mind paying for it - http://www.datastax.com/products/enterprise

Re: Cassandra performance question

2011-12-30 Thread Jeremy Hanna
This might be helpful: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html On Dec 30, 2011, at 1:59 PM, Dom Wong wrote: Hi, could anyone tell me whether this is possible with Cassandra using an appropriately sized EC2 cluster. 100,000 clients writing 50k each to

Re: cassandra data to hadoop.

2011-12-23 Thread Jeremy Hanna
We do this all the time. Take a look at http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use mapreduce or pig to get data out of cassandra. If it's going to a separate hadoop cluster, I don't think you'd need to co-locate task trackers or data nodes on your cassandra

Re: cassandra data to hadoop.

2011-12-23 Thread Jeremy Hanna
wrote: Have you tried Brisk? On Dec 23, 2011, at 9:30 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: We do this all the time. Take a look at http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use mapreduce or pig to get data out of cassandra. If it's

Re: Best way to determine how a Cassandra cluster is doing

2011-12-23 Thread Jeremy Hanna
One way to get a good bird's eye view of the cluster would be to install DataStax Opscenter - the community edition is free. You can do a lot of checks from a web interface that are based on the jmx hooks that are in Cassandra. We use it and it's helped us a lot. Hope it helps for what

Re: Using Cassandra in Rails App

2011-12-16 Thread Jeremy Hanna
Traditionally there are two places to go. Twitter's ruby client at https://github.com/twitter/cassandra or the newer cql driver at http://code.google.com/a/apache-extras.org/p/cassandra-ruby/. The latter might be nice for green field applications but CQL is still gaining features. Some

Re: Cassandra not suitable?

2011-12-07 Thread Jeremy Hanna
If you're getting lots of timeout exceptions with mapreduce, you might take a look at http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting We saw that and tweaked a variety of things - all of which are listed there. Ultimately, we also boosted hadoop's tolerance for them as well and

Cassandra_Jobs on Twitter

2011-11-30 Thread Jeremy Hanna
For those interested in Apache Cassandra related jobs - either hiring or in search of - there is now a @Cassandra_Jobs account on Twitter. You can either send posts to that account on twitter or send them to me at this email address with a public link to the job posting and I will tweet them.

Re: User Survey

2011-11-29 Thread Jeremy Hanna
On Nov 29, 2011, at 12:25 PM, Don Smith wrote: cli's show keyspaces command shows way too much information by default. I think by default it should show just one line per keyspace. A -v option could show more info. If you are using 1.x, there is a describe command for specific keyspaces

Re: Help with Pig Script

2011-11-17 Thread Jeremy Hanna
If you are only interested in loading one row, why do you need to use Pig? Is it an extremely wide row? Unless you are using an ordered partitioner, you can't limit the rows you mapreduce over currently - you have to mapreduce over the whole column family. That will change probably in 1.1.

Re: Help with Pig Script

2011-11-17 Thread Jeremy Hanna
On Nov 17, 2011, at 1:44 PM, Aaron Griffith wrote: Jeremy Hanna jeremy.hanna1234 at gmail.com writes: If you are only interested in loading one row, why do you need to use Pig? Is it an extremely wide row? Unless you are using an ordered partitioner, you can't limit the rows you

Re: secondary indexes streaming building - when there are none

2011-11-13 Thread Jeremy Hanna
https://issues.apache.org/jira/browse/CASSANDRA-3488 On Nov 12, 2011, at 9:52 AM, Jeremy Hanna wrote: It sounds like that's just a message in compactionstats that's a no-op. This is reporting for about an hour that it's building a secondary index on a specific column family. Not sure

Re: secondary indexes streaming building - when there are none

2011-11-12 Thread Jeremy Hanna
, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: We're using 0.8.4 in our cluster and two nodes needed rebuilding. When building and streaming data to the nodes, there were multiple instances of building secondary indexes. We haven't had secondary indexes in that keyspace since like mid

Re: Efficient map reduce over ranges of Cassandra data

2011-11-11 Thread Jeremy Hanna
Nice! Thanks Ed. On Nov 10, 2011, at 11:20 PM, Edward Capriolo wrote: Hey all, I know there are several tickets in the pipe that should make it possible to use secondary indexes to run map reduce jobs that do not have to ingest the entire dataset such as:

secondary indexes streaming building - when there are none

2011-11-11 Thread Jeremy Hanna
We're using 0.8.4 in our cluster and two nodes needed rebuilding. When building and streaming data to the nodes, there were multiple instances of building secondary indexes. We haven't had secondary indexes in that keyspace since like mid-August. Is that a bug? Thanks, Jeremy

Re: Massive writes when only reading from Cassandra

2011-10-17 Thread Jeremy Hanna
in our backpack and hopefully clears up where that setting is actually used. I'll update the storage configuration wiki to include that caveat as well. On Sep 10, 2011, at 5:14 PM, Jeremy Hanna wrote: Thanks for the insights. I may first try disabling hinted handoff for one run of our data

Re: pig_cassandra problem - Incompatible field schema error

2011-10-11 Thread Jeremy Hanna
Just for informational purposes, Pete and I tried to troubleshoot it via twitter. I was able to do the following with Cassandra 0.8.1 and Pig 0.9.1. He's going to dig in to see if there's something else going on. // Cassandra-cli stuff // bin/cassandra-cli -h localhost -p 9160 create keyspace

Hadoop settings if running into blacklisted task trackers with Cassandra

2011-09-24 Thread Jeremy Hanna
I thought I would share something valuable that Jacob Perkins (who recently started with us) shared. We were seeing blacklisted task trackers and occasionally failed jobs. These were almost always based on TimedOutExceptions from Cassandra. We've been fixing underlying reasons for those

Re: Tool for SQL - Cassandra data movement

2011-09-22 Thread Jeremy Hanna
Take a look at http://www.datastax.com/dev/blog/bulk-loading I'm sure there is a way to make it more seamless for what you want to do and it could be built on, but the recent bulk loading additions will provide the best foundation. On Sep 22, 2011, at 12:25 PM, Nehal Mehta wrote: We are

Re: Replace Live Node

2011-09-12 Thread Jeremy Hanna
Yeah - I would bootstrap the new node at an initial_token of the current node's token minus 1. Once that has bootstrapped, decommission the old one. Avoid trying to use removetoken on anything before 0.8.3. Use decommission if you can if you're dealing with a live node. On Sep 12, 2011, at 10:42 AM, Kyle

Re: Replace Live Node

2011-09-12 Thread Jeremy Hanna
I believe you'd need 2^127 - 1, which is 170141183460469231731687303715884105727 On Sep 12, 2011, at 2:30 PM, Kyle Gibson wrote: What could you do if the initial_token is 0? On Mon, Sep 12, 2011 at 1:09 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Yeah - I would bootstrap
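The figure quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the token space is treated as the integers 0 through 2^127 - 1 with wraparound, which matches the number in the reply, and `token_before` is a hypothetical helper name.

```python
RING_SIZE = 2 ** 127  # assumed size of the token space for this sketch


def token_before(token: int) -> int:
    """Token immediately preceding `token` on the ring, wrapping around at 0."""
    return (token - 1) % RING_SIZE


# The predecessor of token 0 wraps to the top of the ring.
print(token_before(0))
```

For any token other than 0 the result is simply the token minus 1, which is the rule suggested earlier in the thread.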

Re: Replace Live Node

2011-09-12 Thread Jeremy Hanna
So to move data from node with token 0, the new node needs to have initial token set to 170141183460469231731687303715884105727 ? I would do this route. Another idea: could I move token to 1, and then use token 0 on the new node? nodetool move prior to 0.8 is a very heavy operation.

Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
We are experiencing massive writes to column families when only doing reads from Cassandra. A set of 5 hadoop jobs are reading from Cassandra and then writing out to hdfs. That is the only thing operating on the cluster. We are reading at CL.QUORUM with hadoop and have written with

Re: Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
0 0 InternalResponseStage 0 0 0 HintedHandoff 0 0 0 CompactionManager n/a 29 MessagingService n/a 0,34 On Sep 10, 2011, at 3:38 PM, Jeremy Hanna wrote: We are experiencing massive

Re: Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
Oh and we're running 0.8.4 and the RF is 3. On Sep 10, 2011, at 3:49 PM, Jeremy Hanna wrote: In addition, the mutation stage and the read stage are backed up like: Pool Name Active Pending Blocked ReadStage 32 773 0

Re: Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
doing writes that you're not aware of, I guess you could track that down using wireshark to see where the write messages are coming from On Sat, Sep 10, 2011 at 3:56 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Oh and we're running 0.8.4 and the RF is 3. On Sep 10, 2011, at 3:49 PM

Disabling hinted handoff doesn't work in 0.8.4?

2011-09-10 Thread Jeremy Hanna
We just tried to disable hinted handoff by setting: hinted_handoff_enabled: false in all the nodes of our cluster and restarting them. When they come back up, we continue to see things like this: INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java (line 323) Started hinted

Re: Disabling hinted handoff doesn't work in 0.8.4?

2011-09-10 Thread Jeremy Hanna
/CASSANDRA-3176 On Sep 10, 2011, at 5:50 PM, Jeremy Hanna wrote: INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java (line 323) Started hinted handoff for endpoint /10.1.2.3 INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java (line 379) Finished

Re: Disabling hinted handoff doesn't work in 0.8.4?

2011-09-10 Thread Jeremy Hanna
Turned out that wasn't a problem - I put some notes on the ticket. On Sep 10, 2011, at 6:22 PM, Jeremy Hanna wrote: I tried looking through the source to see if the log statements would happen regardless but it doesn't look like it. Also I looked at one of the nodes via jmx and checked out

Re: Anybody out there using 0.8 in production

2011-09-08 Thread Jeremy Hanna
We run 0.8 in production and it's been working well for us. There are some new settings that we had to tune for - for example, the default concurrent compaction is the number of cores. We had to tune that down because we also run hadoop jobs on our nodes. On Sep 8, 2011, at 4:44 PM, Anand

Re: Any tentative data for 0.8.5 release?

2011-09-07 Thread Jeremy Hanna
The voting started on Monday and is a 72 hour vote. So if there aren't any problems that people find, it should be released sometime Thursday (8 September). On Sep 7, 2011, at 10:41 AM, Roshan Dawrani wrote: Hi, Quick check: is there a tentative date for release of Cassandra 0.8.5?

Re: cassandra 0.8.4 + pig (using cloudera rpms)

2011-09-04 Thread Jeremy Hanna
Thanks William - so you were able to get everything running correctly, right? FWIW, we're in the process of upgrading to 0.8.4 and found that all we needed was that first link you mentioned - the VersionedValue modification. It's running fine on our staging cluster and we're in the process of

Re: need help setting up production environment

2011-09-03 Thread Jeremy Hanna
I would look at http://www.slideshare.net/mattdennis/cassandra-on-ec2 Also, people generally do raid0 on the ephemerals. EBS is a bad fit for cassandra - see the presentation above. However, that means you'll need to have a backup strategy, which is also mentioned in the presentation. Also

Re: need help setting up production environment

2011-09-03 Thread Jeremy Hanna
snitch. The servers are all in a VPC, the only thing I did was configure the seed IP so all the nodes can see each other. Ben On Sat, Sep 3, 2011 at 11:13 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: I would look at http://www.slideshare.net/mattdennis/cassandra-on-ec2 Also

Re: Cassandra prod environment

2011-09-02 Thread Jeremy Hanna
We moved off of ubuntu because of kernel issues in the AMIs we found in 10.04 and 10.10 in ec2. So we're now on debian squeeze with ext4. It's been great for us. One thing that bit us is we'd been using property file snitch and the availability zones as racks and had an equal number of nodes

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Jeremy Hanna
FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to potentially move to Brisk because of the simplicity of operations there. Not sure what you mean about the true power of Hadoop. In my mind the true power of Hadoop is the ability to parallelize jobs and send each task to

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
I would not use nano time with cassandra. Internally and throughout the clients, milliseconds is pretty much a standard. You can get into trouble because when comparing nanoseconds with milliseconds as long numbers, nanoseconds will always win. That bit us a while back when we deleted
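To make the unit mismatch above concrete, here is a small sketch of last-write-wins resolution on raw integer timestamps. It is a simplified model, not Cassandra's actual reconciliation code; `latest_write_wins` is a hypothetical helper, the sample values are invented, and the tie handling is arbitrary.

```python
def latest_write_wins(a: tuple, b: tuple) -> tuple:
    """Pick the (timestamp, value) pair with the larger timestamp.

    Simplified model: timestamps are compared as plain integers,
    with no knowledge of which unit the client used.
    """
    return a if a[0] >= b[0] else b


base_ms = 1_314_700_000_000  # an illustrative instant in milliseconds since the epoch

# An older write stamped in nanoseconds by one client...
old_write = (base_ms * 1_000_000, "stale value")
# ...and a write made a minute later, stamped in milliseconds by another client.
new_write = (base_ms + 60_000, "fresh value")

# The nanosecond-stamped write wins even though it happened first.
print(latest_write_wins(old_write, new_write))
```

Keeping every client on the same unit avoids this; mixing units makes the larger-unit writes effectively permanent winners.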

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
edlinuxg...@gmail.com wrote: On Tue, Aug 30, 2011 at 1:41 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: I would not use nano time with cassandra. Internally and throughout the clients, milliseconds is pretty much a standard. You can get into trouble because when comparing nanoseconds

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Jeremy Hanna
. Are there any other resource that you can point me to? There seems to be a lack of samples on this subject. On Tue, Aug 30, 2011 at 10:56 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to potentially move to Brisk

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
the current time in nano seconds though? On Tue, Aug 30, 2011 at 2:39 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Yes - the reason why internally Cassandra uses milliseconds * 1000 is because System.nanoTime javadoc says This method can only be used to measure elapsed time

Matt Dennis' presentation on Cassandra best practices on EC2

2011-08-29 Thread Jeremy Hanna
Just wanted to let people know about a great presentation that Matt Dennis did here at the Cassandra Austin meetup. It's on Cassandra best practices on EC2. We found the presentation extremely helpful. http://www.slideshare.net/mattdennis/cassandra-on-ec2

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-28 Thread Jeremy Hanna
, that can lead to serious hotspots. For more on this with ec2, see: http://www.slideshare.net/mattdennis/cassandra-on-ec2/5 where he talks about alternating zones. On Aug 25, 2011, at 10:45 AM, mcasandra wrote: Thanks for the update Jeremy Hanna wrote: It appears though that when

minor compaction of secondary index that no longer exists?

2011-08-28 Thread Jeremy Hanna
I was watching compactionstats via opscenter and saw one of my nodes was minor compacting a secondary index column family. Problem is I removed all of my secondary indexes on Friday and just double checked on the CLI with 'show keyspaces;' and sure enough, no secondary indexes. Is this a bug?

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-25 Thread Jeremy Hanna
the replication around intelligently. On Aug 23, 2011, at 6:02 AM, Jeremy Hanna wrote: On Aug 23, 2011, at 3:43 AM, aaron morton wrote: Dropped messages in ReadRepair is odd. Are you also dropping mutations ? There are two tasks performed on the ReadRepair stage. The digests

Re: Memory overhead of vector clocks…. how often are they pruned?

2011-08-24 Thread Jeremy Hanna
At the point that book was written (about a year ago it was finalized), vector clocks were planned. In August or September of last year, they were removed. 0.7 was released in January. The ticket for vector clocks is here and you can see the reasoning for not using them at the bottom.

4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna
We've been having issues where as soon as we start doing heavy writes (via hadoop) recently, it really hammers 4 nodes out of 20. We're using random partitioner and we've set the initial tokens for our 20 nodes according to the general spacing formula, except for a few token offsets as we've
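For reference, the spacing formula commonly cited for RandomPartitioner is token_i = i * 2^127 / N. The sketch below assumes that is the formula meant in the snippet above, which does not spell it out; `evenly_spaced_tokens` is a hypothetical helper, and it does not model the token offsets the poster mentions.

```python
def evenly_spaced_tokens(node_count: int) -> list:
    """Initial tokens that divide a 2**127 token space evenly (assumed formula)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return [i * (2 ** 127) // node_count for i in range(node_count)]


tokens = evenly_spaced_tokens(20)
for i, token in enumerate(tokens[:3]):
    print(i, token)
```

Integer division keeps the tokens exact; with 20 nodes the gaps between neighbouring tokens differ by at most 1.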

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna
On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote: We've been having issues where as soon as we start doing heavy writes (via hadoop) recently, it really hammers 4 nodes out of 20. We're using random partitioner and we've set the initial tokens for our 20 nodes according to the general

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna
messages, but nothing coming out of DEBUG in the logs to indicate the time taken that I can see. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 23/08/2011, at 7:52 PM, Jeremy Hanna wrote: On Aug 23, 2011, at 2:25 AM

Re: hints system CF getting out of control

2011-08-19 Thread Jeremy Hanna
: I would assume it's because it thinks some node is down and is creating hints for it. On Thu, Aug 18, 2011 at 6:31 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: We're trying to bootstrap some new nodes and it appears when adding a new node that there is a lot of logging on hints being

hints system CF getting out of control

2011-08-18 Thread Jeremy Hanna
We're trying to bootstrap some new nodes and it appears when adding a new node that there is a lot of logging on hints being flushed and compacted. It's been taking about 75 minutes thus far to bootstrap for only about 10 GB of data. It's ballooned up to over 40 GB on the new node. I do 'ls

Re: What causes dropped messages?

2011-08-16 Thread Jeremy Hanna
http://wiki.apache.org/cassandra/FAQ#dropped_messages As to what's causing them - look in the logs and it will do the equivalent of a nodetool tpstats right after the dropped messages messages. That should give you a clue as to why there are dropped messages - which thread pools are backed up

Re: Client traffic encryption best practices....

2011-08-12 Thread Jeremy Hanna
Yes - that ticket was done by Nirmal Ranganathan with the intention of getting support in Cassandra. That's just for a java client though. In the future, I wonder if the CQL driver level is the right place for client encryption. On Aug 11, 2011, at 11:26 PM, Vijay wrote:

Re: Client traffic encryption best practices....

2011-08-12 Thread Jeremy Hanna
/browse/THRIFT-151 C# (patch attached but no progress in a while): https://issues.apache.org/jira/browse/THRIFT-181 PHP (patch attached but no progress in a while): https://issues.apache.org/jira/browse/THRIFT-948 On Aug 12, 2011, at 9:39 AM, Jeremy Hanna wrote: Yes - that ticket was done by Nirmal

Re: cassandra 0.8.2 build failure (missing 2 artifacts)

2011-08-05 Thread Jeremy Hanna
That is something we have to update, thanks for mentioning that. We should just be depending on apache hadoop components now that we are no longer supporting hadoop output streaming. On Aug 5, 2011, at 10:27 AM, Dean Hiller wrote: oh, cloudera repo is down like a previous poster just

Re: Cloudera repo down?

2011-08-05 Thread Jeremy Hanna
It won't be required in the future: https://issues.apache.org/jira/browse/CASSANDRA-2998 On Aug 5, 2011, at 1:34 PM, Martin Lansler wrote: It solved itself as the cloudera repo is up again now... -Martin On Fri, Aug 5, 2011 at 12:06 PM, Martin Lansler martin.lans...@gmail.com wrote: Hi,

Re: Install Cassandra on EC2

2011-08-03 Thread Jeremy Hanna
Some quick thoughts that might be helpful: - use ephemeral instances and RAID0 over the local volumes for both cassandra's data as well as the log directory. The log directory because if you crash due to heap size, the heap dump will be stored in the log directory. you don't want that to go

Re: Brisk and Hadoop question

2011-07-31 Thread Jeremy Hanna
Check out http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig and that whole page to see an intro to configuring your cluster. Brisk extends these basic ideas. On Jul 31, 2011, at 12:31 PM, mcasandra wrote: Is it possible to add brisk nodes for analytics to already existing real
