Also I just wanted to point out that there is a cassandra-drivers slack room on
the ASF slack where folks that work on the different drivers interact.
(added to both user and dev list thread)
> On Aug 14, 2024, at 5:59 PM, Dinesh Joshi wrote:
>
> Hi Vincent,
>
> This is the Cassandra user's m
For full disclosure, I've been in the Apache Cassandra community since 2010 and
at DataStax since 2012.
So DataStax moved on to focus on things for their customers, effectively
putting most development effort into DataStax Enterprise. However, there have
been a lot of fixes and improvements co
Generally if you foresee the partitions getting out of control in terms of
size, a method often employed is to bucket according to some criteria. For
example, if I have a time series use case, I might bucket by month or week.
That presumes you can foresee it though. As far as limiting that ca
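As a sketch of the bucketing idea (the sensor/month names here are made up for illustration, not from the thread), the bucket simply becomes part of the partition key:

```python
from datetime import datetime, timezone

def month_bucket(sensor_id: str, ts: datetime) -> str:
    """Build a composite partition key that buckets a time series by
    month, so no single partition grows without bound."""
    return f"{sensor_id}:{ts.year:04d}-{ts.month:02d}"

# Rows written in the same month land in the same partition;
# a new month starts a fresh partition automatically.
key = month_bucket("sensor-42", datetime(2024, 8, 14, tzinfo=timezone.utc))
# key == "sensor-42:2024-08"
```

Reads that span months then have to query one partition per bucket, which is the usual trade-off of this approach.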
Of the 16 active committers, 8 are not at DataStax. See
http://wiki.apache.org/cassandra/Committers. That said, active involvement
varies and there are other contributors inside DataStax and in the community.
You can look at the dev mailing list as well to look for involvement in more
detail
I need to update those to be current with the Cassandra source download.
You’re right, you would just use what’s in the examples directory now for Pig.
You should be able to run the examples, but generally you need to specify the
partitioner of the cluster, the host name of a node in the clust
With RHEL, there is a problem with snappy 1.0.5. You’d need to use 1.0.4.1
which works fine but you need to download it separately and put it in your lib
directory. You can find the 1.0.4.1 file from
https://github.com/apache/cassandra/tree/cassandra-1.1.12/lib
Jeremy
On 29 Nov 2013, at 10:1
rt#Oozie
>
> I am using Cassandra 1.2.10, Oozie 4.0.0 and pig 0.11.1.
>
> I try to test these options and see if it works-
>
> Thanks in advance
>
> 2013/11/28 Jeremy Hanna
If I remember correctly when I configured pig, cassandra, and oozie to work
together, I just used vanilla pig but gave it the jars it needed.
What is the problem you’re experiencing that you are unable to do this?
Jeremy
On 28 Nov 2013, at 12:56, Miguel Angel Martin junquera
wrote:
> hi all;
For those in the Europe area, there will be a Cassandra Summit EU 2013 in
London in the month of October. On 17 October, there will be the main
conference sessions and the 16th and 18th there will be Cassandra workshops.
http://www.datastax.com/cassandraeurope2013
The speakers have been announ
1/security/security_features
On 5 Sep 2013, at 17:51, "Hartzman, Leslie"
wrote:
> Thanks for the info.
>
> So open-source Cassandra does not provide for auditing?
>
> -Original Message-
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Thursd
For open-source Cassandra, there is a framework for security (see the security
book-thing in the sidebar):
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html
For those wanting additional things like auditing and other features, there's
DataStax Enterprise:
http://www.datasta
In order to narrow down the problem, I would start without the request
parameters and see if that works. Then I would add the request parameters one
at a time to see what breaks things. Often pig is not very helpful with its
error messages, so I've had to use this method a lot.
On 21 Aug 2013
If you were using leveled compaction on any column families in 1.0, you'll need
to run offline scrub on those column families.
On 13 Aug 2013, at 15:38, Romain HARDOUIN wrote:
> Hi all,
>
> We are migrating from C* 1.0.6 to 1.1.12 and after reading DataStax
> documentation (http://www.datast
The CHANGES and NEWS links pointed to the 1.2.8-tentative.
The 1.2.8 links are:
CHANGES.txt:
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/1.2.8
NEWS.txt:
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags
Are you on SSDs?
On 27 Jun 2013, at 14:24, "Desimpel, Ignace" wrote:
> On a test with 3 cassandra servers version 1.2.5 with replication factor 1
> and leveled compaction, I did a store last night and I did not see any
> problem with Cassandra. On all 3 machine the compaction is stopped alread
If you have rapidly expiring data, then tombstones are probably filling your
disk and your heap (depending on how you order the data on disk). To check to
see if your queries are affected by tombstones, you might try using the query
tracing that's built-in to 1.2.
See:
http://www.datastax.com/d
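In cqlsh (1.2 and later) turning on tracing looks something like this; the table and bucket value are hypothetical:

```
TRACING ON;
SELECT * FROM events WHERE bucket = '2013-06' LIMIT 100;
-- the trace summary can reveal reads that scan large numbers of tombstones
```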
These pages may have some helpful background for you:
http://www.datastax.com/docs/1.1/configuration/storage_configuration#compression-options
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
Cheers,
Jeremy
On Mar 9, 2013, at 9:27 PM, Kanwar Sangha wrote:
> Hi – Can some
does this help? Links at the bottom show the cql statements to add/modify
users:
http://www.datastax.com/docs/1.2/security/native_authentication
On Feb 26, 2013, at 4:06 PM, C.F.Scheidecker Antunes
wrote:
> Hello all,
>
> Cassandra has changed and now has a default authentication and authori
Fwiw - here are some changes that a friend said should make C*'s Hadoop
support work with CDH4 - for ColumnFamilyRecordReader.
https://gist.github.com/jeromatron/4967799
On Feb 16, 2013, at 8:23 AM, Edward Capriolo wrote:
> Here is the deal.
>
> http://wiki.apache.org/hadoop/Defining%20Hado
See https://issues.apache.org/jira/browse/CASSANDRA-5168 - should be fixed in
1.1.10 and 1.2.2.
On Jan 30, 2013, at 9:18 AM, Tejas Patil wrote:
> While reading data from Cassandra in map-reduce, I am getting
> "InvalidRequestException(why:Start token sorts after end token)"
>
> Below is the c
Hi Naveen,
You can start with http://wiki.apache.org/cassandra/HadoopSupport but there's
also a commercial product that you can use, DataStax Enterprise:
http://www.datastax.com/docs/datastax_enterprise2.2/solutions/hadoop_index
which makes things more streamlined, but it's a commercial product
You can check nodetool compactionstats to see progress for current cleanup
operations. Cleanup essentially traverses all of your sstables and removes data
that the node isn't responsible for. Since that's the overall operation, you
would estimate in terms of how long it would take to go through
LCS works well in specific circumstances; this blog post gives some good
considerations: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
On Nov 8, 2012, at 1:33 PM, Aaron Turner wrote:
> "kill performance" is relative. Leveled Compaction basically costs 2x disk
> IO. Look at
On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh wrote:
> On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
> wrote:
>> Not sure I understand your question (if there is one..)
>>
>> You are more than welcome to do CL ONE and assuming you have hadoop nodes
>> in the right places on your ring things
The Dachis Group (where I just came from, now at DataStax) uses pig with
cassandra for a lot of things. However, we weren't using the widerow
implementation yet since wide row support is new to 1.1.x and we were on 0.7,
then 0.8, then 1.0.x.
I think since it's new to 1.1's hadoop support, it s
It's always had data locality (since hadoop support was added in 0.6).
You don't need to specify a partition, you specify the input predicate with
ConfigHelper or the cassandra.input.predicate property.
On Oct 2, 2012, at 2:26 PM, "Hiller, Dean" wrote:
> So you're saying that you can access th
Another option that may or may not work for you is the support in Cassandra
1.1+ to use a secondary index as an input to your mapreduce job. What you
might do is add a field to the column family that represents which virtual
column family that it is part of. Then when doing mapreduce jobs, you
Generally the main knob for compaction performance is
compaction_throughput_in_mb in cassandra.yaml. It defaults to 16. You can use
nodetool setcompactionthroughput to set it on a running server. The next time
the Cassandra server starts it will use what's in the yaml again. You might try
usin
A couple of guesses:
- are you mixing versions of Cassandra? Streaming differences between versions
might throw this error. That is, are you bulk loading with one version of
Cassandra into a cluster that's a different version?
- (shot in the dark) is your cluster overwhelmed for some reason?
I
Are there any deletions in your data? The Hadoop support doesn't filter out
tombstones, though you may not be filtering them out in your code either. I've
used the hadoop support for doing a lot of data validation in the past and as
long as you're sure that the code is sound, I'm pretty confid
Starting with 1.6.0_34, you'll need Xss set to 180k. It's updated in the
forthcoming 1.1.5 as well as the next minor rev of 1.0.x (1.0.12).
https://issues.apache.org/jira/browse/CASSANDRA-4631
See also the comments on https://issues.apache.org/jira/browse/CASSANDRA-4602
for the reference to wh
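If you're setting it by hand, it's the JVM thread stack size flag in conf/cassandra-env.sh (the exact file and variable may differ by version - check your install):

```shell
# conf/cassandra-env.sh -- bump the per-thread stack size for newer JVMs
JVM_OPTS="$JVM_OPTS -Xss180k"
```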
For an individual node, you can check the status of building indexes using
nodetool compactionstats. And similarly, if you want to speed up building the
indexes (and you have the extra IO) you can increase or unthrottle your
compaction throughput temporarily - nodetool setcompactionthroughput 0 to
rote:
> Thanks Jeremy, but this doesn't work for me. I am using cql3, because I need
> new features like composite keys. The manual you pointed to is for 2.0.
> I have suspicion that cql3 does not support dynamic tables at all. Is there a
> manual for cql3?
>
> -----Orig
you can use the cqlsh help but it will eventually refer you to a cql reference
such as this one that says what the options are. Looks like you need just
'default_validation'.
http://www.datastax.com/docs/1.0/references/cql/index#cql-column-family-storage-parameters
On Jul 6, 2012, at 2:13 PM,
Sorry - it was at the austin cassandra meetup and we didn't record the
presentation. I wonder if this would be a popular topic to have at the
upcoming Cassandra SF event which would be recorded...
On May 17, 2012, at 6:51 AM, Tamar Fraenkel wrote:
> Hi!
>
> I found the slides of the lecture
when doing a truncate, it has to talk to all of the nodes in the ring to
perform the operation. by the error, it looks like one of the nodes was
unreachable for some reason. you might do a nodetool ring, or in the cli do a
'describe cluster;', and see if your ring is okay.
So I think the operation
The hive support is going to be integrated into the main source tree with this
ticket:
https://issues.apache.org/jira/browse/CASSANDRA-4131
You can go to https://github.com/riptano/hive to find the
CassandraStorageHandler right now though.
For 1.0.8, the CassandraStorage class for the Pig suppor
I backported this to 0.8.4 and it didn't fix the problem we were seeing (as I
outlined in my parallel post) but if it fixes it for you, then beautiful. Just
wanted to let you know our experience with similar symptoms.
On Apr 11, 2012, at 11:56 AM, Thibaut Britz wrote:
> Fixed in https://issue
fwiw - we had a similar problem reading at quorum with 0.8.4 when reading with
hadoop. The symptom we see is when reading a column family with hadoop using
quorum using 0.8.4, we have lots of minor compactions as a result of heavy
writes. When we read at CL.ONE or move to 1.0.8 the problem is
some time back, I created the account cassandra_jobs on twitter. if you email
the user list or better yet just cc cassandra_jobs on twitter, I'll retweet it
there so that the information can get out to more people.
https://twitter.com/#!/cassandra_jobs
cheers,
Jeremy
you may be running into this -
https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it
really affects the execution of the job itself though.
On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
> Hi,
>
> I was recently trying Hadoop job + cassandra-all 0.8.10 again and the
> Ti
I haven't used that in particular, but it's pretty trivial to do that with Pig
and I would imagine it would just do the right thing under the covers. It's a
simple join with Pig. We use pygmalion to get data from the Cassandra bag. A
simple example would be:
DEFINE FromCassandraBag org.pygmal
By chance are you in EC2?
On Feb 24, 2012, at 8:33 AM, Patrik Modesto wrote:
> Hi Jeremy,
>
> I've seen the page and tried the values but to no help.
>
> Here goes tcpdump of one failed TCP connection:
>
> 15:06:20.231421 IP 10.0.18.87.9160 > 10.0.18.87.39396: Flags [P.], seq
> 137891735:13790
Check out the troubleshooting section of the hadoop support - we ran into the
same thing and tried to update that with some info on how to get around it:
http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
On Feb 24, 2012, at 7:20 AM, Patrik Modesto wrote:
> Hi,
>
> I can see some st
MapReduce and Hadoop generally are pluggable so you can do queries over HDFS,
over HBase, or over Cassandra. Cassandra has good Hadoop support as outlined
here: http://wiki.apache.org/cassandra/HadoopSupport. If you're looking for a
simpler solution, there is DataStax's enterprise product whic
Take a look at http://wiki.apache.org/cassandra/HadoopSupport and in the source
download of cassandra there's a contrib/pig section that has a wordcount
example.
On Jan 23, 2012, at 1:16 PM, Tharindu Mathew wrote:
> Hi,
>
> I'm trying to experiment with Hive using Data in Cassandra. Brisk look
On Jan 12, 2012, at 6:36 PM, Mohit Anchlia wrote:
> What's the best way to install C*? Any good links?
http://www.slideshare.net/mattdennis/cassandra-on-ec2 has some interesting
points that aren't immediately obvious - it's mdennis in the cassandra irc
channel if you had any questions about th
I would first look at http://wiki.apache.org/cassandra/HadoopSupport - you'll
want to look in the section on cluster configuration. DataStax also has a
product that makes it pretty simple to use Hadoop with Cassandra if you don't
mind paying for it - http://www.datastax.com/products/enterprise
This might be helpful:
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
On Dec 30, 2011, at 1:59 PM, Dom Wong wrote:
> Hi, could anyone tell me whether this is possible with Cassandra using an
> appropriately sized EC2 cluster.
>
> 100,000 clients writing 50k each
to achieve this.
>
> -R
>
> On Fri, Dec 23, 2011 at 9:28 AM, Jeremy Hanna
> wrote:
> We do this all the time. Take a look at
> http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use
> mapreduce or pig to get data out of cassandra. If it
One way to get a good bird's eye view of the cluster would be to install
DataStax Opscenter - the community edition is free. You can do a lot of checks
from a web interface that are based on the jmx hooks that are in Cassandra. We
use it and it's helped us a lot. Hope it helps for what you're
33 AM, Praveen Sadhu wrote:
> Have you tried Brisk?
>
>
>
> On Dec 23, 2011, at 9:30 AM, "Jeremy Hanna"
> wrote:
>
>> We do this all the time. Take a look at
>> http://wiki.apache.org/cassandra/HadoopSupport for some details - you can
>> u
We do this all the time. Take a look at
http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use
mapreduce or pig to get data out of cassandra. If it's going to a separate
hadoop cluster, I don't think you'd need to co-locate task trackers or data
nodes on your cassandra
Traditionally there are two places to go. Twitter's ruby client at
https://github.com/twitter/cassandra or the newer cql driver at
http://code.google.com/a/apache-extras.org/p/cassandra-ruby/. The latter might
be nice for green field applications but CQL is still gaining features. Some
peopl
If you're getting lots of timeout exceptions with mapreduce, you might take a
look at http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
We saw that and tweaked a variety of things - all of which are listed there.
Ultimately, we also boosted hadoop's tolerance for them and it
For those interested in Apache Cassandra related jobs - either hiring or in
search of - there is now a @Cassandra_Jobs account on Twitter. You can
either send posts to that account on twitter or send them to me at this
email address with a public link to the job posting and I will tweet them.
Che
On Nov 29, 2011, at 12:25 PM, Don Smith wrote:
> cli's "show keyspaces" command shows way too much information by default.
>
> I think by default it should show just one line per keyspace. A "-v" option
> could show more info.
If you are using 1.x, there is a describe command for specific ke
On Nov 17, 2011, at 1:44 PM, Aaron Griffith wrote:
> Jeremy Hanna gmail.com> writes:
>
>>
>> If you are only interested in loading one row, why do you need to use Pig?
>> Is
> it an extremely wide row?
>>
>> Unless you are using an ordered
If you are only interested in loading one row, why do you need to use Pig? Is
it an extremely wide row?
Unless you are using an ordered partitioner, you can't limit the rows you
mapreduce over currently - you have to mapreduce over the whole column family.
That will change probably in 1.1. H
https://issues.apache.org/jira/browse/CASSANDRA-3488
On Nov 12, 2011, at 9:52 AM, Jeremy Hanna wrote:
> It sounds like that's just a message in compactionstats that's a no-op. This
> is reporting for about an hour that it's building a secondary index on a
> specific
> On Fri, Nov 11, 2011 at 9:10 PM, Jeremy Hanna
> wrote:
>> We're using 0.8.4 in our cluster and two nodes needed rebuilding. When
>> building and streaming data to the nodes, there were multiple instances of
>> building secondary indexes. We haven't had seco
We're using 0.8.4 in our cluster and two nodes needed rebuilding. When
building and streaming data to the nodes, there were multiple instances of
building secondary indexes. We haven't had secondary indexes in that keyspace
since like mid-August. Is that a bug?
Thanks,
Jeremy
Nice! Thanks Ed.
On Nov 10, 2011, at 11:20 PM, Edward Capriolo wrote:
> Hey all,
>
> I know there are several tickets in the pipe that should make it possible do
> use secondary indexes to run map reduce jobs that do not have to ingest the
> entire dataset such as:
>
> https://issues.apache.
cable rock in our backpack and hopefully clears up where that setting is
actually used. I'll update the storage configuration wiki to include that
caveat as well.
On Sep 10, 2011, at 5:14 PM, Jeremy Hanna wrote:
> Thanks for the insights. I may first try disabling hinted handoff for
Just for informational purposes, Pete and I tried to troubleshoot it via
twitter. I was able to do the following with Cassandra 0.8.1 and Pig 0.9.1.
He's going to dig in to see if there's something else going on.
// Cassandra-cli stuff
// bin/cassandra-cli -h localhost -p 9160
create keyspace
I thought I would share something valuable that Jacob Perkins (who recently
started with us) shared. We were seeing blacklisted task trackers and
occasionally failed jobs. These were almost always based on TimedOutExceptions
from Cassandra. We've been fixing underlying reasons for those excep
Take a look at http://www.datastax.com/dev/blog/bulk-loading
I'm sure there is a way to make it more seamless for what you want to do and it
could be built on, but the recent bulk loading additions will provide the best
foundation.
On Sep 22, 2011, at 12:25 PM, Nehal Mehta wrote:
> We are tryi
> So to move data from node with token 0, the new node needs to have
> initial token set to 170141183460469231731687303715884105727 ?
I would go this route.
> Another idea: could I move token to 1, and then use token 0 on the new node?
nodetool move prior to 0.8 is a very heavy operation.
I believe you'd need 2^127 - 1, which is 170141183460469231731687303715884105727
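As a quick check of that arithmetic:

```python
# RandomPartitioner tokens live in [0, 2**127); the largest usable
# initial_token is therefore 2**127 - 1.
max_token = 2**127 - 1
assert max_token == 170141183460469231731687303715884105727
```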
On Sep 12, 2011, at 2:30 PM, Kyle Gibson wrote:
> What could you do if the initial_token is 0?
>
> On Mon, Sep 12, 2011 at 1:09 PM, Jeremy Hanna
> wrote:
>> Yeah - I would bootstrap at
Yeah - I would bootstrap at an initial_token of the current one minus 1. Then
once that has bootstrapped, decommission the old one. Avoid trying to use
removetoken on anything before 0.8.3. Use decommission if you can if you're
dealing with a live node.
On Sep 12, 2011, at 10:42 AM, Kyle Gibson
Turned out that wasn't a problem - I put some notes on the ticket.
On Sep 10, 2011, at 6:22 PM, Jeremy Hanna wrote:
> I tried looking through the source to see if the log statements would happen
> regardless but it doesn't look like it. Also I looked at one of the nodes
>
rowse/CASSANDRA-3176
On Sep 10, 2011, at 5:50 PM, Jeremy Hanna wrote:
> INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java
> (line 323) Started hinted handoff for endpoint /10.1.2.3
> INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java
> (l
We just tried to disable hinted handoff by setting:
hinted_handoff_enabled: false
in all the nodes of our cluster and restarting them. When they come back up,
we continue to see things like this:
INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java (line
323) Started hinted h
> 2) You have something doing writes that you're not aware of, I guess
> you could track that down using wireshark to see where the write
> messages are coming from
>
> On Sat, Sep 10, 2011 at 3:56 PM, Jeremy Hanna
> wrote:
> > Oh and we're running 0.8.4 and the RF
Oh and we're running 0.8.4 and the RF is 3.
On Sep 10, 2011, at 3:49 PM, Jeremy Hanna wrote:
> In addition, the mutation stage and the read stage are backed up like:
>
> Pool Name                Active   Pending   Blocked
> ReadStage                32       0         0
InternalResponseStage      0        0         0
HintedHandoff              0        0         0
CompactionManager          n/a      29
MessagingService           n/a      0,34
On Sep 10, 2011, at 3:38 PM, Jeremy Hanna wrote:
> We are experiencing mass
We are experiencing massive writes to column families when only doing reads
from Cassandra. A set of 5 hadoop jobs are reading from Cassandra and then
writing out to hdfs. That is the only thing operating on the cluster. We are
reading at CL.QUORUM with hadoop and have written with CL.QUORUM.
We run 0.8 in production and it's been working well for us. There are some new
settings that we had to tune for - for example, the default concurrent
compaction is the number of cores. We had to tune that down because we also
run hadoop jobs on our nodes.
On Sep 8, 2011, at 4:44 PM, Anand Som
The voting started on Monday and is a 72 hour vote. So if there aren't any
problems that people find, it should be released sometime Thursday (7
September).
On Sep 7, 2011, at 10:41 AM, Roshan Dawrani wrote:
> Hi,
>
> Quick check: is there a tentative date for release of Cassandra 0.8.5?
>
>
Thanks William - so you were able to get everything running correctly, right?
FWIW, we're in the process of upgrading to 0.8.4 and found that all we needed
was that first link you mentioned - the VersionedValue modification. It's
running fine on our staging cluster and we're in the process of m
I don't remember setting up a snitch.
>
> The servers are all in a VPC, the only thing I did was configure the seed IP
> so all the nodes can see each other.
>
> Ben
>
> On Sat, Sep 3, 2011 at 11:13 PM, Jeremy Hanna
> wrote:
> I would look at http://www.slideshar
I would look at http://www.slideshare.net/mattdennis/cassandra-on-ec2
Also, people generally do raid0 on the ephemerals.
EBS is a bad fit for cassandra - see the presentation above. However, that
means you'll need to have a backup strategy, which is also mentioned in the
presentation.
Also ar
We moved off of ubuntu because of kernel issues in the AMIs we found in 10.04
and 10.10 in ec2. So we're now on debian squeeze with ext4. It's been great
for us.
One thing that bit us is we'd been using property file snitch and the
availability zones as racks and had an equal number of nodes
rive the current time in
> nano seconds though?
>
> On Tue, Aug 30, 2011 at 2:39 PM, Jeremy Hanna
> wrote:
>> Yes - the reason why internally Cassandra uses milliseconds * 1000 is
>> because System.nanoTime javadoc says "This method can only be used to
>> mea
/repos/asf/cassandra/trunk/contrib/pig. Are there any
> other resource that you can point me to? There seems to be a lack of samples
> on this subject.
>
> On Tue, Aug 30, 2011 at 10:56 PM, Jeremy Hanna
> wrote:
> FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to
0.
>
> Anyone sees problem with this approach?
>
> On Tue, Aug 30, 2011 at 2:20 PM, Edward Capriolo
> wrote:
>>
>>
>> On Tue, Aug 30, 2011 at 1:41 PM, Jeremy Hanna
>> wrote:
>>>
>>> I would not use nano time with cassandra. Internall
Ed - you're right: milliseconds * 1000, i.e. microseconds. The other stuff
about nano time still stands. Sorry about that.
On Aug 30, 2011, at 1:20 PM, Edward Capriolo wrote:
>
>
> On Tue, Aug 30, 2011 at 1:41 PM, Jeremy Hanna
> wr
I would not use nano time with cassandra. Internally and throughout the
clients, milliseconds is pretty much a standard. You can get into trouble
because when comparing nanoseconds with milliseconds as long numbers,
nanoseconds will always win. That bit us a while back when we deleted
someth
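A minimal sketch of the convention (microseconds, i.e. milliseconds * 1000), and why a stray nanosecond value always wins a comparison:

```python
import time

def cassandra_timestamp() -> int:
    """Conventional client-side timestamp: microseconds since the
    epoch, computed as milliseconds * 1000."""
    return int(time.time() * 1000) * 1000

micros = cassandra_timestamp()
nanos = time.time_ns()  # ~1000x larger as a plain long
# Compared as long numbers, a nanosecond timestamp always beats a
# microsecond one, so e.g. a delete written with nanos can shadow
# all subsequent microsecond-timestamped writes.
assert nanos > micros
```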
FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to
potentially move to Brisk because of the simplicity of operations there.
Not sure what you mean about the true power of Hadoop. In my mind the true
power of Hadoop is the ability to parallelize jobs and send each task to wher
Just wanted to let people know about a great presentation that Matt Dennis did
here at the Cassandra Austin meetup. It's on Cassandra best practices on EC2.
We found the presentation extremely helpful.
http://www.slideshare.net/mattdennis/cassandra-on-ec2
I was watching compactionstats via opscenter and saw one of my nodes was minor
compacting a secondary index column family. Problem is I removed all of my
secondary indexes on Friday and just double checked on the CLI with 'show
keyspaces;' and sure enough, no secondary indexes. Is this a bug?
in token order, that can lead to
serious hotspots. For more on this with ec2, see:
http://www.slideshare.net/mattdennis/cassandra-on-ec2/5 where he talks about
alternating zones.
On Aug 25, 2011, at 10:45 AM, mcasandra wrote:
> Thanks for the update
>
> Jeremy Hanna wrote:
>&
ext
token of a different rack (depending on which it is looking for). So that is
why alternating by rack is important. That might be able to be smarter in the
future which would be nice - to not have to care and let Cassandra spread the
replication around intelligently.
On Aug 23, 2011, at 6:02 A
At the point that book was written (about a year ago it was finalized), vector
clocks were planned. In August or September of last year, they were removed.
0.7 was released in January. The ticket for vector clocks is here and you can
see the reasoning for not using them at the bottom.
https
m unreasonable - about a MB. I turned up
logging to DEBUG for that class and I get plenty of dropped READ_REPAIR
messages, but nothing coming out of DEBUG in the logs to indicate the time
taken that I can see.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cass
On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote:
>> We've been having issues where as soon as we start doing heavy writes (via
>> hadoop) recently, it really hammers 4 nodes out of 20. We're using random
>> partitioner and we've set the initial tokens for our 20 nodes according to
>> the ge
We've been having issues where as soon as we start doing heavy writes (via
hadoop) recently, it really hammers 4 nodes out of 20. We're using random
partitioner and we've set the initial tokens for our 20 nodes according to the
general spacing formula, except for a few token offsets as we've re
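The general spacing formula mentioned above, as a sketch (for RandomPartitioner's [0, 2**127) token range):

```python
def balanced_tokens(node_count: int) -> list:
    """Evenly space initial tokens across RandomPartitioner's range:
    token_i = i * 2**127 / node_count for each of the n nodes."""
    return [i * 2**127 // node_count for i in range(node_count)]

tokens = balanced_tokens(4)
# [0, 2**125, 2**126, 3 * 2**125]
```

With alternating racks/zones in the ring order, replicas then land in different zones rather than piling onto neighbors.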
:
> I would assume it's because it thinks some node is down and is
> creating hints for it.
>
> On Thu, Aug 18, 2011 at 6:31 PM, Jeremy Hanna
> wrote:
>> We're trying to bootstrap some new nodes and it appears when adding a new
>> node that there is a lot
We're trying to bootstrap some new nodes and it appears when adding a new node
that there is a lot of logging on hints being flushed and compacted. It's been
taking about 75 minutes thus far to bootstrap for only about 10 GB of data.
It's ballooned up to over 40 GB on the new node. I do 'ls -