Re: adding node without bootstrap

2011-09-24 Thread Radim Kolar


If you join a node with auto_bootstrap=false you had better be working 
at quorum or higher to avoid stale/not found reads. You should then 
repair the node right away to get all the missing data back on the 
node. This is not suggested. It is best to leave auto_bootstrap=true
and let Cassandra handle this on the front end.
This does not work. I joined the ring with a node without bootstrap and the result
looks like this:


216.17.99.40  datacenter1  rack1  Up  Normal  1.17 GB   99.64%  83030609119105147711596238577753588267
64.6.104.18   datacenter1  rack1  Up  Normal  43.15 KB   0.36%  83648735508289295779178617154261005054


Well, this was expected. But running repair on both nodes didn't do anything:

 INFO [GossipStage:1] 2011-09-25 08:18:34,287 Gossiper.java (line 715) 
Node /216.17.99.40 is now part of the cluster
 INFO [GossipStage:1] 2011-09-25 08:18:34,287 Gossiper.java (line 681) 
InetAddress /216.17.99.40 is now UP
 INFO [AntiEntropySessions:1] 2011-09-25 08:22:16,066 
AntiEntropyService.java (line 648) No neighbors to repair with for test 
on 
(83030609119105147711596238577753588267,83648735508289295779178617154261005054]: 
manual-repair-04dd27f0-401b-4452-b0eb-853beeda197b completed.


Data was not moved to the new node. Maybe the tokens were not random enough. I
deleted the new node and retried:


64.6.104.18   datacenter1  rack1  Up  Normal  45.52 KB  56.94%  9762979552315026283322466206354139578
216.17.99.40  datacenter1  rack1  Up  Normal  1.17 GB   43.06%  83030609119105147711596238577753588267


and still nothing, after running repair on both nodes:

 INFO [AntiEntropySessions:1] 2011-09-25 08:29:13,447 
AntiEntropyService.java (line 648) No neighbors to repair with for test 
on 
(83030609119105147711596238577753588267,9762979552315026283322466206354139578]: 
manual-repair-87bfcc67-2b99-4285-8571-e5bd168ef5e0 completed.


Can you try this too? I can't get this scenario to work: make 1 node, add data, add a
second node without bootstrap, then run repair on both.
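
For anyone wanting to reproduce this, a minimal sketch of the steps involved
(the addresses are just the ones from the ring output above -- adjust for your
own nodes):

# on the new node, before starting it, in cassandra.yaml:
#   auto_bootstrap: false
# once it has joined the ring:
nodetool -h 216.17.99.40 ring       # both nodes should show Up / Normal
nodetool -h 216.17.99.40 repair     # run on each node in turn
nodetool -h 64.6.104.18 repair
nodetool -h 216.17.99.40 cleanup    # drop data a node no longer owns, once repair has streamed it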


Seed nodes in cassandra.yaml can not be hostnames

2011-09-24 Thread Radim Kolar
I just discovered that using host names for seed nodes in cassandra.yaml
does not work. Is this done on purpose?


messages stopped for 3 minutes?

2011-09-24 Thread Yang
I constantly see TimedOutException, followed by
UnavailableException, in my logs,
so I added some extra debugging to Gossiper.notifyFailureDetector():



void notifyFailureDetector(InetAddress endpoint, EndpointState remoteEndpointState)
{
    IFailureDetector fd = FailureDetector.instance;
    EndpointState localEndpointState = endpointStateMap.get(endpoint);
    logger.debug("notify failure detector");
    /*
     * If the local endpoint state exists then report to the FD only
     * if the versions workout.
     */
    if ( localEndpointState != null )
    {
        logger.debug("notify failure detector, endpoint");
        int localGeneration = localEndpointState.getHeartBeatState().getGeneration();
        int remoteGeneration = remoteEndpointState.getHeartBeatState().getGeneration();
        if ( remoteGeneration > localGeneration )
        {
            localEndpointState.updateTimestamp();
            logger.debug("notify failure detector --- report 1");
            fd.report(endpoint);
            return;
        }
        // ... remainder of the method omitted here ...
    }
}




Then I found that this method stopped being called for a period of 3
minutes, so of course the detector considered the other side to be
dead.

But since these 2 boxes are in the same EC2 region and the same security
group, there is no reason for a network issue to last that long. So I
ran a background job that just does

echo | nc $the_other_box 7000   in a loop

and this always worked fine, never failing to contact port 7000.


So somehow the messages were not delivered or received. How could I debug this?
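
(One way to get more detail without patching the code -- assuming the stock
log4j setup that ships with Cassandra -- is to raise the logging level for the
gossip package in conf/log4j-server.properties:)

# debug output from Gossiper / FailureDetector
log4j.logger.org.apache.cassandra.gms=DEBUG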
(extra logging attached)

Thanks
Yang




CMS GC initial-mark taking 6 seconds , bad?

2011-09-24 Thread Yang
I see the following in my GC log

1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)]
13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00,
real=6.07 secs]

So there is a stop-the-world pause of 6 seconds. Does this sound bad,
or is 6 seconds OK and should we expect Cassandra's built-in
fault tolerance to handle it?
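
(For reference, a line like the one above comes from GC logging being enabled
on the JVM; a minimal set of flags -- typically added to cassandra-env.sh or
the startup command, log path is illustrative -- is:)

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/cassandra/gc.log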

Thanks
Yang


Re: Moving to a new cluster

2011-09-24 Thread Yan Chunlu
Thanks!  Is that similar to the problem described in this thread?


http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/nodetool-repair-caused-high-disk-space-usage-td6695542.html

On Sun, Sep 25, 2011 at 11:33 AM, aaron morton wrote:

> It can result in a lot of data on the node you run repair on. Where a lot
> means perhaps 2 or more  times more data.
>
> My unscientific approach is to repair one CF at a time so you can watch the
> disk usage and repair the smaller CF's first. After the repair compact if
> you need to.
>
> I think the amount of extra data will be related to how out of sync things
> are, so once you get repair working smoothly it will be less of a problem.
>
> Cheers
>
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/09/2011, at 3:04 AM, Yan Chunlu wrote:
>
>
> hi Aaron:
>
> could you explain more about the issue about repair make space usage going
> crazy?
>
> I am planning to upgrade my cluster from 0.7.4 to 0.8.6, which is because
> the repair never works on 0.7.4 for me.
> more specifically, CASSANDRA-2280 and CASSANDRA-2156.
>
>
> from your description, I am really worried that 0.8.6 might make it worse...
>
> thanks!
>
> On Thu, Sep 22, 2011 at 7:25 AM, aaron morton wrote:
>
>> How much data is on the nodes in cluster 1 and how much disk space on
>> cluster 2 ? Be aware that Cassandra 0.8 has an issue where repair can go
>> crazy and use a lot of space.
>>
>> If you are not regularly running repair I would also repair before the
>> move.
>>
>> The repair after the copy is a good idea but should technically not be
>> necessary. If you can practice the move watch the repair to see if much is
>> transferred (check the logs). There is always a small transfer, but if you
>> see data being transferred for several minutes I would investigate.
>>
>> When you start a repair it will repair with the other nodes it replicates
>> data with. So you only need to run it on every RF'th node. Start it on one,
>> watch the logs to see who it talks to, and then start it on the first node it
>> does not talk to. And so on.
>>
>> Add a snapshot before the clean (repair will also snapshot before it runs)
>>
>> Scrub is not needed unless you are migrating or you have file errors.
>>
>> If your cluster is online, consider running the clean every RFth node
>> rather than all at once (e.g. 1,4, 7, 10 then 2,5,8,11). It will have less
>> impact on clients.
>>
>> Cheers
>>
>>  -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 22/09/2011, at 10:27 AM, Philippe wrote:
>>
>> Hello,
>> We're currently running on a 3-node RF=3 cluster. Now that we have a
>> better grip on things, we want to replace it with a 12-node RF=3 cluster of
>> "smaller" servers. So I wonder what the best way to move the data to the new
>> cluster would be. I can afford to stop writing to the current cluster for
>> whatever time is necessary. Has anyone written up something on this subject
>> ?
>>
>> My plan is the following (nodes in cluster 1 are node1.1->1.3, nodes in
>> cluster 2 are node2.1->2.12)
>>
>>- stop writing to current cluster & drain it
>>- get a snapshot on each node
>>- Since it's RF=3, each node should have all the data, so assuming I
>>  set the tokens correctly I would move the snapshot from node1.1 to
>>  node2.1, 2.2, 2.3 and 2.4, then node1.2 -> node2.5, 2.6, 2.7, 2.8, etc.
>>  This is because the range for node1.1 is now spread across 2.1->2.4
>>- Run repair & clean & scrub on each node (more or less in //)
>>
>> What do you think ?
>> Thanks
>>
>>
>>
>
>


Re: progress of sstableloader keeps 0?

2011-09-24 Thread Yan Chunlu
Yes, I did.  I thought 0.8 was backward compatible. Are there other ways to load
0.7's data into 0.8?  Will copying the data dir directly work?   I would
like to put the load of three nodes onto one node.

 thanks!

On Sun, Sep 25, 2011 at 11:52 AM, aaron morton wrote:

> Looks like it is complaining that you are trying to load a 0.7 SSTable in
> 0.8.
>
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/09/2011, at 5:23 PM, Yan Chunlu wrote:
>
> sorry, I did not look into it; after checking I found that a version mismatch
> exception is in the log:
> ERROR [Thread-17] 2011-09-22 08:24:24,248 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[Thread-17,5,main]
> java.lang.RuntimeException: Cannot recover SSTable
> /disk2/cassandra/data/reddit/Comments-tmp-f-1 due to version mismatch.
> (current version is g).
> at
> org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
> at
> org.apache.cassandra.db.compaction.CompactionManager.submitSSTableBuild(CompactionManager.java:1097)
> at
> org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:110)
> at
> org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
> at
> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
> at
> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
>
>
> does that mean I need to run scrub before running the loader?  could I just
> delete it and keep going?  thanks!
>
> On Fri, Sep 23, 2011 at 2:16 AM, Jonathan Ellis  wrote:
>
>> Did you check for errors in logs on both loader + target?
>>
>> On Thu, Sep 22, 2011 at 10:52 AM, Yan Chunlu 
>> wrote:
>> > I took a snapshot of one of my node in a cluster 0.7.4(N=RF=3).   use
>> > sstableloader to load the snapshot data to another 1 node
>> cluster(N=RF=1).
>> >
>> > after execute  "bin/sstableloader  /disk2/mykeyspace/"
>> >
>> > it says"Starting client (and waiting 30 seconds for gossip) ..."
>> > "Streaming revelant part of  cf1.db. to [10.23.2.4]"
>> > then showing the progress indicator and stopped. nothing changed after
>> > then.
>> > progress: [/10.28.53.16 1/72 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]]]
>> >
>> > I use nodetool to check the node 10.23.2.4, nothing changed. no data
>> copied
>> > to it. and the data dir also keep its original size. is there anything
>> > wrong? how can I tell what was going on there?
>> > thanks!
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
>


Re: Possibility of going OOM using get_count

2011-09-24 Thread aaron morton
The changes in get_count() are designed to stop counts for very large rows 
running out of memory as they try to hold millions of columns in memory. 

So if you ask to count all the cols in a row with 1M cols, it will (by default) 
read the first 1024 columns, and then the next 1024 using the last column read 
as the first column for the next page. 

The important part is that it is actually reading the columns. Tombstones mean 
we do not know if a column should be a member of the result set for a query 
until it is read and reconciled with all the other versions of the column. e.g. 3 
sstables each have a value for a column; if one is a tombstone then the 
column may or may not be deleted. We do not know until all 3 column versions 
are reconciled.

get_count() is like get_slice() but we do not return the columns, just the 
count of them. Counting 1M columns still takes a long time. And finding the 
999,980th column will also take a long time, but if you know the name of the 
999,980th column it will be mucho faster. 
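
(To illustrate the paging pattern on the client side, here is a rough sketch
using the raw Thrift API. It is not the server-side implementation of
get_count(), just the same idea: slice a page at a time and use the last
column read as the start of the next slice. The column family name, page size
and consistency level are placeholders.)

import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class PagedCount
{
    static final int PAGE_SIZE = 1024;

    // Count all columns of one row by reading PAGE_SIZE columns at a time.
    static int countColumns(Cassandra.Client client, ByteBuffer key, String cf) throws Exception
    {
        ColumnParent parent = new ColumnParent(cf);
        ByteBuffer start = ByteBuffer.wrap(new byte[0]); // empty = start of the row
        int total = 0;
        while (true)
        {
            SliceRange range = new SliceRange(start, ByteBuffer.wrap(new byte[0]), false, PAGE_SIZE);
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(range);
            List<ColumnOrSuperColumn> page = client.get_slice(key, parent, predicate, ConsistencyLevel.QUORUM);
            if (page.isEmpty())
                break;
            // every page after the first starts with the last column of the
            // previous page, so avoid counting it twice
            total += (total == 0) ? page.size() : page.size() - 1;
            if (page.size() < PAGE_SIZE)
                break;
            Column last = page.get(page.size() - 1).getColumn();
            start = ByteBuffer.wrap(last.getName());
        }
        return total;
    }
}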

Some experiments I did a while ago on query plans 
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ - cass 1.0 will 
probably invalidate this.

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/09/2011, at 6:01 PM, Boris Yen wrote:

> 
> 
> On Fri, Sep 23, 2011 at 12:28 PM, aaron morton  
> wrote:
> Offsets have been discussed in previously. IIRC the main concerns were either:
> 
> There is no way to reliably count to start the offset, i.e. we do not lock 
> the row
> 
> In the new get_count function, cassandra does the internal paging in order to 
> get the total count. Without locking the row,  the count could still be 
> unreliable (someone might be deleting some columns while cassandra is 
> counting the columns). 
>  
> 
> Or performance related in, as there is not a reliable way to skip 10,000 
> columns other than counting 10,000 columns. With a start col we can search. 
> 
> 
> I am just curious: basically, "skip 10,000 columns to get the start column" 
> could be done the same way cassandra does it for the new get_count function 
> (internal paging). I just cannot think of a reason why it is doable for 
> get_count but cannot be done for the offset. 
> 
> I know the result might not be reliable and the performance might vary 
> depending on the offset, but if cassandra can use internal paging to get a 
> count, it should be able to apply the same method to get the start column 
> for the offset.
>  
> Cheers
>   
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 22/09/2011, at 8:50 PM, Boris Yen wrote:
> 
>> I was wondering if it is possible to use a similar approach to CASSANDRA-2894 to 
>> have the slice_predicate support an offset concept? With an offset, it would 
>> be much easier to implement paging from the client side.
>> 
>> Boris
>> 
>> On Mon, Sep 19, 2011 at 9:45 PM, Jonathan Ellis  wrote:
>> Unfortunately no, because you don't know what the actual
>> last-column-counted was.
>> 
>> On Mon, Sep 19, 2011 at 4:25 AM, aaron morton  
>> wrote:
>> > get_count() supports the same predicate as get_slice. So you can implement
>> > the paging yourself.
>> > Cheers
>> > -
>> > Aaron Morton
>> > Freelance Cassandra Developer
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> > On 19/09/2011, at 8:45 PM, Tharindu Mathew wrote:
>> >
>> >
>> > On Mon, Sep 19, 2011 at 12:40 PM, Benoit Perroud  
>> > wrote:
>> >>
>> >> The workaround for 0.7 is calling get_slice and count on client side.
>> >> It's heavier, sure, but you will then be able to set start column
>> >> accordingly.
>> >
>> > I was afraid of that :(
>> > Will follow that method. Thanks.
>> >>
>> >>
>> >> 2011/9/19 Tharindu Mathew :
>> >> > Thanks Aaron and Jake for the replies.
>> >> > Any chance of a possible workaround to use for Cassandra 0.7?
>> >> >
>> >> > On Mon, Sep 19, 2011 at 3:48 AM, aaron morton 
>> >> > wrote:
>> >> >>
>> >> >> Cool
>> >> >> Thanks, A
>> >> >> -
>> >> >> Aaron Morton
>> >> >> Freelance Cassandra Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >> On 19/09/2011, at 9:55 AM, Jake Luciani wrote:
>> >> >>
>> >> >> This is fixed in 1.0
>> >> >> https://issues.apache.org/jira/browse/CASSANDRA-2894
>> >> >>
>> >> >> On Sun, Sep 18, 2011 at 2:16 PM, Tharindu Mathew 
>> >> >> wrote:
>> >> >>>
>> >> >>> Hi everyone,
>> >> >>> I noticed this line in the API docs,
>> >> >>>
>> >> >>> The method is not O(1). It takes all the columns from disk to
>> >> >>> calculate
>> >> >>> the answer. The only benefit of the method is that you do not need to
>> >> >>> pull
>> >> >>> all the columns over Thrift interface to count them.
>> >> >>>
>> >> >>> Does this mean if a row has a large number of columns calling this
>> >> >>> method
>> >> >>> might make it go OOM?
>> >> >>> Thanks in advance.
>> >> >>> --
>> >> >>> Reg

Re: progress of sstableloader keeps 0?

2011-09-24 Thread aaron morton
Looks like it is complaining that you are trying to load a 0.7 SSTable in 0.8. 


Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/09/2011, at 5:23 PM, Yan Chunlu wrote:

> sorry, I did not look into it; after checking I found that a version mismatch 
> exception is in the log:
> ERROR [Thread-17] 2011-09-22 08:24:24,248 AbstractCassandraDaemon.java (line 
> 139) Fatal exception in thread Thread[Thread-17,5,main]
> java.lang.RuntimeException: Cannot recover SSTable 
> /disk2/cassandra/data/reddit/Comments-tmp-f-1 due to version mismatch. 
> (current version is g).
> at 
> org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
> at 
> org.apache.cassandra.db.compaction.CompactionManager.submitSSTableBuild(CompactionManager.java:1097)
> at 
> org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:110)
> at 
> org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
> at 
> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
> at 
> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
> 
> 
> does that mean I need to run scrub before running the loader?  could I just 
> delete it and keep going?  thanks!
> 
> On Fri, Sep 23, 2011 at 2:16 AM, Jonathan Ellis  wrote:
> Did you check for errors in logs on both loader + target?
> 
> On Thu, Sep 22, 2011 at 10:52 AM, Yan Chunlu  wrote:
> > I took a snapshot of one of my node in a cluster 0.7.4(N=RF=3).   use
> > sstableloader to load the snapshot data to another 1 node cluster(N=RF=1).
> >
> > after execute  "bin/sstableloader  /disk2/mykeyspace/"
> >
> > it says"Starting client (and waiting 30 seconds for gossip) ..."
> > "Streaming revelant part of  cf1.db. to [10.23.2.4]"
> > then showing the progress indicator and stopped. nothing changed after
> > then.
> > progress: [/10.28.53.16 1/72 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]]]
> >
> > I use nodetool to check the node 10.23.2.4, nothing changed. no data copied
> > to it. and the data dir also keep its original size. is there anything
> > wrong? how can I tell what was going on there?
> > thanks!
> 
> 
> 
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 



Re: Moving to a new cluster

2011-09-24 Thread aaron morton
It can result in a lot of data on the node you run repair on. Where a lot means 
perhaps 2 or more  times more data.

My unscientific approach is to repair one CF at a time so you can watch the 
disk usage and repair the smaller CF's first. After the repair compact if you 
need to. 
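
(For example -- keyspace and column family names here are only placeholders --
repairing one CF at a time just means naming it on the command line:)

nodetool -h <host> repair MyKeyspace SmallCF
nodetool -h <host> repair MyKeyspace BigCF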

I think the amount of extra data will be related to how out of sync things 
are, so once you get repair working smoothly it will be less of a problem.

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/09/2011, at 3:04 AM, Yan Chunlu wrote:

> 
> hi Aaron:
> 
> could you explain more about the issue about repair make space usage going 
> crazy?
> 
> I am planning to upgrade my cluster from 0.7.4 to 0.8.6, which is because the 
> repair never works on 0.7.4 for me.
> more specifically, CASSANDRA-2280 and CASSANDRA-2156.
> 
> 
> from your description, I am really worried that 0.8.6 might make it worse...
> 
> thanks!
> 
> On Thu, Sep 22, 2011 at 7:25 AM, aaron morton  wrote:
> How much data is on the nodes in cluster 1 and how much disk space on cluster 
> 2 ? Be aware that Cassandra 0.8 has an issue where repair can go crazy and 
> use a lot of space. 
> 
> If you are not regularly running repair I would also repair before the move.
> 
> The repair after the copy is a good idea but should technically not be 
> necessary. If you can practice the move watch the repair to see if much is 
> transferred (check the logs). There is always a small transfer, but if you 
> see data being transferred for several minutes I would investigate. 
> 
> When you start a repair it will repair with the other nodes it replicates 
> data with. So you only need to run it on every RF'th node. Start it on one, watch 
> the logs to see who it talks to, and then start it on the first node it does 
> not talk to. And so on. 
> 
> Add a snapshot before the clean (repair will also snapshot before it runs)
> 
> Scrub is not needed unless you are migrating or you have file errors.
> 
> If your cluster is online, consider running the clean every RFth node rather 
> than all at once (e.g. 1,4, 7, 10 then 2,5,8,11). It will have less impact on 
> clients. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 22/09/2011, at 10:27 AM, Philippe wrote:
> 
>> Hello,
>> We're currently running on a 3-node RF=3 cluster. Now that we have a better 
>> grip on things, we want to replace it with a 12-node RF=3 cluster of 
>> "smaller" servers. So I wonder what the best way to move the data to the new 
>> cluster would be. I can afford to stop writing to the current cluster for 
>> whatever time is necessary. Has anyone written up something on this subject ?
>> 
>> My plan is the following (nodes in cluster 1 are node1.1->1.3, nodes in 
>> cluster 2 are node2.1->2.12)
>> - stop writing to current cluster & drain it
>> - get a snapshot on each node
>> - Since it's RF=3, each node should have all the data, so assuming I set the
>>   tokens correctly I would move the snapshot from node1.1 to node2.1, 2.2, 2.3
>>   and 2.4, then node1.2 -> node2.5, 2.6, 2.7, 2.8, etc. This is because the range for
>>   node1.1 is now spread across 2.1->2.4
>> - Run repair & clean & scrub on each node (more or less in //)
>> What do you think ?
>> Thanks
> 
> 



Re: Moving to a new cluster

2011-09-24 Thread aaron morton
No, run it on one node at a time. 

Looks like I was a little off in my previous statement. Perhaps it would have 
been better to say: if you have more than RF nodes, run it on every RF'th node at 
the same time. But make sure you run it on all nodes eventually. 

Aaron

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/09/2011, at 8:52 PM, Jonas Borgström wrote:

> On 09/22/2011 01:25 AM, aaron morton wrote:
> *snip*
>> When you start a repair it will repair with the other nodes it
>> replicates data with. So you only need to run it on every RF'th node. Start
>> it on one, watch the logs to see who it talks to, and then start it on
>> the first node it does not talk to. And so on. 
> 
> Is this new in 0.8 or has it always been this way?
> 
> From
> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> 
> """
> Unless your application performs no deletes, it is vital that production
> clusters run nodetool repair periodically on all nodes in the cluster.
> """
> 
> So for a 3 node cluster using RF=3, is it sufficient to run "nodetool
> repair" on one node?
> 
> / Jonas



Re: Moving to a new cluster

2011-09-24 Thread aaron morton
Thanks Sylvain, will look into the new stuff. 



-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/09/2011, at 9:09 PM, Sylvain Lebresne wrote:

> 2011/9/22 Jonas Borgström :
>> On 09/22/2011 01:25 AM, aaron morton wrote:
>> *snip*
>>> When you start a repair it will repair with the other nodes it
>>> replicates data with. So you only need to run it on every RF'th node. Start
>>> it on one, watch the logs to see who it talks to, and then start it on
>>> the first node it does not talk to. And so on.
> 
> This is not totally true because of
> https://issues.apache.org/jira/browse/CASSANDRA-2610.
> Basically, doing this won't make sure the full cluster is in sync
> (there is a fair
> chance it will, but it's not guaranteed).
> It will be true in 1.0 (though in 1.0 it will be simpler and more
> efficient to just run
> 'nodetool repair --partitioner-range' on every node).
> 
> 
>> Is this new in 0.8 or has it always been this way?
>> 
>> From
>> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
>> 
>> """
>> Unless your application performs no deletes, it is vital that production
>> clusters run nodetool repair periodically on all nodes in the cluster.
>> """
>> 
>> So for a 3 node cluster using RF=3, is it sufficient to run "nodetool
>> repair" on one node?
> 
> Technically, in the 3 nodes RF=3 case, you would need to do repair on
> 2 nodes to make sure the cluster has been fully repaired. But it becomes
> fairly complicated to know which nodes exactly once you get more than
> 3 nodes in the cluster or you have RF > 3, so to be safe I would advise
> sticking to the wiki instruction (until 1.0 at least).
> 
>> 
>> / Jonas
>> 



Hadoop settings if running into blacklisted task trackers with Cassandra

2011-09-24 Thread Jeremy Hanna
I thought I would share something valuable that Jacob Perkins (who recently 
started with us) shared.  We were seeing blacklisted task trackers and 
occasionally failed jobs.  These were almost always based on TimedOutExceptions 
from Cassandra.  We've been fixing underlying reasons for those exceptions.  
However, one thing Jacob found when getting timeout errors with elastic search 
+ hadoop was that if he gave elastic search a few more tries before failing the jobs, 
things finished.  So he cranked those settings up.  Granted, if you crank them too high, 
jobs that might have otherwise failed don't get a chance to fail.  But 
for us, it was that we just needed to generally give Cassandra a few more 
tries.  We're still getting the gremlins out here and there, but you can set 
this at the job level or on the task trackers themselves.  It gives Cassandra a 
few more tries for each task for that job so that it doesn't blacklist that 
node for the job as quickly and doesn't fail the job as easily.  An example 
configuration (for job configuration or for the task trackers' mapred-site.xml) 
is:


<property>
  <name>mapred.max.tracker.failures</name>
  <value>20</value>
</property>
<property>
  <name>mapred.map.max.attempts</name>
  <value>20</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>20</value>
</property>
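
(The same settings can also be applied per job from code instead of
mapred-site.xml -- a sketch, assuming the classic mapred-era property names
shown above:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobRetrySettings
{
    public static Job configureJob(Configuration conf) throws Exception
    {
        // give each task more attempts before the whole job is failed
        conf.setInt("mapred.map.max.attempts", 20);
        conf.setInt("mapred.reduce.max.attempts", 20);
        // tolerate more task failures per tracker before blacklisting it for this job
        conf.setInt("mapred.max.tracker.failures", 20);
        return new Job(conf, "cassandra-hadoop-job");
    }
}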


Just thought I would share this because I've seen others experience this 
problem.  It's not a complete solution but it can come in handy if you want to 
make Hadoop more fault tolerant with Cassandra.

frequent node UP/Down?

2011-09-24 Thread Yang
I'm using 1.0.0


There seem to be too many node Up/Dead events detected by the failure
detector.
I'm using a 2 node cluster on EC2, in the same region and same security
group, so I assume the message drop
rate should be fairly low.
But about every 5 minutes I'm seeing a node detected as down,
and then Up again quickly, like the following:


 INFO 20:30:12,726 InetAddress /10.71.111.222 is now dead.
 INFO 20:30:32,154 InetAddress /10.71.111.222 is now UP


does the "1 in every 5 minutes" sound roughly right for your setup? I
just want to make sure the unresponsiveness is not
caused by something like memtable flushing, or GC, which I can
probably further tune.
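
(If the flaps turn out to line up with GC pauses rather than real network
problems, one generic knob -- a standard cassandra.yaml setting, not anything
specific to this report -- is the failure detector's phi threshold; raising it
makes nodes slower to be declared dead:)

# cassandra.yaml (the value here is only an example; the default is 8)
phi_convict_threshold: 10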


Thanks
Yang


Re: Tool to access Data in Cassandra

2011-09-24 Thread Andrey V. Panov
I did propotyping very small Cassandra data browser on top of Wicket and
Hector. Try it :) http://goo.gl/lozFo

On 23 September 2011 08:41, mcasandra  wrote:

> Are there any tools that let you scroll over data in Cassandra in html or
> UI?
>
> We are planning to encrypt data before storing in cassandra so I think we
> also need a tool where we can plug in decryption logic. Does anyone know if
> there is such a tool available that we can enhance?
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tool-to-access-Data-in-Cassandra-tp6822348p6822348.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Could not reach schema agreement when adding a new node.

2011-09-24 Thread Dikang Gu
I found this in the system.log when adding a new node to the cluster.

Anyone familiar with this?

ERROR [HintedHandoff:2] 2011-09-24 18:01:30,498 AbstractCassandraDaemon.java
(line 113) Fatal exception in thread Thread[HintedHandoff:2,1,main]
java.lang.RuntimeException: java.lang.RuntimeException: Could not reach
schema agreement with /192.168.1.9 in 6ms
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Could not reach schema agreement with
/192.168.1.9 in 6ms
at
org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:290)
at
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:301)
at
org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89)
at
org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:394)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more

Thanks.

-- 
Dikang Gu

0086 - 18611140205


Re: Can not connect to cassandra 0.7 using CLI

2011-09-24 Thread Julio Julio
Eric Evans  rackspace.com> writes:

> 
> On Thu, 2010-12-02 at 08:11 +1100, Joshua Partogi wrote:
> > It is set to localhost I didn't change it and it is the same as
> > configured
> > in 0.6.8. Why doesn't it work out of the box?
> > 
> > Thanks heaps. 
> 
> Try "netstat -nl | grep 9160".  Is the node listening on 9160?  Which
> interface is it bound to?
> 


I've got the same problem :( I've tried "netstat -nl | grep 9160" 
and received nothing on the console output. In my cassandra.yaml
file I've got:
listen_address: localhost
rpc_address: localhost
rpc_port: 9160
rpc_keepalive: true

(this might be a clue to what is wrong) when I type "cassandra" in 
the terminal I get this:

julio@julio-System-Product-Name:~$ log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /var/log/cassandra/system.log (Permission denied)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:207)

and more ... ;/

What should I do? Maybe I'm making a silly mistake, but I'm completely 
new to NoSQL and Cassandra. Please help me!

best regards
Julio
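
(The "Permission denied" on /var/log/cassandra/system.log above usually means
the directories Cassandra writes to are not writable by the user running it.
A sketch of the usual fix, assuming the default packaged paths:)

sudo mkdir -p /var/log/cassandra /var/lib/cassandra
sudo chown -R $USER /var/log/cassandra /var/lib/cassandra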




Re: Increasing thrift_framed_transport_size_in_mb

2011-09-24 Thread Radim Kolar

On 24.9.2011 0:05, Jonathan Ellis wrote:

Really large messages are not encouraged because they will fragment
your heap quickly.  Other than that, no.
What is the recommended chunk size for storing multi-gigabyte files in 
Cassandra? Is 64MB okay or is it too large?
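
(Whatever size is chosen, the usual pattern is to split the file into
fixed-size chunks on the client and store each chunk as its own column or row.
A rough sketch, with the 8 MB chunk size and the column naming purely
illustrative:)

import java.io.FileInputStream;
import java.io.InputStream;

public class FileChunker
{
    // 8 MB is purely illustrative; keep chunks well below the framed transport limit
    static final int CHUNK_SIZE = 8 * 1024 * 1024;

    public static void main(String[] args) throws Exception
    {
        InputStream in = new FileInputStream(args[0]);
        byte[] buf = new byte[CHUNK_SIZE];
        int chunkIndex = 0;
        while (true)
        {
            // fill the buffer completely so chunks stay fixed-size until the last one
            int filled = 0;
            int n;
            while (filled < CHUNK_SIZE && (n = in.read(buf, filled, CHUNK_SIZE - filled)) > 0)
                filled += n;
            if (filled == 0)
                break;
            byte[] chunk = new byte[filled];
            System.arraycopy(buf, 0, chunk, 0, filled);
            // store 'chunk' under a column named e.g. "<filename>:<chunkIndex>"
            // with whatever client (Thrift, Hector, ...) is already in use
            chunkIndex++;
            if (filled < CHUNK_SIZE)
                break; // last, short chunk
        }
        in.close();
        System.out.println(chunkIndex + " chunks");
    }
}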


Re: Efficient way of figuring out which nodes a set of keys belong to - Hadoop integration

2011-09-24 Thread Tharindu Mathew
Would really appreciate any help on this.

On Thu, Sep 22, 2011 at 11:34 PM, Tharindu Mathew wrote:

> Hi,
>
> I managed to modify the Hadoop-Cassandra integration to start with a column
> of a CF used for indexing. In the map phase, I get keys from different CFs
> and get the row I need. So this all works fine, for a single node. :)
>
> I'd like to effectively identify a set of nodes for a set of rows and get
> them efficiently into Hadoop. So my initial design was something like this.
>
> Have a new operation in the thrift interface that allows us to do
> something like:
> 
> Map<(CF+key), List<endpoints>> client.get_endpoints(List<keys>)
>
> Functionality would be similar to nodetool's getEndpoints.
>
> And then, when processing, we can get the endpoint relevant to each
> CF and key through this, without querying for the node for each and every key.
> If the key is not found on the endpoint (due to a node being added/displaced
> while processing), only then do we calculate the relevant endpoint again.
>
> I'd like to ask the cassandra devs whether this method sounds like the best
> way to do this, or to point out any improvements/flaws in the way I'm
> approaching this.
>
> Thanks in advance.
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>
>
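
(As a point of reference for the client side, the Thrift API already exposes
the token ranges and their replica endpoints via describe_ring. A small sketch
of listing them follows; host, port and keyspace name are placeholders, and
mapping an individual key onto a range still requires hashing it with the
cluster's partitioner, which is left out here:)

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.TokenRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class RingEndpoints
{
    public static void main(String[] args) throws Exception
    {
        TFramedTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // each TokenRange carries the replica endpoints for that slice of the ring
        List<TokenRange> ranges = client.describe_ring("MyKeyspace");
        for (TokenRange range : ranges)
        {
            System.out.println("(" + range.getStart_token() + ", " + range.getEnd_token()
                               + "] -> " + range.getEndpoints());
        }
        transport.close();
    }
}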


-- 
Regards,

Tharindu

blog: http://mackiemathew.com/


Re: Can't connect to MX4J endpoint on Ubuntu

2011-09-24 Thread Sorin Julean
Hey,

 Do a:   grep -i mx4 system.log | less

 and look for: Mx4jTool.java (line 67) mx4j successfuly loaded

 Also make sure you have the latest mx4j-tool from:
http://sourceforge.net/projects/mx4j/files/MX4J%20Binary/3.0.2/


Kind regards,
Sorin


On Fri, Sep 23, 2011 at 11:55 PM, Iwona Bialynicka-Birula <
iwona...@atigeo.com> wrote:

> Hello,
>
>
> I am trying to monitor Cassandra 0.8 using MX4J as described in
> http://wiki.apache.org/cassandra/Operations#Monitoring_with_MX4J. I have
> mx4j-tools.jar in Cassandra’s lib folder and when Cassandra starts it prints
> out:
>
> HttpAdaptor version 3.0.1 started on port 8081
>
>
> But when I try to open http://localhost:8081/ in my browser I get “unable
> to connect”. I checked that the Cassandra daemon (and only it) is listening
> on port 8081 and that there are no errors in the logs.
>
>
> Any ideas, why Cassandra is not answering on the 8081 MX4J port?
>
>
> Thanks,
>
> Iwona
>