Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-07 Thread Patrik Modesto
You're right, I wasn't looking in the right logs. Unfortunately I'd
need to restart the Hadoop tasktracker with loglevel DEBUG and that is not
possible at the moment. Pity it happens only in production with
terabytes of data, not in the test...

Regards,
P.

On Tue, Mar 6, 2012 at 14:31, Florent Lefillâtre flefi...@gmail.com wrote:
 CFRR.getProgress() is called by child mapper tasks on each TaskTracker node,
 so the log must appear in
 ${hadoop_log_dir}/attempt_201202081707_0001_m_00_0/syslog (or something
 like this) on the TaskTrackers, not in the client job logs.
 Are you sure you're looking at the right log file? I ask because in your
 first mail you linked the client job log.
 And maybe you can log the size of each split in CFIF.
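
 (A minimal sketch of that split logging, assuming the 0.8-era
 ColumnFamilySplit type and an slf4j logger; both are assumptions, not
 something quoted in this thread:)

 import java.util.List;

 import org.apache.cassandra.hadoop.ColumnFamilySplit;
 import org.apache.hadoop.mapreduce.InputSplit;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 public final class SplitLogger {
     private static final Logger logger = LoggerFactory.getLogger(SplitLogger.class);

     // Log the token range each split covers; a mapper looping over its keys
     // would report progress far beyond what its range implies.
     public static void logSplits(List<InputSplit> splits) {
         for (InputSplit split : splits) {
             ColumnFamilySplit cfs = (ColumnFamilySplit) split;
             logger.debug("split tokens ({}, {}]", cfs.getStartToken(), cfs.getEndToken());
         }
     }
 }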




 Le 6 mars 2012 13:09, Patrik Modesto patrik.mode...@gmail.com a écrit :

 I've added a debug message in the CFRR.getProgress() and I can't find
 it in the debug output. Seems like the getProgress() has not been
 called at all;

 Regards,
 P.

 On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna jeremy.hanna1...@gmail.com
 wrote:
  you may be running into this -
  https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it
  really affects the execution of the job itself though.
 
  On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
 
  Hi,
 
  I was recently trying a Hadoop job + cassandra-all 0.8.10 again and the
  Timeouts I get are not because Cassandra can't handle the
  requests. I've noticed there are several tasks that show progress of
  several thousand percent. Seems like they are looping over their range of
  keys. I've run the job with debug enabled and the ranges look ok, see
  http://pastebin.com/stVsFzLM
 
  Another difference between cassandra-all 0.8.7 and 0.8.10 is the
  number of mappers the job creates:
  0.8.7: 4680
  0.8.10: 595
 
  Task       Complete
  task_201202281457_2027_m_41       9076.81%
  task_201202281457_2027_m_73       9639.04%
  task_201202281457_2027_m_000105       10538.60%
  task_201202281457_2027_m_000108       9364.17%
 
  None of this happens with cassandra-all 0.8.7.
 
  Regards,
  P.
 
 
 
  On Tue, Feb 28, 2012 at 12:29, Patrik Modesto
  patrik.mode...@gmail.com wrote:
  I'll alter these settings and will let you know.
 
  Regards,
  P.
 
  On Tue, Feb 28, 2012 at 09:23, aaron morton aa...@thelastpickle.com
  wrote:
   Have you tried lowering the batch size and increasing the time out?
  Even
  just to get it to work.
 
  If you get a TimedOutException it means CL number of servers did not
  respond
  in time.
 
  Cheers
 
  -
  Aaron Morton
  Freelance Developer
  @aaronmorton
  http://www.thelastpickle.com
 
  On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
 
  Hi aaron,
 
  this is our current settings:
 
       <property>
           <name>cassandra.range.batch.size</name>
           <value>1024</value>
       </property>
  
       <property>
           <name>cassandra.input.split.size</name>
           <value>16384</value>
       </property>
 
  rpc_timeout_in_ms: 3
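
  (As an aside, the same two settings can also be set programmatically; a
  minimal sketch using the 0.8-era ConfigHelper, with the values quoted above:)

  import org.apache.cassandra.hadoop.ConfigHelper;
  import org.apache.hadoop.conf.Configuration;

  public final class JobSettings {
      public static void apply(Configuration conf) {
          // cassandra.range.batch.size: rows requested per get_range_slices
          // call; lower values make individual calls less likely to time out.
          ConfigHelper.setRangeBatchSize(conf, 1024);
          // cassandra.input.split.size: approximate rows per map task, which
          // drives how many mappers the job creates.
          ConfigHelper.setInputSplitSize(conf, 16384);
      }
  }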
 
  Regards,
  P.
 
  On Mon, Feb 27, 2012 at 21:54, aaron morton aa...@thelastpickle.com
  wrote:
 
  What settings do you have for cassandra.range.batch.size
 
  and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
  increasing
 
  the second ?
 
 
  Cheers
 
 
  -
 
  Aaron Morton
 
  Freelance Developer
 
  @aaronmorton
 
  http://www.thelastpickle.com
 
 
  On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
 
 
  On Sun, Feb 26, 2012 at 04:25, Edward Capriolo
  edlinuxg...@gmail.com
 
  wrote:
 
 
  Did you see the notes here?
 
 
 
  I'm not sure what you mean by the notes?
 
 
  I'm using the mapred.* settings suggested there:
 
 
      <property>
          <name>mapred.max.tracker.failures</name>
          <value>20</value>
      </property>
  
      <property>
          <name>mapred.map.max.attempts</name>
          <value>20</value>
      </property>
  
      <property>
          <name>mapred.reduce.max.attempts</name>
          <value>20</value>
      </property>
 
 
  But I still see the timeouts that I didn't have with cassandra-all 0.8.7.
 
 
  P.
 
 
  http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
 
 
 
 
 




Re: Schema change causes exception when adding data

2012-03-07 Thread Tharindu Mathew
Hi Folks,

Managed to solve this problem to an extent. I used the thrift api just for
this client and did a thread sleep until the schema comes into agreement.

The addColumnFamily(CF, boolean) is not available in the Hector version I use.
Anyway, I checked the code in Hector trunk. The approach is almost
identical, so it should be good.

Sometimes the schema still does not come into agreement... I wonder whether
this issue is solved in the newer versions?

Anyway, Aaron has given a workaround in another thread for this problem.

Thanks for all the help folks.

On Tue, Mar 6, 2012 at 11:32 PM, Tamar Fraenkel ta...@tok-media.com wrote:

 Hi!
 Maybe I didn't understand, but if you use Hector's
 addColumnFamily(CF, true);
 it should wait for schema agreement.
 Will that solve your problem?
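
 (A minimal sketch of that call, assuming a Hector release whose Cluster
 interface exposes the two-argument form:)

 import me.prettyprint.hector.api.Cluster;
 import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;

 public final class SchemaChange {
     // Passing true asks Hector to block until all live nodes report a
     // single schema version, so writes issued right after this call should
     // not hit "unconfigured columnfamily".
     public static void addAndWait(Cluster cluster, ColumnFamilyDefinition cfDef) {
         cluster.addColumnFamily(cfDef, true);
     }
 }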

 Thanks

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





 On Tue, Mar 6, 2012 at 7:55 PM, Jeremiah Jordan 
 jeremiah.jor...@morningstar.com wrote:

  That is the best one I have found.


 On 03/01/2012 03:12 PM, Tharindu Mathew wrote:

 There are 2. I'd like to wait till there is one, before I insert the value.

 Going through the code, calling client.describe_schema_versions() seems
 to give a good answer to this. And I discovered that if I wait till there
 is only 1 version, I will not get this error.

 Is this the best practice if I want to check this programmatically?
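
 (A minimal sketch of that polling loop against the raw Thrift client; the
 500 ms interval and the handling of the UNREACHABLE bucket are assumptions,
 not something prescribed in this thread:)

 import java.util.List;
 import java.util.Map;

 import org.apache.cassandra.thrift.Cassandra;

 public final class SchemaAgreement {
     // Block until every reachable node reports the same schema version.
     public static void waitForAgreement(Cassandra.Client client) throws Exception {
         while (true) {
             // Maps each schema version UUID to the endpoints reporting it.
             Map<String, List<String>> versions = client.describe_schema_versions();
             versions.remove("UNREACHABLE"); // ignore nodes that are down
             if (versions.size() == 1) {
                 return;
             }
             Thread.sleep(500);
         }
     }
 }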

 On Thu, Mar 1, 2012 at 11:15 PM, aaron morton aa...@thelastpickle.comwrote:

 use describe cluster in the CLI to see how many schema versions there
 are.

  Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 2/03/2012, at 12:25 AM, Tharindu Mathew wrote:



 On Thu, Mar 1, 2012 at 11:47 AM, Tharindu Mathew mcclou...@gmail.comwrote:

 Jeremiah,

 Thanks for the reply.

 This is what we have been doing, but it's not reliable as we don't know
 a definite time by which the schema will have replicated. Is there any way I
 can know for sure that changes have propagated?


 Then I can block the insertion of data until then.


 On Thu, Mar 1, 2012 at 4:33 AM, Jeremiah Jordan 
 jeremiah.jor...@morningstar.com wrote:

  The error is that the specified column family doesn’t exist.  If you
 connect with the CLI and describe the keyspace does it show up?  Also,
 after adding a new column family programmatically you can’t use it
 immediately, you have to wait for it to propagate.  You can use calls to
 describe schema to do so, keep calling it until every node is on the same
 schema.



 -Jeremiah



 *From:* Tharindu Mathew [mailto:mcclou...@gmail.com]
 *Sent:* Wednesday, February 29, 2012 8:27 AM
 *To:* user
 *Subject:* Schema change causes exception when adding data



 Hi,

 I have a 3 node cluster and I'm dynamically updating a keyspace with a
 new column family. Then, when I try to write records to it I get the
 following exception shown at [1].

 How do I avoid this? I'm using Hector and the default consistency
 level of QUORUM is used. Cassandra version is 0.7.8. Replication factor is 1.

 How can I solve my problem?

 [1] -
 me.prettyprint.hector.api.exceptions.HInvalidRequestException:
 InvalidRequestException(why:unconfigured columnfamily proxySummary)

 at
 me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)

 at
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:397)

 at
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:383)

 at
 me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)

 at
 me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:156)

 at
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)

 at
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.multigetSlice(KeyspaceServiceImpl.java:401)

 at
 me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:67)

 at
 me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:59)

 at
 me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)

 at
 me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:72)

 at
 me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery.execute(ThriftMultigetSliceQuery.java:58)



 --
 Regards,

 Tharindu



 blog: http://mackiemathew.com/






 --
 Regards,

 Tharindu

  blog: http://mackiemathew.com/




 --
 Regards,

 Tharindu

  blog: http://mackiemathew.com/





 --
 Regards,

 Tharindu

  blog: http://mackiemathew.com/





-- 
Regards,

Tharindu

blog: http://mackiemathew.com/

Node joining / unknown

2012-03-07 Thread R. Verlangen
Hi there,

I'm currently in a really weird situation.
- Nodetool ring says node X is joining (this already takes 12 hours, with
no activity)
- When I try to remove the token, it says: Exception in thread main
java.lang.UnsupportedOperationException: Token not found.
- Removetoken status = No token removals in process.

How to get that node out of my cluster?

With kind regards,
Robin Verlangen


Re: Repairing nodes when two schema versions appear

2012-03-07 Thread Tharindu Mathew
Thanks Aaron.

This is great. A couple of questions if you don't mind...

1. Can this schema version issue happen in newer versions of Cassandra
(1.0) ?

2. When the node is UP and we do this, even though it logs errors, would
everything still come back to normal, just as if we had shut down, deleted,
and restarted?

On Wed, Mar 7, 2012 at 2:07 AM, aaron morton aa...@thelastpickle.comwrote:

 Go to one of the nodes, stop it and delete the Migrations and Schema files
 in the system keyspace.

 When you restart the node it will stream the migrations from the others. Note
 that if the node is UP and accepting traffic it may log errors about
 missing CF's during this time.

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7/03/2012, at 1:43 AM, Tharindu Mathew wrote:

 Hi,

 I try to add column families programmatically and end up with 2 schema
 versions in the Cassandra cluster. Using Cassandra 0.7.

 Is there a way to bring this back to normal (to one schema version)
 through the cli or through the API?

 --
 Regards,

 Tharindu

 blog: http://mackiemathew.com/





-- 
Regards,

Tharindu

blog: http://mackiemathew.com/


Re: Secondary indexes don't go away after metadata change

2012-03-07 Thread aaron morton
Migrations are a delta (e.g. update CF).

I had a quick look at the code; it does appear to cancel any in-progress 
index builds when an index is dropped. There may be other code that 
does it though.

Perhaps indexes should not be built until the node has schema agreement with 
others. Not sure on the implications of that. 

Anyways can you please file a bug here 
https://issues.apache.org/jira/browse/CASSANDRA 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/03/2012, at 10:32 AM, Frisch, Michael wrote:

 Sure enough it does.  Looking back in the logs when the node was first coming 
 online I can see it applying migrations and submitting index builds on 
 indexes that are deleted in the newest version of the schema.  This may be a 
 silly question but shouldn’t it just apply the most recent version of the 
 schema on a new node?  Is there a reason to apply the migrations?
  
 - Mike
  
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Tuesday, March 06, 2012 4:14 AM
 To: user@cassandra.apache.org
 Subject: Re: Secondary indexes don't go away after metadata change
  
 When the new node comes online the history of schema changes is streamed to 
 it. I've not looked at the code but it could be that schema migrations are 
 creating indexes that are then deleted from the schema but not from the DB 
 itself.
  
 Does that fit your scenario? When the new node comes online does it log 
 migrations being applied and then indexes being created?
  
 Cheers
  
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
  
 On 6/03/2012, at 10:56 AM, Frisch, Michael wrote:
 
 
 Thank you very much for your response.  It is true that the older, previously 
 existing nodes are not snapshotting the indexes that I had removed.  I’ll go 
 ahead and just delete those SSTables from the data directory.  They may be 
 around still because they were created back when we used 0.8.
  
 The more troubling issue is with adding new nodes to the cluster though.  It 
 built indexes for column families that have had all indexes dropped weeks or 
 months in the past.  It also will snapshot the index SSTables that it 
 created.  The index files are non-empty as well, some are hundreds of 
 megabytes.
  
 All nodes have the same schema, none list themselves as having the rows 
 indexed.  I cannot drop the indexes via the CLI either because it says that 
 they don’t exist.  It’s quite perplexing.
  
 - Mike
  
  
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Monday, March 05, 2012 3:58 AM
 To: user@cassandra.apache.org
 Subject: Re: Secondary indexes don't go away after metadata change
  
 The secondary index CF's are marked as no longer required / marked as 
 compacted. under 1.x they would then be deleted reasonably quickly, and 
 definitely deleted after a restart. 
  
 Is there a zero length .Compacted file there ? 
  
 Also, when adding a new node to the ring the new node will build indexes for 
 the ones that supposedly don’t exist any longer.  Is this supposed to happen? 
  Would this have happened if I had deleted the old SSTables from the 
 previously existing nodes?
 Check you have a consistent schema using describe cluster in the CLI. And 
 check the schema is what you think it is using show schema. 
  
 Another trick is to do a snapshot. Only the files in use are included the 
 snapshot. 
  
 Hope that helps. 
  
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
  
 On 2/03/2012, at 2:53 AM, Frisch, Michael wrote:
 
 
 
 I have a few column families that I decided to get rid of the secondary 
 indexes on.  I see that there aren’t any new index SSTables being created, 
 but all of the old ones remain (some from as far back as September).  Is it 
 safe to just delete then when the node is offline?  Should I run clean-up or 
 scrub?
  
 Also, when adding a new node to the ring the new node will build indexes for 
 the ones that supposedly don’t exist any longer.  Is this supposed to happen? 
  Would this have happened if I had deleted the old SSTables from the 
 previously existing nodes?
  
 The nodes in question have either been upgraded from v0.8.1 -> v1.0.2 
 (scrubbed at this time) -> v1.0.6 or from v1.0.2 -> v1.0.6.  The secondary 
 index was dropped when the nodes were version 1.0.6.  The new node added was 
 also 1.0.6.
  
 - Mike



Re: Node joining / unknown

2012-03-07 Thread aaron morton
 - When I try to remove the token, it says: Exception in thread main 
 java.lang.UnsupportedOperationException: Token not found.
I'm assuming you ran nodetool removetoken on a node other than the joining node. 
What did nodetool ring look like on that machine? 

Take a look at nodetool netstats on the joining node to see if streaming has 
failed. If it's dead then…

1) Try restarting the joining node and run nodetool repair on it immediately. 
Note: am assuming QUORUM CL, otherwise things may get inconsistent. 
or
2) Stop the node. Try to remove the token again from another node. Note 
that removing a token will stream data around the place as well. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/03/2012, at 9:11 PM, R. Verlangen wrote:

 Hi there,
 
 I'm currently in a really weird situation. 
 - Nodetool ring says node X is joining (this already takes 12 hours, with no 
 activity)
 - When I try to remove the token, it says: Exception in thread main 
 java.lang.UnsupportedOperationException: Token not found.
 - Removetoken status = No token removals in process.
 
 How to get that node out of my cluster?
 
 With kind regards,
 Robin Verlangen



Re: Repairing nodes when two schema versions appear

2012-03-07 Thread aaron morton
 1. Can this schema version issue happen in newer versions of Cassandra (1.0) 
 ?
AFAIK yes. Schema changes are assumed to happen infrequently, and to only be 
started if the cluster is in schema agreement. Clients (including the CLI and 
CQL) take care of this. 

 2. When the node is UP and we do this, even though it logs errors, would 
 everything still come back to normal, just as if we had shut down, deleted, and restarted?
You need to shut it down to delete the files, then start. Assuming RF=3 and 
QUORUM CL, if the restarting node fails to read or write data, requests will 
still succeed. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/03/2012, at 9:12 PM, Tharindu Mathew wrote:

 Thanks Aaron. 
 
 This is great. A couple of questions if you don't mind...
 
 1. Can this schema version issue happen in newer versions of Cassandra (1.0) 
 ?
 
 2. When the node is UP and we do this, even though it logs errors, would 
 everything still come back to normal, just as if we had shut down, deleted, and restarted?
 
 On Wed, Mar 7, 2012 at 2:07 AM, aaron morton aa...@thelastpickle.com wrote:
 Go to one of the nodes, stop it and delete the Migrations and Schema files in 
 the system keyspace. 
 
 When you restart the node it will stream the migrations from the others. Note that 
 if the node is UP and accepting traffic it may log errors about missing CF's 
 during this time. 
 
 Cheers 
 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 7/03/2012, at 1:43 AM, Tharindu Mathew wrote:
 
 Hi,
 
 I try to add column families programmatically and end up with 2 schema 
 versions in the Cassandra cluster. Using Cassandra 0.7.
 
 Is there a way to bring this back to normal (to one schema version) through 
 the cli or through the API?
 
 -- 
 Regards,
 
 Tharindu
 
 blog: http://mackiemathew.com/
 
 
 
 
 
 -- 
 Regards,
 
 Tharindu
 
 blog: http://mackiemathew.com/
 



Re: Old data coming alive after adding node

2012-03-07 Thread Stefan Reek
After the old data came up we were able to delete it again. And it is 
stable now.
We are in the process of upgrading to 1.0, but as you said that's a 
painful process.

I just hope 0.6 will keep running till we're done with the upgrade.
Anyway thanks for the help.

Cheers,

Stefan



On 03/06/2012 07:02 PM, aaron morton wrote:

All our writes/deletes are done with CL.QUORUM.
Our reads are done with CL.ONE. Although the reads that confirmed the 
old data were done with CL.QUORUM.



According to 
https://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt 0.6.6 
has the same patch
for (CASSANDRA-1074) as 0.7 and so I assumed that minor compactions 
in 0.6.6 and up also purged tombstones.

My bad. As you were.

After the repair did the un-deleted data remain un-deleted ? Are you 
back to a stable situation ?


Without a lot more detail I am at a bit of a loss.

I know it's painful but migrating to 1.0 *really* will make your life 
so much easier and faster. At some point you may hit a bug or a 
problem in 0.6 and the solution may be to upgrade, quickly.


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 11:13 PM, Stefan Reek wrote:


Hi Aaron,

Thanks for the quick reply.
All our writes/deletes are done with CL.QUORUM.
Our reads are done with CL.ONE. Although the reads that confirmed the 
old data were done with CL.QUORUM.
According to 
https://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt 
0.6.6 has the same patch
for (CASSANDRA-1074) as 0.7 and so I assumed that minor compactions 
in 0.6.6 and up also purged tombstones.
The only suspicious thing I noticed was that after adding the fourth 
node repairs became extremely slow and heavy.
Running it degraded the performance of the whole cluster and the new 
node even went OOM when running it.


Cheers,

Stefan

On 03/06/2012 10:51 AM, aaron morton wrote:
After we added a fourth node, keeping RF=3, some old data appeared 
in the database.
What CL are you working at ? (Should not matter too much with repair 
working, just asking)



We don't run compact on the nodes explicitly as I understand that 
running repair will trigger a major compaction. I'm not entirely sure 
if it does so, but in any case the tombstones will be removed by a 
minor compaction.
In 0.6.x tombstones were only purged during a major / manual 
compaction. Purging during minor compaction came in during 0.7

https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L1467


Can anyone think of any reason why the old data reappeared?
It sounds like you are doing things correctly. The complicating 
factor is 0.6 is so very old.



If I wanted to poke around some more I would conduct reads at CL ONE 
against nodes and see if they return the deleted data or not. This 
would help me understand if the tombstone is still out there.


I would also poke around a lot in the logs to make sure repair was 
running as expected and completing. If you find anything suspicious 
post examples.


Finally I would ensure CL QUORUM was being used.

Hope that helps.


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com http://www.thelastpickle.com/

On 6/03/2012, at 10:13 PM, Stefan Reek wrote:


Hi,

We were running a 3-node cluster of cassandra 0.6.13 with RF=3.
After we added a fourth node, keeping RF=3, some old data appeared 
in the database.
As far as I understand this can only happen if nodetool repair 
wasn't run for more than GCGraceSeconds.

Our GCGraceSeconds is set to the default of 10 days (864000 seconds).
We have a scheduled cronjob to run repair once each week on every 
node, each on a different day.

I'm sure that none of the nodes ever skipped running a repair.
We don't run compact on the nodes explicitly as I understand that 
running repair will trigger a major compaction. I'm not entirely sure 
if it does so, but in any case the tombstones will be removed by a 
minor compaction. So I expected that the reappearing data, which is a 
couple of months old in some cases, was long gone by the time we added 
the node.

Can anyone think of any reason why the old data reappeared?

Stefan










Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-07 Thread Florent Lefillâtre
If you want to try a test, in the CFIF.getSubSplits(String, String,
TokenRange, Configuration) method, replace the loop over
'range.rpc_endpoints' with the same loop over 'range.endpoints'.
This method splits the token range of each node with the describe_splits
method, but I think there is something wrong when you create a Cassandra
connection to host '0.0.0.0'.
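
(A self-contained sketch of that idea, assuming the 0.8-era Thrift TokenRange
struct; how CFIF then uses the chosen hosts is unchanged and elided here:)

import java.util.List;

import org.apache.cassandra.thrift.TokenRange;

public final class EndpointChooser {
    // rpc_endpoints mirrors each node's rpc_address and can come back as
    // "0.0.0.0" when a node binds to all interfaces; the gossip endpoints
    // are always concrete addresses, so fall back to them in that case.
    public static List<String> hostsToContact(TokenRange range) {
        List<String> rpc = range.getRpc_endpoints();
        if (rpc == null || rpc.contains("0.0.0.0")) {
            return range.getEndpoints();
        }
        return rpc;
    }
}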




Le 7 mars 2012 09:07, Patrik Modesto patrik.mode...@gmail.com a écrit :

 You're right, I wasn't looking in the right logs. Unfortunately I'd
 need to restart the Hadoop tasktracker with loglevel DEBUG and that is not
 possible at the moment. Pity it happens only in production with
 terabytes of data, not in the test...

 Regards,
 P.

 On Tue, Mar 6, 2012 at 14:31, Florent Lefillâtre flefi...@gmail.com
 wrote:
  CFRR.getProgress() is called by child mapper tasks on each TaskTracker node,
  so the log must appear in
  ${hadoop_log_dir}/attempt_201202081707_0001_m_00_0/syslog (or something
  like this) on the TaskTrackers, not in the client job logs.
  Are you sure you're looking at the right log file? I ask because in your
  first mail you linked the client job log.
  And maybe you can log the size of each split in CFIF.
 
 
 
 
  Le 6 mars 2012 13:09, Patrik Modesto patrik.mode...@gmail.com a écrit
 :
 
  I've added a debug message in the CFRR.getProgress() and I can't find
  it in the debug output. Seems like the getProgress() has not been
  called at all;
 
  Regards,
  P.
 
  On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna jeremy.hanna1...@gmail.com
  wrote:
   you may be running into this -
   https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure
 if it
   really affects the execution of the job itself though.
  
   On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
  
   Hi,
  
   I was recently trying a Hadoop job + cassandra-all 0.8.10 again and the
   Timeouts I get are not because Cassandra can't handle the
   requests. I've noticed there are several tasks that show progress of
   several thousand percent. Seems like they are looping over their range of
   keys. I've run the job with debug enabled and the ranges look ok, see
   http://pastebin.com/stVsFzLM
  
   Another difference between cassandra-all 0.8.7 and 0.8.10 is the
   number of mappers the job creates:
   0.8.7: 4680
   0.8.10: 595
  
   Task   Complete
   task_201202281457_2027_m_41   9076.81%
   task_201202281457_2027_m_73   9639.04%
   task_201202281457_2027_m_000105   10538.60%
   task_201202281457_2027_m_000108   9364.17%
  
   None of this happens with cassandra-all 0.8.7.
  
   Regards,
   P.
  
  
  
   On Tue, Feb 28, 2012 at 12:29, Patrik Modesto
   patrik.mode...@gmail.com wrote:
   I'll alter these settings and will let you know.
  
   Regards,
   P.
  
   On Tue, Feb 28, 2012 at 09:23, aaron morton 
 aa...@thelastpickle.com
   wrote:
    Have you tried lowering the batch size and increasing the time out?
   Even
   just to get it to work.
  
   If you get a TimedOutException it means CL number of servers did
 not
   respond
   in time.
  
   Cheers
  
   -
   Aaron Morton
   Freelance Developer
   @aaronmorton
   http://www.thelastpickle.com
  
   On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
  
   Hi aaron,
  
   this is our current settings:
  
     <property>
         <name>cassandra.range.batch.size</name>
         <value>1024</value>
     </property>
   
     <property>
         <name>cassandra.input.split.size</name>
         <value>16384</value>
     </property>
  
   rpc_timeout_in_ms: 3
  
   Regards,
   P.
  
   On Mon, Feb 27, 2012 at 21:54, aaron morton 
 aa...@thelastpickle.com
   wrote:
  
   What settings do you have for cassandra.range.batch.size
  
   and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
   increasing
  
   the second ?
  
  
   Cheers
  
  
   -
  
   Aaron Morton
  
   Freelance Developer
  
   @aaronmorton
  
   http://www.thelastpickle.com
  
  
   On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
  
  
   On Sun, Feb 26, 2012 at 04:25, Edward Capriolo
   edlinuxg...@gmail.com
  
   wrote:
  
  
   Did you see the notes here?
  
  
  
    I'm not sure what you mean by the notes?
  
  
   I'm using the mapred.* settings suggested there:
  
  
    <property>
        <name>mapred.max.tracker.failures</name>
        <value>20</value>
    </property>
   
    <property>
        <name>mapred.map.max.attempts</name>
        <value>20</value>
    </property>
   
    <property>
        <name>mapred.reduce.max.attempts</name>
        <value>20</value>
    </property>
  
  
    But I still see the timeouts that I didn't have with cassandra-all 0.8.7.
  
  
   P.
  
  
   http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
  
  
  
  
  
 
 



Re: Schema change causes exception when adding data

2012-03-07 Thread Tyler Hobbs
On Wed, Mar 7, 2012 at 2:10 AM, Tharindu Mathew mcclou...@gmail.com wrote:

 Sometimes the schema still does not come into agreement... I wonder
 whether this issue is solved in the newer versions?


I believe there has been some improvement in this area in recent versions,
but I think it's also still true that you should make sure you're syncing
all of your clocks with NTP to help prevent schema disagreements.

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Final buffer length 4690 to accomodate data size of 2347 for RowMutation error caused node death

2012-03-07 Thread Jonathan Ellis
Thanks, Thomas.

Row cache/CLHCP confirms our suspected culprit.  We've committed a fix
for 1.0.9.

On Wed, Mar 7, 2012 at 11:08 AM, Thomas van Neerijnen
t...@bossastudios.com wrote:
 Sorry for the delay in replying.

 I'd like to stress that I've been working on this cluster for many months
 and this was the first and so far last time I got this error, so I couldn't
 guess how to duplicate it. Sorry I can't be more help.

 Anyways, here's the details requested:
 Row caching is enabled; at the time the error occurred we were using
 ConcurrentLinkedHashCacheProvider.
 It's the Apache packaged version with JNA pulled in as a dependency when I
 installed it, so yes.
 We're using Hector 1.0.1.
 I'm not sure what was happening at the time the error occurred, although the
 empty super columns are expected, assuming my understanding of super columns
 being deleted is correct, which is to say if I delete a super column from a
 row it'll tombstone it and delete the data.
 The schema for PlayerCity is as follows:

 create column family PlayerCity
   with column_type = 'Super'
   and comparator = 'UTF8Type'
   and subcomparator = 'BytesType'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'BytesType'
   and rows_cached = 400.0
   and row_cache_save_period = 0
   and row_cache_keys_to_save = 2147483647
   and keys_cached = 20.0
   and key_cache_save_period = 14400
   and read_repair_chance = 1.0
   and gc_grace = 864000
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
   and compaction_strategy =
 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';


 On Fri, Feb 24, 2012 at 10:07 PM, Jonathan Ellis jbel...@gmail.com wrote:

 I've filed https://issues.apache.org/jira/browse/CASSANDRA-3957 as a
 bug.  Any further light you can shed here would be useful.  (Is row
 cache enabled?  Is JNA installed?)

 On Mon, Feb 20, 2012 at 5:43 AM, Thomas van Neerijnen
 t...@bossastudios.com wrote:
  Hi all
 
  I am running the Apache packaged Cassandra 1.0.7 on Ubuntu 11.10.
  It has been running fine for over a month however I encountered the
  below
  error yesterday which almost immediately resulted in heap usage rising
  quickly to almost 100% and client requests timing out on the affected
  node.
  I gave up waiting for the init script to stop Cassandra and killed it
  myself
  after about 3 minutes, restarted it and it has been fine since. Anyone
  seen
  this before?
 
  Here is the error in the output.log:
 
  ERROR 10:51:44,282 Fatal exception in thread
  Thread[COMMIT-LOG-WRITER,5,main]
  java.lang.AssertionError: Final buffer length 4690 to accomodate data
  size
  of 2347 (predicted 2344) for RowMutation(keyspace='Player',
 
  key='36336138643338652d366162302d343334392d383466302d356166643863353133356465',
  modifications=[ColumnFamily(PlayerCity [SuperColumn(owneditem_1019
  []),SuperColumn(owneditem_1024 []),SuperColumn(owneditem_1026
  []),SuperColumn(owneditem_1074 []),SuperColumn(owneditem_1077
  []),SuperColumn(owneditem_1084 []),SuperColumn(owneditem_1094
  []),SuperColumn(owneditem_1130 []),SuperColumn(owneditem_1136
  []),SuperColumn(owneditem_1141 []),SuperColumn(owneditem_1142
  []),SuperColumn(owneditem_1145 []),SuperColumn(owneditem_1218
 
  [636f6e6e6563746564:false:5@1329648704269002,63757272656e744865616c7468:false:3@1329648704269006,656e64436f6e737472756374696f6e54696d65:false:13@1329648704269007,6964:false:4@1329648704269000,6974656d4964:false:15@1329648704269001,6c61737444657374726f79656454696d65:false:1@1329648704269008,6c61737454696d65436f6c6c6563746564:false:13@1329648704269005,736b696e4964:false:7@1329648704269009,78:false:4@1329648704269003,79:false:3@1329648704269004,]),SuperColumn(owneditem_133
  []),SuperColumn(owneditem_134 []),SuperColumn(owneditem_135
  []),SuperColumn(owneditem_141 []),SuperColumn(owneditem_147
  []),SuperColumn(owneditem_154 []),SuperColumn(owneditem_159
  []),SuperColumn(owneditem_171 []),SuperColumn(owneditem_253
  []),SuperColumn(owneditem_422 []),SuperColumn(owneditem_438
  []),SuperColumn(owneditem_515 []),SuperColumn(owneditem_521
  []),SuperColumn(owneditem_523 []),SuperColumn(owneditem_525
  []),SuperColumn(owneditem_562 []),SuperColumn(owneditem_61
  []),SuperColumn(owneditem_634 []),SuperColumn(owneditem_636
  []),SuperColumn(owneditem_71 []),SuperColumn(owneditem_712
  []),SuperColumn(owneditem_720 []),SuperColumn(owneditem_728
  []),SuperColumn(owneditem_787 []),SuperColumn(owneditem_797
  []),SuperColumn(owneditem_798 []),SuperColumn(owneditem_838
  []),SuperColumn(owneditem_842 []),SuperColumn(owneditem_847
  []),SuperColumn(owneditem_849 []),SuperColumn(owneditem_851
  []),SuperColumn(owneditem_852 []),SuperColumn(owneditem_853
  []),SuperColumn(owneditem_854 []),SuperColumn(owneditem_857
  []),SuperColumn(owneditem_858 []),SuperColumn(owneditem_874
  []),SuperColumn(owneditem_884 []),SuperColumn(owneditem_886
  

Re: Node joining / unknown

2012-03-07 Thread R. Verlangen
At this moment the node has joined the ring (after a restart: tried that
before, but now it had finally result).

When I try to run repair on the new node, the log says (the new node is
NODE C):

INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java
(line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
tree for StorageMeta from NODE A
 INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643 AntiEntropyService.java
(line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
tree for StorageMeta from NODE B

And then it doesn't do anything anymore. I tried it a couple more times;
it's just not starting.

Results from netstats on NODE C:

Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool NameActive   Pending  Completed
Commandsn/a 0  5
Responses   n/a93   4296


Any suggestions?

Thank you!

2012/3/7 aaron morton aa...@thelastpickle.com

 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.

 I'm assuming you ran nodetool removetoken on a node other than the joining
 node. What did nodetool ring look like on that machine?

 Take a look at nodetool netstats on the joining node to see if streaming
 has failed. If it's dead then…

 1) Try restarting the joining node and run nodetool repair on it
 immediately. Note: am assuming QUORUM CL, otherwise things may get
 inconsistent.
 or
 2) Stop the node. Try to remove the token again from another node.
 Note that removing a token will stream data around the place as well.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7/03/2012, at 9:11 PM, R. Verlangen wrote:

 Hi there,

 I'm currently in a really weird situation.
 - Nodetool ring says node X is joining (this already takes 12 hours, with
 no activity)
 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.
 - Removetoken status = No token removals in process.

 How to get that node out of my cluster?

 With kind regards,
 Robin Verlangen





Re: Node joining / unknown

2012-03-07 Thread Brandon Williams
On Wed, Mar 7, 2012 at 3:37 AM, aaron morton aa...@thelastpickle.com wrote:
 2) Stop the node. Try to get remove the token again from another node. Node
 that removing a token will stream data around the place as well.

A node that has never fully joined doesn't need to be removed (and
can't.)  Just shut it down and it will go away after a minute or so.

-Brandon


Re: Node joining / unknown

2012-03-07 Thread igor
Maybe it is waiting for a validation compaction on another node?

 



-Original Message-
From: R. Verlangen ro...@us2.nl
To: user@cassandra.apache.org
Sent: Wed, 07 Mar 2012 22:15
Subject: Re: Node joining / unknown

At this moment the node has joined the ring (after a restart: tried that
before, but now it had finally result).

When I try to run repair on the new node, the log says (the new node is
NODE C):

INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java
(line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
tree for StorageMeta from NODE A
 INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643 AntiEntropyService.java
(line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
tree for StorageMeta from NODE B

And then it doesn't do anything anymore. I tried it a couple more times;
it's just not starting.

Results from netstats on NODE C:

Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool NameActive   Pending  Completed
Commandsn/a 0  5
Responses   n/a93   4296


Any suggestions?

Thank you!

2012/3/7 aaron morton aa...@thelastpickle.com

 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.

 I'm assuming you ran nodetool removetoken on a node other than the joining
 node. What did nodetool ring look like on that machine?

 Take a look at nodetool netstats on the joining node to see if streaming
 has failed. If it's dead then…

 1) Try restarting the joining node and run nodetool repair on it
 immediately. Note: am assuming QUORUM CL, otherwise things may get
 inconsistent.
 or
 2) Stop the node. Try to remove the token again from another node.
 Note that removing a token will stream data around the place as well.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7/03/2012, at 9:11 PM, R. Verlangen wrote:

 Hi there,

 I'm currently in a really weird situation.
 - Nodetool ring says node X is joining (this already takes 12 hours, with
 no activity)
 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.
 - Removetoken status = No token removals in process.

 How to get that node out of my cluster?

 With kind regards,
 Robin Verlangen





Re: Node joining / unknown

2012-03-07 Thread R. Verlangen
@Brandon: Thank you for the information. I'll do that next time.

@Igor: Any ways to find out whether that is the current state? And if so,
how to solve it?

2012/3/7 i...@4friends.od.ua

 Maybe it is waiting for a validation compaction on another node?





 -Original Message-
 From: R. Verlangen ro...@us2.nl
 To: user@cassandra.apache.org
 Sent: Wed, 07 Mar 2012 22:15
 Subject: Re: Node joining / unknown

 At this moment the node has joined the ring (after a restart: tried that
 before, but now it had finally result).

 When I try to run repair on the new node, the log says (the new node is
 NODE C):

 INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java
 (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
 tree for StorageMeta from NODE A
  INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643 AntiEntropyService.java
 (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
 tree for StorageMeta from NODE B

 And then it doesn't do anything anymore. I tried it a couple more times;
 it's just not starting.

 Results from netstats on NODE C:

 Mode: NORMAL
 Not sending any streams.
 Not receiving any streams.
 Pool NameActive   Pending  Completed
 Commandsn/a 0  5
 Responses   n/a93   4296


 Any suggestions?

 Thank you!

 2012/3/7 aaron morton aa...@thelastpickle.com

 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.

 I'm assuming you ran nodetool removetoken on a node other than the joining
 node. What did nodetool ring look like on that machine?

 Take a look at nodetool netstats on the joining node to see if streaming
 has failed. If it's dead then…

 1) Try restarting the joining node and run nodetool repair on it
 immediately. Note: am assuming QUORUM CL, otherwise things may get
 inconsistent.
 or
 2) Stop the node. Try to remove the token again from another node.
 Note that removing a token will stream data around the place as well.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7/03/2012, at 9:11 PM, R. Verlangen wrote:

 Hi there,

 I'm currently in a really weird situation.
 - Nodetool ring says node X is joining (this already takes 12 hours, with
 no activity)
 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.
 - Removetoken status = No token removals in process.

 How to get that node out of my cluster?

 With kind regards,
 Robin Verlangen






Re: Node joining / unknown

2012-03-07 Thread igor
Just run nodetool compactionstats on the other nodes.


-Original Message-
From: R. Verlangen ro...@us2.nl
To: user@cassandra.apache.org
Sent: Wed, 07 Mar 2012 23:09
Subject: Re: Node joining / unknown

@Brandon: Thank you for the information. I'll do that next time.

@Igor: Any ways to find out whether that is the current state? And if so,
how to solve it?

2012/3/7 i...@4friends.od.ua

 Maybe it is waiting for a validation compaction on another node?





 -Original Message-
 From: R. Verlangen ro...@us2.nl
 To: user@cassandra.apache.org
 Sent: Wed, 07 Mar 2012 22:15
 Subject: Re: Node joining / unknown

 At this moment the node has joined the ring (after a restart: tried that
 before, but now it had finally result).

 When I try to run repair on the new node, the log says (the new node is
 NODE C):

 INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java
 (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
 tree for StorageMeta from NODE A
  INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643 AntiEntropyService.java
 (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
 tree for StorageMeta from NODE B

 And then it doesn't do anything anymore. I tried it a couple more times;
 it's just not starting.

 Results from netstats on NODE C:

 Mode: NORMAL
 Not sending any streams.
 Not receiving any streams.
 Pool NameActive   Pending  Completed
 Commandsn/a 0  5
 Responses   n/a93   4296


 Any suggestions?

 Thank you!

 2012/3/7 aaron morton aa...@thelastpickle.com

 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.

 I'm assuming you ran nodetool removetoken on a node other than the joining
 node. What did nodetool ring look like on that machine?

 Take a look at nodetool netstats on the joining node to see if streaming
 has failed. If it's dead then…

 1) Try restarting the joining node and run nodetool repair on it
 immediately. Note: am assuming QUORUM CL, otherwise things may get
 inconsistent.
 or
 2) Stop the node. Try to remove the token again from another node.
 Note that removing a token will stream data around the place as well.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7/03/2012, at 9:11 PM, R. Verlangen wrote:

 Hi there,

 I'm currently in a really weird situation.
 - Nodetool ring says node X is joining (this already takes 12 hours, with
 no activity)
 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.
 - Removetoken status = No token removals in process.

 How to get that node out of my cluster?

 With kind regards,
 Robin Verlangen






Large SliceRanges: Reading all results into memory vs. reading smaller result sub-sets at a time?

2012-03-07 Thread Kevin
When dealing with large SliceRanges, is it better to read all the results into
memory (by setting count to the largest value possible), or is it better
to divide the query into smaller SliceRange queries? Large in this case
being on the order of millions of rows.

 

There's a footnote concerning SliceRanges on the main Apache Cassandra
project site that reads:

 

"Thrift will materialize the whole result into memory before returning it
to the client, so be aware that you may be better served by iterating
through slices by passing the last value of one call in as the start of the
next instead of increasing count arbitrarily large."

 

... but it doesn't delve into the reasons why going about things that way is
better.

 

Can someone shed some light on this? And would the same logic apply to large
KeyRanges?
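
(To illustrate the iteration the footnote describes, a minimal sketch against
the Thrift API for a standard column family; the page size of 1000 is an
arbitrary choice, not a recommendation from the site:)

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public final class SlicePager {
    private static final int PAGE_SIZE = 1000;
    private static final ByteBuffer EMPTY = ByteBuffer.wrap(new byte[0]);

    public static void readAll(Cassandra.Client client, ByteBuffer key,
                               ColumnParent parent, ConsistencyLevel cl) throws Exception {
        ByteBuffer start = EMPTY; // an empty start means "from the beginning"
        while (true) {
            SliceRange range = new SliceRange(start, EMPTY, false, PAGE_SIZE);
            SlicePredicate predicate = new SlicePredicate().setSlice_range(range);
            List<ColumnOrSuperColumn> page = client.get_slice(key, parent, predicate, cl);
            for (ColumnOrSuperColumn cosc : page) {
                // process cosc.getColumn() here; only one page is in memory
            }
            if (page.size() < PAGE_SIZE) {
                break; // a short page means the row is exhausted
            }
            // Reuse the last column name as the next start; that column comes
            // back as the first element of the next page, so skip or tolerate
            // the duplicate.
            start = page.get(page.size() - 1).getColumn().bufferForName();
        }
    }
}

(The same pattern applies to large KeyRanges: page with get_range_slices,
feeding the last key of one page back in as the start_key of the next.)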

 



Re: Way to force the propagation of a schema change?

2012-03-07 Thread Tamar Fraenkel
It is my understanding that Hector does it as well (you can pass true to
indicate that you want to wait for schema agreement).

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Fri, Mar 2, 2012 at 4:02 PM, Carlo Pires carlopi...@gmail.com wrote:

 After adding a column family you must code your app to wait until the number
 of schema versions in the cluster becomes 1 before adding a new row. This is
 something that must be solved in the application layer.

 In pycassa (the Python API) the system manager code already does this. I don't
 know about other clients.


 2012/3/2 Tharindu Mathew mcclou...@gmail.com

 Hi everyone,

 I add a column family dynamically and notice that describe schema versions
 returns 2 values. Then it quickly changes back to 1. Sometimes this
 stays at 2 and does not change. Then I cannot insert values into the created
 column family, as it causes an exception.

 Is there a way to force the schema propagation through the thrift API
 (not the CLI)?

 Thanks in advance.

 --
 Regards,

 Tharindu

 blog: http://mackiemathew.com/




 --
   Carlo Pires
   62 8209-1444 TIM
   62 3251-1383
   Skype: carlopires
