Re: Data Modeling- another question

2012-08-28 Thread Guy Incognito
I would respectfully disagree; what you have said is true, but it really 
depends on the use case.


1) Do you expect to update individual fields of an item, or will you 
always update all fields at once?  If you are doing separate updates 
then the first is definitely easier for handling updates.
2) Do you expect to page through the list?  This will be easier with the 
JSON approach, as with the first an item may span a page boundary 
- not an insurmountable problem by any means, but more complicated 
nonetheless.  This is obviously not an issue if all your items have the 
same number of fields.
3) Do you expect to read or delete multiple items individually?  You may 
have to do multiple reads/deletes of a row if the items are not adjacent 
to each other, as you cannot do 'disjoint' slices of columns at the 
moment.  With the JSON approach you can just specify the individual 
columns and you're done.  Again this is less of an issue if items have a 
known set of fields, but your list of columns to read/delete may grow 
quite large fairly quickly.


The first is definitely better if you want to update individual fields; 
read-then-write is not a good idea in Cassandra.  But it is more 
complicated for most usage scenarios, so you have to work out whether you 
really need the extra flexibility.
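
To make the trade-off concrete, here is a minimal pycassa sketch of the two 
update paths; the keyspace 'Keyspace1' and column families 'UserItems' / 
'UserItemsJson' are made-up names, not from this thread:

import json

import pycassa
from pycassa.pool import ConnectionPool

pool = ConnectionPool('Keyspace1', ['localhost:9160'])  # hypothetical keyspace

# Option A: composite-style column names ("itemid:field").
# Updating one field of one item is a blind write - no read needed.
items_a = pycassa.ColumnFamily(pool, 'UserItems')
items_a.insert('UserId1', {'itemid1:Qty': '6'})

# Option B: one JSON blob per item.
# Changing one field means read-modify-write of the whole blob.
items_b = pycassa.ColumnFamily(pool, 'UserItemsJson')
row = items_b.get('UserId1', columns=['itemid1'])
item = json.loads(row['itemid1'])
item['Qty'] = 6
items_b.insert('UserId1', {'itemid1': json.dumps(item)})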


On 24/08/2012 13:54, samal wrote:

The first is the better choice; each field can be updated separately (write only).
With the second you have to take care of the JSON yourself (read first, modify, then write).

On Fri, Aug 24, 2012 at 5:45 PM, Roshni Rajagopal 
roshni.rajago...@wal-mart.com 
wrote:


Hi,

Suppose I have a column family to associate a user with a dynamic
list of items. I want to store 5-10 key pieces of information about the
item; there are no specific sorting requirements.
I have two options

A) use composite columns
UserId1 : {
 itemid1:Name = Betty Crocker,
 itemid1:Descr = Cake,
 itemid1:Qty = 5,
 itemid2:Name = Nutella,
 itemid2:Descr = Choc spread,
 itemid2:Qty = 15
}

B) use a json with the data
UserId1 : {
 itemid1 = {name: Betty Crocker, descr: Cake, Qty: 5},
 itemid2 = {name: Nutella, descr: Choc spread, Qty: 15}
}

Which do you suggest would be better?


Regards,
Roshni

This email and any files transmitted with it are confidential and
intended solely for the individual or entity to whom they are
addressed. If you have received this email in error destroy it
immediately. *** Walmart Confidential ***






Re: Data Modeling- another question

2012-08-28 Thread samal
Yes, you are right, it depends on the use case.
I suggested it as the better choice, not the only choice. JSON will be better if,
on any field change, you re-write the whole data without reading.
I tend to use JSON more where my data does not change, or changes very rarely, like
storing denormalized JSON data for analytics purposes.
I prefer a CF and the [:scoped] method for frequently updated fields.
{
this.user.cart.category.p1.name:''
this.user.cart.category.p1.unit:''
this.user.cart.category.p1.desc:''
this.user.cart.category.p2.name:''
this.user.cart.category.p2.unit:''
this.user.cart.category.p2.desc:''
}

Yes, you are right: it's really about understanding the app's data and its
behavior, not JSON vs. columns, and designing the data model accordingly.
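
As a hedged illustration of the [:scoped] column approach above (the keyspace 
'Keyspace1' and CF 'UserCart' are made-up names):

import pycassa
from pycassa.pool import ConnectionPool

pool = ConnectionPool('Keyspace1', ['localhost:9160'])
cart = pycassa.ColumnFamily(pool, 'UserCart')

# Each scoped field is its own column, so updates are blind writes.
cart.insert('user1', {
    'cart.category.p1.name': 'soap',
    'cart.category.p1.unit': '2',
    'cart.category.p1.desc': 'bath soap',
})

# Read everything under one scope with a column slice; '~' sorts after
# the field names under an ASCII/UTF8 comparator.
p1 = cart.get('user1', column_start='cart.category.p1.',
              column_finish='cart.category.p1.~')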







Re: Commit log periodic sync?

2012-08-28 Thread rubbish me
Thanks again Aaron.

 In that case I would not expect to see data loss. If you are still in a test 
 scenario can you try to reproduce the problem ? If possible can you reproduce 
 it with a single node ?

We will try that later this week. 


We did the same exercise this week; this time we did a flush and snapshot 
before the DR actually happened, as an attempt to identify whether the commit 
log fsync was the problem. 

We can clearly see sstables were created for the flush command. 
And those sstables were loaded in when the nodes started up again after the DR 
exercise. 

At this point we believed all nodes had all the data, so we let them serve 
client requests while we ran repair on the nodes. 

Data created before the last flush was still missing, according to the client 
that talked to DC1 (the disaster DC). 

We had a look at the log of one of the DC1 nodes. The suspicious thing was that 
the latest sstable was being compacted during the streaming sessions of the 
repair, but no error was reported. 

Here come my questions:
- if, during the streaming session, an sstable that was about to be streamed out 
was being compacted, would we see an error in the log?
- could this lead to data not being found?
- is it safe to let a node serve read/write requests while repair is running?

Many thanks again. 

- A




On 27 Aug 2012, at 09:08, aaron morton aa...@thelastpickle.com wrote:

 Brutally. kill -9.
 that's fine. I was thinking about reboot -f -n
 
 We are wondering if the fsync of the commit log was working.
 I would say yes, only because there are no other reported problems. 
 
 In that case I would not expect to see data loss. If you are still in a test 
 scenario can you try to reproduce the problem ? If possible can you reproduce 
 it with a single node ?
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 25/08/2012, at 11:00 AM, rubbish me rubbish...@googlemail.com wrote:
 
 Thanks, Aaron, for your reply - please see the inline.
 
 
 On 24 Aug 2012, at 11:04, aaron morton wrote:
 
 - we are running on production linux VMs (not ideal but this is out of our 
 hands)
 Is the VM doing anything wacky with the IO ?
 
 Could be.  But I thought we would ask here first.  This is a bit difficult 
 to prove because we don't have control over these VMs.
 
  
 
 As part of a DR exercise, we killed all 6 nodes in DC1,
 Nice disaster. Out of interest, what was the shutdown process ?
 
 Brutally. kill -9.
 
 
 
 We noticed that data that was written an hour before the exercise, around 
 the time the last memtables were flushed, was not found in DC1. 
 To confirm, data was written to DC 1 at CL LOCAL_QUORUM before the DR 
 exercise. 
 
 Was the missing data written before or after the memtable flush ? I'm 
 trying to understand if the data should have been in the commit log or the 
 memtables. 
 
 The missing data was written after the last flush.  This data was 
 retrievable before the DR exercise.
 
 
 Can you provide some more info on how you are detecting it is not found in 
 DC 1?
 
 
 We tried Hector, consistencylevel=LOCAL_QUORUM.  We had a missing column or 
 the whole row missing.  
 
 We tried cassandra-cli on DC1 nodes, same.
 
 However, once we ran the same query on DC2, C* must have then done a 
 read-repair. That particular piece of result data would appear in DC1 again.
 
 
 If we understand correctly, commit logs are written first and then synced 
 to disk every 10s. 
 Writes are put into a bounded queue and processed as fast as the IO can 
 keep up. Every 10s a sync message is added to the queue. Note that the 
 commit log segment may rotate at any time, which requires a sync. 
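
 A toy Python model of the queue-plus-periodic-sync behaviour just described 
 (illustration only; Cassandra's real implementation is Java and differs in 
 detail):

import queue
import threading

log_queue = queue.Queue(maxsize=1024)  # bounded queue of pending writes
SYNC = object()                        # sync marker, enqueued every 10s

def schedule_sync():
    log_queue.put(SYNC)
    threading.Timer(10.0, schedule_sync).start()

def writer_loop(segment):
    # a single writer drains the queue as fast as the IO keeps up,
    # syncing only when it sees the marker (or on segment rotation)
    while True:
        item = log_queue.get()
        if item is SYNC:
            segment.flush()
        else:
            segment.write(item)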
 
 A loss of data across all nodes in a DC seems odd. If you can provide some 
 more information we may be able to help. 
 
 
 We are wondering if the fsync of the commit log was working.  But we saw no 
 errors / warnings in the logs.  Wondering if there is a way to verify.
 
 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 24/08/2012, at 6:01 AM, rubbish me rubbish...@googlemail.com wrote:
 
 Hi all
 
 First off, let's introduce the setup. 
 
 - 6 x C* 1.1.2 in active DC (DC1), another 6 in another (DC2)
 - keyspace's RF=3 in each DC
 - Hector as client.
 client talks only to DC1 unless DC1 can't serve the request, in which 
 case it talks only to DC2
 commit log was synced periodically with the default setting of 10s. 
 - consistency policy = LOCAL QUORUM for both read and write. 
 - we are running on production linux VMs (not ideal but this is out of our 
 hands)
 -
 As part of a DR exercise, we killed all 6 nodes in DC1; Hector started 
 talking to DC2, all the data was still there, and everything continued to work 
 perfectly. 
 
 Then we brought all the nodes in DC1 up, one by one. We saw a message saying 
 all the commit logs were replayed. No errors reported.  We didn't run 
 repair at this time. 
 
 We noticed that data that was 

Re: Cassandra 1.1.4 RPM required

2012-08-28 Thread Hiller, Dean
My guess is that you are inside a company whose proxy is doing basic auth; try 
your company username/password, or do it from home.

Dean

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Monday, August 27, 2012 10:59 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra 1.1.4 RPM required

Dear Aaron, it requires a username and password, which I do not have. Can you 
share a direct link?
There is no security on the wiki; you should be able to see 
http://wiki.apache.org/cassandra/GettingStarted

What about this page ? http://wiki.apache.org/cassandra/DebianPackaging

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/08/2012, at 8:14 PM, Marco Schirrmeister 
ma...@schirrmeister.net wrote:


On Aug 23, 2012, at 12:15 PM, Adeel Akbar wrote:

Dear Aaron, it requires a username and password, which I do not have. Can you 
share a direct link?


There is no username and password for the Datastax rpm repository.
http://rpm.datastax.com/community/

But there is no 1.1.4 version yet from Datastax.


If you really need a 1.1.4 rpm, you can give my build a shot.
I just started rolling my own packages for various reasons.
Until my public rpm repo goes online, you can grab the Cassandra rpm here:
http://people.ogilvy.de/~mschirrmeister/linux/cassandra/

If you want, test it out. It's just a first build and not heavily tested.


Marco




Re: Automating nodetool repair

2012-08-28 Thread Edward Capriolo
You can consider adding -pr when iterating through all your hosts
like this. -pr means primary range, and will do less duplicated work.

On Mon, Aug 27, 2012 at 8:05 PM, Aaron Turner synfina...@gmail.com wrote:
 I use cron.  On one box I just do:

 for n in node1 node2 node3 node4 ; do
nodetool -h $n repair
sleep 120
 done

 A lot easier than managing a bunch of individual crontabs IMHO,
 although I suppose I could have done it with Puppet, but then you always
 have to keep an eye out that your repairs don't overlap over time.
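
 A sketch of the same loop with the -pr suggestion folded in, written in
 Python instead of cron/bash; the node names are placeholders:

import subprocess
import time

NODES = ['node1', 'node2', 'node3', 'node4']

for node in NODES:
    # -pr repairs only the node's primary range, so running it on every
    # node covers the whole ring with less duplicated work
    subprocess.check_call(['nodetool', '-h', node, 'repair', '-pr'])
    time.sleep(120)  # give the cluster a chance to settle between repairs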

 On Mon, Aug 27, 2012 at 4:52 PM, Edward Sargisson
 edward.sargis...@globalrelay.net wrote:
 Hi all,
 So nodetool repair has to be run regularly on all nodes. Does anybody have
 any interesting strategies or tools for doing this or is everybody just
 setting up cron to do it?

 For example, one could write some Puppet code to splay the cron times around
 so that only one should be running at once.
 Or, perhaps, a central orchestrator that is given some known quiet time and
 works its way through the list, running nodetool repair one at a time (using
 RPC?) until it runs out of time.

 Cheers,
 Edward
 --

 Edward Sargisson

 senior java developer
 Global Relay

 edward.sargis...@globalrelay.net


 866.484.6630
 New York | Chicago | Vancouver  |  London  (+44.0800.032.9829)  |  Singapore
 (+65.3158.1301)

 Global Relay Archive supports email, instant messaging, BlackBerry,
 Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter, Facebook
 and more.


 Ask about Global Relay Message — The Future of Collaboration in the
 Financial Services World


 All email sent to or from this address will be retained by Global Relay’s
 email archiving system. This message is intended only for the use of the
 individual or entity to which it is addressed, and may contain information
 that is privileged, confidential, and exempt from disclosure under
 applicable law.  Global Relay will not be liable for any compliance or
 technical information provided herein.  All trademarks are the property of
 their respective owners.



 --
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
 Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
 -- Benjamin Franklin
 carpe diem quam minimum credula postero


Re: Cassandra upgrade 1.1.4 issue

2012-08-28 Thread Adeel Akbar

I have upgraded the JDK from 1.6_u14 to 1.7_u06 and now it's working.


Thanks & Regards

Adeel Akbar

On 8/24/2012 8:50 PM, Eric Evans wrote:

On Fri, Aug 24, 2012 at 5:00 AM, Adeel Akbar
adeel.ak...@panasiangroup.com wrote:

I have upgraded Cassandra on the ring and the first node upgraded
successfully. On the second node I got the following error. Please help me to
resolve this issue.

[root@X]# /u/cassandra/apache-cassandra-1.1.4/bin/cassandra -f
xss =  -ea
-javaagent:/u/cassandra/apache-cassandra-1.1.4/bin/../lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms502M -Xmx502M
-Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
Segmentation fault

Segmentation faults can be caused by software bugs, or by faulty
hardware.  If it is a software bug, it's very unlikely to be a
Cassandra bug (there should be nothing we could do to cause a JVM
segfault).

I would take a close look at what is different between these two
hosts, starting with the version of JVM.  If you have a core dump,
that might provide some insight (and if you don't, it wouldn't hurt to
get one).

Cheers,





Re: JMX(RMI) dynamic port allocation problem still exists?

2012-08-28 Thread Yang
Thanks Nick.

it would be nice to pack such workarounds together with Cassandra, and
enable them with a command line arg in the startup script, so that the
end-user has a smoother experience - we did spend some time before debugging
the port issues and finding a trick for that.
I imagine a lot of users will have to go through the same process, since
closing all ports is the default EC2 network security policy for most
companies. Or at least add a link for the fix in the installation wikis
such as http://www.datastax.com/docs/1.1/install/install_ami



Yang

On Mon, Aug 27, 2012 at 9:03 PM, Nick Bailey n...@datastax.com wrote:

 The problem still exists. There was a discussion about resolving it
 inside Cassandra here:

 https://issues.apache.org/jira/browse/CASSANDRA-2967

 But the ultimate resolution was that since workarounds like the one
 you mentioned exist it would be left as is for now.

 On Mon, Aug 27, 2012 at 6:07 PM, Yang tedd...@gmail.com wrote:
  no, the problem is that JMX listens on 7199; once an incoming connection is
  made, it literally tells the other side come and connect to me on these 2
  RMI ports, and opens up 2 random RMI ports
 
  we used to use the trick in the above link to resolve this
 
  On Aug 27, 2012 3:04 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
 
  In cassandra-env.sh, search on JMX_PORT and it is set to 7199 (ie.
 Fixed)
  so that solves your issue, correct?
 
  Dean
 
  From: Yang tedd...@gmail.com
  Reply-To: user@cassandra.apache.org
  Date: Monday, August 27, 2012 3:44 PM
  To: user@cassandra.apache.org
  Subject: JMX(RMI) dynamic port allocation problem still exists?
 
  So, does Cassandra come with an out-of-the-box solution to fix the above
  problem? Or do I have
  to create that little javaagent jar myself?



Re: An experiment using Spring Data w/ Cassandra (initially via JPA/Kundera)

2012-08-28 Thread Vivek Mishra
Hi,
This support is now implemented in Kundera (latest trunk branch).

Let me know if you have any other questions.

On Thu, Jul 19, 2012 at 11:21 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Thanks. The team is working on extending support for
 SimpleJPARepository (including an implementation for ManagedType).

 -Vivek


 On Thu, Jul 19, 2012 at 9:06 AM, Roshan codeva...@gmail.com wrote:

 Hi Brian

 This is wonderful news for me, because we are using lots of
 Spring support in the project. Good luck and keep posting.

 Cheers

 /Roshan.






Re: Automating nodetool repair

2012-08-28 Thread Aaron Turner
Funny you mention that... I was just hearing on #cassandra this
morning that it repairs the replica set by default.  I was thinking of
repairing every 3rd node (RF=3), but running -pr seems cleaner.

Do you know if this (repairing a replica set vs a node) was introduced in 1.0 or 1.1?

On Tue, Aug 28, 2012 at 7:03 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
 You can consider adding -pr when iterating through all your hosts
 like this. -pr means primary range, and will do less duplicated work.




-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


Re: Automating nodetool repair

2012-08-28 Thread Mohit Agarwal
Is there any reason why Cassandra doesn't do nodetool repair out of the box
at some fixed interval?

On Tue, Aug 28, 2012 at 9:08 PM, Aaron Turner synfina...@gmail.com wrote:

  Funny you mention that... I was just hearing on #cassandra this
  morning that it repairs the replica set by default.  I was thinking of
  repairing every 3rd node (RF=3), but running -pr seems cleaner.
 
  Do you know if this (repairing a replica set vs a node) was introduced in 1.0 or
  1.1?




RE: Expanding cluster to include a new DR datacenter

2012-08-28 Thread Bryce Godfrey
So in an interesting turn of events, this works on my other 4 keyspaces, but 
just not this 'EBonding' one, which will not recognize the changes.  I can 
probably get around this by dropping and re-creating this keyspace, since its 
uptime is not too important for us.

[default@AlertStats] describe AlertStats;
Keyspace: AlertStats:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [Fisher:3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 3:50 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

Can you describe your schema again with TierPoint in it?
On Mon, Aug 27, 2012 at 3:22 PM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:
Same results.  I restarted the node also to see if it just wasn't picking up 
the changes, and it still shows Simple.

When I specify the DC for strategy_options I should be using the DC name from 
the property file snitch, right?  Ours is Fisher and TierPoint, so that's what I 
used.

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Monday, August 27, 2012 1:21 PM

To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

In your update command is it possible to specify RF for both DC? You could just 
do DC1:2, DC2:0.
On Mon, Aug 27, 2012 at 11:16 AM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:
Show schema output still shows the simple strategy:
[default@unknown] show schema EBonding;
create keyspace EBonding
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 2}
  and durable_writes = true;

This is the only thing I see in the system log at the time on all the nodes:

INFO [MigrationStage:1] 2012-08-27 10:54:18,608 ColumnFamilyStore.java (line 
659) Enqueuing flush of Memtable-schema_keyspaces@1157216346(183/228 
serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,612 Memtable.java (line 264) Writing 
Memtable-schema_keyspaces@1157216346(183/228 serialized/live bytes, 4 ops)
INFO [FlushWriter:765] 2012-08-27 10:54:18,627 Memtable.java (line 305) 
Completed flushing 
/opt/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-he-34817-Data.db
 (241 bytes) for commitlog p$


Should I turn the logging level up on something to see some more info maybe?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, August 27, 2012 1:35 AM

To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

I did a quick test on a clean 1.1.4 and it worked

Can you check the logs for errors ? Can you see your schema change in there ?

Also what is the output from show schema; in the cli ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 6:53 PM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:

Yes

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
9511e292-f1b6-3f78-b781-4c90aeb6b0f6: [10.20.8.4, 10.20.8.5, 10.20.8.1, 
10.20.8.2, 10.20.8.3]

From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Friday, August 24, 2012 1:55 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

That's interesting. Can you do describe cluster?
On Fri, Aug 24, 2012 at 12:11 PM, Bryce Godfrey 
bryce.godf...@azaleos.com wrote:
So I'm at the point of updating the keyspaces from Simple to NetworkTopology 
and I'm not sure if the changes are being accepted using Cassandra-cli.

I issue the change:

[default@EBonding] update keyspace EBonding
... with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
... and strategy_options={Fisher:2};
9511e292-f1b6-3f78-b781-4c90aeb6b0f6
Waiting for schema agreement...
... schemas agree across the cluster

Then I do a describe and it still shows the old strategy.  Is there something 
else that I need to do?  I've exited and restarted Cassandra-cli and it still 
shows the SimpleStrategy for that keyspace.  Other nodes show the same 
information.

[default@EBonding] describe EBonding;
Keyspace: EBonding:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:2]
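
For what it's worth, the same change can also be attempted through pycassa's 
SystemManager instead of cassandra-cli; a hedged sketch (the host address is 
a placeholder, and alter_keyspace() should be verified against your pycassa 
version):

from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY

sys_mgr = SystemManager('10.20.8.1:9160')  # any live node
sys_mgr.alter_keyspace(
    'EBonding',
    replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
    strategy_options={'Fisher': '2'},  # DC name from the property file snitch
)
sys_mgr.close()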


From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Thursday, August 23, 2012 11:06 AM
To: user@cassandra.apache.org
Subject: RE: Expanding cluster to include a 

Advantage of pre-defining column metadata

2012-08-28 Thread A J
For a static column family, what is the advantage of pre-defining column metadata?

I can see the ease of understanding the type of values that the CF contains,
and that incompatible insertions will be rejected.

But are there any major advantages in terms of performance, or
something else, that make it beneficial to define the metadata upfront?

Thanks.


Re: Node forgets about most of its column families

2012-08-28 Thread Edward Sargisson

For the record, we just had a recurrence of this.
This time, when the node (#5) came back it didn't properly rejoin the ring.
We stopped every node and brought them back one by one to get the ring 
to link up correctly.

Then, all the even nodes (#2, #4, #6) had out-of-date schemas.

nodetool resetlocalschema works.
But the subsequent nodetool repair crashes. It has to be stopped and then 
re-started.


Are there any suggestions for logging or similar so that we can get a 
clue next time this happens?


Cheers,
Edward


On 12-08-24 11:18 AM, Edward Sargisson wrote:

Sadly, I don't think we can get much.

All I know about the repro is that it was around a node restart. I've 
just tried that and everything's fine. I see no ERROR level messages 
in the logs.


Clearly, some other conditions are required but we don't know them as yet.

Many thanks,
Edward


On 12-08-24 03:29 AM, aaron morton wrote:
If this is still a test environment can you try to reproduce the 
fault ? Or provide some more details on the sequence of events?


If you still have the logs around can you see if any ERROR level 
messages were logged?


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 8:33 AM, Edward Sargisson 
edward.sargis...@globalrelay.net wrote:



Ah, yes, I forgot that bit thanks!

1.1.2 running on Centos.

Running nodetool resetlocalschema then nodetool repair fixed the 
problem but not understanding what happened is a concern.


Cheers,
Edward


On 12-08-23 12:40 PM, Rob Coli wrote:

On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
edward.sargis...@globalrelay.net  wrote:

I was wondering if anybody had seen the following behaviour before and how
we might detect it and keep the application running.

I don't know the answer to your problem, but anyone who does will want
to know in what version of Cassandra you are encountering this issue.
:)

=Rob




Re: Advantage of pre-defining column metadata

2012-08-28 Thread Edward Capriolo
Setting the metadata will set the validation. If you insert into a
column that is supposed to hold only INT values, Cassandra will reject
non-INT data at insert time.

Also, the comparator cannot be changed; you only get one chance to set
the column sorting.
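
A minimal pycassa sketch of both points; the keyspace/CF names
('Keyspace1'/'Users') are hypothetical:

from pycassa.system_manager import SystemManager, UTF8_TYPE, INT_TYPE

sys_mgr = SystemManager('localhost:9160')

# The comparator (column sorting) can only be chosen at creation time.
sys_mgr.create_column_family('Keyspace1', 'Users', comparator_type=UTF8_TYPE)

# Validation can be added per column afterwards; inserting non-INT data
# into 'age' will then be rejected at insert time.
sys_mgr.alter_column('Keyspace1', 'Users', 'age', INT_TYPE)
sys_mgr.close()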


On Tue, Aug 28, 2012 at 3:34 PM, A J s5a...@gmail.com wrote:
 For a static column family, what is the advantage of pre-defining column 
 metadata?

 I can see the ease of understanding the type of values that the CF contains,
 and that incompatible insertions will be rejected.

 But are there any major advantages in terms of performance, or
 something else, that make it beneficial to define the metadata upfront?

 Thanks.


Re: Automating nodetool repair

2012-08-28 Thread Edward Sargisson

Thanks, a very nice approach.

If every nodetool repair uses -pr, does that satisfy the requirement to 
run a repair before GCGraceSeconds expires? In other words, will we get a 
correct result using -pr everywhere?


Secondly, what's the need for sleep 120?

Cheers,
Edward

On 12-08-28 07:03 AM, Edward Capriolo wrote:

You can consider adding -pr when iterating through all your hosts
like this. -pr means primary range, and will do less duplicated work.





RE: Expanding cluster to include a new DR datacenter

2012-08-28 Thread Bryce Godfrey
I believe what may really be going on is that my schema is in a bad or corrupt 
state.  I also have one keyspace from which I just cannot drop an existing column 
family, even though it shows no errors.

So right now I was able to get 4 of my 6 keyspaces over to Network Topology 
strategy.

I think I got into this bad state after pointing Opscenter at this cluster for 
the first time, as it started throwing errors after that and crashed a couple 
of my nodes until I stopped it and its agents.

Is there a way I can confirm this or go about cleaning up/restoring the proper 
schema?


Re: Automating nodetool repair

2012-08-28 Thread Aaron Turner
On Tue, Aug 28, 2012 at 1:42 PM, Edward Sargisson
edward.sargis...@globalrelay.net wrote:
 Thanks, a very nice approach.

 If every nodetool repair uses -pr, does that satisfy the requirement to run a
 repair before GCGraceSeconds expires? In other words, will we get a correct
 result using -pr everywhere?

Yep.

 Secondly, what's the need for sleep 120?

just give the cluster a chance to settle down between repairs...
there's no real need for it; it's just there because.

-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


Re: Automating nodetool repair

2012-08-28 Thread Omid Aladini
  Secondly, what's the need for sleep 120?

 just give the cluster a chance to settle down between repairs...
 there's no real need for it, just is there because.

Actually, repair could cause unreplicated data to be streamed and new
sstables to be created. New sstables could cause pending compactions
and increase the potential number of sstables a row could be spread
across. Therefore you might need more disk seeks to read a row and
have slower read response time. If the read response time is critical,
it's a good idea to wait for pending compactions to settle before
repairing other neighbouring ranges that overlap replicas.
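
A hedged sketch of that idea: poll nodetool compactionstats between
repairs and continue only once pending compactions settle. Parsing the
'pending tasks:' line is an assumption about your nodetool version's
output format:

import subprocess
import time

def pending_compactions(host):
    out = subprocess.check_output(['nodetool', '-h', host,
                                   'compactionstats']).decode()
    for line in out.splitlines():
        if line.startswith('pending tasks:'):
            return int(line.split(':')[1])
    return 0

def repair_ring(hosts, threshold=5):
    for host in hosts:
        subprocess.check_call(['nodetool', '-h', host, 'repair', '-pr'])
        # wait for compactions to settle before the next range
        while pending_compactions(host) > threshold:
            time.sleep(60)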

-- Omid



Re: Node forgets about most of its column families

2012-08-28 Thread Peter Schuller
I can confirm having seen this (no time to debug). One method of
recovery is to jump the node back into the ring with auto_bootstrap
set to false and an appropriate token set, after deleting the system
tables. That assumes you're willing to have the node take a few bad
reads until you're able to disablegossip and make other nodes not send
requests to it. Disabling thrift would also be advised, or even
firewalling it prior to the restart.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Why Cassandra secondary indexes are so slow on just 350k rows?

2012-08-28 Thread Dave Brosius
If I understand you correctly, you are only ever querying for the rows 
where is_exported = false, and turning them into true. What this means 
is that eventually you will have 1 row in the secondary index table with 
350K columns that you will never look at.

It seems to me that perhaps you should just maintain your own manual 
index CF that points to non-exported rows, and just delete those 
columns when they are exported.
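
A minimal pycassa sketch of such a manual index; the CF names
('Items'/'UnexportedIndex') and the export() call are hypothetical
application code:

import pycassa
from pycassa.pool import ConnectionPool

pool = ConnectionPool('Keyspace1', ['localhost:9160'])
items = pycassa.ColumnFamily(pool, 'Items')
unexported = pycassa.ColumnFamily(pool, 'UnexportedIndex')

def write_item(row_key, fields):
    items.insert(row_key, fields)
    # track the row under one well-known index key until it is exported
    unexported.insert('pending', {row_key: ''})

def export_batch(count=5000):
    try:
        pending = unexported.get('pending', column_count=count)
    except pycassa.NotFoundException:
        return  # nothing waiting to be exported
    for row_key in pending:
        export(items.get(row_key))  # export() is your application code
    # delete the index columns instead of rewriting the data rows
    unexported.remove('pending', columns=list(pending))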



On 08/28/2012 05:23 PM, Edward Kibardin wrote:
I have a column family with a secondary index. The secondary index 
is basically a binary field, but I'm using a string for it. The field 
is called *is_exported* and can be *'true'* or *'false'*. After a request, 
all loaded rows are updated with *is_exported = 'true'*.


I'm polling this column family every ten minutes and exporting new rows 
as they appear.


But here is the problem: I'm seeing that the time for this query grows pretty 
linearly with the amount of data in the column family, and currently it takes 
*from 12 to 20 seconds (!!!) to find 5000 rows*. From my 
understanding, an indexed request should not depend on the number of rows in the 
CF but on the number of rows per index value (cardinality), as it's 
just another hidden CF like:


true : rowKey1 rowKey2 rowKey3 ...
false: rowKey1 rowKey2 rowKey3 ...

I'm using Pycassa to query the data; here is the code I'm using:

from pycassa.index import create_index_expression, create_index_clause

column_family = pycassa.ColumnFamily(cassandra_pool, column_family_name,
                                     read_consistency_level=2)
is_exported_expr = create_index_expression('is_exported', 'false')
clause = create_index_clause([is_exported_expr], count=5000)
column_family.get_indexed_slices(clause)

Am I doing something wrong? I expect this operation to work MUCH 
faster.


Any ideas or suggestions?

Some config info:
 - Cassandra 1.1.0
 - RandomPartitioner
 - I have 2 nodes and replication_factor = 2 (each server has a full 
data copy)

 - Using AWS EC2, large instances
 - Software raid0 on ephemeral drives

Thanks in advance!





Re: can you use hostnames in the topology file?

2012-08-28 Thread aaron morton
All host names are resolved to IP addresses. 

  does the listen_address have to be hardwired to that same EXACT hostname for 
 lookup purposes as well?
Not sure exactly what you mean here. 
I think if you only supply the host name and your DNS is set up correctly, it 
will work. 

Hope that helps. 
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/08/2012, at 5:00 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 In the example, I see all IPs being used, but our machines are on DHCP so I 
 would prefer using hostnames for everything (plus if a machine goes down, I 
 can bring it back online on another machine with a different IP but the same 
 hostname).
 
 If I use a hostname, does the listen_address have to be hardwired to that same 
 EXACT hostname for lookup purposes as well?  Or will localhost grab the 
 hostname, though it looks like it grabs the IP.
 
 Thanks,
 Dean



Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-28 Thread aaron morton
 dataset... just under 4 months of data is less than 2GB!  I'm pretty
 thrilled.
Be thrilled by all the compressions ! :)

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/08/2012, at 6:10 AM, Aaron Turner synfina...@gmail.com wrote:

 On Mon, Aug 27, 2012 at 1:19 AM, aaron morton aa...@thelastpickle.com wrote:
 After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.
 
 Sort of. We only want one instance of the row per SSTable created.
 
 Ah, good clarification, although I think for my purposes they're one
 and the same.
 
 
 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity?
 
 Fewer SSTables means less compaction. So go as high as you can on the
 bufferSizeInMB param for the
 SSTableSimpleUnsortedWriter.
 
 Ok.
 
 There is also a SSTableSimpleWriter. Because it expects rows to be ordered
 it does not buffer and can create bigger sstables.
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java
 
 Hmmm prolly not realistic in my situation... doing so would likely
 thrash the disks on my PG server a lot more and kill my read
 throughput and that server is already hitting a wall.
 
 
 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical
 
 ingest all the histories!
 
 Actually, I was a little worried about how much space that would
 take... my estimate was ~305GB/year, which is a lot when you consider
 the 300-400GB/node limit (something I didn't know about at the time).
 However, compression has turned out to be extremely efficient on my
 dataset... just under 4 months of data is less than 2GB!  I'm pretty
 thrilled.
 
 
 -- 
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
 Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
 carpe diem quam minimum credula postero



Re: sstableloader error

2012-08-28 Thread aaron morton
 WARN 21:41:15,200 Failed attempt 1 to connect to /10.245.28.232 to stream 
 null. Retrying in 2 ms. (java.net.ConnectException: Connection timed out)
If you let sstableloader run, does it complete?

 I am running cassandra on foreground. So, on all of the cassandra nodes i get 
 the below message:
  INFO 21:40:30,335 Node /192.168.11.11 is now part of the cluster
This is the bulk load process joining the ring to send the file around. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/08/2012, at 10:56 AM, Swathi Vikas swat.vi...@yahoo.com wrote:

 Hi,
  
 I had uploaded data using sstableloader to a single node cluster earlier 
 without any problem. Now, while trying to upload to a 3 node cluster it is 
 giving me the below error:
  
 localhost:~/apache-cassandra-1.0.7/sstableloader_folder # bin/sstableloader 
 DEMO/
 Starting client (and waiting 30 seconds for gossip) ...
 Streaming revelant part of DEMO/UMD-hc-1-Data.db to [/10.245.28.232, 
 /10.245.28.231, /10.245.28.230]
 progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] [/10.245.28.230 0/0 (100)] [total: 0 - 0MB/s (avg: 0MB/s)]
 [...progress line repeats unchanged...]
 WARN 21:41:15,200 Failed attempt 1 to connect to /10.245.28.232 to stream null. Retrying in 2 ms. (java.net.ConnectException: Connection timed out)
 [...progress line repeats unchanged...]
 ^Clocalhost:~/apache-cassandra-1.0.7/sstableloader_folder #
 I am running Cassandra in the foreground, so on all of the Cassandra nodes I get 
 the below message:
  INFO 21:40:30,335 Node /192.168.11.11 is now part of the cluster
  INFO 21:40:30,336 InetAddress /192.168.11.11 is now UP
  INFO 21:41:55,320 InetAddress /192.168.11.11 is now dead.
  INFO 21:41:55,321 FatClient /192.168.11.11 has been silent for 3ms, 
 removing from gossip
 I used ByteOrderedPartitioner and filled in the initial token on all nodes.
 I have set seeds as 10.245.28.230,10.245.28.231
 I have properly set listen_address, rpc_address (0.0.0.0) and ports
  
 One thing I noticed is that when I try to connect to this cluster using a 
 client (libQtCassandra) and try to create a column family, all the nodes respond 
 and the column family gets created properly.
  
 Can anyone help me, please?
  
 Thanks and Regards,
 Swat.vikas