Re: which high level Java client
Not following this thread too much, but there is also https://github.com/Netflix/astyanax/ Astyanax is currently in use at Netflix http://movies.netflix.com/. Issues generally are fixed as quickly as possible and releases are done frequently. -sd On Thu, Jun 28, 2012 at 2:39 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: I use Pelops and have been very happy. In my opinion the interface is cleaner than that of Hector. I personally do like the serializer business. -Original Message- From: Radim Kolar [mailto:h...@filez.com] Sent: Thursday, June 28, 2012 5:06 AM To: user@cassandra.apache.org Subject: Re: which high level Java client I do not have experience with other clients, only Hector. But timeout management in Hector is really broken. If you expect your nodes to time out often (for example, if you are using a WAN), better to try something else first. -- Sasha Dolgy sasha.do...@gmail.com
Re: portability between enterprise and community version
I consistently move keyspaces from Linux machines onto Windows machines for development purposes. I've had no issues ... but would probably be hesitant in rolling this out into a production instance. Depends on the level of risk you want to take. : ) Run some tests ... mix things up and share your experiences ... Personally, I could see some value in not really caring what OS my Cassandra instances are running on ... just that the JVMs are consistent and the available hardware resources are sufficient. I don't speak for the vendors mentioned in this thread, but traditionally, the first step towards supportability is finding the problems / identifying the risks and seeing if they can be resolved ... -sd On Wed, Jun 13, 2012 at 10:26 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: Repair (streaming) will not work. Probably schema update will not work either; it was a long time ago, I don't remember. Migration of the cluster between Windows and Linux is also not an easy task, a lot of manual work. Finally, mixed Cassandra environments are not supported by DataStax or by anyone else. Best regards / Pagarbiai *Viktor Jevdokimov* Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
*From:* Abhijit Chanda [mailto:abhijit.chan...@gmail.com] *Sent:* Wednesday, June 13, 2012 10:54 *To:* user@cassandra.apache.org *Subject:* Re: portability between enterprise and community version Hi Viktor Jevdokimov, May I know what issues I may face if I mix a Windows cluster along with a Linux cluster?
Re: RESTful API for GET
https://github.com/hmsonline/virgil Brian O'Neill posted this a while ago ... sits on top of Cassandra to give you the RESTful API you want. Another option ... http://code.google.com/p/restish/ Or, you could simply build your own ... On Tue, Jun 12, 2012 at 8:46 AM, Tom fivemile...@gmail.com wrote: Hi James, No, Cassandra doesn't support a RESTful API. As Tamar points out, you have to supply this functionality yourself, specifically for your data model. When designing your RESTful server application: - consider using a RESTful framework (for example: Jersey) - use a Cassandra client to access your Cassandra data (for example: Astyanax) Good luck, Tom On 06/11/2012 11:15 PM, James Pirz wrote: Hi, Thanks for the reply. But can you tell me how you form your request URLs? I mean, does Cassandra support a native RESTful API for talking to the system, and if yes, on which specific port is it listening for incoming requests? And what format does it expect for the URLs? Thanks in advance, James On Mon, Jun 11, 2012 at 11:09 PM, Tamar Fraenkel ta...@tok-media.com wrote: Hi! I am using Java and Jersey. Works fine. *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Tue, Jun 12, 2012 at 9:06 AM, James Pirz james.p...@gmail.com wrote: Dear all, I am trying to query the system, specifically performing a GET for a specific key, through JMeter (or cURL), and I am wondering what is the best pure RESTful API for the system (with the lowest overhead) that I can use. Thanks, James -- Sasha Dolgy sasha.do...@gmail.com
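Neither Cassandra nor Thrift dictates a URL scheme; a REST layer such as Virgil defines its own routes. As a sketch only — the path shape below is a hypothetical illustration, not Virgil's actual API — a resource-style GET can be modeled as keyspace / column family / row key, with a plain dict standing in for the Cassandra client:

```python
# Hypothetical REST path scheme for a Cassandra GET:
# /{keyspace}/{column_family}/{row_key}. A dict stands in for the store.
STORE = {
    ("demo", "User", "james"): {"name": "James", "city": "LA"},
}

def rest_get(path):
    """Resolve a GET path against the stand-in store; None if missing."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError("expected /keyspace/column_family/row_key")
    keyspace, cf, key = parts
    return STORE.get((keyspace, cf, key))
```

Whatever layer you pick (or build), the port and URL format come from that layer's configuration, not from Cassandra itself.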
Re: how to configure cassandra as multi tenant
Google, man. http://wiki.apache.org/cassandra/MultiTenant http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/about-multitenant-datamodel-td7575966.html http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/For-multi-tenant-is-it-good-to-have-a-key-space-for-each-tenant-td6723290.html On Mon, Jun 11, 2012 at 11:37 AM, MOHD ARSHAD SALEEM marshadsal...@tataelxsi.co.in wrote: Hi Aaron, Can you send me some particular link related to multi tenant research Regards Arshad -- *From:* aaron morton [aa...@thelastpickle.com] *Sent:* Thursday, June 07, 2012 3:34 PM *To:* user@cassandra.apache.org *Subject:* Re: how to configure cassandra as multi tenant Cassandra is not designed to run as a multi tenant database. There have been some recent discussions on this, search the user group for more detailed answers. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/06/2012, at 7:03 PM, MOHD ARSHAD SALEEM wrote: Hi All, I wanted to know how to use cassandra as a multi tenant . Regards Arshad -- Sasha Dolgy sasha.do...@gmail.com
Re: how to configure cassandra as multi tenant
Arshad, I used Google with the following query: apache cassandra multitenant Suggest you do the same? As was mentioned earlier, there has been a lot of discussion about this topic for the past year -- especially on this mailing list. If you want to use Thrift or, to make your life easier, Hector or a similar API, you can create keyspaces however you want ... aligned to your design / architecture to support multitenancy. If it's code-specific help you want ... check out the mailing lists / resources for the various APIs that make working with Thrift easier: Hector Pycassa PHPCassa etc. -sd On Mon, Jun 11, 2012 at 12:05 PM, MOHD ARSHAD SALEEM marshadsal...@tataelxsi.co.in wrote: Hi Sasha, Thanks for your reply, but what you sent is just how to create a keyspace manually using the command prompt. How do I create a (multi-tenant) keyspace automatically using the Cassandra APIs? Regards Arshad -- *From:* Sasha Dolgy [sdo...@gmail.com] *Sent:* Monday, June 11, 2012 3:09 PM *To:* user@cassandra.apache.org *Subject:* Re: how to configure cassandra as multi tenant Google, man. http://wiki.apache.org/cassandra/MultiTenant http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/about-multitenant-datamodel-td7575966.html http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/For-multi-tenant-is-it-good-to-have-a-key-space-for-each-tenant-td6723290.html On Mon, Jun 11, 2012 at 11:37 AM, MOHD ARSHAD SALEEM marshadsal...@tataelxsi.co.in wrote: Hi Aaron, Can you send me some particular link related to multi-tenant research Regards Arshad -- *From:* aaron morton [aa...@thelastpickle.com] *Sent:* Thursday, June 07, 2012 3:34 PM *To:* user@cassandra.apache.org *Subject:* Re: how to configure cassandra as multi tenant Cassandra is not designed to run as a multi-tenant database. There have been some recent discussions on this; search the user group for more detailed answers.
Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/06/2012, at 7:03 PM, MOHD ARSHAD SALEEM wrote: Hi All, I wanted to know how to use cassandra as a multi tenant . Regards Arshad -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
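On the "create keyspaces automatically" question: whichever client you use (Hector, Pycassa, raw Thrift), the per-tenant part usually reduces to deriving a safe keyspace name for each tenant and issuing one create call with it. A minimal, hypothetical naming helper — the `tenant_` prefix and sanitization rules here are illustrative assumptions, not a Cassandra requirement:

```python
import re

def tenant_keyspace(tenant_id):
    """Derive a valid Cassandra keyspace name for a tenant.
    Keyspace names should be simple alphanumeric/underscore identifiers."""
    name = "tenant_" + re.sub(r"[^A-Za-z0-9_]", "_", tenant_id).lower()
    if not re.match(r"^[A-Za-z]\w*$", name):
        raise ValueError("cannot derive keyspace name from %r" % tenant_id)
    return name
```

You would then pass the derived name to your client's keyspace-creation call at tenant-provisioning time rather than typing it into the CLI.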
Zurich / Swiss / Alps meetup
All, A year ago I made a simple query to see if there were any users based in and around Zurich, Switzerland or the Alps region, interested in participating in some form of Cassandra User Group / Meetup. At the time, 1-2 replies happened. I didn't do much with that. Let's try this again. Who all is interested? I often am jealous about all the fun I miss out on with the regular meetups that happen stateside ... Regards, -sd -- Sasha Dolgy sasha.do...@gmail.com
Re: Matthew Dennis's Cassandra On EC2
Although, probably inappropriate, I would be willing to contribute some funds for someone to recreate it with animated stick-figures. thanks. ;) On Thu, May 17, 2012 at 6:02 PM, Jeremy Hanna jeremy.hanna1...@gmail.comwrote: Sorry - it was at the austin cassandra meetup and we didn't record the presentation. I wonder if this would be a popular topic to have at the upcoming Cassandra SF event which would be recorded...
Re: unsubscribe
List-Help: mailto:user-h...@cassandra.apache.org List-Unsubscribe: mailto:user-unsubscr...@cassandra.apache.orguser-unsubscr...@cassandra.apache.org http://wiki.apache.org/cassandra/FAQ#unsubscribe On Mon, Apr 16, 2012 at 8:53 AM, Dirk Dittmar d.ditt...@wortzwei.de wrote:
RE: Using Thrift
Best to read about Maven. It will save you some grief. On Apr 2, 2012 3:05 PM, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote: I didn't find the slf4j files in the distribution, so I downloaded them. Can you help me with how to configure them? *From:* Dave Brosius [mailto:dbros...@mebigfatguy.com] *Sent:* Monday, April 02, 2012 6:28 PM *To:* user@cassandra.apache.org *Subject:* Re: Using Thrift For a Thrift client, you need the following jars at a minimum: apache-cassandra-clientutil-*.jar apache-cassandra-thrift-*.jar libthrift-*.jar slf4j-api-*.jar slf4j-log4j12-*.jar All of these jars can be found in the Cassandra distribution. On 04/02/2012 07:40 AM, Rishabh Agrawal wrote: Any suggestions… *From:* Rishabh Agrawal *Sent:* Monday, April 02, 2012 4:42 PM *To:* user@cassandra.apache.org *Subject:* Using Thrift Hello, I have just started exploring Cassandra from the Java side and wish to use Thrift as my API. The problem is that whenever I try to compile my Java code I get the following error: "package org.slf4j does not exist" Can anyone help me with this? Thanks and Regards Rishabh Agrawal -- Impetus to sponsor and exhibit at Structure Data 2012, NY; Mar 21-22. Know more about our Big Data quick-start program at the event.
New Impetus webcast ‘Cloud-enabled Performance Testing vis-à-vis On-premise’ available at http://bit.ly/z6zT4L. NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
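If you do go the Maven route, the jars Dave lists map onto a handful of dependencies. A sketch only — the version numbers below are illustrative and should be matched to your actual Cassandra release; `libthrift` is normally pulled in transitively by `cassandra-thrift`:

```xml
<!-- Illustrative versions; align them with your Cassandra release. -->
<dependency>
  <groupId>org.apache.cassandra</groupId>
  <artifactId>cassandra-thrift</artifactId>
  <version>1.0.8</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.6.1</version>
</dependency>
```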
Re: How to store a list of values?
Save the skills in a single column in JSON format. Job done. On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote: True. But I don't need the skills to be searchable, so I'd rather embed them in the user than add another top-level CF. I was thinking of doing something along the lines of adding a skills super column to the User table: skills: { 'java': null, 'c++': null, 'cobol': null } However, I'm still not sure yet how to accomplish this with Astyanax. I've only figured out how to make composite columns with predefined column names with it and not dynamic column names like this. On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote: In this case you only need the columns for values. You don't need the column values to hold multiple columns (the super-column principle). So a normal CF would work. 2012/3/26 Ben McCann b...@benmccann.com Thanks for the reply Samal. I did not realize that you could store a column with a null value. Do you know if this solution would work with composite columns? It seems super columns are being phased out in favor of composites, but I do not understand composites very well yet. I'm trying to figure out if there's any way to accomplish what you've suggested using Astyanax https://github.com/Netflix/astyanax. Thanks for the help, Ben On Mon, Mar 26, 2012 at 8:46 AM, samal samalgo...@gmail.com wrote: Plus it is fully compatible with CQL: SELECT * FROM UserSkill WHERE KEY='ben'; On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote: I would take a simple approach. Create one other CF, UserSkill, with the same row key as the profile CF key. In the user_skill CF, add each skill as a column name with a null value. Columns can be added or removed. UserProfile={ '*ben*'={ blah :blah blah :blah blah :blah } } UserSkill={ '*ben*'={ 'java':'' 'cassandra':'' . . .
'linux':'' 'skill':'infinity' } } On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote: I have a profile column family and want to store a list of skills in each profile. In BigTable I could store a Protocol Buffer http://code.google.com/apis/protocolbuffers/docs/overview.html with a repeated field, but I'm not sure how this is typically accomplished in Cassandra. One option would be to store a serialized Thrift http://thrift.apache.org/ or protobuf, but I'd prefer not to do this as I believe Cassandra doesn't have knowledge of these formats, and so the data in the datastore would not be human readable in CQL queries from the command line. The other solution I thought of would be to use a super column and put a random UUID as the key for each skill: skills: { '4b27c2b3ac48e8df': 'java', '84bf94ea7bc92018': 'c++', '9103b9a93ce9d18': 'cobol' } Is this a good way of handling lists in Cassandra? I imagine there's some idiom I'm not aware of. I'm using the Astyanax https://github.com/Netflix/astyanax/wiki client library, which only supports composite columns instead of super columns, and so the solution I proposed above would seem quite awkward in that case. Though I'm still having some trouble understanding composite columns as they seem not to be completely documented yet. Would this solution work with composite columns? Thanks, Ben -- With kind regards, Robin Verlangen www.robinverlangen.nl
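The thread's two suggestions can be sketched side by side — one column per skill with an empty value (the UserSkill layout above) versus a single JSON-serialized column. Plain dicts stand in for column families here; this illustrates the layouts, not any particular client's API:

```python
import json

# Option 1: serialize the whole skill list into one column value
# ("json format" suggestion). Human-readable, but not addressable
# per skill without rewriting the whole column.
def skills_as_json(skills):
    return {"skills": json.dumps(sorted(skills))}

# Option 2: one column per skill, empty value (the UserSkill CF above).
# Adding or removing a skill is a single column insert or delete.
def skills_as_columns(skills):
    return {skill: "" for skill in skills}
```

Option 2 is the one that stays CQL-queryable per skill; option 1 trades that for a single read/write.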
Re: cassandra 1.08 on java7 and win7
interesting. that behaviour _does_ happen in 1.0.8, but doesn't in 1.0.6 on windows 7 with Java 7. looks to be a problem with the CLI and not the actual Cassandra service. just tried it now. -sd On Mon, Mar 26, 2012 at 11:29 PM, R. Verlangen ro...@us2.nl wrote: Ben Coverston wrote earlier today: Use a version of the Java 6 runtime, Cassandra hasn't been tested at all with the Java 7 runtime So I think that might be a good way to start.
Re: cassandra 1.08 on java7 and win7
best to open an issue: https://issues.apache.org/jira/browse/CASSANDRA On Mon, Mar 26, 2012 at 11:35 PM, Frank Hsueh frank.hs...@gmail.com wrote: err ... same thing happens with Java 1.6 On Mon, Mar 26, 2012 at 2:35 PM, Frank Hsueh frank.hs...@gmail.com wrote: I'm using the latest of Java 1.6 from Oracle. On Mon, Mar 26, 2012 at 2:29 PM, R. Verlangen ro...@us2.nl wrote: Ben Coverston wrote earlier today: Use a version of the Java 6 runtime, Cassandra hasn't been tested at all with the Java 7 runtime So I think that might be a good way to start. 2012/3/26 Frank Hsueh frank.hs...@gmail.com I think I have the Cassandra server started. In another window: cassandra-cli.bat -h localhost -p 9160 Starting Cassandra Client Connected to: Test Cluster on localhost/9160 Welcome to Cassandra CLI version 1.0.8 Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] create keyspace DEMO; log4j:WARN No appenders could be found for logger (org.apache.cassandra.config.DatabaseDescriptor). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Cannot locate cassandra.yaml Fatal configuration error; unable to start server. See log for stacktrace. C:\Workspace\cassandra\apache-cassandra-1.0.8\bin Anybody seen this before? -- Frank Hsueh | frank.hs...@gmail.com -- With kind regards, Robin Verlangen www.robinverlangen.nl -- Frank Hsueh | frank.hs...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
design that mimics twitter tweet search
Hi All, With Twitter, when I search for words like: cassandra is the bestest, 4 tweets will appear, including one I just did. My understanding is that the internals of Twitter work such that each word in a tweet is indexed, irrespective of the presence of a # hash tag, and the tweet id is assigned to a row for that word. What is puzzling to me, and hopefully some smart people on here can shed some light on this -- is how would this work with Cassandra? row [ cassandra ]: key - tweetid / timestamp row [ bestest ]: key - tweetid / timestamp I had thought that I could simply pull a list of all column names from each row (representing each word) and flag all occurrences (tweet id's) that exist in each row ... however, these rows would get quite long over time. Am I missing an easier way to get a list of all tweetid's that exist in multiple rows? -- Sasha Dolgy sasha.do...@gmail.com
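The row-per-word layout described above can be modeled in a few lines, with a dict of dicts standing in for the column family (row key = word, column name = tweet id, value = timestamp). A sketch of the idea only, not client code:

```python
from collections import defaultdict

def index_tweet(index, tweet_id, timestamp, text):
    """One row per word; column name is the tweet id, value the
    timestamp -- the 'row [ word ]: key - tweetid / timestamp'
    layout from the question above."""
    for word in set(text.lower().split()):
        index[word][tweet_id] = timestamp

index = defaultdict(dict)
index_tweet(index, "t1", 100, "cassandra is the bestest")
index_tweet(index, "t2", 101, "cassandra rocks")
```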
Re: design that mimics twitter tweet search
yes -- but given I have two keywords, and want to find all tweets that have cassandra and bestest ... it means retrieving all columns + values in each row, iterating through both to see if tweet id's in one exist in the other, and finishing up with a consolidated list of tweet id's that only exist in both. Just seems clunky to me ... ? On Sun, Mar 18, 2012 at 4:12 PM, Benoit Perroud ben...@noisette.ch wrote: The simplest modeling you could have is using the keyword as key, a timestamp/time UUID as column name and the tweetid as value - cf['keyword']['timestamp'] = tweetid then you do a range query to get all tweetids sorted by time (you may want them in reverse order) and you can limit to the number of tweets displayed on the page. As some rows can become large, you could use key partitioning by concatenating, for instance, keyword and the month and year. 2012/3/18 Sasha Dolgy sdo...@gmail.com: Hi All, With Twitter, when I search for words like: cassandra is the bestest, 4 tweets will appear, including one I just did. My understanding is that the internals of Twitter work such that each word in a tweet is indexed, irrespective of the presence of a # hash tag, and the tweet id is assigned to a row for that word. What is puzzling to me, and hopefully some smart people on here can shed some light on this -- is how would this work with Cassandra? row [ cassandra ]: key - tweetid / timestamp row [ bestest ]: key - tweetid / timestamp I had thought that I could simply pull a list of all column names from each row (representing each word) and flag all occurrences (tweet id's) that exist in each row ... however, these rows would get quite long over time. Am I missing an easier way to get a list of all tweetid's that exist in multiple rows? -- Sasha Dolgy sdo...@gmail.com -- sent from my Nokia 3210 -- Sasha Dolgy sasha.do...@gmail.com
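The client-side merge being described — pull each keyword's row and keep only the tweet ids present in all of them — is a plain set intersection. A minimal sketch, with an in-memory dict standing in for the keyword rows:

```python
def tweets_matching_all(index, keywords):
    """Intersect per-keyword rows: a tweet id survives only if it
    appears in every keyword's row. This is the 'clunky' merge from
    the thread made explicit; Cassandra itself won't do it for you."""
    rows = [set(index.get(word, {})) for word in keywords]
    return set.intersection(*rows) if rows else set()

index = {
    "cassandra": {"t1": 100, "t2": 101, "t3": 102},
    "bestest": {"t1": 100, "t4": 103},
}
```

The cost is still proportional to the smallest row you fetch, which is why wide keyword rows (and the month/year sharding suggested above) matter.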
Re: data model question
Alternate would be to add another row to your user CF specific for Facebook ids. Column name would be the Facebook identifier and value would be your internal uuid. Consider when you want to add another service like Twitter. Will you then add another CF per service, or just another row specific to Twitter IDs? Queries will still be easy as it's against a single row in the same CF. On Mar 12, 2012 10:14 AM, aaron morton aa...@thelastpickle.com wrote: In this case, where you know the query upfront, I add a custom secondary index using another CF to support the query. It's a little easier here because the data won't change. UserLookupCF (using composite types for the key value) row_key: system_name:id e.g. facebook:12345 or twitter:12345 col_name : internal_user_id e.g. 5678 col_value: empty Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 11/03/2012, at 11:15 PM, Tamar Fraenkel wrote: Hi! Thanks for the response. From what I read, secondary indices are good only for columns with few possible values. Is this a good fit for my case? I have a unique Facebook id for every user. Thanks *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Sun, Mar 11, 2012 at 11:48 AM, Marcel Steinbach mstei...@gmail.com wrote: Either you do that or you could think about using a secondary index on the fb user name in your primary CF. See http://www.datastax.com/docs/1.0/ddl/indexes Cheers On 11.03.2012 at 09:51, Tamar Fraenkel ta...@tok-media.com wrote: Hi! I need some advice: I have a user CF, which has a UUID key which is my internal user id. One of the columns is the facebook_id of the user (if it exists). I need to have the reverse mapping from facebook_id to my UUID.
My intention is to add a CF for the mapping from Facebook id to my id: user_by_fbid = { // key is fb id, column name is our user id, value is empty 13101876963: { f94f6b20-161a-4f7e-995f-0466c62a1b6b : } } Does this make sense? This CF will be used whenever a user logs in through Facebook, to retrieve the internal id. Thanks *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
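Aaron's system_name:id key scheme can be modeled in a few lines — one lookup CF covers Facebook, Twitter, or any later service. A dict stands in for the CF; the ids below come from the thread, while the helper names are mine:

```python
def lookup_key(system, external_id):
    """Composite row key in Aaron's 'system_name:id' form."""
    return "%s:%s" % (system, external_id)

# One lookup CF keyed by 'system:id' covers Facebook, Twitter, etc.
lookup_cf = {
    lookup_key("facebook", "13101876963"):
        "f94f6b20-161a-4f7e-995f-0466c62a1b6b",
}

def internal_user_id(system, external_id):
    """Reverse mapping: external service id -> internal UUID."""
    return lookup_cf.get(lookup_key(system, external_id))
```

Adding Twitter later is then a new key prefix, not a new CF.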
Re: What is the best way to secure remote Cassandra (dev) server ?
Put it on a non-routable internal network. 192.168.x.x 172.16.x.x Etc... On Mar 2, 2012 1:56 PM, investtr investt...@gmail.com wrote: We have our development Cassandra 1.0.8 server running on EC2 and wanted to secure it. I read securing the entire server with firewall is one of the options. What are the other cheaper options to secure a development server ? regards, Ramesh
Re: Best way to know the cluster status
Tamil, what is the underlying purpose you are trying to achieve? To have your webpages know and detect when a node is down? To have a monitoring tool detect when a node is down? PHPCassa allows you to define multiple nodes. If one node is down, it should log information to the webserver logs and continue to work as expected if an alternate node is available. Parsing the output of nodetool ring is OK if you want the status at that very moment. Something more reliable should be considered, perhaps using JMX and a proper monitoring tool, like Nagios or Zenoss...etc. On Mon, Feb 6, 2012 at 8:59 AM, R. Verlangen ro...@us2.nl wrote: You might consider writing some kind of php script that runs nodetool ring and parse the output? 2012/2/6 Tamil selvan R.S tamil.3...@gmail.com Hi, What is the best way to know the cluster status via php? Currently we are trying to connect to individual cassandra instance with a specified timeout and if it fails we report the node to be down. But this test remains faulty. What are the other ways to test availability of nodes in cassandra cluster? How does datastax opscenter manage to do that? Regards, Tamil Selvan -- Sasha Dolgy sasha.do...@gmail.com
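If you do go the nodetool ring route, the parsing is trivial; the fragile part is the output format, which varies by Cassandra version. A sketch only — the sample columns below are illustrative, so check what your version actually prints before relying on this:

```python
def parse_ring_status(output):
    """Extract (address, status) pairs from 'nodetool ring'-style
    output. The expected column layout is an assumption; verify it
    against your Cassandra version."""
    nodes = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[3] in ("Up", "Down"):
            nodes[parts[0]] = parts[3]
    return nodes

SAMPLE = """\
Address         DC          Rack        Status State   Load    Owns    Token
10.0.0.1        datacenter1 rack1       Up     Normal  1.2 GB  50.0%   0
10.0.0.2        datacenter1 rack1       Down   Normal  1.1 GB  50.0%   85070591730234615865843651857942052864
"""
```

As noted above, JMX-based monitoring is the more reliable option; this kind of scraping only gives you a point-in-time snapshot.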
Re: Cannot start cassandra node anymore
why would you ever want to stop all nodes together? On Thu, Jan 26, 2012 at 1:24 PM, Carlo Pires carlopi...@gmail.com wrote: I found out this is related to schema change. Happens *every time* I drop and create a new CF with composite types. As a workaround I: * never stop all nodes together To stop a node: * repair and compact the node before stopping it * stop and start it again * if it starts fine, good; if not, remove all data and restart the node (and wait...)
Re: CLI exception :: A long is exactly 8 bytes: 1
Hi -- Sorry for the delay, and thanks for the response. Debug didn't print any stack traces and none are in the usual suspected places ... but thanks for that hint. Didn't know that option existed. The age column is an Integer ... Updating to IntegerType worked. Thanks. On Mon, Jan 2, 2012 at 11:23 AM, aaron morton aa...@thelastpickle.com wrote: If you use the --debug flag when you start the CLI it will always print full stack traces. What is the CF definition? I'm guessing the column_metadata specifies that the age column is a Long. Was there existing data in the age column, and if so, how was it encoded? Was the existing data encoded as a variable-length integer value? The standard IntegerType is not compatible with the LongType as the long is fixed width. If this is the case, try re-creating the index using an IntegerType. This worked for me… [default@dev] create column family User ... with comparator = AsciiType ... and column_metadata = ... [{ ... column_name : age, ... validation_class : LongType, ... index_type : 0, ... index_name : IdxAge}, ... ]; 2fd1a5c0-352b-11e1--242d50cf1fb6 Waiting for schema agreement... ... schemas agree across the cluster [default@dev] [default@dev] get User where age = 1; 0 Row Returned. Elapsed time: 33 msec(s). [default@dev] Hope that helps.
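The root cause is easy to demonstrate: LongType expects a fixed 8-byte big-endian value, while a variable-length integer encoding of 1 is a single byte — hence "A long is exactly 8 bytes: 1". A quick sketch of the two encodings:

```python
import struct

as_long = struct.pack(">q", 1)        # fixed-width: what LongType expects
as_varint = (1).to_bytes(1, "big")    # variable-length: what was stored

# The LongType validator rejects anything that isn't exactly 8 bytes.
print(len(as_long), len(as_varint))   # 8 1
```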
Re: cassandra site wsod's /mysql site functions
Have you looked at PHPCassa [ https://github.com/thobbs/phpcassa ] instead of using Thrift direct? I've had no issues with getting it to work with versions 0.7.x, 0.8.x and now 1.x ... it adds better error handling and overall, is fairly easy to get going. Some information to get you running: http://thobbs.github.com/phpcassa/ Instead of fighting with the heavy lifting .. it's often recommended around here to use the purpose built libraries that abstract thrift for you ... -sd On Tue, Jan 3, 2012 at 2:01 PM, Tim Dunphy bluethu...@gmail.com wrote: unfortunately not .. :( thanks for checking. still looking for advice on this! tx
CLI exception :: A long is exactly 8 bytes: 1
Hi Everyone, Been a while .. without any problems. Thanks for grinding out a good product! On 1.0.6, I applied an update to a column family to add a secondary index, and now via the CLI, when I perform a get user where something=1 I receive the following result: org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 1 This behaviour doesn't seem to be affecting phpcassa or hector retrieving the results of that query ... is this a silly something i've done, or something a bit more buggy with the CLI? Thanks in advance, -sd -- Sasha Dolgy sasha.do...@gmail.com
Re: CLI exception :: A long is exactly 8 bytes: 1
as per the wiki link you sent, I changed my query to: get user where something = '1'; Still throws the error ... This was fine *before* I ran the update CF command ... To Query Data get User where age = '12'; On Fri, Dec 30, 2011 at 6:05 PM, Moshiur Rahman moshi.b...@gmail.com wrote: I think you need to mention the data type in your command. You have to run the following command first: assume CFName keys as TypeName, i.e., utf8 Otherwise, you need to mention the type with each command, e.g., utf8('keyname'). http://wiki.apache.org/cassandra/CassandraCli Moshiur On Fri, Dec 30, 2011 at 10:50 AM, Sasha Dolgy sdo...@gmail.com wrote: Hi Everyone, Been a while .. without any problems. Thanks for grinding out a good product! On 1.0.6, I applied an update to a column family to add a secondary index, and now via the CLI, when I perform a get user where something=1 I receive the following result: org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 1 This behaviour doesn't seem to be affecting phpcassa or hector retrieving the results of that query ... is this a silly something I've done, or something a bit more buggy with the CLI? Thanks in advance, -sd
Re: cassandra as an email store ...
Hi Rustam, Thanks for posting that. Interesting to see that you opted to use Super Columns: https://github.com/elasticinbox/elasticinbox/wiki/Data-Model .. wondering, for the sake of argument/discussion .. if anyone can come up with an alternative data model that doesn't use SCs. -sd On Fri, Dec 16, 2011 at 11:10 AM, Rustam Aliyev rus...@code.az wrote: Hi Sasha, Replying to the old thread just for reference. We've released the code we use to store emails in Cassandra as an open source project: http://elasticinbox.com/ Hope you find it helpful. Regards, Rustam. On Fri Apr 29 15:20:07 2011, Sasha Dolgy wrote: Great read. Thanks. On Apr 29, 2011 4:07 PM, sridhar basam s...@basam.org wrote: Have you already looked at some research out of IBM about this use case? Paper is at http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf Sridhar -- Sasha Dolgy sasha.do...@gmail.com
Re: security
Firewall with appropriate rules. On Tue, Nov 8, 2011 at 6:30 PM, Guy Incognito dnd1...@gmail.com wrote: hi, is there a standard approach to securing cassandra eg within a corporate network? at the moment in our dev environment, anybody with network connectivity to the cluster can connect to it and mess with it. this would not be acceptable in prod. do people generally write custom authenticators etc, or just put the cluster behind a firewall with the appropriate rules to limit access?
Re: Value-Added Services Layer
I don't have grand visions of having fat clients connect directly to Cassandra to read/write data. Too much risk in my opinion. On Tue, Oct 25, 2011 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote: If you do not think restful API's are useful, try to make a fat client that speaks a non http or https protocol and put if on the desktops of thousands of corporate computers. Then wait for months/years for approval and firewall changes across said corporate network.
Re: Cassandra and Thrift on the Server Side
Hi Brian, It's an interesting one. Hope you don't mind some feedback. I see you have been making the rounds publicizing the concept and patch (like on my blog ; ) http://blog.sasha.dolgy.com/2011/05/apache-cassandra-restful-api.html). For me, and the goals I have, I'm not sure this is fit for purpose. I built an API that implements the business processes, business rules, security and policies of what I require. I made it RESTful to allow consumers quick and easy access. The API implements Hector or PHPCassa depending on my mood. Both libraries provide an element of connection pooling, meaning I don't have to worry about that in my code. It just works. The cost to me of writing code that leverages Hector or PHPCassa isn't that high when I compare it to writing code that would leverage a RESTful interface. I'd have to think about connection pooling / selecting the best available node, etc. I think the cost would be higher unless it's a one- or two-node infrastructure or there is a load balancer in front of all of the Cassandra interfaces so that I don't have to think about it .. Would I leverage the RESTful interface if it existed with Cassandra? Probably not. I am happy with the libraries as they are today .. and they let me bundle in a bunch of fun (batch mutates, connection pooling, etc). They aren't overly complicated and make overall development and integration quite simple. I definitely think that people who are looking into Apache Cassandra for the first time may look for this feature and/or CQL ... and in that respect, it's something good to have. Probably the best question I read in the JIRA ticket ( https://issues.apache.org/jira/browse/CASSANDRA-3380 ) is: ...what problem the REST API solves... , which is still not clear to me...
-sd On Tue, Oct 25, 2011 at 5:48 AM, Brian ONeill b...@alumni.brown.edu wrote: Peter Minearo Peter.Minearo at Reardencommerce.com writes: Thrift uses RPC, I was wondering if Cassandra uses Thrift on the server side to handle the requests from the clients? I know Thrift is used on the client side, but what about the server side? If this is true; is there a reason for it? Was a REST API with a JSON payload tried? Are there any plans to create a REST API for Cassandra? We started work on an extension to Cassandra that would deliver a REST layer. Check out: http://tinyurl.com/3ktnc9f http://code.google.com/a/apache-extras.org/p/virgil/ -brian
Re: Volunteers needed - Wiki
maybe that should be the first wiki update: the TODO On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe watanabe.m...@gmail.com wrote: Hello aaron, I raise my hand too. If you have a to-do list for the wiki, please let us know. maki
Operator on secondary indexes in 0.8.x (GTE/LTE)
I was trying to get a range of rows based on a secondary_index that was defined. Any rows where age was greater than or equal to ... it didn't work. Is this a continued limitation? Did a quick look in JIRA, couldn't find anything. The output from help get; on the cli contains the following, which led me to believe it was a limitation on Cassandra 0.7.x and not on 0.8.x ... get <cf> where <col> <operator> <value> [and <col> <operator> <value> ...] [limit <limit>]; get <cf> where <col> <operator> <function>(<value>) [and <col> <operator> <function> ...] [limit <limit>]; - operator: Operator to test the column value with. Supported operators are =, >, >=, <, <=. In Cassandra 0.7 at least one = operator must be present. [default@sdo] get user where age >= 18; No indexed columns present in index clause with operator EQ [default@sdo] get user where gender = 1 and age >= 18 (returns results) Tested this behavior on 0.8.2, 0.8.6 and now 0.8.7 ... create column family user with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'BytesType' and key_validation_class = 'BytesType' and memtable_operations = 0.248437498 and memtable_throughput = 53 and memtable_flush_after = 1440 and rows_cached = 0.0 and row_cache_save_period = 0 and keys_cached = 20.0 and key_cache_save_period = 14400 and read_repair_chance = 1.0 and gc_grace = 864000 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and row_cache_provider = 'ConcurrentLinkedHashCacheProvider' and column_metadata = [ {column_name : 'gender', validation_class : LongType, index_name : 'user_gender_idx', index_type : 0}, {column_name : 'year', validation_class : LongType, index_name : 'user_year_idx', index_type : 0}]; -- Sasha Dolgy sasha.do...@gmail.com
Re: Operator on secondary indexes in 0.8.x (GTE/LTE)
ah, hadn't even thought of that. simple. elegant. cheers. On Tue, Oct 11, 2011 at 11:01 PM, Jake Luciani jak...@gmail.com wrote: This hasn't changed, AFAIK. In Brisk we had the same problem in CFS, so we created a sentinel value that all rows shared; then it works. CASSANDRA-2915 should fix it. On Tue, Oct 11, 2011 at 4:48 PM, Sasha Dolgy sdo...@gmail.com wrote: I was trying to get a range of rows based on a secondary_index that was defined. Any rows where age was greater than or equal to ... it didn't work. Is this a continued limitation? Did a quick look in JIRA, couldn't find anything. The output from help get; on the cli contains the following, which led me to believe it was a limitation on Cassandra 0.7.x and not on 0.8.x ... get <cf> where <col> <operator> <value> [and <col> <operator> <value> ...] [limit <limit>]; get <cf> where <col> <operator> <function>(<value>) [and <col> <operator> <function> ...] [limit <limit>]; - operator: Operator to test the column value with. Supported operators are =, >, >=, <, <=. In Cassandra 0.7 at least one = operator must be present. [default@sdo] get user where age >= 18; No indexed columns present in index clause with operator EQ [default@sdo] get user where gender = 1 and age >= 18 (returns results) Tested this behavior on 0.8.2, 0.8.6 and now 0.8.7 ...
create column family user with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'BytesType' and key_validation_class = 'BytesType' and memtable_operations = 0.248437498 and memtable_throughput = 53 and memtable_flush_after = 1440 and rows_cached = 0.0 and row_cache_save_period = 0 and keys_cached = 20.0 and key_cache_save_period = 14400 and read_repair_chance = 1.0 and gc_grace = 864000 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and row_cache_provider = 'ConcurrentLinkedHashCacheProvider' and column_metadata = [ {column_name : 'gender', validation_class : LongType, index_name : 'user_gender_idx', index_type : 0}, {column_name : 'year', validation_class : LongType, index_name : 'user_year_idx', index_type : 0}]; -- Sasha Dolgy sasha.do...@gmail.com -- http://twitter.com/tjake -- Sasha Dolgy sasha.do...@gmail.com
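Jake's sentinel workaround, sketched in cassandra-cli terms (the 'flag' column name and value here are hypothetical, not from the thread): write one identical indexed column to every row, so any range query can include the EQ-on-an-indexed-column clause the query planner requires. The 'flag' column would need to be added to column_metadata with an index (index_type: 0) first.

```
[default@sdo] set user['some-row-key']['flag'] = long(1);
[default@sdo] get user where flag = 1 and age >= 18;
```

The >= comparison then piggybacks on the indexed EQ match instead of failing with "No indexed columns present in index clause with operator EQ".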
Re: Volunteers needed - Wiki
while on the topic of the wiki ... it's not entirely pleasing to the senses or at all user friendly ... hacking around on it earlier today, there aren't that many options on how to give it some flair ... shame really that for such a cool piece of software, the wiki doesn't scream the same level of cool. FWIW, Cassandra doesn't show up on http://wiki.apache.org/general/ On Wed, Oct 12, 2011 at 12:05 AM, Daria Hutchinson da...@datastax.com wrote: Sounds like a good place to start! Thanks for taking the lead and please let me know how I can help! Daria On Tue, Oct 11, 2011 at 2:20 PM, aaron morton aa...@thelastpickle.com wrote: Thanks Daria, I'll have a look at what's there and get in touch. Right now I'm not thinking beyond getting the wiki complete (e.g. it lists all the command line tools) and correct for version 1.0. My main concern was people coming away from the site with incorrect information and having a bad out of the box experience. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 12/10/2011, at 7:42 AM, Daria Hutchinson wrote: DataStax would like to help with the wiki update effort. For example, we have a start on updates for 1.0, such as the storage configuration. http://www.datastax.com/docs/1.0/configuration/storage_configuration Let me know how we can help. Cheers, Daria (DataStax Tech Writer) Question - Are you planning on maintaining wiki docs by version going forward (starting with 1.0)? On Tue, Oct 11, 2011 at 1:55 AM, aaron morton aa...@thelastpickle.com wrote: @maki thanks, Could you take a look at the cli page http://wiki.apache.org/cassandra/CassandraCli ? There are a lot of online docs in the tool, so we don't need to replicate that. Just a simple getting started guide, some examples and a few tips about what to do if things don't work. e.g. often people have problems when using bytes comparator. If you could use the sample schema that ships in conf/ that would be handy.
You may want to snapshot the 0.7 CLI page in the same way the 0.6 one was and link back http://wiki.apache.org/cassandra/CassandraCli06 Just update the draft home page to say you are working on it http://wiki.apache.org/cassandra/FrontPage_draft_aaron @sasha I was going to use the draft home page as a todo list (do every page listed on there, and sensibly follow links) and as a checkout system http://wiki.apache.org/cassandra/FrontPage_draft_aaron @Jérémy Thanks, I'll keep that in mind. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 11/10/2011, at 8:12 PM, Jérémy SEVELLEC wrote: Hi Aaron, I think the CommitLog section is outdated ( http://wiki.apache.org/cassandra/ArchitectureCommitLog ): the CommitLogHeader no longer exists since this ticket: https://issues.apache.org/jira/browse/CASSANDRA-2419 Regards, Jérémy 2011/10/11 Sasha Dolgy sdo...@gmail.com maybe that should be the first wiki update: the TODO On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe watanabe.m...@gmail.com wrote: Hello aaron, I raise my hand too. If you have a to-do list for the wiki, please let us know. maki -- Jérémy -- Sasha Dolgy sasha.do...@gmail.com
Re: ebs or ephemeral
just catching the tail end of this discussion. aaron, in your previous email, you said "And an explanation of why we normally avoid ephemeral." shouldn't this be avoiding EBS? EBS was a nightmare for us in terms of performance. On Mon, Oct 10, 2011 at 9:23 AM, aaron morton aa...@thelastpickle.com wrote: 6 nodes and RF3 will mean you can handle between 1 and 2 failed nodes. see http://thelastpickle.com/2011/06/13/Down-For-Me/ Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 7/10/2011, at 9:37 PM, Madalina Matei wrote: Hi Aaron, For a 6 node cluster, what RF can we use in order to support 2 failed nodes? From the article that you sent I understood avoid EBS and use ephemeral. am I missing anything? Thank you so much for your help, Madalina On Fri, Oct 7, 2011 at 9:15 AM, aaron morton aa...@thelastpickle.com wrote: DataStax have pre-built AMIs here http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami And an explanation of why we normally avoid ephemeral. Also, I would go with 6 nodes. You will then be able to handle up to 2 failed nodes. Hope that helps.
Re: what's the difference between repair CF separately and repair the entire node?
It was mentioned in another thread that Twitter uses 0.8 in production. For me that was a fairly strong testimonial... On Sep 14, 2011 9:28 AM, Yan Chunlu springri...@gmail.com wrote: is 0.8 ready for production use? as I know, currently many companies including reddit.com are using 0.7; how do they get rid of the repair problem? On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu springri...@gmail.com wrote: me neither, I don't want to repair one CF at a time. the node repair took a week and was still running; compactionstats and netstats showed nothing running on every node, and also no error message, no exception, really no idea what it was doing. To add to the list of things repair does wrong in 0.7, we'll have to add that if one of the nodes participating in the repair (so any node that shares a range with the node on which repair was started) goes down (even for a short time), then the repair will simply hang forever doing nothing. And no specific error message will be logged. That could be what happened. Again, recent releases of 0.8 fix that too. -- Sylvain I stopped it yesterday. maybe I should run repair again while disabling compaction on all nodes? thanks! On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller peter.schul...@infidyne.com wrote: I think it is a serious problem since I can not repair. I am using cassandra on production servers. is there some way to fix it without upgrading? I heard that 0.8.x is still not quite ready for production environments. It is a serious issue if you really need to repair one CF at a time. However, looking at your original post it seems this is not necessarily your issue. Do you need to, or was your concern rather the overall time repair took? There are other things that are improved in 0.8 relative to 0.7.
In particular, (1) in 0.7 compaction, including the validating compactions that are part of repair, is non-concurrent, so if your repair starts while there is a long-running compaction going it will have to wait, and (2) semi-related is that the merkle tree calculation that is part of repair/anti-entropy may happen out of sync if one of the participating nodes happens to be busy with compaction. This in turn causes additional data to be sent as part of repair. That might be why your immediately following repair took a long time, but it's difficult to tell. If you're having issues with repair and large data sets, I would generally say that upgrading to 0.8 is recommended. However, if you're on 0.7.4, beware of https://issues.apache.org/jira/browse/CASSANDRA-3166 -- / Peter Schuller (@scode on twitter)
AntiEntropyService.getNeighbors pulls information from where?
This relates to the issue I opened the other day: https://issues.apache.org/jira/browse/CASSANDRA-3175 .. basically, 'nodetool ring' throws an exception on two of the four nodes. In my fancy little world, the problems appear to be related to one of the nodes thinking that someone is their neighbor ... and that someone moved away a long time ago.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:9] 2011-09-12 21:20:02,256 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-636150a5-4f0e-45b7-b400-24d8471a1c88 failed.
This appears only in the logs for the one node that is generating the issue: 172.16.12.10. Where do I find where AntiEntropyService.getNeighbors(tablename, range) is pulling its information from?
On the two nodes that work: [default@system] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] [default@system] From the two nodes that don't work: [default@unknown] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] UNREACHABLE: [10.130.185.136] -- which is really 172.16.14.10 [default@unknown] Really now. Where does 10.130.185.136 exist? It's in none of the configurations I have AND the full ring has been shut down and started up ... not trying to give Vijay a hard time by posting here btw! Just thinking it could be something super silly ... that a wider audience has come across. -- Sasha Dolgy sasha.do...@gmail.com
Re: AntiEntropyService.getNeighbors pulls information from where?
use system; del LocationInfo[52696e67]; I ran this on the nodes that had the problems, stopped and started the nodes, and it re-did its job. Job done. All fixed, with a new bug! https://issues.apache.org/jira/browse/CASSANDRA-3186 On Tue, Sep 13, 2011 at 2:09 AM, aaron morton aa...@thelastpickle.com wrote: I'm pretty sure I'm behind on how to deal with this problem. Best I know is to start the node with -Dcassandra.load_ring_state=false as a JVM option. But if the ghost IP address is in gossip it will not work, and it should be in gossip. Does the ghost IP show up in nodetool ring? Anyone know a way to remove a ghost IP from gossip that does not have a token associated with it? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13/09/2011, at 6:39 AM, Sasha Dolgy wrote: This relates to the issue I opened the other day: https://issues.apache.org/jira/browse/CASSANDRA-3175 .. basically, 'nodetool ring' throws an exception on two of the four nodes. In my fancy little world, the problems appear to be related to one of the nodes thinking that someone is their neighbor ... and that someone moved away a long time ago /mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed. /mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed. /mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:9] 2011-09-12 21:20:02,256 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-636150a5-4f0e-45b7-b400-24d8471a1c88 failed. Appears only in the logs for one node that is generating the issue.
172.16.12.10 Where do I find where the AntiEntropyService.getNeighbors(tablename, range) is pulling it's information from? On the two nodes that work: [default@system] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] [default@system] From the two nodes that don't work: [default@unknown] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] UNREACHABLE: [10.130.185.136] -- which is really 172.16.14.10 [default@unknown] Really now. Where does 10.130.185.136 exist? It's in none of the configurations I have AND the full ring has been shut down and started up ... not trying to give Vijay a hard time by posting here btw! Just thinking it could be something super silly ... that a wider audience has come across. -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
Ec2Snitch nodetool issue after upgrade to 0.8.5
Upgraded one ring that has four nodes from 0.8.0 to 0.8.5 with only one minor problem. It relates to Ec2Snitch when running a 'nodetool ring' from two of the four nodes. the rest are all working fine: Address DC Rack Status State Load Owns Token 148362247927262972740864614603570725035 172.16.12.11 ap-southeast1a Up Normal 1.58 MB 24.02% 1909554714494251628118265338228798 172.16.12.10 ap-southeast1a Up Normal 1.63 MB 22.11% 56713727820156410577229101238628035242 172.16.14.10 ap-southeast1b Up Normal 1.85 MB 33.33% 113427455640312821154458202477256070484 172.16.14.12 ap-southeast1b Up Normal 1.36 MB 20.53% 14836224792726297274086461460357072503 works ... on 2 nodes which happen to be on the 172.16.14.0/24 network. the nodes where the error appears are on the 172.16.12.0/24 network and this is what is shown when nodetool ring is run: Address DC Rack Status State Load Owns Token 148362247927262972740864614603570725035 172.16.12.11 ap-southeast1a Up Normal 1.58 MB 24.02% 1909554714494251628118265338228798 172.16.12.10 ap-southeast1a Up Normal 1.62 MB 22.11% 56713727820156410577229101238628035242 Exception in thread main java.lang.NullPointerException at org.apache.cassandra.locator.Ec2Snitch.getDatacenter(Ec2Snitch.java:93) at org.apache.cassandra.locator.DynamicEndpointSnitch.getDatacenter(DynamicEndpointSnitch.java:122) at org.apache.cassandra.locator.EndpointSnitchInfo.getDatacenter(EndpointSnitchInfo.java:49) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at 
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) I've stopped the node and started it ... still doesn't make a difference. I've also shut down all nodes in the ring so that it was fully offline, and then brought them all back up ... issue still persists on two of the nodes. 
There are no firewall rules restricting traffic between these nodes. For example, on a node where the ring throws the exception, I can still get netstats for the two hosts that don't show up:
nodetool -h 172.16.12.11 -p 9090 netstats 172.16.14.10
Mode: Normal
Nothing streaming to /172.16.14.10
Nothing streaming from /172.16.14.10
Pool Name    Active  Pending  Completed
Commands        n/a        0          3
Responses       n/a        1       1483
nodetool -h 172.16.12.11 -p 9090 netstats
Re: Ec2Snitch nodetool issue after upgrade to 0.8.5
maybe it's related to this ... https://issues.apache.org/jira/browse/CASSANDRA-3114 odd thing is, we haven't moved to Ec2Snitch ... been using it for quite a long time now ... On Sat, Sep 10, 2011 at 1:42 PM, Sasha Dolgy sdo...@gmail.com wrote: Upgraded one ring that has four nodes from 0.8.0 to 0.8.5 with only one minor problem. It relates to Ec2Snitch when running a 'nodetool ring' from two of the four nodes. the rest are all working fine: Address DC Rack Status State Load Owns Token 148362247927262972740864614603570725035 172.16.12.11 ap-southeast1a Up Normal 1.58 MB 24.02% 1909554714494251628118265338228798 172.16.12.10 ap-southeast1a Up Normal 1.63 MB 22.11% 56713727820156410577229101238628035242 172.16.14.10 ap-southeast1b Up Normal 1.85 MB 33.33% 113427455640312821154458202477256070484 172.16.14.12 ap-southeast1b Up Normal 1.36 MB 20.53% 14836224792726297274086461460357072503 works ... on 2 nodes which happen to be on the 172.16.14.0/24 network. the nodes where the error appears are on the 172.16.12.0/24 network and this is what is shown when nodetool ring is run: Address DC Rack Status State Load Owns Token 148362247927262972740864614603570725035 172.16.12.11 ap-southeast1a Up Normal 1.58 MB 24.02% 1909554714494251628118265338228798 172.16.12.10 ap-southeast1a Up Normal 1.62 MB 22.11% 56713727820156410577229101238628035242 Exception in thread main java.lang.NullPointerException at org.apache.cassandra.locator.Ec2Snitch.getDatacenter(Ec2Snitch.java:93) at org.apache.cassandra.locator.DynamicEndpointSnitch.getDatacenter(DynamicEndpointSnitch.java:122) at org.apache.cassandra.locator.EndpointSnitchInfo.getDatacenter(EndpointSnitchInfo.java:49) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) I've stopped the node and started it ... still doesn't make a difference. I've also shut down all nodes in the ring so that it was fully offline, and then brought them all back up ... issue still persists on two of the nodes. There are no firewall rules restricting traffic between these nodes. For example, on a node where the ring throws the exception, I can still get netstats for the two hosts that don't show up: nodetool -h 172.16.12.11 -p 9090
Re: Ec2Snitch nodetool issue after upgrade to 0.8.5
Of course. Hoping one day I create an issue related to Ec2 that CAN be reproduced... https://issues.apache.org/jira/browse/CASSANDRA-3175 On Sat, Sep 10, 2011 at 10:10 PM, Jonathan Ellis jbel...@gmail.com wrote: Can you create a Jira ticket?
Re: Is Cassandra suitable for this use case?
You can chunk the files into pieces and store the pieces in Cassandra... Munge all the pieces back together when delivering back to the client... On Aug 25, 2011 6:33 PM, Ruby Stevenson ruby...@gmail.com wrote: hi Evgeny I appreciate the input. The concern with HDFS is that it has its own share of problems - its name node, which is essentially a metadata server, loads all file information into memory (roughly 300 MB per million files) and its failure handling is far less attractive ... on top of configuring and maintaining two separate components and two APIs for handling data. I am still holding out hope that there might be some better way to go about it? Best Regards, Ruby On Thu, Aug 25, 2011 at 11:10 AM, Evgeniy Ryabitskiy evgeniy.ryabits...@wikimart.ru wrote: Hi, If you want to store files with partitioning/replication, you could use a Distributed File System (DFS), like http://hadoop.apache.org/hdfs/ or any other: http://en.wikipedia.org/wiki/Distributed_file_system Still, you could use Cassandra to store any metadata and the file path in DFS. So: Cassandra + HDFS would be my solution. Evgeny.
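The chunk-and-munge approach can be sketched generically; this is a sketch independent of any particular client library, and the 64 KB chunk size and the '<file_id>:<index>' key scheme are assumptions, not anything from the thread:

```python
CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune to whatever column size you're comfortable with


def split_into_chunks(blob, chunk_size=CHUNK_SIZE):
    """Split a file's bytes into ordered pieces.

    Piece i would then be written under a column/key such as
    '<file_id>:<i>' using whichever client you prefer.
    """
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]


def reassemble(chunks):
    """Munge the pieces back together, in index order, when serving the file."""
    return b"".join(chunks)
```

Reading the file back means fetching the chunks in index order (e.g. a column slice over the file's row) before joining them.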
Re: Changing the CLI, not a great idea!
Unfortunately, the perception that I have as a business consumer and night-time hack, is that more importance and effort is placed on ensuring information is up to date and correct on the http://www.datastax.com/docs/0.8/index website and less on keeping the wiki up to date or relevant... which forces people to be introduced to a for-profit company to get relevant information ... which just so happens to employ a substantial amount of Apache Cassandra contributors ... not that there's anything wrong with that, right? On Thu, Jul 28, 2011 at 10:46 AM, David Boxenhorn da...@citypath.com wrote: This is part of a much bigger problem, one which has many parts, among them: 1. Cassandra is complex. Getting a gestalt understanding of it makes me think I understand how Alzheimer's patients must feel. 2. There is no official documentation. Perhaps everything is out there somewhere, who knows? 3. Cassandra is a moving target. Books are out of date before they hit the press. 4. Most of the important knowledge about Cassandra exists in a kind of oral history, that is hard to keep up with, and even harder to understand once it's long past. I think it is clear that we need a better one-stop-shop for good documentation. What hasn't been talked about much - but I think it's just as important - is a good one-stop-shop for Cassandra's oral history. (You might think this list is the place, but it's too noisy to be useful, except at the very tip of the cowcatcher. Cassandra needs a canonized version of its oral history.)
Re: Equalizing nodes storage load
are you trying to balance load or owns? owns looks fine ... 33.33% each ... which to me says balanced. how did you calculate your tokens? On Fri, Jul 22, 2011 at 4:37 PM, Mina Naguib mina.nag...@bloomdigital.com wrote: Address Status State Load Owns Token xx.xx.x.105 Up Normal 41.98 GB 33.33% 37809151880104273718152734159085356828 xx.xx.x.107 Up Normal 59.4 GB 33.33% 94522879700260684295381835397713392071 xx.xx.x.18 Up Normal 74.65 GB 33.33% 151236607520417094872610936636341427313
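For reference, balanced initial tokens with RandomPartitioner are usually computed as i * 2**127 / N; a quick sketch (the ring above looks evenly spaced, apparently with a common offset added to every token, which is fine as long as the spacing is equal):

```python
def initial_tokens(node_count, offset=0):
    """Evenly spaced RandomPartitioner tokens (token space is 0 .. 2**127 - 1)."""
    ring_size = 2 ** 127
    return [(i * ring_size // node_count + offset) % ring_size
            for i in range(node_count)]
```

For 3 nodes without an offset this yields 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485; the middle value matches the token that appears in the other rings quoted in this digest. Equal "Owns" means the token spacing is balanced, but "Load" can still differ if keys or row sizes are skewed.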
Re: Cassandra training in Bangalore, India
I am quite certain if you find enough people and pony up the fees a few people on this list would be willing to make the journey... On Jul 21, 2011 8:02 AM, samal sa...@wakya.in wrote: As per my knowledge, there is no such expert training available in India as of now. As Sameer said, there is enough online material available from which you can learn. I have been playing with Cassandra since the beginning. We can plan a Meetup/learning session near the Mumbai/Pune region.
Re: Need help json2sstable
You are missing after On Wed, Jul 20, 2011 at 8:03 AM, Nilabja Banerjee nilabja.baner...@gmail.com wrote: Hi All, Here is my JSON structure. {Fetch_CC :{ cc:{ :1000, :ICICI, :, city:{ name:banglore }; }; } If the structure is incorrect, please give me one small structure to use the below utility. I am using the 0.7.5 version. Now how can I use the json2sstable utility? Please provide me the steps. What are the things I have to configure? Thank You -- Sasha Dolgy sasha.do...@gmail.com
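For what it's worth, json2sstable does not take arbitrary nested JSON; in the 0.7 era it expects the same layout sstable2json produces. Treat the following as a hedged sketch (row key, column names and values are hex-encoded here, e.g. "726f7731" is "row1") and compare it against the output of sstable2json on a real SSTable from your keyspace:

```json
{
  "726f7731": [
    ["636f6c31", "76616c756531", 1311176819421000],
    ["636f6c32", "76616c756532", 1311176819421001]
  ]
}
```

Invocation is roughly bin/json2sstable -K <keyspace> -c <column_family> input.json <cf>-f-1-Data.db, but run bin/json2sstable with no arguments to confirm the flags in your version. A nested structure like the city object above would need to be flattened into columns (or serialized into a single column value) first.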
Re: best example of indexing
Examples exist in the conf directory of the distribution... On Jul 20, 2011 11:48 AM, CASSANDRA learner cassandralear...@gmail.com wrote: Hi Guys, Can you please give me the best example of creating index on a column family. As I am completely new to this, Can you please give me a simple and good example.
Re: One node down but it thinks its fine...
any firewall changes? ping is fine ... but if you can't get from node(a) to nodes(n) on the specific ports... On Wed, Jul 13, 2011 at 6:47 PM, samal sa...@wakya.in wrote: Check that the seed IP is the same on all nodes and is not a loopback IP on the cluster. On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski ray.slakin...@gmail.com wrote: One of our nodes, which happens to be the seed, thinks it's up and all the other nodes are down. However, all the other nodes think the seed is down instead. The logs for the seed node show everything is running as it should be. I've tried restarting the node, turning on/off gossip and thrift, and nothing seems to get the node to see the rest of its ring as up and running. I have also tried restarting one of the other nodes, which had no effect on the situation. Below are the ring outputs for the seed and one other node in the ring, plus a ping to show that the seed can ping the other node. # bin/nodetool -h 0.0.0.0 ring Address Status State Load Owns Token 141784319550391026443072753096570088105 127.0.0.1 Up Normal 4.61 GB 16.67% 0 xx.xxx.30.210 Down Normal ? 16.67% 28356863910078205288614550619314017621 xx.xx.90.87 Down Normal ? 16.67% 56713727820156410577229101238628035242 xx.xx.22.236 Down Normal ? 16.67% 85070591730234615865843651857942052863 xx.xx.97.96 Down Normal ? 16.67% 113427455640312821154458202477256070484 xx.xxx.17.122 Down Normal ? 16.67% 141784319550391026443072753096570088105 # ping xx.xxx.30.210 PING xx.xxx.30.210 (xx.xxx.30.210) 56(84) bytes of data. 64 bytes from xx.xxx.30.210: icmp_req=1 ttl=61 time=0.299 ms 64 bytes from xx.xxx.30.210: icmp_req=2 ttl=61 time=0.287 ms ^C --- xx.xxx.30.210 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 0.287/0.293/0.299/0.006 ms # bin/nodetool -h xx.xxx.30.210 ring Address Status State Load Owns Token 141784319550391026443072753096570088105 xx.xxx.23.40 Down Normal ?
16.67% 0 xx.xxx.30.210 Up Normal 10.58 GB 16.67% 28356863910078205288614550619314017621 xx.xx.90.87 Up Normal 10.47 GB 16.67% 56713727820156410577229101238628035242 xx.xx.22.236 Up Normal 9.63 GB 16.67% 85070591730234615865843651857942052863 xx.xx.97.96 Up Normal 10.68 GB 16.67% 113427455640312821154458202477256070484 xx.xxx.17.122 Up Normal 10.18 GB 16.67% 141784319550391026443072753096570088105 -- Ray Slakinski -- Sasha Dolgy sasha.do...@gmail.com
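To the firewall point: ICMP working proves nothing about the ports gossip and thrift actually use. A rough bash sketch of checking real TCP reachability — the peer address is the placeholder from the thread, and 7000/9160 are the defaults of that era; adjust to your cassandra.yaml:

```shell
#!/bin/bash
# Check TCP connectivity to a peer on Cassandra's ports, not just ping.
# 7000 = gossip/storage, 9160 = thrift RPC (defaults; see cassandra.yaml).
check_port() {
  # bash's /dev/tcp pseudo-device: succeeds only if a TCP connect completes
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null && echo "open" || echo "closed"
}

peer="xx.xxx.30.210"   # placeholder from the thread; use your real node address
for port in 7000 9160; do
  echo "$peer:$port $(check_port "$peer" "$port")"
done
```

If ping succeeds but these report closed, a firewall or security-group rule is the likely culprit, which matches the symptom of nodes marking each other Down while the machines are reachable.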
Re: Survey: Cassandra/JVM Resident Set Size increase
I'll post more tomorrow ... However, we set up one node in a single-node cluster and have left it with no data. Reviewing memory consumption graphs, it increased daily until it gobbled (highly technical term) all memory. The system is now running just below 100% memory usage, which I find peculiar seeing that it is doing nothing, with no data and no peers. On Jul 12, 2011 3:29 PM, Chris Burroughs chris.burrou...@gmail.com wrote:

### Preamble

There have been several reports on the mailing list of the JVM running Cassandra using too much memory. That is, the resident set size exceeds (max java heap size + mmapped segments) and continues to grow until the process swaps, the kernel OOM killer comes along, or performance just degrades too far due to the lack of space for the page cache. It has been unclear from these reports whether there is a pattern. My hope here is that by comparing JVM versions, OS versions, JVM configuration, etc., we will find something. Thank you everyone for your time.

Some example reports:
- http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
- http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
- https://issues.apache.org/jira/browse/CASSANDRA-2868
- http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
- http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-problem-td6545642.html

For reference, theories include (in no particular order):
- memory fragmentation
- JVM bug
- OS/glibc bug
- direct memory
- swap induced fragmentation
- some other bad interaction of cassandra/jdk/jvm/os/nio-insanity

### Survey

1. Do you think you are experiencing this problem?
2. Why? (This is a good time to share a graph like http://www.twitpic.com/5fdabn or http://img24.imageshack.us/img24/1754/cassandrarss.png)
3. Are you using mmap? (If yes, be sure to have read http://wiki.apache.org/cassandra/FAQ#mmap , and explain how you have used pmap [or another tool] to rule out mmap and top deceiving you.)
4. Are you using JNA? Was mlockall successful (it's in the logs on startup)?
5. Is swap enabled? Are you swapping?
6. What version of Apache Cassandra are you using?
7. What is the earliest version of Apache Cassandra you recall seeing this problem with?
8. Have you tried the patch from CASSANDRA-2654?
9. What JVM and version are you using?
10. What OS and version are you using?
11. What are your JVM flags?
12. Have you tried limiting direct memory (-XX:MaxDirectMemorySize)?
13. Can you characterise how much GC your cluster is doing?
14. Approximately how many reads/writes per unit time is your cluster doing (per node or the whole cluster)?
15. How are your column families configured (key cache size, row cache size, etc.)?
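On the pmap question, here is a sketch of the kind of evidence worth attaching to a survey answer — the CassandraDaemon process-name pattern is an assumption, and it falls back to the current shell just so the commands run somewhere:

```shell
#!/bin/bash
# Separate real heap/native usage from mmapped SSTable pages before
# concluding the JVM is leaking.
pid=$(pgrep -f CassandraDaemon 2>/dev/null | head -n 1)
pid=${pid:-$$}   # fall back to this shell so the commands still illustrate

# Total resident set size in KB, as top/free would report it
ps -o rss= -p "$pid"

# Per-mapping breakdown; file-backed maps (SSTable Data.db files, etc.) are
# page cache the kernel can reclaim, not leaked memory
if command -v pmap >/dev/null 2>&1; then
  pmap -x "$pid" | tail -n 2
fi
```

If RSS minus the file-backed mappings is still well above -Xmx plus a plausible amount of direct/native memory, then mmap is ruled out and the survey's other theories are in play.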
Re: Storing counters in the standard column families along with non-counter columns ?
No, it's not possible. To achieve it, there are two options ... contribute to the issue or wait for it to be resolved ... https://issues.apache.org/jira/browse/CASSANDRA-2614 -sd On Sun, Jul 10, 2011 at 5:04 PM, Aditya Narayan ady...@gmail.com wrote: Is it now possible to store counters in the standard column families along with non counter type columns ? How to achieve this ?
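Until CASSANDRA-2614 lands, the usual workaround is a dedicated counter CF keyed identically to the standard CF that holds the non-counter columns. A cassandra-cli sketch with made-up names:

```
create column family UserProfiles
    with comparator = UTF8Type;

create column family UserStats
    with default_validation_class = CounterColumnType
    and comparator = UTF8Type;

set UserProfiles['user42']['name'] = 'Aditya';
incr UserStats['user42']['page_views'];
```

Sharing the row key ('user42') across both CFs keeps the two reads cheap and the relationship obvious, at the cost of two round trips instead of one.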
Re: Repair doesn't work after upgrading to 0.8.1
This is the same behavior I reported in 2768 as Aaron referenced ... What was suggested for us was to do the following: - Shut down the entire ring - When you bring up each node, do a nodetool repair That didn't immediately resolve the problems. In the end, I backed up all the data, removed the keyspace and created a new one. That seemed to have solved our problems. That was from 0.7.6-2 to 0.8.0 However, in the issue reported, it was unable to be reproduced ... I'd be curious to know how Hector's keyspace is defined. Ours at the time was RF=3 and using Ec2 snitch... -sd On Fri, Jul 1, 2011 at 9:22 AM, Sylvain Lebresne sylv...@datastax.com wrote: Héctor, when you say I have upgraded all my cluster to 0.8.1, from which version was that: 0.7.something or 0.8.0 ? If this was 0.8.0, did you run successful repair on 0.8.0 previous to the upgrade ?
DataStax Brisk
How far behind is Brisk from the Cassandra release cycle? If 0.8.1 of Cassandra was released yesterday, when (if it isn't already) will the Brisk distribution implement 0.8.1? -sd -- Sasha Dolgy sasha.do...@gmail.com
Re: advice for EC2 deployment
are you able to open a connection from one of the nodes to a node on the other side? us-east to us-west? could your problem be as simple as connectivity and/or security group configuration? On Thu, Jun 23, 2011 at 1:51 PM, pankaj soni pankajsoni0...@gmail.com wrote: hey, I have got my ec2 multi-dc across AZ's but in same region us-east. Now I am trying to deploy cassandra over multiple regions that is ec2 us west, singapore and us-east. I have edited the config file as sasha's reply below. though when I run nodetool in each DC, I only see the nodes from that region. That is EC2 US west is showing only 2 nodes which are up in that region but not the other 2 which are there in US-east. Kindly suggest a solution. -thanks On Wed, Apr 27, 2011 at 5:45 PM, Sasha Dolgy sdo...@gmail.com wrote: Hi, If I understand you correctly, you are trying to get a private ip in us-east speaking to the private ip in us-west. to make your life easier, configure your nodes to use hostname of the server. if it's in a different region, it will use the public ip (ec2 dns will handle this for you) and if it's in the same region, it will use the private ip. this way you can stop worrying about if you are using the public or private ip to communicate with another node. let the aws dns do the work for you. just make sure you are using v0.8 with SSL turned on and have the appropriate security group definitions ... -sasha On Wed, Apr 27, 2011 at 1:55 PM, pankajsoni0126 pankajsoni0...@gmail.com wrote: I have been trying to deploy Cassandra cluster across regions and for that I posted this IP address resolution in MultiDC setup. But when it is to get nodes talking to each other on different regions say, us-east and us-west over private IP's of EC2 nodes I am facing problems. I am assuming if Cassandra is built for multi-DC setup it should be easily deployed with node1's DC1's public IP listed as seed in all nodes in DC2 and to gain idea about network topology? 
I have hit a dead end deploying in such a scenario. Or is there any way to use private IPs for such a scenario in EC2, as public IPs are less secure and costly? -- Sasha Dolgy sasha.do...@gmail.com
Re: How to create data model from RDBMS ERD
you can create the inverted index in the same CF ... just means you would have potentially lots more rows ... do you have a use-case or hypothetical you can share? if not ... here's one. http://code.google.com/p/oauth-php it has a suggested RDBMS model http://oauth-php.googlecode.com/svn/trunk/library/store/mysql/mysql.sql how would you model that? self serving as it's my plan today / tomorrow On Thu, Jun 23, 2011 at 6:43 PM, mcasandra mohitanch...@gmail.com wrote: How should one go about creating a data model from an RDBMS ER diagram into a Big Table data model? For example, an RDBMS has many indexes required for queries, and I think this is the most important aspect when designing the data model in Big Table. I was initially planning to denormalize into one CF and use secondary indexes. However, I also read that creating secondary indexes has a performance impact. So the other option is to create an inverted index. But it also seems bad to have too many CFs. We have requirements to support a high volume of at least 500 writes + 500 reads per sec. What would you advise?
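A cassandra-cli sketch of the same-CF inverted index idea (CF and names hypothetical): the forward row holds the entity's data, and a second row keyed by the indexed value holds the matching row keys as column names. Your client should issue both writes in the same batch so the index can't drift far from the data:

```
set Users['user42']['hometown'] = 'cheektowaga';
set Users['hometown:cheektowaga']['user42'] = '';
```

Looking up "everyone in cheektowaga" is then a single slice of the row Users['hometown:cheektowaga'] — which is exactly the "lots more rows" trade-off mentioned above.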
Re: advice for EC2 deployment
we use a combination of Vyatta and OpenVPN on the nodes that are EC2 and the nodes that aren't EC2 ... works a treat. On Thu, Jun 23, 2011 at 10:23 PM, Sameer Farooqui cassandral...@gmail.com wrote: EC2Snitch doesn't currently support multi-Regions in Amazon. Tickets to track: https://issues.apache.org/jira/browse/CASSANDRA-2452 https://issues.apache.org/jira/browse/CASSANDRA-2491 Let us know if/how you get the OpenVPN connection to work across Regions. On Thu, Jun 23, 2011 at 6:29 AM, pankajsoni0126 pankajsoni0...@gmail.com wrote: No, the nodes in the separate DCs are able to discover each other. But across the DCs it's not happening. I have double checked the config parameters, both those required in the Amazon settings and cassandra.yaml, before posting the query here. Has anybody got their nodes talking to each other across regions by just using public-dns? I am also looking into OpenVPN and how to deploy it.
Re: Storing files in blob into Cassandra
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Storing-photos-images-docs-etc-td6078278.html Of significance from that link (which was great until feeling lucky was removed...): Google of terms cassandra large files + feeling lucky http://www.google.com/search?q=cassandra+large+files&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a Yields: http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage --- store your images / documents / etc. somewhere and reference them in Cassandra. That's the consensus that's been bandied about on this list quite frequently. we employ a solution that uses Amazon S3 for storage and Cassandra as the reference to the metadata and location of the files. works a treat. On Wed, Jun 22, 2011 at 9:07 AM, Damien Picard picard.dam...@gmail.com wrote: Hi, I have to store some files (images, documents, etc.) for my users in a webapp. I use Cassandra for all of my data and I would like to know if it is a good idea to store these files as blobs in a Cassandra CF? Are there any contraindications, or special things to know, to achieve this? Thank you
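If you do decide to keep blobs in Cassandra anyway, the standard advice is to chunk them so no single column blows past the thrift frame size. The chunk/reassemble round trip is just this — 64 KB is an arbitrary chunk size, and the files here are throwaway stand-ins:

```shell
#!/bin/bash
# Demonstrate splitting a blob into fixed-size chunks and reassembling it;
# each chunk file would become one column (chunk.aa, chunk.ab, ...) in a row
# keyed by the blob's identifier.
workdir=$(mktemp -d)
cd "$workdir"

head -c 200000 /dev/urandom > original.bin   # a fake "uploaded" file
split -b 65536 original.bin chunk.           # 64 KB chunks -> would-be columns
cat chunk.* > reassembled.bin                # what a reader would do, in order
cmp original.bin reassembled.bin && echo "chunks reassemble cleanly"
```

The lexicographic chunk suffixes matter: a comparator that sorts column names the same way gives you the chunks back in the right order with a plain slice.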
Re: solandra or pig or....?
First, thanks everyone for the input. Appreciate it. The number crunching would already have been completed, and all statistics per game defined, and inserted into the appropriate CF/row/cols ... So, that being said, Solandra appears to be the right way to go ... except, this would require that my current application(s) be rewritten to consume Solandra and no longer Cassandra ... Your application isn't aware of Cassandra only Solr. or can I have the best of both worlds? Search is only one aspect of the consumer experience. If a consumer wanted to view a 'card' for a baseball player, all the information would be retrieved directly from Cassandra to build that card and search wouldn't be required... -sd On Tue, Jun 21, 2011 at 9:50 PM, Jake Luciani jak...@gmail.com wrote: Right, Solr will not do anything other than basic aggregations (facets) and range queries. On Tue, Jun 21, 2011 at 3:16 PM, Dan Kuebrich dan.kuebr...@gmail.com wrote: Solandra is indeed distributed search, not distributed number-crunching. As a previous poster said, you could imagine structuring the data in a series of documents with fields containing playername, teamname, position, location, day, time, inning, at bat, outcome, etc. Then you could query to get a slice of the data that matches your predicate and run statistics on that subset. The statistics would have to come from other code (eg. R), but solr will filter it for you. So, this approach only works if the slices are reasonably small, but gives you great granularity on search as long as you put all the info in. The users of this datastore (or you) must be willing to write their own simple aggregation functions (show me only the unique player names returned by this solr query, show me the average of field X returned by this solr query, ...) If the numbers of results are too great, MR may be the way to go.
Re: OOM (or, what settings to use on AWS large?)
We had a similar problem last month and found that the OS eventually killed the Cassandra process on each of our nodes ... I've upgraded to 0.8.0 from 0.7.6-2 and have not had the problem since, but I do see consumption levels rising consistently from one day to the next on each node .. On Wed, Jun 1, 2011 at 2:30 PM, Sasha Dolgy sdo...@gmail.com wrote: is there a specific string I should be looking for in the logs that isn't super obvious to me at the moment... On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis jbel...@gmail.com wrote: The place to start is with the statistics Cassandra logs after each GC. look for GCInspector I found this in the logs on all my servers but never did much after that On Wed, Jun 22, 2011 at 2:33 PM, William Oberman ober...@civicscience.com wrote: I woke up this morning to all 4 of my cassandra instances reporting they were down in my cluster. I quickly started them all, and everything seems fine. I'm doing a postmortem now, but it appears they all OOM'd at roughly the same time, which was not reported in any cassandra log, but I discovered something in /var/log/kern that showed java died of oom(*). In amazon, I'm using large instances for cassandra, and they have no swap (as recommended), so I have ~8GB of ram. Should I use a different max mem setting? I'm using a stock rpm from riptano/datastax.
If I run ps -aux I get: /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms3843M -Xmx3843M -Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Djava.rmi.server.hostname=X.X.X.X -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dmx4jaddress=0.0.0.0 -Dmx4jport=8081 -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp :/etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.1.3.jar:/usr/share/cassandra/lib/apache-cassandra-0.7.4.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-collections-3.2.1.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.1.jar:/usr/share/cassandra/lib/guava-r05.jar:/usr/share/cassandra/lib/high-scale-lib.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jetty-6.1.21.jar:/usr/share/cassandra/lib/jetty-util-6.1.21.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jug-2.0.0.jar:/usr/share/cassandra/lib/libthrift-0.5.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon (*) Also, why would they all OOM so 
close to each other? Bad luck? Or once the first node went down, is there an increased chance of the rest? I'm still on 0.7.4, when I released cassandra to production that was the latest release. In addition to (or instead of?) fixing memory settings, I'm guessing I should upgrade. will
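For anyone doing the same postmortem: the two greps that matter are GCInspector in Cassandra's system.log and the OOM killer in the kernel log. Sketched here against a sample file, since log paths vary by install and the two lines below are fabricated approximations of the real formats:

```shell
#!/bin/bash
# Postmortem greps, demonstrated on sample log lines (not real output).
log=$(mktemp)
cat > "$log" <<'EOF'
 INFO [ScheduledTasks:1] 2011-06-22 06:10:12,345 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 2312 ms, 1033212312 reclaimed leaving 3511231232 used
Jun 22 06:12:01 ip-10-0-0-1 kernel: Out of memory: Killed process 1234 (java)
EOF

# Long CMS pauses mean the heap was under pressure well before any OOM
grep "GCInspector" "$log"

# The smoking gun for "all nodes just disappeared": the kernel, not cassandra
grep -i "killed process" "$log"
```

In a real cluster you would run these against /var/log/cassandra/system.log* and /var/log/kern* (or wherever your install writes them).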
Re: OOM (or, what settings to use on AWS large?)
Yes ... this is because it was the OS that killed the process, and wasn't related to Cassandra crashing. Reviewing our monitoring, we saw that memory utilization was pegged at 100% for days and days before the process was finally killed because 'apt' was fighting for resources. At least, that's as far as I got in my investigation before giving up, moving to 0.8.0 and implementing a 24hr nodetool repair on each node via cronjob. So far ... no problems. On Wed, Jun 22, 2011 at 2:49 PM, William Oberman ober...@civicscience.com wrote: Well, I managed to run 50 days before an OOM, so any changes I make will take a while to test ;-) I've seen the GCInspector log lines appear periodically in my logs, but I didn't see a correlation with the crash. I'll read the instructions on how to properly do a rolling upgrade today, practice on test, and try that on production first. will
Re: No Transactions: An Example
I'd implement the concept of a bank account using counters in a counter column family. one row per account ... each column for transaction data and one column for the actual balance. just so long as you use whole numbers ... no one needs pennies anymore. -sd On Wed, Jun 22, 2011 at 4:18 PM, Trevor Smith tre...@knewton.com wrote: Hello, I was wondering if anyone had architecture thoughts of creating a simple bank account program that does not use transactions. I think creating an example project like this would be a good thing to have for a lot of the discussions that pop up about transactions and Cassandra (and non-transactional datastores in general). Consider the simple system that has accounts, and users can transfer money between the accounts. There are these interesting papers as background (links below). Thank you. Trevor Smith http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-components-postattachments/00-09-20-52-14/BuildingOnQuicksand_2D00_V3_2D00_081212h_2D00_pdf.pdf http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
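A cassandra-cli sketch of that model, with CF and account names made up. Note the two counter updates are separate operations — Cassandra won't make them atomic together, which is exactly the gap the rest of this thread pokes at:

```
create column family Accounts
    with default_validation_class = CounterColumnType
    and comparator = UTF8Type;

decr Accounts['acct-alice']['balance'] by 500;
incr Accounts['acct-bob']['balance'] by 500;
```

Whole numbers only, as noted: counters are integers, so store cents (or don't bother with pennies).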
Re: Storing Accounting Data
but you can store the -details- of a transaction as json data and do some sanity checks to validate that the data you currently have stored aligns with the recorded transactions. maybe a batch job run every 24 hours ... On Wed, Jun 22, 2011 at 4:19 PM, Oleg Anastastasyev olega...@gmail.com wrote: Is C* suitable for storing customer account (financial) data, as well as billing, payroll, etc? This is a new company so migration is not an issue... starting from scratch. If you need only store them - then yes, but if you require transactions spanning multiple rows or column families, which i believe will be main functionality here - then definitely no, because cassandra has no ACID, no transactions spanning multiple rows and no ability to rollback.
Re: No Transactions: An Example
I would still maintain a record of the transaction ... so that I can do analysis afterwards to determine if/when problems occurred ... On Wed, Jun 22, 2011 at 4:31 PM, Trevor Smith tre...@knewton.com wrote: Sasha, How would you deal with a transfer between accounts in which only one half of the operation was successfully completed? Thank you. Trevor
Re: 99.999% uptime - Operations Best Practices?
Implement monitoring and be proactive ... that will stop you waking up to a big surprise. I'm sure there were symptoms leading up to all 4 nodes going down. Willing to wager that each node went down at a different time and not all went down at once... On Jun 22, 2011 11:50 PM, Les Hazlewood l...@katasoft.com wrote: I understand that every environment is different and it always 'depends' :) But recommending settings and techniques based on an existing real production environment (like the user's suggestion to run nodetool repair as a regular cron job) is always a better starting point for a new Cassandra evaluator than having to start from scratch. Ryan, do you have any 'seed' settings that you guys use for nodes at Twitter? Are there any resources/write-ups beyond the two I've listed already that address some of these 'gotchas'? If those two links are in fact the ideal starting point, that's fine - but it appears that this may not be the case however based on the aforementioned user as well as the other who helped him who saw similar warning signs. I'm hoping for someone to dispel these reports based on what people actually do in production today. Any info/settings/recommendations based on real production environments would be appreciated! Thanks again, Les
Re: OOM (or, what settings to use on AWS large?)
http://www.twitpic.com/5fdabn http://www.twitpic.com/5fdbdg i do love a good graph. two of the weekly memory utilization graphs for 2 of the 4 servers from this ring... week 21 was a nice week ... the week before 0.8.0 went out proper. since then, bumped up to 0.8 and have seen a steady increase in the memory consumption (used) but have not seen the swap do what it did ...and the buffered/cached seems much better -sd On Thu, Jun 23, 2011 at 12:09 AM, Chris Burroughs chris.burrou...@gmail.com wrote: In `free` terms, by pegged do you mean that free Mem was 0, or -/+ buffers/cache as 0?
Re: OOM (or, what settings to use on AWS large?)
yes. each one corresponds with taking a node down for various reasons. i think more people should show their graphs. it's great. hoping Oberman has some ... so we can see what his look like. On Thu, Jun 23, 2011 at 12:40 AM, Chris Burroughs chris.burrou...@gmail.com wrote: Do all of the reductions in Used on that graph correspond to node restarts? My Zabbix for reference: http://img194.imageshack.us/img194/383/2weekmem.png On 06/22/2011 06:35 PM, Sasha Dolgy wrote: http://www.twitpic.com/5fdabn http://www.twitpic.com/5fdbdg i do love a good graph. two of the weekly memory utilization graphs for 2 of the 4 servers from this ring... week 21 was a nice week ... the week before 0.8.0 went out proper. since then, bumped up to 0.8 and have seen a steady increase in the memory consumption (used) but have not seen the swap do what it did ... and the buffered/cached seems much better -sd On Thu, Jun 23, 2011 at 12:09 AM, Chris Burroughs chris.burrou...@gmail.com wrote: In `free` terms, by pegged do you mean that free Mem was 0, or -/+ buffers/cache as 0? -- Sasha Dolgy sasha.do...@gmail.com
Re: Storing files in blob into Cassandra
maybe you want to spend a few minutes reading about Haystack over at facebook to give you some ideas... https://www.facebook.com/note.php?note_id=76191543919 Not saying what they've done is the right way... just sayin' On Thu, Jun 23, 2011 at 6:29 AM, AJ a...@dude.podzone.net wrote: I was thinking of doing the same thing. But, to compensate for the bandwidth usage during the read, I was hoping to find a way for the httpd or app server to cache the file either in RAM or on disk so subsequent reads could just reference the in-mem cache or local hdd. I have big data requirements, so duplicating the storage of file blobs by adding them to the hdd would almost double my storage requirements. So, the hdd cache would have to be limited, with the LRU removed periodically. I was thinking about making the key for each file be a relative file path as if it were on disk. This same path could also be used as its actual location on disk in the local disk cache. Using a path as the key makes it flexible in many ways if I ever change my mind and want to store all files on disk, or when backing up or archiving, etc. I'm rusty on my Apache httpd knowledge, but I also thought there was an Apache cache mod that would use both RAM and disk depending on the frequency of use. But I don't know if you can tell it to cache this blob like it's a file. Just some thoughts.
Re: port 8080
it's defined in $CASSANDRA_HOME/conf/cassandra-env.sh JMX_PORT= Have it different for each instance ... On Tue, Jun 21, 2011 at 1:24 PM, osishkin osishkin osish...@gmail.com wrote: I want to have several daemons running on a machine, each belonging to a multi-node cluster. Is that a problem with regard to port 8080, for JMX monitoring? Is it hardcoded somewhere, so that changing it in the configuration files is not enough? Thank you osi
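Concretely, something like this in each instance's copy of the file, one port per daemon (the port numbers here are arbitrary examples):

```shell
# instance 1: conf/cassandra-env.sh
JMX_PORT="8080"

# instance 2: conf/cassandra-env.sh
JMX_PORT="8081"
```

Then point the tools at the right instance, e.g. bin/nodetool -h localhost -p 8081 ring for the second daemon.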
Re: port 8080
Personally speaking, I do not run JMX on 8080, and never have. The tools, like cassandra-cli and nodetool, expect it to be on the default port, but you can override with -p or -jmxport -sd On Tue, Jun 21, 2011 at 1:33 PM, osishkin osishkin osish...@gmail.com wrote: I did, and everything seemed to work fine. But I saw a reference here http://www.onemanclapping.org/2010/03/running-multiple-cassandra-nodes-on.html that said make sure you have at least one node listening on 8080 since all the Cassandra tools assume JMX is listening there, and then remembered that I saw a warning regarding that port when we uploaded one of the machines. Unfortunately I don't have access to them currently, so I can't replicate it immediately. But I thought perhaps someone can dispel my fear that there is something special about that port On Tue, Jun 21, 2011 at 2:28 PM, Sasha Dolgy sdo...@gmail.com wrote: it's defined in $CASSANDRA_HOME/conf/cassandra-env.sh JMX_PORT= Have it different for each instance ... On Tue, Jun 21, 2011 at 1:24 PM, osishkin osishkin osish...@gmail.com wrote: I want to have several daemons running on a machine, each belonging to a multi-node cluster. Is that a problem with regard to port 8080, for JMX monitoring? Is it hardcoded somewhere, so that changing it in the configuration files is not enough? Thank you osi -- Sasha Dolgy sasha.do...@gmail.com
Re: pig integration NoClassDefFoundError TypeParser
bang on ... no idea why ... a new day a fresh login ... environment variables gone. working now with cassandra 0.8.0 and pig 0.8.1 went through all my steps and all is working ... except line 45 in the bin/pig_cassandra is not proper when there are multiple pig*.jar files. On Mon, Jun 20, 2011 at 10:03 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: I think you might be having environment/classpath issues with an RC of cassandra 0.8 or something.
solandra or pig or....?
Folks, Simple question ... Assuming my current use case is the ability to log lots of trivial and seemingly useless sports statistics ... I want a user to be able to query / compare. For example: -- Show me all baseball players in Cheektowaga and Ontario, California who have hit a grand slam on Tuesdays where it was a leap year. Each baseball player is represented by a single row in a CF: player_uuid, fullname, hometown, game1, game2, game3, game4 Games are UUIDs that reference another row in the same CF that provides information about that game... location, final score, date (unix timestamp or ISO format), and statistics, which are represented as a new column timestamp:player_uuid I can use Pig, as I understand, to run a query to generate specific information about specific things and populate that data back into Cassandra in another CF ... similar to the hypothetical search above. As the information is structured already, I assume Pig is the right tool for the job, but it may not be ideal for a web application and enabling ad-hoc queries ... it could take anywhere from 2-? seconds for that query to generate, populate, and return to the user...? On the other hand, I have started to read about Solr / Solandra / Lucandra. Can this provide similar functionality or better? Or is it more geared towards full-text search and indexing ... I don't want to get into the habit of guessing what my potential users want to search for ... trying to think of ways to offload this to them. -- Sasha Dolgy sasha.do...@gmail.com
Re: solandra or pig or....?
Without getting overly complicated and long winded ... are there practical references / examples I can review that demonstrate the cassandra/solandra benefits? I had a quick look at https://github.com/tjake/Solandra/wiki/Solandra-Wiki and it wasn't dead obvious to me. On Tue, Jun 21, 2011 at 8:19 PM, Jake Luciani jak...@gmail.com wrote: Solandra can answer the question you used as an example, and it's more of a fit for low-latency ad-hoc reporting than Pig. Pig queries will take minutes, not seconds. On Tue, Jun 21, 2011 at 12:12 PM, Sasha Dolgy sdo...@gmail.com wrote: Folks, Simple question ... Assuming my current use case is the ability to log lots of trivial and seemingly useless sports statistics ... I want a user to be able to query / compare. For example: -- Show me all baseball players in Cheektowaga and Ontario, California who have hit a grand slam on Tuesdays where it was a leap year. Each baseball player is represented by a single row in a CF: player_uuid, fullname, hometown, game1, game2, game3, game4 Games are UUIDs that reference another row in the same CF that provides information about that game... location, final score, date (unix timestamp or ISO format), and statistics, which are represented as a new column timestamp:player_uuid I can use Pig, as I understand, to run a query to generate specific information about specific things and populate that data back into Cassandra in another CF ... similar to the hypothetical search above. As the information is structured already, I assume Pig is the right tool for the job, but it may not be ideal for a web application and enabling ad-hoc queries ... it could take anywhere from 2-? seconds for that query to generate, populate, and return to the user...? On the other hand, I have started to read about Solr / Solandra / Lucandra. Can this provide similar functionality or better? Or is it more geared towards full-text search and indexing ... I don't want to get into the habit of guessing what my potential users want to search for ... trying to think of ways to offload this to them. -- Sasha Dolgy sasha.do...@gmail.com -- http://twitter.com/tjake -- Sasha Dolgy sasha.do...@gmail.com
pig integration NoClassDefFoundError TypeParser
Been trying for the past little bit to get the PIG integration working with Cassandra 0.8.0

1. Downloaded the src for 0.8.0 and ran ant build
2. Went into contrib/pig and ran ant ... gives me: /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar and is copied into the lib/ directory
3. Downloaded pig-0.8.1, modified the ivy/libraries.properties so that it uses Jackson 1.8.2, and ran ant. It compiles and gives me two jars: pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar
   - I did try to run it with Jackson 1.4 as the contrib/pig/README.txt suggested, but that failed... The referenced JIRA ticket (PIG-1863) suggests 1.6.0 (still produces the same results)

Environment variables are set:
java version 1.6.0_24
PIG_INITIAL_ADDRESS=localhost
PIG_HOME=/usr/local/src/pig-0.8.1
PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
PIG_RPC_PORT=9160
CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src

I then start up cassandra ... no issues. I connect and create a new keyspace called foo with a column family called bar and a CF called foo... Inside the CF bar, I create a few rows with random columns. 4 rows. From contrib/pig I run:

bin/pig_cassandra -x local

... and immediately get the error:

[: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator

-- this is a reference to this line:

if [ ! -e $PIG_JAR ]; then

*** Problem here is that $PIG_JAR is a reference to two files ... pig-0.8.1-core.jar pig.jar ... Changing line 44 to PIG_JAR=$PIG_HOME/pig*core*.jar fixes this ...
(or even referencing $PIG_HOME/build/pig*core*.jar or just pig.jar Try again to run: bin/pig_cassandra -x local and everything loads up nicely: 2011-06-21 02:07:23,671 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log 2011-06-21 02:07:23,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:/// grunt register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar; register /usr/local/src/pig-0.8.1/pig.jar; register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar; register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar; register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar; grunt grunt rows = LOAD 'cassandra://foo/bar' USING CassandraStorage(); grunt STORE rows into 'cassandra://foo/foo' USING CassandraStorage(); 2011-06-21 02:04:53,271 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN 2011-06-21 02:04:53,271 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used. 2011-06-21 02:04:53,324 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId= 2011-06-21 02:04:53,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1 Operator Key: scope-1) 2011-06-21 02:04:53,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? 
false
2011-06-21 02:04:53,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-06-21 02:04:53,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-06-21 02:04:53,480 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-06-21 02:04:53,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-06-21 02:04:59,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-06-21 02:04:59,718 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,719 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-06-21 02:04:59,948 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,960 [Thread-5] INFO
Re: pig integration NoClassDefFoundError TypeParser
Hi ... I still have the same problem with pig-0.8.0-cdh3u0... Maybe I'm doing something wrong. Where does org/apache/cassandra/db/marshal/TypeParser exist, or where should it exist? It's not in $CASSANDRA_HOME/libs or /usr/local/src/pig-0.8.0-cdh3u0/lib or /usr/local/src/apache-cassandra-0.8.0-src/build/lib/jars.

for jar in `ls *.jar`
do
  jar -tf $jar | grep TypeParser
  if [ $? -eq 0 ]; then
    echo $jar
  fi
done

Shows me nothing in all the lib dirs. On Mon, Jun 20, 2011 at 8:44 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Try running with the cdh3u0 version of pig and see if it has the same problem. They backported the patch (to pig 0.9, which should be out in time for the hadoop summit next week) that adds the updated jackson dependency for avro. The download URL for that is http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz Alternatively, I believe brisk beta 2 will be out today, which has pig integrated. Not sure if that would work for your current environment though. See if that works.
Re: pig integration NoClassDefFoundError TypeParser
Yes ... I ran an ant in the root directory on a fresh download of 0.8.0 src: /usr/local/src/apache-cassandra-0.8.0-src# ls /usr/local/src/apache-cassandra-0.8.0-src/build/classes/main/org/apache/cassandra/db/marshal/ AbstractCommutativeType.class AbstractType.class LexicalUUIDType.class UTF8Type.class AbstractType$1.classAbstractUUIDType.class LocalByPartionerType.class UTF8Type$UTF8Validator.class AbstractType$2.classAsciiType.class LongType.class UTF8Type$UTF8Validator$State.class AbstractType$3.classBytesType.class MarshalException.class UUIDType.class AbstractType$4.classCounterColumnType.class TimeUUIDType.class AbstractType$5.classIntegerType.class UTF8Type$1.class /usr/local/src/apache-cassandra-0.8.0-src# find . | grep TypeParser /usr/local/src/apache-cassandra-0.8.0-src# echo $? 1 /usr/local/src/apache-cassandra-0.8.0-src# /usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError . /usr/local/src/apache-cassandra-0.8.0-src# echo $? 1 /usr/local/src/apache-cassandra-0.8.0-src# TypeParser does not exist...? On Mon, Jun 20, 2011 at 9:11 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: hmmm, did you build the cassandra src in the root of your cassandra directory with ant? sounds like it can't find that cassandra class. That's required.
Re: pig integration NoClassDefFoundError TypeParser
cassandra-0.8.0/src/java/org/apache/cassandra/db/marshal/TypeParser.java : doesn't exist
cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/TypeParser.java : exists...

Pig integration doesn't work with the 0.8.0 release, but will with 0.8.1 .. fair assumption? -- Sasha Dolgy sasha.do...@gmail.com
Re: cassandra crash
What type of environment? We had issues with our cluster on 0.7.6-2 ... The messages you see and highlighted, from what I recall, aren't bad ... they are good. Investigating our crash, it turned out that the OS killed our Cassandra process, and this was found in /var/log/messages. Since then, I have implemented a routine nodetool repair and upgraded to 0.8.0, which seems to have fixed the problem. Can you post specifics about your environment? Version, # of nodes, size, etc...? That generally helps people to guess better where your problems are (with respect to the crash you had...) -sd 2011/6/17 Donna Li donna...@utstar.com All: Can you find some exception from the last sentence? Would cassandra crash when memory is not enough? There are some other applications running with cassandra; the other applications may use large memory. -- From: Donna Li Sent: 2011-06-17 09:58 To: user@cassandra.apache.org Subject: cassandra crash All: Why did cassandra crash after printing the following log?

INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-206-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-207-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-137-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-205-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-139-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted
/usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-138-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-208-Data.db
INFO [GC inspection] 2011-06-16 14:22:59,562 GCInspector.java (line 110) GC for ParNew: 385 ms, 26859800 reclaimed leaving 117789112 used; max is 118784

Best Regards
Donna Li
Re: Cassandra.yaml
Hi Vivek, When I write client code in Java, using Hector, I don't specify a cassandra.yaml ... I specify the host(s) and keyspace I want to connect to. Alternatively, I specify the host(s) and create the keyspace if the one I would like to use doesn't exist (a new cluster, for example). At no point do I use a yaml file with my client code. The conf/cassandra.yaml is there to tell the cassandra server how to behave / operate when it starts ... -sd On Fri, Jun 17, 2011 at 9:55 AM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have a query: I have my Cassandra server running on my local machine and it has loaded Cassandra-specific settings from apache-cassandra-0.8.0-src/apache-cassandra-0.8.0-src/conf/cassandra.yaml Now if I am writing a java program to connect to this server, why do I need to provide a new cassandra.yaml file again? Even if the server is already up and running? Even if I can create keyspaces and columnfamilies programmatically? Isn't it some type of redundancy? Might be my query is a bit irrelevant. -Vivek
Re: Querying superColumn
Write two records per employee ...

1. [department1] = { Vivek : India }
2. [India] = { Vivek : department1 }

1. [department1] = { Vivs : USA }
2. [USA] = { Vivs : department1 }

Now you can query a single row to display all employees in USA or all employees in department1 ... If an employee moves to a new department in a new country, simply remove the column from that department row and country row and re-insert into the new rows... My understanding with Cassandra and similar technologies is that you aren't designing to be smart and avoid data duplication. You are designing to address the searches and queries based on your business requirements ... when you know what those are, you cheat and pre-populate the data you will be searching on ... On Fri, Jun 17, 2011 at 1:16 PM, Vivek Mishra vivek.mis...@impetus.co.in wrote: Correct. But that will not solve the issue of data colocation (data locality)? From: Sasha Dolgy [mailto:sdo...@gmail.com] Sent: Thursday, June 16, 2011 8:47 PM To: user@cassandra.apache.org Subject: Re: Querying superColumn Have 1 row with employee info for country/office/division, each column an employee id and json info about the employee, or a reference to another row id for that employee's data. No more supercolumn. On Jun 16, 2011 1:56 PM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have a question about querying super columns. For example: I have a supercolumnFamily DEPARTMENT with dynamic superColumn 'EMPLOYEE' (name, country). Now for rowKey 'DEPT1' I have inserted multiple super columns like: Employee1 { Name: Vivek country: India } Employee2 { Name: Vivs country: USA } Now if I want to retrieve a super column whose rowkey is 'DEPT1' and employee name is 'Vivek', can I get only 'EMPLOYEE1'? -Vivek Write to us for a Free Gold Pass to the Cloud Computing Expo, NYC to attend a live session by Head of Impetus Labs on 'Secrets of Building a Cloud Vendor Agnostic PetaByte Scale Real-time Secure Web Application on the Cloud '.
Looking to leverage the Cloud for your Big Data Strategy ? Attend Impetus webinar on May 27 by registering at http://www.impetus.com/webinar?eventid=42 . NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference. -- Sasha Dolgy sasha.do...@gmail.com
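The two-row pattern described above can be sketched with plain dicts standing in for Cassandra rows (a conceptual illustration only, not Hector or Thrift code; all names here are hypothetical):

```python
# Denormalization sketch: each employee is written into two "rows" --
# one keyed by department, one keyed by country -- so that either
# query ("everyone in department1", "everyone in USA") is a single-row lookup.
by_department = {}  # row key = department, columns = {employee: country}
by_country = {}     # row key = country,    columns = {employee: department}

def add_employee(name, department, country):
    by_department.setdefault(department, {})[name] = country
    by_country.setdefault(country, {})[name] = department

def move_employee(name, new_department, new_country):
    # Remove the old columns from every department/country row,
    # then re-insert under the new rows -- the "remove and re-insert" step.
    for cols in by_department.values():
        cols.pop(name, None)
    for cols in by_country.values():
        cols.pop(name, None)
    add_employee(name, new_department, new_country)

add_employee("Vivek", "department1", "India")
add_employee("Vivs", "department1", "USA")

print(sorted(by_department["department1"]))  # all employees in department1
print(sorted(by_country["USA"]))             # all employees in USA
```

The write path does double the work, but both business queries become one-row reads, which is the trade the email is describing.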
Re: getFieldValue()
A good example, from what I understand of using Hector / pycassa / etc.: if you wanted connection pooling, you would have to craft your own solution, versus using the tested, ready-to-go solution provided by Hector. Thrift doesn't provide native connection pooling ...? There are a few scenarios / examples where using a library that abstracts the Thrift bindings will make your life easier ... and they are generally maintained and kept up to date in alignment with new releases of Cassandra. That's a +1 for me ... Nothing stops you from using Thrift .. it depends on how much work you want to implement yourself. -sd On Fri, Jun 17, 2011 at 5:30 PM, Markus Wiesenbacher | Codefreun.de m...@codefreun.de wrote: One question regarding point 2: Why should we always use Hector? Thrift is not that bad.
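To make the "craft your own solution" point concrete, even the simplest piece of a client-side pool — picking the next host — is code you'd have to write and maintain yourself with raw Thrift. A toy sketch (hypothetical class, no real sockets; libraries like Hector or pycassa handle this, plus health checks and reconnects, for you):

```python
import itertools

class RoundRobinPool:
    """Toy round-robin host selector, standing in for the connection
    pooling a higher-level client library provides out of the box."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("need at least one host")
        # itertools.cycle repeats the host list forever.
        self._cycle = itertools.cycle(hosts)

    def get_host(self):
        # A real pool would also track borrowed connections, retry on
        # failure, and evict dead hosts -- none of that is shown here.
        return next(self._cycle)

pool = RoundRobinPool(["192.168.1.115:9160", "192.168.1.110:9160"])
picks = [pool.get_host() for _ in range(4)]
```

This covers only host rotation; failover, timeouts, and connection reuse are where hand-rolled pooling gets genuinely hard.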
Re: Docs: Token Selection
+1 for this if it is possible... On Fri, Jun 17, 2011 at 6:31 PM, Eric tamme eta...@gmail.com wrote: What I don't like about NTS is I would have to have more replicas than I need. {DC1=2, DC2=2}, RF=4 would be the minimum. If I felt that 2 local replicas was insufficient, I'd have to move up to RF=6 which seems like a waste... I'm predicting data in the TB range so I'm trying to keep replicas to a minimum. My goal is to have 2-3 replicas in a local data center and 1 replica in another dc. I think that would be enough barring a major catastrophe. But, I'm not sure this is possible. I define local as in the same data center as the client doing the insert/update. Yes, not being able to configure the replication factor differently for each data center is a bit annoying. Im assuming you basically want DC1 to have a replication factor of {DC1:2, DC2:1} and DC2 to have {DC1:1,DC2:2}. I would very much like that feature as well, but I dont know the feasibility of it. -Eric
Re: urgent how to specify multiple hosts in cassandra
Have them all within a quoted string, and not as multiple comma-separated entries; for example: seeds: "192.168.1.115, 192.168.1.110" versus what you have... On Fri, Jun 17, 2011 at 7:00 PM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I specified multiple hosts in the seeds field when using cassandra-0.8, like this: seeds: 192.168.1.115,192.168.1.110,192.168.1.113 But I am getting the error: while parsing a block mapping in reader, line 106, column 13: - seeds: 192.168.1.115,192.168. ... ^ expected block end, but found FlowEntry in reader, line 106, column 35: - seeds: 192.168.1.115,192.168.1.110,192.168.1.113 ... ^ Please suggest; I am doing an upgrade right now. Thanks Anurag -- Sasha Dolgy sasha.do...@gmail.com
Re: Docs: Token Selection
Replication factor is defined per keyspace, if I'm not mistaken. Can't remember if NTS is per keyspace or per cluster ... if it's per keyspace, that would be a way around it ... without having to maintain multiple clusters, just have multiple keyspaces ... On Fri, Jun 17, 2011 at 9:23 PM, AJ a...@dude.podzone.net wrote: On 6/17/2011 12:32 PM, Jeremiah Jordan wrote: Run two clusters, one which has {DC1:2, DC2:1} and one which is {DC1:1, DC2:2}. You can't have both in the same cluster, otherwise it isn't possible to tell where the data got written when you want to read it. For a given key XYZ you must be able to compute which nodes it is stored on just using XYZ, so a strategy where it is on nodes DC1_1, DC1_2, and DC2_1 when a node in DC1 is the coordinator, and on DC1_1, DC2_1 and DC2_2 when a node in DC2 is the coordinator, won't work. Given just XYZ I don't know where to look for the data. But, from the way you describe what you want to happen, clients from DC1 aren't using data inserted by clients from DC2, so you should just make two different Cassandra clusters. One for the DC1 guys, which is {DC1:2, DC2:1}, and one for the DC2 guys, which is {DC1:1, DC2:2}. Interesting. Thx. -- Sasha Dolgy sasha.do...@gmail.com
Re: sstable2json2sstable bug with json data stored
The JSON you are showing below is an export from cassandra? { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] } Does this work? { "74657374": [["data", "{\"foo\":\"bar\"}", 1308209845388000]] } -sd On Thu, Jun 16, 2011 at 9:49 AM, Timo Nentwig timo.nent...@toptarif.de wrote: On 6/15/11 17:41, Timo Nentwig wrote: (json can likely be boiled down even more...) Any JSON (well, probably anything with quotes...) breaks it: { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] } [default@foo] set transactions[test][data]='{"foo":"bar"}'; I feared that storing data in a readable fashion would be a fateful idea. https://issues.apache.org/jira/browse/CASSANDRA-2780
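The underlying issue is JSON stored inside JSON: in a valid export, the inner quotes of the column value must come out backslash-escaped, or the file cannot be parsed back. A quick check of what correct escaping looks like (plain Python here, not the sstable2json code path):

```python
import json

# The column value is itself a JSON string ...
column_value = '{"foo":"bar"}'

# ... so a well-formed export must escape its quotes when embedding it.
export = json.dumps({"74657374": [["data", column_value, 1308209845388000]]})

# Round-trip: parse the export, then parse the stored value back out of it.
restored = json.loads(export)
inner = json.loads(restored["74657374"][0][1])
```

If the export emits the inner quotes unescaped (as in the broken sample above), json.loads on it fails, which is the sstable2json/json2sstable round-trip bug being reported.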
Re: Querying superColumn
Have 1 row with employee info for country/office/division, each column an employee id and json info about the employee, or a reference to another row id for that employee's data. No more supercolumn. On Jun 16, 2011 1:56 PM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have a question about querying super columns. For example: I have a supercolumnFamily DEPARTMENT with dynamic superColumn 'EMPLOYEE' (name, country). Now for rowKey 'DEPT1' I have inserted multiple super columns like: Employee1 { Name: Vivek country: India } Employee2 { Name: Vivs country: USA } Now if I want to retrieve a super column whose rowkey is 'DEPT1' and employee name is 'Vivek', can I get only 'EMPLOYEE1'? -Vivek
Re: Docs: Token Selection
So, with ec2 ... 3 regions (DCs), each one is +1 from another? On Jun 16, 2011 3:40 PM, AJ a...@dude.podzone.net wrote: Thanks Eric! I've finally got it! I feel like I've just been initiated or something by discovering this secret. I kid! But, I'm thinking about using OldNetworkTopStrat. Do you, or anyone else, know if the same rules for token assignment apply to ONTS? On 6/16/2011 7:21 AM, Eric tamme wrote: AJ, sorry I seemed to miss the original email on this thread. As Aaron said, when computing tokens for multiple data centers, you should compute them independently for each data center - as if it were its own Cassandra cluster. You can have overlapping token ranges between multiple data centers, but no two nodes can have the same token, so for subsequent data centers I just increment the tokens. For two data centers with two nodes each using RandomPartitioner, calculate the tokens for the first DC normally, but in the second data center, increment the tokens by one.

In DC 1:
node 1 = 0
node 2 = 85070591730234615865843651857942052864

In DC 2:
node 1 = 1
node 2 = 85070591730234615865843651857942052865

For RowMutations this will give each data center a local set of nodes that it can write to for complete coverage of the entire token space. If you are using NetworkTopologyStrategy for replication, it will give an offset mirror replication between the two data centers so that your replicas will not get pinned to a node in the remote DC. There are other ways to select the tokens, but the increment method is the simplest to manage and continue to grow with. Hope that helps. -Eric
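Eric's increment scheme is easy to compute. A small sketch for RandomPartitioner, whose token space is 0 to 2**127: tokens are evenly spaced within each data center, then offset by the DC index so no two nodes share a token (the helper function name is ours, not from any Cassandra tool):

```python
def tokens(num_nodes, dc_offset=0, token_space=2**127):
    """Evenly spaced RandomPartitioner tokens for one data center's nodes,
    shifted by +1 per additional DC so tokens stay globally unique."""
    return [i * token_space // num_nodes + dc_offset for i in range(num_nodes)]

dc1 = tokens(2, dc_offset=0)  # first DC: computed as its own cluster
dc2 = tokens(2, dc_offset=1)  # second DC: same spacing, incremented by one
```

For two nodes per DC this reproduces exactly the four tokens Eric lists: 0 and 2**127/2 in DC 1, and those values plus one in DC 2.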
Re: cascading failures due to memory
No. Upgraded to 0.8 and monitor the systems more. we schedule a repair every 24hrs via cron and so far no problems.. On Jun 15, 2011 5:44 PM, AJ a...@dude.podzone.net wrote: Sasha, Did you ever nail down the cause of this problem? On 5/31/2011 4:01 AM, Sasha Dolgy wrote: hi everyone, the current nodes i have deployed (4) have all been working fine, with not a lot of data ... more reads than writes at the moment. as i had monitoring disabled, when one node's OS killed the cassandra process due to out of memory problems ... that was fine. 24 hours later, another node, 24 hours later, another node ...until finally, all 4 nodes no longer had cassandra running. When all nodes are started fresh, CPU utilization is at about 21% on each box. after 24 hours, this goes up to 32% and then 51% 24 hours later. originally I had thought that this may be a result of 'nodetool repair' not being run consistently ... after adding a cronjob to run every 24 hours (staggered between nodes) the problem of the increasing memory utilization does not resolve. i've read the operations page and also the http://wiki.apache.org/cassandra/MemtableThresholds page. i am running defaults and 0.7.6-02 ... what are the best places to start in terms of finding why this is happening? CF design / usage? 'nodetool cfstats' gives me some good info ... and i've already implemented some changes to one CF based on how it had ballooned (too many rows versus not enough columns) suggestions appreciated
Re: What's the best approach to search in Cassandra
Datastax has pretty sufficient documentation on their site for secondary indexes. On Jun 16, 2011 6:57 AM, Mark Kerzner markkerz...@gmail.com wrote: Jake, "You need to maintain a huge number of distinct indexes." Are we talking about secondary indexes? If yes, this sounds like exactly my problem. There is so little documentation! - but I think that if I read all there is on GitHub, I can probably start using it. Thank you, Mark On Fri, Jun 3, 2011 at 8:07 PM, Jake Luciani jak...@gmail.com wrote: Mark, Check out Solandra. http://github.com/tjake/Solandra On Fri, Jun 3, 2011 at 7:56 PM, Mark Kerzner markkerz...@gmail.com wrote: Hi, I need to store, say, 10M-100M documents, with each document having, say, 100 fields, like author, creation date, access date, etc., and then I want to ask questions like: give me all documents whose author is like abc**, and creation date any time in 2010, and access date in 2010-2011, and so on, perhaps 10-20 conditions, matching a list of some keywords. What's best: Lucene, Katta, Cassandra CF with secondary indices, or a plain scan-and-compare of every record? Thanks a bunch! Mark -- http://twitter.com/tjake
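The "huge number of distinct indexes" cost is easy to see in miniature: one lookup table per queried field, and a multi-condition query becomes an intersection of index rows. Dicts stand in for index structures here; this is a conceptual sketch, not Cassandra's actual secondary-index implementation:

```python
docs = {
    "doc1": {"author": "abc", "year": 2010},
    "doc2": {"author": "abc", "year": 2009},
    "doc3": {"author": "xyz", "year": 2010},
}

# One index entry per (field, value) pair. With ~100 fields per document,
# every write fans out into ~100 index updates -- that is the overhead
# being discussed above.
index = {}
for key, fields in docs.items():
    for field, value in fields.items():
        index.setdefault((field, value), set()).add(key)

# "author = abc AND year = 2010" is an intersection of two index rows,
# instead of a plain scan-and-compare over every record.
hits = index[("author", "abc")] & index[("year", 2010)]
```

A full-text engine like Lucene/Solandra maintains these inverted indexes for you, which is why it is the usual answer for 10-20 ad-hoc conditions.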
Re: odd logs after repair
Hi ... Does anyone else see these type of INFO messages in their log files, or is i just me..? INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI Runtime] java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$KeyIterator.next(HashMap.java:828) at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173) at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776) I'm at a loss as to why this is showing up in the logs. -sd On Mon, Jun 13, 2011 at 3:58 PM, Sasha Dolgy sdo...@gmail.com wrote: hm. that's not it. we've been using a non-standard jmx port for some time i've dropped the keyspace and recreated ... wonder if that'll help On Mon, Jun 13, 2011 at 3:57 PM, Tyler Hobbs ty...@datastax.com wrote: On Mon, Jun 13, 2011 at 8:41 AM, Sasha Dolgy sdo...@gmail.com wrote: I recall there being a discussion about a default port changing from 0.7.x to 0.8.x ...this was JMX, correct? Or were there others. Yes, the default JMX port changed from 8080 to 7199. I don't think there were any others.
Re: odd logs after repair
Hi Sylvain, I verified on all nodes with nodetool version that they are 0.8, and have even restarted nodes. Still persists. The four nodes all report similar errors about the other nodes. When I upgraded to 0.8, maybe there were relics about the keyspace that say it's from an earlier version? I need to create a new keyspace to see if that fixes the error. On Jun 14, 2011 10:08 AM, Sylvain Lebresne sylv...@datastax.com wrote: The exception itself is a bug (I've created https://issues.apache.org/jira/browse/CASSANDRA-2767 to fix it). However, the important message is the previous one (even if the exception was not thrown, repair wouldn't be able to work correctly, so the fact that the exception is thrown is not such a big deal). Apparently, from the standpoint of whichever node this log is from, the node 10.128.34.18 is still running 0.7. You should check if that is the case (restarting 10.128.34.18 and looking for something like 'Cassandra version: 0.8.0' is one solution). If the node does run 0.8.0 and you still get this error, then it would point to a problem with our detection of the nodes. -- Sylvain
Re: odd logs after repair
https://issues.apache.org/jira/browse/CASSANDRA-2768 On Tue, Jun 14, 2011 at 10:55 AM, Sylvain Lebresne sylv...@datastax.com wrote: Could you open a ticket then please ? -- Sylvain
ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI Runtime] java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$KeyIterator.next(HashMap.java:828) at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173) at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776) I'm at a loss as to why this is showing up in the logs. -sd On Mon, Jun 13, 2011 at 3:58 PM, Sasha Dolgy sdo...@gmail.com wrote: hm. that's not it. we've been using a non-standard jmx port for some time i've dropped the keyspace and recreated ... wonder if that'll help On Mon, Jun 13, 2011 at 3:57 PM, Tyler Hobbs ty...@datastax.com wrote: On Mon, Jun 13, 2011 at 8:41 AM, Sasha Dolgy sdo...@gmail.com wrote: I recall there being a discussion about a default port changing from 0.7.x to 0.8.x ...this was JMX, correct? Or were there others. Yes, the default JMX port changed from 8080 to 7199. I don't think there were any others. -- Sasha Dolgy sasha.do...@gmail.com
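The ConcurrentModificationException in the trace above is the classic symptom of iterating a HashMap while something else mutates it, which is what CASSANDRA-2767 addressed in getNeighbors(). The same failure mode can be sketched in Python, where mutating a dict during iteration raises RuntimeError; the host names here are illustrative, not from the thread:

```python
# Sketch of the failure mode behind the trace above: walking a shared
# collection while it is being mutated. Java's HashMap iterator throws
# ConcurrentModificationException; a Python dict raises RuntimeError.
def iterate_while_mutating(endpoints):
    seen = []
    try:
        for host in endpoints:                # like getNeighbors() walking the ring
            seen.append(host)
            endpoints["10.0.0.99"] = "0.8.0"  # concurrent mutation (e.g. a gossip update)
    except RuntimeError as exc:               # "dictionary changed size during iteration"
        return seen, exc
    return seen, None

hosts = {"10.128.34.17": "0.8.0", "10.128.34.18": "0.7.6"}
seen, err = iterate_while_mutating(hosts)
print(type(err).__name__)  # RuntimeError
```

In the real code the fix is to iterate over a snapshot (or a concurrency-safe collection) rather than the live map.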
Re: New web client future API
Your application is built with the thrift bindings and not with a higher level client like Hector? On Tue, Jun 14, 2011 at 3:42 PM, Markus Wiesenbacher | Codefreun.de m...@codefreun.de wrote: Hi, what is the future API for Cassandra? Thrift, Avro, CQL? I just released an early version of my web client (http://www.codefreun.de/apollo) which is Thrift-based, and therefore I would like to know what the future is ... Many thanks MW -- Sasha Dolgy sasha.do...@gmail.com
Re: odd logs after repair
I recall there being a discussion about a default port changing from 0.7.x to 0.8.x ... this was JMX, correct? Or were there others? On Mon, Jun 13, 2011 at 3:34 PM, Sasha Dolgy sdo...@gmail.com wrote: Hi Aaron, The error is being reported on all 4 nodes. I have confirmed (for my own sanity) that each node is running: ReleaseVersion: 0.8.0 I can reproduce the error on any node by tailing cassandra/logs/system.log and running nodetool repair: INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI Runtime] java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$KeyIterator.next(HashMap.java:828) at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173) at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776) When I run nodetool ring, the ring looks balanced and nothing is out of the ordinary. I also have this set up with RF=3 on 4 nodes ... but repair was working fine prior to the 0.8.0 upgrade. Are there any special commands I need to run? I've tried scrub, cleanup, flush too ... still, repair gives the same issues. -- I have stopped one of the nodes and started it. The issue still persists. I stop another node that is reported in the logs (like .18 above) and start it ... run repair again ... the issue still appears in the log file. -sd On Mon, Jun 13, 2011 at 3:02 PM, aaron morton aa...@thelastpickle.com wrote: You can double-check with nodetool, e.g. 
$ ./bin/nodetool -h localhost version ReleaseVersion: 0.8.0-SNAPSHOT This error is about the internode wire protocol one node thinks another is using. Not sure how it could get confused, does it go away if you restart the node that logged the error ? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13 Jun 2011, at 06:19, Sasha Dolgy wrote: Hi Everyone, Last week, upgraded all 4 nodes to apache-cassandra-0.8.0 .. no issues. Trolling the logs today, I find messages like this on all four nodes: INFO [manual-repair-0b61c9e2-3593-4633-a80f-b6ca52cfe948] 2011-06-13 02:16:45,978 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. Maybe it would be nice to have the version of all nodes print in nodetool ring ? I don't think I'm crazy though ... have manually checked all are on 0.8.0 -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
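Aaron's version check generalizes: collect the `nodetool version` output from every host and flag any that disagree with the majority. A minimal sketch of that comparison; the host-to-version mapping would come from running nodetool against each node yourself, and these addresses are made up:

```python
from collections import Counter

def version_stragglers(versions):
    """Given {host: release_version}, return hosts not on the majority version."""
    majority, _ = Counter(versions.values()).most_common(1)[0]
    return sorted(h for h, v in versions.items() if v != majority)

reported = {
    "10.128.34.17": "0.8.0",
    "10.128.34.18": "0.8.0",
    "10.128.34.19": "0.7.6",
}
print(version_stragglers(reported))  # ['10.128.34.19']
```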
Re: count column in Cassandra
probably helpful if you change the subject when posting about a different topic. Is your question about counters or the count function? Counters are cool. Count allows you to determine how many columns exist in a row. -sd On Mon, Jun 13, 2011 at 5:27 PM, Sijie YANG iyan...@gmail.com wrote: Hi, All I am newbie to cassandra. I have a simple question but don't find any clear answer by searching google: What's the meaning of count column in Cassandra? Thanks.
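The two concepts in a toy model: a count call (e.g. `get_count()` in most clients) reports how many columns a row currently holds, while a counter column is a value you increment. Plain dicts stand in for the column family here, so this is only an illustration of the semantics:

```python
# A dict stands in for one row of a column family.
row = {"col_a": "x", "col_b": "y", "col_c": "z"}
column_count = len(row)  # what a count/get_count() call reports

# A counter column, by contrast, supports increments.
counters = {}
def incr(row_key, column, delta=1):
    counters.setdefault(row_key, {})
    counters[row_key][column] = counters[row_key].get(column, 0) + delta

incr("page:home", "hits")
incr("page:home", "hits")
print(column_count, counters["page:home"]["hits"])  # 3 2
```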
Re: SSL Streaming
AJ was responding to an email I sent in March ... although I do appreciate the quick response from the community ;) I moved on to our implementation of VPN ... On Jun 14, 2011 1:35 AM, aaron morton aa...@thelastpickle.com wrote: Sasha, does https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L362 help? A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13 Jun 2011, at 23:26, AJ wrote: Performance-wise, I think it would be better to just let the client encrypt sensitive data before storing it, versus encrypting all traffic all the time. If individual values are encrypted, then they don't have to be encrypted/decrypted in transit between nodes during initial updates, as well as when commissioning a new node or at other times. A drawback, however, is that now you have to manage one or more keys for the lifetime of the data. It will also complicate your data view interfaces. However, if Cassandra had data encryption built in somehow, that would solve this problem ... just thinking out loud. Can anyone think of other pros/cons of both strategies? On 3/22/2011 2:21 AM, Sasha Dolgy wrote: Hi, Is there documentation available anywhere that describes how one can use org.apache.cassandra.security.streaming.* ? After the EC2 posts yesterday, one question I was asked was about the security of data being shifted between nodes. Is it done in clear text, or encrypted..? I haven't seen anything to suggest that it's encrypted, but see in the source that security.streaming does leverage SSL ... Thanks in advance for some pointers to documentation. Also, for anyone who is using SSL .. how much of a performance impact have you noticed? Is it minimal or significant?
Re: Cassandra not starting right
Don't post to the list in HTML ... that should work. -f runs it in the foreground. Without -f it runs in the background. On Jun 11, 2011 7:29 AM, Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com wrote: Thanks for your help! It seems when I use this command: ./bin/cassandra -f It makes it work. I still need to do ctrl-C. Sorry I am emailing you directly, but for some reason every email I send to the newsletter sends me back an error. Thanks again.
Re: Cannot connect to Cassandra
netstat -an | grep 9160 see anything? maybe cassandra service isn't running.? look for hints in the log files. these are defined in the $CASSANDRA_HOME/conf/log4j-server.properties ... On Fri, Jun 10, 2011 at 9:23 PM, Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com wrote: My Cassandra used to work with no problems. I was able to connect with no problems but now for some reason it doesn't work anymore. [default@unknown] connect localhost/9160; Exception connecting to localhost/9160. Reason: Connection refused. and root# ./bin/cassandra-cli -host localhost -port 9160 Exception connecting to localhost/9160. Reason: Connection refused. Thanks in advance... -- Sasha Dolgy sasha.do...@gmail.com
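The netstat check can also be scripted: the question is simply "is anything listening on the Thrift port?". A small sketch of that test; 9160 is the default Thrift port, and `localhost` is an assumption about where the daemon runs:

```python
import socket

def port_open(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds, i.e. something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Connection refused here means the daemon isn't listening -- check the logs
# configured in $CASSANDRA_HOME/conf/log4j-server.properties before anything else.
print(port_open("localhost", 9160))
```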
Re: Python Client
pycassa.. http://pycassa.github.com/pycassa/ On Sat, Jun 11, 2011 at 4:58 AM, Carlos Sanchez papach...@gmail.com wrote: All, I was wondering if there are Cassandra python clients and which one would be the best to use Thanks a lot, Carlos
Re: after a while nothing happening with repair
I recall having this issue when one of the nodes wasn't available ... or there was a problem during the repair process. Cancelling the repair job and rerunning it would complete successfully. I believe there is a bug open for this https://issues.apache.org/jira/browse/CASSANDRA-2290 On Thu, Jun 9, 2011 at 10:28 AM, Jonathan Colby jonathan.co...@gmail.comwrote: When I run repair on a node in my 0.7.6-2 cluster, the repair starts to stream data and activity is seen in the logs. However, after a while (a day or so) it seems like everything freezes up. The repair command is still running (the command prompt has not returned) and netstats shows output similar to below. All streams at 0% and nothing happening. The logs indicate that things were started but there is no indication if anything is in fact still active. For example, this is the last log entry related to repair, just this morning: INFO [StreamStage:1] 2011-06-09 07:13:21,423 StreamOut.java (line 173) Stream context metadata [/var/lib/cassandra/data/DFS/main-f-144-Data.db sections=2 progress=0/31947748 - 0%, /var/lib/cassandra/data/DFS/main-f-145-Data.db section s=2 progress=0/25786564 - 0%, /var/lib/cassandra/data/DFS/main-f-143-Data.db sections=2 progress=0/5830103399 - 0%], 9 sstables. INFO [StreamStage:1] 2011-06-09 07:13:21,423 StreamOutSession.java (line 174) Streaming to /10.46.108.104 However, netstats on all related notes looks something like this. The nodes continue to handle read/write requests just fine. They are not overloaded at all. Any advice would be greatly appreciated. Because repairs seem like they never finish, I have a feeling we have a lot of garbage data in our cluster. /opt/cassandra/bin/nodetool -h $HOSTNAME -p 35014 netstats Mode: Normal Not sending any streams. 
Streaming from: /10.46.108.104 DFS: /var/lib/cassandra/data/DFS/main-f-209-Data.db sections=2 progress=0/276461810 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-153-Data.db sections=2 progress=0/100340568 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-40-Data.db sections=2 progress=0/62726190502 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-180-Data.db sections=1 progress=0/158898493 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-109-Data.db sections=2 progress=0/87250515569 - 0% Streaming from: /10.47.108.102 DFS: /var/lib/cassandra/data/DFS/main-f-304-Data.db sections=2 progress=0/13563864214 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-350-Data.db sections=1 progress=0/2877129955 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-379-Data.db sections=2 progress=0/143804948 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-370-Data.db sections=2 progress=0/683716174 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-371-Data.db sections=2 progress=0/56650 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-368-Data.db sections=2 progress=0/4005533616 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-369-Data.db sections=2 progress=0/155515922 - 0% Streaming from: /10.46.108.103 DFS: /var/lib/cassandra/data/DFS/main-f-888-Data.db sections=2 progress=0/158096259 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-828-Data.db sections=1 progress=0/29508276 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-886-Data.db sections=2 progress=0/133704150 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-759-Data.db sections=2 progress=0/83629797522 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-889-Data.db sections=2 progress=0/96903803 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-751-Data.db sections=2 progress=0/17944852950 - 0% Streaming from: /10.46.108.101 DFS: /var/lib/cassandra/data/DFS/main-f-1318-Data.db sections=2 progress=0/60617216778 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-1179-Data.db sections=2 progress=0/11870790009 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-1324-Data.db sections=2 progress=0/710603722 
- 0% DFS: /var/lib/cassandra/data/DFS/main-f-1322-Data.db sections=2 progress=0/5844992187 - 0% -- Sasha Dolgy sasha.do...@gmail.com
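Since the symptom above is every stream sitting at 0%, the netstats output can be checked programmatically rather than by eye. This parser assumes the exact line format pasted in the thread (other Cassandra versions may format it differently):

```python
import re

# Matches lines like:
#   DFS: /var/lib/.../main-f-209-Data.db sections=2 progress=0/276461810 - 0%
STREAM_LINE = re.compile(r"(\S+-Data\.db) sections=\d+ progress=\d+/\d+ - (\d+)%")

def stalled_streams(netstats_output):
    """Return (sstable, percent) pairs for streams reporting 0% progress."""
    return [(f, int(p)) for f, p in STREAM_LINE.findall(netstats_output)
            if int(p) == 0]

sample = ("Streaming from: /10.46.108.104 "
          "DFS: /var/lib/cassandra/data/DFS/main-f-209-Data.db sections=2 "
          "progress=0/276461810 - 0% "
          "DFS: /var/lib/cassandra/data/DFS/main-f-153-Data.db sections=2 "
          "progress=5/100340568 - 42%")
print(stalled_streams(sample))  # [('/var/lib/cassandra/data/DFS/main-f-209-Data.db', 0)]
```

Run periodically, an unchanging list of 0% entries is a strong hint the repair session has hung, as described above.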
Re: how to retrieve data from supercolumns by phpcassa ?
you'll find a response to this question on the phpcassa mailing list ... where you asked the same question. -sd On Wed, Jun 8, 2011 at 10:22 AM, amrita amritajayakuma...@gmail.com wrote: Hi, Can you please tell me how to create a supercolumn and retrieve data from it using phpcassa? student_details{id{sid,lesson_id,answers{time_expired,answer_opted}}}
upgrading to cassandra 0.8
Hi, Good news on the 0.8 release. So ... if I upgrade one node out of four, and let it run for a bit ... I should have no issues, correct? If I make schema changes, specifically, adding a new column family for counters, how will this behave with the other three nodes that aren't upgraded? Or ... should schema changes not be done until all nodes are upgraded? -- Sasha Dolgy sasha.do...@gmail.com
Re: cascading failures due to memory
is there a specific string I should be looking for in the logs that isn't super obvious to me at the moment... On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis jbel...@gmail.com wrote: The place to start is with the statistics Cassandra logs after each GC. On Tue, May 31, 2011 at 5:01 AM, Sasha Dolgy sdo...@gmail.com wrote: hi everyone, the current nodes i have deployed (4) have all been working fine, with not a lot of data ... more reads than writes at the moment. as i had monitoring disabled, when one node's OS killed the cassandra process due to out of memory problems ... that was fine. 24 hours later, another node, 24 hours later, another node ...until finally, all 4 nodes no longer had cassandra running. When all nodes are started fresh, CPU utilization is at about 21% on each box. after 24 hours, this goes up to 32% and then 51% 24 hours later. originally I had thought that this may be a result of 'nodetool repair' not being run consistently ... after adding a cronjob to run every 24 hours (staggered between nodes) the problem of the increasing memory utilization does not resolve. i've read the operations page and also the http://wiki.apache.org/cassandra/MemtableThresholds page. i am running defaults and 0.7.6-02 ... what are the best places to start in terms of finding why this is happening? CF design / usage? 'nodetool cfstats' gives me some good info ... and i've already implemented some changes to one CF based on how it had ballooned (too many rows versus not enough columns) suggestions appreciated -- Sasha Dolgy sasha.do...@gmail.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- Sasha Dolgy sasha.do...@gmail.com
Re: cascading failures due to memory
and is there anything specific that could be causing the issue between Java SE 1.6.0_24 and 1.6.0_25? All nodes are on _24. Memory usage is up to 64% today. -sd On Wed, Jun 1, 2011 at 9:30 PM, Sasha Dolgy sdo...@gmail.com wrote: is there a specific string I should be looking for in the logs that isn't super obvious to me at the moment... On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis jbel...@gmail.com wrote: The place to start is with the statistics Cassandra logs after each GC. On Tue, May 31, 2011 at 5:01 AM, Sasha Dolgy sdo...@gmail.com wrote: hi everyone, the current nodes i have deployed (4) have all been working fine, with not a lot of data ... more reads than writes at the moment. as i had monitoring disabled, when one node's OS killed the cassandra process due to out of memory problems ... that was fine. 24 hours later, another node, 24 hours later, another node ...until finally, all 4 nodes no longer had cassandra running. When all nodes are started fresh, CPU utilization is at about 21% on each box. after 24 hours, this goes up to 32% and then 51% 24 hours later. originally I had thought that this may be a result of 'nodetool repair' not being run consistently ... after adding a cronjob to run every 24 hours (staggered between nodes) the problem of the increasing memory utilization does not resolve. i've read the operations page and also the http://wiki.apache.org/cassandra/MemtableThresholds page. i am running defaults and 0.7.6-02 ... what are the best places to start in terms of finding why this is happening? CF design / usage? 'nodetool cfstats' gives me some good info ... 
and i've already implemented some changes to one CF based on how it had ballooned (too many rows versus not enough columns) suggestions appreciated -- Sasha Dolgy sasha.do...@gmail.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
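Jonathan's pointer -- the per-GC statistics Cassandra logs -- can be filtered mechanically rather than scanned by eye. The pattern below assumes a GCInspector-style line of the form `GC for <collector>: <n> ms`; verify the exact wording against your own system.log, since it varies between versions:

```python
import re

# Assumed GCInspector-style line: "GC for <collector>: <pause> ms ..."
GC_LINE = re.compile(r"GC for (\w+): (\d+) ms")

def long_pauses(lines, threshold_ms=200):
    """Return (collector, pause_ms) for every logged GC pause over the threshold."""
    hits = []
    for line in lines:
        m = GC_LINE.search(line)
        if m and int(m.group(2)) >= threshold_ms:
            hits.append((m.group(1), int(m.group(2))))
    return hits

log = [
    "INFO [ScheduledTasks:1] 2011-05-31 GC for ParNew: 45 ms",
    "INFO [ScheduledTasks:1] 2011-05-31 GC for ConcurrentMarkSweep: 2310 ms",
]
print(long_pauses(log))  # [('ConcurrentMarkSweep', 2310)]
```

Steadily lengthening ConcurrentMarkSweep pauses over the 24-hour cycle described above would point at heap pressure rather than repair behavior.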
cascading failures due to memory
hi everyone, the current nodes i have deployed (4) have all been working fine, with not a lot of data ... more reads than writes at the moment. as i had monitoring disabled, when one node's OS killed the cassandra process due to out of memory problems ... that was fine. 24 hours later, another node, 24 hours later, another node ...until finally, all 4 nodes no longer had cassandra running. When all nodes are started fresh, CPU utilization is at about 21% on each box. after 24 hours, this goes up to 32% and then 51% 24 hours later. originally I had thought that this may be a result of 'nodetool repair' not being run consistently ... after adding a cronjob to run every 24 hours (staggered between nodes) the problem of the increasing memory utilization does not resolve. i've read the operations page and also the http://wiki.apache.org/cassandra/MemtableThresholds page. i am running defaults and 0.7.6-02 ... what are the best places to start in terms of finding why this is happening? CF design / usage? 'nodetool cfstats' gives me some good info ... and i've already implemented some changes to one CF based on how it had ballooned (too many rows versus not enough columns) suggestions appreciated -- Sasha Dolgy sasha.do...@gmail.com
Re: starting with PHPcassa
http://thobbs.github.com/phpcassa/installation.html If you already have the log files, pycassa (python) may be better suited and quicker http://pycassa.github.com/pycassa/ On Tue, May 31, 2011 at 4:03 PM, Amrita Jayakumar amritajayakuma...@gmail.com wrote: I have log files of the format id key value. I want to load these files into cassandra using PHPcassa. I have installed Cassandra 7. Can anyone please guide me with the exact procedures as in how to install PHPcassa and take things forward?
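For the "id key value" files described above, the loading loop first splits each line into a row key and a column/value pair, then hands it to the client's insert call (`cf.insert(row_key, {column: value})` in pycassa). The parsing half is shown here; the server-side call is left as a comment since it needs a running cluster:

```python
def parse_log_line(line):
    """Split an 'id key value' line into (row_key, column, value).
    The value may itself contain spaces, so split at most twice."""
    row_key, column, value = line.strip().split(None, 2)
    return row_key, column, value

print(parse_log_line("42 answer option b"))  # ('42', 'answer', 'option b')

# With a live cluster, each parsed line would then be written via pycassa,
# roughly: cf.insert(row_key, {column: value}), ideally inside a batch.
```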