Re: Missing non composite column

2012-10-16 Thread Vivek Mishra
column name will be "2012-07-24:2:alliance_involvement" or
"alliance_involvement"?

-Vivek

On Tue, Oct 16, 2012 at 10:25 PM, Sylvain Lebresne wrote:

> On Tue, Oct 16, 2012 at 12:31 PM, Vivek Mishra 
> wrote:
> > Thanks Sylvain. I missed it. If I try to access these via thrift API,
> what
> > will be the column names?
>
> I'm not sure I understand the question. The cli output is pretty much
> what you get via the thrift API.
>
> --
> Sylvain
>


Re: Java 7 support?

2012-10-16 Thread Rob Coli
On Tue, Oct 16, 2012 at 4:45 PM, Edward Sargisson
 wrote:
> The Datastax documentation says that Java 7 is not recommended[1]. However,
> Java 6 is due to EOL in Feb 2013 so what is the reasoning behind that
> comment?

I've asked this approximate question here a few times, with no
official response. The reason I ask is that in addition to Java 7 not
being recommended, in Java 7 OpenJDK becomes the "reference" JVM, and
OpenJDK is also not recommended.

From other channels, I have conjectured that the current advice on
Java 7 is "it 'works' but is not as extensively tested (and definitely
not as commonly deployed) as Java 6".

=Rob

-- 
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Java 7 support?

2012-10-16 Thread Edward Sargisson

Hi all,
The Datastax documentation says that Java 7 is not recommended[1]. 
However, Java 6 is due to EOL in Feb 2013 so what is the reasoning 
behind that comment?


Is it something we should still be concerned about?

Cheers,
Edward

Links:
[1] http://www.datastax.com/docs/1.1/install/install_deb, step 1
--

Edward Sargisson

senior java developer
Global Relay

edward.sargis...@globalrelay.net 






Re: Cassandra nodes loaded unequally

2012-10-16 Thread Ben Kaehne
Nothing unusual.

All servers are exactly the same. Nothing unusual in the log files. Is
there any level of logging that I should be turning on?

Regards,

On Wed, Oct 17, 2012 at 9:51 AM, Andrey Ilinykh  wrote:

> With your environment (3 nodes, RF=3) it is very difficult to get
> uneven load. Each node receives the same number of read/write
> requests. Probably something is wrong on low level, OS or VM. Do you
> see anything unusual in log files?
>
> Andrey
>
> On Tue, Oct 16, 2012 at 3:40 PM, Ben Kaehne 
> wrote:
> > Not connecting to the same node every time. Using Hector to ensure an
> even
> > distribution of connections across the cluster.
> >
> > Regards,
> >
> > On Sat, Oct 13, 2012 at 4:15 AM, B. Todd Burruss 
> wrote:
> >>
> >> are you connecting to the same node every time?  if so, spread out
> >> your connections across the ring
> >>
> >> On Fri, Oct 12, 2012 at 1:22 AM, Alexey Zotov 
> >> wrote:
> >> > Hi Ben,
> >> >
> >> > I suggest you compare the number of queries for each node. Maybe the
> >> > problem
> >> > is on the client side.
> >> > You can do that using JMX:
> >> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<YOUR KEYSPACE>,columnfamily=<YOUR CF>","ReadCount"
> >> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<YOUR KEYSPACE>,columnfamily=<YOUR CF>","WriteCount"
> >> >
> >> > Also I suggest to check output of "nodetool compactionstats".
> >> >
> >> > --
> >> > Alexey
> >> >
> >> >
> >
> >
> >
> >
> > --
> > -Ben
>



-- 
-Ben


Re: Cassandra nodes loaded unequally

2012-10-16 Thread Andrey Ilinykh
With your environment (3 nodes, RF=3) it is very difficult to get
uneven load. Each node receives the same number of read/write
requests. Probably something is wrong at a low level (OS or VM). Do you
see anything unusual in log files?

Andrey

On Tue, Oct 16, 2012 at 3:40 PM, Ben Kaehne  wrote:
> Not connecting to the same node every time. Using Hector to ensure an even
> distribution of connections across the cluster.
>
> Regards,
>
> On Sat, Oct 13, 2012 at 4:15 AM, B. Todd Burruss  wrote:
>>
>> are you connecting to the same node every time?  if so, spread out
>> your connections across the ring
>>
>> On Fri, Oct 12, 2012 at 1:22 AM, Alexey Zotov 
>> wrote:
>> > Hi Ben,
>> >
>> > I suggest you compare the number of queries for each node. Maybe the
>> > problem
>> > is on the client side.
>> > You can do that using JMX:
>> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<YOUR KEYSPACE>,columnfamily=<YOUR CF>","ReadCount"
>> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<YOUR KEYSPACE>,columnfamily=<YOUR CF>","WriteCount"
>> >
>> > Also I suggest to check output of "nodetool compactionstats".
>> >
>> > --
>> > Alexey
>> >
>> >
>
>
>
>
> --
> -Ben


Re: Cassandra nodes loaded unequally

2012-10-16 Thread Ben Kaehne
Not connecting to the same node every time. Using Hector to ensure an even
distribution of connections across the cluster.
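
For anyone following along, wiring Hector to spread its pooled connections
across the whole ring typically looks roughly like the sketch below. This is
only an illustration: the class and policy names are from Hector 1.x as
commonly used (verify against your Hector version), and the host list, cluster
name and keyspace name are placeholders.

import me.prettyprint.cassandra.connection.RoundRobinBalancingPolicy;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class HectorPool {
    // Builds a keyspace handle whose connection pool round-robins requests
    // over all listed nodes. Hosts, cluster and keyspace names are placeholders.
    public static Keyspace connect() {
        CassandraHostConfigurator hosts =
                new CassandraHostConfigurator("node1:9160,node2:9160,node3:9160");
        hosts.setLoadBalancingPolicy(new RoundRobinBalancingPolicy());
        Cluster cluster = HFactory.getOrCreateCluster("MyCluster", hosts);
        return HFactory.createKeyspace("MyKeyspace", cluster);
    }
}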

Regards,

On Sat, Oct 13, 2012 at 4:15 AM, B. Todd Burruss  wrote:

> are you connecting to the same node every time?  if so, spread out
> your connections across the ring
>
> On Fri, Oct 12, 2012 at 1:22 AM, Alexey Zotov 
> wrote:
> > Hi Ben,
> >
> > I suggest you compare the number of queries for each node. Maybe the
> > problem
> > is on the client side.
> > You can do that using JMX:
> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<YOUR KEYSPACE>,columnfamily=<YOUR CF>","ReadCount"
> > "org.apache.cassandra.db:type=ColumnFamilies,keyspace=<YOUR KEYSPACE>,columnfamily=<YOUR CF>","WriteCount"
> >
> > Also I suggest to check output of "nodetool compactionstats".
> >
> > --
> > Alexey
> >
> >
>



-- 
-Ben


Re: Cassandra nodes loaded unequally

2012-10-16 Thread Ben Kaehne
I checked this and all the numbers seemed to be about the same, although
the files would compact from time to time. There was nothing to suggest why
one node consistently had less load than the others.

Regards,

On Fri, Oct 12, 2012 at 7:22 PM, Alexey Zotov wrote:

> Hi Ben,
>
> I suggest you to compare amount of queries for each node. May be the
> problem is on the client side.
> Yoy can do that using JMX:
> "org.apache.cassandra.db:type=ColumnFamilies,keyspace= KEYSPACE>,columnfamily=","ReadCount"
> "org.apache.cassandra.db:type=ColumnFamilies,keyspace= KEYSPACE>,columnfamily=","WriteCount"
>
> Also I suggest to check output of "nodetool compactionstats".
>
> --
> Alexey
>
>
>


-- 
-Ben
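
A minimal sketch of reading the per-column-family counters Alexey suggests,
using plain JMX (the host argument, keyspace and column family names are
placeholders; 7199 is Cassandra's default JMX port):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CfCounters {
    public static void main(String[] args) throws Exception {
        // args[0] = Cassandra node host name; 7199 is the default JMX port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // Keyspace and column family names below are placeholders.
            ObjectName cf = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,keyspace=MyKeyspace,columnfamily=MyCF");
            System.out.println("ReadCount  = " + mbs.getAttribute(cf, "ReadCount"));
            System.out.println("WriteCount = " + mbs.getAttribute(cf, "WriteCount"));
        } finally {
            jmxc.close();
        }
    }
}

Comparing these numbers across the three nodes shows whether the client really
is distributing requests evenly.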


Re: Is Anti Entropy repair idempotent with respect to transferred data?

2012-10-16 Thread Omid Aladini
Thanks Andrey. Also found this ticket regarding this issue:

https://issues.apache.org/jira/browse/CASSANDRA-2698

On Tue, Oct 16, 2012 at 8:00 PM, Andrey Ilinykh  wrote:
>> In my experience running repair on some counter data, the size of
>> streamed data is much bigger than the cluster could possibly have lost
>> messages or would be due to snapshotting at different times.
>>
>> I know the data will eventually be in sync on every repair, but I'm
>> more interested in whether Cassandra transfers excess data and how to
>> minimize this.
>>
>> Does any body have insights on this?
>>
> The problem is the granularity of the Merkle tree. Cassandra sends regions
> which have different hash values. A region can be much bigger than a
> single row.
>
> Andrey


ApacheCon EU -- Hackathon anyone?

2012-10-16 Thread Eric Evans
Hi all,

ApacheCon EU[1] is in 3 weeks, and Monday the 5th[2] is reserved for
hackathons[3].  Who else is planning to be in Sinsheim on the 5th,
and, is there any interest in a Cassandra hackathon?

One idea for a hackathon would be to focus on testing (writing,
running, manual), and bug squashing.  This is broad enough that people
of all skill levels should be able to contribute.

So, is anyone interested?


[1]: 5–8 November, Sinsheim, Germany
[2]: Guy Fawkes Day!
[3]: http://wiki.apache.org/apachecon/HackathonEU12

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: Using Cassandra to store binary files?

2012-10-16 Thread Hiller, Dean
I am not sure.  If I were to implement it myself though, I would have
probably…

postfixed the row keys with 1,2,3,4,… and then stored the lastValue
in the first row so my program knows all the rows.

I.e., I'm not sure an index is really needed in that case.

Dean
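
A minimal sketch of that row-postfix idea, written with Hector purely for
illustration; the column family name, column names and chunk size below are
placeholders, not anything from an actual implementation:

import java.util.Arrays;

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.IntegerSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class ChunkedBlobWriter {
    private static final String CF = "file_chunks";   // placeholder column family
    private static final int CHUNK_SIZE = 1 << 20;    // 1MB per chunk (placeholder)

    // Writes rows "<fileKey>-1", "<fileKey>-2", ... (one row per chunk) and records
    // the last chunk number ("lastValue") in the base row so a reader knows all the rows.
    public static void write(Keyspace ks, String fileKey, byte[] data) {
        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        int chunks = (data.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        for (int i = 0; i < chunks; i++) {
            int from = i * CHUNK_SIZE;
            int to = Math.min(from + CHUNK_SIZE, data.length);
            m.addInsertion(fileKey + "-" + (i + 1), CF,
                    HFactory.createColumn("data", Arrays.copyOfRange(data, from, to),
                            StringSerializer.get(), BytesArraySerializer.get()));
        }
        m.addInsertion(fileKey, CF,
                HFactory.createColumn("lastValue", chunks,
                        StringSerializer.get(), IntegerSerializer.get()));
        m.execute();
    }
}

A reader would first fetch "lastValue" from the base row and then fetch rows
1..lastValue to reassemble the file, so no separate index is needed.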

On 10/16/12 11:45 AM, "Michael Kjellman"  wrote:

>Ah, so they just wrote chunking into Astyanax? Do they create an index
>somewhere so they know how to reassemble the file on the way out?
>
>On 10/16/12 10:36 AM, "Hiller, Dean"  wrote:
>
>>Yes, astyanax stores the file in many rows so it reads from many disks
>>giving you a performance advantage vs. storing each file in one row….well
>>at least from my understanding so read performance "should" be really
>>really good in that case.
>>
>>Dean
>>
>>From: Michael Kjellman <mkjell...@barracuda.com>
>>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>Date: Tuesday, October 16, 2012 10:07 AM
>>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>Subject: Re: Using Cassandra to store binary files?
>>
>>When we started with Cassandra almost 2 years ago in production
>>originally it was for the sole purpose of storing blobs in a redundant way.
>>I ignored the warnings as my own tests showed it would be okay (and two
>>years later it is "ok"). If you plan on using Cassandra for more later (as
>>we now do; as features such as secondary indexes and CQL have matured, I'm
>>now stuck with a large amount of data in Cassandra that maybe could be in a
>>better place). Does it work? Yes. Would I do it again? Not 100% sure.
>>Compactions of these column families take forever.
>>
>>Also, by default there is a 16MB limit. Yes, this is adjustable but
>>currently Thrift does not stream data. I didn't know that Netflix had
>>worked around this (referring to Dean's reply) ― I'll have to look
>>through the source to see how they are overcoming the limitations of the
>>protocol. Last I read there were no plans to make Thrift stream. Looks
>>like there is a bug at
>>https://issues.apache.org/jira/browse/CASSANDRA-265
>>
>>You might want to take a look at the following page:
>>http://wiki.apache.org/cassandra/CassandraLimitations
>>
>>I wanted an easy key value store when I originally picked Cassandra. As
>>our project needs changed and Cassandra has now begun playing a more
>>critical role as it has matured (since the 0.7 days), in retrospect HDFS
>>might have been a better option long term as I really will never need
>>indexing etc on my binary blobs and the convenience of simply being able
>>to grab/reassemble a file by grabbing its key was convenient at the time
>>but maybe not the most forward thinking. Hope that helps a bit.
>>
>>Also, your read performance won't be amazing by any means with blobs. Not
>>sure if your priority is reads or writes. In our case it was writes so it
>>wasn't a large loss.
>>
>>Best,
>>michael
>>
>>
>>From: Vasileios Vlachos <vasileiosvlac...@gmail.com>
>>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>Date: Tuesday, October 16, 2012 8:49 AM
>>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>Subject: Using Cassandra to store binary files?
>>
>>Hello All,
>>
>>We need to store about 40G of binary files in a redundant way and since
>>we are already using Cassandra for other applications we were thinking
>>that we could just solve that problem using the same Cassandra cluster.
>>Each individual File will be approximately 1MB.
>>
>>We are thinking that the data structure should be very simple for this
>>case, using one CF with just one column which will contain the actual
>>files. The row key should then uniquely identify each file. Speed is not
>>an issue when we are retrieving the files. Impacting other applications using
>>Cassandra is more important for us. In order to prevent performance
>>issues with other applications using our Cassandra cluster at the moment,
>>we think we should disable key_cache and row_cache for this column
>>family.
>>
>>Anyone tried this before or anyone thinks this is going to be a bad idea?
>>Do you think our current plan is sensible? Any input would be much
>>appreciated. Thank you in advance.
>>
>>Regards,
>>
>>Vasilis
>>



Re: Is Anti Entropy repair idempotent with respect to transferred data?

2012-10-16 Thread Andrey Ilinykh
> In my experience running repair on some counter data, the size of
> streamed data is much bigger than the cluster could possibly have lost
> messages or would be due to snapshotting at different times.
>
> I know the data will eventually be in sync on every repair, but I'm
> more interested in whether Cassandra transfers excess data and how to
> minimize this.
>
> Does any body have insights on this?
>
The problem is in granularity of Merkle tree. Cassandra sends regions
which have different hash values. It could be much bigger then a
single row.

Andrey


Re: Using Cassandra to store binary files?

2012-10-16 Thread Michael Kjellman
Ah, so they just wrote chunking into Astyanax? Do they create an index
somewhere so they know how to reassemble the file on the way out?

On 10/16/12 10:36 AM, "Hiller, Dean"  wrote:

>Yes, astyanax stores the file in many rows so it reads from many disks
>giving you a performance advantage vs. storing each file in one row….well
>at least from my understanding so read performance "should" be really
>really good in that case.
>
>Dean
>
>From: Michael Kjellman <mkjell...@barracuda.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Tuesday, October 16, 2012 10:07 AM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Re: Using Cassandra to store binary files?
>
>When we started with Cassandra almost 2 years ago in production
>originally it was for the sole purpose of storing blobs in a redundant way.
>I ignored the warnings as my own tests showed it would be okay (and two
>years later it is "ok"). If you plan on using Cassandra for more later (as
>we now do; as features such as secondary indexes and CQL have matured, I'm
>now stuck with a large amount of data in Cassandra that maybe could be in a
>better place). Does it work? Yes. Would I do it again? Not 100% sure.
>Compactions of these column families take forever.
>
>Also, by default there is a 16MB limit. Yes, this is adjustable but
>currently Thrift does not stream data. I didn't know that Netflix had
>worked around this (referring to Dean's reply) ― I'll have to look
>through the source to see how they are overcoming the limitations of the
>protocol. Last I read there were no plans to make Thrift stream. Looks
>like there is a bug at https://issues.apache.org/jira/browse/CASSANDRA-265
>
>You might want to take a look at the following page:
>http://wiki.apache.org/cassandra/CassandraLimitations
>
>I wanted an easy key value store when I originally picked Cassandra. As
>our project needs changed and Cassandra has now begun playing a more
>critical role as it has matured (since the 0.7 days), in retrospect HDFS
>might have been a better option long term as I really will never need
>indexing etc on my binary blobs and the convenience of simply being able
>to grab/reassemble a file by grabbing its key was convenient at the time
>but maybe not the most forward thinking. Hope that helps a bit.
>
>Also, your read performance won't be amazing by any means with blobs. Not
>sure if your priority is reads or writes. In our case it was writes so it
>wasn't a large loss.
>
>Best,
>michael
>
>
>From: Vasileios Vlachos <vasileiosvlac...@gmail.com>
>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Date: Tuesday, October 16, 2012 8:49 AM
>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>Subject: Using Cassandra to store binary files?
>
>Hello All,
>
>We need to store about 40G of binary files in a redundant way and since
>we are already using Cassandra for other applications we were thinking
>that we could just solve that problem using the same Cassandra cluster.
>Each individual File will be approximately 1MB.
>
>We are thinking that the data structure should be very simple for this
>case, using one CF with just one column which will contain the actual
>files. The row key should then uniquely identify each file. Speed is not
>an issue when we are retrieving the files. Impacting other applications using
>Cassandra is more important for us. In order to prevent performance
>issues with other applications using our Cassandra cluster at the moment,
>we think we should disable key_cache and row_cache for this column family.
>
>Anyone tried this before or anyone thinks this is going to be a bad idea?
>Do you think our current plan is sensible? Any input would be much
>appreciated. Thank you in advance.
>
>Regards,
>
>Vasilis
>
>--
>'Like' us on Facebook for exclusive content and other resources on all
>Barracuda Networks solutions.
>Visit http://barracudanetworks.com/facebook
>  ­­


'Like' us on Facebook for exclusive content and other resources on all 
Barracuda Networks solutions.
Visit http://barracudanetworks.com/facebook




Re: Using Cassandra to store binary files?

2012-10-16 Thread Hiller, Dean
Yes, Astyanax stores the file in many rows, so it reads from many disks, giving
you a performance advantage vs. storing each file in one row… well, at least
from my understanding, so read performance "should" be really, really good in
that case.

Dean

From: Michael Kjellman <mkjell...@barracuda.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, October 16, 2012 10:07 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Using Cassandra to store binary files?

When we started with Cassandra almost 2 years ago in production, originally it
was for the sole purpose of storing blobs in a redundant way. I ignored the
warnings as my own tests showed it would be okay (and two years later it is
"ok"). If you plan on using Cassandra for more later (as we now do; as features
such as secondary indexes and CQL have matured, I'm now stuck with a large
amount of data in Cassandra that maybe could be in a better place). Does it
work? Yes. Would I do it again? Not 100% sure. Compactions of these column
families take forever.

Also, by default there is a 16MB limit. Yes, this is adjustable but currently 
Thrift does not stream data. I didn't know that Netflix had worked around this 
(referring to Dean's reply) — I'll have to look through the source to see how 
they are overcoming the limitations of the protocol. Last I read there were no 
plans to make Thrift stream. Looks like there is a bug at 
https://issues.apache.org/jira/browse/CASSANDRA-265

You might want to take a look at the following page: 
http://wiki.apache.org/cassandra/CassandraLimitations

I wanted an easy key value store when I originally picked Cassandra. As our 
project needs changed and Cassandra has now begun playing a more critical role 
as it has matured (since the 0.7 days), in retrospect HDFS might have been a 
better option long term as I really will never need indexing etc on my binary 
blobs and the convenience of simply being able to grab/reassemble a file by 
grabbing it's key was convenient at the time but maybe not the most forward 
thinking. Hope that helps a bit.

Also, your read performance won't be amazing by any means with blobs. Not sure 
if your priority is reads or writes. In our case it was writes so it wasn't a 
large loss.

Best,
michael


From: Vasileios Vlachos <vasileiosvlac...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, October 16, 2012 8:49 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Using Cassandra to store binary files?

Hello All,

We need to store about 40G of binary files in a redundant way and since we are 
already using Cassandra for other applications we were thinking that we could 
just solve that problem using the same Cassandra cluster. Each individual File 
will be approximately 1MB.

We are thinking that the data structure should be very simple for this case, 
using one CF with just one column which will contain the actual files. The row 
key should then uniquely identify each file. Speed is not an issue when we are 
retrieving the files. Impacting other applications using Cassandra is more 
important for us. In order to prevent performance issues with other 
applications using our Cassandra cluster at the moment, we think we should 
disable key_cache and row_cache for this column family.

Anyone tried this before or anyone thinks this is going to be a bad idea? Do 
you think our current plan is sensible? Any input would be much appreciated. 
Thank you in advance.

Regards,

Vasilis



Re: what happens while node is bootstrapping?

2012-10-16 Thread Michael Kjellman
Correct.

Also, there is a new feature in 1.1+ that lets you play with live traffic
on new nodes before they actually join the ring

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

On 10/16/12 9:42 AM, "Andrey Ilinykh"  wrote:

>>
>>
>> No.  The bootstrapping node will receive writes for its new range while
>> bootstrapping as a consistency optimization (more or less), but does not
>> contribute to the replication factor or consistency level; all of the
>> original replicas for that range still receive writes, serve reads, and
>>are
>> the nodes that count for consistency level.  Basically, the
>>bootstrapping
>> node has no effect on the existing replicas in terms of RF or CL until
>>the
>> bootstrap completes.
>>
>I see. So, if I add new nodes to increase number of writes my cluster
>can handle I will not see any improvement until bootstrap process
>finished, which may take hours. Is it correct?
>
>Thank you,
>  Andrey






Re: Missing non composite column

2012-10-16 Thread Sylvain Lebresne
On Tue, Oct 16, 2012 at 12:31 PM, Vivek Mishra  wrote:
> Thanks Sylvain. I missed it. If i try to access these via thrift API, what
> will be the column names?

I'm not sure I understand the question. The cli output is pretty much
what you get via the thrift API.

--
Sylvain


Re: what happens while node is bootstrapping?

2012-10-16 Thread Andrey Ilinykh
>
>
> No.  The bootstrapping node will receive writes for its new range while
> bootstrapping as a consistency optimization (more or less), but does not
> contribute to the replication factor or consistency level; all of the
> original replicas for that range still receive writes, serve reads, and are
> the nodes that count for consistency level.  Basically, the bootstrapping
> node has no effect on the existing replicas in terms of RF or CL until the
> bootstrap completes.
>
I see. So, if I add new nodes to increase the number of writes my cluster
can handle, I will not see any improvement until the bootstrap process
has finished, which may take hours. Is that correct?

Thank you,
  Andrey


Re: Using Cassandra to store binary files?

2012-10-16 Thread Michael Kjellman
When we started with Cassandra almost 2 years ago in production, originally it
was for the sole purpose of storing blobs in a redundant way. I ignored the
warnings as my own tests showed it would be okay (and two years later it is
"ok"). If you plan on using Cassandra for more later (as we now do; as features
such as secondary indexes and CQL have matured, I'm now stuck with a large
amount of data in Cassandra that maybe could be in a better place). Does it
work? Yes. Would I do it again? Not 100% sure. Compactions of these column
families take forever.

Also, by default there is a 16MB limit. Yes, this is adjustable but currently 
Thrift does not stream data. I didn't know that Netflix had worked around this 
(referring to Dean's reply) — I'll have to look through the source to see how 
they are overcoming the limitations of the protocol. Last I read there were no 
plans to make Thrift stream. Looks like there is a bug at 
https://issues.apache.org/jira/browse/CASSANDRA-265

You might want to take a look at the following page: 
http://wiki.apache.org/cassandra/CassandraLimitations

I wanted an easy key value store when I originally picked Cassandra. As our 
project needs changed and Cassandra has now begun playing a more critical role 
as it has matured (since the 0.7 days), in retrospect HDFS might have been a 
better option long term as I really will never need indexing etc on my binary 
blobs and the convenience of simply being able to grab/reassemble a file by 
grabbing its key was convenient at the time but maybe not the most forward 
thinking. Hope that helps a bit.

Also, your read performance won't be amazing by any means with blobs. Not sure 
if your priority is reads or writes. In our case it was writes so it wasn't a 
large loss.

Best,
michael


From: Vasileios Vlachos <vasileiosvlac...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, October 16, 2012 8:49 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Using Cassandra to store binary files?

Hello All,

We need to store about 40G of binary files in a redundant way and since we are 
already using Cassandra for other applications we were thinking that we could 
just solve that problem using the same Cassandra cluster. Each individual File 
will be approximately 1MB.

We are thinking that the data structure should be very simple for this case, 
using one CF with just one column which will contain the actual files. The row 
key should then uniquely identify each file. Speed is not an issue when we are 
retrieving the files. Impacting other applications using Cassandra is more 
important for us. In order to prevent performance issues with other 
applications using our Cassandra cluster at the moment, we think we should 
disable key_cache and row_cache for this column family.

Anyone tried this before or anyone thinks this is going to be a bad idea? Do 
you think our current plan is sensible? Any input would be much appreciated. 
Thank you in advance.

Regards,

Vasilis





Re: Using Cassandra to store binary files?

2012-10-16 Thread Hiller, Dean
Astyanax provides a streaming file feature and was written by Netflix, who are
probably storing a huge number of files with that feature.  I was going to use
that feature for one product but I never got around to creating the
product… but I still use Astyanax under the hood of PlayOrm (we kind of use a
combination so we can put some relational data in Cassandra with PlayOrm and
then do our own thing as well with NoSQL via the raw Astyanax APIs as
well)… it gets rid of us needing the RDBMS at all, which is nice.

Later,
Dean
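
For reference, using the Astyanax chunked object store recipe looks roughly
like the sketch below. The class and method names (ChunkedStorage,
CassandraChunkedStorageProvider, ObjectMetadata) are as commonly documented for
the recipe, but check them against your Astyanax version; the column family
name, object name and chunk size are placeholders.

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.recipes.storage.CassandraChunkedStorageProvider;
import com.netflix.astyanax.recipes.storage.ChunkedStorage;
import com.netflix.astyanax.recipes.storage.ChunkedStorageProvider;
import com.netflix.astyanax.recipes.storage.ObjectMetadata;

public class AstyanaxFileStore {
    // Stores a file under 'objectName' as fixed-size chunks, then reads it back.
    // The keyspace is assumed to be set up elsewhere; CF name is a placeholder.
    public static byte[] storeAndFetch(Keyspace keyspace, String objectName, String path) throws Exception {
        ChunkedStorageProvider provider =
                new CassandraChunkedStorageProvider(keyspace, "file_chunks");

        // Write: the recipe splits the stream into chunks, one column per chunk.
        ObjectMetadata written = ChunkedStorage.newWriter(provider, objectName, new FileInputStream(path))
                .withChunkSize(1024 * 1024)   // 1MB chunks (placeholder)
                .call();

        // Read: the recipe uses the stored metadata to reassemble the chunks.
        ByteArrayOutputStream out = new ByteArrayOutputStream(written.getObjectSize().intValue());
        ChunkedStorage.newReader(provider, objectName, out).call();
        return out.toByteArray();
    }
}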

From: Vasileios Vlachos <vasileiosvlac...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, October 16, 2012 9:49 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Using Cassandra to store binary files?

We need to store about 40G of binary files in a redundant way and since we are 
already using Cassandra for other applications we were thinking that we could 
just solve that problem using the same Cassandra cluster. Each individual File 
will be approximately 1MB.

We are thinking that the data structure should be very simple for this case, 
using one CF with just one column which will contain the actual files. The row 
key should then uniquely identify each file. Speed is not an issue when we are 
retrieving the files. Impacting other applications using Cassandra is more 
important for us. In order to prevent performance issues with other 
applications using our Cassandra cluster at the moment, we think we should 
disable key_cache and row_cache for this column family.

Anyone tried this before or anyone thinks this is going to be a bad idea? Do 
you think our current plan is sensible? Any input would be much appreciated. 
Thank you in adv


Using Cassandra to store binary files?

2012-10-16 Thread Vasileios Vlachos
Hello All,

We need to store about 40G of binary files in a redundant way and since we
are already using Cassandra for other applications we were thinking that we
could just solve that problem using the same Cassandra cluster. Each
individual File will be approximately 1MB.

We are thinking that the data structure should be very simple for this
case, using one CF with just one column which will contain the actual
files. The row key should then uniquely identify each file. Speed is not an
issue when we retrieving the files. Impacting other applications using
Cassandra is more important for us. In order to prevent performance issues
with other applications using our Cassandra cluster at the moment, we think
we should disable key_cache and row_cache for this column family.

Has anyone tried this before, or does anyone think this is going to be a bad idea?
Do you think our current plan is sensible? Any input would be much
appreciated. Thank you in advance.

Regards,

Vasilis
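
A minimal sketch of the one-column layout described above, with Hector used
only as an example client (column family and column names are placeholders;
disabling key_cache/row_cache would be done on the column family definition,
not in this code):

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;

public class BlobStore {
    private static final String CF  = "binary_files";  // placeholder CF name
    private static final String COL = "data";           // the single data column

    // Row key = file name, single column holding the file contents (~1MB each).
    public static void put(Keyspace ks, String fileName, byte[] contents) {
        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        m.insert(fileName, CF,
                HFactory.createColumn(COL, contents,
                        StringSerializer.get(), BytesArraySerializer.get()));
    }

    public static byte[] get(Keyspace ks, String fileName) {
        ColumnQuery<String, String, byte[]> q = HFactory.createColumnQuery(
                ks, StringSerializer.get(), StringSerializer.get(), BytesArraySerializer.get());
        HColumn<String, byte[]> col =
                q.setColumnFamily(CF).setKey(fileName).setName(COL).execute().get();
        return col == null ? null : col.getValue();
    }
}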


Re: what happens while node is bootstrapping?

2012-10-16 Thread Tyler Hobbs
On Mon, Oct 15, 2012 at 3:50 PM, Andrey Ilinykh  wrote:

> Does it mean that during bootstrapping process only replicas serve
> read requests for new node range? In other words, replication factor
> is RF-1?
>

No.  The bootstrapping node will receive writes for its new range while
bootstrapping as a consistency optimization (more or less), but does not
contribute to the replication factor or consistency level; all of the
original replicas for that range still receive writes, serve reads, and are
the nodes that count for consistency level.  Basically, the bootstrapping
node has no effect on the existing replicas in terms of RF or CL until the
bootstrap completes.

-- 
Tyler Hobbs
DataStax 


Re: Why my Cassandra is compacting like mad

2012-10-16 Thread Manu Zhang
Only 1/8 of the heap is used. Only the system-schema_columns CF is compacted. The
weird thing is that it never stops by itself.

On Tue, Oct 16, 2012 at 5:14 PM, Alain RODRIGUEZ  wrote:

> "I don't what hardware information may help"
>
> A low memory server will produce an intensive use of CPU and disk IO. It
> will write a lot of small SSTables which will need to be compacted very
> often. That was one of my thoughts when I asked.
>
> Alain
>
>
> 2012/10/15 Manu Zhang 
>
>> I use default option for compaction:
>>in_memory_compaction_limit_in_mb: 64
>>multithreaded_compaction: false
>>compaction_throughput_mb_per_sec: 16
>>compaction_preheat_key_cache: true
>>
>>
>> I aborted Cassandra after a timeout query;
>> my Cassandra is 1.2beta-2
>> I don't what hardware information may help
>>
>>
>> On Mon, Oct 15, 2012 at 11:19 PM, Michael Kjellman <
>> mkjell...@barracuda.com> wrote:
>>
>>> And to clarify my reply, this was a loop in compactions on
>>> system-schema_columns specifically.
>>>
>>> From: Michael Kjellman 
>>> Reply-To: "user@cassandra.apache.org" 
>>> Date: Monday, October 15, 2012 8:11 AM
>>> To: "user@cassandra.apache.org" 
>>> Subject: Re: Why my Cassandra is compacting like mad
>>>
>>> I had a similar bug with 1.1.5 but I couldn't reproduce it so I didn't
>>> file a bug. I did a rolling restart of my nodes and things went back to
>>> normal.
>>>
>>> From: Manu Zhang 
>>> Reply-To: "user@cassandra.apache.org" 
>>> Date: Monday, October 15, 2012 8:02 AM
>>> To: "user@cassandra.apache.org" 
>>> Subject: Why my Cassandra is compacting like mad
>>>
>>> My Cassandra is compacting like mad (look at the following messages)
>>> after I restart it after a timeout query. It won't stop until I kill the
>>> process.
>>>
>>>
>>> INFO 22:57:41,839 Compacting
>>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1258-Data.db')]
>>>  INFO 22:57:42,077 Compacted to
>>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1259-Data.db,].
>>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.017539MB/s.
>>>  Time: 238ms.
>>>  INFO 22:57:42,079 Compacting
>>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1259-Data.db')]
>>>  INFO 22:57:42,309 Compacted to
>>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1260-Data.db,].
>>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.018149MB/s.
>>>  Time: 230ms.
>>>  INFO 22:57:42,311 Compacting
>>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1260-Data.db')]
>>>  INFO 22:57:42,574 Compacted to
>>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1261-Data.db,].
>>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.015872MB/s.
>>>  Time: 263ms.
>>>  INFO 22:57:42,576 Compacting
>>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1261-Data.db')]
>>>  INFO 22:57:42,814 Compacted to
>>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1262-Data.db,].
>>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.017539MB/s.
>>>  Time: 238ms.
>>>  INFO 22:57:42,816 Compacting
>>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1262-Data.db')]
>>>  INFO 22:57:43,070 Compacted to
>>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1263-Data.db,].
>>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.016434MB/s.
>>>  Time: 254ms.
>>>
>>>
>>
>>
>


Re: Missing non composite column

2012-10-16 Thread Vivek Mishra
Thanks Sylvain. I missed it. If I try to access these via thrift API, what
will be the column names?

-Vivek

If I try to access this row via thrift API, what would be the column names
returned?
On Tue, Oct 16, 2012 at 3:52 PM, Sylvain Lebresne wrote:

> On Tue, Oct 16, 2012 at 11:57 AM, Vivek Mishra 
> wrote:
> > ---
> > RowKey: Jayne Cobb
> > => (column=2012-07-24:2:alliance_involvement, value=false,
> > timestamp=135038100502)
> > => (column=2012-07-24:2:energy_used, value=4.6,
> timestamp=1350381005020001)
> >
> >
> > Not sure, why is it not getting non-composite column(e.g.
> ships_destroyed in
> > this case).
>
> You have all the data in there. The ships_destroyed is the second
> component of each of your column names. That's just the way CQL3
> encodes things underneath.
>
> --
> Sylvain
>


Re: Missing non composite column

2012-10-16 Thread Sylvain Lebresne
On Tue, Oct 16, 2012 at 11:57 AM, Vivek Mishra  wrote:
> ---
> RowKey: Jayne Cobb
> => (column=2012-07-24:2:alliance_involvement, value=false,
> timestamp=135038100502)
> => (column=2012-07-24:2:energy_used, value=4.6, timestamp=1350381005020001)
>
>
> Not sure, why is it not getting non-composite column(e.g. ships_destroyed in
> this case).

You have all the data in there. The ships_destroyed is the second
component of each of your column names. That's just the way CQL3
encodes things underneath.

--
Sylvain
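
To make the thrift answer concrete: over thrift, each of those cells comes back
with a CompositeType-encoded column name, i.e. the name bytes pack the
clustering values and the CQL3 column name together, and the cli simply prints
the components joined with ':'. A small sketch of splitting such a name back
into its raw components (each component is a 2-byte length, the component
bytes, then one end-of-component byte); in practice your client library's
composite serializer does this for you:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class CompositeNames {
    // Splits a CompositeType-encoded column name into its raw components.
    // Per component: 2-byte big-endian length, the component bytes, one end-of-component byte.
    public static List<ByteBuffer> components(ByteBuffer name) {
        List<ByteBuffer> parts = new ArrayList<ByteBuffer>();
        ByteBuffer buf = name.duplicate();
        while (buf.remaining() > 0) {
            int len = buf.getShort() & 0xFFFF;
            ByteBuffer part = buf.slice();
            part.limit(len);
            parts.add(part);
            buf.position(buf.position() + len);
            buf.get(); // skip the end-of-component byte
        }
        return parts;
    }
}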


Performance problems & Unbalanced disk io on my cassandra cluster

2012-10-16 Thread Tamar Fraenkel
Hi!

*Problem*
I have one node which seems to be in a bad situation, with lots of dropped
reads for a long time.

*My cluster*
I have a 3-node cluster on the Amazon m1.large DataStax AMI with Cassandra 1.0.8.
RF=3, RCL=WCL=QUORUM
I use Hector, which should be doing round-robin of the requests between the
nodes.
The cluster is not under much load.

*Info*
Using OpsCenter I can see that:

The number of read/write requests is distributed evenly between nodes.
Disk latency for both reads and writes, and disk throughput, are much worse
on one of the nodes.

*This is also visible in iostats*
"Good node"
Device:  rrqm/s  wrqm/s      r/s    w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdb       0.58    0.03    42.81   1.31   2710.90   104.62     63.82      0.02    5.96   0.48   2.14
xvdc       0.57    0.00    42.85   1.30   2712.72   104.83     63.81      0.20    4.60   0.48   2.12

Device:  rrqm/s  wrqm/s      r/s    w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdb       5.60    0.10   456.50   0.40  32729.60    28.00     71.70     19.65   43.00   0.36  16.50
xvdc       4.10    0.00   460.00   0.80  33342.40    60.80     72.49     17.55   38.09   0.35  16.00

Device:  rrqm/s  wrqm/s      r/s    w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdb       4.70    0.10   608.20   1.10  39217.60    77.70     64.49     26.04   42.73   0.39  23.50
xvdc       5.70    0.00   606.80   0.60  38645.60    24.00     63.66     22.89   37.69   0.38  23.10


"Bad Node"
Device:  rrqm/s  wrqm/s      r/s    w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdb       0.67    0.03    51.72   1.02   3330.21    80.62     64.67      0.06    1.19   0.60   3.16
xvdc       0.67    0.00    51.66   1.02   3329.23    80.85     64.73      0.15    2.84   0.60   3.17

Device:  rrqm/s  wrqm/s      r/s    w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdb      16.50    0.10  1484.70   0.80  88937.60    52.90     59.91    115.07   77.11   0.58  86.00
xvdc      16.20    0.00  1492.80   0.60  89701.60    43.20     60.09    102.80   69.06   0.58  86.10

Device:  rrqm/s  wrqm/s      r/s    w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdb      14.00    0.10  1260.00   0.70  81632.00    33.70     64.78     76.96   61.56   0.54  68.10
xvdc      15.50    0.10  1257.60   0.90  80932.00    63.20     64.36     88.94   70.90   0.53  67.10


*Question*
This does not make sense to me: why would one node do many more reads and
writes, reading more sectors with higher utilization and wait times?
Could it be an Amazon issue? I don't think so.
This may of course be the result of flushing and compactions, but it
persists for a long time, even when no compaction is happening.
What would you do to further explore or fix the problem?


Thank you very much!!
*
Tamar Fraenkel *
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956


Re: Why my Cassandra is compacting like mad

2012-10-16 Thread Alain RODRIGUEZ
"I don't what hardware information may help"

A low memory server will produce an intensive use of CPU and disk IO. It
will write a lot of small SSTables which will need to be compacted very
often. That was one of my thoughts when I asked.

Alain

2012/10/15 Manu Zhang 

> I use default option for compaction:
>in_memory_compaction_limit_in_mb: 64
>multithreaded_compaction: false
>compaction_throughput_mb_per_sec: 16
>compaction_preheat_key_cache: true
>
>
> I aborted Cassandra after a timeout query;
> my Cassandra is 1.2beta-2
> I don't what hardware information may help
>
>
> On Mon, Oct 15, 2012 at 11:19 PM, Michael Kjellman <
> mkjell...@barracuda.com> wrote:
>
>> And to clarify my reply, this was a loop in compactions on
>> system-schema_columns specifically.
>>
>> From: Michael Kjellman 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Monday, October 15, 2012 8:11 AM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: Why my Cassandra is compacting like mad
>>
>> I had a similar bug with 1.1.5 but I couldn't reproduce it so I didn't
>> file a bug. I did a rolling restart of my nodes and things went back to
>> normal.
>>
>> From: Manu Zhang 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Monday, October 15, 2012 8:02 AM
>> To: "user@cassandra.apache.org" 
>> Subject: Why my Cassandra is compacting like mad
>>
>> My Cassandra is compacting like mad (look at the following messages)
>> after I restart it after a timeout query. It won't stop until I kill the
>> process.
>>
>>
>> INFO 22:57:41,839 Compacting
>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1258-Data.db')]
>>  INFO 22:57:42,077 Compacted to
>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1259-Data.db,].
>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.017539MB/s.
>>  Time: 238ms.
>>  INFO 22:57:42,079 Compacting
>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1259-Data.db')]
>>  INFO 22:57:42,309 Compacted to
>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1260-Data.db,].
>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.018149MB/s.
>>  Time: 230ms.
>>  INFO 22:57:42,311 Compacting
>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1260-Data.db')]
>>  INFO 22:57:42,574 Compacted to
>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1261-Data.db,].
>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.015872MB/s.
>>  Time: 263ms.
>>  INFO 22:57:42,576 Compacting
>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1261-Data.db')]
>>  INFO 22:57:42,814 Compacted to
>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1262-Data.db,].
>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.017539MB/s.
>>  Time: 238ms.
>>  INFO 22:57:42,816 Compacting
>> [SSTableReader(path='/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1262-Data.db')]
>>  INFO 22:57:43,070 Compacted to
>> [/home/manuzhang/cassandra/data/system/schema_columns/system-schema_columns-ia-1263-Data.db,].
>>  4,377 to 4,377 (~100% of original) bytes for 3 keys at 0.016434MB/s.
>>  Time: 254ms.
>>
>>
>
>