Re: [problem with OOM in nodes]

2012-10-11 Thread Hiller, Dean
"Splitting one report to multiple rows is uncomfortably" WHY? Reading from N disks is way faster than reading from 1 disk. I think in terms of PlayOrm and then explain the model you can use, so I think in objects first: Report { String uniqueId; String reportName; //may be indexable and query

Re: [problem with OOM in nodes]

2012-09-25 Thread Denis Gabaydulin
Thanks a lot for helping. We came to the same decision: clustering one report into multiple Cassandra rows (sorted buckets of report rows) and managing the clusters on the client side. On Tue, Sep 25, 2012 at 5:28 AM, aaron morton aa...@thelastpickle.com wrote: What exactly is the problem with big rows?
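The client-side bucketing described above could look roughly like the following. This is only a sketch under assumptions, not the poster's actual code: the names `BUCKET_SIZE`, `bucket_key` and `split_into_buckets`, the `report_id:index` key layout, and the bucket size of 1000 are all hypothetical and chosen for illustration.

```python
# Hypothetical sketch: split one report into sorted buckets of report rows,
# so that each bucket can be stored under its own Cassandra row key.

BUCKET_SIZE = 1000  # assumed value; pick it so one bucket stays a small row


def bucket_key(report_id, bucket_index):
    """Build the row key for one bucket (the key layout is an assumption)."""
    return f"{report_id}:{bucket_index:06d}"


def split_into_buckets(report_id, report_rows, bucket_size=BUCKET_SIZE):
    """Map bucket row keys to sorted slices of the report.

    report_rows is a list of (sort_key, payload) tuples.
    """
    ordered = sorted(report_rows, key=lambda row: row[0])
    buckets = {}
    for start in range(0, len(ordered), bucket_size):
        index = start // bucket_size
        buckets[bucket_key(report_id, index)] = ordered[start:start + bucket_size]
    return buckets
```

Because the bucket index is zero-padded, the bucket keys of one report sort in the same order as the report rows they hold, so the client can read a range of the report by fetching only the buckets that cover it.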

Re: [problem with OOM in nodes]

2012-09-24 Thread aaron morton
What exactly is the problem with big rows? During compaction the row will be passed through a slower two-pass process, which adds to IO pressure. Counting big rows requires that the entire row be read. Repairing big rows requires that the entire row be repaired. I generally avoid rows

Re: [problem with OOM in nodes]

2012-09-23 Thread aaron morton
/var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E "[0-9]+ bytes" -o | cut -d " " -f 1 | awk '{ foo = $1 / 1024 / 1024 ; print foo "MB" }' | sort -nr | head -n 50 Is it a bad signal? Sorry, I do not know what this is outputting. As I can see in cfstats, compacted row maximum

Re: [problem with OOM in nodes]

2012-09-23 Thread Denis Gabaydulin
On Sun, Sep 23, 2012 at 10:41 PM, aaron morton aa...@thelastpickle.com wrote: /var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E "[0-9]+ bytes" -o | cut -d " " -f 1 | awk '{ foo = $1 / 1024 / 1024 ; print foo "MB" }' | sort -nr | head -n 50 Is it a bad signal? Sorry, I do not

Re: [problem with OOM in nodes]

2012-09-21 Thread Denis Gabaydulin
Reports is a SuperColumnFamily. Each report has a unique identifier (report_id), which is the key of the SuperColumnFamily, and each report is saved in a separate row. A report consists of report rows (the count may vary between 1 and 50, but most are small). Each report row is saved in a separate super column.

Re: [problem with OOM in nodes]

2012-09-21 Thread Denis Gabaydulin
Found one more interesting fact. As I can see in cfstats, compacted row maximum size: 386857368 ! On Fri, Sep 21, 2012 at 12:50 PM, Denis Gabaydulin gaba...@gmail.com wrote: Reports is a SuperColumnFamily. Each report has a unique identifier (report_id). This is the key of the SuperColumnFamily. And

Re: [problem with OOM in nodes]

2012-09-21 Thread Denis Gabaydulin
And some stuff from the log: /var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E "[0-9]+ bytes" -o | cut -d " " -f 1 | awk '{ foo = $1 / 1024 / 1024 ; print foo "MB" }' | sort -nr | head -n 50 3821.55MB 3337.85MB 1221.64MB 1128.67MB 930.666MB 916.4MB 861.114MB 843.325MB 711.813MB
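For readers who find the pipeline above hard to follow, here is a rough Python equivalent. It is a sketch under assumptions: the function name `large_row_sizes_mb` is made up for illustration, and the only thing assumed about the log format is what the pipeline itself relies on, namely that the relevant lines contain the text "Compacting large" and a size written as "<number> bytes".

```python
import re

# Same pattern the grep -E stage uses: a run of digits followed by " bytes".
BYTES_RE = re.compile(r"([0-9]+) bytes")


def large_row_sizes_mb(lines, top=50):
    """Return the largest compacted row sizes in MB, biggest first."""
    sizes = []
    for line in lines:
        # Equivalent of: grep "Compacting large"
        if "Compacting large" not in line:
            continue
        # Equivalent of: grep -E "[0-9]+ bytes" -o | cut -d " " -f 1
        for number in BYTES_RE.findall(line):
            # Equivalent of the awk stage: bytes -> megabytes
            sizes.append(int(number) / 1024 / 1024)
    # Equivalent of: sort -nr | head -n 50
    return sorted(sizes, reverse=True)[:top]
```

It could be run against the log with something like `large_row_sizes_mb(open("/var/log/cassandra/system.log"))`. In other words, each value in the output is the size of one row that compaction reported as large, converted from bytes to megabytes.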

[problem with OOM in nodes]

2012-09-20 Thread Denis Gabaydulin
Hi, all! We have a cluster of 7 virtual nodes (disk storage is connected to the nodes via iSCSI). The storage schema is: Reports:{ 1:{ 1:{value1:some val, value2:some val}, 2:{value1:some val, value2:some val} ... }, 2:{ 1:{value1:some val, value2:some

Re: [problem with OOM in nodes]

2012-09-20 Thread Denis Gabaydulin
P.S. Cassandra 1.1.4 On Thu, Sep 20, 2012 at 3:27 PM, Denis Gabaydulin gaba...@gmail.com wrote: Hi, all! We have a cluster of 7 virtual nodes (disk storage is connected to the nodes via iSCSI). The storage schema is: Reports:{ 1:{ 1:{value1:some val, value2:some val},

Re: [problem with OOM in nodes]

2012-09-20 Thread Tyler Hobbs
I'm not 100% sure that I understand your data model and read patterns correctly, but it sounds like you have large super columns and are requesting some of the subcolumns from individual super columns. If that's the case, the issue is that Cassandra must deserialize the entire super column in memory