Splitting one report to multiple rows is uncomfortably
WHY? Reading from N disks is way faster than reading from 1 disk.
I think in terms of PlayOrm first and then explain the model you can use,
so I will start with objects:
Report {
String uniqueId
String reportName; // may be indexed and queried
Thanks a lot for helping. We came to the same decision: splitting one
report across multiple Cassandra rows (sorted buckets of report rows) and
managing the buckets on the client side.
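For anyone following along, here is a rough sketch of what client-side bucketing could look like. It is not the code from this thread: the bucket size, the "report_id:bucket" key format, and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical number of report rows stored per Cassandra row (assumption).
BUCKET_SIZE = 1000


def bucket_key(report_id, row_index, bucket_size=BUCKET_SIZE):
    """Return the Cassandra row key of the bucket that holds row_index."""
    return "%s:%d" % (report_id, row_index // bucket_size)


def split_report(report_id, report_rows, bucket_size=BUCKET_SIZE):
    """Group (row_index, payload) pairs into buckets keyed by row key."""
    buckets = defaultdict(dict)
    for row_index, payload in sorted(report_rows, key=lambda r: r[0]):
        key = bucket_key(report_id, row_index, bucket_size)
        buckets[key][row_index] = payload
    return dict(buckets)


def keys_for_range(report_id, first, last, bucket_size=BUCKET_SIZE):
    """Row keys the client must read to fetch report rows first..last."""
    first_bucket = first // bucket_size
    last_bucket = last // bucket_size
    return [
        "%s:%d" % (report_id, b) for b in range(first_bucket, last_bucket + 1)
    ]
```

With a layout like this, a read of a slice of the report only touches the buckets that overlap the slice, and no single Cassandra row grows past BUCKET_SIZE report rows.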
On Tue, Sep 25, 2012 at 5:28 AM, aaron morton aa...@thelastpickle.com wrote:
What exactly is the problem with big rows?
During compaction the row will be passed through a slower two-pass process,
which adds to IO pressure.
Counting big rows requires that the entire row be read.
Repairing big rows requires that the entire row be repaired.
I generally avoid rows
/var/log/cassandra$ cat system.log | grep "Compacting large" |
grep -E "[0-9]+ bytes" -o | cut -d " " -f 1 |
awk '{ foo = $1 / 1024 / 1024 ; print foo "MB" }' | sort -nr | head -n 50
Is it a bad sign?
Sorry, I do not know what this is outputting.
As I can see in cfstats, compacted row maximum
On Sun, Sep 23, 2012 at 10:41 PM, aaron morton aa...@thelastpickle.com wrote:
/var/log/cassandra$ cat system.log | grep "Compacting large" |
grep -E "[0-9]+ bytes" -o | cut -d " " -f 1 |
awk '{ foo = $1 / 1024 / 1024 ; print foo "MB" }' | sort -nr | head -n 50
Is it a bad sign?
Sorry, I do not
Reports is a SuperColumnFamily.
Each report has a unique identifier (report_id), which is the key of the
SuperColumnFamily.
Each report is saved in a separate row.
A report consists of report rows (may vary between 1 and 50,
but most are small).
Each report row is saved in a separate super column.
Found one more interesting fact.
As I can see in cfstats, compacted row maximum size: 386857368 !
On Fri, Sep 21, 2012 at 12:50 PM, Denis Gabaydulin gaba...@gmail.com wrote:
Reports is a SuperColumnFamily.
Each report has a unique identifier (report_id), which is the key of the
SuperColumnFamily.
And
And some stuff from log:
/var/log/cassandra$ cat system.log | grep "Compacting large" |
grep -E "[0-9]+ bytes" -o | cut -d " " -f 1 |
awk '{ foo = $1 / 1024 / 1024 ; print foo "MB" }' | sort -nr | head -n 50
3821.55MB
3337.85MB
1221.64MB
1128.67MB
930.666MB
916.4MB
861.114MB
843.325MB
711.813MB
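In case the shell pipeline is hard to read, here is a rough Python equivalent. It mirrors the same steps (keep lines containing "Compacting large", pull out the number in front of "bytes", convert to MB, sort descending, keep the top 50). The function name is made up for illustration, and it assumes nothing about the log line beyond what the grep patterns already assume.

```python
import re

# Same pattern as the grep -E in the pipeline: digits followed by " bytes".
BYTES_RE = re.compile(r"([0-9]+) bytes")


def largest_compacted_rows_mb(lines, top=50):
    """Return the sizes (in MB) of the largest compacted rows, biggest first."""
    sizes_mb = []
    for line in lines:
        if "Compacting large" not in line:
            continue
        # Like grep -o, collect every match on the line.
        for match in BYTES_RE.finditer(line):
            sizes_mb.append(int(match.group(1)) / 1024 / 1024)
    return sorted(sizes_mb, reverse=True)[:top]
```

It can be fed an open file handle, for example the result of open("system.log") in /var/log/cassandra, since iterating a file yields its lines.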
Hi, all!
We have a cluster of 7 virtual nodes (disk storage is connected to the
nodes via iSCSI). The storage schema is:
Reports:{
1:{
1:{value1:some val, value2:some val},
2:{value1:some val, value2:some val}
...
},
2:{
1:{value1:some val, value2:some
p.s. Cassandra 1.1.4
On Thu, Sep 20, 2012 at 3:27 PM, Denis Gabaydulin gaba...@gmail.com wrote:
Hi, all!
We have a cluster of 7 virtual nodes (disk storage is connected to the
nodes via iSCSI). The storage schema is:
Reports:{
1:{
1:{value1:some val, value2:some val},
I'm not 100% sure that I understand your data model and read patterns correctly,
but it sounds like you have large supercolumns and are requesting some of
the subcolumns from individual super columns. If that's the case, the
issue is that Cassandra must deserialize the entire supercolumn in memory