Re: unable to read saved rowcache from disk

aaron morton Fri, 16 Nov 2012 17:16:05 -0800

> Just curious, why do you think the row key will take 300 bytes? 
That's what I thought it said earlier in the email thread.


>  If the row key is Long type, doesn't it take 8 bytes?
Yes, 8 bytes on disk. 
 
> In his case, the rowCache was 500M with 1.6M rows, so the row data is 300B. 
> Did I miss something?


Did that take into account the token, the row key, the row payload, and the 
Java memory overhead?
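
To make the arithmetic concrete, here is a rough back-of-envelope sketch. It
is not how Cassandra measures anything. The function names are mine, 500M is
read as 500 * 10^6 bytes, and the 300 byte payload is only a placeholder guess
because the real payload size was never stated in this thread. The 16 byte
token, 8 byte Long key, 1,650,000 rows and the 5x to 10x fudge factor are the
numbers mentioned earlier.

```python
# Back-of-envelope row cache sizing. Illustrative only; names are hypothetical.

ROWS = 1_650_000              # row count reported in the thread
CACHE_BYTES = 500 * 10**6     # "500M" from jconsole, assumed to mean 500 * 10^6 bytes


def average_entry_bytes(cache_bytes, rows):
    """Average size of one cached entry, given the total cache size."""
    return cache_bytes / rows


def estimate_heap_bytes(rows, token_bytes, key_bytes, payload_bytes, overhead_factor):
    """Raw bytes per row (token + key + payload), scaled by a JVM fudge factor."""
    return rows * (token_bytes + key_bytes + payload_bytes) * overhead_factor


if __name__ == "__main__":
    avg = average_entry_bytes(CACHE_BYTES, ROWS)
    print(f"average cached entry: {avg:.0f} bytes")

    # 16 byte token, 8 byte Long key, 300 byte payload (placeholder guess).
    for factor in (5, 10):
        total = estimate_heap_bytes(ROWS, 16, 8, 300, factor)
        print(f"fudge factor {factor}: about {total / 10**9:.2f} GB")
```

Under those assumptions the estimate lands in the same range as a 3G heap,
which is why the per-row overhead matters more than the size of the key alone.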

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/11/2012, at 9:35 AM, Wei Zhu <wz1...@yahoo.com> wrote:

> Just curious, why do you think the row key will take 300 bytes? If the row key is 
> Long type, doesn't it take 8 bytes?
> In his case, the rowCache was 500M with 1.6M rows, so the row data is 300B. 
> Did I miss something?
> 
> Thanks.
> -Wei
> 
> From: aaron morton <aa...@thelastpickle.com>
> To: user@cassandra.apache.org 
> Sent: Thursday, November 15, 2012 12:15 PM
> Subject: Re: unable to read saved rowcache from disk
> 
> For a row cache of 1,650,000:
> 
> 16 byte token
> 300 byte row key ? 
> and row data ? 
> multiply by a Java fudge factor of 5 or 10. 
> 
> Try deleting the saved cache and restarting.
> 
> Cheers
>  
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 15/11/2012, at 8:20 PM, Wz1975 <wz1...@yahoo.com> wrote:
> 
>> Before shutdown, you saw the row cache had 500M and 1.6M rows, each row averaging 
>> 300B, so 700K rows should be a little over 200M, unless it is reading more, 
>> maybe tombstones? Or the rows on disk have grown for some reason, but the row 
>> cache was not updated? Something else could be eating up the memory. You may 
>> want to profile memory and see what consumes it. 
>> 
>> 
>> Thanks.
>> -Wei
>> 
>> Sent from my Samsung smartphone on AT&T 
>> 
>> 
>> -------- Original message --------
>> Subject: Re: unable to read saved rowcache from disk 
>> From: Manu Zhang <owenzhang1...@gmail.com> 
>> To: user@cassandra.apache.org 
>> CC: 
>> 
>> 
>> 3G, other jvm parameters are unchanged. 
>> 
>> 
>> On Thu, Nov 15, 2012 at 2:40 PM, Wz1975 <wz1...@yahoo.com> wrote:
>> How big is your heap?  Did you change the jvm parameter? 
>> 
>> 
>> 
>> Thanks.
>> -Wei
>> 
>> Sent from my Samsung smartphone on AT&T 
>> 
>> 
>> -------- Original message --------
>> Subject: Re: unable to read saved rowcache from disk 
>> From: Manu Zhang <owenzhang1...@gmail.com> 
>> To: user@cassandra.apache.org 
>> CC: 
>> 
>> 
>> I added a counter and printed it out myself.
>> 
>> 
>> On Thu, Nov 15, 2012 at 1:51 PM, Wz1975 <wz1...@yahoo.com> wrote:
>> Curious where did you see this? 
>> 
>> 
>> Thanks.
>> -Wei
>> 
>> Sent from my Samsung smartphone on AT&T 
>> 
>> 
>> -------- Original message --------
>> Subject: Re: unable to read saved rowcache from disk 
>> From: Manu Zhang <owenzhang1...@gmail.com> 
>> To: user@cassandra.apache.org 
>> CC: 
>> 
>> 
>> OOM while deserializing the 747321st row
>> 
>> 
>> On Thu, Nov 15, 2012 at 9:08 AM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>> Oh, as for the number of rows, it's 1650000. How long would you expect it to 
>> take to read back?
>> 
>> 
>> On Thu, Nov 15, 2012 at 3:57 AM, Wei Zhu <wz1...@yahoo.com> wrote:
>> Good information Edward. 
>> In my case, we have a good amount of RAM (76G) and the heap is 8G. So I set the 
>> row cache to 800M as recommended. Our column is kind of big, so the hit 
>> ratio for the row cache is around 20%, so according to DataStax, we might just 
>> turn the row cache off altogether. 
>> Anyway, on restart, it took about 2 minutes to load the row cache:
>> 
>>  INFO [main] 2012-11-14 11:43:29,810 AutoSavingCache.java (line 108) reading 
>> saved cache /var/lib/cassandra/saved_caches/XXX-f2-RowCache
>>  INFO [main] 2012-11-14 11:45:12,612 ColumnFamilyStore.java (line 451) 
>> completed loading (102801 ms; 21125 keys) row cache for XXX.f2 
>> 
>> Just for comparison, our key is a Long and the disk usage for the saved row 
>> cache is 253KB. (It only stores the keys when the row cache is saved to disk, 
>> so 253KB / 8 bytes = 31625 keys.) It's about right...
>> So for 15MB, there could be a lot of "narrow" rows. (If the key is a Long, 
>> it could be more than 1M rows.)
>>   
>> Thanks.
>> -Wei
>> From: Edward Capriolo <edlinuxg...@gmail.com>
>> To: user@cassandra.apache.org 
>> Sent: Tuesday, November 13, 2012 11:13 PM
>> Subject: Re: unable to read saved rowcache from disk
>> 
>> http://wiki.apache.org/cassandra/LargeDataSetConsiderations
>> 
>> A negative side-effect of a large row-cache is start-up time. The
>> periodic saving of the row cache information only saves the keys that
>> are cached; the data has to be pre-fetched on start-up. On a large
>> data set, this is probably going to be seek-bound and the time it
>> takes to warm up the row cache will be linear with respect to the row
>> cache size (assuming sufficiently large amounts of data that the seek
>> bound I/O is not subject to optimization by disks)
>> 
>> Assuming a row cache of 15MB and an average row of 300 bytes, that could
>> be 50,000 entries. 4 hours seems like a long time to read back 50K
>> entries, unless the source table was very large and you can only do a
>> small number of reads/sec.
>> 
>> On Tue, Nov 13, 2012 at 9:47 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>> > "incorrect"... what do you mean? I think it's only 15MB, which is not big.
>> >
>> >
>> > On Wed, Nov 14, 2012 at 10:38 AM, Edward Capriolo <edlinuxg...@gmail.com>
>> > wrote:
>> >>
>> >> Yes, the row cache "could be" incorrect, so on startup Cassandra verifies the
>> >> saved row cache by re-reading it. That takes a long time, so do not save a big
>> >> row cache.
>> >>
>> >>
>> >> On Tuesday, November 13, 2012, Manu Zhang <owenzhang1...@gmail.com> wrote:
>> >> > I have a row cache provided by SerializingCacheProvider.
>> >> > The data that has been read into it is about 500MB, as claimed by
>> >> > jconsole. After saving the cache, it is around 15MB on disk. Hence, I
>> >> > suppose the size from jconsole is before serializing.
>> >> > Now while restarting Cassandra, it's unable to read the saved row cache back.
>> >> > By "unable", I mean it ran for around 4 hours and I had to abort it and
>> >> > remove the cache so as not to hold up other tasks.
>> >> > Since the data isn't huge, why can't Cassandra read it back?
>> >> > My Cassandra is 1.2.0-beta2.
>> >
>> >
>> 
>> 
>> 
>> 
>> 
> 
> 
>
