Re: SegFault in Crawler Part

2021-06-01 Thread dormando
You can't evict memory that's being used to load data from the network.
So if you have a low amount of memory and run a benchmark doing a bunch of
parallel writes, you're going to be sad.
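
Roughly what happens inside (paraphrased from memory; the names and
signatures below are simplified stand-ins, not the exact source): a set
allocates the item up front and then reads the value off the socket into
it, so that memory stays pinned until the read finishes. Eviction can only
reclaim items nobody holds a reference to, so with a tiny -m and enough
parallel sets every tail candidate can be busy and the allocation gives up:

    /* simplified sketch of the store-side allocation path; the real
     * code lives in the do_item_alloc* functions in items.c */
    item *alloc_for_set(size_t ntotal, unsigned int slab_id) {
        item *it = NULL;
        for (int tries = 0; tries < 10 && it == NULL; tries++) {
            it = try_slab_alloc(ntotal, slab_id);  /* free memory? */
            if (it == NULL)
                evict_one_from_tail(slab_id);      /* no-op if every
                                                      tail item is in
                                                      use by a writer */
        }
        return it;  /* NULL becomes SERVER_ERROR out of memory */
    }

So either give the daemon more memory (-m), lower the write concurrency
in the benchmark, or accept some of those errors.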

On Tue, 1 Jun 2021, Qingchen Dang wrote:

> Thank you very much! Yes, your guess is correct; I had forgotten that a 
> crawler item can be evicted :(
> Furthermore, I have a problem similar to this post: 
> https://github.com/memcached/memcached/issues/467
> I gave Memcached a very limited memory budget to test eviction, and it 
> does cause a similar error.
> The unmodified Memcached gives these errors less frequently than Memcached 
> with my eviction framework (especially under Memtier_Benchmark), so I
> wonder why. I read your message in the issue linked above, but I am still 
> confused about why the memory limit affects Memcached this way. Could you
> give a more detailed explanation? If I have to run with limited memory, is 
> there a way to avoid this issue?
> Thank you very much for helping!
>
> Best,
> Qingchen

Re: SegFault in Crawler Part

2021-06-01 Thread Qingchen Dang
Thank you very much! Yes, your guess is correct; I had forgotten that a 
crawler item can be evicted :(
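
For the record, I think the guard I was missing is to skip the crawler's 
placeholder when my code walks the tail. If I read items.c correctly, the 
crawler's fake item carries no key and no data, so it can be recognized 
like this (my paraphrase of the stock check, not the exact source):

    /* paraphrase of the guard the stock lru_pull_tail() uses; any
     * custom eviction/reinsertion walk needs the same skip: */
    for (item *search = tails[id]; search != NULL; search = search->prev) {
        if (search->nbytes == 0 && search->nkey == 0
            && search->it_flags == 1) {
            continue;  /* crawler placeholder: never evict or relink */
        }
        /* ... normal victim selection / reinsertion ... */
    }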

Furthermore, I have a problem similar to this 
post: https://github.com/memcached/memcached/issues/467
I gave Memcached a very limited memory budget to test eviction, and it 
does cause a similar error.
When I use Memtier_Benchmark, the error looks like:

[RUN #1] Preparing benchmark client...

[RUN #1] Launching threads now...

error: response parsing failed.

error: response parsing failed.

server 127.0.0.1:11211 handle error response: SERVER_ERROR out of memory 
storing object

error: response parsing failed.

server 127.0.0.1:11211 handle error response: SERVER_ERROR out of memory 
storing object

error: response parsing failed.

[RUN #1 17%,   0 secs]  1 threads:   87137 ops,   87213 (avg:   87213) 
ops/sec, 65.66MB/sec (avg: 65.66MB/sec

[RUN #1 36%,   1 secs]  1 threads:  179012 ops,   91864 (avg:   89540) 
ops/sec, 69.87MB/sec (avg: 67.76MB/sec

[RUN #1 56%,   2 secs]  1 threads:  279971 ops,  100947 (avg:   93343) 
ops/sec, 76.76MB/sec (avg: 70.76MB/sec

[RUN #1 75%,   3 secs]  1 threads:  375715 ops,   95732 (avg:   93941) 
ops/sec, 72.87MB/sec (avg: 71.29MB/sec

[RUN #1 92%,   4 secs]  1 threads:  462054 ops,   93910 (avg:   93935) 
ops/sec, 71.41MB/sec (avg: 71.31MB/sec

[RUN #1 92%,   4 secs]  1 threads:  462054 ops,   0 (avg:   92431) 
ops/sec, 0.00KB/sec (avg: 70.17MB/sec)

[RUN #1 92%,   5 secs]  1 threads:  462054 ops,   0 (avg:   90975) 
ops/sec, 0.00KB/sec (avg: 69.06MB/sec)

[RUN #1 92%,   5 secs]  1 threads:  462054 ops,   0 (avg:   89564) 
ops/sec, 0.00KB/sec (avg: 67.99MB/sec)
When I use Memaslap, it looks like:

set proportion: set_prop=0.10

get proportion: get_prop=0.90

<12 SERVER_ERROR out of memory storing object

<10 SERVER_ERROR out of memory storing object

<12 SERVER_ERROR out of memory storing object

<7 SERVER_ERROR out of memory storing object
The unmodified Memcached gives these errors less frequently than Memcached 
with my eviction framework (especially under Memtier_Benchmark), so I wonder 
why. I read your message in the issue linked above, but I am still confused 
about why the memory limit affects Memcached this way. Could you give a more 
detailed explanation? If I have to run with limited memory, is there a way 
to avoid this issue?
Thank you very much for helping!

Best,
Qingchen
On Tuesday, June 1, 2021 at 2:36:09 AM UTC-4 Dormando wrote:

> try '-o no_lru_crawler' ? That definitely works.
>
> I don't know what you're doing since no code has been provided. The locks
> around managing LRU tails are pretty strict, so make sure you are actually
> using them correctly.
>
> The LRU crawler works by injecting a fake item into the LRU, then using
> that to keep its position and walk. If I had to guess I bet you've
> "evicted" the LRU crawler, which then immediately dies when it tries to
> continue crawling.

Re: SegFault in Crawler Part

2021-06-01 Thread dormando
try '-o no_lru_crawler' ? That definitely works.

I don't know what you're doing since no code has been provided. The locks
around managing LRU tails are pretty strict, so make sure you are actually
using them correctly.

The LRU crawler works by injecting a fake item into the LRU, then using
that to keep its position and walk. If I had to guess I bet you've
"evicted" the LRU crawler, which then immediately dies when it tries to
continue crawling.
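
If it helps, the walk looks roughly like this (a simplified sketch of
do_item_crawl_q() in items.c, not verbatim): the crawler is a statically
allocated item header with no key and no data, linked into the LRU like
any other item; each pass unlinks it and relinks it one hop closer to the
head:

    /* simplified sketch of one crawler step, paraphrased: */
    item *crawler_step(item *crawler, item **head, item **tail) {
        item *hop = crawler->prev;         /* item to walk past */
        /* unlink the fake item from its current slot */
        if (crawler->prev) crawler->prev->next = crawler->next;
        else *head = crawler->next;
        if (crawler->next) crawler->next->prev = crawler->prev;
        else *tail = crawler->prev;
        if (hop == NULL) return NULL;      /* reached the head: done */
        /* relink the fake item just before the item it walked past */
        crawler->next = hop;
        crawler->prev = hop->prev;
        if (hop->prev) hop->prev->next = crawler;
        else *head = crawler;
        hop->prev = crawler;
        return hop;                        /* caller examines this one */
    }

If your eviction path ever picks that fake item as a victim and relinks
or reuses it, its prev/next go stale and the next step dereferences
garbage, which is exactly what your gdb trace shows.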

On Mon, 31 May 2021, Qingchen Dang wrote:

> Furthermore, I tried to disable the crawler with the '- no_lru_crawler' 
> command parameter, and it gives the same error. I wonder why it does not 
> disable the LRU crawler as it is supposed to.



Re: SegFault in Crawler Part

2021-05-31 Thread Qingchen Dang
Furthermore, I tried to disable the crawler with the '- no_lru_crawler' 
command parameter, and it gives the same error. I wonder why it does not 
disable the LRU crawler as it is supposed to.




SegFault in Crawler Part

2021-05-30 Thread Qingchen Dang
Hi,

I am implementing a framework based on Memcached, and there's a problem 
that has confused me a lot. The framework basically changes the eviction 
policy: when it is called to evict an item, it might not evict the tail 
item of the COLD LRU; instead it looks for a "more suitable" item to 
evict and reinserts the tail items at the head of the COLD queue.
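
Concretely, the reinsertion is roughly this (a simplified sketch, not my 
exact code; I hold the corresponding lru_locks[] mutex the whole time, 
and do_item_link_q() inserts at the head of the queue):

    /* simplified sketch of reinserting a COLD-tail item at the head: */
    do_item_unlink_q(tail_it);  /* take it off the COLD tail */
    do_item_link_q(tail_it);    /* relink it at the head of the LRU */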

It mostly works fine, but sometimes it causes a SegFault when reinsertion 
happens very frequently (in almost every eviction). The SegFault is 
triggered in the crawler part. As attached, it seems that when the crawler 
loops through the item queue, it reaches an invalid memory address. The bug 
happens after around 5000~1000 GET/SET (9:1) operations. I used 
Memaslap for testing.

Could anyone suggest what might be causing this error?

Here are the gdb messages:

Thread 8 "memcached" received signal SIGSEGV, Segmentation fault.

[Switching to Thread 0x74d6c700 (LWP 36414)]

do_item_crawl_q (it=it@entry=0x5579e7e0 )

    at items.c:2015

2015             it->prev->next = it->next;

(gdb) print it->prev

$5 = (struct _stritem *) 0x4f4d6355616d5471

(gdb) print it->prev->next

Cannot access memory at address 0x4f4d6355616d5479

(gdb) print it->next

$6 = (struct _stritem *) 0x7a59324376753351

(gdb) print it->next->prev

Cannot access memory at address 0x7a59324376753361

(gdb) print it->nkey

$7 = 0 '\000'

(gdb) 
Here is the part that triggers the error:

2012         assert(it->next != it);

2013         if (it->next) {

2014             assert(it->prev->next == it);

2015             it->prev->next = it->next;

2016             it->next->prev = it->prev;

2017         } else {

2018             /* Tail. Move this above? */

2019             it->prev->next = 0;

2020         }

(I'm also confused about why the assert on line 2014 does not fire.)

Thank you very much for helping!

Best,

Qingchen
