Re: [ceph-users] Understand RadosGW logs

2015-03-05 Thread Yehuda Sadeh-Weinraub


- Original Message -
 From: Daniel Schneller daniel.schnel...@centerdevice.com
 To: ceph-users@lists.ceph.com
 Sent: Tuesday, March 3, 2015 2:54:13 AM
 Subject: [ceph-users] Understand RadosGW logs
 
 Hi!
 
 After realizing the problem with log rotation (see
 http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708)
 and fixing it, I now for the first time have some
 meaningful (and recent) logs to look at.
 
 While from an application perspective there seem
 to be no issues, I would like to understand some
 messages I find with relatively high frequency in
 the logs:
 
 Exhibit 1
 -
 2015-03-03 11:14:53.685361 7fcf4bfef700  0 ERROR: flush_read_list():
 d-client_c-handle_data() returned -1
 2015-03-03 11:15:57.476059 7fcf39ff3700  0 ERROR: flush_read_list():
 d-client_c-handle_data() returned -1
 2015-03-03 11:17:43.570986 7fcf25fcb700  0 ERROR: flush_read_list():
 d-client_c-handle_data() returned -1
 2015-03-03 11:22:00.881640 7fcf39ff3700  0 ERROR: flush_read_list():
 d-client_c-handle_data() returned -1
 2015-03-03 11:22:48.147011 7fcf35feb700  0 ERROR: flush_read_list():
 d-client_c-handle_data() returned -1
 2015-03-03 11:27:40.572723 7fcf50ff9700  0 ERROR: flush_read_list():
 d-client_c-handle_data() returned -1
 2015-03-03 11:29:40.082954 7fcf36fed700  0 ERROR: flush_read_list():
 d-client_c-handle_data() returned -1
 2015-03-03 11:30:32.204492 7fcf4dff3700  0 ERROR: flush_read_list():
 d-client_c-handle_data() returned -1

It means that returning data to the client got some error, usually means that 
the client disconnected before completion.
 
 I cannot find anything relevant by Googling for
 that, apart from the actual line of code that
 produces this line.
 What does that mean? Is it an indication of data
 corruption or are there more benign reasons for
 this line?
 
 
 Exhibit 2
 --
 Several of these blocks
 
 2015-03-03 07:06:17.805772 7fcf36fed700  1 == starting new request
 req=0x7fcf5800f3b0 =
 2015-03-03 07:06:17.836671 7fcf36fed700  0
 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
 part_ofs=0 rule-part_size=0
 2015-03-03 07:06:17.836758 7fcf36fed700  0
 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
 part_ofs=0 rule-part_size=0
 2015-03-03 07:06:17.836918 7fcf36fed700  0
 RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243
 part_ofs=0 rule-part_size=0
 2015-03-03 07:06:18.263126 7fcf36fed700  1 == req done
 req=0x7fcf5800f3b0 http_status=200 ==
 ...
 2015-03-03 09:27:29.855001 7fcf28fd1700  1 == starting new request
 req=0x7fcf580102a0 =
 2015-03-03 09:27:29.866718 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:29.866778 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:29.866852 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:29.866917 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:29.875466 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:29.884434 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:29.906155 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:29.914364 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:29.940653 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:30.272816 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:31.125773 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632
 part_ofs=0 rule-part_size=0
 2015-03-03 09:27:31.192661 7fcf28fd1700  0 ERROR: flush_read_list():
 d-client_c-handle_data() returned -1
 2015-03-03 09:27:31.194481 7fcf28fd1700  1 == req done
 req=0x7fcf580102a0 http_status=200 ==
 ...
 2015-03-03 09:28:43.008517 7fcf2a7d4700  1 == starting new request
 req=0x7fcf580102a0 =
 2015-03-03 09:28:43.016414 7fcf2a7d4700  0
 RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579
 part_ofs=0 rule-part_size=0
 2015-03-03 09:28:43.022387 7fcf2a7d4700  1 == req done
 req=0x7fcf580102a0 http_status=200 ==
 
 First, what is the req= line? Is that a thread-id?
 I am asking, because the same id is used over and over
 in the same file over time.

It's the request id (within the current radosgw instance)

 
 More

Re: [ceph-users] Understand RadosGW logs

2015-03-05 Thread Daniel Schneller

Bump...

On 2015-03-03 10:54:13 +, Daniel Schneller said:


Hi!

After realizing the problem with log rotation (see
http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708)
and fixing it, I now for the first time have some
meaningful (and recent) logs to look at.

While from an application perspective there seem
to be no issues, I would like to understand some
messages I find with relatively high frequency in
the logs:

Exhibit 1
-
2015-03-03 11:14:53.685361 7fcf4bfef700  0 ERROR: flush_read_list():
d-client_c-handle_data() returned -1
2015-03-03 11:15:57.476059 7fcf39ff3700  0 ERROR: flush_read_list():
d-client_c-handle_data() returned -1
2015-03-03 11:17:43.570986 7fcf25fcb700  0 ERROR: flush_read_list():
d-client_c-handle_data() returned -1
2015-03-03 11:22:00.881640 7fcf39ff3700  0 ERROR: flush_read_list():
d-client_c-handle_data() returned -1
2015-03-03 11:22:48.147011 7fcf35feb700  0 ERROR: flush_read_list():
d-client_c-handle_data() returned -1
2015-03-03 11:27:40.572723 7fcf50ff9700  0 ERROR: flush_read_list():
d-client_c-handle_data() returned -1
2015-03-03 11:29:40.082954 7fcf36fed700  0 ERROR: flush_read_list():
d-client_c-handle_data() returned -1
2015-03-03 11:30:32.204492 7fcf4dff3700  0 ERROR: flush_read_list():
d-client_c-handle_data() returned -1

I cannot find anything relevant by Googling for
that, apart from the actual line of code that
produces this line.
What does that mean? Is it an indication of data
corruption or are there more benign reasons for
this line?


Exhibit 2
--
Several of these blocks

2015-03-03 07:06:17.805772 7fcf36fed700  1 == starting new request
req=0x7fcf5800f3b0 =
2015-03-03 07:06:17.836671 7fcf36fed700  0
RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
part_ofs=0 rule-part_size=0
2015-03-03 07:06:17.836758 7fcf36fed700  0
RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
part_ofs=0 rule-part_size=0
2015-03-03 07:06:17.836918 7fcf36fed700  0
RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243
part_ofs=0 rule-part_size=0
2015-03-03 07:06:18.263126 7fcf36fed700  1 == req done
req=0x7fcf5800f3b0 http_status=200 ==
...
2015-03-03 09:27:29.855001 7fcf28fd1700  1 == starting new request
req=0x7fcf580102a0 =
2015-03-03 09:27:29.866718 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.866778 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.866852 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.866917 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.875466 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.884434 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.906155 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.914364 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.940653 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024
part_ofs=0 rule-part_size=0
2015-03-03 09:27:30.272816 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328
part_ofs=0 rule-part_size=0
2015-03-03 09:27:31.125773 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632
part_ofs=0 rule-part_size=0
2015-03-03 09:27:31.192661 7fcf28fd1700  0 ERROR: flush_read_list():
d-client_c-handle_data() returned -1
2015-03-03 09:27:31.194481 7fcf28fd1700  1 == req done
req=0x7fcf580102a0 http_status=200 ==
...
2015-03-03 09:28:43.008517 7fcf2a7d4700  1 == starting new request
req=0x7fcf580102a0 =
2015-03-03 09:28:43.016414 7fcf2a7d4700  0
RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579
part_ofs=0 rule-part_size=0
2015-03-03 09:28:43.022387 7fcf2a7d4700  1 == req done
req=0x7fcf580102a0 http_status=200 ==

First, what is the req= line? Is that a thread-id?
I am asking, because the same id is used over and over
in the same file over time.

More importantly, what do the RGWObjManifest::operator++():...
lines mean? In the middle case above the block even ends
with one of the ERROR lines mentioned before, but the HTTP
status is still 200, suggesting a succesful operation.

Thanks in advance for shedding some light, because I would like
to know if I need to take some action or at least keep an
eye on these via monitoring?

Cheers,
Daniel





[ceph-users] Understand RadosGW logs

2015-03-03 Thread Daniel Schneller

Hi!

After realizing the problem with log rotation (see
http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708)
and fixing it, I now for the first time have some
meaningful (and recent) logs to look at.

While from an application perspective there seem
to be no issues, I would like to understand some
messages I find with relatively high frequency in
the logs:

Exhibit 1
-
2015-03-03 11:14:53.685361 7fcf4bfef700  0 ERROR: flush_read_list(): 
d-client_c-handle_data() returned -1
2015-03-03 11:15:57.476059 7fcf39ff3700  0 ERROR: flush_read_list(): 
d-client_c-handle_data() returned -1
2015-03-03 11:17:43.570986 7fcf25fcb700  0 ERROR: flush_read_list(): 
d-client_c-handle_data() returned -1
2015-03-03 11:22:00.881640 7fcf39ff3700  0 ERROR: flush_read_list(): 
d-client_c-handle_data() returned -1
2015-03-03 11:22:48.147011 7fcf35feb700  0 ERROR: flush_read_list(): 
d-client_c-handle_data() returned -1
2015-03-03 11:27:40.572723 7fcf50ff9700  0 ERROR: flush_read_list(): 
d-client_c-handle_data() returned -1
2015-03-03 11:29:40.082954 7fcf36fed700  0 ERROR: flush_read_list(): 
d-client_c-handle_data() returned -1
2015-03-03 11:30:32.204492 7fcf4dff3700  0 ERROR: flush_read_list(): 
d-client_c-handle_data() returned -1


I cannot find anything relevant by Googling for
that, apart from the actual line of code that
produces this line.
What does that mean? Is it an indication of data
corruption or are there more benign reasons for
this line?


Exhibit 2
--
Several of these blocks

2015-03-03 07:06:17.805772 7fcf36fed700  1 == starting new request 
req=0x7fcf5800f3b0 =
2015-03-03 07:06:17.836671 7fcf36fed700  0 
RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 
part_ofs=0 rule-part_size=0
2015-03-03 07:06:17.836758 7fcf36fed700  0 
RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 
part_ofs=0 rule-part_size=0
2015-03-03 07:06:17.836918 7fcf36fed700  0 
RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243 
part_ofs=0 rule-part_size=0
2015-03-03 07:06:18.263126 7fcf36fed700  1 == req done 
req=0x7fcf5800f3b0 http_status=200 ==

...
2015-03-03 09:27:29.855001 7fcf28fd1700  1 == starting new request 
req=0x7fcf580102a0 =
2015-03-03 09:27:29.866718 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.866778 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.866852 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.866917 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.875466 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.884434 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.906155 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.914364 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:29.940653 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:30.272816 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:31.125773 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632 
part_ofs=0 rule-part_size=0
2015-03-03 09:27:31.192661 7fcf28fd1700  0 ERROR: flush_read_list(): 
d-client_c-handle_data() returned -1
2015-03-03 09:27:31.194481 7fcf28fd1700  1 == req done 
req=0x7fcf580102a0 http_status=200 ==

...
2015-03-03 09:28:43.008517 7fcf2a7d4700  1 == starting new request 
req=0x7fcf580102a0 =
2015-03-03 09:28:43.016414 7fcf2a7d4700  0 
RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579 
part_ofs=0 rule-part_size=0
2015-03-03 09:28:43.022387 7fcf2a7d4700  1 == req done 
req=0x7fcf580102a0 http_status=200 ==


First, what is the req= line? Is that a thread-id?
I am asking, because the same id is used over and over
in the same file over time.

More importantly, what do the RGWObjManifest::operator++():...
lines mean? In the middle case above the block even ends
with one of the ERROR lines mentioned before, but the HTTP
status is still 200, suggesting a succesful operation.

Thanks in advance for shedding some light, because I would like
to know if I need to take some action or at least keep an
eye on these via monitoring?

Cheers,
Daniel


___
ceph-users