FW: ceph_monitor - monitor your cluster with parallel python

2015-11-19 Thread igor.podo...@ts.fujitsu.com
Hey, sending this one more time; I got a rejection from the mail daemon.

Regards,
Igor.

From: Podoski, Igor 
Sent: Thursday, November 19, 2015 8:53 AM
To: ceph-devel; 'ceph-us...@ceph.com'
Subject: ceph_monitor - monitor your cluster with parallel python

Hi Cephers!

I’ve created a small tool to help track memory/CPU/I/O usage. It’s useful for me, 
so I thought I’d share it with you: https://github.com/aiicore/ceph_monitor

In general, this is a Python script that uses Parallel Python to run a function 
on remote hosts. Data is gathered from all hosts and presented on the console or 
added to an SQLite database, from which it can be plotted with e.g. gnuplot. You 
can define the OSD ranges that you want to monitor, or monitor a particular set 
of processes, e.g. only the OSDs from a pool backed by SSDs.

The main concept is that the monitor doesn’t know and doesn’t care which hosts 
the OSDs are running on; it treats them as a whole set.

The script uses psutil to get data related to processes (mon/osd/rgw/whatever). 
In the near future I’d like to add modes that can modify process behavior; e.g. 
psutil has .nice(), .ionice(), and .cpu_affinity() methods that could be useful 
in some tests. Basically, with Parallel Python you can run any function 
remotely, so tuning the OS by changing some /proc/* files can be done too.
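
As a rough illustration of the idea (a sketch only, not the actual ceph_monitor 
code; the function name, host list, and process matching below are made up), a 
remote sampling call with Parallel Python and psutil can be as small as this:

# Sketch: assumes ppserver.py is running on each host and psutil is
# installed there; names are illustrative, not taken from ceph_monitor.
import pp

def sample_osds():
    # Runs on the remote host: returns (pid, cpu%, rss) per ceph-osd.
    import psutil
    samples = []
    for p in psutil.process_iter():
        try:
            if p.name == 'ceph-osd':   # attribute in psutil 0.6.x; a method in newer releases
                samples.append((p.pid,
                                p.get_cpu_percent(interval=0.1),
                                p.get_memory_info().rss))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass                       # process exited or is inaccessible; skip it
    return samples

hosts = ('node1:60000', 'node2:60000')        # ppserver.py listening on each
server = pp.Server(ppservers=hosts, ncpus=0)  # ncpus=0: run nothing locally
jobs = [server.submit(sample_osds, (), (), ('psutil',)) for _ in hosts]
for job in jobs:
    print(job())                              # blocks until a remote call returns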

You can add labels to the data, so you can see later when something happened.
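
For example, the label/SQLite flow can be sketched like this (the table layout 
and names are illustrative, not what ceph_monitor actually uses):

import sqlite3, time

conn = sqlite3.connect('ceph_monitor.db')
conn.execute('CREATE TABLE IF NOT EXISTS samples (ts REAL, osd INTEGER, cpu REAL, rss INTEGER)')
conn.execute('CREATE TABLE IF NOT EXISTS labels (ts REAL, text TEXT)')

def store(osd_samples):
    # osd_samples: list of (osd_id, cpu_percent, rss_bytes) tuples
    now = time.time()
    conn.executemany('INSERT INTO samples VALUES (?, ?, ?, ?)',
                     [(now, osd, cpu, rss) for (osd, cpu, rss) in osd_samples])
    conn.commit()

def label(text):
    # Mark a moment in time, e.g. 'rados bench started', so plots can show
    # when each test phase began.
    conn.execute('INSERT INTO labels VALUES (?, ?)', (time.time(), text))
    conn.commit()

store([(0, 12.5, 512 * 1024 * 1024)])
label('backfill started')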

Sample plot: 
https://raw.githubusercontent.com/aiicore/ceph_monitor/master/examples/avg_cpu_mem.png
 
Simple test: 
https://github.com/aiicore/ceph_monitor/blob/master/examples/example_test_with_rados.sh
 

Short readme:  https://github.com/aiicore/ceph_monitor 
Full readme: https://github.com/aiicore/ceph_monitor/blob/master/readme.txt 

I encourage you to use and develop it; if not, please at least read the full 
readme text. Maybe you’ll come up with a better idea based on this concept and 
something interesting will happen.

P.S. This currently works with Python 2.6 and psutil 0.6.1 on CentOS 6.6. If 
you find any bug, report it as an issue on my GitHub.

!!! Security notice !!!
Parallel Python supports SHA-based authentication; my version currently runs 
WITHOUT it, so in certain environments it could be dangerous (an untrusted 
client could run any function). For now, use it only in isolated test/dev 
clusters.
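
If you want to turn the authentication on yourself, it should be a one-line 
change on each side (the secret value below is just a placeholder):

# Start each worker with a shared secret, e.g.:
#   ppserver.py -s "my-cluster-secret"
# and pass the same secret when creating the client:
import pp
server = pp.Server(ppservers=('node1:60000',), secret='my-cluster-secret')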

Regards,
Igor.



Kernel client OSD message version

2015-11-19 Thread Lakis, Jacek
Hi Cephers!

Recently, the version of OSDOp in Ceph increased to 7 (in master; it is 6 in 
infernalis). The kernel client is still encoding version 4.
Should we consider upstreaming patches to the client?

Thanks,
J. J. Łakis






OSD memory usage during startup - advice needed

2015-11-19 Thread Marcin Gibuła

Hi Ceph devs!

I'm trying to track down and fix huge memory usage when an OSD is starting 
after an unclean shutdown. Recently, after editing the crush map, when 
backfills started, one of our OSDs died (it hit the suicide timeout). It then 
refused to start again, crashing shortly after start due to a memory 
allocation failure (over 15 GB used).


Judging from the debug output, the problem is in journal recovery, when it 
tries to delete an object with a huge number of keys (several million; it is a 
radosgw index* for a bucket with over 50 million objects) using leveldb's 
rmkeys_by_prefix() method.


Looking at the source code, rmkeys_by_prefix() batches all operations 
into one list and then submit_transaction() executes them all atomically.
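
To make the mechanism concrete, here is a small model of it in Python using 
the plyvel LevelDB binding (the OSD code itself is C++, so this is an 
illustration only, not a proposed patch): deleting a prefix through one giant 
batch keeps every buffered key in memory at once, while flushing every N keys 
bounds the footprint, at the cost of the delete no longer being one atomic 
transaction.

# Illustration only (not FileStore code): prefix delete with bounded memory.
# Assumes the plyvel package; the chunk size is arbitrary.
import plyvel

def rm_keys_by_prefix(db, prefix, chunk=10000):
    # One WriteBatch per chunk keeps memory bounded; the trade-off is that
    # the whole delete is no longer a single atomic transaction.
    batch = db.write_batch()
    pending = 0
    for key in db.iterator(prefix=prefix, include_value=False):
        batch.delete(key)
        pending += 1
        if pending >= chunk:
            batch.write()              # flush, freeing the keys buffered so far
            batch = db.write_batch()
            pending = 0
    if pending:
        batch.write()

db = plyvel.DB('/tmp/testdb', create_if_missing=True)
rm_keys_by_prefix(db, b'bucket-index:')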


I'd love to write a patch for this issue, but it seems unfixable (or is it?) 
with the current API and method behaviour. Could you offer any advice on how 
to proceed?


Backtrace below:

 1: /usr/bin/ceph-osd() [0xacd7ba]
 2: (()+0x10340) [0x7f9713500340]
 3: (gsignal()+0x39) [0x7f971199fcc9]
 4: (abort()+0x148) [0x7f97119a30d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f97122aa535]
 6: (()+0x5e6d6) [0x7f97122a86d6]
 7: (()+0x5e703) [0x7f97122a8703]
 8: (()+0x5e922) [0x7f97122a8922]
 9: (()+0x12b1e) [0x7f9713720b1e]
 10: (tc_new()+0x1e0) [0x7f9713740a00]
 11: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&)+0x59) [0x7f9712304209]
 12: (std::string::_Rep::_M_clone(std::allocator const&, unsigned long)+0x1b) [0x7f9712304dcb]
 13: (std::string::reserve(unsigned long)+0x34) [0x7f9712304e64]
 14: (std::string::append(char const*, unsigned long)+0x4f) [0x7f97123050af]
 15: (LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(std::string const&)+0xcf) [0x97c44f]
 16: (DBObjectMap::clear_header(std::tr1::shared_ptr, std::tr1::shared_ptr)+0xc1) [0xa63171]
 17: (DBObjectMap::_clear(std::tr1::shared_ptr, std::tr1::shared_ptr)+0x91) [0xa682b1]
 18: (DBObjectMap::clear(ghobject_t const&, SequencerPosition const*)+0x202) [0xa6b292]
 19: (FileStore::lfn_unlink(coll_t, ghobject_t const&, SequencerPosition const&, bool)+0x16b) [0x9154fb]
 20: (FileStore::_remove(coll_t, ghobject_t const&, SequencerPosition const&)+0x8b) [0x915f6b]
 21: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x3174) [0x926434]
 22: (FileStore::_do_transactions(std::list >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x92a3a4]
 23: (JournalingObjectStore::journal_replay(unsigned long)+0x5cb) [0x94355b]
 24: (FileStore::mount()+0x3bb6) [0x9139f6]
 25: (OSD::init()+0x259) [0x6c59b9]
 26: (main()+0x2860) [0x6527e0]
 27: (__libc_start_main()+0xf5) [0x7f971198aec5]
 28: /usr/bin/ceph-osd() [0x66b887]


I also suspect that deleting this object was somehow responsible for the 
initial crash, when the OSD hit the suicide timeout. Any advice on how to 
debug it further?



* Yes, I am aware of sharded indexes, but that bucket was created pre-hammer 
and I can't migrate it.


--
mg

P.S. Please CC me, as I'm not subscribed.


[CEPH-DEVEL] ./run-make-check.sh

2015-11-19 Thread Shinobu Kinjo
Hello,

I saw this message while executing run-make-check.sh on fc23:

2015-11-20 01:05:37.529021 7f8742e3d7c0 -1  ** ERROR: error creating empty 
object store in testdir/osd-reactivate/0: (13) Permission denied

Can we ignore it for the moment?

If I've missed anything, let me know.

Thanks!!
Shinobu


Re: request_queue use-after-free - inode_detach_wb()

2015-11-19 Thread Tejun Heo
Hello,

On Thu, Nov 19, 2015 at 10:56:43PM +0100, Ilya Dryomov wrote:
> Detaching the inode earlier is what I suggested in the first email, but
> I didn't know if this kind of special casing was OK.  I'll try it out.

Yeah, I was confused.  Sorry about that.  On the surface it looks like a 
special case, but everything around bdev is a special case anyway, and 
looking at the underlying lifetime rules, I think this is the right thing 
to do.

Thanks.

-- 
tejun


Re: OSD memory usage during startup - advice needed

2015-11-19 Thread Samuel Just
Actually, looks like Xiaoxi beat you to it for infernalis!
42a3ab95ec459042e92198fb061c8393146bd1b4
-Sam

On Thu, Nov 19, 2015 at 12:30 PM, Marcin Gibuła  wrote:
> Answering myself: could anyone verify whether the attached patch looks OK?
> It should reduce the memory footprint a bit.
>
> When I first read this code, I assumed that data pointed to by a
> leveldb::Slice has to remain reachable until db->Write() is called.
>
> However, looking into leveldb's documentation and source code, there is no
> such requirement; leveldb makes its own copy of the key, so we're
> effectively doubling the memory footprint for no reason.
>
> --
> mg


Re: Kernel client OSD message version

2015-11-19 Thread Ilya Dryomov
On Thu, Nov 19, 2015 at 11:27 AM, Lakis, Jacek  wrote:
> Ilya, thank you for the quick reply.
>
> For example, it's about split decoding. I'm asking not because of specific
> changes; I'm rather curious about when we should sync the kernel client
> encoding with master (or the stable versions).

It's usually done when the need arises, e.g. if the kernel client
starts using a field that isn't in the old encoding.  We don't update
for each new ceph release.

Thanks,

Ilya


RE: Kernel client OSD message version

2015-11-19 Thread Lakis, Jacek
OK, thanks.
Coming back to the split decoding you mentioned: this change, apart from the 
reordering, brings some performance improvement, so the need arises.
It's also not included in the stable release yet. So, would it be good to 
contribute the patches?


J. J. Łakis



Reboot blocked when undoing unmap op.

2015-11-19 Thread Wukongming
Hi Sage,

I created an rbd image and mapped it locally, which means I can find 
/dev/rbd0. Then, when I rebooted the system, in the last step of shutting 
down it blocked with an error:

[235618.0202207] libceph: connect 172.16.57.252:6789 error -101.

My work environment:

Ubuntu kernel 3.19.0
Ceph 0.94.5
A cluster of 2 servers with iscsitgt and open-iscsi, each acting as both 
server and client. The multipath process is running but does not affect this 
issue; I've tried stopping multipath, and the issue is still there.
I only mapped an rbd image locally, so why does it show me a connect error?

I saw your reply at 
http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/13077, but only 
part of it. Was this issue resolved, and how?

Thanks!!
wukongming

FW: Reboot blocked when undoing unmap op.

2015-11-19 Thread Wukongming
By the way, the ceph cluster is OK before rebooting. But when the reboot 
fails, we have to cold-reboot the server, which may leave the ceph cluster in 
a bad state, especially when a heartbeat network is added.

2015-10-26 06:39:41.065157 mon.0 172.16.142.139:6789/0 2519 : cluster [INF] 
pgmap v19973: 2048 pgs: 655 active+undersized+degraded, 10 active+remapped, 724 
active+clean, 659 undersized+degraded+peered; 436 MB data, 2290 MB used, 15740 
GB / 15742 GB avail; 68/232 objects degraded (29.310%)

-
wukongming ID: 12019
Tel:0571-86760239
Dept:2014 UIS2 OneStor



Re: [CEPH-DEVEL] ./run-make-check.sh

2015-11-19 Thread Shinobu Kinjo
I always run it as root.

Thanks!
Shinobu

- Original Message -
From: "tobe" 
To: "Shinobu Kinjo" 
Cc: "ceph-devel" 
Sent: Friday, November 20, 2015 11:24:57 AM
Subject: Re: [CEPH-DEVEL] ./run-make-check.sh

It seems to be a permission problem, since you got "Permission denied".

Can you try again as the root user or with sudo?



RE: [CEPH] OSD daemons running with a large number of threads

2015-11-19 Thread ghislain.chevalier
Hi,

Thanks for the advice

Brgds


-----Original Message-----
From: Sage Weil [mailto:s...@newdream.net] 
Sent: Tuesday, November 17, 2015 14:41
To: CHEVALIER Ghislain IMT/OLPS
Cc: ceph-devel@vger.kernel.org
Subject: Re: [CEPH] OSD daemons running with a large number of threads

On Tue, 17 Nov 2015, ghislain.cheval...@orange.com wrote:
> Hi,
> 
> Context:
> Firefly 0.80.9
> Ubuntu 14.04.1
> Almost a production platform in an openstack environment
> 176 OSDs (SAS and SSD), 2 crushmap-oriented storage classes, 8 servers
> in 2 rooms, 3 monitors on openstack controllers
> Usage: Rados Gateway for the object service and RBD as the back end for
> Cinder and Glance
> 
> Issue:
> We are currently running performance tests on this cluster before turning
> it over to production.
> We created cinder volumes (attached to the Ceph back end) on virtual
> machines and we use FIO to stress the cluster.
> A very large number of threads are created per OSD daemon (about 1000).

This is normal.  The init scripts set the max-open-files ulimit to a high 
value (usually 4194304, to avoid any possibility of hitting it), but you may 
need to set /proc/sys/kernel/pid_max to something big if your cluster is 
large.

sage
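
For reference, a quick way to check both numbers on a running host, sticking 
with psutil in the spirit of the ceph_monitor thread above (a sketch; the 
process-name matching and psutil method names may need adjusting for your 
version):

# Sketch: compare per-OSD thread counts against the kernel's pid_max.
# Thread IDs come from the same pool as process IDs, so a large node with
# ~1000 threads per OSD can approach the default limit.
import psutil

with open('/proc/sys/kernel/pid_max') as f:
    pid_max = int(f.read())

total = 0
for p in psutil.process_iter():
    try:
        if p.name == 'ceph-osd':       # attribute in old psutil; p.name() in newer
            n = p.get_num_threads()    # p.num_threads() in newer psutil
            total += n
            print('osd pid %d: %d threads' % (p.pid, n))
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass

print('total ceph-osd threads: %d, pid_max: %d' % (total, pid_max))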
