[ceph-users] tracker.ceph.com

2016-12-17 Thread Dan Mick

tracker.ceph.com is having issues. I'm looking at it.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore + erasure coding memory usage

2016-12-17 Thread bobobo1...@gmail.com
I've tested this on the latest Kraken RC (installed on RHEL from the el7
repo) and it seemed promising at first, but the OSDs still gradually consume
all available memory until they are OOM-killed; they just do so more slowly.
It takes them a couple of hours to go from 500M each to >2G each. After
they're restarted, they start at ~1G instead of 500M.

On Fri, Nov 18, 2016 at 6:32 PM, bobobo1...@gmail.com 
wrote:

> Just to update, this is still an issue as of the latest Git commit
> (64bcf92e87f9fbb3045de49b7deb53aca1989123).
>
> On Fri, Nov 11, 2016 at 1:31 PM, bobobo1...@gmail.com <
> bobobo1...@gmail.com> wrote:
>
>> Here's another: http://termbin.com/smnm
>>
>> On Fri, Nov 11, 2016 at 1:28 PM, Sage Weil  wrote:
>> > On Fri, 11 Nov 2016, bobobo1...@gmail.com wrote:
>> >> Any more data needed?
>> >>
>> >> On Wed, Nov 9, 2016 at 9:29 AM, bobobo1...@gmail.com
>> >>  wrote:
>> >> > Here it is after running overnight (~9h): http://ix.io/1DNi
>> >
>> > I'm getting a 500 on that URL...
>> >
>> > sage
>> >
>> >
>> >> >
>> >> > On Tue, Nov 8, 2016 at 11:00 PM, bobobo1...@gmail.com
>> >> >  wrote:
>> >> >> Ah, I was actually mistaken. After running without Valgrind, it seems
>> >> >> I just misjudged how much it was slowed down. I'll leave it to run
>> >> >> overnight as suggested.
>> >> >>
>> >> >> On Tue, Nov 8, 2016 at 10:44 PM, bobobo1...@gmail.com
>> >> >>  wrote:
>> >> >>> Okay, I left it for 3h and it seemed to actually stabilise at
>> around
>> >> >>> 2.3G: http://ix.io/1DEK
>> >> >>>
>> >> >>> This was only after disabling other services on the system however.
>> >> >>> Generally this much RAM isn't available to Ceph (hence the OOM
>> >> >>> previously).
>> >> >>>
>> >> >>> On Tue, Nov 8, 2016 at 9:00 AM, Mark Nelson 
>> wrote:
>> >>  It should be running much slower through valgrind so probably
>> won't
>> >>  accumulate very quickly.  That was the problem with the earlier
>> trace, there
>> >>  wasn't enough memory used yet to really get us out of the weeds.
>> If it's
>> >>  still accumulating quickly, try to wait until the OSD is up to
>> 4+GB RSS if
>> >>  you can.  I usually kill the valgrind/osd process with SIGTERM to
>> make sure
>> >>  the output is preserved.  Not sure what will happen with OOM
>> killer as I
>> >>  haven't let it get that far before killing.
>> >> 
>> >>  Mark
>> >> 
>> >>  On 11/08/2016 10:37 AM, bobobo1...@gmail.com wrote:
>> >> >
>> >> > Unfortunately I don't think overnight is possible. The OOM will
>> kill it
>> >> > in hours, if not minutes. Will the output be preserved/usable if
>> the
>> >> > process is uncleanly terminated?
>> >> >
>> >> >
>> >> > On 8 Nov 2016 08:33, "Mark Nelson" wrote:
>> >> >
>> >> > Heya,
>> >> >
>> >> > Sorry got distracted with other stuff yesterday.  Any chance
>> you
>> >> > could run this for longer?  It's tough to tell what's going
>> on from
>> >> > this run unfortunately.  Maybe overnight if possible.
>> >> >
>> >> > Thanks!
>> >> > Mark
>> >> >
>> >> >
>> >> >
>> >> > On 11/08/2016 01:10 AM, bobobo1...@gmail.com
>> >> >  wrote:
>> >> >
>> >> > Just bumping this and CCing directly since I foolishly
>> broke the
>> >> > threading on my reply.
>> >> >
>> >> >
>> >> > On 4 Nov. 2016 8:40 pm, "bobobo1...@gmail.com" wrote:
>> >> >
>> >> > > Then you can view the output data with ms_print or
>> with
>> >> > massif-visualizer.  This may help narrow down where
>> in the
>> >> > code we
>> >> > are using the memory.
>> >> >
>> >> > Done! I've dumped the output from ms_print here:
>> >> > http://ix.io/1CrS
>> >> >
>> >> > It seems most of the memory comes from here:
>> >> >
>> >> > 92.78% (998,248,799B) (heap allocation functions)
>> >> > malloc/new/new[],
>> >> > --alloc-fns, etc.
>> >> > ->46.63% (501,656,678B) 0xD38936:
>> >> > ceph::buffer::create_aligned(unsigned int, unsigned
>> int) (in
>> >> > /usr/bin/ceph-osd)
>> >> > | ->45.07% (484,867,174B) 0xDAFED9:
>> >> > AsyncConnection::process() (in
>> >> > /usr/bin/ceph-osd)
>> >> > | | ->45.07% (484,867,174B) 0xC410EB:
>> >> >
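
As an aside, the call-site totals in an ms_print dump like the one quoted
above can be summarised with a few lines of Python. This assumes the OSD was
run under "valgrind --tool=massif" as discussed earlier in the thread and
that the ms_print output was saved to a file passed on the command line:

#!/usr/bin/env python
# Sketch: extract the "->NN.NN% (N,NNN,NNNB) <call site>" lines from an
# ms_print dump and list the largest allocation sites. The dump file name on
# the command line is an assumption.
import re
import sys

pattern = re.compile(r'->\s*([\d.]+)% \(([\d,]+)B\)\s+(.*)')
sites = []
with open(sys.argv[1]) as f:
    for line in f:
        m = pattern.search(line)
        if m:
            pct, nbytes, where = m.groups()
            sites.append((int(nbytes.replace(',', '')), float(pct), where.strip()))

for nbytes, pct, where in sorted(sites, reverse=True)[:10]:
    print '%12d B  %6.2f%%  %s' % (nbytes, pct, where)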

Re: [ceph-users] tgt+librbd error 4

2016-12-17 Thread Jake Young
I don't have the specific crash info, but I have seen crashes with tgt when
the ceph cluster was slow to respond to IO.

It was things like this that pushed me to use another iSCSI-to-Ceph
solution (FreeNAS running in a KVM Linux hypervisor).

Jake

On Fri, Dec 16, 2016 at 9:16 PM ZHONG  wrote:

> Hi All,
>
> I'm using tgt (1.0.55) + librbd (Hammer 0.94.5) for an iSCSI service. Recently
> I've run into problems: tgt crashes even when the cluster is under no
> pressure, with the following error: "kernel: tgtd[52067]: segfault at 0 ip
> 7f424cb0d76a sp 7f4228fe0b90 error 4 in
> librbd.so.1.0.0[7f424c9b9000+54b000]". Has anyone encountered similar
> problems? Thank you!
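
For what it's worth, the faulting address in a kernel line like the one above
can be turned into a source location once the matching librbd debuginfo is
installed. A small sketch; the library path below is an assumption for an
el7-style install:

#!/usr/bin/env python
# Sketch: convert the faulting instruction pointer from the kernel message
# into an offset inside librbd.so.1.0.0, then let addr2line resolve it.
# Requires the matching debuginfo; the library path is an assumption.
from subprocess import call

ip = 0x7f424cb0d76a        # "segfault at 0 ip ..." from the log above
base = 0x7f424c9b9000      # start of the librbd.so.1.0.0 mapping
offset = ip - base         # 0x15476a

print 'offset into librbd.so.1.0.0: 0x%x' % offset
call(['addr2line', '-C', '-f', '-e', '/usr/lib64/librbd.so.1.0.0', '0x%x' % offset])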


Re: [ceph-users] ceph and rsync

2016-12-17 Thread Mart van Santen


Hello,

The way Wido explained it is the correct way. I won't deny, however, that
last year we had problems with our SSD disks: they did not perform well,
so we decided to replace all of them. As the replacement done by Ceph
caused high load/downtime on the clients (which was the reason we wanted
to replace the disks in the first place), we did it the rsync way. We did
not encounter any problems with that.

It is very important to flush the journal before syncing and to correct the
journal symlinks before starting the new disk. Also make sure you disarm
the old disk: since it has the same disk ID, you will run into a lot of
problems if you re-enable that disk by accident. So yes, it is possible,
but it is very dangerous and not recommended.

Attached is the script we used to assist with the migration (we were on
hammer back then). I'm not sure it is the latest version we have; it
formats a disk with the ceph-disk prepare command, mounts it the 'ceph'
way, and then prints a series of commands to execute manually. And again,
a big warning: use at your own risk.
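
For illustration, a condensed sketch of the kind of command list such a
script ends up printing. The OSD id, mount point, and the upstart-style
service commands are assumptions for a hammer-on-trusty setup; the rsync
flags are the ones discussed in the thread quoted below:

#!/usr/bin/env python
# Sketch only, not the attached script: print the manual steps for copying
# one OSD's data directory onto a freshly prepared disk. OSD id, mount point
# and service commands are assumptions.
osd = '12'                          # hypothetical OSD id
new_mount = '/mnt/osd-new'          # hypothetical mount point of the new disk
old_dir = '/var/lib/ceph/osd/ceph-%s' % osd

steps = [
    'stop ceph-osd id=%s' % osd,                       # upstart on hammer/trusty
    'ceph-osd -i %s --flush-journal' % osd,
    'rsync -avPHAX --delete-during %s/ %s/' % (old_dir, new_mount),
    '# fix the journal symlink under %s before continuing' % new_mount,
    'umount %s' % old_dir,
    '# disarm the old disk so it cannot be activated again by accident',
    'mount <new-data-partition> %s' % old_dir,
    'start ceph-osd id=%s' % osd,
]
for step in steps:
    print step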


regards,


mart







On 12/16/2016 09:46 PM, Brian :: wrote:
> Given that you are all SSD, I would do exactly what Wido said -
> gracefully remove the OSD and gracefully bring up the OSD on the new
> SSD.
>
> Let Ceph do what it's designed to do. The rsync idea looks great on
> paper - not sure what issues you will run into in practice.
>
>
> On Fri, Dec 16, 2016 at 12:38 PM, Alessandro Brega
>  wrote:
>> 2016-12-16 10:19 GMT+01:00 Wido den Hollander :
>>>
 On 16 December 2016 at 9:49, Alessandro Brega wrote:


 2016-12-16 9:33 GMT+01:00 Wido den Hollander :

>> On 16 December 2016 at 9:26, Alessandro Brega <alessandro.bre...@gmail.com> wrote:
>>
>> Hi guys,
>>
>> I'm running a ceph cluster using 0.94.9-1trusty release on XFS for
>> RBD
>> only. I'd like to replace some SSDs because they are close to their
>> TBW.
>>
>> I know I can simply shutdown the OSD, replace the SSD, restart the
>> OSD
> and
>> ceph will take care of the rest. However I don't want to do it this
>> way,
>> because it leaves my cluster for the time of the rebalance/
>> backfilling
> in
>> a degraded state.
>>
>> I'm thinking about this process:
>> 1. keep old OSD running
>> 2. copy all data from current OSD folder to new OSD folder (using
>> rsync)
>> 3. shutdown old OSD
>> 4. redo step 3 to update to the latest changes
>> 5. restart OSD with new folder
>>
>> Are there any issues with this approach? Do I need any special rsync
> flags
>> (rsync -avPHAX --delete-during)?
>>
> Indeed X for transferring xattrs, but also make sure that the
> partitions
> are GPT with the proper GUIDs.
>
> I would never go for this approach in a running setup. Since it's a
> SSD
> cluster I wouldn't worry about the rebalance and just have Ceph do the
> work
> for you.
>
>
 Why not - if it's completely safe. It's much faster (local copy),
 doesn't
 put load on the network (local copy), much safer (2-3 minutes instead of
 1-2 hours degraded time (2TB SSD)), and it's really simple (2 rsync
 commands). Thank you.

>>> I wouldn't say it is completely safe, hence my remark. If you copy, indeed
>>> make sure you copy all the xattrs, but also make sure the partition tables
>>> match.
>>>
>>> That way it should work, but it's not a 100% guarantee.
>>>
>> Ok, thanks!  Can a ceph dev confirm? I do not want to lose any data ;)
>>
>> Alessandro
>>
>>

#!/usr/bin/env python


import argparse
import os
import stat
import sys
import re
from subprocess import call


#  WARNING
#  THIS IS A VERY DANGEROUS SCRIPT. NO GUARANTEES THIS WILL WORK FOR YOU


print "Please read and understand the script before executing"
sys.exit(1)



if __name__ == "__main__":
	parser = argparse.ArgumentParser()
	parser.add_argument("-d", "--destination",  type=str, required=1, help="destination disk")
	parser.add_argument("-s", "--source",  type=str, required=1, help="source ods number")
	parser.add_argument("--force", help="force migration", action="store_true")


	force = False
	parted = False
	args = parser.parse_args()
	
	if args.force:
		force = True


	osd = args.source
	disk_id = args.destination
	disk = '/dev/' + disk_id
	# First we are going to check if the provided disk is indeed a block device
	# and has an empty partition table

	
	print 'Examining disk: %s' %(disk)

	# Does the device exist:
	if not os.path.exists(disk):
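		# hypothetical continuation: a sketch of the check described in the
		# comments above, not the original code
		print 'Device %s does not exist' %(disk)
		sys.exit(1)

	# Is it really a block device?
	if not stat.S_ISBLK(os.stat(disk).st_mode):
		print '%s is not a block device' %(disk)
		sys.exit(1)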