Re: [ceph-users] pgs stuck inactive

2017-03-15 Thread Laszlo Budai


[root@storage2 ~]# gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args 
ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /usr/bin/ceph-objectstore-tool...Reading symbols from 
/usr/lib/debug/usr/bin/ceph-objectstore-tool.debug...done.
done.
Starting program: /usr/bin/ceph-objectstore-tool import-rados volumes 
pg.3.367.export.OSD.35
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
open: No such file or directory
[Inferior 1 (process 23735) exited with code 01]
[root@storage2 ~]#




Just checked:
[root@storage2 lib64]# ls -l /lib64/libthread_db*
-rwxr-xr-x. 1 root root 38352 May 12  2016 /lib64/libthread_db-1.0.so
lrwxrwxrwx. 1 root root 19 Jun  7  2016 /lib64/libthread_db.so.1 -> 
libthread_db-1.0.so
[root@storage2 lib64]#


Kind regards,
Laszlo


On 16.03.2017 05:26, Brad Hubbard wrote:

Can you install the debuginfo for ceph (how this works depends on your
distro) and run the following?

# gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args ceph-objectstore-tool
import-rados volumes pg.3.367.export.OSD.35

On Thu, Mar 16, 2017 at 12:02 AM, Laszlo Budai  wrote:

Hello,


the ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
command crashes.

~# ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
*** Caught signal (Segmentation fault) **
 in thread 7f85b60e28c0
 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xaeeaba]
 2: (()+0x10330) [0x7f85b4dca330]
 3: (()+0xa2324) [0x7f85b1cd7324]
 4: (()+0x7d23e) [0x7f85b1cb223e]
 5: (()+0x7d478) [0x7f85b1cb2478]
 6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15)
[0x7f85b1c8a0e5]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
 9: (main()+0x1294) [0x651134]
 10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
 11: ceph-objectstore-tool() [0x66f8b7]
2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal (Segmentation
fault) **
 in thread 7f85b60e28c0

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xaeeaba]
 2: (()+0x10330) [0x7f85b4dca330]
 3: (()+0xa2324) [0x7f85b1cd7324]
 4: (()+0x7d23e) [0x7f85b1cb223e]
 5: (()+0x7d478) [0x7f85b1cb2478]
 6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15)
[0x7f85b1c8a0e5]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
 9: (main()+0x1294) [0x651134]
 10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
 11: ceph-objectstore-tool() [0x66f8b7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

--- begin dump of recent events ---
   -14> 2017-03-15 14:57:05.557743 7f85b60e28c0  5 asok(0x5632000)
register_command perfcounters_dump hook 0x55e6130
   -13> 2017-03-15 14:57:05.557807 7f85b60e28c0  5 asok(0x5632000)
register_command 1 hook 0x55e6130
   -12> 2017-03-15 14:57:05.557818 7f85b60e28c0  5 asok(0x5632000)
register_command perf dump hook 0x55e6130
   -11> 2017-03-15 14:57:05.557828 7f85b60e28c0  5 asok(0x5632000)
register_command perfcounters_schema hook 0x55e6130
   -10> 2017-03-15 14:57:05.557836 7f85b60e28c0  5 asok(0x5632000)
register_command 2 hook 0x55e6130
-9> 2017-03-15 14:57:05.557841 7f85b60e28c0  5 asok(0x5632000)
register_command perf schema hook 0x55e6130
-8> 2017-03-15 14:57:05.557851 7f85b60e28c0  5 asok(0x5632000)
register_command perf reset hook 0x55e6130
-7> 2017-03-15 14:57:05.557855 7f85b60e28c0  5 asok(0x5632000)
register_command config show hook 0x55e6130
-6> 2017-03-15 14:57:05.557864 7f85b60e28c0  5 asok(0x5632000)
register_command config set hook 0x55e6130
-5> 2017-03-15 14:57:05.557868 7f85b60e28c0  5 asok(0x5632000)
register_command config get hook 0x55e6130
-4> 2017-03-15 14:57:05.557877 7f85b60e28c0  5 asok(0x5632000)
register_command config diff hook 0x55e6130
-3> 2017-03-15 14:57:05.557880 7f85b60e28c0  5 asok(0x5632000)
register_command log flush hook 0x55e6130
-2> 2017-03-15 14:57:05.557888 7f85b60e28c0  5 asok(0x5632000)
register_command log dump hook 0x55e6130
-1> 2017-03-15 14:57:05.557892 7f85b60e28c0  5 asok(0x5632000)
register_command log reopen hook 0x55e6130
 0> 2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f85b60e28c0

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: 

Re: [ceph-users] mkjournal error creating journal ... : (13) Permission denied

2017-03-15 Thread Gunwoo Gim
 Thank you so much, Peter. The 'udevadm trigger' after 'partprobe' triggered
the udev rules, and I've found out that even before the udev ruleset
triggers, the owner is already ceph:ceph.

 I've dug into ceph-disk a little more and found out that there is a
symbolic link to /dev/disk/by-partuuid/120c536d-cb30-4cea-b607-dd347022a497
at [/dev/mapper/vg--hdd1-lv--hdd1p1(the_filestore_osd)]/journal, and the
link target doesn't exist, though it does exist in /dev/disk/by-parttypeuuid,
which has been populated by /lib/udev/rules.d/60-ceph-by-parttypeuuid.rules.
 So I added this in /lib/udev/rules.d/60-ceph-by-parttypeuuid.rules:
# when ceph-disk prepares a filestore osd it makes a symbolic link by
disk/by-partuuid but LVM2 doesn't seem to populate /dev/disk/by-partuuid.
ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_TYPE}=="?*",
ENV{ID_PART_ENTRY_UUID}=="?*",
SYMLINK+="disk/by-partuuid/$env{ID_PART_ENTRY_UUID}"
 And finally got the osds all up and in. :D

 Yeah, it wasn't actually a permission problem; the link target just didn't
exist.
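
For anyone hitting the same thing: after adding or editing a rules file, udev
has to re-read it before the new rule can fire. A minimal sketch:

udevadm control --reload-rules
udevadm trigger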


~ # ceph-disk -v activate /dev/mapper/vg--hdd1-lv--hdd1p1
...
mount: Mounting /dev/mapper/vg--hdd1-lv--hdd1p1 on
/var/lib/ceph/tmp/mnt.ECAifr with options noatime,largeio,inode64,swalloc
command_check_call: Running command: /bin/mount -t xfs -o
noatime,largeio,inode64,swalloc -- /dev/mapper/vg--hdd1-lv--hdd1p1
/var/lib/ceph/tmp/mnt.ECAifr
mount: DIGGIN ls -al /var/lib/ceph/tmp/mnt.ECAifr
mount: DIGGIN total 36
drwxr-xr-x 3 ceph ceph  174 Mar 14 11:51 .
drwxr-xr-x 6 ceph ceph 4096 Mar 16 11:30 ..
-rw-r--r-- 1 root root  202 Mar 16 11:19 activate.monmap
-rw-r--r-- 1 ceph ceph   37 Mar 14 11:45 ceph_fsid
drwxr-xr-x 3 ceph ceph   39 Mar 14 11:51 current
-rw-r--r-- 1 ceph ceph   37 Mar 14 11:45 fsid
lrwxrwxrwx 1 ceph ceph   58 Mar 14 11:45 journal ->
/dev/disk/by-partuuid/120c536d-cb30-4cea-b607-dd347022a497
-rw-r--r-- 1 ceph ceph   37 Mar 14 11:45 journal_uuid
-rw-r--r-- 1 ceph ceph   21 Mar 14 11:45 magic
-rw-r--r-- 1 ceph ceph    4 Mar 14 11:51 store_version
-rw-r--r-- 1 ceph ceph   53 Mar 14 11:51 superblock
-rw-r--r-- 1 ceph ceph    2 Mar 14 11:51 whoami
...
ceph_disk.main.Error: Error: ['ceph-osd', '--cluster', 'ceph', '--mkfs',
'--mkkey', '-i', u'0', '--monmap',
'/var/lib/ceph/tmp/mnt.ECAifr/activate.monmap', '-
-osd-data', '/var/lib/ceph/tmp/mnt.ECAifr', '--osd-journal',
'/var/lib/ceph/tmp/mnt.ECAifr/journal', '--osd-uuid',
u'377c336b-278d-4caf-b2f5-592ac72cd9b6', '-
-keyring', '/var/lib/ceph/tmp/mnt.ECAifr/keyring', '--setuser', 'ceph',
'--setgroup', 'ceph'] failed : 2017-03-16 11:30:05.238725 7f918fbc0a40 -1
filestore(/v
ar/lib/ceph/tmp/mnt.ECAifr) mkjournal error creating journal on
/var/lib/ceph/tmp/mnt.ECAifr/journal: (13) Permission denied
2017-03-16 11:30:05.238756 7f918fbc0a40 -1 OSD::mkfs: ObjectStore::mkfs
failed with error -13
2017-03-16 11:30:05.238833 7f918fbc0a40 -1  ** ERROR: error creating empty
object store in /var/lib/ceph/tmp/mnt.ECAifr: (13) Permission denied


~ # blkid /dev/mapper/vg--*lv-*p* | grep
'120c536d-cb30-4cea-b607-dd347022a497'
/dev/mapper/vg--ssd1-lv--ssd1p1: PARTLABEL="ceph journal"
PARTUUID="120c536d-cb30-4cea-b607-dd347022a497"
~ # ls -al /dev/disk/by-id | grep dm-22
lrwxrwxrwx 1 root root   11 Mar 16 11:37 dm-name-vg--ssd1-lv--ssd1p1 ->
../../dm-22
lrwxrwxrwx 1 root root   11 Mar 16 11:37
dm-uuid-part1-LVM-n1SH1FvtfjgxJOMWN9aHurFvn2BpIsLZi89GWxA68hLmUQV6l5oyiEOPsFciRbKg
-> ../../dm-22
~ # ls -al /dev/disk/by-parttypeuuid | grep dm-22
lrwxrwxrwx 1 root root  11 Mar 16 11:37
45b0969e-9b03-4f30-b4c6-b4b80ceff106.120c536d-cb30-4cea-b607-dd347022a497
-> ../../dm-22
~ # ls -al /dev/disk/by-uuid | grep dm-22
~ # ls -al /dev/disk/by-partuuid/ | grep dm-22
~ # ls -al /dev/disk/by-path | grep dm-22


Best Regards,
Nicholas Gim.

On Wed, Mar 15, 2017 at 6:46 PM Peter Maloney <
peter.malo...@brockmann-consult.de> wrote:

On 03/15/17 08:43, Gunwoo Gim wrote:

 After a reboot, all the partitions of LVM don't show up in /dev/mapper
(nor in /dev/dm-* or /proc/partitions), though the whole disks
show up; I have to make the hosts run one 'partprobe' every time they boot
so as to have the partitions all show up.

Maybe you need this after partprobe:

udevadm trigger



 I've found out that the udev rules have never triggered, even when I
removed the DEVTYPE checking part; checked with a udev
line: RUN+="/bin/echo 'add /dev/$name' >> /root/log.txt"
 I've also tried chowning all the /dev/dm-* devices to ceph:disk, in vain. Do I
have to use the udev rules even if the /dev/dm-* devices are already owned by
ceph:ceph?

No, I think you just need them owned by ceph:ceph. Test that with something
like:

sudo -u ceph hexdump -C /dev/dm-${number} | head

(which reads, not writes...so not a full test, but close enough)

And also make sure the files in /var/lib/ceph/{osd,mon,...} are owned by
ceph:ceph too. Maybe you have a mix of root and ceph, which is easy to
cause by running it as root when ceph owns some files.
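
If there is such a mix, a blunt fix is something like this (a sketch; the OSD
id and journal path are examples, adjust to your layout, and stop the OSD
first):

chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
chown ceph:ceph "$(readlink -f /var/lib/ceph/osd/ceph-0/journal)"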


And FYI, I don't like udev, and did not use ceph-deploy or 

Re: [ceph-users] pgs stuck inactive

2017-03-15 Thread Brad Hubbard
Can you install the debuginfo for ceph (how this works depends on your
distro) and run the following?

# gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args ceph-objectstore-tool
import-rados volumes pg.3.367.export.OSD.35
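
On an RPM-based distro that usually amounts to something like the following
(a sketch, assuming yum-utils provides debuginfo-install):

# debuginfo-install ceph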

On Thu, Mar 16, 2017 at 12:02 AM, Laszlo Budai  wrote:
> Hello,
>
>
> the ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
> command crashes.
>
> ~# ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
> *** Caught signal (Segmentation fault) **
>  in thread 7f85b60e28c0
>  ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>  1: ceph-objectstore-tool() [0xaeeaba]
>  2: (()+0x10330) [0x7f85b4dca330]
>  3: (()+0xa2324) [0x7f85b1cd7324]
>  4: (()+0x7d23e) [0x7f85b1cb223e]
>  5: (()+0x7d478) [0x7f85b1cb2478]
>  6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
>  7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15)
> [0x7f85b1c8a0e5]
>  8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
>  9: (main()+0x1294) [0x651134]
>  10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
>  11: ceph-objectstore-tool() [0x66f8b7]
> 2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal (Segmentation
> fault) **
>  in thread 7f85b60e28c0
>
>  ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>  1: ceph-objectstore-tool() [0xaeeaba]
>  2: (()+0x10330) [0x7f85b4dca330]
>  3: (()+0xa2324) [0x7f85b1cd7324]
>  4: (()+0x7d23e) [0x7f85b1cb223e]
>  5: (()+0x7d478) [0x7f85b1cb2478]
>  6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
>  7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15)
> [0x7f85b1c8a0e5]
>  8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
>  9: (main()+0x1294) [0x651134]
>  10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
>  11: ceph-objectstore-tool() [0x66f8b7]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> --- begin dump of recent events ---
>-14> 2017-03-15 14:57:05.557743 7f85b60e28c0  5 asok(0x5632000)
> register_command perfcounters_dump hook 0x55e6130
>-13> 2017-03-15 14:57:05.557807 7f85b60e28c0  5 asok(0x5632000)
> register_command 1 hook 0x55e6130
>-12> 2017-03-15 14:57:05.557818 7f85b60e28c0  5 asok(0x5632000)
> register_command perf dump hook 0x55e6130
>-11> 2017-03-15 14:57:05.557828 7f85b60e28c0  5 asok(0x5632000)
> register_command perfcounters_schema hook 0x55e6130
>-10> 2017-03-15 14:57:05.557836 7f85b60e28c0  5 asok(0x5632000)
> register_command 2 hook 0x55e6130
> -9> 2017-03-15 14:57:05.557841 7f85b60e28c0  5 asok(0x5632000)
> register_command perf schema hook 0x55e6130
> -8> 2017-03-15 14:57:05.557851 7f85b60e28c0  5 asok(0x5632000)
> register_command perf reset hook 0x55e6130
> -7> 2017-03-15 14:57:05.557855 7f85b60e28c0  5 asok(0x5632000)
> register_command config show hook 0x55e6130
> -6> 2017-03-15 14:57:05.557864 7f85b60e28c0  5 asok(0x5632000)
> register_command config set hook 0x55e6130
> -5> 2017-03-15 14:57:05.557868 7f85b60e28c0  5 asok(0x5632000)
> register_command config get hook 0x55e6130
> -4> 2017-03-15 14:57:05.557877 7f85b60e28c0  5 asok(0x5632000)
> register_command config diff hook 0x55e6130
> -3> 2017-03-15 14:57:05.557880 7f85b60e28c0  5 asok(0x5632000)
> register_command log flush hook 0x55e6130
> -2> 2017-03-15 14:57:05.557888 7f85b60e28c0  5 asok(0x5632000)
> register_command log dump hook 0x55e6130
> -1> 2017-03-15 14:57:05.557892 7f85b60e28c0  5 asok(0x5632000)
> register_command log reopen hook 0x55e6130
>  0> 2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f85b60e28c0
>
>  ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>  1: ceph-objectstore-tool() [0xaeeaba]
>  2: (()+0x10330) [0x7f85b4dca330]
>  3: (()+0xa2324) [0x7f85b1cd7324]
>  4: (()+0x7d23e) [0x7f85b1cb223e]
>  5: (()+0x7d478) [0x7f85b1cb2478]
>  6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
>  7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15)
> [0x7f85b1c8a0e5]
>  8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
>  9: (main()+0x1294) [0x651134]
>  10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
>  11: ceph-objectstore-tool() [0x66f8b7]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> --- logging levels ---
>0/ 5 none
>0/ 1 lockdep
>0/ 1 context
>1/ 1 crush
>1/ 5 mds
>1/ 5 mds_balancer
>1/ 5 mds_locker
>1/ 5 mds_log
>1/ 5 mds_log_expire
>1/ 5 mds_migrator
>0/ 1 buffer
>0/ 1 timer
>0/ 1 filer
>0/ 1 striper
>0/ 1 objecter
>0/ 5 rados
>0/ 5 rbd
>0/ 5 rbd_replay
>0/ 5 journaler
>0/ 5 objectcacher
>0/ 5 client
>0/ 5 osd
>0/ 5 optracker
>0/ 5 objclass
>1/ 3 filestore
>1/ 3 keyvaluestore
>1/ 3 journal
>0/ 5 ms
>1/ 5 mon
>0/10 monc
>1/ 5 paxos
>0/ 5 tp
>1/ 5 auth
>1/ 5 crypto
>1/ 1 finisher
>1/ 5 

Re: [ceph-users] Ceph Cluster Failures

2017-03-15 Thread Robin H. Johnson
On Thu, Mar 16, 2017 at 02:22:08AM +, Rich Rocque wrote:
> Has anyone else run into this or have any suggestions on how to remedy it?
We need a LOT more info.

> After a couple months of almost no issues, our Ceph cluster has
> started to have frequent failures. Just this week it's failed about
> three times.
>
> The issue appears to be that an MDS or Monitor will fail and then all
> clients hang. After that, all clients need to be forcibly restarted.
- Can you define monitor 'failing' in this case?
- What do the logs contain?
- Is it running out of memory?
- Can you turn up the debug level? (see the sketch below)
- Has your cluster experienced continual growth and now might be
  undersized in some regard?
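
Debug levels can usually be raised at runtime without a restart; a sketch
(the subsystems and levels are examples, and <name> is a placeholder for a
real daemon id):

ceph tell mon.* injectargs '--debug-mon 10 --debug-ms 1'
ceph daemon mds.<name> config set debug_mds 10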

> The architecture for our setup is:
Are these virtual machines? The overall specs look more like VM
instances than hardware.

> 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers
What sort of SSD are the monitor datastores on? ('mon data' in the
config)

> 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers
12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.
How many OSD servers, what SSDs?

What is the network setup & connectivity between them (hopefully
10Gbit)?

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Moving data from EC pool with replicated cache tier to replicated pool

2017-03-15 Thread Alex Gorbachev
On Tue, Mar 14, 2017 at 6:20 AM pwoszuk  wrote:

> Hi all
>
> I need a help with operation of moving all data from one pool to another.
>
> pool1: ECpool with replicated cache tier pool (name it: pool1a)
>
> pool2: replicated pool
>
> need to move data from pool1 -> pool2
>
> any help/procedures would be helpful


This should help:

https://ceph.com/geen-categorie/ceph-pool-migration/
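
The gist of that article for a cache-tiered source pool (a hedged sketch, not
a drop-in script; pool names follow Paul's example, and note that rados
cppool does not preserve snapshots):

rados -p pool1a cache-flush-evict-all
ceph osd tier remove-overlay pool1
ceph osd tier remove pool1 pool1a
rados cppool pool1 pool2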

Regards,
Alex

>
>
> Kind regards
>
> Paul
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
-- 
--
Alex Gorbachev
Storcium
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Cluster Failures

2017-03-15 Thread Rich Rocque
Hi,


After a couple months of almost no issues, our Ceph cluster has started to have 
frequent failures. Just this week it's failed about three times.


The issue appears to be that an MDS or Monitor will fail and then all clients 
hang. After that, all clients need to be forcibly restarted.


Has anyone else run into this or have any suggestions on how to remedy it?


The architecture for our setup is:

3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers

12 ea OSDs (ssd), on 1cpu, 1GB RAM servers


Ceph v10.2.5

Clients connect via CephFS Kernel driver.


I'd also like to note I'm relatively new to Ceph and I'm here on behalf of the 
person who set the cluster up, so any information is appreciated.


Thank you for your time,

Rich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] total storage size available in my CEPH setup?

2017-03-15 Thread Christian Balzer

Hello,

On Wed, 15 Mar 2017 21:36:00 + James Okken wrote:

> Thanks gentlemen,
> 
> I hope to add more OSDs since we will need a good deal more than 2.3TB and I 
> do want to leave free space / margins.
> 
> I am also thinking of reducing the replication to 2.
>  I am sure I can google how to do that. But I am sure most of my results are 
> going to be people telling me not to do it.

Mostly for good reasons, but those concerns are considerably diminished with your RAID'ed OSDs.

> Can you direct me to a good tutorial on how to do so.
> 
No such thing, but you already must have changed your configuration, as
your pools are min_size 1, which is not the default.
Changing them to size=2 should do the trick.
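
For reference, that is a one-liner per pool (a sketch; the pool name is an
example):

ceph osd pool set rbd size 2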

Christian
> 
> And, you're right, I am a beginner.
> 
> James Okken
> Lab Manager
> Dialogic Research Inc.
> 4 Gatehall Drive
> Parsippany
> NJ 07054
> USA
> 
> Tel:   973 967 5179
> Email:   james.ok...@dialogic.com
> Web:    www.dialogic.com – The Network Fuel Company
> 
> This e-mail is intended only for the named recipient(s) and may contain 
> information that is privileged, confidential and/or exempt from disclosure 
> under applicable law. No waiver of privilege, confidence or otherwise is 
> intended by virtue of communication via the internet. Any unauthorized use, 
> dissemination or copying is strictly prohibited. If you have received this 
> e-mail in error, or are not named as a recipient, please immediately notify 
> the sender and destroy all copies of this e-mail.
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Maxime Guyot
> Sent: Tuesday, March 14, 2017 7:29 AM
> To: Christian Balzer; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] total storage size available in my CEPH setup?
> 
> Hi,
> 
> >> My question is how much total CEPH storage does this allow me? Only 2.3TB? 
> >> or does the way CEPH duplicates data enable more than 1/3 of the storage?  
> > 3 means 3, so 2.3TB. Note that Ceph is sparse, so that can help quite a bit. 
> >  
> 
> To expand on this, you probably want to keep some margins and not run your 
> cluster at 100% :) (especially if you are running RBD with thin provisioning). 
> By default, “ceph status” will issue a warning at 85% full (osd nearfull 
> ratio). You should also consider that you need some free space for auto 
> healing to work (if you plan to use more than 3 OSDs on a size=3 pool).
> 
> Cheers,
> Maxime 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Odd latency numbers

2017-03-15 Thread Christian Balzer

Hello,

On Wed, 15 Mar 2017 16:49:00 + Rhian Resnick wrote:

> Morning all,
> 
> 
> We are starting to apply load to our test cephfs system and are noticing some odd 
> latency numbers. We are using erasure coding for the cold data pools and 
> replication for our cache tiers (not on ssd yet). We noticed the 
> following high latency on one node and it seems to be slowing down writes and 
> reads on the cluster.
> 
The pg dump below was massive overkill at this point in time, whereas a
"ceph osd tree" would have probably shown us the topology (where is your
tier, where your EC pool(s)?).
Same for a "ceph osd pool ls detail".

So if we were to assume that node is your cache tier (replica 1?), then the
latencies would make sense. 
But that's guesswork, so describe your cluster in more detail.

And yes, a single slow OSD (stealthily failing drive, etc) can bring a
cluster to its knees. 
This is why many people here tend to get every last bit of info with
collectd and feed it into carbon and graphite/grafana, etc.
This will immediately indicate culprits and allow you to correlate this
with other data, like actual disk/network/cpu load, etc.

For the time being run atop on that node and see if you can reduce the
issue to something like "all disk are busy all the time" or "CPU meltdown".

> 
> Our next step is break out mds, mgr, and mons to different machines but we 
> wanted to start the discussion here.
>

If your nodes (not a single iota of HW/NW info from you) are powerful
enough, breaking things out isn't likely to help, nor is it a necessity.
 
More below.

> 
> Here is a bunch of information you may find useful.
> 
> 
> ceph.conf
> 
> [global]
> fsid = X
> mon_initial_members = ceph-mon1, ceph-mon2, ceph-mon3
> mon_host = 10.141.167.238,10.141.160.251,10.141.161.249
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> 
> cluster network = 10.85.8.0/22
> public network = 10.141.0.0/16
> 
> # we tested this with bluestore and xfs and have the same results
> [osd]
> enable_experimental_unrecoverable_data_corrupting_features = bluestore
> 
I suppose this is not production in any shape or form.

> Status
> 
> cluster 8f6ba9d6-314d-4725-bcfa-340e500697f0
>  health HEALTH_OK
>  monmap e2: 3 mons at 
> {ceph-mon1=10.141.167.238:6789/0,ceph-mon2=10.141.160.251:6789/0,ceph-mon3=10.141.161.249:6789/0}
> election epoch 12, quorum 0,1,2 ceph-mon2,ceph-mon3,ceph-mon1
>   fsmap e30: 1/1/1 up {0=ceph-mon3=up:active}, 2 up:standby
> mgr active: ceph-mon3 standbys: ceph-mon2, ceph-mon1
>  osdmap e100: 12 osds: 12 up, 12 in
> flags sortbitwise,require_jewel_osds,require_kraken_osds
>   pgmap v119525: 124 pgs, 6 pools, 471 GB data, 1141 kobjects
> 970 GB used, 2231 GB / 3202 GB avail
>  124 active+clean
>   client io 11962 B/s rd, 11 op/s rd, 0 op/s wr
> 
At first glance there seem to be far too few PGs here, even given the
low number of OSDs.

> 
> Pool space usage
> 
Irrelevant.
> GLOBAL:
> SIZE  AVAIL RAW USED %RAW USED
> 3202G 2231G 970G 30.31
> POOLS:
NAME            ID USED   %USED MAX AVAIL OBJECTS
rbd             0       0     0      580G       0
cephfs-hot      1  76137M 11.35      580G  466451
cephfs-cold     2    397G 25.48     1161G  650158
cephfs_metadata 3  47237k     0      580G   52275
one-hot         4       0     0      580G       0
one             5       0     0     1161G       0
> 
An aside, how happy are you with OpenNebula and Ceph?
I found that the lack of a migration network option in ON is a show
stopper for us. 

> 
> OSD Performance and Latency
> 
> osd  commit_latency(ms)  apply_latency(ms)
>   9                   1                  1
>   8                   1                  1
>   0                  13                 13
>  11                   1                  1
>   1                  38                 38
>  10                   2                  2
>   2                  21                 21
>   3                   2                  2
>   4                  20                 20
>   5                   1                  1
>   6                   1                  1
>   7                   1                  1
> 
I found these counters to be less than reliable, or at least not very
relevant, unless there is constant activity and they are read frequently as well.

For example on a cluster where the HDD based OSDs are basically dormant
write wise most of the time while the SSD based cache tier is very busy:
---
 16    76   124
 17    66    99
 18     0     1
 19     0     0
---

The first two are HDD OSD, the 2nd two are SSD OSDs in the cache tier.

Re: [ceph-users] Directly addressing files on individual OSD

2017-03-15 Thread Anthony D'Atri
As I parse Youssef’s message, I believe there are some misconceptions.  It 
might help if you could give a bit more info on what your existing ‘cluster’ is 
running.  NFS? CIFS/SMB?  Something else?

1) Ceph regularly runs scrubs to ensure that all copies of data are consistent. 
 The checksumming that you describe would be both infeasible and redundant.
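
Scrubs can also be requested on demand; a sketch, with placeholder ids:

ceph pg deep-scrub <pgid>
ceph osd deep-scrub <osd-id>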

2) It sounds as though your current back-end stores user files as-is and is 
either a traditional file server setup or perhaps a virtual filesystem 
aggregating multiple filesystems.  Ceph is not a file storage solution in this 
sense.  The below sounds as though you want user files to not be sharded across 
multiple servers.  This is antithetical to how Ceph works and is counter to 
data durability and availability, unless there is some replication that you 
haven’t described.  Reference this diagram:

http://docs.ceph.com/docs/master/_images/stack.png

Under the hood, Ceph operates internally on ‘objects’ that are not exposed to 
clients as such. There are several different client interfaces that are built 
on top of this object service:

- RBD volumes — think in terms of a virtual disk drive attached to a VM
- RGW — like Amazon S3 or Swift
- CephFS — provides a mountable filesystem interface, somewhat like NFS or even 
SMB but with important distinctions in behavior and use-case

I had not heard of iRODS before but just looked it up.  It is a very different 
thing than any of the common interfaces to Ceph.

If your users need to mount the storage as a share / volume, in the sense of 
SMB or NFS, then Ceph may not be your best option.  If they can cope with an S3 
/ Swift type REST object interface, a cluster with RGW interfaces might do the 
job, or perhaps Swift or Gluster.   It’s hard to say for sure based on 
assumptions of what you need.

— Anthony


> We currently run a commodity cluster that supports a few petabytes of data. 
> Each node in the cluster has 4 drives, currently mounted as /0 through /3. We 
> have been researching alternatives for managing the storage, Ceph being one 
> possibility, iRODS being another. For preservation purposes, we would like 
> each file to exist as one whole piece per drive (as opposed to being striped 
> across multiple drives). It appears this is the default in Ceph.
> 
> Now, it has always been convenient for us to run distributed jobs over SSH 
> to, for instance, compile a list of checksums of all files in the cluster:
> 
> dsh -Mca 'find /{0..3}/items -name \*.warc.gz | xargs md5sum 
> >/tmp/$HOSTNAME.md5sum'
> 
> And that nicely allows each node to process its own files using the local CPU.
> 
> Would this scenario still be possible where Ceph is managing the storage?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] total storage size available in my CEPH setup?

2017-03-15 Thread James Okken
Thanks gentlemen,

I hope to add more OSDs since we will need a good deal more than 2.3TB and I do 
want to leave free space / margins.

I am also thinking of reducing the replication to 2.
 I am sure I can google how to do that. But I am sure most of my results are 
going to be people telling me not to do it.
Can you direct me to a good tutorial on how to do so.


And, you're right, I am a beginner.

James Okken
Lab Manager
Dialogic Research Inc.
4 Gatehall Drive
Parsippany
NJ 07054
USA

Tel:   973 967 5179
Email:   james.ok...@dialogic.com
Web:    www.dialogic.com – The Network Fuel Company

This e-mail is intended only for the named recipient(s) and may contain 
information that is privileged, confidential and/or exempt from disclosure 
under applicable law. No waiver of privilege, confidence or otherwise is 
intended by virtue of communication via the internet. Any unauthorized use, 
dissemination or copying is strictly prohibited. If you have received this 
e-mail in error, or are not named as a recipient, please immediately notify the 
sender and destroy all copies of this e-mail.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Maxime 
Guyot
Sent: Tuesday, March 14, 2017 7:29 AM
To: Christian Balzer; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] total storage size available in my CEPH setup?

Hi,

>> My question is how much total CEPH storage does this allow me? Only 2.3TB? 
>> or does the way CEPH duplicates data enable more than 1/3 of the storage?
> 3 means 3, so 2.3TB. Note that Ceph is sparse, so that can help quite a bit.

To expand on this, you probably want to keep some margins and not run your 
cluster at 100% :) (especially if you are running RBD with thin provisioning). By 
default, “ceph status” will issue a warning at 85% full (osd nearfull ratio). 
You should also consider that you need some free space for auto healing to work 
(if you plan to use more than 3 OSDs on a size=3 pool).
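
A quick, read-only way to keep an eye on usage against those margins (a
sketch):

ceph df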

Cheers,
Maxime 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Tech Talk next Thurs

2017-03-15 Thread Patrick McGarry
Hey cephers,

Just a reminder that we'll be having our monthly Ceph Tech Talk next
Thursday at 1p EST.

http://ceph.com/ceph-tech-talks/

Chris Holcombe from Canonical will be talking about some of the work
they have been doing to streamline Ceph Deployment as well as a
walkthrough on writing applications from scratch with language
bindings (using Rust as an example).

I hope you can all make it for what should be a great interactive session!


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Creating Ceph Pools on different OSD's -- crushmap ?

2017-03-15 Thread Deepak Naidu
Ok, I found this tutorial on crushmaps from Sébastien Han. Hopefully it will 
get me to the structure I need.

https://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/
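
The gist of that approach, as a hedged sketch in pre-Luminous crushmap syntax
(the root, rule, and file names here are invented for illustration): give
each group of OSDs its own root in the CRUSH hierarchy, then point a ruleset
at it and assign that ruleset to the pool.

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

# inside crush.txt: a rule that only picks hosts under a dedicated root
rule pool0_rule {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take pool0root
        step chooseleaf firstn 0 type host
        step emit
}

crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new
ceph osd pool set pool0 crush_ruleset 1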

--
Deepak

From: Deepak Naidu
Sent: Wednesday, March 15, 2017 12:45 PM
To: ceph-users
Subject: Creating Ceph Pools on different OSD's -- crushmap ?

Hello,

I am trying to address the failure domain & performance/isolation of pools 
based on which OSDs they can belong to. Let me give an example. Can I achieve this 
with a crushmap ruleset or any other method, and if so, how?

Example:
10x storage servers each have 3x OSD, i.e. OSD.0 through OSD.29 -- these belong to 
Pool0 - this can be a replicated pool or ecpool

Similarly,

10x storage servers each have 5x OSD, i.e. OSD.30 through OSD.79 -- these belong to 
Pool1 - this can be a replicated pool or ecpool


Thanks for any info.

--
Deepak






---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Creating new Pools - PG's

2017-03-15 Thread Mike Jacobacci
Hi David,

Thank you for your response!

I was thinking that I may use Ceph to back other projects outside of our
infrastructure, so I calculated 75% VM and 25% other usage when I created
the pool.

Cheers,
Mike

On Wed, Mar 15, 2017 at 12:57 PM, David Turner 
wrote:

> Especially if you are planning to remove the old pool after you migrate,
> you shouldn't have any problems with this plan.  If you were going to leave
> both running indefinitely, then I'd recommend calculating out how many PGs
> you should add based on how many OSDs you have.  Based on your numbers, you
> might have too few PGs in your cluster, but adding 1024 temporarily should
> not be an issue.
>
> On Wed, Mar 15, 2017 at 1:26 PM Mike Jacobacci  wrote:
>
>> I have a simple question I hope.
>>
>> I am moving our VM infrastructure from Xenserver to Proxmox. I only have
>> one pool that has several RBD images that are mounted in Xen, that pool has
>> 1024 PG's (30OSD's/3x replication).
>>
>> Question: I would like to create a new pool for Proxmox so I can have a
>> clean slate going forward... Would it be bad to create a new pool with the
>> same PG/Replication if I am going to remove the old pool after I have
>> migrated the VM's over?  Right now I am only using about 2TB out of 109TB.
>>
>> Cheers,
>> Mike
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Creating new Pools - PG's

2017-03-15 Thread David Turner
Especially if you are planning to remove the old pool after you migrate,
you shouldn't have any problems with this plan.  If you were going to leave
both running indefinitely, then I'd recommend calculating out how many PGs
you should add based on how many OSDs you have.  Based on your numbers, you
might have too few PGs in your cluster, but adding 1024 temporarily should
not be an issue.
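
The rule of thumb behind that calculation (a sketch of the arithmetic, not an
official sizing tool): total PGs ≈ (OSDs × 100) / replica count, rounded up
to a power of two, then divided among pools by their expected share of data.
With 30 OSDs and 3x replication that gives (30 × 100) / 3 = 1000 -> 1024,
which matches the existing pool.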

On Wed, Mar 15, 2017 at 1:26 PM Mike Jacobacci  wrote:

> I have a simple question I hope.
>
> I am moving our VM infrastructure from Xenserver to Proxmox. I only have
> one pool that has several RBD images that are mounted in Xen, that pool has
> 1024 PG's (30OSD's/3x replication).
>
> Question: I would like to create a new pool for Proxmox so I can have a
> clean slate going forward... Would it be bad to create a new pool with the
> same PG/Replication if I am going to remove the old pool after I have
> migrated the VM's over?  Right now I am only using about 2TB out of 109TB.
>
> Cheers,
> Mike
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Creating Ceph Pools on different OSD's -- crushmap ?

2017-03-15 Thread Deepak Naidu
Hello,

I am trying to address the failure domain & performance/isolation of pools 
based on which OSDs they can belong to. Let me give an example. Can I achieve this 
with a crushmap ruleset or any other method, and if so, how?

Example:
10x storage servers each have 3x OSD, i.e. OSD.0 through OSD.29 -- these belong to 
Pool0 - this can be a replicated pool or ecpool

Similarly,

10x storage servers each have 5x OSD, i.e. OSD.30 through OSD.79 -- these belong to 
Pool1 - this can be a replicated pool or ecpool


Thanks for any info.

--
Deepak





---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph 0.94.10 ceph-objectstore-tool segfault

2017-03-15 Thread Laszlo Budai

Hello,

I'm trying to do an import-rados operation, but the ceph-objectstore-tool 
crashes with a segfault:

[root@storage1 ~]# ceph-objectstore-tool import-rados images pg6.6exp-osd1
*** Caught signal (Segmentation fault) **
 in thread 7f84e0b24880
 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xb13532]
 2: (()+0xf100) [0x7f84dc87a100]
 3: (()+0xa7294) [0x7f84dd4ad294]
 4: (()+0x808ee) [0x7f84dd4868ee]
 5: (()+0x80b28) [0x7f84dd486b28]
 6: (rados_ioctx_create()+0x40) [0x7f84dd45c390]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x23) 
[0x7f84dd45c503]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x682bec]
 9: (main()+0x126c) [0x65145c]
 10: (__libc_start_main()+0xf5) [0x7f84db699b15]
 11: ceph-objectstore-tool() [0x670187]
2017-03-15 17:39:02.290206 7f84e0b24880 -1 *** Caught signal (Segmentation 
fault) **
 in thread 7f84e0b24880

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xb13532]
 2: (()+0xf100) [0x7f84dc87a100]
 3: (()+0xa7294) [0x7f84dd4ad294]
 4: (()+0x808ee) [0x7f84dd4868ee]
 5: (()+0x80b28) [0x7f84dd486b28]
 6: (rados_ioctx_create()+0x40) [0x7f84dd45c390]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x23) 
[0x7f84dd45c503]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x682bec]
 9: (main()+0x126c) [0x65145c]
 10: (__libc_start_main()+0xf5) [0x7f84db699b15]
 11: ceph-objectstore-tool() [0x670187]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

--- begin dump of recent events ---
   -14> 2017-03-15 17:39:02.272360 7f84e0b24880  5 asok(0x41d6000) 
register_command perfcounters_dump hook 0x4192120
   -13> 2017-03-15 17:39:02.272378 7f84e0b24880  5 asok(0x41d6000) 
register_command 1 hook 0x4192120
   -12> 2017-03-15 17:39:02.272380 7f84e0b24880  5 asok(0x41d6000) 
register_command perf dump hook 0x4192120
   -11> 2017-03-15 17:39:02.272382 7f84e0b24880  5 asok(0x41d6000) 
register_command perfcounters_schema hook 0x4192120
   -10> 2017-03-15 17:39:02.272384 7f84e0b24880  5 asok(0x41d6000) 
register_command 2 hook 0x4192120
-9> 2017-03-15 17:39:02.272385 7f84e0b24880  5 asok(0x41d6000) 
register_command perf schema hook 0x4192120
-8> 2017-03-15 17:39:02.272387 7f84e0b24880  5 asok(0x41d6000) 
register_command perf reset hook 0x4192120
-7> 2017-03-15 17:39:02.272388 7f84e0b24880  5 asok(0x41d6000) 
register_command config show hook 0x4192120
-6> 2017-03-15 17:39:02.272389 7f84e0b24880  5 asok(0x41d6000) 
register_command config set hook 0x4192120
-5> 2017-03-15 17:39:02.272390 7f84e0b24880  5 asok(0x41d6000) 
register_command config get hook 0x4192120
-4> 2017-03-15 17:39:02.272392 7f84e0b24880  5 asok(0x41d6000) 
register_command config diff hook 0x4192120
-3> 2017-03-15 17:39:02.272393 7f84e0b24880  5 asok(0x41d6000) 
register_command log flush hook 0x4192120
-2> 2017-03-15 17:39:02.272394 7f84e0b24880  5 asok(0x41d6000) 
register_command log dump hook 0x4192120
-1> 2017-03-15 17:39:02.272395 7f84e0b24880  5 asok(0x41d6000) 
register_command log reopen hook 0x4192120
 0> 2017-03-15 17:39:02.290206 7f84e0b24880 -1 *** Caught signal 
(Segmentation fault) **
 in thread 7f84e0b24880

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xb13532]
 2: (()+0xf100) [0x7f84dc87a100]
 3: (()+0xa7294) [0x7f84dd4ad294]
 4: (()+0x808ee) [0x7f84dd4868ee]
 5: (()+0x80b28) [0x7f84dd486b28]
 6: (rados_ioctx_create()+0x40) [0x7f84dd45c390]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x23) 
[0x7f84dd45c503]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x682bec]
 9: (main()+0x126c) [0x65145c]
 10: (__libc_start_main()+0xf5) [0x7f84db699b15]
 11: ceph-objectstore-tool() [0x670187]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent   500
  max_new 1000
  log_file
--- end dump of recent events ---
Segmentation fault
[root@storage1 ~]#



I have created the dump with the same tool. Then I stopped all the OSDs 
belonging to that PG and used ceph-objectstore-tool --op remove to remove the 
PG from the OSDs.
After that I 

Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shain Miley
Thanks for all the help so far.

Just to be clear…if I am planning on upgrading the cluster from Hammer in say 
the next 3 months…what is the suggested upgrade path?

Thanks again,
Shain 

> On Mar 15, 2017, at 2:05 PM, Abhishek Lekshmanan  wrote:
> 
> 
> 
> On 15/03/17 18:32, Shinobu Kinjo wrote:
>> So the description of Jewel is wrong?
>> 
>> http://docs.ceph.com/docs/master/releases/ 
>> 
> Yeah, we missed updating the Jewel dates as well when updating the Hammer ones; 
> Jewel is an LTS and will get more updates. Once Luminous is released, however, 
> we'll eventually shift focus to bugs that would hinder upgrades to Luminous 
> itself.
> 
> Abhishek
>> On Thu, Mar 16, 2017 at 2:27 AM, John Spray  wrote:
>>> On Wed, Mar 15, 2017 at 5:04 PM, Shinobu Kinjo  wrote:
 It may well be a bit of a challenge, but please consider Kraken (or
 later) because Jewel will be retired:
 
 http://docs.ceph.com/docs/master/releases/
>>> Nope, Jewel is LTS, Kraken is not.
>>> 
>>> Kraken will only receive updates until the next stable release.  Jewel
>>> will receive updates for longer.
>>> 
>>> John
>>> 
 On Thu, Mar 16, 2017 at 1:48 AM, Shain Miley  wrote:
> No, this is a production cluster that I have not had a chance to upgrade 
> yet.
> 
> We had an issue with the OS on a node, so I am just trying to reinstall ceph 
> and
> hope that the OSD data is still intact.
> 
> Once I get things stable again I was planning on upgrading…but the upgrade
> is a bit intensive by the looks of it so I need to set aside a decent 
> amount
> of time.
> 
> Thanks all!
> 
> Shain
> 
> On Mar 15, 2017, at 12:38 PM, Vasu Kulkarni  wrote:
> 
> Just curious, why do you still want to deploy new Hammer instead of stable
> Jewel? Is this a test environment? The last .10 release was basically
> bug fixes for 0.94.9.
> 
> 
> 
> On Wed, Mar 15, 2017 at 9:16 AM, Shinobu Kinjo  wrote:
>> FYI:
>> https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3
>> 
>> On Thu, Mar 16, 2017 at 1:05 AM, Shain Miley  wrote:
>>> Hello,
>>> I am trying to deploy ceph to a new server using ceph-deply which I have
>>> done in the past many times without issue.
>>> 
>>> Right now I am seeing a timeout trying to connect to git.ceph.com:
>>> 
>>> 
>>> [hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive
>>> apt-get
>>> -q install --assume-yes ca-certificates
>>> [hqosd6][DEBUG ] Reading package lists...
>>> [hqosd6][DEBUG ] Building dependency tree...
>>> [hqosd6][DEBUG ] Reading state information...
>>> [hqosd6][DEBUG ] ca-certificates is already the newest version.
>>> [hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not
>>> upgraded.
>>> [hqosd6][INFO  ] Running command: wget -O release.asc
>>> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> [hqosd6][WARNIN] --2017-03-15 11:49:16--
>>> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> [hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
>>> [hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443...
>>> connected.
>>> [hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved
>>> Permanently
>>> [hqosd6][WARNIN] Location:
>>> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> [following]
>>> [hqosd6][WARNIN] --2017-03-15 11:49:17--
>>> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> [hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
>>> [hqosd6][WARNIN] Connecting to git.ceph.com
>>> (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>> [hqosd6][WARNIN] Retrying.
>>> [hqosd6][WARNIN]
>>> [hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)
>>> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> [hqosd6][WARNIN] Connecting to git.ceph.com
>>> (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>> [hqosd6][WARNIN] Retrying.
>>> [hqosd6][WARNIN]
>>> [hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)
>>> https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> [hqosd6][WARNIN] Connecting to git.ceph.com
>>> (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>> [hqosd6][WARNIN] Retrying.
>>> 
>>> 
>>> I am wondering if this is a known issue.
>>> 
>>> Just an fyi...I am using an older version of ceph-deply (1.5.36) because
>>> in
>>> the past upgrading to a newer version I was not able to install hammer
>>> on
>>> the cluster…so the workaround was to use a slightly older version.
>>> 

Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Abhishek Lekshmanan



On 15/03/17 18:32, Shinobu Kinjo wrote:

So the description of Jewel is wrong?

http://docs.ceph.com/docs/master/releases/
Yeah, we missed updating the Jewel dates as well when updating the Hammer 
ones; Jewel is an LTS and will get more updates. Once Luminous is released, 
however, we'll eventually shift focus to bugs that would hinder upgrades 
to Luminous itself.


Abhishek

On Thu, Mar 16, 2017 at 2:27 AM, John Spray  wrote:

On Wed, Mar 15, 2017 at 5:04 PM, Shinobu Kinjo  wrote:

It may well be a bit of a challenge, but please consider Kraken (or
later) because Jewel will be retired:

http://docs.ceph.com/docs/master/releases/

Nope, Jewel is LTS, Kraken is not.

Kraken will only receive updates until the next stable release.  Jewel
will receive updates for longer.

John


On Thu, Mar 16, 2017 at 1:48 AM, Shain Miley  wrote:

No, this is a production cluster that I have not had a chance to upgrade yet.

We had an issue with the OS on a node, so I am just trying to reinstall ceph and
hope that the OSD data is still intact.

Once I get things stable again I was planning on upgrading…but the upgrade
is a bit intensive by the looks of it so I need to set aside a decent amount
of time.

Thanks all!

Shain

On Mar 15, 2017, at 12:38 PM, Vasu Kulkarni  wrote:

Just curious, why do you still want to deploy new Hammer instead of stable
Jewel? Is this a test environment? The last .10 release was basically
bug fixes for 0.94.9.



On Wed, Mar 15, 2017 at 9:16 AM, Shinobu Kinjo  wrote:

FYI:
https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3

On Thu, Mar 16, 2017 at 1:05 AM, Shain Miley  wrote:

Hello,
I am trying to deploy ceph to a new server using ceph-deply which I have
done in the past many times without issue.

Right now I am seeing a timeout trying to connect to git.ceph.com:


[hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive
apt-get
-q install --assume-yes ca-certificates
[hqosd6][DEBUG ] Reading package lists...
[hqosd6][DEBUG ] Building dependency tree...
[hqosd6][DEBUG ] Reading state information...
[hqosd6][DEBUG ] ca-certificates is already the newest version.
[hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not
upgraded.
[hqosd6][INFO  ] Running command: wget -O release.asc
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] --2017-03-15 11:49:16--
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
[hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443...
connected.
[hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved
Permanently
[hqosd6][WARNIN] Location:
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
[following]
[hqosd6][WARNIN] --2017-03-15 11:49:17--
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
[hqosd6][WARNIN] Connecting to git.ceph.com
(git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
[hqosd6][WARNIN] Retrying.
[hqosd6][WARNIN]
[hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] Connecting to git.ceph.com
(git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
[hqosd6][WARNIN] Retrying.
[hqosd6][WARNIN]
[hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] Connecting to git.ceph.com
(git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
[hqosd6][WARNIN] Retrying.


I am wondering if this is a known issue.

Just an fyi...I am using an older version of ceph-deply (1.5.36) because
in
the past upgrading to a newer version I was not able to install hammer
on
the cluster…so the workaround was to use a slightly older version.

Thanks in advance for any help you may be able to provide.

Shain


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shinobu Kinjo
So the description of Jewel is wrong?

http://docs.ceph.com/docs/master/releases/

On Thu, Mar 16, 2017 at 2:27 AM, John Spray  wrote:
> On Wed, Mar 15, 2017 at 5:04 PM, Shinobu Kinjo  wrote:
>> It may well be a bit of a challenge, but please consider Kraken (or
>> later) because Jewel will be retired:
>>
>> http://docs.ceph.com/docs/master/releases/
>
> Nope, Jewel is LTS, Kraken is not.
>
> Kraken will only receive updates until the next stable release.  Jewel
> will receive updates for longer.
>
> John
>
>>
>> On Thu, Mar 16, 2017 at 1:48 AM, Shain Miley  wrote:
>>> No, this is a production cluster that I have not had a chance to upgrade yet.
>>>
>>> We had an issue with the OS on a node, so I am just trying to reinstall ceph and
>>> hope that the OSD data is still intact.
>>>
>>> Once I get things stable again I was planning on upgrading…but the upgrade
>>> is a bit intensive by the looks of it so I need to set aside a decent amount
>>> of time.
>>>
>>> Thanks all!
>>>
>>> Shain
>>>
>>> On Mar 15, 2017, at 12:38 PM, Vasu Kulkarni  wrote:
>>>
>>> Just curious, why do you still want to deploy new Hammer instead of stable
>>> Jewel? Is this a test environment? The last .10 release was basically
>>> bug fixes for 0.94.9.
>>>
>>>
>>>
>>> On Wed, Mar 15, 2017 at 9:16 AM, Shinobu Kinjo  wrote:

 FYI:
 https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3

 On Thu, Mar 16, 2017 at 1:05 AM, Shain Miley  wrote:
 > Hello,
 > I am trying to deploy ceph to a new server using ceph-deply which I have
 > done in the past many times without issue.
 >
 > Right now I am seeing a timeout trying to connect to git.ceph.com:
 >
 >
 > [hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive
 > apt-get
 > -q install --assume-yes ca-certificates
 > [hqosd6][DEBUG ] Reading package lists...
 > [hqosd6][DEBUG ] Building dependency tree...
 > [hqosd6][DEBUG ] Reading state information...
 > [hqosd6][DEBUG ] ca-certificates is already the newest version.
 > [hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not
 > upgraded.
 > [hqosd6][INFO  ] Running command: wget -O release.asc
 > https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 > [hqosd6][WARNIN] --2017-03-15 11:49:16--
 > https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 > [hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
 > [hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443...
 > connected.
 > [hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved
 > Permanently
 > [hqosd6][WARNIN] Location:
 > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
 > [following]
 > [hqosd6][WARNIN] --2017-03-15 11:49:17--
 > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
 > [hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
 > [hqosd6][WARNIN] Connecting to git.ceph.com
 > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
 > [hqosd6][WARNIN] Retrying.
 > [hqosd6][WARNIN]
 > [hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)
 > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
 > [hqosd6][WARNIN] Connecting to git.ceph.com
 > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
 > [hqosd6][WARNIN] Retrying.
 > [hqosd6][WARNIN]
 > [hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)
 > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
 > [hqosd6][WARNIN] Connecting to git.ceph.com
 > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
 > [hqosd6][WARNIN] Retrying.
 >
 >
 > I am wondering if this is a known issue.
 >
 > Just an fyi...I am using an older version of ceph-deply (1.5.36) because
 > in
 > the past upgrading to a newer version I was not able to install hammer
 > on
 > the cluster…so the workaround was to use a slightly older version.
 >
 > Thanks in advance for any help you may be able to provide.
 >
 > Shain
 >
 >
 > ___
 > ceph-users mailing list
 > ceph-users@lists.ceph.com
 > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 >
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread John Spray
On Wed, Mar 15, 2017 at 5:04 PM, Shinobu Kinjo  wrote:
> It may well be a bit of a challenge, but please consider Kraken (or
> later) because Jewel will be retired:
>
> http://docs.ceph.com/docs/master/releases/

Nope, Jewel is LTS, Kraken is not.

Kraken will only receive updates until the next stable release.  Jewel
will receive updates for longer.

John

>
> On Thu, Mar 16, 2017 at 1:48 AM, Shain Miley  wrote:
>> No, this is a production cluster that I have not had a chance to upgrade yet.
>>
>> We had an issue with the OS on a node, so I am just trying to reinstall ceph and
>> hope that the OSD data is still intact.
>>
>> Once I get things stable again I was planning on upgrading…but the upgrade
>> is a bit intensive by the looks of it so I need to set aside a decent amount
>> of time.
>>
>> Thanks all!
>>
>> Shain
>>
>> On Mar 15, 2017, at 12:38 PM, Vasu Kulkarni  wrote:
>>
>> Just curious, why do you still want to deploy new Hammer instead of stable
>> Jewel? Is this a test environment? The last .10 release was basically
>> bug fixes for 0.94.9.
>>
>>
>>
>> On Wed, Mar 15, 2017 at 9:16 AM, Shinobu Kinjo  wrote:
>>>
>>> FYI:
>>> https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3
>>>
>>> On Thu, Mar 16, 2017 at 1:05 AM, Shain Miley  wrote:
>>> > Hello,
>>> > I am trying to deploy ceph to a new server using ceph-deply which I have
>>> > done in the past many times without issue.
>>> >
>>> > Right now I am seeing a timeout trying to connect to git.ceph.com:
>>> >
>>> >
>>> > [hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive
>>> > apt-get
>>> > -q install --assume-yes ca-certificates
>>> > [hqosd6][DEBUG ] Reading package lists...
>>> > [hqosd6][DEBUG ] Building dependency tree...
>>> > [hqosd6][DEBUG ] Reading state information...
>>> > [hqosd6][DEBUG ] ca-certificates is already the newest version.
>>> > [hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not
>>> > upgraded.
>>> > [hqosd6][INFO  ] Running command: wget -O release.asc
>>> > https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> > [hqosd6][WARNIN] --2017-03-15 11:49:16--
>>> > https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> > [hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
>>> > [hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443...
>>> > connected.
>>> > [hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved
>>> > Permanently
>>> > [hqosd6][WARNIN] Location:
>>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> > [following]
>>> > [hqosd6][WARNIN] --2017-03-15 11:49:17--
>>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> > [hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
>>> > [hqosd6][WARNIN] Connecting to git.ceph.com
>>> > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>> > [hqosd6][WARNIN] Retrying.
>>> > [hqosd6][WARNIN]
>>> > [hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)
>>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> > [hqosd6][WARNIN] Connecting to git.ceph.com
>>> > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>> > [hqosd6][WARNIN] Retrying.
>>> > [hqosd6][WARNIN]
>>> > [hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)
>>> > https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>> > [hqosd6][WARNIN] Connecting to git.ceph.com
>>> > (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
>>> > [hqosd6][WARNIN] Retrying.
>>> >
>>> >
>>> > I am wondering if this is a known issue.
>>> >
>>> > Just an fyi...I am using an older version of ceph-deply (1.5.36) because
>>> > in
>>> > the past upgrading to a newer version I was not able to install hammer
>>> > on
>>> > the cluster…so the workaround was to use a slightly older version.
>>> >
>>> > Thanks in advance for any help you may be able to provide.
>>> >
>>> > Shain
>>> >
>>> >
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shinobu Kinjo
Would you file this as a doc bug, so we can discuss it properly with tracking?

http://tracker.ceph.com

On Thu, Mar 16, 2017 at 2:17 AM, Deepak Naidu  wrote:
>>> because Jewel will be retired:
> Hmm.  Isn't Jewel LTS ?
>
> Every other stable release is an LTS (Long Term Stable) and will receive 
> updates until two LTS releases are published.
>
> --
> Deepak

Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Deepak Naidu
>> because Jewel will be retired:
Hmm.  Isn't Jewel LTS ? 

Every other stable release is an LTS (Long Term Stable) and will receive 
updates until two LTS releases are published.

--
Deepak

> On Mar 15, 2017, at 10:09 AM, Shinobu Kinjo  wrote:
> 
> It may be a bit of a challenge, but please consider Kraken (or
> later), because Jewel will be retired:
> 
> http://docs.ceph.com/docs/master/releases/

Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shinobu Kinjo
It may be a bit of a challenge, but please consider Kraken (or
later), because Jewel will be retired:

http://docs.ceph.com/docs/master/releases/

On Thu, Mar 16, 2017 at 1:48 AM, Shain Miley  wrote:
> No, this is a production cluster that I have not had a chance to upgrade yet.
>
> We had an issue with the OS on a node, so I am just trying to reinstall ceph
> and hope that the osd data is still intact.
>
> Once I get things stable again I was planning on upgrading…but the upgrade
> is a bit intensive by the looks of it so I need to set aside a decent amount
> of time.
>
> Thanks all!
>
> Shain
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Odd latency numbers

2017-03-15 Thread Rhian Resnick
Morning all,


We are starting to apply load to our test cephfs system and are noticing some
odd latency numbers. We are using erasure coding for the cold data pools and
replication for our cache tiers (not on SSD yet). We noticed the following
high latency on one node, and it seems to be slowing down writes and reads on
the cluster.
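
For context, the tiering was wired up in the usual way, roughly like this (a
sketch, not the exact commands we ran; the pool names match the status output
below, and the target_max_bytes value is only an example):

ceph osd tier add cephfs-cold cephfs-hot
ceph osd tier cache-mode cephfs-hot writeback
ceph osd tier set-overlay cephfs-cold cephfs-hot
ceph osd pool set cephfs-hot target_max_bytes 100000000000   # example limit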


Our next step is to break out the mds, mgr, and mons onto different machines,
but we wanted to start the discussion here.


Here is a bunch of information you may find useful.


ceph.conf

[global]
fsid = X
mon_initial_members = ceph-mon1, ceph-mon2, ceph-mon3
mon_host = 10.141.167.238,10.141.160.251,10.141.161.249
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

cluster network = 10.85.8.0/22
public network = 10.141.0.0/16

# we tested this with bluestore and xfs and have the same results
[osd]
enable_experimental_unrecoverable_data_corrupting_features = bluestore

Status

cluster 8f6ba9d6-314d-4725-bcfa-340e500697f0
 health HEALTH_OK
 monmap e2: 3 mons at 
{ceph-mon1=10.141.167.238:6789/0,ceph-mon2=10.141.160.251:6789/0,ceph-mon3=10.141.161.249:6789/0}
election epoch 12, quorum 0,1,2 ceph-mon2,ceph-mon3,ceph-mon1
  fsmap e30: 1/1/1 up {0=ceph-mon3=up:active}, 2 up:standby
mgr active: ceph-mon3 standbys: ceph-mon2, ceph-mon1
 osdmap e100: 12 osds: 12 up, 12 in
flags sortbitwise,require_jewel_osds,require_kraken_osds
  pgmap v119525: 124 pgs, 6 pools, 471 GB data, 1141 kobjects
970 GB used, 2231 GB / 3202 GB avail
 124 active+clean
  client io 11962 B/s rd, 11 op/s rd, 0 op/s wr


Pool space usage

GLOBAL:
    SIZE  AVAIL RAW USED %RAW USED
    3202G 2231G 970G     30.31
POOLS:
    NAME            ID USED   %USED MAX AVAIL OBJECTS
    rbd             0  0      0     580G      0
    cephfs-hot      1  76137M 11.35 580G      466451
    cephfs-cold     2  397G   25.48 1161G     650158
    cephfs_metadata 3  47237k 0     580G      52275
    one-hot         4  0      0     580G      0
    one             5  0      0     1161G     0


OSD Performance and Latency

osd commit_latency(ms) apply_latency(ms)
  9                  1                 1
  8                  1                 1
  0                 13                13
 11                  1                 1
  1                 38                38
 10                  2                 2
  2                 21                21
  3                  2                 2
  4                 20                20
  5                  1                 1
  6                  1                 1
  7                  1                 1


OSD Tree and Status

ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.12685 root default
-2 1.08875 host ceph-mon1
 0 0.27219 osd.0   up  1.0  1.0
 1 0.27219 osd.1   up  1.0  1.0
 2 0.27219 osd.2   up  1.0  1.0
 4 0.27219 osd.4   up  1.0  1.0
-3 0.94936 host ceph-mon2
 3 0.27219 osd.3   up  1.0  1.0
 5 0.27219 osd.5   up  1.0  1.0
 7 0.27219 osd.7   up  1.0  1.0
 9 0.13280 osd.9   up  1.0  1.0
-4 1.08875 host ceph-mon3
 6 0.27219 osd.6   up  1.0  1.0
 8 0.27219 osd.8   up  1.0  1.0
10 0.27219 osd.10  up  1.0  1.0
11 0.27219 osd.11  up  1.0  1.0


[root@ceph-mon1 ~]# ceph pg dump
dumped all in format plain
version 119550
stamp 2017-03-15 12:46:24.428486
last_osdmap_epoch 100
last_pg_scan 100
full_ratio 0.95
nearfull_ratio 0.85
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
0.2d 0 0 0 0 0 0 0 0 active+clean 2017-03-15 10:59:52.119959 0'0 99:57  [9,11,1] 9 [9,11,1] 9 0'0 2017-03-15 08:40:09.815059 0'0 2017-03-14 08:12:37.425834
0.2c 0 0 0 0 0 0 0 0 active+clean 2017-03-15 11:27:47.959479 0'0 100:29 [8,0,3]  8 [8,0,3]  8 0'0 2017-03-15 11:27:47.959398 0'0 2017-03-12 22:50:44.393457
0.2b 0 0 0 0 0 0 0 0 active+clean 2017-03-15 10:59:45.798794 0'0

Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shain Miley
No, this is a production cluster that I have not had a chance to upgrade yet.

We had an issue with the OS on a node, so I am just trying to reinstall ceph and 
hope that the osd data is still intact.

Once I get things stable again I was planning on upgrading…but the upgrade is a 
bit intensive by the looks of it so I need to set aside a decent amount of time.

Thanks all!

Shain

> On Mar 15, 2017, at 12:38 PM, Vasu Kulkarni  wrote:
> 
> Just curious: why do you still want to deploy new hammer instead of stable 
> jewel? Is this a test environment? The last .10 release was basically bug 
> fixes for 0.94.9.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Vasu Kulkarni
Just curious: why do you still want to deploy new hammer instead of stable
jewel? Is this a test environment? The last .10 release was basically bug
fixes for 0.94.9.



On Wed, Mar 15, 2017 at 9:16 AM, Shinobu Kinjo  wrote:

> FYI:
> https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Deepak Naidu
I had a similar issue when using an older version of ceph-deploy. I see the URL 
git.ceph.com doesn't work in a browser as well.

To resolve this, I installed the latest version of ceph-deploy and it worked 
fine. The new version wasn't using git.ceph.com.

During ceph-deploy you can mention what version of ceph you want, for example 
jewel, etc.
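
For example, something along these lines should do it (a sketch; hqosd6 is the
host from this thread):

pip install --upgrade ceph-deploy
ceph-deploy install --release jewel hqosd6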


--
Deepak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shinobu Kinjo
FYI:
https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Shain Miley
Hello,
I am trying to deploy ceph to a new server using ceph-deploy, which I have done 
many times in the past without issue.

Right now I am seeing a timeout trying to connect to git.ceph.com:


[hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive apt-get -q 
install --assume-yes ca-certificates
[hqosd6][DEBUG ] Reading package lists...
[hqosd6][DEBUG ] Building dependency tree...
[hqosd6][DEBUG ] Reading state information...
[hqosd6][DEBUG ] ca-certificates is already the newest version.
[hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.
[hqosd6][INFO  ] Running command: wget -O release.asc 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] --2017-03-15 11:49:16--  
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
[hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443... 
connected.
[hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved Permanently
[hqosd6][WARNIN] Location: 
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc [following]
[hqosd6][WARNIN] --2017-03-15 11:49:17--  
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
[hqosd6][WARNIN] Connecting to git.ceph.com (git.ceph.com)|8.43.84.132|:443... 
failed: Connection timed out.
[hqosd6][WARNIN] Retrying.
[hqosd6][WARNIN] 
[hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)  
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] Connecting to git.ceph.com (git.ceph.com)|8.43.84.132|:443... 
failed: Connection timed out.
[hqosd6][WARNIN] Retrying.
[hqosd6][WARNIN] 
[hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)  
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
[hqosd6][WARNIN] Connecting to git.ceph.com (git.ceph.com)|8.43.84.132|:443... 
failed: Connection timed out.
[hqosd6][WARNIN] Retrying.


I am wondering if this is a known issue.

Just an FYI... I am using an older version of ceph-deploy (1.5.36) because in the 
past, when upgrading to a newer version, I was not able to install hammer on the 
cluster… so the workaround was to use a slightly older version.

Thanks in advance for any help you may be able to provide.

Shain

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-15 Thread Sage Weil
On Wed, 15 Mar 2017, Brad Hubbard wrote:
> +ceph-devel
> 
> On Wed, Mar 15, 2017 at 5:25 PM, nokia ceph  wrote:
> > Hello,
> >
> > We see these messages not only at the time of OSD creation but in idle
> > conditions as well. May I know the impact of these errors? Can we safely
> > ignore them, or is there any way/config to fix this problem?
> >
> > A few occurrences of these events follow:
> >
> > 
> > 2017-03-14 17:16:09.500370 7fedeba61700  4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.453130) [default] Level-0 commit table #60 started
> > 2017-03-14 17:16:09.500374 7fedeba61700  4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.500273) [default] Level-0 commit table #60: memtable #1
> > done
> > 2017-03-14 17:16:09.500376 7fedeba61700  4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.500297) EVENT_LOG_v1 {"time_micros": 1489511769500289,
> > "job": 17, "event": "flush_finished", "lsm_state": [2, 4, 6, 0, 0, 0, 0],
> > "immutable_memtables": 0}
> > 2017-03-14 17:16:09.500382 7fedeba61700  4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.500330) [default] Level summary: base level 1 max bytes
> > base 268435456 files[2 4 6 0 0 0 0] max score 0.76
> >
> > 2017-03-14 17:16:09.500390 7fedeba61700  4 rocksdb: [JOB 17] Try to delete
> > WAL files size 244090350, prev total WAL file size 247331500, number of live
> > WAL files 2.
> >
> > 2017-03-14 17:34:11.610513 7fedf3a71700 -1
> > bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 6
> 
> These errors come from here.
> 
> void KernelDevice::aio_submit(IOContext *ioc)
> {
> ...
> int r = aio_queue.submit(*cur, &retries);
> if (retries)
>   derr << __func__ << " retries " << retries << dendl;
> 
> The submit function is this one which calls libaio's io_submit
> function directly and increments retries if it receives EAGAIN.
> 
> #if defined(HAVE_LIBAIO)
> int FS::aio_queue_t::submit(aio_t &aio, int *retries)
> {
>   // 2^16 * 125us = ~8 seconds, so max sleep is ~16 seconds
>   int attempts = 16;
>   int delay = 125;
>   iocb *piocb = &aio.iocb;
>   while (true) {
> int r = io_submit(ctx, 1, &piocb); <-NOTE
> if (r < 0) {
>   if (r == -EAGAIN && attempts-- > 0) { <-NOTE
> usleep(delay);
> delay *= 2;
> (*retries)++;
> continue;
>   }
>   return r;
> }
> assert(r == 1);
> break;
>   }
>   return 0;
> }
> 
> 
> From the man page.
> 
> IO_SUBMIT(2)             Linux Programmer's Manual             IO_SUBMIT(2)
> 
> NAME
>    io_submit - submit asynchronous I/O blocks for processing
> ...
> RETURN VALUE
>    On success, io_submit() returns the number of iocbs submitted (which
>    may be 0 if nr is zero).  For the failure return, see NOTES.
> 
> ERRORS
>    EAGAIN Insufficient resources are available to queue any iocbs.
> 
> I suspect increasing bdev_aio_max_queue_depth may help here but some
> of the other devs may have more/better ideas.

Yes--try increasing bdev_aio_max_queue_depth.  It defaults to 32; try 
changing it to 128, 1024, or 4096 and see if these errors go away.
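
For example (a sketch; 1024 is just one of the values above, and the OSDs
presumably need a restart afterwards, since the aio queue is created at
startup):

cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
bdev_aio_max_queue_depth = 1024
EOF
# then restart the OSDs so the queue is recreated with the new depth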

I've never been able to trigger this on my test boxes, but I put in the 
warning to help ensure we pick a good default.

What kernel version are you running?

Thanks!
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck inactive

2017-03-15 Thread Laszlo Budai

Hello,


the ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35 command 
crashes.

~# ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
*** Caught signal (Segmentation fault) **
 in thread 7f85b60e28c0
 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xaeeaba]
 2: (()+0x10330) [0x7f85b4dca330]
 3: (()+0xa2324) [0x7f85b1cd7324]
 4: (()+0x7d23e) [0x7f85b1cb223e]
 5: (()+0x7d478) [0x7f85b1cb2478]
 6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15) 
[0x7f85b1c8a0e5]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
 9: (main()+0x1294) [0x651134]
 10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
 11: ceph-objectstore-tool() [0x66f8b7]
2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal (Segmentation 
fault) **
 in thread 7f85b60e28c0

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xaeeaba]
 2: (()+0x10330) [0x7f85b4dca330]
 3: (()+0xa2324) [0x7f85b1cd7324]
 4: (()+0x7d23e) [0x7f85b1cb223e]
 5: (()+0x7d478) [0x7f85b1cb2478]
 6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15) 
[0x7f85b1c8a0e5]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
 9: (main()+0x1294) [0x651134]
 10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
 11: ceph-objectstore-tool() [0x66f8b7]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

--- begin dump of recent events ---
   -14> 2017-03-15 14:57:05.557743 7f85b60e28c0  5 asok(0x5632000) 
register_command perfcounters_dump hook 0x55e6130
   -13> 2017-03-15 14:57:05.557807 7f85b60e28c0  5 asok(0x5632000) 
register_command 1 hook 0x55e6130
   -12> 2017-03-15 14:57:05.557818 7f85b60e28c0  5 asok(0x5632000) 
register_command perf dump hook 0x55e6130
   -11> 2017-03-15 14:57:05.557828 7f85b60e28c0  5 asok(0x5632000) 
register_command perfcounters_schema hook 0x55e6130
   -10> 2017-03-15 14:57:05.557836 7f85b60e28c0  5 asok(0x5632000) 
register_command 2 hook 0x55e6130
-9> 2017-03-15 14:57:05.557841 7f85b60e28c0  5 asok(0x5632000) 
register_command perf schema hook 0x55e6130
-8> 2017-03-15 14:57:05.557851 7f85b60e28c0  5 asok(0x5632000) 
register_command perf reset hook 0x55e6130
-7> 2017-03-15 14:57:05.557855 7f85b60e28c0  5 asok(0x5632000) 
register_command config show hook 0x55e6130
-6> 2017-03-15 14:57:05.557864 7f85b60e28c0  5 asok(0x5632000) 
register_command config set hook 0x55e6130
-5> 2017-03-15 14:57:05.557868 7f85b60e28c0  5 asok(0x5632000) 
register_command config get hook 0x55e6130
-4> 2017-03-15 14:57:05.557877 7f85b60e28c0  5 asok(0x5632000) 
register_command config diff hook 0x55e6130
-3> 2017-03-15 14:57:05.557880 7f85b60e28c0  5 asok(0x5632000) 
register_command log flush hook 0x55e6130
-2> 2017-03-15 14:57:05.557888 7f85b60e28c0  5 asok(0x5632000) 
register_command log dump hook 0x55e6130
-1> 2017-03-15 14:57:05.557892 7f85b60e28c0  5 asok(0x5632000) 
register_command log reopen hook 0x55e6130
 0> 2017-03-15 14:57:05.567987 7f85b60e28c0 -1 *** Caught signal 
(Segmentation fault) **
 in thread 7f85b60e28c0

 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
 1: ceph-objectstore-tool() [0xaeeaba]
 2: (()+0x10330) [0x7f85b4dca330]
 3: (()+0xa2324) [0x7f85b1cd7324]
 4: (()+0x7d23e) [0x7f85b1cb223e]
 5: (()+0x7d478) [0x7f85b1cb2478]
 6: (rados_ioctx_create()+0x32) [0x7f85b1c89f92]
 7: (librados::Rados::ioctx_create(char const*, librados::IoCtx&)+0x15) 
[0x7f85b1c8a0e5]
 8: (do_import_rados(std::string, bool)+0xb7c) [0x68199c]
 9: (main()+0x1294) [0x651134]
 10: (__libc_start_main()+0xf5) [0x7f85b0c69f45]
 11: ceph-objectstore-tool() [0x66f8b7]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent   500
  max_new 1000
  log_file
--- end dump of recent events ---
Segmentation fault (core dumped)
#

Any ideas what to try?

Thank you.
Laszlo


On 15.03.2017 04:27, Brad Hubbard wrote:

Decide which copy you want to keep and export that with ceph-objectstore-tool

Delete all copies on all OSDs with 

Re: [ceph-users] [ceph-fuse] Quota size change does not notify another ceph-fuse client.

2017-03-15 Thread John Spray
On Wed, Mar 15, 2017 at 1:53 AM, yu2xiangyang  wrote:
> Dear cephers,
>
> I met a problem when using ceph-fuse with quota enabled.
>
>  My ceph version is :
>
>  ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367) .
>
> I have two ceph-fuse processes on two different hosts (node1 and node2).
>
>  One ceph-fuse is mounted with root directory on /mnt/cephfs on node1.
>  [root@node1] ceph-fuse -m mds-host:6789 /mnt/cephfs
>
>  I set quota on a sub directory volumes/nas0 in /mnt/cephfs.
>  [root@node1] setfattr -n ceph.quota.max_bytes -v 1000
> /mnt/cephfs/volumes/nas0
>
> One ceph-fuse is mounted with /volumes/nas0 on node2 with client quota
> enabled.
> [root@node2] ceph-fuse --id yxy -r /volumes/nas0 /shares/share0/
>
>  [root@node2]$ cat /etc/ceph/ceph.client.yxy.keyring
>  [client.yxy] key = AQBmtMdYNPnOJRAAg5t+gkUDmqTpQhZh2VXlWg==
>  caps mds = "allow * path=/volumes/nas0"
> caps mon = "allow *"
> caps osd = "allow *"
>
> [root@node2 ]$ cat /etc/ceph/ceph.conf
> [global] mon_initial_members=x
> mon_host = x
> client quota = true
>
>  The df command shows that ceph-fuse has 94GB capacity, just equal to the
> size we set.
>
> ceph-fuse 94G 4G 90G 3% /shares/share0
>
> But when I resize the quota with /volumes/nas0. in node1.
> [root@node1 ~]# setfattr -n ceph.quota.max_bytes -v 700
> /mnt/cephfs/volumes/nas0
>
> On node2, the df command still shows 94GB capacity.
> ceph-fuse 94G 4G 90G 3% /shares/share0
>
> One host resizes the quota and the other host does not see the change.
>
>  Is it a problem with ceph-fuse? Appreciate any reply.

There's a known issue that quota updates aren't visible on another
client until it does some IO in the directory:
http://tracker.ceph.com/issues/17939
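
Until that is fixed, one possible (untested) workaround implied by the ticket
is to generate a little IO under the quota'd directory so the client refreshes
its view, e.g. (paths from this thread):

dd if=/dev/zero of=/shares/share0/.refresh bs=4K count=1 conv=fsync
rm -f /shares/share0/.refresh
df -h /shares/share0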

This probably has the same underlying cause as the statfs (df) data
looking out of date.

I'm looking into it, see ticket for updates.

John


>
> cheers,
> penglaixy
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck inactive

2017-03-15 Thread Laszlo Budai

Ok.
Delete the dirs using the ceph-objectstore-tool. DONE

ceph pg force_create_pg 3.367 led me to this state:

HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; 16 requests are blocked 
> 32 sec; 2 osds have slow requests; noout flag(s) set
pg 3.367 is stuck inactive since forever, current state creating, last acting []
pg 3.367 is stuck unclean since forever, current state creating, last acting []
16 ops are blocked > 262.144 sec
8 ops are blocked > 262.144 sec on osd.28
8 ops are blocked > 262.144 sec on osd.35
2 osds have slow requests
noout flag(s) set


I am supposed to have all the OSDs started for the force_create_pg, right?

Kind regards,
Laszlo


On 15.03.2017 04:27, Brad Hubbard wrote:

Decide which copy you want to keep and export that with ceph-objectstore-tool

Delete all copies on all OSDs with ceph-objectstore-tool (not by
deleting the directory on the disk).

Use force_create_pg to recreate the pg empty.

Use ceph-objectstore-tool to do a rados import on the exported pg copy.


On Wed, Mar 15, 2017 at 12:00 PM, Laszlo Budai  wrote:

Hello,

I have tried to recover the pg using the following steps:
Preparation:
1. set noout
2. stop osd.2
3. use ceph-objectstore-tool to export from osd2
4. start osd.2
5. repeat steps 2-4 on osds 35, 28 and 63 (I've done these hoping to be able to
use one of those exports to recover the PG)


First attempt:

1. stop osd.2
2. remove the 3.367_head directory
3. start osd.2
Here I was hoping that the cluster would recover the pg from the 2 other
identical osds. It did NOT. So I tried the following commands on the
PG:
ceph pg repair
ceph pg scrub
ceph pg deep-scrub
ceph pg force_create_pg
 nothing changed. My PG was still incomplete. So I tried to remove all the
OSDs that were referenced in the pg query:


1. stop osd.2
2. delete the 3.367_head directory
3. start osd2
4. repeat steps 1-3 for all the OSDs that were listed in the pg query
5. did an import from one of the exports -> I was able again to query the
pg (that was impossible when all the 3.367_head dirs were deleted), and the
stats were saying that the number of objects is 6 and the size is 21M (all
correct values according to the files I was able to see before starting the
procedure). But the PG is still incomplete.

What else can I try?

Thank you,
Laszlo





On 12.03.2017 13:06, Brad Hubbard wrote:


On Sun, Mar 12, 2017 at 7:51 PM, Laszlo Budai 
wrote:


Hello,

I have already done the export with ceph_objectstore_tool. I just have to
decide which OSDs to keep.
Can you tell me why the directory structure in the OSDs is different for
the same PG when checking on different OSDs?
For instance, in OSDs 2 and 63 there are NO subdirectories in the
3.367_head, while OSDs 28 and 35 contain
./DIR_7/DIR_6/DIR_B/
./DIR_7/DIR_6/DIR_3/

When are these subdirectories created?

The files are identical on all the OSDs; only the way these are stored
is different. It would be enough if you could point me to some
documentation that explains these, and I'll read it. So far, searching
for the architecture of an OSD, I could not find the gory details about
these directories.



https://github.com/ceph/ceph/blob/master/src/os/filestore/HashIndex.h



Kind regards,
Laszlo


On 12.03.2017 02:12, Brad Hubbard wrote:



On Sat, Mar 11, 2017 at 7:43 PM, Laszlo Budai 
wrote:



Hello,

Thank you for your answer.

indeed the min_size is 1:

# ceph osd pool get volumes size
size: 3
# ceph osd pool get volumes min_size
min_size: 1
#
I'm going to try to find the mentioned discussions on the mailing lists
and read them. If you have a link at hand, it would be nice if you could
send it to me.




This thread is one example, there are lots more.



http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html



In the attached file you can see the contents of the directory
containing
PG
data on the different OSDs (all that have appeared in the pg query).
According to the md5sums the files are identical. What bothers me is
the
directory structure (you can see the ls -R in each dir that contains
files).




So I mixed up 63 and 68, my list should have read 2, 28, 35 and 63
since 68 is listed as empty in the pg query.



Where can I read about how/why those DIR# subdirectories have appeared?

Given that the files themselves are identical on the "current" OSDs
belonging to the PG, and as the osd.63 (currently not belonging to the
PG)
has the same files, is it safe to stop the OSD.2, remove the 3.367_head
dir,
and then restart the OSD? (all these with the noout flag set of course)




*You* need to decide which is the "good" copy and then follow the
instructions in the links I provided to try and recover the pg. Back
those known copies on 2, 28, 35 and 63 up with the
ceph_objectstore_tool before proceeding. They may well be identical
but the peering process still needs to "see" the relevant logs and
currently something is stopping it doing so.



Kind 

Re: [ceph-users] pgs stuck inactive

2017-03-15 Thread Laszlo Budai

Hello,


So, I've done the following steps:

1. set noout
2. stop osd2
3. ceph-objectstore-tool remove
4. start osd2
5. repeat step 2-4 on osd 28 and 35

then I've run the ceph pg force_create_pg 3.367.
This has left the PG in creating state:

# ceph -s
cluster 6713d1b8-83da-11e6-aa79-525400d98c5a
 health HEALTH_WARN
1 pgs stuck inactive
1 pgs stuck unclean
8 requests are blocked > 32 sec
noout flag(s) set
 monmap e3: 3 mons at 
{tv-dl360-1=10.12.193.73:6789/0,tv-dl360-2=10.12.193.74:6789/0,tv-dl360-3=10.12.193.75:6789/0}
election epoch 84, quorum 0,1,2 tv-dl360-1,tv-dl360-2,tv-dl360-3
 osdmap e60876: 72 osds: 72 up, 72 in
flags noout
  pgmap v3947626: 4864 pgs, 11 pools, 129 GB data, 22073 objects
423 GB used, 130 TB / 130 TB avail
4863 active+clean
   1 creating
  client io 0 B/s rd, 12846 B/s wr, 2 op/s


# ceph health detail
HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; 8 requests are blocked > 
32 sec; 1 osds have slow requests; noout flag(s) set
pg 3.367 is stuck inactive since forever, current state creating, last acting []
pg 3.367 is stuck unclean since forever, current state creating, last acting []
8 ops are blocked > 2097.15 sec
8 ops are blocked > 2097.15 sec on osd.28
1 osds have slow requests
noout flag(s) set
root@tv-dl360-1:~#

As I saw that there was nothing on osd.2 and the other two were blocked, I 
restarted osd.2:

service ceph-osd restart id=2

This has unblocked the creation process and the PG became active+clean.

Now when I try to import the dump I get this error: OSD has the store locked.

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-35/ --journal-path 
/var/lib/ceph/osd/ceph-35/journal --pgid 3.367 --op import --file 
pg.3.367.export.OSD.35
OSD has the store locked
#

How should I proceed further?
Should I stop all three OSDs belonging to this PG, do the import, and then
restart them one by one?
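
Something like this is what I have in mind (ids/paths as above; the OSD has to
be stopped while the tool runs, which is presumably why I got "OSD has the
store locked"):

service ceph-osd stop id=35
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-35 \
  --journal-path /var/lib/ceph/osd/ceph-35/journal \
  --pgid 3.367 --op import --file pg.3.367.export.OSD.35
service ceph-osd start id=35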

Kind regards,
Laszlo



On 15.03.2017 04:27, Brad Hubbard wrote:

Decide which copy you want to keep and export that with ceph-objectstore-tool

Delete all copies on all OSDs with ceph-objectstore-tool (not by
deleting the directory on the disk).

Use force_create_pg to recreate the pg empty.

Use ceph-objectstore-tool to do a rados import on the exported pg copy.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pipe "deadlock" in Hammer, 0.94.5

2017-03-15 Thread 许雪寒
Hi, sir.

I'm sorry, I made a mistake; the fix that you provided should be the one we
need. Is it safe for us to simply "git cherry-pick" that commit into our
0.94.5 version?

So sorry for my mistake.
Thank you.
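
(For reference, this is roughly what we would do; <commit-sha> is a
placeholder, and we would of course rebuild and test before touching
production:)

git checkout -b hammer-backport v0.94.5
git cherry-pick -x <commit-sha>   # resolve conflicts if the code has diverged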

On Wed, Jan 11, 2017 at 3:59 PM, 许雪寒  wrote:
> In our test, when one machine is under heavy packet loss, OSDs on
> other machines can be brought down, and sometimes more than one OSD goes
> down, because they may try to read a message that comes from that
> machine. So we are really concerned; please help us. Thanks
>
>
> -Original Message-
> From: 许雪寒
> Sent: January 11, 2017 15:13
> To: 'Sage Weil'
> Cc: ceph-de...@vger.kernel.org
> Subject: Re: Pipe "deadlock" in Hammer, 0.94.5
>
> Thanks for your reply, sir:-)
>
> Actually, this case is not very rare in our test. When iptables drops IP
> packets with a probability around 98%~99%, this case occurs roughly once
> in every three test runs.
> I checked #14120 as you recommended; however, it doesn't seem to be our
> problem, because, as http://tracker.ceph.com/issues/18184 says, #14120 is
> caused by a commit that changed Pipe::tcp_read_wait() to return -errno
> instead of "-1", and this commit is not applied in our tested hammer
> version (0.94.5).
> And I agree with you that the invocation of the "recv" function passes the
> MSG_DONTWAIT flag, so it shouldn't block. However, every time we encounter
> this problem it is always the Pipe::reader_thread calling recv that holds
> the lock when the suicide happens, which is really confusing.
>
> Please help us, thank you:-)
>
>
>
> -Original Message-
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: January 10, 2017 21:12
> To: 许雪寒
> Cc: ceph-de...@vger.kernel.org
> Subject: Re: Pipe "deadlock" in Hammer, 0.94.5
>
> On Tue, 10 Jan 2017, 许雪寒 wrote:
>> Hi, everyone.
>>
>> Recently, we did some experiment to test the stability of the ceph cluster. 
>> We used Hammer version which is the mostly used version of online cluster. 
>> One of the scenarios that we simulated is poor network connectivity, in 
>> which we used iptables to drop TCP/IP packet under some probability. And 
>> sometimes, we can see that some OSD suicide themselves.
>>
>> We used gdb to debug the core dumped by linux. We found that the thread that 
>> hit the suicide time threshold is a peering thread who is trying to send a 
>> pg_notify message, the ceph-osd log file and gdb output is as follows:
>>
>> Log file:
>> -3> 2017-01-10 17:02:13.469949 7fd446ff7700  1 heartbeat_map 
>> is_healthy 'OSD::osd_tp thread 0x7fd440bed700' had timed out after 15
>> -2> 2017-01-10 17:02:13.469952 7fd446ff7700  1 heartbeat_map 
>> is_healthy 'OSD::osd_tp thread 0x7fd440bed700' had suicide timed out 
>> after 150
>> -1> 2017-01-10 17:02:13.469954 7fd4451f4700  1 --
>> 10.160.132.157:6818/10014122 <== osd.20 10.160.132.156:0/24908 163 
>>  osd_ping(ping e4030 stamp 2017-01-10 17:02:13.450374) v2 
>> 47+0+0 (3247646131 0 0) 0x7fd418ca8600 con 0x7fd413c89700
>>  0> 2017-01-10 17:02:13.496895 7fd446ff7700 -1 error_msg
>> common/HeartbeatMap.cc: In function 'bool 
>> ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, 
>> time_t)' thread 7fd446ff7700 time 2017-01-10 17:02:13.469969
>> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>>
>> GDB OUTPUT:
>> (gdb) thread 8
>> [Switching to thread 8 (Thread 0x7fd440bed700 (LWP 15302))]#0
>> 0x003c5d80e334 in __lll_lock_wait () from /lib64/libpthread.so.0
>> (gdb) bt
>> #0  0x003c5d80e334 in __lll_lock_wait () from
>> /lib64/libpthread.so.0
>> #1  0x003c5d8095d8 in _L_lock_854 () from /lib64/libpthread.so.0
>> #2  0x003c5d8094a7 in pthread_mutex_lock () from
>> /lib64/libpthread.so.0
>> #3  0x01a54ae4 in Mutex::Lock (this=0x7fd426453598,
>> no_lockdep=false) at common/Mutex.cc:96
>> #4  0x01409285 in Mutex::Locker::Locker (this=0x7fd440beb6c0,
>> m=...) at common/Mutex.h:115
>> #5  0x01c46446 in PipeConnection::try_get_pipe 
>> (this=0x7fd426453580, p=0x7fd440beb908) at
>> msg/simple/PipeConnection.cc:38
>> #6  0x01c05809 in SimpleMessenger::submit_message 
>> (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580, 
>> dest_addr=..., dest_type=4, already_locked=false) at
>> msg/simple/SimpleMessenger.cc:443
>> #7  0x01c033fa in SimpleMessenger::_send_message 
>> (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580) at
>> msg/simple/SimpleMessenger.cc:136
>> #8  0x01c467c7 in SimpleMessenger::send_message 
>> (this=0x7fd482029400, m=0x7fd425538d00, con=0x7fd426453580) at
>> msg/simple/SimpleMessenger.h:139
>> #9  0x01c466a1 in PipeConnection::send_message 
>> (this=0x7fd426453580, m=0x7fd425538d00) at
>> msg/simple/PipeConnection.cc:78
>> #10 0x013b3ff2 in OSDService::send_map (this=0x7fd4821e76c8, 
>> m=0x7fd425538d00, con=0x7fd426453580) at osd/OSD.cc:1054
>> #11 0x013b45e7 in OSDService::send_incremental_map 
>> (this=0x7fd4821e76c8, since=4028, 

Re: [ceph-users] mkjournal error creating journal ... : (13) Permission denied

2017-03-15 Thread Peter Maloney
On 03/15/17 08:43, Gunwoo Gim wrote:
>  After a reboot, none of the LVM partitions show up in
> /dev/mapper (nor in /dev/dm-* or /proc/partitions), though
> the whole disks show up; I have to make the hosts run 'partprobe'
> every time they boot so as to have all the partitions show up.
Maybe you need this after partprobe:

udevadm trigger
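
Roughly, the boot-time rescan would then look like this (the device path is
just an example):

partprobe /dev/sdb
udevadm trigger --action=add --subsystem-match=block
udevadm settle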

>
>  I've found out that the udev rules have never triggered, even when I
> removed the DEVTYPE checking part; I checked with a udev
> line: RUN+="/bin/echo 'add /dev/$name' >> /root/log.txt"
>  I've also tried chowning all the /dev/dm-* devices to ceph:disk, in vain.
> Do I have to use the udev rules even if the /dev/dm-* devices are
> already owned by ceph:ceph?
>
No, I think you just need them owned by ceph:ceph. Test that with
something like:

sudo -u ceph hexdump -C /dev/dm-${number} | head

(which reads, not writes...so not a full test, but close enough)

And also make sure the files in /var/lib/ceph/{osd,mon,...} are owned by
ceph:ceph too. Maybe you have a mix of root and ceph, which is easy to
cause by running it as root when ceph owns some files.
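
A quick sketch to spot any mixed ownership (read-only, so safe to run):

find /var/lib/ceph \( ! -user ceph -o ! -group ceph \) -ls | head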


And FYI, I don't like udev, and did not use ceph-deploy or ceph-disk. I
did it with a very simple init script instead:


> #!/bin/bash
> mkdir -p /var/run/ceph
> chown ceph:ceph /var/run/ceph
> chgrp -R ceph /var/log/ceph
> for d in /var/lib/ceph/osd/*/journal; do
>     d=$(readlink -f "$d")
>     chown ceph:ceph "$d"
> done

This works on Ubuntu 14.04 as is, even though it's a badly written init
script, but I think CentOS will not accept it without the LSB tags.

A side effect of doing it this way is you have to manually run the
script again when replacing or adding disks, since it is not run after
hot swap like udev is.
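
If you do want to stay on the udev route instead: your dm partition nodes
report DEVTYPE=disk (see the udevadm output quoted below), so a hedged,
untested sketch of an extra rule that drops the DEVTYPE match and keys
only on the journal GUID would be:

# hypothetical addition for device-mapper partition nodes
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="dm-*", \
  ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
  OWNER="ceph", GROUP="ceph", MODE="660"

(check it with something like 'udevadm test /sys/class/block/dm-12' before
trusting it)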


>  Thank you very much for reading.
>
> Best Regards,
> Nicholas.
>
> On Wed, Mar 15, 2017 at 1:06 AM Gunwoo Gim wrote:
>
>  Thank you very much, Peter.
>
>  I'm sorry for not clarifying the version number; it's kraken and
> 11.2.0-1xenial.
>
>  I guess the udev rules in this file are supposed to change them :
> /lib/udev/rules.d/95-ceph-osd.rules
>  ...but the rules' filters don't seem to match the DEVTYPE part of
> the prepared partitions on the LVs I've got on the host.
>
>  Would it have been the cause of trouble? I'd love to be informed
> of a good way to make it work with the logical volumes; should I
> fix the udev rule?
>
> ~ # cat /lib/udev/rules.d/95-ceph-osd.rules | head -n 19
> # OSD_UUID
> ACTION=="add", SUBSYSTEM=="block", \
>   ENV{DEVTYPE}=="partition", \
>   ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", \
>   OWNER:="ceph", GROUP:="ceph", MODE:="660", \
>   RUN+="/usr/sbin/ceph-disk --log-stdout -v trigger /dev/$name"
> ACTION=="change", SUBSYSTEM=="block", \
>   ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", \
>   OWNER="ceph", GROUP="ceph", MODE="660"
>
> # JOURNAL_UUID
> ACTION=="add", SUBSYSTEM=="block", \
>   ENV{DEVTYPE}=="partition", \
>   ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
>   OWNER:="ceph", GROUP:="ceph", MODE:="660", \
>   RUN+="/usr/sbin/ceph-disk --log-stdout -v trigger /dev/$name"
> ACTION=="change", SUBSYSTEM=="block", \
>   ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
>   OWNER="ceph", GROUP="ceph", MODE="660"
>
>
> ~ # udevadm info /dev/mapper/vg--ssd1-lv--ssd1p1 | grep
> ID_PART_ENTRY_TYPE
> E: ID_PART_ENTRY_TYPE=45b0969e-9b03-4f30-b4c6-b4b80ceff106
>
> ~ # udevadm info /dev/mapper/vg--ssd1-lv--ssd1p1 | grep DEVTYPE
> E: DEVTYPE=disk
>
>
> Best Regards,
> Nicholas.
>  
>



Re: [ceph-users] Ceph Bluestore

2017-03-15 Thread Christian Balzer

Hello,

On Wed, 15 Mar 2017 09:07:10 +0100 Michał Chybowski wrote:

> > Hello,
> >
> > your subject line has little relevance to your rather broad questions.
> >
> > On Tue, 14 Mar 2017 23:45:26 +0100 Michał Chybowski wrote:
> >  
> >> Hi,
> >>
> >> I'm going to set up a small cluster (5 nodes with 3 MONs, 2 - 4 HDDs per
> >> node) to test if ceph at such a small scale is going to perform well
> >> enough to put it into a production environment (or does it perform well
> >> only if there are tens of OSDs, etc.).
> > So we are to assume that this is a test cluster (but resembling your
> > deployment plans) and that you have little to no Ceph experience, right?  
> Exactly. If the test cluster performs "well enough", we'll deploy the
> same setup for production use.

As others mentioned, small clusters can work; none of mine is large in any
sense of the word.
However, the load fits the HW and we're using SSD journals and SSD-based
cache tiers in some.

> >
> > Ceph is definitely a scale-out design (more OSDs are better) but that of
> > course depends on your workload, expectations and actual HW/design.
> >
> > For a very harsh look at things, a cluster with 3 nodes and one OSD (HDD)
> > each will only be as "fast" as a single HDD plus the latencies introduced
> > by the network, replication and OSD code overhead.
> > Made even worse (twice as much) by an inline journal.
> > And you get to spend 3 times the money for that "pleasure".
> > So compared to local storage Ceph is going to perform in the "mediocre" to
> > "abysmal" range.
> Local storage is not even being compared here as it's a SPOF which I'm
> trying to eliminate. Mainly ceph will be used to provide rbd volumes to
> xenserver VMs with a replication factor of 3 (safety first, I know it'll
> cost a lot more than local storage / 2 replicas), but eventually it
> might also serve as an RGW backend.
>
Then you're stuck with XFS or Bluestore because of RGW.
Xenserver (search the archives) isn't exactly well supported with Ceph.

> >  
> >> Are there any "do's" and "don'ts" in matter of OSD storage type
> >> (bluestore / xfs / ext4 / btrfs), correct
> >> "journal-to-storage-drive-size" ratio and monitor placement in very
> >> limited space (dedicated machines just for MONs are not an option).
> >>  
> > Lots of answers in the docs and this ML, search for them.
> >
> > If you're testing for something that won't be in production before the end
> > of this year, look at Bluestore.
> > Which incidentally has no journal (but can benefit from fast storage for
> > similar reasons, WAL etc) and where people have no to little experience
> > what ratios and sizes are "good".
> >
> > Also with looking at Bluestore ante portas, I wouldn't consider BTRFS or
> > ZFS at this time, too much of a specialty case for the uninitiated.
> >
> > Which leaves you with XFS or EXT4 for immediate deployment needs, with
> > EXT4 being deprecated (for RGW users).
> > I found EXT4 a better fit for our needs (just RBD) in all the years I
> > tested and compared it with XFS, but if you want to go down the path of
> > least resistance and have a large pool of people to share your problems
> > with, XFS is your only choice at this time.  
> Why (and how) did EXT4 get "deprecated" for RGW use? Also, could you give
> me any comparison between EXT4 and XFS (latency, throughput, etc.)?
>
Read the changelogs, release notes, find the ML threads:

"Deprecating ext4 support"
"Better late than never,  some XFS versus EXT4 test results"

Christian

> >
> > If your machines are powerful enough, co-sharing MONs is not an issue.
> >
> > Christian  
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-15 Thread Brad Hubbard
+ceph-devel

On Wed, Mar 15, 2017 at 5:25 PM, nokia ceph  wrote:
> Hello,
>
> We see these messages not only at the time of OSD creation but in idle
> conditions also. May I know the impact of these errors? Can we safely
> ignore them, or is there any way/config to fix this problem?
>
> A few occurrences of these events follow:
>
> 
> 2017-03-14 17:16:09.500370 7fedeba61700  4 rocksdb: (Original Log Time
> 2017/03/14-17:16:09.453130) [default] Level-0 commit table #60 started
> 2017-03-14 17:16:09.500374 7fedeba61700  4 rocksdb: (Original Log Time
> 2017/03/14-17:16:09.500273) [default] Level-0 commit table #60: memtable #1
> done
> 2017-03-14 17:16:09.500376 7fedeba61700  4 rocksdb: (Original Log Time
> 2017/03/14-17:16:09.500297) EVENT_LOG_v1 {"time_micros": 1489511769500289,
> "job": 17, "event": "flush_finished", "lsm_state": [2, 4, 6, 0, 0, 0, 0],
> "immutable_memtables": 0}
> 2017-03-14 17:16:09.500382 7fedeba61700  4 rocksdb: (Original Log Time
> 2017/03/14-17:16:09.500330) [default] Level summary: base level 1 max bytes
> base 268435456 files[2 4 6 0 0 0 0] max score 0.76
>
> 2017-03-14 17:16:09.500390 7fedeba61700  4 rocksdb: [JOB 17] Try to delete
> WAL files size 244090350, prev total WAL file size 247331500, number of live
> WAL files 2.
>
> 2017-03-14 17:34:11.610513 7fedf3a71700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 6

These errors come from here.

void KernelDevice::aio_submit(IOContext *ioc)
{
...
  int r = aio_queue.submit(*cur, &retries);
  if (retries)
    derr << __func__ << " retries " << retries << dendl;

The submit function is this one, which calls libaio's io_submit
function directly and increments retries if it receives EAGAIN.

#if defined(HAVE_LIBAIO)
int FS::aio_queue_t::submit(aio_t &aio, int *retries)
{
  // 2^16 * 125us = ~8 seconds, so max sleep is ~16 seconds
  int attempts = 16;
  int delay = 125;
  iocb *piocb = &aio.iocb;
  while (true) {
    int r = io_submit(ctx, 1, &piocb); <-NOTE
    if (r < 0) {
      if (r == -EAGAIN && attempts-- > 0) { <-NOTE
        usleep(delay);
        delay *= 2;
        (*retries)++;
        continue;
      }
      return r;
    }
    assert(r == 1);
    break;
  }
  return 0;
}


From the man page.

IO_SUBMIT(2)              Linux Programmer's Manual              IO_SUBMIT(2)

NAME
       io_submit - submit asynchronous I/O blocks for processing
...
RETURN VALUE
       On success, io_submit() returns the number of iocbs submitted (which
       may be 0 if nr is zero).  For the failure return, see NOTES.

ERRORS
   EAGAIN Insufficient resources are available to queue any iocbs.

I suspect increasing bdev_aio_max_queue_depth may help here but some
of the other devs may have more/better ideas.
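
As a hedged sketch (the value below is purely illustrative, not a
recommendation; check what your release ships with first):

# query the running value on an affected OSD
ceph daemon osd.73 config get bdev_aio_max_queue_depth

# then, if you decide to raise it, set it in ceph.conf on the OSD
# nodes and restart the OSDs:
[osd]
bdev_aio_max_queue_depth = 1024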

> 2017-03-14 17:34:11.672123 7fedf226e700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 6
> 2017-03-14 17:34:11.696253 7fedf2a6f700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 10
> 2017-03-14 17:34:11.739016 7fedf3a71700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 10
> 2017-03-14 17:51:08.808848 7fee05294700  4 rocksdb: reusing log 57 from
> recycle list
> =
>
>
> =
> 2017-03-14 18:32:06.910308 7fedeb260700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1489516326910303, "job": 27, "event": "table_file_deletion",
> "file_number": 69}
> 2017-03-14 18:32:06.910326 7fedeb260700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1489516326910325, "job": 27, "event": "table_file_deletion",
> "file_number": 51}
> 2017-03-14 18:32:06.910343 7fedeb260700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1489516326910342, "job": 27, "event": "table_file_deletion",
> "file_number": 50}
> 2017-03-14 18:37:37.139963 7fedf3270700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 10
> 2017-03-14 18:37:37.144147 7fedf4a73700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 9
> 2017-03-14 18:37:37.179302 7fedf226e700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 10
> 2017-03-14 18:37:37.205677 7fedf2a6f700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 11
> 2017-03-14 18:37:37.226650 7fedf5274700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 11
> 2017-03-14 18:37:37.234589 7fedf3a71700 -1
> bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 11
> 2017-03-14 19:01:53.494364 7fee05294700  4 rocksdb: reusing log 61 from
> recycle list
> ==
>
> ==
> 2017-03-14 19:38:04.946439 7fedeba61700  4 rocksdb: (Original Log Time
> 2017/03/14-19:38:04.946355) EVENT_LOG_v1 {"time_micros": 1489520284946348,
> "job": 29, "event": "flush_finished", "lsm_state": [2, 4, 13, 0, 0, 0, 0],
> "immutable_memtables": 0}
> 2017-03-14 19:38:04.946441 7fedeba61700  4 rocksdb: (Original Log Time
> 2017/03/14-19:38:04.946383) [default] Level summary: base level 1 max bytes
> base 268435456 files[2 4 13 0 0 0 0] max score 0.85
>
> 2017-03-14 

Re: [ceph-users] Ceph Bluestore

2017-03-15 Thread Michał Chybowski



On 15.03.2017 at 09:05, Eneko Lacunza wrote:

Hi Michal,

On 14/03/17 at 23:45, Michał Chybowski wrote:


I'm going to set up a small cluster (5 nodes with 3 MONs, 2 - 4 HDDs 
per node) to test if ceph at such a small scale is going to perform
well enough to put it into a production environment (or does it perform
well only if there are tens of OSDs, etc.).
Are there any "do's" and "don'ts" in matter of OSD storage type 
(bluestore / xfs / ext4 / btrfs), correct 
"journal-to-storage-drive-size" ratio and monitor placement in very 
limited space (dedicated machines just for MONs are not an option).


You don't tell us what this cluster will be used for. I have several 
tiny ceph clusters (3 nodes) in production for some years now, ceph 
nodes usually do mon+osd+virtualization.
My bad; it will provide disks for VMs, and maybe in the future it'll be
expanded a lot to provide a backend for object storage.


They perform quite well for their use case (VMs only use heavy I/O
rarely), but I have always built the clusters with SSDs for journals. 
I have seen better performance with this setup than some entry-level 
EMC disk enclosures; I always thought this was a misconfiguration 
problem on the other enclosure provider though! :)


Cheers
Eneko




--
Michał Chybowski



Re: [ceph-users] Ceph Bluestore

2017-03-15 Thread Michał Chybowski



Hello,

your subject line has little relevance to your rather broad questions.

On Tue, 14 Mar 2017 23:45:26 +0100 Michał Chybowski wrote:


Hi,

I'm going to set up a small cluster (5 nodes with 3 MONs, 2 - 4 HDDs per
node) to test if ceph at such a small scale is going to perform well
enough to put it into a production environment (or does it perform well
only if there are tens of OSDs, etc.).

So we are to assume that this is a test cluster (but resembling your
deployment plans) and that you have little to no Ceph experience, right?
Exactly. If the test cluster performs "well enough", we'll deploy the
same setup for production use.


Ceph is definitely a scale-out design (more OSDs are better) but that of
course depends on your workload, expectations and actual HW/design.

For a very harsh look at things, a cluster with 3 nodes and one OSD (HDD)
each will only be as "fast" as a single HDD plus the latencies introduced
by the network, replication and OSD code overhead.
Made even worse (twice as much) by an inline journal.
And you get to spend 3 times the money for that "pleasure".
So compared to local storage Ceph is going to perform in the "mediocre" to
"abysmal" range.
Local storage is not even being compared here as it's a SPOF which I'm
trying to eliminate. Mainly ceph will be used to provide rbd volumes to
xenserver VMs with a replication factor of 3 (safety first, I know it'll
cost a lot more than local storage / 2 replicas), but eventually it
might also serve as an RGW backend.



Are there any "do's" and "don'ts" in matter of OSD storage type
(bluestore / xfs / ext4 / btrfs), correct
"journal-to-storage-drive-size" ratio and monitor placement in very
limited space (dedicated machines just for MONs are not an option).


Lots of answers in the docs and this ML, search for them.

If you're testing for something that won't be in production before the end
of this year, look at Bluestore.
Which incidentally has no journal (but can benefit from fast storage for
similar reasons, WAL etc) and where people have no to little experience
what ratios and sizes are "good".

Also with looking at Bluestore ante portas, I wouldn't consider BTRFS or
ZFS at this time, too much of a specialty case for the uninitiated.

Which leaves you with XFS or EXT4 for immediate deployment needs, with
EXT4 being deprecated (for RGW users).
I found EXT4 a better fit for our needs (just RBD) in all the years I
tested and compared it with XFS, but if you want to go down the path of
least resistance and have a large pool of people to share your problems
with, XFS is your only choice at this time.
Why (and how) did EXT4 get "deprecated" for RGW use? Also, could you give
me any comparison between EXT4 and XFS (latency, throughput, etc.)?


If your machines are powerful enough, co-sharing MONs is not an issue.

Christian


--
Michał Chybowski



Re: [ceph-users] Ceph Bluestore

2017-03-15 Thread Eneko Lacunza

Hi Michal,

On 14/03/17 at 23:45, Michał Chybowski wrote:


I'm going to set up a small cluster (5 nodes with 3 MONs, 2 - 4 HDDs 
per node) to test if ceph at such a small scale is going to perform well
enough to put it into a production environment (or does it perform well
only if there are tens of OSDs, etc.).
Are there any "do's" and "don'ts" in matter of OSD storage type 
(bluestore / xfs / ext4 / btrfs), correct 
"journal-to-storage-drive-size" ratio and monitor placement in very 
limited space (dedicated machines just for MONs are not an option).


You don't tell us what this cluster will be used for. I have several 
tiny ceph clusters (3 nodes) in production for some years now, ceph 
nodes usually do mon+osd+virtualization.


They perform quite well for their use case (VMs only use heavy I/O
rarely), but I have always built the clusters with SSDs for journals. I 
have seen better performance with this setup than some entry-level EMC 
disk enclosures; I always thought this was a misconfiguration problem on 
the other enclosure provider though! :)


Cheers
Eneko


--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
  943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es



Re: [ceph-users] mkjournal error creating journal ... : (13) Permission denied

2017-03-15 Thread Gunwoo Gim
 After a reboot, none of the LVM partitions show up in /dev/mapper (nor in
/dev/dm-* or /proc/partitions), though the whole disks show up; I have to
make the hosts run one 'partprobe' every time they boot so as to have the
partitions all show up.

 I've found out that the udev rules have never triggered even when I
removed the DEVTYPE checking part; checked with a udev
line: RUN+="/bin/echo 'add /dev/$name' >> /root/log.txt"
 I've also tried chowning all the /dev/dm-* devices to ceph:disk in vain.
Do I have to use the udev rules even if the /dev/dm-* devices are already
owned by ceph:ceph?

 Thank you very much for reading.

Best Regards,
Nicholas.

On Wed, Mar 15, 2017 at 1:06 AM Gunwoo Gim  wrote:

>  Thank you very much, Peter.
>
>  I'm sorry for not clarifying the version number; it's kraken and
> 11.2.0-1xenial.
>
>  I guess the udev rules in this file are supposed to change them :
> /lib/udev/rules.d/95-ceph-osd.rules
>  ...but the rules' filters don't seem to match the DEVTYPE part of the
> prepared partitions on the LVs I've got on the host.
>
>  Would it have been the cause of trouble? I'd love to be informed of a
> good way to make it work with the logical volumes; should I fix the udev
> rule?
>
> ~ # cat /lib/udev/rules.d/95-ceph-osd.rules | head -n 19
> # OSD_UUID
> ACTION=="add", SUBSYSTEM=="block", \
>   ENV{DEVTYPE}=="partition", \
>   ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", \
>   OWNER:="ceph", GROUP:="ceph", MODE:="660", \
>   RUN+="/usr/sbin/ceph-disk --log-stdout -v trigger /dev/$name"
> ACTION=="change", SUBSYSTEM=="block", \
>   ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", \
>   OWNER="ceph", GROUP="ceph", MODE="660"
>
> # JOURNAL_UUID
> ACTION=="add", SUBSYSTEM=="block", \
>   ENV{DEVTYPE}=="partition", \
>   ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
>   OWNER:="ceph", GROUP:="ceph", MODE:="660", \
>   RUN+="/usr/sbin/ceph-disk --log-stdout -v trigger /dev/$name"
> ACTION=="change", SUBSYSTEM=="block", \
>   ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
>   OWNER="ceph", GROUP="ceph", MODE="660"
>
>
> ~ # udevadm info /dev/mapper/vg--ssd1-lv--ssd1p1 | grep ID_PART_ENTRY_TYPE
> E: ID_PART_ENTRY_TYPE=45b0969e-9b03-4f30-b4c6-b4b80ceff106
>
> ~ # udevadm info /dev/mapper/vg--ssd1-lv--ssd1p1 | grep DEVTYPE
> E: DEVTYPE=disk
>
>
> Best Regards,
> Nicholas.
>
> On Tue, Mar 14, 2017 at 6:37 PM Peter Maloney <
> peter.malo...@brockmann-consult.de> wrote:
>
> Is this Jewel? Do you have some udev rules or anything that changes the
> owner on the journal device (eg. /dev/sdx or /dev/nvme0n1p1) to ceph:ceph?
>
>
> On 03/14/17 08:53, Gunwoo Gim wrote:
>
> I'd love to get helped out; it'd be much appreciated.
>
> Best Wishes,
> Nicholas.
>
> On Tue, Mar 14, 2017 at 4:51 PM Gunwoo Gim  wrote:
>
>  Hello, I'm trying to deploy a ceph filestore cluster with LVM using
> ceph-ansible playbook. I've been fixing a couple of code blocks in
> ceph-ansible and ceph-disk/main.py and made some progress but now I'm stuck
> again; 'ceph-disk activate osd' fails.
>
>  Please let me just show you the error message and the output of 'ls':
>
> ~ # ceph-disk -v activate /dev/mapper/vg--hdd1-lv--hdd1p1
>
> [...]
>
> ceph_disk.main.Error: Error: ['ceph-osd', '--cluster', 'ceph', '--mkfs',
> '--mkkey', '-i', u'1', '--monmap',
> '/var/lib/ceph/tmp/mnt.cJDc7I/activate.monmap', '--osd-data',
> '/var/lib/ceph/tmp/mnt.cJDc7I', '--osd-journal',
> '/var/lib/ceph/tmp/mnt.cJDc7I/journal', '--osd-uuid',
> u'5097be3f-349e-480d-8b0d-d68c13ae2f72', '--keyring',
> '/var/lib/ceph/tmp/mnt.cJDc7I/keyring', '--setuser', 'ceph', '--setgroup',
> 'ceph'] failed : 2017-03-14 16:01:10.051537 7fdc9a025a40 -1
> filestore(/var/lib/ceph/tmp/mnt.cJDc7I) mkjournal error creating journal on
> /var/lib/ceph/tmp/mnt.cJDc7I/journal: (13) Permission denied
> 2017-03-14 16:01:10.051565 7fdc9a025a40 -1 OSD::mkfs: ObjectStore::mkfs
> failed with error -13
> 2017-03-14 16:01:10.051624 7fdc9a025a40 -1  ** ERROR: error creating empty
> object store in /var/lib/ceph/tmp/mnt.cJDc7I: (13) Permission denied
>
> ~ # ls -al /var/lib/ceph/tmp
> total 8
> drwxr-xr-x  2 ceph ceph 4096 Mar 14 16:01 .
> drwxr-xr-x 11 ceph ceph 4096 Mar 14 11:12 ..
> -rwxr-xr-x  1 root root0 Mar 14 11:12 ceph-disk.activate.lock
> -rwxr-xr-x  1 root root0 Mar 14 11:44 ceph-disk.prepare.lock
>
>
> ~ # ls -l /dev/mapper/vg-*-lv-*p*
> lrwxrwxrwx 1 root root 8 Mar 14 13:46 /dev/mapper/vg--hdd1-lv--hdd1p1 ->
> ../dm-12
> lrwxrwxrwx 1 root root 8 Mar 14 13:46 /dev/mapper/vg--hdd2-lv--hdd2p1 ->
> ../dm-14
> lrwxrwxrwx 1 root root 8 Mar 14 13:46 /dev/mapper/vg--hdd3-lv--hdd3p1 ->
> ../dm-16
> lrwxrwxrwx 1 root root 8 Mar 14 13:46 /dev/mapper/vg--hdd4-lv--hdd4p1 ->
> ../dm-18
> lrwxrwxrwx 1 root root 8 Mar 14 13:46 /dev/mapper/vg--hdd5-lv--hdd5p1 ->
> ../dm-20
> lrwxrwxrwx 1 root root 8 Mar 14 13:46 /dev/mapper/vg--hdd6-lv--hdd6p1 ->
> ../dm-22
> lrwxrwxrwx 1 root root 

Re: [ceph-users] osd_disk_thread_ioprio_priority help

2017-03-15 Thread Florian Haas
On Wed, Mar 15, 2017 at 2:41 AM, Alex Gorbachev  
wrote:
> On Mon, Mar 13, 2017 at 6:09 AM, Florian Haas  wrote:
>> On Mon, Mar 13, 2017 at 11:00 AM, Dan van der Ster  
>> wrote:
 I'm sorry, I may have worded that in a manner that's easy to
 misunderstand. I generally *never* suggest that people use CFQ on
 reasonably decent I/O hardware, and thus have never come across any
 need to set this specific ceph.conf parameter.
>>>
>>> OTOH, cfq *does* help our hammer clusters. deadline's default
>>> behaviour is to delay writes up to 5 seconds if the disk is busy
>>> reading -- which it is, of course, while deep scrubbing. And deadline
>>> does not offer any sort of fairness between processes accessing the
>>> same disk (which is admittedly less of an issue in jewel). But back in
>>> hammer days it was nice to be able to make the disk threads only read
>>> while the disk was otherwise idle.
>>
>> Thanks for pointing out the default 5000-ms write deadline. We
>> frequently tune that down to 1500ms. Disabling front merges also
>> sometimes seems to help.
>>
>> For the archives: those settings are in
>> /sys/block/*/queue/iosched/{write_expire,front_merges} and can be
>> persisted on Debian/Ubuntu with sysfsutils.
>
> Hey Florian :).

Hi Lyosha! Long time no talk. :)

> I wrote a quick udev rule to do this on Ubuntu, here is is for others'
> reference.  Also saw earlier a recommendation to increase
> read_ahead_kb to 4096 for slower spinning disks
>
> root@croc1:/etc/udev/rules.d# cat 99-storcium-hdd.rules
>
> # Set write deadline to 1500 and disable front merges, and increase
> read_ahead_kb to 4096
> ACTION=="add|change", KERNEL=="sd*", SUBSYSTEM=="block",
> ATTR{queue/iosched/write_expire}="1500",
> ATTR{queue/iosched/front_merges}="0", ATTR{queue/read_ahead_kb}="4096"

Sure, that's an entirely viable alternative to using sysfsutils.
Indeed, rather a more elegant and contemporary one. :)
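
For the archives again, the sysfsutils equivalent would be a couple of
lines in /etc/sysfs.conf (a hedged sketch; sdb is just an example device,
repeat the lines per disk):

block/sdb/queue/iosched/write_expire = 1500
block/sdb/queue/iosched/front_merges = 0
block/sdb/queue/read_ahead_kb = 4096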

Cheers,
Florian


Re: [ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-15 Thread nokia ceph
Hello,

We see these messages not only at the time of OSD creation but in idle
conditions also. May I know the impact of these errors? Can we safely
ignore them, or is there any way/config to fix this problem?

A few occurrences of these events follow:


2017-03-14 17:16:09.500370 7fedeba61700  4 rocksdb: (Original Log Time
2017/03/14-17:16:09.453130) [default] Level-0 commit table #60 started
2017-03-14 17:16:09.500374 7fedeba61700  4 rocksdb: (Original Log Time
2017/03/14-17:16:09.500273) [default] Level-0 commit table #60: memtable #1
done
2017-03-14 17:16:09.500376 7fedeba61700  4 rocksdb: (Original Log Time
2017/03/14-17:16:09.500297) EVENT_LOG_v1 {"time_micros": 1489511769500289,
"job": 17, "event": "flush_finished", "lsm_state": [2, 4, 6, 0, 0, 0, 0],
"immutable_memtables": 0}
2017-03-14 17:16:09.500382 7fedeba61700  4 rocksdb: (Original Log Time
2017/03/14-17:16:09.500330) [default] Level summary: base level 1 max bytes
base 268435456 files[2 4 6 0 0 0 0] max score 0.76

2017-03-14 17:16:09.500390 7fedeba61700  4 rocksdb: [JOB 17] Try to delete
WAL files size 244090350, prev total WAL file size 247331500, number of
live WAL files 2.

2017-03-14 17:34:11.610513 7fedf3a71700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 6
2017-03-14 17:34:11.672123 7fedf226e700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 6
2017-03-14 17:34:11.696253 7fedf2a6f700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 10
2017-03-14 17:34:11.739016 7fedf3a71700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 10
2017-03-14 17:51:08.808848 7fee05294700  4 rocksdb: reusing log 57 from
recycle list
=


=
2017-03-14 18:32:06.910308 7fedeb260700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1489516326910303, "job": 27, "event":
"table_file_deletion", "file_number": 69}
2017-03-14 18:32:06.910326 7fedeb260700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1489516326910325, "job": 27, "event":
"table_file_deletion", "file_number": 51}
2017-03-14 18:32:06.910343 7fedeb260700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1489516326910342, "job": 27, "event":
"table_file_deletion", "file_number": 50}
2017-03-14 18:37:37.139963 7fedf3270700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 10
2017-03-14 18:37:37.144147 7fedf4a73700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 9
2017-03-14 18:37:37.179302 7fedf226e700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 10
2017-03-14 18:37:37.205677 7fedf2a6f700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 11
2017-03-14 18:37:37.226650 7fedf5274700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 11
2017-03-14 18:37:37.234589 7fedf3a71700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 11
2017-03-14 19:01:53.494364 7fee05294700  4 rocksdb: reusing log 61 from
recycle list
==

==
2017-03-14 19:38:04.946439 7fedeba61700  4 rocksdb: (Original Log Time
2017/03/14-19:38:04.946355) EVENT_LOG_v1 {"time_micros": 1489520284946348,
"job": 29, "event": "flush_finished", "lsm_state": [2, 4, 13, 0, 0, 0, 0],
"immutable_memtables": 0}
2017-03-14 19:38:04.946441 7fedeba61700  4 rocksdb: (Original Log Time
2017/03/14-19:38:04.946383) [default] Level summary: base level 1 max bytes
base 268435456 files[2 4 13 0 0 0 0] max score 0.85

2017-03-14 19:38:04.946461 7fedeba61700  4 rocksdb: [JOB 29] Try to delete
WAL files size 244060984, prev total WAL file size 247581163, number of
live WAL files 2.

2017-03-14 20:01:43.916418 7fedf4272700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 10
2017-03-14 20:01:43.951939 7fedf126c700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 11
2017-03-14 20:01:43.960599 7fedf3270700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 11
2017-03-14 20:01:43.969409 7fedf3a71700 -1
bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 11
2017-03-14 20:13:04.291160 7fee05294700  4 rocksdb: reusing log 85 from
recycle list

2017-03-14 20:13:04.291254 7fee05294700  4 rocksdb: [default] New memtable
created with log file: #89. Immutable memtables: 0.
=

Thanks


On Wed, Mar 15, 2017 at 11:18 AM, nokia ceph 
wrote:

> Hello,
>
> Can we get any update for this problem?
>
> Thanks
>
>
> On Thu, Mar 2, 2017 at 2:16 PM, nokia ceph 
> wrote:
>
>> Hello,
>>
>> Env:- v11.2.0 - bluestore - EC 3 + 1
>>
>> We are getting the below entries in both /var/log/messages and the osd
>> logs. May I know the impact of the below messages, as they are flooding
>> the osd and sys logs?
>>
>> ~~~
>>
>> 2017-03-01 13:00:59.938839 7f6c96915700 -1 
>> bdev(/var/lib/ceph/osd/ceph-0/block)
>> aio_submit retries 2
>>
>> 2017-03-01 13:00:59.940939 7f6c96915700 -1 
>> bdev(/var/lib/ceph/osd/ceph-0/block)
>> aio_submit retries 4
>>
>> 2017-03-01 13:00:59.941126 7f6c96915700 -1 
>> bdev(/var/lib/ceph/osd/ceph-0/block)
>> aio_submit retries 1
>>

[ceph-users] Directly addressing files on individual OSD

2017-03-15 Thread Youssef Eldakar
We currently run a commodity cluster that supports a few petabytes of data. 
Each node in the cluster has 4 drives, currently mounted as /0 through /3. We 
have been researching alternatives for managing the storage, Ceph being one 
possibility, iRODS being another. For preservation purposes, we would like each 
file to exist as one whole piece per drive (as opposed to being striped across 
multiple drives). It appears this is the default in Ceph.

Now, it has always been convenient for us to run distributed jobs over SSH to, 
for instance, compile a list of checksums of all files in the cluster:

dsh -Mca 'find /{0..3}/items -name \*.warc.gz | xargs md5sum >/tmp/$HOSTNAME.md5sum'

And that nicely allows each node to process its own files using the local CPU.

Would this scenario still be possible where Ceph is managing the storage?

Thanks in advance for any feedback.

Youssef Eldakar
Bibliotheca Alexandrina