[ceph-users] cephfs filesystem size

2015-09-23 Thread Dan Nica
Hi,

Can I set the size on cephfs? When I mount the fs on the clients I see that 
the partition size is the whole cluster storage...

Thanks
Dan


[ceph-users] Antw: Hammer reduce recovery impact

2015-09-23 Thread Steffen Weißgerber
Based on the book 'Learning Ceph' 
(https://www.packtpub.com/application-development/learning-ceph),
chapter performance tuning, we swapped the values for osd_recovery_op_priority
and osd_client_op_priority to 60 and 40.

"... osd recovery op priority: This is
 the priority set for recovery operation. Lower the number, higher the recovery 
priority.
 Higher recovery priority might cause performance degradation until recovery 
completes. "

So when setting the value for recovery_op_priority higher than the value for
client_op_priority, the client requests should have higher priority than 
recovery requests.

Since setting the parameters to these values (Giant 0.87.2), our client 
performance is fine
when OSDs are removed from or added to the cluster (e.g. adding 25 OSDs to a 
cluster
that had 20 OSDs until then).
Increasing osd_max_backfills beyond 2 reduces the effect again.
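
For reference, a minimal sketch of how such values could be applied at runtime
(the numbers are the ones from this thread, not a recommendation; an admin
keyring and Giant/Hammer-era option names are assumed):

  # inject the swapped priorities into all running OSDs (runtime only)
  ceph tell osd.* injectargs '--osd_recovery_op_priority 60 --osd_client_op_priority 40'
  ceph tell osd.* injectargs '--osd_max_backfills 2'

  # to persist across restarts, put the same options into the [osd] section of ceph.conf:
  #   osd recovery op priority = 60
  #   osd client op priority = 40
  #   osd max backfills = 2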

Maybe this helps.

Regards

Steffen


>>> Robert LeBlanc  10.09.2015 22:56 >>>
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

We are trying to add some additional OSDs to our cluster, but the
impact of the backfilling has been very disruptive to client I/O and
we have been trying to figure out how to reduce the impact. We have
seen some client I/O blocked for more than 60 seconds. There has been
CPU and RAM head room on the OSD nodes, network has been fine, disks
have been busy, but not terrible.

11 OSD servers: 10 4TB disks with two Intel S3500 SSDs for journals
(10GB), dual 40Gb Ethernet, 64 GB RAM, single CPU E5-2640 Quanta
S51G-1UL.

Clients are QEMU VMs.

[ulhglive-root@ceph5 current]# ceph --version
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

Some nodes are 0.94.3

[ulhglive-root@ceph5 current]# ceph status
cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
 health HEALTH_WARN
3 pgs backfill
1 pgs backfilling
4 pgs stuck unclean
recovery 2382/33044847 objects degraded (0.007%)
recovery 50872/33044847 objects misplaced (0.154%)
noscrub,nodeep-scrub flag(s) set
 monmap e2: 3 mons at
{mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0}
election epoch 180, quorum 0,1,2 mon1,mon2,mon3
 osdmap e54560: 125 osds: 124 up, 124 in; 4 remapped pgs
flags noscrub,nodeep-scrub
  pgmap v10274197: 2304 pgs, 3 pools, 32903 GB data, 8059 kobjects
128 TB used, 322 TB / 450 TB avail
2382/33044847 objects degraded (0.007%)
50872/33044847 objects misplaced (0.154%)
2300 active+clean
   3 active+remapped+wait_backfill
   1 active+remapped+backfilling
recovery io 70401 kB/s, 16 objects/s
  client io 93080 kB/s rd, 46812 kB/s wr, 4927 op/s

Each pool is size 4 with min_size 2.

One problem we have is that the requirements of the cluster changed
after setting up our pools, so our PGs are really out of whack. Our
most active pool has only 256 PGs and each PG is about 120 GB in size.
We are trying to clear out a pool that has way too many PGs so that we
can split the PGs in that pool. I think these large PGs are part of our
issues.

Things I've tried:

* Lowered nr_requests on the spindles from 1000 to 100. This reduced
the max latency, which sometimes reached 3000 ms, down to a max of 500-700 ms.
It has also reduced the huge swings in latency, but has also reduced
throughput somewhat.
* Changed the scheduler from deadline to CFQ. I'm not sure if the
OSD process gives the recovery threads a different disk priority or if
changing the scheduler without restarting the OSD allows the OSD to
use disk priorities.
* Reduced the number of osd_max_backfills from 2 to 1.
* Tried setting noin to give the new OSDs time to get the PG map and
peer before starting the backfill. This caused more problems than it
solved, as we had blocked I/O (over 200 seconds) until we set the new
OSDs to in.
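
For completeness, a rough sketch of the commands behind some of the items above
(device name and values are illustrative only):

  # lower the block-layer queue depth on a spindle (per device, not persistent)
  echo 100 > /sys/block/sdb/queue/nr_requests
  # switch the elevator to CFQ so per-thread disk priorities can apply
  echo cfq > /sys/block/sdb/queue/scheduler

  # keep newly started OSDs from being marked "in" automatically
  ceph osd set noin
  # ... start the new OSDs, wait for them to peer, then:
  ceph osd unset noin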

Even adding one OSD disk into the cluster is causing these slow I/O
messages. We still have 5 more disks to add from this server and four
more servers to add.

In addition to trying to minimize these impacts, would it be better to
split the PGs and then add the rest of the servers, or add the servers
and then do the PG split? I'm thinking splitting first would be better,
but I'd like to get other opinions.
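
(For reference, the split itself is just a pool property change; the pool name
and target count below are placeholders:)

  # raise pg_num first, then pgp_num to actually rebalance onto the new PGs
  ceph osd pool set <pool> pg_num 1024
  ceph osd pool set <pool> pgp_num 1024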

No spindle stays at high utilization for long and the await drops
below 20 ms usually within 10 seconds so I/O should be serviced
"pretty quick". My next guess is that the journals are getting full
and blocking while waiting for flushes, but I'm not exactly sure how
to identify that. We are using the defaults for the journal except for
size (10G). We'd like to have journals large to handle bursts, but if
they are getting filled with backfill traffic, it may be counter
productive. Can/does backfill/recovery bypass the journal?

Thanks,

- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

Re: [ceph-users] ceph.com IPv6 down

2015-09-23 Thread Olivier Bonvalet
On Wednesday, 23 September 2015 at 13:41 +0200, Wido den Hollander wrote:
> Hmm, that is weird. It works for me here from the Netherlands via
> IPv6:

You're right, I checked from other providers and it works.

So, a problem between Free (France) and Dreamhost ?


Re: [ceph-users] Antw: Hammer reduce recovery impact

2015-09-23 Thread Dan van der Ster
On Wed, Sep 23, 2015 at 1:44 PM, Steffen Weißgerber
 wrote:
> "... osd recovery op priority: This is
>  the priority set for recovery operation. Lower the number, higher the 
> recovery priority.
>  Higher recovery priority might cause performance degradation until recovery 
> completes. "
>
> So when setting the value for recovery_op_priority higher then the value for
> client_op_priority the client requests should have higher priority than 
> recovery requests.

I don't think so. The op priorities are implemented as a weighted
priority queue -- the priority is the weight given to those ops. A bigger
value here means more relative priority.

Cheers, Dan


Re: [ceph-users] ceph-mon always election when change crushmap in firefly

2015-09-23 Thread Sage Weil
On Wed, 23 Sep 2015, Alexander Yang wrote:
> hello,
> We use Ceph + OpenStack in our private cloud. In our cluster we have
> 5 mons and 800 OSDs, and the capacity is about 1 PB. We run about 700 VMs and
> 1100 volumes.
> Recently we increased our pg_num; the cluster now has about 70000
> PGs. My real intention was for every OSD to have about 100 PGs, but after
> increasing pg_num I found I was wrong: because different OSDs have different
> CRUSH weights, their PG counts differ, and some OSDs now exceed 500 PGs.
> Now the problem appears when, for some reason, I want to change
> some OSD weight, which means changing the crushmap. Such a change causes about
> 0.03% of the data to migrate, and the mons always start an election. This hangs
> the cluster, and when the election ends, the original leader is still the
> leader. During the mon election, the VMs on the upper layer see too many
> slow requests, so now I dare not do any operation that changes the
> crushmap. But I worry about an important thing: if our cluster loses
> one host or even one rack, the crushmap will change a lot, and the data
> migration will also be large. I worry the cluster will hang for a long time,
> and as a result all the VMs on the upper layer will end up shut down.
> In my opinion, I guess that when I change the crushmap, *the leader mon
> may have to calculate too much information*, or *too many clients want to get
> the new crushmap from the leader mon*. This must hang the mon thread, so the
> leader mon can't heartbeat to the other mons, the other mons think the leader is
> down, and a new election begins. I am sorry if my guess is wrong.
> The crushmap is attached. Can anyone give me some advice or guidance?
> Thanks very much!

There were huge improvements made in hammer in terms of mon efficiency in 
these cases where it is under load.  I recommend upgrading as that will 
help.

You can also mitigate the problem somewhat by adjusting the mon_lease and 
associated settings up.  Scale all of mon_lease, mon_lease_renew_interval, 
mon_lease_ack_timeout, mon_accept_timeout by 2x or 3x.
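
A sketch of what that could look like in the [mon] section of ceph.conf; the
baseline numbers are the stock defaults as I recall them (verify with
'ceph daemon mon.<id> config show'), scaled by 2x:

  [mon]
  mon lease = 10                  # default 5
  mon lease renew interval = 6    # default 3
  mon lease ack timeout = 20      # default 10
  mon accept timeout = 20         # default 10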

It also sounds like you may be using some older tunables/settings 
for your pools or crush rules.  Can you attach the output of 'ceph osd 
dump' and 'ceph osd crush dump | tail -n 20' ?

sage


Re: [ceph-users] lttng duplicate registration problem when using librados2 and libradosstriper

2015-09-23 Thread Paul Mansfield
On 22/09/15 19:48, Jason Dillaman wrote:
> It's not the best answer, but it is the reason why it is currently
> disabled on RHEL 7.  Best bet for finding a long-term solution is
> still probably attaching with gdb and catching the abort function
> call.  Once the offending probe can be found, we can figure out how to
fix it.

I tried gdb and strace. I didn't find anything that gave me any insight.

Here's running it with gdb. I've not used gdb in anger in years, so
quite possibly I'm doing it wrongly

$ gdb ./testprogram
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /foo/bar/testprogram...done.
(gdb) handle SIGABRT stop nopass
Signal        Stop      Print   Pass to program Description
SIGABRT       Yes       Yes     No              Aborted
(gdb) start
Temporary breakpoint 1 at 0x4017ac: file testprogram, line 184.
Starting program: /foo/bar/testprogram
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffed9da700 (LWP 53014)]
[New Thread 0x7fffed1d9700 (LWP 53015)]
LTTng-UST: Error (-17) while registering tracepoint probe. Duplicate
registration of tracepoint probes having the same name is not allowed.

Program received signal SIGABRT, Aborted.
0x724b8925 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
CUnit-2.1.2-6.el6.x86_64 boost-system-1.41.0-18.el6.x86_64
boost-thread-1.41.0-18.el6.x86_64 cassandra-cpp-driver-2.0.1-1.el6.amd64
glibc-2.12-1.132.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64
krb5-libs-1.10.3-15.el6_5.1.x86_64 libcom_err-1.41.12-18.el6.x86_64
libgcc-4.4.7-4.el6.x86_64 librados2-0.94.3-0.el6.x86_64
libradosstriper1-0.94.3-0.el6.x86_64
libselinux-2.0.94-5.3.el6_4.1.x86_64 libstdc++-4.4.7-4.el6.x86_64
libuuid-2.17.2-12.14.el6.x86_64 libuv-1.2.1-1.el6.x86_64
lttng-ust-2.4.1-1.el6.x86_64 nspr-4.10.2-1.el6_5.x86_64
nss-3.15.3-6.el6_5.x86_64 nss-util-3.15.3-1.el6_5.x86_64
openssl-1.0.1e-16.el6_5.7.x86_64 userspace-rcu-0.7.7-1.el6.x86_64
zlib-1.2.3-29.el6.x86_64
(gdb) backtrace
#0  0x724b8925 in raise () from /lib64/libc.so.6
#1  0x724ba105 in abort () from /lib64/libc.so.6
#2  0x758c58f4 in ?? () from /usr/lib64/librados.so.2
#3  0x758f4936 in ?? () from /usr/lib64/librados.so.2
#4  0x7fffe9a8 in ?? ()
#5  0x0001 in ?? ()
#6  0x7fffe9a8 in ?? ()
#7  0x7555f51b in _init () from /usr/lib64/librados.so.2
#8  0x77fea000 in ?? ()
#9  0x77deb555 in _dl_init_internal () from
/lib64/ld-linux-x86-64.so.2
#10 0x77dddb3a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#11 0x0001 in ?? ()
#12 0x7fffec44 in ?? ()
#13 0x in ?? ()



This didn't tell me much. I tried using "nm" on the librados and
libradosstriper libraries and there was no symbol information.


I also tried strace which revealed two sub processes

$ grep "/dev/shm/lttng-ust" strace.out
[pid 49682] open("/dev/shm/lttng-ust-wait-5",
O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
[pid 49683] open("/dev/shm/lttng-ust-wait-5-2489",
O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3



$ grep "pid 49682" strace.out | more
[pid 49682] set_robust_list(0x7fe69cb5b9e0, 0x18 
[pid 49682] <... set_robust_list resumed> ) = 0
[pid 49682] socket(PF_FILE, SOCK_STREAM, 0 
[pid 49682] <... socket resumed> )  = 3
[pid 49682] fcntl(3, F_SETFD, FD_CLOEXECProcess 49683 attached
[pid 49682] connect(3, {sa_family=AF_FILE,
path="/var/run/lttng/lttng-ust-sock-5"}, 110 
[pid 49682] <... connect resumed> ) = -1 ENOENT (No such file or
directory)
[pid 49682] close(3 
[pid 49682] <... close resumed> )   = 0
[pid 49682] statfs("/dev/shm/",  
[pid 49682] <... statfs resumed> {f_type=0x1021994, f_bsize=4096,
f_blocks=8242437, f_bfree=8242435, f_bavail=8242435, f_files=8242437,
f_ffree=8242434, f_fsid={0, 0}, f_n
amelen=255, f_frsize=4096}) = 0
[pid 49682] futex(0x7fe69fd6b300, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 49682] open("/dev/shm/lttng-ust-wait-5",
O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
[pid 49682] fcntl(3, F_GETFD)   = 0x1 (flags FD_CLOEXEC)
[pid 49682] read(3, "\0\0\0\0", 4)  = 4
[pid 49682] mmap(NULL, 4096, PROT_READ, MAP_SHARED, 3, 0) = 0x7fe6a717b000
[pid 49682] close(3)= 0
[pid 49682] futex(0x7fe6a13f15e0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 49682] futex(0x7fe6a717b000, FUTEX_WAIT, 0, NULL 
[pid 49682] +++ killed by SIGABRT (core dumped) +++


$ grep "pid 49683" strace.out | more
[pid 49683] set_robust_list(0x7fe69c35a9e0, 0x18 
[pid 49683] <... set_robust_list resumed> ) = 0
[pid 49683] 

[ceph-users] ceph.com IPv6 down

2015-09-23 Thread Olivier Bonvalet
Hi,

for several hours now http://ceph.com/ hasn't been replying over IPv6.
It pings, and we can open a TCP socket, but nothing more:


~$ nc -w30 -v -6 ceph.com 80
Connection to ceph.com 80 port [tcp/http] succeeded!
GET / HTTP/1.0
Host: ceph.com




But, a HEAD query works :

~$ nc -w30 -v -6 ceph.com 80
Connection to ceph.com 80 port [tcp/http] succeeded!
HEAD / HTTP/1.0
Host: ceph.com
HTTP/1.0 200 OK
Date: Wed, 23 Sep 2015 11:35:27 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.4.16
X-Powered-By: PHP/5.4.16
Set-Cookie: PHPSESSID=q0jf4mh9rqfk5du4kn8tcnqen1; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, 
pre-check=0
Pragma: no-cache
X-Pingback: http://ceph.com/xmlrpc.php
Link: ; rel=shortlink
Connection: close
Content-Type: text/html; charset=UTF-8



So, from my browser the website is unavailable.



[ceph-users] Antw: Re: Antw: Hammer reduce recovery impact

2015-09-23 Thread Steffen Weißgerber


>>> Dan van der Ster  wrote on Wednesday, 23 September 2015 at 14:04:
> On Wed, Sep 23, 2015 at 1:44 PM, Steffen Weißgerber
>  wrote:
>> "... osd recovery op priority: This is
>>  the priority set for recovery operation. Lower the number, higher
the 
> recovery priority.
>>  Higher recovery priority might cause performance degradation until
recovery 
> completes. "
>>
>> So when setting the value for recovery_op_priority higher then the
value for
>> client_op_priority the client requests should have higher priority
than 
> recovery requests.
> 
> I don't think so. The op priorities are implemented as a weighted
> priority queue -- priority is the weight given to those ops. bigger
> value here means more relative priority.
> 
> Cheers, Dan

Yes, normally I would think so too, and that is what I thought when reading
the chapter in
the book for the first time.

Nevertheless, my experience is now different and I'm happy to have a
configuration
without the fear of losing client performance when the cluster
recovers.

Can anybody check this in a test environment?

Regards

Steffen

-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich


Re: [ceph-users] ceph.com IPv6 down

2015-09-23 Thread Wido den Hollander


On 23-09-15 13:38, Olivier Bonvalet wrote:
> Hi,
> 
> since several hours http://ceph.com/ doesn't reply anymore in IPv6.
> It pings, and we can open TCP socket, but nothing more :
> 
> 
> ~$ nc -w30 -v -6 ceph.com 80
> Connection to ceph.com 80 port [tcp/http] succeeded!
> GET / HTTP/1.0
> Host: ceph.com
> 
> 
> 
> 
> But, a HEAD query works :
> 
> ~$ nc -w30 -v -6 ceph.com 80
> Connection to ceph.com 80 port [tcp/http] succeeded!
> HEAD / HTTP/1.0
> Host: ceph.com
> HTTP/1.0 200 OK
> Date: Wed, 23 Sep 2015 11:35:27 GMT
> Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.4.16
> X-Powered-By: PHP/5.4.16
> Set-Cookie: PHPSESSID=q0jf4mh9rqfk5du4kn8tcnqen1; path=/
> Expires: Thu, 19 Nov 1981 08:52:00 GMT
> Cache-Control: no-store, no-cache, must-revalidate, post-check=0, 
> pre-check=0
> Pragma: no-cache
> X-Pingback: http://ceph.com/xmlrpc.php
> Link: ; rel=shortlink
> Connection: close
> Content-Type: text/html; charset=UTF-8
> 
> 

Hmm, that is weird. It works for me here from the Netherlands via IPv6:

wido@wido-desktop:~$ curl -v -X GET http://ceph.com/
* Hostname was NOT found in DNS cache
*   Trying 2607:f298:6050:51f3:f816:3eff:fe62:31d3...
* Connected to ceph.com (2607:f298:6050:51f3:f816:3eff:fe62:31d3) port
80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: ceph.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 23 Sep 2015 11:39:53 GMT
* Server Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.4.16 is not
blacklisted
< Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.4.16
< X-Powered-By: PHP/5.4.16
< Set-Cookie: PHPSESSID=cvm1cqfip4504db073to2v4me7; path=/
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0,
pre-check=0
< Pragma: no-cache
< X-Pingback: http://ceph.com/xmlrpc.php
< Link: ; rel=shortlink
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=UTF-8
<

I also tried via my browser on my Ubuntu desktop and that works just
fine as well.

Wido

> 
> So, from my browser the website is unavailable.
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


[ceph-users] failed to open http://apt-mirror.front.sepia.ceph.com

2015-09-23 Thread wangsongbo

Hi Loic and other Cephers,

I am running teuthology-suites in our testing; because the connection to 
"apt-mirror.front.sepia.ceph.com" timed out, "ceph-cm-ansible" failed.

And from a web browser, I got a response like this: "502 Bad Gateway".
"64.90.32.37 apt-mirror.front.sepia.ceph.com" has been added to /etc/hosts.
Have the resources been removed?


Thanks and Regards,
WangSongbo


Re: [ceph-users] cephfs filesystem size

2015-09-23 Thread John Spray
Yes, you can set a quota on any directory, although it's only
supported with the userspace client (i.e. ceph-fuse):
http://docs.ceph.com/docs/master/cephfs/quota/
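
For example (a ceph-fuse mount is assumed; path and size are placeholders):

  # cap a directory subtree at 100 GiB
  setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/some_dir
  # read the quota back
  getfattr -n ceph.quota.max_bytes /mnt/cephfs/some_dir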

John

On Wed, Sep 23, 2015 at 1:50 PM, Dan Nica  wrote:
> Hi,
>
>
>
> Can I set the size on cephfs ? when I mount the fs on the clients I see that
> the partition size is the whole cluster storage…
>
>
>
> Thank
>
> Dan
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] lttng duplicate registration problem when using librados2 and libradosstriper

2015-09-23 Thread Jason Dillaman
It looks like the issue you are experiencing was fixed in the Infernalis/master 
branches [1].  I've opened a new tracker ticket to backport the fix to Hammer 
[2].

-- 

Jason Dillaman 

[1] 
https://github.com/sponce/ceph/commit/e4c27d804834b4a8bc495095ccf5103f8ffbcc1e
[2] http://tracker.ceph.com/issues/13210

- Original Message -
> From: "Paul Mansfield" 
> To: "Jason Dillaman" 
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, September 23, 2015 6:25:36 AM
> Subject: Re: [ceph-users] lttng duplicate registration problem when using 
> librados2 and libradosstriper
> 
> On 22/09/15 19:48, Jason Dillaman wrote:
> > It's not the best answer, but it is the reason why it is currently
> > disabled on RHEL 7.  Best bet for finding a long-term solution is
> > still probably attaching with gdb and catching the abort function
> > call.  Once the offending probe can be found, we can figure out how to
> fix it.
> 
> I tried gdb and strace. I didn't find anything that gave me any insight.
> 
> Here's running it with gdb. I've not used gdb in anger in years, so
> quite possibly I'm doing it wrongly
> 
> $ gdb ./testprogram
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> ...
> Reading symbols from /foo/bar/testprogram...done.
> (gdb) handle SIGABRT stop nopass
> SignalStop  Print   Pass to program Description
> SIGABRT   Yes   Yes No  Aborted
> (gdb) start
> Temporary breakpoint 1 at 0x4017ac: file testprogram, line 184.
> Starting program: /foo/bar/testprogram
> [Thread debugging using libthread_db enabled]
> [New Thread 0x7fffed9da700 (LWP 53014)]
> [New Thread 0x7fffed1d9700 (LWP 53015)]
> LTTng-UST: Error (-17) while registering tracepoint probe. Duplicate
> registration of tracepoint probes having the same name is not allowed.
> 
> Program received signal SIGABRT, Aborted.
> 0x724b8925 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> CUnit-2.1.2-6.el6.x86_64 boost-system-1.41.0-18.el6.x86_64
> boost-thread-1.41.0-18.el6.x86_64 cassandra-cpp-driver-2.0.1-1.el6.amd64
> glibc-2.12-1.132.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64
> krb5-libs-1.10.3-15.el6_5.1.x86_64 libcom_err-1.41.12-18.el6.x86_64
> libgcc-4.4.7-4.el6.x86_64 librados2-0.94.3-0.el6.x86_64
> libradosstriper1-0.94.3-0.el6.x86_64
> libselinux-2.0.94-5.3.el6_4.1.x86_64 libstdc++-4.4.7-4.el6.x86_64
> libuuid-2.17.2-12.14.el6.x86_64 libuv-1.2.1-1.el6.x86_64
> lttng-ust-2.4.1-1.el6.x86_64 nspr-4.10.2-1.el6_5.x86_64
> nss-3.15.3-6.el6_5.x86_64 nss-util-3.15.3-1.el6_5.x86_64
> openssl-1.0.1e-16.el6_5.7.x86_64 userspace-rcu-0.7.7-1.el6.x86_64
> zlib-1.2.3-29.el6.x86_64
> (gdb) backtrace
> #0  0x724b8925 in raise () from /lib64/libc.so.6
> #1  0x724ba105 in abort () from /lib64/libc.so.6
> #2  0x758c58f4 in ?? () from /usr/lib64/librados.so.2
> #3  0x758f4936 in ?? () from /usr/lib64/librados.so.2
> #4  0x7fffe9a8 in ?? ()
> #5  0x0001 in ?? ()
> #6  0x7fffe9a8 in ?? ()
> #7  0x7555f51b in _init () from /usr/lib64/librados.so.2
> #8  0x77fea000 in ?? ()
> #9  0x77deb555 in _dl_init_internal () from
> /lib64/ld-linux-x86-64.so.2
> #10 0x77dddb3a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
> #11 0x0001 in ?? ()
> #12 0x7fffec44 in ?? ()
> #13 0x in ?? ()
> 
> 
> 
> This didn't tell me much. I tried using "nm" on the librados and
> libradosstriper libraries and there was no symbol information.
> 
> 
> I also tried strace which revealed two sub processes
> 
> $ grep "/dev/shm/lttng-ust" strace.out
> [pid 49682] open("/dev/shm/lttng-ust-wait-5",
> O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
> [pid 49683] open("/dev/shm/lttng-ust-wait-5-2489",
> O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 3
> 
> 
> 
> $ grep "pid 49682" strace.out | more
> [pid 49682] set_robust_list(0x7fe69cb5b9e0, 0x18 
> [pid 49682] <... set_robust_list resumed> ) = 0
> [pid 49682] socket(PF_FILE, SOCK_STREAM, 0 
> [pid 49682] <... socket resumed> )  = 3
> [pid 49682] fcntl(3, F_SETFD, FD_CLOEXECProcess 49683 attached
> [pid 49682] connect(3, {sa_family=AF_FILE,
> path="/var/run/lttng/lttng-ust-sock-5"}, 110 
> [pid 49682] <... connect resumed> ) = -1 ENOENT (No such file or
> directory)
> [pid 49682] close(3 
> [pid 49682] <... close resumed> )   = 0
> [pid 49682] statfs("/dev/shm/",  
> [pid 49682] <... statfs resumed> {f_type=0x1021994, f_bsize=4096,
> 

Re: [ceph-users] ceph-mon always election when change crushmap in firefly

2015-09-23 Thread Michael Kidd
Hello Alexander,
  One other point on your email: you indicate that you want each OSD to have
~100 PGs, but depending on your pool size, it seems you may have forgotten
about the additional PGs associated with replication itself.

Assuming 3x replication in your environment:

    70,000 PGs * 3 replicas / 800 OSDs  =~  262.5 PGs per OSD on average

While this PG to OSD ratio shouldn't cause significant pain, I would not go
to any higher PG count without adding more spindles.

For more specific PG count guidance and modeling, please see:
http://ceph.com/pgcalc
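
As a rough rule of thumb from that calculator (an approximation only; pgcalc
itself is the authoritative tool):

  # total PGs across all pools ~= (OSDs * 100) / replica size, rounded to a power of two
  # e.g. 800 OSDs at size 3: 800 * 100 / 3 ~= 26667, i.e. 16384 or 32768 once rounded,
  # then divided among the pools according to their share of the data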

Hope this helps,

Michael J. Kidd
Sr. Storage Consultant
Red Hat Global Storage Consulting
+1 919-442-8878

On Wed, Sep 23, 2015 at 8:34 AM, Sage Weil  wrote:

> On Wed, 23 Sep 2015, Alexander Yang wrote:
> > hello,
> > We use Ceph+Openstack in our private cloud. In our cluster, we
> have
> > 5 mons and 800 osds, the Capacity is about 1Pb. And run about 700 vms and
> > 1100 volumes,
> > recently, we increase our pg_num , now the cluster have about
> 7
> > pgs. In my real intention? I want every osd have 100pgs. but after
> increase
> > pg_num, I find I'm wrong. Because the different crush weight for
> different
> > osd, the osd's pg_num is different, some osd have exceed  500pgs.
> > Now, the problem is  appear?cause some reason when i want to
> change
> > some osd  weight, that means change the crushmap.  This change cause
> about
> > 0.03% data to migrate. the mon is always begin to election. It's will
> hung
> > the cluster, and when they end, the  original  leader still is the new
> > leader. And during the mon eclection?On the upper layer, vm have too many
> > slow request will appear. so now i dare to do any operation about change
> > crushmap. But i worry about an important thing, If  when our cluster
> down
> >  one host even down one rack.   By the time, the cluster curshmap will
> > change large, and the migrate data also large. I worry the cluster will
> > hung  long time. and result on upper layer, all vm became to  shutdown.
> > In my opinion, I guess when I change the crushmap,* the leader
> mon
> > maybe calculate the too many information*, or* too many client want to
> get
> > the new crushmap from leader mon*.  It must be hung the mon thread, so
> the
> > leader mon can't heatbeat to other mons, the other mons think the leader
> is
> > down then begin the new election.  I am sorry if i guess is wrong.
> > The crushmap in accessory. So who can give me some advice or
> guide,
> > Thanks very much!
>
> There were huge improvements made in hammer in terms of mon efficiency in
> these cases where it is under load.  I recommend upgrading as that will
> help.
>
> You can also mitigate the problem somewhat by adjusting the mon_lease and
> associated settings up.  Scale all of mon_lease, mon_lease_renew_interval,
> mon_lease_ack_timeout, mon_accept_timeout by 2x or 3x.
>
> It also sounds like you may be using some older tunables/settings
> for your pools or crush rules.  Can you attach the output of 'ceph osd
> dump' and 'ceph osd crush dump | tail -n 20' ?
>
> sage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Potential OSD deadlock?

2015-09-23 Thread Mark Nelson
FWIW, we've got some 40GbE Intel cards in the community performance 
cluster on a Mellanox 40GbE switch that appear (knock on wood) to be 
running fine with 3.10.0-229.7.2.el7.x86_64.  We did get feedback from 
Intel that older drivers might cause problems though.


Here's ifconfig from one of the nodes:

ens513f1: flags=4163  mtu 1500
inet 10.0.10.101  netmask 255.255.255.0  broadcast 10.0.10.255
inet6 fe80::6a05:caff:fe2b:7ea1  prefixlen 64  scopeid 0x20
ether 68:05:ca:2b:7e:a1  txqueuelen 1000  (Ethernet)
RX packets 169232242875  bytes 229346261232279 (208.5 TiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 153491686361  bytes 203976410836881 (185.5 TiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Mark

On 09/23/2015 01:48 PM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

OK, here is the update on the saga...

I traced some more of blocked I/Os and it seems that communication
between two hosts seemed worse than others. I did a two way ping flood
between the two hosts using max packet sizes (1500). After 1.5M
packets, no lost pings. Then I had the ping flood running while I
put Ceph load on the cluster, and the dropped pings started increasing;
after stopping the Ceph workload the pings stopped dropping.

I then ran iperf between all the nodes with the same results, so that
ruled out Ceph to a large degree. I then booted into the
3.10.0-229.14.1.el7.x86_64 kernel and with an hour test so far there
hasn't been any dropped pings or blocked I/O. Our 40 Gb NICs really
need the network enhancements in the 4.x series to work well.

Does this sound familiar to anyone? I'll probably start bisecting the
kernel to see where this issue is introduced. Both of the clusters
with this issue are running 4.x, other than that, they are pretty
differing hardware and network configs.

Thanks,
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.1.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWAvOzCRDmVDuy+mK58QAApOMP/1xmCtW++G11qcE8y/sr
RkXguqZJLc4czdOwV/tjUvhVsm5qOl4wvQCtABFZpc6t4+m5nzE3LkA1rl2l
AnARPOjh61TO6cV0CT8O0DlqtHmSd2y0ElgAUl0594eInEn7eI7crz8R543V
7I68XU5zL/vNJ9IIx38UqdhtSzXQQL664DGq3DLINK0Yb9XRVBlFip+Slt+j
cB64TuWjOPLSH09pv7SUyksodqrTq3K7p6sQkq0MOzBkFQM1FHfOipbo/LYv
F42iiQbCvFizArMu20WeOSQ4dmrXT/iecgTfEag/Zxvor2gOi/J6d2XS9ckW
byEC5/rbm4yDBua2ZugeNxQLWq0Oa7spZnx7usLsu/6YzeDNI6kmtGURajdE
/XC8bESWKveBzmGDzjff5oaMs9A1PZURYnlYADEODGAt6byoaoQEGN6dlFGe
LwQ5nOdQYuUrWpJzTJBN3aduOxursoFY8S0eR0uXm0l1CHcp22RWBDvRinok
UWk5xRBgjDCD2gIwc+wpImZbCtiTdf0vad1uLvdxGL29iFta4THzJgUGrp98
sUqM3RaTRdJYjFcNP293H7/DC0mqpnmo0Clx3jkdHX+x1EXpJUtocSeI44LX
KWIMhe9wXtKAoHQFEcJ0o0+wrXWMevvx33HPC4q1ULrFX0ILNx5Mo0Rp944X
4OEo
=P33I
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Sep 22, 2015 at 4:15 PM, Robert LeBlanc  wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

This is IPoIB and we have the MTU set to 64K. There were some issues
pinging hosts with "No buffer space available" (hosts are currently
configured for 4GB to test SSD caching rather than page cache). I
found that an MTU under 32K worked reliably for ping, but still had the
blocked I/O.

I reduced the MTU to 1500 and checked pings (OK), but I'm still seeing
the blocked I/O.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Sep 22, 2015 at 3:52 PM, Sage Weil  wrote:

On Tue, 22 Sep 2015, Samuel Just wrote:

I looked at the logs, it looks like there was a 53 second delay
between when osd.17 started sending the osd_repop message and when
osd.13 started reading it, which is pretty weird.  Sage, didn't we
once see a kernel issue which caused some messages to be mysteriously
delayed for many 10s of seconds?


Every time we have seen this behavior and diagnosed it in the wild it has
been a network misconfiguration.  Usually related to jumbo frames.

sage




What kernel are you running?
-Sam

On Tue, Sep 22, 2015 at 2:22 PM, Robert LeBlanc  wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

OK, looping in ceph-devel to see if I can get some more eyes. I've
extracted what I think are important entries from the logs for the
first blocked request. NTP is running all the servers so the logs
should be close in terms of time. Logs for 12:50 to 13:00 are
available at http://162.144.87.113/files/ceph_block_io.logs.tar.xz

2015-09-22 12:55:06.500374 - osd.17 gets I/O from client
2015-09-22 12:55:06.557160 - osd.17 submits I/O to osd.13
2015-09-22 12:55:06.557305 - osd.17 submits I/O to osd.16
2015-09-22 12:55:06.573711 - osd.16 gets I/O from osd.17
2015-09-22 12:55:06.595716 - osd.17 gets ondisk result=0 from osd.16
2015-09-22 12:55:06.640631 - osd.16 reports to osd.17 ondisk result=0
2015-09-22 12:55:36.926691 - osd.17 reports slow I/O > 30.439150 sec
2015-09-22 

Re: [ceph-users] Potential OSD deadlock?

2015-09-23 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

We were able to only get ~17Gb out of the XL710 (heavily tweaked)
until we went to the 4.x kernel where we got ~36Gb (no tweaking). It
seems that there were some major reworks in the network handling in
the kernel to efficiently handle that network rate. If I remember
right we also saw a drop in CPU utilization. I'm starting to think
that we did see packet loss while congesting our ISLs in our initial
testing, but we could not tell where the dropping was happening. We
saw some on the switches, but it didn't seem to be bad if we weren't
trying to congest things. We probably already saw this issue, just
didn't know it.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Sep 23, 2015 at 1:10 PM, Mark Nelson  wrote:
> FWIW, we've got some 40GbE Intel cards in the community performance cluster
> on a Mellanox 40GbE switch that appear (knock on wood) to be running fine
> with 3.10.0-229.7.2.el7.x86_64.  We did get feedback from Intel that older
> drivers might cause problems though.
>
> Here's ifconfig from one of the nodes:
>
> ens513f1: flags=4163  mtu 1500
> inet 10.0.10.101  netmask 255.255.255.0  broadcast 10.0.10.255
> inet6 fe80::6a05:caff:fe2b:7ea1  prefixlen 64  scopeid 0x20
> ether 68:05:ca:2b:7e:a1  txqueuelen 1000  (Ethernet)
> RX packets 169232242875  bytes 229346261232279 (208.5 TiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 153491686361  bytes 203976410836881 (185.5 TiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> Mark
>
>
> On 09/23/2015 01:48 PM, Robert LeBlanc wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA256
>>
>> OK, here is the update on the saga...
>>
>> I traced some more of blocked I/Os and it seems that communication
>> between two hosts seemed worse than others. I did a two way ping flood
>> between the two hosts using max packet sizes (1500). After 1.5M
>> packets, no lost pings. Then then had the ping flood running while I
>> put Ceph load on the cluster and the dropped pings started increasing
>> after stopping the Ceph workload the pings stopped dropping.
>>
>> I then ran iperf between all the nodes with the same results, so that
>> ruled out Ceph to a large degree. I then booted in the the
>> 3.10.0-229.14.1.el7.x86_64 kernel and with an hour test so far there
>> hasn't been any dropped pings or blocked I/O. Our 40 Gb NICs really
>> need the network enhancements in the 4.x series to work well.
>>
>> Does this sound familiar to anyone? I'll probably start bisecting the
>> kernel to see where this issue in introduced. Both of the clusters
>> with this issue are running 4.x, other than that, they are pretty
>> differing hardware and network configs.
>>
>> Thanks,
>> -BEGIN PGP SIGNATURE-
>> Version: Mailvelope v1.1.0
>> Comment: https://www.mailvelope.com
>>
>> wsFcBAEBCAAQBQJWAvOzCRDmVDuy+mK58QAApOMP/1xmCtW++G11qcE8y/sr
>> RkXguqZJLc4czdOwV/tjUvhVsm5qOl4wvQCtABFZpc6t4+m5nzE3LkA1rl2l
>> AnARPOjh61TO6cV0CT8O0DlqtHmSd2y0ElgAUl0594eInEn7eI7crz8R543V
>> 7I68XU5zL/vNJ9IIx38UqdhtSzXQQL664DGq3DLINK0Yb9XRVBlFip+Slt+j
>> cB64TuWjOPLSH09pv7SUyksodqrTq3K7p6sQkq0MOzBkFQM1FHfOipbo/LYv
>> F42iiQbCvFizArMu20WeOSQ4dmrXT/iecgTfEag/Zxvor2gOi/J6d2XS9ckW
>> byEC5/rbm4yDBua2ZugeNxQLWq0Oa7spZnx7usLsu/6YzeDNI6kmtGURajdE
>> /XC8bESWKveBzmGDzjff5oaMs9A1PZURYnlYADEODGAt6byoaoQEGN6dlFGe
>> LwQ5nOdQYuUrWpJzTJBN3aduOxursoFY8S0eR0uXm0l1CHcp22RWBDvRinok
>> UWk5xRBgjDCD2gIwc+wpImZbCtiTdf0vad1uLvdxGL29iFta4THzJgUGrp98
>> sUqM3RaTRdJYjFcNP293H7/DC0mqpnmo0Clx3jkdHX+x1EXpJUtocSeI44LX
>> KWIMhe9wXtKAoHQFEcJ0o0+wrXWMevvx33HPC4q1ULrFX0ILNx5Mo0Rp944X
>> 4OEo
>> =P33I
>> -END PGP SIGNATURE-
>> 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Tue, Sep 22, 2015 at 4:15 PM, Robert LeBlanc
>> wrote:
>>>
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA256
>>>
>>> This is IPoIB and we have the MTU set to 64K. There was some issues
>>> pinging hosts with "No buffer space available" (hosts are currently
>>> configured for 4GB to test SSD caching rather than page cache). I
>>> found that MTU under 32K worked reliable for ping, but still had the
>>> blocked I/O.
>>>
>>> I reduced the MTU to 1500 and checked pings (OK), but I'm still seeing
>>> the blocked I/O.
>>> - 
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>
>>>
>>> On Tue, Sep 22, 2015 at 3:52 PM, Sage Weil  wrote:

 On Tue, 22 Sep 2015, Samuel Just wrote:
>
> I looked at the logs, it looks like there was a 53 second delay
> between when osd.17 started sending the osd_repop message and when
> osd.13 started reading it, which is pretty weird.  Sage, didn't we
> once see a kernel issue which caused some messages to be mysteriously
> delayed for many 10s of seconds?


 Every 

[ceph-users] About cephfs with hadoop

2015-09-23 Thread Fulin Sun
Hi, all
I am trying to use CephFS as a drop-in replacement for Hadoop HDFS. I mainly 
followed the configuration steps according to the doc here: 
http://docs.ceph.com/docs/master/cephfs/hadoop/ 

I am using a 3-node Hadoop 2.7.1 cluster. Noting that the official doc recommends 
using the 1.1.x stable release, I am not sure
whether using a 2.x release would cause some unexpected trouble. Anyway, I think I have 
configured CephFS correctly, and the Ceph
storage cluster is running well. 

When I try to start the Hadoop cluster, the namenode fails with the 
following exception: 
  2015-09-23 14:47:39,600 ERROR 
org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
   java.lang.IllegalArgumentException: Invalid URI for NameNode address 
(check fs.defaultFS): ceph://172.16.50.18:6789/ is not of scheme 'hdfs'.
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:477)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:461)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:512)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:612)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:632)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:795)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
   

Quite weird here, because the NameNode check seems to NOT ALLOW any scheme except 
the default hdfs. Then how would cephfs-hadoop actually work?
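
(For reference, the linked guide wires CephFS into Hadoop through core-site.xml
roughly like this; the property names are reproduced from memory and should be
checked against the doc, and no HDFS NameNode/DataNode daemons are started when
CephFS replaces HDFS:)

  <property>
    <name>fs.default.name</name>
    <value>ceph://172.16.50.18:6789/</value>
  </property>
  <property>
    <name>fs.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
  </property>
  <property>
    <name>ceph.conf.file</name>
    <value>/etc/ceph/ceph.conf</value>
  </property>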

Really need your help here. 
Best,
Sun.





CertusNet



Re: [ceph-users] IPv6 connectivity after website changes

2015-09-23 Thread Wido den Hollander


On 23-09-15 03:49, Dan Mick wrote:
> On 09/22/2015 05:22 AM, Sage Weil wrote:
>> On Tue, 22 Sep 2015, Wido den Hollander wrote:
>>> Hi,
>>>
>>> After the recent changes in the Ceph website the IPv6 connectivity got lost.
>>>
>>> www.ceph.com
>>> docs.ceph.com
>>> download.ceph.com
>>> git.ceph.com
>>>
>>> The problem I'm now facing with a couple of systems is that they can't
>>> download the Package signing key from git.ceph.com or anything from
>>> download.ceph.com
>>>
>>> I see everything is still hosted at Dreamhost which has native IPv6 to
>>> all systems, so it's mainly just adding the AAAA-records and it should
>>> be fixed.
>>
>> Yep... we'll get this added today!  (Dan, if you send me the v6 addrs on 
>> irc, I can update DNS.)
>>
>> sage
>>
> 
> I've added ceph.com and download.ceph.com so far, and confirmed they're
> answering.  git.ceph.com and docs.ceph.com should not have changed, but
> I don't see AAAA records for them either; investigating.
> 

Indeed, ceph.com and download.ceph.com work fine now.

git and docs are indeed still missing the records.

Wido



[ceph-users] Different OSD capacity & what is the weight of item

2015-09-23 Thread wikison
Hi, 
    I have four storage machines to build a ceph storage cluster as storage 
nodes. Each of them has a 120 GB HDD and a 1 TB HDD attached. Is it OK to treat 
those storage devices as the same when writing a ceph.conf?
For example, when setting osd pool default pg num , I thought: osd pool 
default pg num = (100 * 8 ) / 3 =  266, where osd pool default size = 3 and the 
number of OSDs is 8 (one Daemon per device).
   
    And then there is the step "Add the OSD to the CRUSH map so that it can begin 
receiving data. You may also decompile the CRUSH map, add the OSD to the device list, add 
the host as a bucket (if it's not already in the CRUSH map), add the device as 
an item in the host, assign it a weight, recompile it and set it."
ceph [--cluster {cluster-name}] osd crush add {id-or-name} {weight} 
[{bucket-type}={bucket-name} ...]
    What is the meaning of weight? How should I set it to match my 
hardware?
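
(For what it's worth, the common convention is to set the CRUSH weight to the
device capacity in TiB, so the two drive sizes would get different weights;
OSD ids and hostname below are made up:)

  # 1 TB data disk -> weight ~1.0, 120 GB disk -> weight ~0.12
  ceph osd crush add osd.0 1.0 host=storage-node1
  ceph osd crush add osd.1 0.12 host=storage-node1
  # an existing OSD's weight can be changed later with:
  ceph osd crush reweight osd.1 0.12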




--

Zhen Wang


Re: [ceph-users] lttng duplicate registration problem when using librados2 and libradosstriper

2015-09-23 Thread Paul Mansfield


On 22/09/15 19:48, Jason Dillaman wrote:
>> On 22/09/15 17:46, Jason Dillaman wrote:
>>> As a background, I believe LTTng-UST is disabled for RHEL7 in the Ceph
>>> project only due to the fact that EPEL 7 doesn't provide the required
>>> packages [1].
>>
>> interesting. so basically our program might only work on RHEL7 by accident!!
>>
> 
> It's not the best answer, but it is the reason why it is currently disabled 
> on RHEL 7.  Best bet for finding a long-term solution is still probably 
> attaching with gdb and catching the abort function call.  Once the offending 
> probe can be found, we can figure out how to fix it.

thanks, we will try that.

is it possible in the meanwhile to build some R6 packages with tracing
disabled?




Re: [ceph-users] C example of using libradosstriper?

2015-09-23 Thread Paul Mansfield
Hi,
thanks very much for posting that, much appreciated.

We were able to build and test it on Red Hat EL7.

On 17/09/15 04:01, 张冬卯 wrote:
> 
> Hi,
> 
> src/tools/rados.c has some  striper rados snippet.
> 
> and I have  this little project using striper rados.
> see:https://github.com/thesues/striprados
> 
> wish could help you
> 
> Dongmao Zhang
> 
> On 2015-09-17 01:05, Paul Mansfield wrote:
>> Hello,
>> I'm using the C interface librados striper and am looking for examples
>> on how to use it.
>>
>>
>> Please can someone point me to any useful code snippets? All I've found
>> so far is the source code :-(


Re: [ceph-users] Important security noticed regarding release signing key

2015-09-23 Thread wangsongbo

Hi  Ken,
Just now I ran teuthology-suites in our testing; it failed because 
these packages are missing, such as 
qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64, 
qemu-kvm-tools-0.12.1.2-2.415.el6.3ceph etc.
The change "rm ceph-extras repository config #137" removed the 
repository, but did not resolve ansible's dependency.


How can this dependency be resolved?

Thanks and Regards,
WangSongbo

On 15/9/23 10:50 AM, Ken Dreyer wrote:

Hi Songbo, It's been removed from Ansible now:
https://github.com/ceph/ceph-cm-ansible/pull/137

- Ken

On Tue, Sep 22, 2015 at 8:33 PM, wangsongbo  wrote:

Hi Ken,
 Thanks for your reply. But in the ceph-cm-ansible project scheduled by
teuthology, "ceph.com/packages/ceph-extras" is in use now, such as
qemu-kvm-0.12.1.2-2.415.el6.3ceph, qemu-kvm-tools-0.12.1.2-2.415.el6.3ceph
etc.
 Any new releases will be provided ?


On 15/9/22 10:24 PM, Ken Dreyer wrote:

On Tue, Sep 22, 2015 at 2:38 AM, Songbo Wang  wrote:

Hi, all,
  Since last week's attack, "ceph.com/packages/ceph-extras" can
no longer be opened; where can I get the releases of ceph-extras now?

Thanks and Regards,
WangSongbo


The packages in "ceph-extras" were old and subject to CVEs (the big
one being VENOM, CVE-2015-3456). So I don't intend to host ceph-extras
in the new location.

- Ken






[ceph-users] rbd map failing for image with exclusive-lock feature

2015-09-23 Thread Allen Liao
Hi all,

I'm unable to map a block device for an image that was created with
exclusive-lock feature:

$ sudo rbd create foo --size 4096 --image-features=4 --image-format=2
$ sudo rbd map foo
rbd: sysfs write failed
rbd: map failed: (6) No such device or address

How do I map the image?  I've tried locking an image before mapping it but
I get the same result.  If I omit the "--image-features=4" when creating
the image then I can map it just fine.
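
(Not an authoritative answer, but a common workaround at the time was to create
images with only the layering feature, since the kernel rbd client did not yet
understand exclusive-lock; a sketch:)

  # layering only (feature bit 1) maps fine with the kernel client
  sudo rbd create bar --size 4096 --image-features=1 --image-format=2
  sudo rbd map bar

  # or make it the default for new format-2 images via ceph.conf:
  #   rbd default features = 1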


Re: [ceph-users] Potential OSD deadlock?

2015-09-23 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

OK, here is the update on the saga...

I traced some more of blocked I/Os and it seems that communication
between two hosts seemed worse than others. I did a two way ping flood
between the two hosts using max packet sizes (1500). After 1.5M
packets, no lost pings. Then I had the ping flood running while I
put Ceph load on the cluster, and the dropped pings started increasing;
after stopping the Ceph workload the pings stopped dropping.

I then ran iperf between all the nodes with the same results, so that
ruled out Ceph to a large degree. I then booted into the
3.10.0-229.14.1.el7.x86_64 kernel and with an hour test so far there
hasn't been any dropped pings or blocked I/O. Our 40 Gb NICs really
need the network enhancements in the 4.x series to work well.

Does this sound familiar to anyone? I'll probably start bisecting the
kernel to see where this issue is introduced. Both of the clusters
with this issue are running 4.x, other than that, they are pretty
differing hardware and network configs.

Thanks,
-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.1.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWAvOzCRDmVDuy+mK58QAApOMP/1xmCtW++G11qcE8y/sr
RkXguqZJLc4czdOwV/tjUvhVsm5qOl4wvQCtABFZpc6t4+m5nzE3LkA1rl2l
AnARPOjh61TO6cV0CT8O0DlqtHmSd2y0ElgAUl0594eInEn7eI7crz8R543V
7I68XU5zL/vNJ9IIx38UqdhtSzXQQL664DGq3DLINK0Yb9XRVBlFip+Slt+j
cB64TuWjOPLSH09pv7SUyksodqrTq3K7p6sQkq0MOzBkFQM1FHfOipbo/LYv
F42iiQbCvFizArMu20WeOSQ4dmrXT/iecgTfEag/Zxvor2gOi/J6d2XS9ckW
byEC5/rbm4yDBua2ZugeNxQLWq0Oa7spZnx7usLsu/6YzeDNI6kmtGURajdE
/XC8bESWKveBzmGDzjff5oaMs9A1PZURYnlYADEODGAt6byoaoQEGN6dlFGe
LwQ5nOdQYuUrWpJzTJBN3aduOxursoFY8S0eR0uXm0l1CHcp22RWBDvRinok
UWk5xRBgjDCD2gIwc+wpImZbCtiTdf0vad1uLvdxGL29iFta4THzJgUGrp98
sUqM3RaTRdJYjFcNP293H7/DC0mqpnmo0Clx3jkdHX+x1EXpJUtocSeI44LX
KWIMhe9wXtKAoHQFEcJ0o0+wrXWMevvx33HPC4q1ULrFX0ILNx5Mo0Rp944X
4OEo
=P33I
-END PGP SIGNATURE-

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, Sep 22, 2015 at 4:15 PM, Robert LeBlanc  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> This is IPoIB and we have the MTU set to 64K. There was some issues
> pinging hosts with "No buffer space available" (hosts are currently
> configured for 4GB to test SSD caching rather than page cache). I
> found that MTU under 32K worked reliable for ping, but still had the
> blocked I/O.
>
> I reduced the MTU to 1500 and checked pings (OK), but I'm still seeing
> the blocked I/O.
> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Tue, Sep 22, 2015 at 3:52 PM, Sage Weil  wrote:
>> On Tue, 22 Sep 2015, Samuel Just wrote:
>>> I looked at the logs, it looks like there was a 53 second delay
>>> between when osd.17 started sending the osd_repop message and when
>>> osd.13 started reading it, which is pretty weird.  Sage, didn't we
>>> once see a kernel issue which caused some messages to be mysteriously
>>> delayed for many 10s of seconds?
>>
>> Every time we have seen this behavior and diagnosed it in the wild it has
>> been a network misconfiguration.  Usually related to jumbo frames.
>>
>> sage
>>
>>
>>>
>>> What kernel are you running?
>>> -Sam
>>>
>>> On Tue, Sep 22, 2015 at 2:22 PM, Robert LeBlanc  wrote:
>>> > -BEGIN PGP SIGNED MESSAGE-
>>> > Hash: SHA256
>>> >
>>> > OK, looping in ceph-devel to see if I can get some more eyes. I've
>>> > extracted what I think are important entries from the logs for the
>>> > first blocked request. NTP is running all the servers so the logs
>>> > should be close in terms of time. Logs for 12:50 to 13:00 are
>>> > available at http://162.144.87.113/files/ceph_block_io.logs.tar.xz
>>> >
>>> > 2015-09-22 12:55:06.500374 - osd.17 gets I/O from client
>>> > 2015-09-22 12:55:06.557160 - osd.17 submits I/O to osd.13
>>> > 2015-09-22 12:55:06.557305 - osd.17 submits I/O to osd.16
>>> > 2015-09-22 12:55:06.573711 - osd.16 gets I/O from osd.17
>>> > 2015-09-22 12:55:06.595716 - osd.17 gets ondisk result=0 from osd.16
>>> > 2015-09-22 12:55:06.640631 - osd.16 reports to osd.17 ondisk result=0
>>> > 2015-09-22 12:55:36.926691 - osd.17 reports slow I/O > 30.439150 sec
>>> > 2015-09-22 12:55:59.790591 - osd.13 gets I/O from osd.17
>>> > 2015-09-22 12:55:59.812405 - osd.17 gets ondisk result=0 from osd.13
>>> > 2015-09-22 12:56:02.941602 - osd.13 reports to osd.17 ondisk result=0
>>> >
>>> > In the logs I can see that osd.17 dispatches the I/O to osd.13 and
>>> > osd.16 almost silmutaniously. osd.16 seems to get the I/O right away,
>>> > but for some reason osd.13 doesn't get the message until 53 seconds
>>> > later. osd.17 seems happy to just wait and doesn't resend the data
>>> > (well, I'm not 100% sure how to tell which entries are the actual data
>>> > transfer).
>>> >
>>> > It looks like osd.17 is receiving responses to start the communication
>>> > with osd.13, but the op 

[ceph-users] rgw cache lru size

2015-09-23 Thread Ben Hines
We have a ton of memory on our RGW servers, 96GB.

Can someone explain how the rgw lru cache functions? Is it worth
bumping the 'rgw cache lru size' to a huge number?

Our gateway seems to only be using about 1G of memory with the default setting.
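
(For what it's worth, a sketch of where the knob lives; the section name is an
assumption and, as I recall, the default is 10000 entries:)

  [client.radosgw.gateway]
  rgw cache lru size = 100000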

Also currently still using apache/fastcgi due to the extra
configurability and logging of apache. Willing to switch to civetweb
if given a good reason..

thanks-

-Ben


Re: [ceph-users] Basic object storage question

2015-09-23 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

If you use RADOS gateway, RBD or CephFS, then you don't need to worry
about striping. If you write your own application that uses librados,
then you have to worry about it. I understand that there is a
radosstriper library that should help with that. There is also a limit
to the size of an object that can be stored. I think I've seen the
number of 100GB thrown around.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Sep 23, 2015 at 7:04 PM, Cory Hawkless  wrote:
> Ok, so I have found this
>
>
>
> “The objects Ceph stores in the Ceph Storage Cluster are not striped. Ceph
> Object Storage, Ceph Block Device, and the Ceph Filesystem stripe their data
> over multiple Ceph Storage Cluster objects. Ceph Clients that write directly
> to the Ceph Storage Cluster via librados must perform the striping (and
> parallel I/O) for themselves to obtain these benefits.” On -
> http://docs.ceph.com/docs/master/architecture/
>
>
>
> So it appears that the breaking a single object up into chunks(or stripes)
> are the responsibility of the application writing to Ceph, not of the RADOS
> engine itself?
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Cory Hawkless
> Sent: Thursday, 24 September 2015 10:22 AM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Basic object storage question
>
>
>
> Hi all,
>
>
>
> I have basic question around how Ceph stores individual objects.
>
> Say I have a pool with a replica size of 3 and I upload a 1GB file to this
> pool. It appears as if this 1GB file gets placed into 3PG’s on 3 OSD’s ,
> simple enough?
>
>
>
> Are individual objects never split up? What if I want to storage backup
> files or Openstack Glance images totalling 100’s of GB’s.
>
>
>
> Potentially I could run into issues if I have an object whose size exceeds
> the available space on any of the OSDs, say I have 1TB OSDs and they are
> all 50% full and I try to upload a 501GB image. I presume this would fail
> even though there is sufficient space in the pool, because there is not a single OSD
> with >500GB of space available.
>
>
>
> Do I have this right? If so is there any way around this? Ideally I’d like
> to use Ceph as target for all of my servers backups, but some of these total
> in the TB’s but none of my OSD’s are this big(Currently using 900GB SAS
> disks).
>
>
>
> Thanks in advance
>
> Cory
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.1.0
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWA3acCRDmVDuy+mK58QAAoXkP/R6+fWoKhqhZv0gIUTFo
tm0AEhbr/z7U1bwaeDLUgc01DuIg3GNPLSGXKB1kdO7A6nfl6KdEUJwFLUAR
kBRV3CEtPpgCVxaLu91KbHrnlfVFFvzea+H65z7YhWNqiTiji76ZiEbr0K79
9eenwgkNsPqhdDjIbklCIyKz/Ny8sxht78j6V9gda81v6ZexuyNqJ4/chAA7
uw9PXPw4o+1onrOK2O5LCbMzcD5WOBO94B67GNkiSUTYavioOM4SMJhyXKB3
69VR47CMnrvPpMAWaPp5VngoCRaGzDfIRGFXsDgoTilPH4xAHO/u0kp1sm30
m/L8Eqfo9BR6ZEct2xdVs00puPUqR+pPHUTvHdodgGxumsBraA4D2gAU4yLA
akEtnlmvI1GQjdMpQIIncs3D1KS3Y1enUBL5AbKbwfdfiJ0MqizqaoBvogbR
3AzeQptL1PuGvirQf9MmNI9i3FK4XeU1NFQVeV0FteA+sW38l6pnZ5503cBh
24MpDYDlQW578HRPIHaZfVLG8mauyKOZL9ntp3RjFT1MNvR0E+NFXou+At1+
8UqpLdwnZscRtXpdBksAlyof+ArKFkujOvCZSIir5QZP4f/8MWsOVLcqFfde
U4JxShKbN8XLkK/UJ+f2atqSBQtlfox3HZhA/nJYyZKexbpKS+vH5M1Awkj+
xIbz
=xBpT
-END PGP SIGNATURE-


Re: [ceph-users] Important security noticed regarding release signing key

2015-09-23 Thread wangsongbo

Hi  Ken,
Just now I ran teuthology-suites in our testing; it failed because these 
packages are missing,
such as qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64, 
qemu-kvm-tools-0.12.1.2-2.415.el6.3ceph etc.
The change "rm ceph-extras repository config #137" only removed the repository, 
but did not resolve ansible's dependency.
How can this dependency be resolved?

Thanks and Regards,
WangSongbo


On 15/9/23 10:50 AM, Ken Dreyer wrote:

Hi Songbo, It's been removed from Ansible now:
https://github.com/ceph/ceph-cm-ansible/pull/137

- Ken

On Tue, Sep 22, 2015 at 8:33 PM, wangsongbo  wrote:

Hi Ken,
 Thanks for your reply. But in the ceph-cm-ansible project scheduled by
teuthology, "ceph.com/packages/ceph-extras" is in use now, such as
qemu-kvm-0.12.1.2-2.415.el6.3ceph, qemu-kvm-tools-0.12.1.2-2.415.el6.3ceph
etc.
 Any new releases will be provided ?


On 15/9/22 10:24 PM, Ken Dreyer wrote:

On Tue, Sep 22, 2015 at 2:38 AM, Songbo Wang  wrote:

Hi, all,
  Since last week's attack, "ceph.com/packages/ceph-extras" can
no longer be opened; where can I get the releases of ceph-extras now?

Thanks and Regards,
WangSongbo


The packages in "ceph-extras" were old and subject to CVEs (the big
one being VENOM, CVE-2015-3456). So I don't intend to host ceph-extras
in the new location.

- Ken






[ceph-users] mon timeout

2015-09-23 Thread 黑铁柱
I cannot connect to my mon.0: probe timeout!

ceph version: 0.80.7
I have only one mon server ---> 10.123.5.29:6789;
why is it probing for other mons?

ceph.conf
///
[global]
auth service required = cephx
filestore xattr use omap = true
auth client required = cephx
auth cluster required = cephx
mon host = 10.123.5.29
mon initial members = 10_123_5_29
fsid = eeabaeaf-4eb0-468a-a0e0-a3a3bece28af
debug ms = 1/5
[mon]
mon data = /var/lib/ceph/mon/ceph-$id
debug mon = 20
debug paxos = 1/5
debug auth = 2
[mon.0]
host = 10_123_5_29
mon addr = 10.123.5.29:6789
///


ceph-mon.0.log:
2015-09-24 10:13:02.051630 7f5fbe06f700  4 mon.0@-1(probing) e0
probe_timeout 0x29201b0
2015-09-24 10:13:02.051651 7f5fbe06f700 10 mon.0@-1(probing) e0 bootstrap
2015-09-24 10:13:02.051653 7f5fbe06f700 10 mon.0@-1(probing) e0
sync_reset_requester
2015-09-24 10:13:02.051654 7f5fbe06f700 10 mon.0@-1(probing) e0
unregister_cluster_logger - not registered
2015-09-24 10:13:02.051656 7f5fbe06f700 10 mon.0@-1(probing) e0
cancel_probe_timeout (none scheduled)
2015-09-24 10:13:02.051658 7f5fbe06f700 10 mon.0@-1(probing) e0 _reset
2015-09-24 10:13:02.051659 7f5fbe06f700 10 mon.0@-1(probing) e0
cancel_probe_timeout (none scheduled)
2015-09-24 10:13:02.051661 7f5fbe06f700 10 mon.0@-1(probing) e0
timecheck_finish
2015-09-24 10:13:02.051663 7f5fbe06f700 10 mon.0@-1(probing) e0 scrub_reset
2015-09-24 10:13:02.051669 7f5fbe06f700 10 mon.0@-1(probing) e0
cancel_probe_timeout (none scheduled)
2015-09-24 10:13:02.051671 7f5fbe06f700 10 mon.0@-1(probing) e0
reset_probe_timeout 0x29201a0 after 2 seconds
2015-09-24 10:13:02.051679 7f5fbe06f700 10 mon.0@-1(probing) e0 probing
other monitors
2015-09-24 10:13:02.051683 7f5fbe06f700  1 -- 10.123.5.29:6789/0 --> mon.0
0.0.0.0:0/1 -- mon_probe(probe eeabaeaf-4eb0-468a-a0e0-a3a3bece28af name 0
new) v6 -- ?+0 0x2d3b980
q12015-09-24 10:13:04.028927 7f5fbe06f700 11 mon.0@-1(probing) e0 tick
2015-09-24 10:13:04.028950 7f5fbe06f700 20 mon.0@-1(probing) e0
sync_trim_providers
2015-09-24 10:13:04.051741 7f5fbe06f700  4 mon.0@-1(probing) e0
probe_timeout 0x29201a0
2015-09-24 10:13:04.051753 7f5fbe06f700 10 mon.0@-1(probing) e0 bootstrap
2015-09-24 10:13:04.051755 7f5fbe06f700 10 mon.0@-1(probing) e0
sync_reset_requester
2015-09-24 10:13:04.051756 7f5fbe06f700 10 mon.0@-1(probing) e0
unregister_cluster_logger - not registered
2015-09-24 10:13:04.051758 7f5fbe06f700 10 mon.0@-1(probing) e0
cancel_probe_timeout (none scheduled)
2015-09-24 10:13:04.051760 7f5fbe06f700 10 mon.0@-1(probing) e0 _reset
2015-09-24 10:13:04.051761 7f5fbe06f700 10 mon.0@-1(probing) e0
cancel_probe_timeout (none scheduled)
2015-09-24 10:13:04.051762 7f5fbe06f700 10 mon.0@-1(probing) e0
timecheck_finish
2015-09-24 10:13:04.051765 7f5fbe06f700 10 mon.0@-1(probing) e0 scrub_reset
2015-09-24 10:13:04.051770 7f5fbe06f700 10 mon.0@-1(probing) e0
cancel_probe_timeout (none scheduled)
2015-09-24 10:13:04.051772 7f5fbe06f700 10 mon.0@-1(probing) e0
reset_probe_timeout 0x2920200 after 2 seconds
2015-09-24 10:13:04.051780 7f5fbe06f700 10 mon.0@-1(probing) e0 probing
other monitors
2015-09-24 10:13:04.051783 7f5fbe06f700  1 -- 10.123.5.29:6789/0 --> mon.0
0.0.0.0:0/1 -- mon_probe(probe eeabaeaf-4eb0-468a-a0e0-a3a3bece28af name 0
new) v6 -- ?+0 0x2d3b700
///
problem:
[root@10_123_5_29 /var/log/ceph]# ceph -w
2015-09-24 09:43:10.738514 7fb5e88ae700  0 monclient(hunting): authenticate
timed out after 300
2015-09-24 09:43:10.738555 7fb5e88ae700  0 librados: client.admin
authentication error (110) Connection timed out
Error connecting to cluster: TimedOut
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon timeout

2015-09-23 Thread 黑铁柱
[root@10_123_5_29 /var/log/ceph]# ceph --admin-daemon
/var/run/ceph/ceph-mon.0.asok mon_status
{ "name": "0",
  "rank": -1,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [
"10.123.5.29:6789\/0"],
  "sync_provider": [],
  "monmap": { "epoch": 0,
  "fsid": "eeabaeaf-4eb0-468a-a0e0-a3a3bece28af",
  "modified": "2015-09-23 22:03:45.415605",
  "created": "2015-09-23 22:03:45.415605",
  "mons": [
{ "rank": 0,
  "name": "10_123_5_29",
  "addr": "0.0.0.0:0\/1"}]}}

[root@10_123_5_29 /var/log/ceph]#  ceph --admin-daemon
/var/run/ceph/ceph-mon.0.asok quorum_status
{ "election_epoch": 0,
  "quorum": [],
  "quorum_names": [],
  "quorum_leader_name": "",
  "monmap": { "epoch": 0,
  "fsid": "eeabaeaf-4eb0-468a-a0e0-a3a3bece28af",
  "modified": "2015-09-23 22:03:45.415605",
  "created": "2015-09-23 22:03:45.415605",
  "mons": [
{ "rank": 0,
  "name": "10_123_5_29",
  "addr": "0.0.0.0:0\/1"}]}}

2015-09-24 10:19 GMT+08:00 黑铁柱 :

> I can not connect my mon.0.   probe timeout!!
>
> ceph version:0.80.7
> I just have one mon server--->10.123.5.29:6789;
> why it find other mon?
>
> ceph.conf
> ///
> [global]
> auth service required = cephx
> filestore xattr use omap = true
> auth client required = cephx
> auth cluster required = cephx
> mon host = 10.123.5.29
> mon initial members = 10_123_5_29
> fsid = eeabaeaf-4eb0-468a-a0e0-a3a3bece28af
> debug ms = 1/5
> [mon]
> mon data = /var/lib/ceph/mon/ceph-$id
> debug mon = 20
> debug paxos = 1/5
> debug auth = 2
> [mon.0]
> host = 10_123_5_29
> mon addr = 10.123.5.29:6789
> ///
>
>
> ceph-mon.0.log:
> 2015-09-24 10:13:02.051630 7f5fbe06f700  4 mon.0@-1(probing) e0
> probe_timeout 0x29201b0
> 2015-09-24 10:13:02.051651 7f5fbe06f700 10 mon.0@-1(probing) e0 bootstrap
> 2015-09-24 10:13:02.051653 7f5fbe06f700 10 mon.0@-1(probing) e0
> sync_reset_requester
> 2015-09-24 10:13:02.051654 7f5fbe06f700 10 mon.0@-1(probing) e0
> unregister_cluster_logger - not registered
> 2015-09-24 10:13:02.051656 7f5fbe06f700 10 mon.0@-1(probing) e0
> cancel_probe_timeout (none scheduled)
> 2015-09-24 10:13:02.051658 7f5fbe06f700 10 mon.0@-1(probing) e0 _reset
> 2015-09-24 10:13:02.051659 7f5fbe06f700 10 mon.0@-1(probing) e0
> cancel_probe_timeout (none scheduled)
> 2015-09-24 10:13:02.051661 7f5fbe06f700 10 mon.0@-1(probing) e0
> timecheck_finish
> 2015-09-24 10:13:02.051663 7f5fbe06f700 10 mon.0@-1(probing) e0
> scrub_reset
> 2015-09-24 10:13:02.051669 7f5fbe06f700 10 mon.0@-1(probing) e0
> cancel_probe_timeout (none scheduled)
> 2015-09-24 10:13:02.051671 7f5fbe06f700 10 mon.0@-1(probing) e0
> reset_probe_timeout 0x29201a0 after 2 seconds
> 2015-09-24 10:13:02.051679 7f5fbe06f700 10 mon.0@-1(probing) e0 probing
> other monitors
> 2015-09-24 10:13:02.051683 7f5fbe06f700  1 -- 10.123.5.29:6789/0 -->
> mon.0 0.0.0.0:0/1 -- mon_probe(probe eeabaeaf-4eb0-468a-a0e0-a3a3bece28af
> name 0 new) v6 -- ?+0 0x2d3b980
> q12015-09-24 10:13:04.028927 7f5fbe06f700 11 mon.0@-1(probing) e0 tick
> 2015-09-24 10:13:04.028950 7f5fbe06f700 20 mon.0@-1(probing) e0
> sync_trim_providers
> 2015-09-24 10:13:04.051741 7f5fbe06f700  4 mon.0@-1(probing) e0
> probe_timeout 0x29201a0
> 2015-09-24 10:13:04.051753 7f5fbe06f700 10 mon.0@-1(probing) e0 bootstrap
> 2015-09-24 10:13:04.051755 7f5fbe06f700 10 mon.0@-1(probing) e0
> sync_reset_requester
> 2015-09-24 10:13:04.051756 7f5fbe06f700 10 mon.0@-1(probing) e0
> unregister_cluster_logger - not registered
> 2015-09-24 10:13:04.051758 7f5fbe06f700 10 mon.0@-1(probing) e0
> cancel_probe_timeout (none scheduled)
> 2015-09-24 10:13:04.051760 7f5fbe06f700 10 mon.0@-1(probing) e0 _reset
> 2015-09-24 10:13:04.051761 7f5fbe06f700 10 mon.0@-1(probing) e0
> cancel_probe_timeout (none scheduled)
> 2015-09-24 10:13:04.051762 7f5fbe06f700 10 mon.0@-1(probing) e0
> timecheck_finish
> 2015-09-24 10:13:04.051765 7f5fbe06f700 10 mon.0@-1(probing) e0
> scrub_reset
> 2015-09-24 10:13:04.051770 7f5fbe06f700 10 mon.0@-1(probing) e0
> cancel_probe_timeout (none scheduled)
> 2015-09-24 10:13:04.051772 7f5fbe06f700 10 mon.0@-1(probing) e0
> reset_probe_timeout 0x2920200 after 2 seconds
> 2015-09-24 10:13:04.051780 7f5fbe06f700 10 mon.0@-1(probing) e0 probing
> other monitors
> 2015-09-24 10:13:04.051783 7f5fbe06f700  1 -- 10.123.5.29:6789/0 -->
> mon.0 0.0.0.0:0/1 -- mon_probe(probe eeabaeaf-4eb0-468a-a0e0-a3a3bece28af
> name 0 new) v6 -- ?+0 0x2d3b700
> ///
> problem:
> [root@10_123_5_29 /var/log/ceph]# ceph -w
> 2015-09-24 09:43:10.738514 7fb5e88ae700  0 monclient(hunting):
> authenticate timed out after 300
> 2015-09-24 09:43:10.738555 7fb5e88ae700  0 librados: client.admin
> authentication error (110) Connection timed out
> Error connecting to cluster: TimedOut
>

Re: [ceph-users] mon timeout

2015-09-23 Thread 黑铁柱
I found that the cause of this problem is the mon name: it cannot be just a plain number.
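
For anyone hitting the same symptom: the mon_status output above shows the running daemon is named "0" while "mon initial members" lists "10_123_5_29", so the monitor apparently never recognises its own slot in the monmap (note the addr of "0.0.0.0:0") and keeps probing. A minimal sketch of a consistent single-mon ceph.conf, where the name used in "mon initial members", the [mon.X] section header and the running daemon all agree (the non-numeric name "node1" below is a made-up placeholder):

///
[global]
fsid = eeabaeaf-4eb0-468a-a0e0-a3a3bece28af
mon initial members = node1
mon host = 10.123.5.29
[mon.node1]
host = node1
mon addr = 10.123.5.29:6789
///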

2015-09-24 10:22 GMT+08:00 黑铁柱 :

> [root@10_123_5_29 /var/log/ceph]# ceph --admin-daemon
> /var/run/ceph/ceph-mon.0.asok mon_status
> { "name": "0",
>   "rank": -1,
>   "state": "probing",
>   "election_epoch": 0,
>   "quorum": [],
>   "outside_quorum": [],
>   "extra_probe_peers": [
> "10.123.5.29:6789\/0"],
>   "sync_provider": [],
>   "monmap": { "epoch": 0,
>   "fsid": "eeabaeaf-4eb0-468a-a0e0-a3a3bece28af",
>   "modified": "2015-09-23 22:03:45.415605",
>   "created": "2015-09-23 22:03:45.415605",
>   "mons": [
> { "rank": 0,
>   "name": "10_123_5_29",
>   "addr": "0.0.0.0:0\/1"}]}}
>
> [root@10_123_5_29 /var/log/ceph]#  ceph --admin-daemon
> /var/run/ceph/ceph-mon.0.asok quorum_status
> { "election_epoch": 0,
>   "quorum": [],
>   "quorum_names": [],
>   "quorum_leader_name": "",
>   "monmap": { "epoch": 0,
>   "fsid": "eeabaeaf-4eb0-468a-a0e0-a3a3bece28af",
>   "modified": "2015-09-23 22:03:45.415605",
>   "created": "2015-09-23 22:03:45.415605",
>   "mons": [
> { "rank": 0,
>   "name": "10_123_5_29",
>   "addr": "0.0.0.0:0\/1"}]}}
>
> 2015-09-24 10:19 GMT+08:00 黑铁柱 :
>
>> I can not connect my mon.0.   probe timeout!!
>>
>> ceph version:0.80.7
>> I just have one mon server--->10.123.5.29:6789;
>> why it find other mon?
>>
>> ceph.conf
>> ///
>> [global]
>> auth service required = cephx
>> filestore xattr use omap = true
>> auth client required = cephx
>> auth cluster required = cephx
>> mon host = 10.123.5.29
>> mon initial members = 10_123_5_29
>> fsid = eeabaeaf-4eb0-468a-a0e0-a3a3bece28af
>> debug ms = 1/5
>> [mon]
>> mon data = /var/lib/ceph/mon/ceph-$id
>> debug mon = 20
>> debug paxos = 1/5
>> debug auth = 2
>> [mon.0]
>> host = 10_123_5_29
>> mon addr = 10.123.5.29:6789
>> ///
>>
>>
>> ceph-mon.0.log:
>> 2015-09-24 10:13:02.051630 7f5fbe06f700  4 mon.0@-1(probing) e0
>> probe_timeout 0x29201b0
>> 2015-09-24 10:13:02.051651 7f5fbe06f700 10 mon.0@-1(probing) e0 bootstrap
>> 2015-09-24 10:13:02.051653 7f5fbe06f700 10 mon.0@-1(probing) e0
>> sync_reset_requester
>> 2015-09-24 10:13:02.051654 7f5fbe06f700 10 mon.0@-1(probing) e0
>> unregister_cluster_logger - not registered
>> 2015-09-24 10:13:02.051656 7f5fbe06f700 10 mon.0@-1(probing) e0
>> cancel_probe_timeout (none scheduled)
>> 2015-09-24 10:13:02.051658 7f5fbe06f700 10 mon.0@-1(probing) e0 _reset
>> 2015-09-24 10:13:02.051659 7f5fbe06f700 10 mon.0@-1(probing) e0
>> cancel_probe_timeout (none scheduled)
>> 2015-09-24 10:13:02.051661 7f5fbe06f700 10 mon.0@-1(probing) e0
>> timecheck_finish
>> 2015-09-24 10:13:02.051663 7f5fbe06f700 10 mon.0@-1(probing) e0
>> scrub_reset
>> 2015-09-24 10:13:02.051669 7f5fbe06f700 10 mon.0@-1(probing) e0
>> cancel_probe_timeout (none scheduled)
>> 2015-09-24 10:13:02.051671 7f5fbe06f700 10 mon.0@-1(probing) e0
>> reset_probe_timeout 0x29201a0 after 2 seconds
>> 2015-09-24 10:13:02.051679 7f5fbe06f700 10 mon.0@-1(probing) e0 probing
>> other monitors
>> 2015-09-24 10:13:02.051683 7f5fbe06f700  1 -- 10.123.5.29:6789/0 -->
>> mon.0 0.0.0.0:0/1 -- mon_probe(probe
>> eeabaeaf-4eb0-468a-a0e0-a3a3bece28af name 0 new) v6 -- ?+0 0x2d3b980
>> q12015-09-24 10:13:04.028927 7f5fbe06f700 11 mon.0@-1(probing) e0 tick
>> 2015-09-24 10:13:04.028950 7f5fbe06f700 20 mon.0@-1(probing) e0
>> sync_trim_providers
>> 2015-09-24 10:13:04.051741 7f5fbe06f700  4 mon.0@-1(probing) e0
>> probe_timeout 0x29201a0
>> 2015-09-24 10:13:04.051753 7f5fbe06f700 10 mon.0@-1(probing) e0 bootstrap
>> 2015-09-24 10:13:04.051755 7f5fbe06f700 10 mon.0@-1(probing) e0
>> sync_reset_requester
>> 2015-09-24 10:13:04.051756 7f5fbe06f700 10 mon.0@-1(probing) e0
>> unregister_cluster_logger - not registered
>> 2015-09-24 10:13:04.051758 7f5fbe06f700 10 mon.0@-1(probing) e0
>> cancel_probe_timeout (none scheduled)
>> 2015-09-24 10:13:04.051760 7f5fbe06f700 10 mon.0@-1(probing) e0 _reset
>> 2015-09-24 10:13:04.051761 7f5fbe06f700 10 mon.0@-1(probing) e0
>> cancel_probe_timeout (none scheduled)
>> 2015-09-24 10:13:04.051762 7f5fbe06f700 10 mon.0@-1(probing) e0
>> timecheck_finish
>> 2015-09-24 10:13:04.051765 7f5fbe06f700 10 mon.0@-1(probing) e0
>> scrub_reset
>> 2015-09-24 10:13:04.051770 7f5fbe06f700 10 mon.0@-1(probing) e0
>> cancel_probe_timeout (none scheduled)
>> 2015-09-24 10:13:04.051772 7f5fbe06f700 10 mon.0@-1(probing) e0
>> reset_probe_timeout 0x2920200 after 2 seconds
>> 2015-09-24 10:13:04.051780 7f5fbe06f700 10 mon.0@-1(probing) e0 probing
>> other monitors
>> 2015-09-24 10:13:04.051783 7f5fbe06f700  1 -- 10.123.5.29:6789/0 -->
>> mon.0 0.0.0.0:0/1 -- mon_probe(probe
>> eeabaeaf-4eb0-468a-a0e0-a3a3bece28af name 0 new) v6 -- ?+0 0x2d3b700
>> ///
>> problem:
>> [root@10_123_5_29 /var/log/ceph]# ceph -w
>> 2015-09-24 09:43:10.738514 

[ceph-users] Basic object storage question

2015-09-23 Thread Cory Hawkless
Hi all,

I have a basic question around how Ceph stores individual objects.
Say I have a pool with a replica size of 3 and I upload a 1GB file to this
pool. It appears as if this 1GB file gets placed into 3 PGs on 3 OSDs, simple
enough?

Are individual objects never split up? What if I want to store backup files
or OpenStack Glance images totalling hundreds of GBs?

Potentially I could run into issues if I have an object whose size exceeds the
available space on any of the OSDs: say I have 1TB OSDs that are all 50%
full and I try to upload a 501GB image. I presume this would fail because even
though there is sufficient space in the pool, there is not a single OSD with
>500GB of space available.

Do I have this right? If so, is there any way around this? Ideally I'd like to
use Ceph as a target for all of my servers' backups, but some of these total
in the TBs and none of my OSDs are that big (currently using 900GB SAS disks).

Thanks in advance
Cory


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EU Ceph mirror changes

2015-09-23 Thread Matt Taylor

Apologies for the delay!

au.ceph.com has been updated accordingly.

Regards,
Matthew.

On 22/09/2015 00:03, Wido den Hollander wrote:

Hi,

Since the security notice regarding ceph.com, the mirroring system broke.
This meant that eu.ceph.com didn't serve new packages since the whole
download system changed.

I didn't have much time to fix this, but today I resolved it by
installing Varnish [0] on eu.ceph.com

The VCL which is being used on eu.ceph.com can be found on my Github
gist page [1].

All URLs to eu.ceph.com should still work and data is served from the EU.

rsync is however no longer available since all data is stored in memory
and is downloaded from 'download.ceph.com' when not available locally.

It's not an ideal mirroring system right now, but it still works. If you
have multiple machines downloading the same package, the first request
for a RPM / DEB might be a cache-miss which requires a download from the
US, but afterwards it should serve the other requests from cache.

The VCL [1] I use can also be used locally on your own mirror. Feel free
to use it. I'll try to keep the one on Github as up to date as possible.

Wido

[0]: http://www.varnish-cache.org/
[1]: https://gist.github.com/wido/40ffb92ea99842c2666b


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Basic object storage question

2015-09-23 Thread Cory Hawkless
OK, so I have found this:

"The objects Ceph stores in the Ceph Storage Cluster are not striped. Ceph 
Object Storage, Ceph Block Device, and the Ceph Filesystem stripe their data 
over multiple Ceph Storage Cluster objects. Ceph Clients that write directly to 
the Ceph Storage Cluster via librados must perform the striping (and parallel 
I/O) for themselves to obtain these benefits." On - 
http://docs.ceph.com/docs/master/architecture/

So it appears that breaking a single object up into chunks (or stripes) is
the responsibility of the application writing to Ceph, not of the RADOS engine
itself?
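
As a rough illustration of what that application-side chunking can look like, here is a minimal sketch (not from this thread) that splits a large backup into 64MB pieces and stores each piece as its own RADOS object, so no single OSD ever has to hold the whole file. The pool name "backups", the 64MB chunk size and the object naming scheme are all made-up choices:

FILE=backup.tar
POOL=backups
# split the file into numbered 64MB pieces: backup.tar.part.00, .part.01, ...
split -b 64M -d "$FILE" "$FILE.part."
# store each piece as its own object, so placement happens per 64MB object
for part in "$FILE".part.*; do
    rados -p "$POOL" put "$part" "$part"
done

RBD and RGW do the equivalent internally, striping images and large objects over many small RADOS objects, which is why they do not run into the single-OSD size limit described earlier in the thread.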

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Cory 
Hawkless
Sent: Thursday, 24 September 2015 10:22 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Basic object storage question

Hi all,

I have a basic question around how Ceph stores individual objects.
Say I have a pool with a replica size of 3 and I upload a 1GB file to this
pool. It appears as if this 1GB file gets placed into 3 PGs on 3 OSDs, simple
enough?

Are individual objects never split up? What if I want to store backup files
or OpenStack Glance images totalling hundreds of GBs?

Potentially I could run into issues if I have an object whose size exceeds the
available space on any of the OSDs: say I have 1TB OSDs that are all 50%
full and I try to upload a 501GB image. I presume this would fail because even
though there is sufficient space in the pool, there is not a single OSD with
>500GB of space available.

Do I have this right? If so, is there any way around this? Ideally I'd like to
use Ceph as a target for all of my servers' backups, but some of these total
in the TBs and none of my OSDs are that big (currently using 900GB SAS disks).

Thanks in advance
Cory


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to get a mount list?

2015-09-23 Thread 黑铁柱
thanks

2015-09-21 17:29 GMT+08:00 John Spray :

> I'm assuming you mean from the server: you can list the clients of an
> MDS by SSHing to the server where it's running and doing "ceph daemon
> mds.<id> session ls".  This has been in releases since Giant iirc.
>
> Cheers,
> John
>
> On Mon, Sep 21, 2015 at 4:24 AM, domain0  wrote:
> > hi,
> > when using cephfs, how can I get a list of clients that have mounted it?
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
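
For anyone searching the archives, a minimal usage sketch of the command John refers to, assuming the MDS daemon id is "a" and the command is run on the host where that MDS is running:

ceph daemon mds.a session ls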
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com