[ceph-users] osd laggy algorithm

2015-03-11 Thread Artem Savinov
hello.
By default, ceph marks an OSD node down after receiving 3 reports about the
failed node. Reports are sent every osd heartbeat grace seconds, but with the
settings mon_osd_adjust_heartbeat_grace = true and
mon_osd_adjust_down_out_interval = true, the timeout for marking nodes down
may vary. Please tell me: what algorithm changes the timeout for marking
nodes down/out, and which parameters affect it?
thanks.
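For reference, the settings mentioned above live in ceph.conf; a minimal fragment might look like this (the values shown are illustrative defaults, not a recommendation):

```ini
[mon]
# let the monitors scale the heartbeat grace and the down/out interval
# based on observed laggy-ness instead of using the fixed values
mon osd adjust heartbeat grace = true
mon osd adjust down out interval = true

[osd]
# seconds without a heartbeat before an OSD reports a peer as failed
osd heartbeat grace = 20
```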

--
Artem
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Valery Tschopp

Hi Loic,

Nope, only the versions from 0.81-trusty to 0.93-1trusty are available 
in http://ceph.com/debian-testing/pool/main/c/ceph/


But the firefly deb source package for 0.80.9-1trusty is not available :(

Cheers,
Valery

On 11/03/15 14:11, Loic Dachary wrote:

Hi Valery,

They should be here http://ceph.com/debian-testing/

Cheers

On 11/03/2015 10:07, Valery Tschopp wrote:

Where can I find the debian trusty source package for v0.80.9?

Cheers,
Valery

On 10/03/15 20:34, Sage Weil wrote:

This is a bugfix release for firefly.  It fixes a performance regression
in librbd, an important CRUSH misbehavior (see below), and several RGW
bugs.  We have also backported support for flock/fcntl locks to ceph-fuse
and libcephfs.

We recommend that all Firefly users upgrade.

For more detailed information, see
http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt

Adjusting CRUSH maps
--------------------

* This point release fixes several issues with CRUSH that trigger
excessive data migration when adjusting OSD weights.  These are most
obvious when a very small weight change (e.g., a change from 0 to
.01) triggers a large amount of movement, but the same set of bugs
can also lead to excessive (though less noticeable) movement in
other cases.

However, because the bug may already have affected your cluster,
fixing it may trigger movement *back* to the more correct location.
For this reason, you must manually opt-in to the fixed behavior.

In order to set the new tunable to correct the behavior::

   ceph osd crush set-tunable straw_calc_version 1

Note that this change will have no immediate effect.  However, from
this point forward, any 'straw' bucket in your CRUSH map that is
adjusted will get non-buggy internal weights, and that transition
may trigger some rebalancing.

You can estimate how much rebalancing will eventually be necessary
on your cluster with::

   ceph osd getcrushmap -o /tmp/cm
   crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
   crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
   crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
   crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
   wc -l /tmp/a                          # num total mappings
   diff -u /tmp/a /tmp/b | grep -c ^+    # num changed mappings

 Divide the number of changed mappings by the total number of lines in
 /tmp/a to get the fraction of PGs that will move.  We've found that
 most clusters are under 10%.

 You can force all of this rebalancing to happen at once with::

   ceph osd crush reweight-all

 Otherwise, it will happen at some unknown point in the future when
 CRUSH weights are next adjusted.
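As a toy illustration of the arithmetic above (the mapping lines are invented stand-ins for the /tmp/a and /tmp/b dumps that crushtool produces, not real output):

```shell
# Two tiny canned mapping dumps standing in for /tmp/a and /tmp/b above.
a=$(mktemp); b=$(mktemp)
printf 'pg 0.0 -> [1,2,3]\npg 0.1 -> [2,3,4]\npg 0.2 -> [3,4,5]\npg 0.3 -> [4,5,6]\n' > "$a"
printf 'pg 0.0 -> [1,2,3]\npg 0.1 -> [2,3,4]\npg 0.2 -> [5,4,3]\npg 0.3 -> [4,5,6]\n' > "$b"
total=$(wc -l < "$a")                           # num total mappings
changed=$(diff -u "$a" "$b" | grep -c '^+pg')   # num changed mappings
# percentage of PGs expected to move when the new weights take effect
pct=$(awk -v c="$changed" -v t="$total" 'BEGIN { printf "%d", 100 * c / t }')
echo "$changed of $total mappings changed (${pct}%)"
rm -f "$a" "$b"
```

With the invented data, one of four mappings moves, i.e. 25%; on a real cluster the two crushtool runs from the release notes supply /tmp/a and /tmp/b.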

Notable Changes
---------------

* ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
* crush: fix straw bucket weight calculation, add straw_calc_version
tunable (#10095 Sage Weil)
* crush: fix tree bucket (Rongzu Zhu)
* crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
* crushtool: add --reweight (Sage Weil)
* librbd: complete pending operations before losing image (#10299 Jason
Dillaman)
* librbd: fix read caching performance regression (#9854 Jason Dillaman)
* librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
* mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
* osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
* osd: handle no-op write with snapshot (#10262 Sage Weil)
* radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
* rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis,
Yehuda Sadeh)
* rgw: don't overwrite bucket/object owner when setting ACLs (#10978
Yehuda Sadeh)
* rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
* rgw: fix partial swift GET (#10553 Yehuda Sadeh)
* rgw: fix quota disable (#9907 Dong Lei)
* rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
* rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
* rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
* rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda
Sadeh)
* rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
* rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
* rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh)
* rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh)
* rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil)
* rgw: update swift subuser permission masks when authenticating (#9918
Yehuda Sadeh)
* rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis,
Yehuda Sadeh)
* rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh)
* rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh)

Getting Ceph
------------

* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz
* For packages, see 

Re: [ceph-users] client crashed when osd gets restarted - hammer 0.93

2015-03-11 Thread Somnath Roy
Kevin,
This is a known issue and should be fixed in the latest krbd. The problem is, 
it is not backported to 14.04 krbd yet. You need to build it from latest krbd 
source if you want to stick with 14.04.
The workaround is, you need to unmap your clients before restarting osds.
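A sketch of that workaround, parsing 'rbd showmapped' to unmap every device on the client (the showmapped output below is a canned example and its column layout is an assumption; on a real client replace it with mapped=$(rbd showmapped), and swap the echo for the real unmap):

```shell
# Canned stand-in for: mapped=$(rbd showmapped)
mapped='id pool image snap device
0  rbd  vol1  -    /dev/rbd0
1  rbd  vol2  -    /dev/rbd1'
# Pull the device column, skipping the header row
devices=$(printf '%s\n' "$mapped" | awk 'NR > 1 { print $5 }')
for dev in $devices; do
    echo "would run: rbd unmap $dev"   # swap echo for the real rbd unmap
done
```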

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of kevin 
parrikar
Sent: Wednesday, March 11, 2015 11:44 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] client crashed when osd gets restarted - hammer 0.93

Hi,
 I am trying hammer 0.93 on Ubuntu 14.04.
rbd is mapped in the client, which is also Ubuntu 14.04.
When I stopped ceph-osd-all and then started it again, the client machine
crashed and the attached pic was in the console. Not sure if it's related to ceph.

Thanks



PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).



[ceph-users] Cache Tier Flush = immediate base tier journal sync?

2015-03-11 Thread Nick Fisk
I'm not sure if it's something I'm doing wrong or just an oddity, but when
my cache tier flushes dirty blocks out to the base tier, the writes seem to
hit the OSDs straight away instead of coalescing in the journals. Is this
correct?

For example, if I create an RBD on a standard 3-way replica pool and run fio
via librbd with 128k writes, I see the journals take all the IOs until I hit
my filestore_min_sync_interval, and then I see it start writing to the
underlying disks.
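For what it's worth, a fio job along the lines described might look like this (a sketch using fio's librbd engine; the client, pool, and image names are assumptions, not from the post):

```ini
[global]
ioengine=rbd           ; fio's librbd engine
clientname=admin       ; cephx user, assumed
pool=rbd               ; assumed pool name
rbdname=fio-test       ; assumed test image
rw=write
bs=128k
iodepth=32

[cache-tier-write-test]
```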

Doing the same on a full cache tier (to force flushing), I immediately see
the base disks at very high utilisation. The journals also have some write
IO at the same time. The only other odd thing I can see via iostat while I'm
running fio is that the underlying disks are doing very small write IOs of
around 16kb with an occasional big burst of activity.

I know erasure coding+cache tier is slower than just plain replicated pools,
but even with various high queue depths I'm struggling to get much above
100-150 iops compared to a 3 way replica pool which can easily achieve
1000-1500. The base tier is comprised of 40 disks. It seems quite a marked
difference and I'm wondering if this strange journal behaviour is the cause.

Does anyone have any ideas?

Nick




Re: [ceph-users] Duplication name Container

2015-03-11 Thread Steffen W Sørensen
On 11/03/2015, at 15.31, Wido den Hollander w...@42on.com wrote:

 On 03/11/2015 03:23 PM, Jimmy Goffaux wrote:
 Hello All,
 
 I use Ceph in production for several months. but i have an errors with
 Ceph Rados Gateway for multiple users.
 
 I am faced with the following error:
 
 Error trying to create container 'xs02': 409 Conflict: BucketAlreadyExists
 
 Which corresponds to the documentation :
 http://ceph.com/docs/master/radosgw/s3/bucketops/
 
 By which means can I avoid this kind of problem?
 
 You can not. Bucket names are unique inside the RADOS Gateway. Just as
 with Amazon S3.
Well, it can be avoided, though not at the Ceph level but at your application
level :)
Either ignore already-exists errors in your app, or verify that the bucket
exists before creating it... 
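A sketch of the "ignore already-exists" approach at the application level. Here create_bucket is a stub standing in for the real S3 create-bucket call; it reports the same BucketAlreadyExists error code the 409 Conflict above carries:

```shell
create_bucket() {
    echo "BucketAlreadyExists"   # stand-in for a 409 Conflict response body
    return 1
}
if err=$(create_bucket xs02); then
    echo "bucket created"
else
    # treat "already exists" as success, propagate anything else
    case "$err" in
        *BucketAlreadyExists*) echo "bucket already exists, continuing" ;;
        *) echo "create failed: $err" >&2; exit 1 ;;
    esac
fi
```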

/Steffen




Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-11 Thread Steffen W Sørensen
On 11/03/2015, at 08.19, Steffen W Sørensen ste...@me.com wrote:

 On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:
 
 What kind of application is that?
 Commercial Email platform from Openwave.com
 
 Maybe it could be worked around using an apache rewrite rule. In any case, I 
 opened issue #11091.
 Okay, how, by rewriting the response?
 Thanks, where can tickets be followed/viewed?
Ah here: http://tracker.ceph.com/projects/rgw/issues

 Not at the moment. There's already issue #6961, I bumped its priority 
 higher, and we'll take a look at it.
Please also backport to Giant if possible :)

/Steffen






[ceph-users] client crashed when osd gets restarted - hammer 0.93

2015-03-11 Thread kevin parrikar
Hi,
 I am trying hammer 0.93 on Ubuntu 14.04.
rbd is mapped in the client, which is also Ubuntu 14.04.
When I stopped ceph-osd-all and then started it again, the client machine
crashed and the attached pic was in the console. Not sure if it's related to ceph.

Thanks


Re: [ceph-users] client crashed when osd gets restarted - hammer 0.93

2015-03-11 Thread kevin parrikar
thanks i will follow this work around.

On Thu, Mar 12, 2015 at 12:18 AM, Somnath Roy somnath@sandisk.com
wrote:

  Kevin,

 This is a known issue and should be fixed in the latest krbd. The problem
 is, it is not backported to 14.04 krbd yet. You need to build it from
 latest krbd source if you want to stick with 14.04.

 The workaround is, you need to unmap your clients before restarting osds.



 Thanks & Regards

 Somnath



 *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
 Of *kevin parrikar
 *Sent:* Wednesday, March 11, 2015 11:44 AM
 *To:* ceph-users@lists.ceph.com
 *Subject:* [ceph-users] client crashed when osd gets restarted - hammer
 0.93



 Hi,

  I am trying hammer 0.93 on Ubuntu 14.04.

 rbd is mapped in client ,which is also ubuntu 14.04 .

 When i did a stop ceph-osd-all and then a start,client machine crashed and
 attached pic was in the console.Not sure if its related to ceph.



 Thanks

 --



Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread Samuel Just
For each of those pgs, you'll need to identify the pg copy you want to 
be the winner and either
1) Remove all of the other ones using ceph-objectstore-tool and 
hopefully the winner you left alone will allow the pg to recover and go 
active.
2) Export the winner using ceph-objectstore-tool, use 
ceph-objectstore-tool to delete *all* copies of the pg, use 
force_create_pg to recreate the pg empty, use ceph-objectstore-tool to 
do a rados import on the exported pg copy.
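A hedged sketch of that second option. The pgid, paths, and flags are illustrative and should be checked against your ceph-objectstore-tool version; commands are echoed via run() rather than executed, so replace run with direct execution on a real cluster:

```shell
n=0
run() { n=$((n + 1)); echo "+ $*"; }   # echo-only wrapper; swap for real execution
pgid="7.100"
osd="/var/lib/ceph/osd/ceph-12"
run ceph-objectstore-tool --data-path "$osd" --journal-path "$osd/journal" \
    --pgid "$pgid" --op export --file "/tmp/pg.$pgid.export"
run ceph-objectstore-tool --data-path "$osd" --journal-path "$osd/journal" \
    --pgid "$pgid" --op remove          # repeat on *every* OSD holding a copy
run ceph pg force_create_pg "$pgid"     # recreate the pg empty
run ceph-objectstore-tool --data-path "$osd" --journal-path "$osd/journal" \
    --op import --file "/tmp/pg.$pgid.export"
```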


Also, the pgs which are still down still have replicas which need to be 
brought back or marked lost.

-Sam

On 03/11/2015 07:29 AM, joel.merr...@gmail.com wrote:

I'd like to not have to null them if possible; there's nothing
outlandishly valuable, it's more the time to reprovision (users have
stuff on there, mainly testing, but I have a nasty feeling some users
won't have backed up their test instances). When you say complicated
and fragile, could you expand?

Thanks again!
Joel

On Wed, Mar 11, 2015 at 1:21 PM, Samuel Just sj...@redhat.com wrote:

Ok, you lost all copies from an interval where the pgs went active. The
recovery from this is going to be complicated and fragile.  Are the pools
valuable?
-Sam


On 03/11/2015 03:35 AM, joel.merr...@gmail.com wrote:

For clarity too, I've tried dropping the min_size before, as suggested;
it doesn't make a difference, unfortunately

On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com
joel.merr...@gmail.com wrote:

Sure thing, n.b. I increased pg count to see if it would help. Alas not.
:)

Thanks again!

health_detail
https://gist.github.com/199bab6d3a9fe30fbcae

osd_dump
https://gist.github.com/499178c542fa08cc33bb

osd_tree
https://gist.github.com/02b62b2501cbd684f9b2

Random selected queries:
queries/0.19.query
https://gist.github.com/f45fea7c85d6e665edf8
queries/1.a1.query
https://gist.github.com/dd68fbd5e862f94eb3be
queries/7.100.query
https://gist.github.com/d4fd1fb030c6f2b5e678
queries/7.467.query
https://gist.github.com/05dbcdc9ee089bd52d0c

On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just sj...@redhat.com wrote:

Yeah, get a ceph pg query on one of the stuck ones.
-Sam

On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote:

Stuck unclean and stuck inactive. I can fire up a full query and
health dump somewhere useful if you want (full pg query info on ones
listed in health detail, tree, osd dump etc). There were blocked_by
operations that no longer exist after doing the OSD addition.

Side note, spent some time yesterday writing some bash to do this
programatically (might be useful to others, will throw on github)

On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just sj...@redhat.com wrote:

What do you mean by unblocked but still stuck?
-Sam

On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote:

On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just sj...@redhat.com wrote:

You'll probably have to recreate osds with the same ids (empty
ones),
let them boot, stop them, and mark them lost.  There is a feature in
the
tracker to improve this behavior:
http://tracker.ceph.com/issues/10976
-Sam

Thanks Sam, I've readded the OSDs, they became unblocked but there
are
still the same number of pgs stuck. I looked at them in some more
detail and it seems they all have num_bytes='0'. Tried a repair too,
for good measure. Still nothing I'm afraid.

Does this mean some underlying catastrophe has happened and they are
never going to recover? Following on, would that cause data loss?
There are no missing objects, and I'm hoping there's appropriate
checksumming / replicas to balance that out, but now I'm not so sure.

Thanks again,
Joel






--
$ echo kpfmAdpoofdufevq/dp/vl | perl -pe 's/(.)/chr(ord($1)-1)/ge'










Re: [ceph-users] PGs stuck unclean active+remapped after an osd marked out

2015-03-11 Thread Francois Lafont
Hi,

I was still in the same situation: I couldn't remove an OSD without
some PGs getting permanently stuck in the active+remapped state.

But I remembered reading on IRC that, before marking an OSD out, it
can sometimes be a good idea to reweight it to 0. So, instead of
doing [1]:

ceph osd out 3

I have tried [2]:

ceph osd crush reweight osd.3 0 # waiting for the rebalancing...
ceph osd out 3

and it worked. Then I could remove my osd with the online documentation:
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
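The drain-then-remove sequence above can be sketched with a wait for rebalancing between the two steps. The 'ceph' command is stubbed here (it reports HEALTH_WARN twice, then HEALTH_OK) so the sketch is self-contained; drop the stub on a real cluster:

```shell
state=$(mktemp); echo 0 > "$state"
ceph() {   # stub: counts calls in a file so pipelines/subshells still see it
    c=$(($(cat "$state") + 1)); echo "$c" > "$state"
    [ "$c" -ge 3 ] && echo "HEALTH_OK" || echo "HEALTH_WARN"
}
osd=3
ceph osd crush reweight "osd.$osd" 0 > /dev/null   # start draining the OSD
polls=0
until ceph health | grep -q HEALTH_OK; do          # wait for rebalancing
    polls=$((polls + 1))
done
ceph osd out "$osd" > /dev/null                    # now safe to mark it out
echo "osd.$osd drained after $polls health polls"
rm -f "$state"
```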

Now, the osd is removed and my cluster is HEALTH_OK. \o/

Now, my question is: why did my cluster get permanently stuck in
active+remapped with [1] but not with [2]? Personally, I have absolutely
no explanation. If you have one, I'd love to hear it. 

Should the reweight command be present in the online documentation?
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
If yes, I can make a pull request on the doc with pleasure. ;)

Regards.

-- 
François Lafont


Re: [ceph-users] ceph-osd pegging CPU on giant, no snapshots involved this time

2015-03-11 Thread Adolfo R. Brandes
On Wed, Feb 18, 2015 at 9:19 PM, Florian Haas wrote:
 Hey everyone,

 I must confess I'm still not fully understanding this problem and
 don't exactly know where to start digging deeper, but perhaps other
 users have seen this and/or it rings a bell.

 System info: Ceph giant on CentOS 7; approx. 240 OSDs, 6 pools using 2
 different rulesets where the problem applies to hosts and PGs using a
 bog-standard default crushmap.

 Symptom: out of the blue, ceph-osd processes on a single OSD node
 start going to 100% CPU utilization. The problem turns so bad that
 the machine is effectively becoming CPU bound and can't cope with any
 client requests anymore. Stopping and restarting all OSDs brings the
 problem right back, as does rebooting the machine — right after
 ceph-osd processes start, CPU utilization shoots up again. Stopping
 and marking out several OSDs on the machine makes the problem go away
 but obviously causes massive backfilling. All the logs show while CPU
 utilization is implausibly high are slow requests (which would be
 expected in a system that can barely do anything).

 Now I've seen issues like this before on dumpling and firefly, but
 besides the fact that they have all been addressed and should now be
 fixed, they always involved the prior mass removal of RBD snapshots.
 This system only used a handful of snapshots in testing, and is
 presently not using any snapshots at all.

 I'll be spending some time looking for clues in the log files of the
 OSDs that were shut down which caused the problem to go away, but if
 this sounds familiar to anyone willing to offer clues, I'd be more
 than interested. :) Thanks!

 Cheers,
 Florian

Dan vd Ster was kind enough to pitch in an incredibly helpful off-list
reply, which I am taking the liberty to paraphrase here:

That mysterious OSD madness seems to be caused by NUMA zone reclaim,
which is enabled by default on Intel machines with recent kernels. It
can be disabled as follows:

echo 0 > /proc/sys/vm/zone_reclaim_mode

or of course, sysctl -w vm.zone_reclaim_mode=0 or the corresponding
sysctl.conf entry.

On the machines affected, that seems to have removed the CPU pegging
issue, at least it has not reappeared for several days now.

Dan and Sage have discussed the issue recently in this thread:
http://www.spinics.net/lists/ceph-users/msg14914.html

Thanks a million to Dan.

I'm looking into the original issue Florian describes above.  It seems
that unsetting zone_reclaim_mode wasn't the magical fix we hoped.  After
a couple of weeks, we're seeing pegged CPUs again, but this time we
managed to get a perf top snapshot of it happening.  These are the topmost
(ahem) lines:

8.33% [kernel] [k] _raw_spin_lock
3.14% perf [.] 0x000da124
2.58% [unknown] [.] 0x7f8a2901042d
1.85% libpython2.7.so.1.0 [.] 0x0006dac2
1.61% libc-2.17.so [.] __memcpy_ssse3_back
1.54% perf [.] dso__find_symbol
1.44% libc-2.17.so [.] __strcmp_sse42
1.41% libpython2.7.so.1.0 [.] PyEval_EvalFrameEx
1.25% [kernel] [k] native_write_msr_safe
1.24% perf [.] hists__output_resort
1.11% libleveldb.so.1.0.7 [.] 0x0003cde8
0.86% perf [.] perf_evsel__parse_sample
0.81% libtcmalloc.so.4.1.2 [.] operator new(unsigned long)
0.76% libpython2.7.so.1.0 [.] PyEval_EvalFrameEx
0.73% [kernel] [k] apic_timer_interrupt
0.71% [kernel] [k] page_fault
0.71% [kernel] [k] _raw_spin_lock_irqsave
0.62% libpthread-2.17.so [.] pthread_mutex_unlock
0.62% libc-2.17.so [.] __memcmp_sse4_1
0.61% libc-2.17.so [.] _int_malloc
0.60% perf [.] rb_next
0.58% [kernel] [k] clear_page_c_e
0.56% [kernel] [k] tg_load_down

The server in question was booted without any OSDs.  A few were started after
invoking 'perf top', during which run the CPUs were saturated.

Any ideas?

Cheers!
Adolfo


Re: [ceph-users] Add monitor unsuccessful

2015-03-11 Thread Steffen W Sørensen

On 12/03/2015, at 00.55, Jesus Chavez (jeschave) jesch...@cisco.com wrote:

 can anybody tell me a good blog link that explains how to add a monitor? I have 
 tried manually and also with ceph-deploy, without success =(
Dunno if these might help U:

http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-a-monitor-manual

http://cephnotes.ksperis.com/blog/2013/08/29/mon-failed-to-start

/Steffen




[ceph-users] Add monitor unsuccessful

2015-03-11 Thread Jesus Chavez (jeschave)
can anybody tell me a good blog link that explains how to add a monitor? I have 
tried manually and also with ceph-deploy, without success =(

Help


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433


Cisco.com <http://www.cisco.com/>








  Think before you print.

This email may contain confidential and privileged material for the sole use of 
the intended recipient. Any review, use, distribution or disclosure by others 
is strictly prohibited. If you are not the intended recipient (or authorized to 
receive for the recipient), please contact the sender by reply email and delete 
all copies of this message.

Please click here
<http://www.cisco.com/web/about/doing_business/legal/cri/index.html> for
Company Registration Information.







[ceph-users] hang osd --zap-disk

2015-03-11 Thread Jesus Chavez (jeschave)

I don't know what is going on =( the system hangs with the message below after 
the command ceph-deploy osd --zap-disk create tauro:sdb

[tauro][WARNING] No data was received after 300 seconds, disconnecting...
[ceph_deploy.osd][DEBUG ] Host tauro is now ready for osd use.
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy osd activate 
tauro:sdb1
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks tauro:/dev/sdb1:
[tauro][DEBUG ] connection detected need for sudo
[tauro][DEBUG ] connected to host: tauro
[tauro][DEBUG ] detect platform information from remote host
[tauro][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Red Hat Enterprise Linux Server 7.1 Maipo
[ceph_deploy.osd][DEBUG ] activating host tauro disk /dev/sdb1
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[tauro][INFO  ] Running command: sudo ceph-disk -v activate --mark-init 
sysvinit --mount /dev/sdb1
[tauro][WARNING] INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue 
-- /dev/sdb1
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[tauro][WARNING] DEBUG:ceph-disk:Mounting /dev/sdb1 on 
/var/lib/ceph/tmp/mnt.lNpFro with options noatime,inode64
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o 
noatime,inode64 -- /dev/sdb1 /var/lib/ceph/tmp/mnt.lNpFro
[tauro][WARNING] DEBUG:ceph-disk:Cluster uuid is 
fc72a252-15be-40e9-9de1-34593be5668a
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=fsid
[tauro][WARNING] DEBUG:ceph-disk:Cluster name is ceph
[tauro][WARNING] DEBUG:ceph-disk:OSD uuid is 
bf192166-86e9-4c68-9bff-7ced1c9ba8ee
[tauro][WARNING] DEBUG:ceph-disk:Allocating OSD id...
[tauro][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph 
--name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring 
osd create --concise bf192166-86e9-4c68-9bff-7ced1c9ba8ee
[tauro][WARNING] 2015-03-11 17:49:31.782184 7f9cf05a8700  0 -- :/1015927 >>
192.168.4.35:6789/0 pipe(0x7f9cec0253f0 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9cec025680).fault
[tauro][WARNING] 2015-03-11 17:49:35.782524 7f9cf04a7700  0 -- :/1015927 >>
192.168.4.35:6789/0 pipe(0x7f9cec00 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9cee90).fault
[tauro][WARNING] 2015-03-11 17:49:37.781846 7f9cf05a8700  0 -- :/1015927 >>
192.168.4.35:6789/0 pipe(0x7f9ce00030e0 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce0003370).fault
[tauro][WARNING] 2015-03-11 17:49:41.782566 7f9cf04a7700  0 -- :/1015927 >>
192.168.4.35:6789/0 pipe(0x7f9cec00 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9cee90).fault
[tauro][WARNING] 2015-03-11 17:49:43.782303 7f9cf05a8700  0 -- :/1015927 >>
192.168.4.35:6789/0 pipe(0x7f9ce00031b0 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce00025d0).fault
[tauro][WARNING] 2015-03-11 17:49:47.784627 7f9cf04a7700  0 -- :/1015927 >>
192.168.4.35:6789/0 pipe(0x7f9cec00 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9cee90).fault
[tauro][WARNING] 2015-03-11 17:49:49.782712 7f9cf05a8700  0 -- :/1015927 >>
192.168.4.35:6789/0 pipe(0x7f9ce00031b0 sd=5 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce0002c60).fault
[tauro][WARNING] 2015-03-11 17:49:53.784690 7f9cf04a7700  0 -- :/1015927 >>
192.168.4.35:6789/0 pipe(0x7f9ce0003fb0 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce0004240).fault
[tauro][WARNING] 2015-03-11 17:49:55.783248 7f9cf05a8700  0 -- :/1015927 >>
192.168.4.35:6789/0 pipe(0x7f9ce0004930 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f9ce0004bc0)









Re: [ceph-users] PGs stuck unclean active+remapped after an osd marked out

2015-03-11 Thread Francois Lafont
On 11/03/2015 05:44, Francois Lafont wrote:

 PS: here is my conf.
 [...]

I have this too:

~# ceph osd crush show-tunables
{ "choose_local_tries": 0,
  "choose_local_fallback_tries": 0,
  "choose_total_tries": 50,
  "chooseleaf_descend_once": 1,
  "chooseleaf_vary_r": 0,
  "straw_calc_version": 1,
  "profile": "unknown",
  "optimal_tunables": 0,
  "legacy_tunables": 0,
  "require_feature_tunables": 1,
  "require_feature_tunables2": 1,
  "require_feature_tunables3": 0,
  "has_v2_rules": 0,
  "has_v3_rules": 0}

And in the online documentation, I can read this:
http://ceph.com/docs/master/rados/operations/crush-map/#crush-tunables3

Legacy default is 0, but with this value CRUSH is sometimes unable to
find a mapping.

Is this my problem?
Should I do this in my cluster?

ceph osd crush set-tunable chooseleaf_vary_r 1

But here 
http://ceph.com/docs/master/rados/operations/crush-map/#which-client-versions-support-crush-tunables3,
I can read: Linux kernel version v3.15 or later (for the file system and RBD 
kernel clients)
and it could be a problem for me because I have clients with kernel version
3.13 (Ubuntu 14.04).
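A small sketch of that check, comparing a client kernel version against the 3.15 minimum quoted above for chooseleaf_vary_r (the version string is a canned example roughly matching Ubuntu 14.04; use kver=$(uname -r) on a real client):

```shell
kver="3.13.0-45-generic"
major=${kver%%.*}                 # text before the first dot -> "3"
rest=${kver#*.}                   # text after the first dot
minor=${rest%%[!0-9]*}            # leading digits of that -> "13"
if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 15 ]; }; then
    verdict="ok for chooseleaf_vary_r=1"
else
    verdict="too old for chooseleaf_vary_r (needs >= 3.15)"
fi
echo "kernel $kver: $verdict"
```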

-- 
François Lafont


Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-11 Thread Steffen W Sørensen

On 11/03/2015, at 08.19, Steffen W Sørensen ste...@me.com wrote:

 On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:
 
 What kind of application is that?
 Commercial Email platform from Openwave.com
 
 Maybe it could be worked around using an apache rewrite rule. In any case, I 
 opened issue #11091.
 Okay, how, by rewriting the response?
 Thanks, where can tickets be followed/viewed?
 
 Asked my vendor what confuses their App about the reply. Would be nice if 
 they could work against Ceph S3 :)
 
 2. at every create bucket OP the GW create what looks like new containers
 for ACLs in .rgw pool, is this normal
 or howto avoid such multiple objects clottering the GW pools?
 Is there something wrong since I get multiple ACL object for this bucket
 everytime my App tries to recreate same bucket or
 is this a feature/bug in radosGW?
 
 That's a bug.
 Ok, any resolution/work-around to this?
 
 Not at the moment. There's already issue #6961, I bumped its priority 
 higher, and we'll take a look at it.
 Thanks!
BTW running Giant:

[root@rgw ~]# rpm -qa| grep -i ceph
httpd-tools-2.2.22-1.ceph.el6.x86_64
ceph-common-0.87.1-0.el6.x86_64
mod_fastcgi-2.4.7-1.ceph.el6.x86_64
libcephfs1-0.87.1-0.el6.x86_64
xfsprogs-3.1.1-14_ceph.el6.x86_64
ceph-radosgw-0.87.1-0.el6.x86_64
httpd-2.2.22-1.ceph.el6.x86_64
python-ceph-0.87.1-0.el6.x86_64
ceph-0.87.1-0.el6.x86_64

[root@rgw ~]# uname -a
Linux rgw.sprawl.dk 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 
2015 x86_64 x86_64 x86_64 GNU/Linux

[root@rgw ~]# cat /etc/redhat-release 
CentOS release 6.6 (Final)





Re: [ceph-users] CephFS: stripe_unit=65536 + object_size=1310720 = pipe.fault, server, going to standby

2015-03-11 Thread Ilya Dryomov
On Wed, Mar 11, 2015 at 1:21 PM, LOPEZ Jean-Charles jelo...@redhat.com wrote:
 Hi Florent

 What are the « rules » for stripe_unit & object_size ? - stripe_unit *
 stripe_count = object_size

 So in your case set stripe_unit = 2

 JC


 On 11 Mar 2015, at 19:59, Florent B flor...@coppint.com wrote:

 Hi all,

 I'm testing CephFS with Giant and I have a problem when I set these attrs :

 setfattr -n ceph.dir.layout.stripe_unit -v 65536 pool_cephfs01/
 setfattr -n ceph.dir.layout.stripe_count -v 1 pool_cephfs01/
 setfattr -n ceph.dir.layout.object_size -v 1310720 pool_cephfs01/
 setfattr -n ceph.dir.layout.pool -v cephfs01 pool_cephfs01/

 When a client writes files in pool_cephfs01/, it fails with Transport
 endpoint is not connected (107) and these errors appear on the MDS:

 10.111.0.6:6801/41706 >> 10.111.17.118:0/9384 pipe(0x5e3a580 sd=27 :6801 s=2
 pgs=2 cs=1 l=0 c=0x6a8d1e0).fault, server, going to standby

 When I set stripe_unit=1048576 & object_size=1048576, it seems to work.

 What are the rules for stripe_unit & object_size?

stripe_unit * stripe_count = object_size is definitely not correct.
The current rules are:

- object_size is a multiple of stripe_unit
- stripe_unit (and consequently object_size) is 64k-aligned
- stripe_count is at least 1 (i.e. at least 1 object in an object set)

However, the above layout is pretty bogus - there is basically no
striping going on, so it's probably a bug in the way it's handled.
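
The three rules above can be checked mechanically before a layout is applied with setfattr. This is my own helper, not part of Ceph:

```shell
# Sketch of the layout rules Ilya lists (not a Ceph tool):
#   1. object_size is a multiple of stripe_unit
#   2. stripe_unit is 64k-aligned
#   3. stripe_count is at least 1
check_layout() {  # $1 = stripe_unit, $2 = stripe_count, $3 = object_size
  local unit=$1 count=$2 size=$3
  [ $((size % unit)) -eq 0 ]  || { echo "object_size not a multiple of stripe_unit"; return 1; }
  [ $((unit % 65536)) -eq 0 ] || { echo "stripe_unit not 64k-aligned"; return 1; }
  [ "$count" -ge 1 ]          || { echo "stripe_count must be >= 1"; return 1; }
  echo "layout ok"
}

# Florent's layout passes (1310720 = 20 * 65536), which is consistent
# with the point above that the fault looks like a bug, not a bad layout:
check_layout 65536 1 1310720
```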

Thanks,

Ilya


Re: [ceph-users] Firefly Tiering

2015-03-11 Thread Nick Fisk
Hi Stefan,

If the majority of your hot data fits on the cache tier you will see quite a
marked improvement in read performance and similar write performance
(assuming you would have had your hdds backed by SSD journals).

However for data that is not in the cache tier you will get 10-20% less read
performance and anything up to 10x less write performance. This is because a
cache write miss has to read the entire object from the backing store into
the cache and then modify it.

The read performance degradation will probably be fixed in Hammer with proxy
reads, but writes will most likely still be an issue.

Nick


 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Stefan Priebe - Profihost AG
 Sent: 11 March 2015 07:27
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Firefly Tiering
 
 Hi,
 
 Has anybody successfully tested tiering while using firefly? How much does
it
 impact performance vs. a normal pool? I mean, is there any difference
 between a full SSD pool and a tiering SSD pool with a SATA backend?
 
 Greets,
 Stefan
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






Re: [ceph-users] CephFS: stripe_unit=65536 + object_size=1310720 = pipe.fault, server, going to standby

2015-03-11 Thread LOPEZ Jean-Charles
Hi Florent

What are the « rules » for stripe_unit & object_size ? - stripe_unit * 
stripe_count = object_size

So in your case set stripe_unit = 2

JC


 On 11 Mar 2015, at 19:59, Florent B flor...@coppint.com wrote:
 
 Hi all,
 
 I'm testing CephFS with Giant and I have a problem when I set these attrs :
 
 setfattr -n ceph.dir.layout.stripe_unit -v 65536 pool_cephfs01/
 setfattr -n ceph.dir.layout.stripe_count -v 1 pool_cephfs01/
 setfattr -n ceph.dir.layout.object_size -v 1310720 pool_cephfs01/
 setfattr -n ceph.dir.layout.pool -v cephfs01 pool_cephfs01/ 
 
 When a client writes files in pool_cephfs01/, it fails with Transport 
 endpoint is not connected (107) and these errors appear on the MDS:
 
 10.111.0.6:6801/41706 >> 10.111.17.118:0/9384 pipe(0x5e3a580 sd=27 :6801 s=2 
 pgs=2 cs=1 l=0 c=0x6a8d1e0).fault, server, going to standby
 
 When I set stripe_unit=1048576 & object_size=1048576, it seems to work.
 
 What are the rules for stripe_unit & object_size?
 
 Thank you.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread joel.merr...@gmail.com
For clarity too, I've tried to drop the min_size before as suggested,
doesn't make a difference unfortunately

On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com
joel.merr...@gmail.com wrote:
 Sure thing, n.b. I increased pg count to see if it would help. Alas not. :)

 Thanks again!

 health_detail
 https://gist.github.com/199bab6d3a9fe30fbcae

 osd_dump
 https://gist.github.com/499178c542fa08cc33bb

 osd_tree
 https://gist.github.com/02b62b2501cbd684f9b2

 Random selected queries:
 queries/0.19.query
 https://gist.github.com/f45fea7c85d6e665edf8
 queries/1.a1.query
 https://gist.github.com/dd68fbd5e862f94eb3be
 queries/7.100.query
 https://gist.github.com/d4fd1fb030c6f2b5e678
 queries/7.467.query
 https://gist.github.com/05dbcdc9ee089bd52d0c

 On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just sj...@redhat.com wrote:
 Yeah, get a ceph pg query on one of the stuck ones.
 -Sam

 On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote:
 Stuck unclean and stuck inactive. I can fire up a full query and
 health dump somewhere useful if you want (full pg query info on ones
 listed in health detail, tree, osd dump etc). There were blocked_by
 operations that no longer exist after doing the OSD addition.

 Side note, spent some time yesterday writing some bash to do this
 programmatically (might be useful to others, will throw on github)

 On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just sj...@redhat.com wrote:
  What do you mean by unblocked but still stuck?
  -Sam
 
  On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote:
  On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just sj...@redhat.com wrote:
   You'll probably have to recreate osds with the same ids (empty ones),
   let them boot, stop them, and mark them lost.  There is a feature in 
   the
   tracker to improve this behavior: http://tracker.ceph.com/issues/10976
   -Sam
 
  Thanks Sam, I've readded the OSDs, they became unblocked but there are
  still the same number of pgs stuck. I looked at them in some more
  detail and it seems they all have num_bytes='0'. Tried a repair too,
  for good measure. Still nothing I'm afraid.
 
  Does this mean some underlying catastrophe has happened and they are
  never going to recover? Following on, would that cause data loss.
  There are no missing objects and I'm hoping there's appropriate
  checksumming / replicas to balance that out, but now I'm not so sure.
 
  Thanks again,
  Joel
 
 








 --
 $ echo kpfmAdpoofdufevq/dp/vl | perl -pe 's/(.)/chr(ord($1)-1)/ge'



-- 
$ echo kpfmAdpoofdufevq/dp/vl | perl -pe 's/(.)/chr(ord($1)-1)/ge'


Re: [ceph-users] Firefly Tiering

2015-03-11 Thread Nick Fisk


 Am 11.03.2015 um 11:17 schrieb Nick Fisk:
 
 
  Hi Nick,
 
  Am 11.03.2015 um 10:52 schrieb Nick Fisk:
  Hi Stefan,
 
  If the majority of your hot data fits on the cache tier you will see
  quite a marked improvement in read performance
  I don't have reads ;-) just around 5%. 95% are writes.
 
  and similar write performance
  (assuming you would have had your hdds backed by SSD journals).
 
  similar write performance of SSD cache tier or HDD backend tier?
 
  I'm mainly interested in a writeback mode.
 
  Writes on Cache tiering are the same speed as a non cache tiering
  solution (with SSD journals), if the blocks are in the cache.
 
 
 
  However for data that is not in the cache tier you will get 10-20%
  less read performance and anything up to 10x less write performance.
  This is because a cache write miss has to read the entire object
  from the backing store into the cache and then modify it.
 
  The read performance degradation will probably be fixed in Hammer
  with proxy reads, but writes will most likely still be an issue.
 
  Why is writing to the HOT part so slow?
 
 
  If the object is in the cache tier or currently doesn't exist, then
  writes are fast as it just has to write directly to the cache tier
  SSD's. However if the object is in the slow tier and you write to it,
then its
 very slow.
  This is because it has to read it off the slow tier (~12ms), write it
  on to the cache tier(~.5ms) and then update it (~.5ms).
 
 Mhm, sounds correct. So it's better to stick with journals instead of using
a
 cache tier.

That's purely down to your workload, but in general if you are doing lots of
writes, a cache tier will probably slow you down at the moment.
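
To put rough numbers on that: using the figures quoted in this thread (~12 ms slow-tier read, ~0.5 ms SSD write/update; these are assumptions from the discussion, not measurements), a simple expected-latency model shows why write-heavy workloads suffer even at high hit ratios:

```shell
# Back-of-envelope model of cache-tier write latency, using the
# figures quoted in this thread (assumed, not measured):
#   hit  ~0.5 ms (write straight to the cache SSD)
#   miss ~13  ms (12 read from slow tier + 0.5 write + 0.5 update)
avg_write_ms() {  # $1 = cache hit ratio, e.g. 0.9
  awk -v h="$1" 'BEGIN { printf "%.2f\n", h * 0.5 + (1 - h) * 13.0 }'
}

avg_write_ms 0.9   # prints 1.75 -- ~3.5x a plain SSD-journal write
avg_write_ms 0.5   # prints 6.75
```

So even a 90% cache hit ratio leaves the average write several times slower than simply journaling on SSD, which matches Nick's conclusion.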


 
 Stefan
 
 
  With a non caching solution, you would have just written straight to
  the journal (~.5ms)
 
  Stefan
 
  Nick
 
 
  -Original Message-
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
  Behalf Of Stefan Priebe - Profihost AG
  Sent: 11 March 2015 07:27
  To: ceph-users@lists.ceph.com
  Subject: [ceph-users] Firefly Tiering
 
  Hi,
 
  has anybody successfully tested tiering while using firefly? How
  much does
  it
  impact performance vs. a normal pool? I mean, is there any
  difference between a full SSD pool and a tiering SSD pool with a SATA
 backend?
 
  Greets,
  Stefan
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






Re: [ceph-users] ceph days

2015-03-11 Thread Karan Singh
Check out the Ceph YouTube page.

- Karan -

 On 11 Mar 2015, at 00:45, Tom Deneau tom.den...@amd.com wrote:
 
 Are the slides or videos from ceph days presentations made available
 somewhere?  I noticed some links in the Frankfurt Ceph day, but not for the
 other Ceph Days.
 
 -- Tom
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] Firefly Tiering

2015-03-11 Thread Stefan Priebe - Profihost AG
Hi Nick,

Am 11.03.2015 um 10:52 schrieb Nick Fisk:
 Hi Stefan,
 
 If the majority of your hot data fits on the cache tier you will see quite a
 marked improvement in read performance
I don't have reads ;-) just around 5%. 95% are writes.

 and similar write performance
 (assuming you would have had your hdds backed by SSD journals).

similar write performance of SSD cache tier or HDD backend tier?

I'm mainly interested in a writeback mode.

 However for data that is not in the cache tier you will get 10-20% less read
 performance and anything up to 10x less write performance. This is because a
 cache write miss has to read the entire object from the backing store into
 the cache and then modify it.
 
 The read performance degradation will probably be fixed in Hammer with proxy
 reads, but writes will most likely still be an issue.

Why is writing to the HOT part so slow?

Stefan

 Nick
 
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Stefan Priebe - Profihost AG
 Sent: 11 March 2015 07:27
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Firefly Tiering

 Hi,

 Has anybody successfully tested tiering while using firefly? How much does
 it
 impact performance vs. a normal pool? I mean, is there any difference
 between a full SSD pool and a tiering SSD pool with a SATA backend?

 Greets,
 Stefan
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Valery Tschopp

Where can I find the debian trusty source package for v0.80.9?

Cheers,
Valery

On 10/03/15 20:34 , Sage Weil wrote:

This is a bugfix release for firefly.  It fixes a performance regression
in librbd, an important CRUSH misbehavior (see below), and several RGW
bugs.  We have also backported support for flock/fcntl locks to ceph-fuse
and libcephfs.

We recommend that all Firefly users upgrade.

For more detailed information, see
   http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt

Adjusting CRUSH maps


* This point release fixes several issues with CRUSH that trigger
   excessive data migration when adjusting OSD weights.  These are most
   obvious when a very small weight change (e.g., a change from 0 to
   .01) triggers a large amount of movement, but the same set of bugs
   can also lead to excessive (though less noticeable) movement in
   other cases.

   However, because the bug may already have affected your cluster,
   fixing it may trigger movement *back* to the more correct location.
   For this reason, you must manually opt-in to the fixed behavior.

   In order to set the new tunable to correct the behavior::

  ceph osd crush set-tunable straw_calc_version 1

   Note that this change will have no immediate effect.  However, from
   this point forward, any 'straw' bucket in your CRUSH map that is
   adjusted will get non-buggy internal weights, and that transition
   may trigger some rebalancing.

   You can estimate how much rebalancing will eventually be necessary
   on your cluster with::

  ceph osd getcrushmap -o /tmp/cm
  crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
  crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
  crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
  crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
  wc -l /tmp/a                          # num total mappings
  diff -u /tmp/a /tmp/b | grep -c ^+    # num changed mappings

Divide the number of changed mappings by the total number of lines in
/tmp/a.  We've found that most clusters are under 10%.
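
The last two commands can be wrapped into one step. This is a hypothetical helper, not part of Ceph; it slightly refines the `grep -c ^+` above by excluding the `+++` diff header from the count:

```shell
# Hypothetical wrapper around the estimate above: given the two
# mapping dumps, print the percentage of changed mappings.
# grep '^+[^+]' skips the '+++ <file>' diff header line.
pct_changed() {  # $1 = before file, $2 = after file
  local total changed
  total=$(wc -l < "$1")
  changed=$(diff -u "$1" "$2" | grep -c '^+[^+]')
  awk -v c="$changed" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * c / t }'
}

# With the files produced by the crushtool commands above:
# pct_changed /tmp/a /tmp/b
```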

You can force all of this rebalancing to happen at once with::

  ceph osd crush reweight-all

Otherwise, it will happen at some unknown point in the future when
CRUSH weights are next adjusted.

Notable Changes
---

* ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
* crush: fix straw bucket weight calculation, add straw_calc_version
   tunable (#10095 Sage Weil)
* crush: fix tree bucket (Rongzu Zhu)
* crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
* crushtool: add --reweight (Sage Weil)
* librbd: complete pending operations before losing image (#10299 Jason
   Dillaman)
* librbd: fix read caching performance regression (#9854 Jason Dillaman)
* librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
* mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
* osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
* osd: handle no-op write with snapshot (#10262 Sage Weil)
* radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
* rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis,
   Yehuda Sadeh)
* rgw: don't overwrite bucket/object owner when setting ACLs (#10978
   Yehuda Sadeh)
* rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
* rgw: fix partial swift GET (#10553 Yehuda Sadeh)
* rgw: fix quota disable (#9907 Dong Lei)
* rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
* rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
* rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
* rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda
   Sadeh)
* rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
* rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
* rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh)
* rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh)
* rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil)
* rgw: update swift subuser permission masks when authenticating (#9918
   Yehuda Sadeh)
* rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis,
   Yehuda Sadeh)
* rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh)
* rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh)

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy



--
SWITCH
--
Valery Tschopp, Software Engineer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
email: 

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-11 Thread Karan Singh
Thanks Sage

I will create a “new feature” request on http://tracker.ceph.com/ so that
this discussion does not get buried in the mailing list.

Developers can implement this at their convenience.



Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


 On 10 Mar 2015, at 14:26, Sage Weil s...@newdream.net wrote:
 
 On Tue, 10 Mar 2015, Christian Eichelmann wrote:
 Hi Sage,
 
 we hit this problem a few months ago as well and it took us quite a while to
 figure out what was wrong.
 
 As a system administrator I don't like the idea of daemons or even init
 scripts changing system-wide configuration parameters, so I wouldn't like
 to see the OSDs do it themselves.
 
 This is my general feeling as well.  As we move to systemd, I'd like to 
 have the ceph unit file get away from this entirely and have the admin set 
 these values in /etc/security/limits.conf or /etc/sysctl.d.  The main 
 thing making this problematic right now is that the daemons run as root 
 instead of a 'ceph' user.
 
 The idea with the warning is on one hand a good hint, on the other hand it
 also may confuse people, since changing this setting is not required for
 common hardware.
 
 If we make it warn only if it reaches > 50% of the threshold, that is
 probably safe...
 
 sage
 
 
 
 Regards,
 Christian
 
 On 03/09/2015 08:01 PM, Sage Weil wrote:
 On Mon, 9 Mar 2015, Karan Singh wrote:
 Thanks Guys kernel.pid_max=4194303 did the trick.
 Great to hear!  Sorry we missed that you only had it at 65536.
 
 This is a really common problem that people hit when their clusters start
 to grow.  Is there somewhere in the docs we can put this to catch more
 users?  Or maybe a warning issued by the osds themselves or something if
 they see limits that are low?
 
 sage
 
 - Karan -
 
   On 09 Mar 2015, at 14:48, Christian Eichelmann
   christian.eichelm...@1und1.de wrote:
 
 Hi Karan,
 
 as you actually write in your own book, the problem is the
 sysctl
 setting kernel.pid_max. I've seen in your bug report that you were
 setting it to 65536, which is still too low for high-density hardware.
 
 In our cluster, one OSD server has in an idle situation about 66.000
 Threads (60 OSDs per Server). The number of threads increases when you
 increase the number of placement groups in the cluster, which I think
 has triggered your problem.
 
 Set the kernel.pid_max setting to 4194303 (the maximum) like Azad
 Aliyar suggested, and the problem should be gone.
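
For completeness, Christian's fix can be made persistent with standard sysctl mechanics (the drop-in file name below is my own convention; adjust to your distribution):

```shell
# Raise the ceiling immediately:
sysctl -w kernel.pid_max=4194303

# Persist it across reboots (drop-in file name is arbitrary):
echo 'kernel.pid_max = 4194303' > /etc/sysctl.d/90-ceph-pid-max.conf
sysctl -p /etc/sysctl.d/90-ceph-pid-max.conf

# Verify:
sysctl kernel.pid_max
```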
 
 Regards,
 Christian
 
 Am 09.03.2015 11:41, schrieb Karan Singh:
    Hello community, I need help fixing a long-running Ceph
    problem.

    The cluster is unhealthy and multiple OSDs are DOWN. When I
    try to
    restart the OSDs I get this error:
 
 
    2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function
    'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09
    12:22:16.311970
    common/Thread.cc: 129: FAILED assert(ret == 0)
 
 
    *Environment*: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5,
    kernel 3.17.2-1.el6.elrepo.x86_64

    Tried upgrading from 0.80.7 to 0.80.8, but no luck

    Tried the CentOS stock kernel 2.6.32, but no luck

    Memory is not a problem; more than 150 GB is free


    Has anyone ever faced this problem?
 
    *Cluster status*

     cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete;
    1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs
    stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects
    degraded (19.501%); 111/196 in osds are down; clock skew detected on
    mon.pouta-s02, mon.pouta-s03
     monmap e3: 3 mons at
    {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0},
    election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
     *osdmap e26633: 239 osds: 85 up, 196 in*
     pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
    4699 GB used, 707 TB / 711 TB avail
    6061/31080 objects degraded (19.501%)
      14 down+remapped+peering
      39 active
    3289 active+clean
     547 peering
     663 stale+down+peering
     705 stale+active+remapped
       1 active+degraded+remapped
       1 

Re: [ceph-users] Firefly Tiering

2015-03-11 Thread Stefan Priebe - Profihost AG
Am 11.03.2015 um 11:17 schrieb Nick Fisk:
 
 
 Hi Nick,

 Am 11.03.2015 um 10:52 schrieb Nick Fisk:
 Hi Stefan,

 If the majority of your hot data fits on the cache tier you will see
 quite a marked improvement in read performance
 I don't have reads ;-) just around 5%. 95% are writes.

 and similar write performance
 (assuming you would have had your hdds backed by SSD journals).

 similar write performance of SSD cache tier or HDD backend tier?

 I'm mainly interested in a writeback mode.
 
 Writes on Cache tiering are the same speed as a non cache tiering solution
 (with SSD journals), if the blocks are in the cache. 
 
 

 However for data that is not in the cache tier you will get 10-20%
 less read performance and anything up to 10x less write performance.
 This is because a cache write miss has to read the entire object from
 the backing store into the cache and then modify it.

 The read performance degradation will probably be fixed in Hammer with
 proxy reads, but writes will most likely still be an issue.

 Why is writing to the HOT part so slow?

 
 If the object is in the cache tier or currently doesn't exist, then writes
 are fast as it just has to write directly to the cache tier SSD's. However
 if the object is in the slow tier and you write to it, then its very slow.
 This is because it has to read it off the slow tier (~12ms), write it on to
 the cache tier(~.5ms) and then update it (~.5ms).

Mhm, sounds correct. So it's better to stick with journals instead of
using a cache tier.

Stefan

 
 With a non caching solution, you would have just written straight to the
 journal (~.5ms)
 
 Stefan

 Nick


 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of Stefan Priebe - Profihost AG
 Sent: 11 March 2015 07:27
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] Firefly Tiering

 Hi,

 has anybody successfully tested tiering while using firefly? How much
 does
 it
 impact performance vs. a normal pool? I mean, is there any difference
 between a full SSD pool and a tiering SSD pool with a SATA backend?

 Greets,
 Stefan
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 


Re: [ceph-users] S3 RadosGW - Create bucket OP

2015-03-11 Thread Steffen W Sørensen
On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:

 What kind of application is that?
 Commercial Email platform from Openwave.com
 
 Maybe it could be worked around using an apache rewrite rule. In any case, I 
 opened issue #11091.
Okay, how, by rewriting the response?
Thanks, where can tickets be followed/viewed?

Asked my vendor what confuses their App about the reply. Would be nice if they 
could work against Ceph S3 :)

 2. at every create bucket OP the GW creates what looks like new containers
 for ACLs in the .rgw pool; is this normal,
 or how to avoid such multiple objects cluttering the GW pools?
 Is there something wrong, since I get multiple ACL objects for this bucket
 every time my App tries to recreate the same bucket, or
 is this a feature/bug in radosGW?
 
 That's a bug.
 Ok, any resolution/work-around to this?
 
 Not at the moment. There's already issue #6961, I bumped its priority higher, 
 and we'll take a look at it.
Thanks!

/Steffen




Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Dan van der Ster
Hi Sage,

On Tue, Mar 10, 2015 at 8:34 PM, Sage Weil sw...@redhat.com wrote:
 Adjusting CRUSH maps
 

 * This point release fixes several issues with CRUSH that trigger
   excessive data migration when adjusting OSD weights.  These are most
   obvious when a very small weight change (e.g., a change from 0 to
   .01) triggers a large amount of movement, but the same set of bugs
   can also lead to excessive (though less noticeable) movement in
   other cases.

   However, because the bug may already have affected your cluster,
   fixing it may trigger movement *back* to the more correct location.
   For this reason, you must manually opt-in to the fixed behavior.

   In order to set the new tunable to correct the behavior::

  ceph osd crush set-tunable straw_calc_version 1


Since it's not obvious in this case, does setting straw_calc_version =
1 still allow older firefly clients to connect?

Cheers, Dan


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Loic Dachary
Hi Valery,

They should be here http://ceph.com/debian-testing/

Cheers

On 11/03/2015 10:07, Valery Tschopp wrote:
 Where can I find the debian trusty source package for v0.80.9?
 
 Cheers,
 Valery
 
 On 10/03/15 20:34 , Sage Weil wrote:
 This is a bugfix release for firefly.  It fixes a performance regression
 in librbd, an important CRUSH misbehavior (see below), and several RGW
 bugs.  We have also backported support for flock/fcntl locks to ceph-fuse
 and libcephfs.

 We recommend that all Firefly users upgrade.

 For more detailed information, see
http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt

 Adjusting CRUSH maps
 

 * This point release fixes several issues with CRUSH that trigger
excessive data migration when adjusting OSD weights.  These are most
obvious when a very small weight change (e.g., a change from 0 to
.01) triggers a large amount of movement, but the same set of bugs
can also lead to excessive (though less noticeable) movement in
other cases.

However, because the bug may already have affected your cluster,
fixing it may trigger movement *back* to the more correct location.
For this reason, you must manually opt-in to the fixed behavior.

In order to set the new tunable to correct the behavior::

   ceph osd crush set-tunable straw_calc_version 1

Note that this change will have no immediate effect.  However, from
this point forward, any 'straw' bucket in your CRUSH map that is
adjusted will get non-buggy internal weights, and that transition
may trigger some rebalancing.

You can estimate how much rebalancing will eventually be necessary
on your cluster with::

   ceph osd getcrushmap -o /tmp/cm
   crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
   crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
   crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
   crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
   wc -l /tmp/a                          # num total mappings
   diff -u /tmp/a /tmp/b | grep -c ^+    # num changed mappings

 Divide the number of changed mappings by the total number of lines in
 /tmp/a.  We've found that most clusters are under 10%.

 You can force all of this rebalancing to happen at once with::

   ceph osd crush reweight-all

 Otherwise, it will happen at some unknown point in the future when
 CRUSH weights are next adjusted.

 Notable Changes
 ---

 * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
 * crush: fix straw bucket weight calculation, add straw_calc_version
tunable (#10095 Sage Weil)
 * crush: fix tree bucket (Rongzu Zhu)
 * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
 * crushtool: add --reweight (Sage Weil)
 * librbd: complete pending operations before losing image (#10299 Jason
Dillaman)
 * librbd: fix read caching performance regression (#9854 Jason Dillaman)
 * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
 * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
 * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
 * osd: handle no-op write with snapshot (#10262 Sage Weil)
 * radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
 * rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis,
Yehuda Sadeh)
 * rgw: don't overwrite bucket/object owner when setting ACLs (#10978
Yehuda Sadeh)
 * rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
 * rgw: fix partial swift GET (#10553 Yehuda Sadeh)
 * rgw: fix quota disable (#9907 Dong Lei)
 * rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
 * rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
 * rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
 * rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda
Sadeh)
 * rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
 * rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
 * rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh)
 * rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh)
 * rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil)
 * rgw: update swift subuser permission masks when authenticating (#9918
Yehuda Sadeh)
 * rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis,
Yehuda Sadeh)
 * rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh)
 * rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh)

 Getting Ceph
 

 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz
 * For packages, see http://ceph.com/docs/master/install/get-packages
 * For ceph-deploy, see 
 http://ceph.com/docs/master/install/install-ceph-deploy
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Stuck PGs blocked_by non-existent OSDs

2015-03-11 Thread Samuel Just
Ok, you lost all copies from an interval where the pgs went active. The 
recovery from this is going to be complicated and fragile.  Are the 
pools valuable?

-Sam

On 03/11/2015 03:35 AM, joel.merr...@gmail.com wrote:

For clarity too, I've tried to drop the min_size before as suggested,
doesn't make a difference unfortunately

On Wed, Mar 11, 2015 at 9:50 AM, joel.merr...@gmail.com
joel.merr...@gmail.com wrote:

Sure thing, n.b. I increased pg count to see if it would help. Alas not. :)

Thanks again!

health_detail
https://gist.github.com/199bab6d3a9fe30fbcae

osd_dump
https://gist.github.com/499178c542fa08cc33bb

osd_tree
https://gist.github.com/02b62b2501cbd684f9b2

Random selected queries:
queries/0.19.query
https://gist.github.com/f45fea7c85d6e665edf8
queries/1.a1.query
https://gist.github.com/dd68fbd5e862f94eb3be
queries/7.100.query
https://gist.github.com/d4fd1fb030c6f2b5e678
queries/7.467.query
https://gist.github.com/05dbcdc9ee089bd52d0c

On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just sj...@redhat.com wrote:

Yeah, get a ceph pg query on one of the stuck ones.
-Sam

On Tue, 2015-03-10 at 14:41 +, joel.merr...@gmail.com wrote:

Stuck unclean and stuck inactive. I can fire up a full query and
health dump somewhere useful if you want (full pg query info on ones
listed in health detail, tree, osd dump etc). There were blocked_by
operations that no longer exist after doing the OSD addition.

Side note, I spent some time yesterday writing some bash to do this
programmatically (might be useful to others, will throw on github)

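A minimal sketch of such a collector (shown here in Python rather than bash; it assumes the `ceph` CLI is on the PATH and that `ceph health detail` prints lines like `pg 7.100 is stuck unclean ...` — adjust the pattern if your release formats them differently):

```python
import re
import subprocess

def stuck_pg_ids(health_detail):
    """Extract PG ids (e.g. '7.100') from `ceph health detail` output."""
    ids = []
    for line in health_detail.splitlines():
        m = re.search(r"pg (\d+\.[0-9a-f]+) is stuck", line)
        if m:
            ids.append(m.group(1))
    return ids

def dump_queries(pg_ids, outdir="queries"):
    """Save `ceph pg <id> query` output for each stuck PG, one file per PG."""
    for pgid in pg_ids:
        out = subprocess.check_output(["ceph", "pg", pgid, "query"])
        with open("%s/%s.query" % (outdir, pgid), "wb") as f:
            f.write(out)

# Usage on a live cluster (not run here):
#   detail = subprocess.check_output(["ceph", "health", "detail"]).decode()
#   dump_queries(stuck_pg_ids(detail))
```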
On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just sj...@redhat.com wrote:

What do you mean by unblocked but still stuck?
-Sam

On Mon, 2015-03-09 at 22:54 +, joel.merr...@gmail.com wrote:

On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just sj...@redhat.com wrote:

You'll probably have to recreate osds with the same ids (empty ones),
let them boot, stop them, and mark them lost.  There is a feature in the
tracker to improve this behavior: http://tracker.ceph.com/issues/10976
-Sam

Thanks Sam, I've readded the OSDs, they became unblocked but there are
still the same number of pgs stuck. I looked at them in some more
detail and it seems they all have num_bytes='0'. Tried a repair too,
for good measure. Still nothing I'm afraid.

Does this mean some underlying catastrophe has happened and they are
never going to recover? Following on, would that cause data loss.
There are no missing objects and I'm hoping there's appropriate
checksumming / replicas to balance that out, but now I'm not so sure.

Thanks again,
Joel

--
$ echo kpfmAdpoofdufevq/dp/vl | perl -pe 's/(.)/chr(ord($1)-1)/ge'





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Sage Weil
On Wed, 11 Mar 2015, Stefan Priebe - Profihost AG wrote:
 Hi Sage,
 Am 11.03.2015 um 04:14 schrieb Sage Weil:
  On Wed, 11 Mar 2015, Christian Balzer wrote:
  On Tue, 10 Mar 2015 12:34:14 -0700 (PDT) Sage Weil wrote:
 
 
  Adjusting CRUSH maps
  --------------------
 
  * This point release fixes several issues with CRUSH that trigger
excessive data migration when adjusting OSD weights.  These are most
obvious when a very small weight change (e.g., a change from 0 to
.01) triggers a large amount of movement, but the same set of bugs
can also lead to excessive (though less noticeable) movement in
other cases.
 
However, because the bug may already have affected your cluster,
fixing it may trigger movement *back* to the more correct location.
For this reason, you must manually opt-in to the fixed behavior.
 
  It would be nice to know at what version of Ceph those bugs were
  introduced.
  
  This bug has been present in CRUSH since the beginning.
 
 So people upgrading from dumpling have to do the same?
 
 1.) They need to set tunables to optimal (to get firefly tunables)
 2.) They have to set those options you mention?

Nothing has to (or probably should be) done as part of the upgrade process 
itself.

This tunable can be set without changing to firefly tunables.  It affects 
the monitor-side generation of internal weight values only, and has no 
dependency or compatibility issue with clients or OSDs.  And the bug only 
triggers when a weight is changed.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Sage Weil
On Wed, 11 Mar 2015, Gabri Mate wrote:
 May I assume this fix will be in Hammer? So can I use this to fix my
 cluster after upgrading Giant to Hammer?

Yes, the fix is also in Hammer, but the same procedure should be followed 
to opt-in to the new behavior.

sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Sage Weil
On Wed, 11 Mar 2015, Dan van der Ster wrote:
 Hi Sage,
 
 On Tue, Mar 10, 2015 at 8:34 PM, Sage Weil sw...@redhat.com wrote:
  Adjusting CRUSH maps
  --------------------
 
  * This point release fixes several issues with CRUSH that trigger
excessive data migration when adjusting OSD weights.  These are most
obvious when a very small weight change (e.g., a change from 0 to
.01) triggers a large amount of movement, but the same set of bugs
can also lead to excessive (though less noticeable) movement in
other cases.
 
However, because the bug may already have affected your cluster,
fixing it may trigger movement *back* to the more correct location.
For this reason, you must manually opt-in to the fixed behavior.
 
In order to set the new tunable to correct the behavior::
 
   ceph osd crush set-tunable straw_calc_version 1
 
 
 Since it's not obvious in this case, does setting straw_calc_version =
 1 still allow older firefly clients to connect?

Correct.  The bug only affects the generation of internal weight values 
that are stored in the crush map itself (crush_calc_straw()).  Setting the 
tunable makes the *monitors* behave properly (if adjusting weights via the 
ceph cli) or *crushtool* calculate weights properly if you are compiling 
the crush map via 'crushtool -c ...'.  There is no dependency or 
compatibility issue with clients, and no need to set tunables to 'firefly' 
to set straw_calc_version.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Duplication name Container

2015-03-11 Thread Wido den Hollander
On 03/11/2015 03:23 PM, Jimmy Goffaux wrote:
 Hello All,
 
 I have been using Ceph in production for several months, but I have an error
 with the Ceph Rados Gateway when multiple users are involved.
 
 I am faced with the following error:
 
 Error trying to create container 'xs02': 409 Conflict: BucketAlreadyExists
 
 Which corresponds to the documentation :
 http://ceph.com/docs/master/radosgw/s3/bucketops/
 
 How can I avoid this kind of problem?
 

You cannot. Bucket names are unique across the whole RADOS Gateway, just as
with Amazon S3.
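Because the namespace is global, a common client-side workaround is to namespace bucket names per tenant. A hypothetical helper (the naming scheme is illustrative; the length check follows S3's 3-63 character DNS-safe rule):

```python
import uuid

def tenant_bucket_name(tenant, label):
    """Build a bucket name unlikely to collide across users.

    S3/RGW bucket names are global, so prefix the name with the tenant id
    and a short random suffix. Names must be 3-63 chars and lowercase to
    stay DNS-safe.
    """
    name = "%s-%s-%s" % (tenant.lower(), label.lower(), uuid.uuid4().hex[:8])
    if not (3 <= len(name) <= 63):
        raise ValueError("bucket name length out of range: %s" % name)
    return name
```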

 Here are my versions used:
 
 radosgw-agent  = 1.2-1precise
 ceph   = 0.87-1precise
 
 Thank you for your help
 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Adding Monitor Stuck

2015-03-11 Thread Jesus Chavez (jeschave)
I am really stuck adding a second monitor =(, ceph-deploy mon create seems to 
finish with an error like monitor may not be able to form quorum, and they are 
not defined in mon initial…
I have found there is a way to get it work and is doing the next commands:

ceph mon add tauro 192.168.4.35:6789

but this is weird because it seems to be a command that you usually run after 
mkfs, something like (ceph-mon -i {mon-id} --mkfs --monmap 
{tmp}/{map-filename} --keyring {tmp}/{key-filename}) :@ but that depends on a 
monmap and keyring, things that you are not able to get on the “new monitor” 
since it has nothing =( so even the manual way, if you follow the steps, you 
get lost because you don’t really know which command is for which server.

Also it says that you should start the new monitor while the “add command” is 
hunting a mon client, but again that depends on the monmap and keyring, things 
that you don’t have on the new server…

=( I’m going crazy, can anybody explain how this really works?

Thanks
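For reference, the manual procedure from the Ceph add-or-rm-mons documentation boils down to the following sequence. The sketch below only builds the commands (the hostname and address are examples from this thread); the important detail is that the monmap and keyring are fetched from the existing cluster, not created on the new node:

```python
def monitor_add_commands(mon_id, mon_addr, tmp="/tmp"):
    """Command sequence for manually adding a monitor, per the
    add-or-rm-mons documentation.

    The keyring and monmap come FROM THE EXISTING CLUSTER via
    `ceph auth get` / `ceph mon getmap` -- the new host only needs
    ceph.conf and admin credentials, not a pre-existing mon keyring.
    """
    keyring = "%s/ceph.mon.keyring" % tmp
    monmap = "%s/monmap" % tmp
    return [
        ["ceph", "auth", "get", "mon.", "-o", keyring],         # mon keyring
        ["ceph", "mon", "getmap", "-o", monmap],                # current monmap
        ["ceph-mon", "-i", mon_id, "--mkfs",
         "--monmap", monmap, "--keyring", keyring],             # init data dir
        ["ceph", "mon", "add", mon_id, mon_addr],               # register in map
        ["ceph-mon", "-i", mon_id, "--public-addr", mon_addr],  # start daemon
    ]

# On the new mon host (not run here):
#   import subprocess
#   for cmd in monitor_add_commands("tauro", "192.168.4.35:6789"):
#       subprocess.check_call(cmd)
```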


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.commailto:jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433


Cisco.comhttp://www.cisco.com/













___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.9 Firefly released

2015-03-11 Thread Gabri Mate
Hi,

May I assume this fix will be in Hammer? So can I use this to fix my
cluster after upgrading Giant to Hammer?

Best regards,
Mate

On 12:34 Tue 10 Mar , Sage Weil wrote:
 This is a bugfix release for firefly.  It fixes a performance regression 
 in librbd, an important CRUSH misbehavior (see below), and several RGW 
 bugs.  We have also backported support for flock/fcntl locks to ceph-fuse 
 and libcephfs.
 
 We recommend that all Firefly users upgrade.
 
 For more detailed information, see
   http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt
 
 Adjusting CRUSH maps
 --------------------
 
 * This point release fixes several issues with CRUSH that trigger
   excessive data migration when adjusting OSD weights.  These are most
   obvious when a very small weight change (e.g., a change from 0 to
   .01) triggers a large amount of movement, but the same set of bugs
   can also lead to excessive (though less noticeable) movement in
   other cases.
 
   However, because the bug may already have affected your cluster,
   fixing it may trigger movement *back* to the more correct location.
   For this reason, you must manually opt-in to the fixed behavior.
 
   In order to set the new tunable to correct the behavior::
 
  ceph osd crush set-tunable straw_calc_version 1
 
   Note that this change will have no immediate effect.  However, from
   this point forward, any 'straw' bucket in your CRUSH map that is
   adjusted will get non-buggy internal weights, and that transition
   may trigger some rebalancing.
 
   You can estimate how much rebalancing will eventually be necessary
   on your cluster with::
 
  ceph osd getcrushmap -o /tmp/cm
  crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
  crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
  crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
  crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
  wc -l /tmp/a                        # num total mappings
  diff -u /tmp/a /tmp/b | grep -c ^+  # num changed mappings
 
Divide the number of changed mappings by the total number of mappings.
We've found that most clusters are under 10%.
 
You can force all of this rebalancing to happen at once with::
 
  ceph osd crush reweight-all
 
Otherwise, it will happen at some unknown point in the future when
CRUSH weights are next adjusted.
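The wc/diff arithmetic above can also be done in a few lines of Python once /tmp/a and /tmp/b exist (a sketch, assuming both dumps cover the same test inputs in the same order):

```python
def changed_fraction(mappings_old, mappings_new):
    """Fraction of CRUSH mappings that differ between two
    --show-mappings dumps, given their lines as sequences."""
    changed = sum(1 for a, b in zip(mappings_old, mappings_new) if a != b)
    return changed / float(len(mappings_old)) if mappings_old else 0.0

# Usage, once /tmp/a and /tmp/b exist (not run here):
#   with open("/tmp/a") as fa, open("/tmp/b") as fb:
#       frac = changed_fraction(fa.read().splitlines(), fb.read().splitlines())
#   print("%.1f%% of mappings would move" % (100 * frac))
```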
 
 Notable Changes
 ---------------
 
 * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
 * crush: fix straw bucket weight calculation, add straw_calc_version 
   tunable (#10095 Sage Weil)
 * crush: fix tree bucket (Rongzu Zhu)
 * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
 * crushtool: add --reweight (Sage Weil)
 * librbd: complete pending operations before losing image (#10299 Jason 
   Dillaman)
 * librbd: fix read caching performance regression (#9854 Jason Dillaman)
 * librbd: gracefully handle deleted/renamed pools (#10270 Jason Dillaman)
 * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
 * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
 * osd: handle no-op write with snapshot (#10262 Sage Weil)
 * radosgw-admin: create subuser when creating user (#10103 Yehuda Sadeh)
 * rgw: change multipart upload id magic (#10271 Georgio Dimitrakakis, 
   Yehuda Sadeh)
 * rgw: don't overwrite bucket/object owner when setting ACLs (#10978 
   Yehuda Sadeh)
 * rgw: enable IPv6 for embedded civetweb (#10965 Yehuda Sadeh)
 * rgw: fix partial swift GET (#10553 Yehuda Sadeh)
 * rgw: fix quota disable (#9907 Dong Lei)
 * rgw: index swift keys appropriately (#10471 Hemant Burman, Yehuda Sadeh)
 * rgw: make setattrs update bucket index (#5595 Yehuda Sadeh)
 * rgw: pass civetweb configurables (#10907 Yehuda Sadeh)
 * rgw: remove swift user manifest (DLO) hash calculation (#9973 Yehuda 
   Sadeh)
 * rgw: return correct len for 0-len objects (#9877 Yehuda Sadeh)
 * rgw: S3 object copy content-type fix (#9478 Yehuda Sadeh)
 * rgw: send ETag on S3 object copy (#9479 Yehuda Sadeh)
 * rgw: send HTTP status reason explicitly in fastcgi (Yehuda Sadeh)
 * rgw: set ulimit -n from sysvinit (el6) init script (#9587 Sage Weil)
 * rgw: update swift subuser permission masks when authenticating (#9918 
   Yehuda Sadeh)
 * rgw: URL decode query params correctly (#10271 Georgio Dimitrakakis, 
   Yehuda Sadeh)
 * rgw: use attrs when reading object attrs (#10307 Yehuda Sadeh)
 * rgw: use \r\n for http headers (#9254 Benedikt Fraunhofer, Yehuda Sadeh)
 
 Getting Ceph
 ------------
 
 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.80.9.tar.gz
 * For packages, see http://ceph.com/docs/master/install/get-packages
 * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

2015-03-11 Thread Malcolm Haak
Sorry about all the unrelated grep issues..

So I've rebuilt and reinstalled and it's still broken. 

On the working node, even with the new packages, everything works.
On the new broken node, I've added a mon and it works. But I still cannot start 
an OSD on the new node.

What else do you need from me? I'll get logs run any number of tests.

I've got data in this cluster already, and it's full, so I need to expand it; 
I've already got the hardware.

Thanks in advance for even having a look


-Original Message-
From: Samuel Just [mailto:sj...@redhat.com] 
Sent: Wednesday, 11 March 2015 1:41 AM
To: Malcolm Haak; jl...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues with fresh 0.93 OSD adding to existing cluster

Joao, it looks like map 2759 is causing trouble, how would he get the
full and incremental maps for that out of the mons?
-Sam

On Tue, 2015-03-10 at 14:12 +, Malcolm Haak wrote:
 Hi Samuel,
 
 The sha1? I'm going to admit ignorance as to what you are looking for. They 
 are all running the same release if that is what you are asking. 
 Same tarball built into rpms using rpmbuild on both nodes... 
 Only difference being that the other node has been upgraded and the problem 
 node is fresh.
 
 added the requested config here is the command line output
 
 microserver-1:/etc # /etc/init.d/ceph start osd.3
 === osd.3 === 
 Mounting xfs on microserver-1:/var/lib/ceph/osd/ceph-3
 2015-03-11 01:00:13.492279 7f05b2f72700  1 -- :/0 messenger.start
 2015-03-11 01:00:13.492823 7f05b2f72700  1 -- :/1002795 -- 
 192.168.0.10:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 
 0x7f05ac0290b0 con 0x7f05ac027c40
 2015-03-11 01:00:13.510814 7f05b07ef700  1 -- 192.168.0.250:0/1002795 learned 
 my addr 192.168.0.250:0/1002795
 2015-03-11 01:00:13.527653 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 1  mon_map magic: 0 v1  191+0+0 (1112175541 
 0 0) 0x7f05aab0 con 0x7f05ac027c40
 2015-03-11 01:00:13.527899 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 2  auth_reply(proto 1 0 (0) Success) v1  
 24+0+0 (3859410672 0 0) 0x7f05ae70 con 0x7f05ac027c40
 2015-03-11 01:00:13.527973 7f05abfff700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x7f05ac029730 
 con 0x7f05ac027c40
 2015-03-11 01:00:13.528124 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
 0x7f05ac029a50 con 0x7f05ac027c40
 2015-03-11 01:00:13.528265 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 
 0x7f05ac029f20 con 0x7f05ac027c40
 2015-03-11 01:00:13.530359 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 3  mon_map magic: 0 v1  191+0+0 (1112175541 
 0 0) 0x7f05aab0 con 0x7f05ac027c40
 2015-03-11 01:00:13.530548 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 4  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.531114 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 5  osd_map(3277..3277 src has 2757..3277) v3 
  5366+0+0 (3110999244 0 0) 0x7f05a0002800 con 0x7f05ac027c40
 2015-03-11 01:00:13.531772 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 6  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.532186 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 7  osd_map(3277..3277 src has 2757..3277) v3 
  5366+0+0 (3110999244 0 0) 0x7f05a0001250 con 0x7f05ac027c40
 2015-03-11 01:00:13.532260 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 8  mon_subscribe_ack(300s) v1  20+0+0 
 (3648139960 0 0) 0x7f05afb0 con 0x7f05ac027c40
 2015-03-11 01:00:13.556748 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_command({prefix: get_command_descriptions} v 
 0) v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
 2015-03-11 01:00:13.564968 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 9  mon_command_ack([{prefix: 
 get_command_descriptions}]=0  v0) v1  72+0+34995 (1092875540 0 
 1727986498) 0x7f05aa70 con 0x7f05ac027c40
 2015-03-11 01:00:13.770122 7f05b2f72700  1 -- 192.168.0.250:0/1002795 -- 
 192.168.0.10:6789/0 -- mon_command({prefix: osd crush create-or-move, 
 args: [host=microserver-1, root=default], id: 3, weight: 1.81} v 0) 
 v1 -- ?+0 0x7f05ac016ac0 con 0x7f05ac027c40
 2015-03-11 01:00:13.772299 7f05abfff700  1 -- 192.168.0.250:0/1002795 == 
 mon.0 192.168.0.10:6789/0 10  mon_command_ack([{prefix: osd crush 
 create-or-move, args: [host=microserver-1, root=default], id: 3, 
 weight: 1.81}]=0 create-or-move updated item name 'osd.3' weight 1.81 at 
 location {host=microserver-1,root=default} 

Re: [ceph-users] Add monitor unsuccesful

2015-03-11 Thread Jesus Chavez (jeschave)
Thanks Steffen, I have followed everything and I am not sure what is going on. 
Are the mon keyring and client.admin keyring individual per mon host? Or do I 
need to copy them from the first initial mon node?

Thanks again!


Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.commailto:jesch...@cisco.com
Phone: +52 55 5267 3146tel:+52%2055%205267%203146
Mobile: +51 1 5538883255tel:+51%201%205538883255

CCIE - 44433

On Mar 11, 2015, at 6:28 PM, Steffen W Sørensen 
ste...@me.commailto:ste...@me.com wrote:


On 12/03/2015, at 00.55, Jesus Chavez (jeschave) 
jesch...@cisco.commailto:jesch...@cisco.com wrote:

can anybody tell me a good blog link that explain how to add monitor? I have 
tried manually and also with ceph-deploy without success =(
Dunno if these might help U:

http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-a-monitor-manual

http://cephnotes.ksperis.com/blog/2013/08/29/mon-failed-to-start

/Steffen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can not list objects in large bucket

2015-03-11 Thread Sean Sullivan
I have a single radosgw user with 2 s3 keys and 1 swift key. I have created a 
few buckets and I can list all of the contents of bucket A and C but not B with 
either S3 (boto) or python-swiftclient. I am able to list the first 1000 
entries using radosgw-admin 'bucket list --bucket=bucketB' without any issues 
but this doesn't really help.

The odd thing is I can still upload and download objects in the bucket. I just 
can't list them. I tried setting the bucket canned_acl to private and public 
but I still can't list the objects inside.

I'm using ceph .87 (Giant) Here is some info about the cluster::
http://pastebin.com/LvQYnXem -- ceph.conf
http://pastebin.com/efBBPCwa -- ceph -s
http://pastebin.com/tF62WMU9 -- radosgw-admin bucket list
http://pastebin.com/CZ8TkyNG -- python list bucket objects script
http://pastebin.com/TUCyxhMD -- radosgw-admin bucket stats --bucketB
http://pastebin.com/uHbEtGHs -- rados -p .rgw.buckets ls | grep default.20283.2 
(bucketB marker)
http://pastebin.com/WYwfQndV -- Python Error when trying to list BucketB via 
boto

I have no idea why this could be happening outside of the acl. Has anyone seen 
this before? Any idea on how I can get access to this bucket again via 
s3/swift? Also, is there a way to get the full listing of a bucket via 
radosgw-admin rather than just the first 9000 lines / 1000 entries, or a way 
to page through them?
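For listing from the client side, S3 paginates with a marker: request up to 1000 keys, then pass the last key seen back as the marker for the next page (boto's bucket.list() does this automatically; bucket.get_all_keys(marker=..., max_keys=...) exposes it directly). A sketch of the marker loop against a generic fetch function, with the boto wiring shown only as a hypothetical comment (endpoint and credentials are placeholders):

```python
def iterate_keys(fetch_page, page_size=1000):
    """Yield all key names using S3-style marker paging.

    fetch_page(marker, max_keys) must return a list of key names sorted
    lexicographically, starting strictly after `marker` and at most
    max_keys long -- the contract of S3's GET Bucket (List Objects).
    """
    marker = ""
    while True:
        page = fetch_page(marker, page_size)
        for name in page:
            yield name
        if len(page) < page_size:
            return          # short page => no more results
        marker = page[-1]   # resume after the last key seen

# With boto (hypothetical endpoint/credentials), fetch_page could be:
#   conn = boto.connect_s3("ACCESS", "SECRET", host="rgw.example.com",
#                          calling_format=OrdinaryCallingFormat())
#   bucket = conn.get_bucket("bucketB")
#   fetch = lambda m, n: [k.name
#                         for k in bucket.get_all_keys(marker=m, max_keys=n)]
#   for name in iterate_keys(fetch):
#       print(name)
```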

EDIT:: I just fixed it (I hope) but the fix doesn't make any sense:

radosgw-admin bucket unlink --uid=user --bucket=bucketB
radosgw-admin bucket link --uid=user --bucket=bucketB 
--bucket-id=default.20283.2

Now with swift or s3 (boto) I am able to list the bucket contents without issue 
^_^

Can someone elaborate on why this works and how it broke in the first place 
when ceph was HEALTH_OK the entire time? With 3 replicas, how did this happen? 
Could this be a bug? Sorry for the rambling, I am confused and tired ;p



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Shadow files

2015-03-11 Thread Ben

Anyone got any info on this?

Is it safe to delete shadow files?

On 2015-03-11 10:03, Ben wrote:

We have a large number of shadow files in our cluster that aren't
being deleted automatically as data is deleted.

Is it safe to delete these files?
Is there something we need to be aware of when deleting them?
Is there a script that we can run that will delete these safely?

Is there something wrong with our cluster that it isn't deleting these
files when it should be?

We are using civetweb with radosgw, with a tengine SSL proxy in front of 
it.


Any advice please
Thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com