Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
On Wed, Mar 23, 2016 at 01:22:45AM +0100, Loic Dachary wrote:
> On 23/03/2016 01:12, Chris Dunlop wrote:
>> On Wed, Mar 23, 2016 at 01:03:06AM +0100, Loic Dachary wrote:
>>> On 23/03/2016 00:39, Chris Dunlop wrote:
 "The old OS'es" that were being supported up to v0.94.5 includes debian
 wheezy. It would be quite surprising and unexpected to drop support for an
 OS in the middle of a stable series.
>>>
>>> I'm unsure if wheezy is among the old OS'es. It predates my involvement in 
>>> the stable releases effort. I know for sure el6 and 12.04 are supported for 
>>> 0.94.x. 
>> 
>> From http://download.ceph.com/debian-hammer/pool/main/c/ceph/
>> 
>> ceph-common_0.94.1-1~bpo70+1_i386.deb  15-Dec-2015 15:32  10217628
>> ceph-common_0.94.3-1~bpo70+1_amd64.deb 19-Oct-2015 18:54   9818964
>> ceph-common_0.94.4-1~bpo70+1_amd64.deb 26-Oct-2015 20:48   9868020
>> ceph-common_0.94.5-1~bpo70+1_amd64.deb 15-Dec-2015 15:32   9868188
>> 
>> That's all debian wheezy.
>> 
>> (Huh. I'd never noticed 0.94.1 was i386 only!)
>> 
> 
> Indeed. Were these packages created as a lucky side effect or because there 
> was a commitment at some point? I'm curious to know the answer as well :-)

Who would know?  Sage?  (cc'ed)

Chris
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] root and non-root user for ceph/ceph-deploy

2016-03-22 Thread yang
Hi, everyone,

In my ceph cluster, I first deployed ceph using ceph-deploy as the root user, 
without any further configuration after setup. To my surprise, the cluster 
auto-starts after a host reboot: everything is fine, the mon is running and 
the OSD devices are mounted and running properly. But I can NOT find any 
ceph/OSD entries in /etc/fstab!

My question is: where does ceph store the cluster info, and why can it 
auto-start after a machine reboot?

However, when I deploy as another (non-root) user, everything changes: the mon 
does not auto-start and the OSDs are neither mounted nor running any more.

Also, after I unmount the OSDs and re-deploy the cluster, the old OSDs are 
still displayed in my new cluster.

My other question is: what's the difference between the root and a non-root 
user for ceph/ceph-deploy?
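[Editor's note: as far as I know, on hammer-era ceph-disk deployments the OSD mounts are not expected to appear in /etc/fstab. ceph-disk tags each data partition with a special GPT partition type GUID, and a udev rule triggers `ceph-disk activate` at boot, which mounts the partition and starts the OSD. A rough way to inspect this on a node; the device name and OSD id below are hypothetical examples:]

```shell
# List disks/partitions as ceph-disk sees them (shows "ceph data" partitions):
ceph-disk list

# Show the GPT partition type GUID that the udev rules match on
# (/dev/sdb is a hypothetical OSD data disk, partition 1):
sgdisk --info=1 /dev/sdb

# Each activated OSD records its identity in its data directory:
cat /var/lib/ceph/osd/ceph-0/whoami
```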

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)

yang,
Thanks very much.


Re: [ceph-users] Need help for PG problem

2016-03-22 Thread Dotslash Lu
Hello Gonçalo,

Thanks for the reminder. I was just setting up the cluster for testing, so 
don't worry, I can simply remove the pool. I've also learnt that since the 
replication count and the number of pools both factor into pg_num, I'll 
consider them carefully before deploying any data. 

> On Mar 23, 2016, at 6:58 AM, Goncalo Borges  
> wrote:
> 
> Hi Zhang...
> 
> If I can add some more info: changing the PG count is a heavy operation, and 
> as far as I know, you should NEVER decrease it. From the notes in pgcalc 
> (http://ceph.com/pgcalc/):
> 
> "It's also important to know that the PG count can be increased, but NEVER 
> decreased without destroying / recreating the pool. However, increasing the 
> PG Count of a pool is one of the most impactful events in a Ceph Cluster, and 
> should be avoided for production clusters if possible."
> 
> So, in your case, I would consider adding more OSDs. 
> 
> Cheers
> Goncalo


Re: [ceph-users] Need help for PG problem

2016-03-22 Thread David Wang
Hi Zhang,
From the ceph health detail output, I suggest checking that your NTP servers are properly synchronized, since there is a clock skew warning.

Can you share crush map output?
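[Editor's note: the usual commands for sharing the crush map, and for checking the clock-skew warning, are roughly the following; the output file names are just examples:]

```shell
# Dump the crush map as JSON directly:
ceph osd crush dump

# Or fetch the compiled map and decompile it to text:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
cat crushmap.txt

# For the clock-skew warning, check NTP peer status on each node:
ntpq -p
```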

2016-03-22 18:28 GMT+08:00 Zhang Qiang :

> Hi Reddy,
> It's over a thousand lines, I pasted it on gist:
> https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4
>
> On Tue, 22 Mar 2016 at 18:15 M Ranga Swami Reddy 
> wrote:
>
>> Hi,
>> Can you please share the "ceph health detail" output?
>>
>> Thanks
>> Swami
>>
>> On Tue, Mar 22, 2016 at 3:32 PM, Zhang Qiang 
>> wrote:
>> > Hi all,
>> >
>> > I have 20 OSDs and 1 pool, and, as recommended by the
>> > doc(http://docs.ceph.com/docs/master/rados/operations/placement-groups/),
>> I
>> > configured pg_num and pgp_num to 4096, size 2, min size 1.
>> >
>> > But ceph -s shows:
>> >
>> > HEALTH_WARN
>> > 534 pgs degraded
>> > 551 pgs stuck unclean
>> > 534 pgs undersized
>> > too many PGs per OSD (382 > max 300)
>> >
>> > Why doesn't the recommended value, 4096, for 10 ~ 50 OSDs work? And what
>> > does "too many PGs per OSD (382 > max 300)" mean? If each OSD had 382
>> > PGs, I would have 7640 PGs in total.
>> >


Re: [ceph-users] Periodic evicting & flushing

2016-03-22 Thread Christian Balzer

Hello,

On Tue, 22 Mar 2016 12:28:22 -0400 Maran wrote:

> Hey guys,
> 
> I'm trying to wrap my head about the Ceph Cache Tiering to discover if
> what I want is achievable.
> 
> My cluster exists of 6 OSD nodes with normal HDD and one cache tier of
> SSDs.
> 
One cache tier being what, one node? 
That's a SPOF and disaster waiting to happen.

Also, the usual details please (so we're not comparing apples with oranges): 
what types of SSDs, OS, Ceph version, network, everything.

> What I would love is that Ceph flushes and evicts data as soon as a file
> hasn't been requested by a client in a certain timeframe, even if there
> is enough space to keep it there longer. The reason I would prefer this
> is that I have a feeling overall performance suffers if new writes are
> coming into the cache tier while at the same time flush and evicts are
> happening.
> 
You will want to read my recent thread titled 
"Cache tier operation clarifications"

where I asked for something along those lines.

The best thing you could do right now (and what I'm planning to do if
flushing turns out to be detrimental to performance; evictions should have a
very light impact) is to lower the ratios at low-utilization times and raise
them again for peak times. 
Again, read the thread above.
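[Editor's note: as a sketch of the ratio adjustments mentioned above; the pool name and the values are hypothetical, and the right values depend on your cache size and workload:]

```shell
# Off-peak: lower the dirty ratio so flushing happens while the cluster is quiet:
ceph osd pool set ssd-cache cache_target_dirty_ratio 0.2

# Peak hours: raise the ratios again so writes accumulate in the cache
# instead of triggering flushes under load:
ceph osd pool set ssd-cache cache_target_dirty_ratio 0.5
ceph osd pool set ssd-cache cache_target_full_ratio 0.8
```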

> It also seems that for some reason my cache node is not using the
> cluster network as much as I expected. Where all HDD nodes are using the
> cluster network to the fullest (multiple TBs) my SSD node only used 1GB
> on the cluster network. Is there anyway to diagnose this problem or is
> this intended behaviour? I expected the flushes to happen over the
> cluster network.
>
That is to be expected, as the cache tier is a client from the Ceph
perspective. 
 
Unfortunate, but AFAIK there are no plans to change this behavior.

> I appreciate any pointers you might have for me.
> 
You will also want to definitely read the recent thread titled 
"data corruption with hammer".

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Loic Dachary


On 23/03/2016 01:12, Chris Dunlop wrote:
> Hi Loïc,
> 
> On Wed, Mar 23, 2016 at 01:03:06AM +0100, Loic Dachary wrote:
>> On 23/03/2016 00:39, Chris Dunlop wrote:
>>> "The old OS'es" that were being supported up to v0.94.5 includes debian
>>> wheezy. It would be quite surprising and unexpected to drop support for an
>>> OS in the middle of a stable series.
>>
>> I'm unsure if wheezy is among the old OS'es. It predates my involvement in 
>> the stable releases effort. I know for sure el6 and 12.04 are supported for 
>> 0.94.x. 
> 
> From http://download.ceph.com/debian-hammer/pool/main/c/ceph/
> 
> ceph-common_0.94.1-1~bpo70+1_i386.deb  15-Dec-2015 15:32  10217628
> ceph-common_0.94.3-1~bpo70+1_amd64.deb 19-Oct-2015 18:54   9818964
> ceph-common_0.94.4-1~bpo70+1_amd64.deb 26-Oct-2015 20:48   9868020
> ceph-common_0.94.5-1~bpo70+1_amd64.deb 15-Dec-2015 15:32   9868188
> 
> That's all debian wheezy.
> 
> (Huh. I'd never noticed 0.94.1 was i386 only!)
> 

Indeed. Were these packages created as a lucky side effect or because there was 
a commitment at some point? I'm curious to know the answer as well :-)

-- 
Loïc Dachary, Artisan Logiciel Libre


Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
Hi Loïc,

On Wed, Mar 23, 2016 at 01:03:06AM +0100, Loic Dachary wrote:
> On 23/03/2016 00:39, Chris Dunlop wrote:
>> "The old OS'es" that were being supported up to v0.94.5 includes debian
>> wheezy. It would be quite surprising and unexpected to drop support for an
>> OS in the middle of a stable series.
> 
> I'm unsure if wheezy is among the old OS'es. It predates my involvement in 
> the stable releases effort. I know for sure el6 and 12.04 are supported for 
> 0.94.x. 

From http://download.ceph.com/debian-hammer/pool/main/c/ceph/

ceph-common_0.94.1-1~bpo70+1_i386.deb  15-Dec-2015 15:32  10217628
ceph-common_0.94.3-1~bpo70+1_amd64.deb 19-Oct-2015 18:54   9818964
ceph-common_0.94.4-1~bpo70+1_amd64.deb 26-Oct-2015 20:48   9868020
ceph-common_0.94.5-1~bpo70+1_amd64.deb 15-Dec-2015 15:32   9868188

That's all debian wheezy.

(Huh. I'd never noticed 0.94.1 was i386 only!)

Cheers,

Chris,
OnTheNet


Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Loic Dachary
Hi Chris,

On 23/03/2016 00:39, Chris Dunlop wrote:
> Hi Loïc,
> 
> On Wed, Mar 23, 2016 at 12:14:27AM +0100, Loic Dachary wrote:
>> On 22/03/2016 23:49, Chris Dunlop wrote:
>>> Hi Stable Release Team for v0.94,
>>>
>>> Let's try again... Any news on a release of v0.94.6 for debian wheezy 
>>> (bpo70)?
>>
>> I don't think publishing a debian wheezy backport for v0.94.6 is planned. 
>> Maybe it's a good opportunity to initiate a community effort? Would you 
>> like to work with me on this?
> 
> It's my understanding, from statements by both Sage and yourself, that
> existing OS'es would continue to be supported in the stable series, e.g.:
> 
>  On Wed, Mar 02, 2016 at 06:32:18PM +0700, Loic Dachary wrote:
>  > I think you misread what Sage wrote : "The intention was to continue
>  > building stable releases (0.94.x) on the old list of supported platforms
>  > (which includes 12.04 and el6)". In other words, the old OS'es are still
>  > supported. Their absence is a glitch in the release process that will be
>  > fixed.
> 
> "The old OS'es" that were being supported up to v0.94.5 includes debian
> wheezy. It would be quite surprising and unexpected to drop support for an
> OS in the middle of a stable series.

I'm unsure if wheezy is among the old OS'es. It predates my involvement in the 
stable releases effort. I know for sure el6 and 12.04 are supported for 0.94.x. 

> If that is indeed what's happening, and it's not just an oversight, I'd
> prefer to put my efforts into moving to a supported OS rather than keeping
> the older OS on life support.

That makes sense. Should you change your mind, I'll be around to help.

> Just to be clear, I understand it is quite a burden maintaining releases for
> old OSes; I'm only voicing mild surprise and a touch of regret: I'm very
> happy with the Ceph project!

I'm hopeful we'll be able to support more OSes in the future, with both a 
lighter-weight release process and more community support. Ceph releases 
should scale out, just as Ceph does ;-)

Cheers
> 
> Cheers,
> 
> Chris,
> OnTheNet
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
Hi Loïc,

On Wed, Mar 23, 2016 at 12:14:27AM +0100, Loic Dachary wrote:
> On 22/03/2016 23:49, Chris Dunlop wrote:
>> Hi Stable Release Team for v0.94,
>> 
>> Let's try again... Any news on a release of v0.94.6 for debian wheezy 
>> (bpo70)?
> 
> I don't think publishing a debian wheezy backport for v0.94.6 is planned. 
> Maybe it's a good opportunity to initiate a community effort? Would you like 
> to work with me on this?

It's my understanding, from statements by both Sage and yourself, that
existing OS'es would continue to be supported in the stable series, e.g.:

 On Wed, Mar 02, 2016 at 06:32:18PM +0700, Loic Dachary wrote:
 > I think you misread what Sage wrote : "The intention was to continue
 > building stable releases (0.94.x) on the old list of supported platforms
 > (which includes 12.04 and el6)". In other words, the old OS'es are still
 > supported. Their absence is a glitch in the release process that will be
 > fixed.

"The old OS'es" that were being supported up to v0.94.5 includes debian
wheezy. It would be quite surprising and unexpected to drop support for an
OS in the middle of a stable series.

If that is indeed what's happening, and it's not just an oversight, I'd
prefer to put my efforts into moving to a supported OS rather than keeping
the older OS on life support.

Just to be clear, I understand it is quite a burden maintaining releases for
old OSes; I'm only voicing mild surprise and a touch of regret: I'm very
happy with the Ceph project!

Cheers,

Chris,
OnTheNet


Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Loic Dachary


On 22/03/2016 23:49, Chris Dunlop wrote:
> Hi Stable Release Team for v0.94,
> 
> Let's try again... Any news on a release of v0.94.6 for debian wheezy (bpo70)?

I don't think publishing a debian wheezy backport for v0.94.6 is planned. Maybe 
it's a good opportunity to initiate a community effort? Would you like to work 
with me on this?

> 
> Cheers,
> 
> Chris
> 
> On Thu, Mar 17, 2016 at 12:43:15PM +1100, Chris Dunlop wrote:
>> Hi Chen,
>>
>> On Thu, Mar 17, 2016 at 12:40:28AM +, Chen, Xiaoxi wrote:
>>> It’s already there, in 
>>> http://download.ceph.com/debian-hammer/pool/main/c/ceph/.
>>
>> I can only see ceph*_0.94.6-1~bpo80+1_amd64.deb there. Debian wheezy would
>> be bpo70.
>>
>> Cheers,
>>
>> Chris
>>
>>> On 3/17/16, 7:20 AM, "Chris Dunlop"  wrote:
>>>
 Hi Stable Release Team for v0.94,

 On Thu, Mar 10, 2016 at 11:00:06AM +1100, Chris Dunlop wrote:
> On Wed, Mar 02, 2016 at 06:32:18PM +0700, Loic Dachary wrote:
>> I think you misread what Sage wrote : "The intention was to
>> continue building stable releases (0.94.x) on the old list of
>> supported platforms (which includes 12.04 and el6)". In other
>> words, the old OS'es are still supported. Their absence is a
>> glitch in the release process that will be fixed.
>
> Any news on a release of v0.94.6 for debian wheezy?

 Any news on a release of v0.94.6 for debian wheezy?

 Cheers,

 Chris
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


Re: [ceph-users] Need help for PG problem

2016-03-22 Thread Goncalo Borges
Hi Zhang...

If I can add some more info: changing the PG count is a heavy operation, and as 
far as I know, you should NEVER decrease it. From the notes in pgcalc 
(http://ceph.com/pgcalc/):

"It's also important to know that the PG count can be increased, but NEVER 
decreased without destroying / recreating the pool. However, increasing the PG 
Count of a pool is one of the most impactful events in a Ceph Cluster, and 
should be avoided for production clusters if possible."

So, in your case, I would consider adding more OSDs.
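[Editor's note: the one-way nature of the PG count can be seen directly from the CLI; a sketch with a hypothetical pool name and target value:]

```shell
# Check the current values for a pool:
ceph osd pool get mypool pg_num
ceph osd pool get mypool pgp_num

# Increasing is allowed (the cluster refuses a decrease);
# raise pg_num first, then pgp_num so the new PGs actually rebalance:
ceph osd pool set mypool pg_num 512
ceph osd pool set mypool pgp_num 512
```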

Cheers
Goncalo


Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
Hi Stable Release Team for v0.94,

Let's try again... Any news on a release of v0.94.6 for debian wheezy (bpo70)?

Cheers,

Chris

On Thu, Mar 17, 2016 at 12:43:15PM +1100, Chris Dunlop wrote:
> Hi Chen,
> 
> On Thu, Mar 17, 2016 at 12:40:28AM +, Chen, Xiaoxi wrote:
>> It’s already there, in 
>> http://download.ceph.com/debian-hammer/pool/main/c/ceph/.
> 
> I can only see ceph*_0.94.6-1~bpo80+1_amd64.deb there. Debian wheezy would
> be bpo70.
> 
> Cheers,
> 
> Chris
> 
>> On 3/17/16, 7:20 AM, "Chris Dunlop"  wrote:
>> 
>>> Hi Stable Release Team for v0.94,
>>>
>>> On Thu, Mar 10, 2016 at 11:00:06AM +1100, Chris Dunlop wrote:
 On Wed, Mar 02, 2016 at 06:32:18PM +0700, Loic Dachary wrote:
> I think you misread what Sage wrote : "The intention was to
> continue building stable releases (0.94.x) on the old list of
> supported platforms (which includes 12.04 and el6)". In other
> words, the old OS'es are still supported. Their absence is a
> glitch in the release process that will be fixed.
 
 Any news on a release of v0.94.6 for debian wheezy?
>>>
>>> Any news on a release of v0.94.6 for debian wheezy?
>>>
>>> Cheers,
>>>
>>> Chris


Re: [ceph-users] Infernalis .rgw.buckets.index objects becoming corrupted on RHEL 7.2 during recovery

2016-03-22 Thread Brandon Morris, PMP
I was able to get this back to HEALTH_OK by doing the following:

1. Allow ceph-objectstore-tool to run over a weekend attempting to export
the PG.  Looking at the timestamps, it took approximately 6 hours to complete
successfully.
2. Import the PG into an unused OSD and start it up+out.
3. Allow the cluster to detect the imported PG and backfill the exported
objects / keys that were stuck and unfound.
4. Start OSD 388 and allow the cluster to query it for any other recovery
information.  OSD 388 no longer experienced the suicide timeout after
backfill was complete.
5. Wait for the cluster to finish recovering the PG to 3 new OSDs.
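[Editor's note: the export/import steps above roughly correspond to the following ceph-objectstore-tool invocations. The paths, OSD id 388 and pgid 24.197 are taken from this thread; ceph-NNN stands for whichever spare OSD receives the import, and both OSD daemons must be stopped while the tool runs:]

```shell
# On the failing OSD (stopped), export the stuck PG to a file:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-388 \
    --journal-path /var/lib/ceph/osd/ceph-388/journal \
    --pgid 24.197 --op export --file /tmp/pg24.197.export

# On another (stopped) OSD, import it so the cluster can recover the objects:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NNN \
    --journal-path /var/lib/ceph/osd/ceph-NNN/journal \
    --op import --file /tmp/pg24.197.export
```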


I am still not sure what caused it to get into this state or if there is a
better way to resolve it.  Like I stated before, this is the second time
this has happened while the .rgw.buckets.index pool is recovering.

Has anyone else run into this or is it just our specific deployment recipe?

Thanks,

Brandon


On Thu, Mar 17, 2016 at 4:37 PM, Brandon Morris, PMP <
brandon.morris@gmail.com> wrote:

> List,
>
> We have stood up an Infernalis 9.2.0 cluster on RHEL 7.2.  We are using the
> radosGW to store potentially billions of small to medium sized objects (64k
> - 1MB).
>
> We have run into an issue twice thus far where .rgw.bucket.index placement
> groups will become corrupt during recovery after a drive failure.  This
> corruption will cause the OSD to crash with a  suicide_timeout error when
> trying to backfill the corrupted index file to a different OSD.  Exporting
> the corrupted placement group using the ceph-objectstore-tool will also
> hang. When this first came up, we were able to simply rebuild the .rgw
> pools and start from scratch.  There were no underlying XFS issues.
>
> Before we put this cluster into full operation, we are looking to
> determine what caused this and if there is a hard limit to the number of
> objects in a bucket.  We are currently putting all objects into 1 bucket,
> but should probably divide these up.
>
> I have uploaded the OSD and ceph-objectstore tool debug files here:
> https://github.com/garignack/ceph_misc/raw/master/ceph-osd.388.zip   Any
> help would be greatly appreciated.
>
> I'm not a ceph expert by any means, but here is where I've gotten to thus
> far. (And may be way off base)
>
> The PG in question only has 1 object - .dir.default.808642.1.163
> | [root@node13 ~]# ceph-objectstore-tool --data-path
> /var/lib/ceph/osd/ceph-388/ --journal-path
> /var/lib/ceph/osd/ceph-388/journal --pgid 24.197 --op list
> | SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 18 00 00
> 00 00 20 00 00 00 00 00 83 1c 00 00 00 00 00 00 00 00 00 00 00 00
> | SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 18 00 00
> 00 00 20 00 00 00 00 00 83 1c 00 00 00 00 00 00 00 00 00 00 00 00
> |
> ["24.197",{"oid":".dir.default.808642.1.163","key":"","snapid":-2,"hash":491874711,"max":0,"pool":24,"namespace":"","max":0}]
>
> Here are the final lines of the ceph-objectstore-tool before it hangs:
>
> | e140768: 570 osds: 558 up, 542 in
> | Read 24/1d516997/.dir.default.808642.1.163/head
> | size=0
> | object_info:
> 24/1d516997/.dir.default.808642.1.163/head(139155'2197754
> client.1137891.0:20837319 dirty|omap|data_digest s 0 uv 2197754 dd )
> | attrs size 2
>
> This leads me to suspect something between line 564 and line 576 in the
> tool is hanging.
> https://github.com/ceph/ceph/blob/master/src/tools/ceph_objectstore_tool.cc#L564.
> Current suspect is the objectstore read command.
>
> | ret = store->read(cid, obj, offset, len, rawdatabl);
>
> Looking through the OSD debug logs, I also see a strange
> size(18446744073709551615) on the recovery operation for the
> 24/1d516997/.dir.default.808642.1.163/head object
>
> | 2016-03-17 12:12:29.753446 7f972ca3d700 10 osd.388 154849 dequeue_op
> 0x7f97580d3500 prio 2 cost 1049576 latency 0.000185 MOSDPGPull(24.197
> 154849 [PullOp(24/1d516997/.dir.default.808642.1.163/head, recovery_info:
> ObjectRecoveryInfo(24/1d516997/.dir.default.808642.1.163/head@139155'2197754,
> size: 18446744073709551615, copy_subset: [0~18446744073709551615],
> clone_subset: {}), recovery_progress: ObjectRecoveryProgress(first,
> data_recovered_to:0, data_complete:false, omap_recovered_to:,
> omap_complete:false))]) v2 pg pg[24.197( v 139155'2197754
> (139111'2194700,139155'2197754] local-les=154480 n=1 ec=128853 les/c/f
> 154268/138679/0 154649/154650/154650) [179,443,517]/[306,441] r=-1
> lpr=154846 pi=138674-154649/37 crt=139155'2197752 lcod 0'0 inactive NOTIFY
> NIBBLEWISE]
>
> this error eventually causes the thread to hang and eventually trigger the
> suicide timeout
> | 2016-03-17 12:12:45.541528 7f973524e700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7f972ca3d700' had timed out after 15
> | 2016-03-17 12:12:45.541533 7f973524e700 20 heartbeat_map is_healthy
> = NOT HEALTHY, total workers: 29, number of unhealthy: 1
> | 2016-03-17 12:12:45.541534 

Re: [ceph-users] CephFS Advice

2016-03-22 Thread Gregory Farnum
On Tue, Mar 22, 2016 at 9:37 AM, John Spray  wrote:
> On Tue, Mar 22, 2016 at 2:37 PM, Ben Archuleta  wrote:
>> Hello All,
>>
>> I have experience using Lustre but I am new to the Ceph world, I have some 
>> questions to the Ceph users out there.
>>
>> I am thinking about deploying a Ceph storage cluster that lives in multiple 
>> locations, "Building A" and "Building B". This cluster will be composed of 
>> two Dell servers with 10TB (5 * 2TB disks) of JBOD storage and an MDS server 
>> over a 10Gb network. We will be using CephFS to serve multiple operating 
>> systems (Windows, Linux, OS X).
>
> A two node Ceph cluster is rarely wise.  If one of your servers goes
> down, you're going to be down to a single copy of the data (unless
> you've got a whopping 4 replicas to begin with), and so you'd be ill
> advised to write anything to the cluster while it's in a degraded
> state.  If you've only got one MDS server, your system is going to
> have a single point of failure anyway.
>
> You should probably look again at what levels of resilience and
> availability you're trying to achieve here and think about whether
> what you really want might be two NFS servers backing up to each
> other.
>
>> My main question is how well does CephFS work in a multi-operating system 
>> environment and how well does it support NFS/CIFS?
>
> Exporting CephFS over NFS works (either kernel NFS or nfs-ganesha),
> beyond that CephFS doesn't care too much.  The Samba integration is
> less advanced and less tested.

Well, Samba support is probably less advanced, but all of those
combinations get run in our nightly tests and do pretty well.

>  Bug reports are welcome if you try it
> out.

*thumbs up*


Re: [ceph-users] Need help for PG problem

2016-03-22 Thread Zhang Qiang
I got it: the suggested pg_num is the total, and I need to divide it by the
number of replicas.
Thanks Oliver, your answer is very thorough and helpful!


On 23 March 2016 at 02:19, Oliver Dzombic  wrote:

> Hi Zhang,
>
> yeah i saw your answer already.
>
> First of all, you should make sure that there is no clock skew.
> It can cause some side effects.
>
> 
>
> According to
>
> http://docs.ceph.com/docs/master/rados/operations/placement-groups/
>
> you have to:
>
>              (OSDs * 100)
> Total PGs = --------------
>               pool size
>
>
> Means:
>
> 20 OSD's of you * 100 = 2000
>
> Poolsize is:
>
> Where pool size is either the number of replicas for replicated pools or
> the K+M sum for erasure coded pools (as returned by ceph osd
> erasure-code-profile get).
>
> --
>
> So let's say you have 2 replicas: you should have 1000 PGs.
>
> If you have 3 replicas, you should have 2000 / 3 = 666 PGs.
>
> But you configured 4096 PGs. That's simply far too much.
>
> Reduce it, or if you cannot, get more OSDs into this cluster.
>
> I don't know any other way.
>
> Good luck!
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 22.03.2016 um 19:02 schrieb Zhang Qiang:
> > Hi Oliver,
> >
> > Thanks for your reply to my question on Ceph mailing list. I somehow
> > wasn't able to receive your reply in my mailbox, but I saw your reply in
> > the archive, so I have to mail you personally.
> >
> > I have pasted the whole ceph health output on gist:
> > https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4
> >
> > Hope this will help. Thank you!
>


Re: [ceph-users] recorded data digest != on disk

2016-03-22 Thread Gregory Farnum
On Tue, Mar 22, 2016 at 1:19 AM, Max A. Krasilnikov  wrote:
> Hello!
>
> I have 3-node cluster running ceph version 0.94.6 
> (e832001feaf8c176593e0325c8298e3f16dfb403)
> on Ubuntu 14.04. When scrubbing I get error:
>
> -9> 2016-03-21 17:36:09.047029 7f253a4f6700  5 -- op tracker -- seq: 
> 48045, time: 2016-03-21 17:36:09.046984, event: all_read, op: 
> osd_sub_op(unknown.0.0:0 5.ca 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] 
> snapc=0=[])
> -8> 2016-03-21 17:36:09.047035 7f253a4f6700  5 -- op tracker -- seq: 
> 48045, time: 0.00, event: dispatched, op: osd_sub_op(unknown.0.0:0 5.ca 
> 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
> -7> 2016-03-21 17:36:09.047066 7f254411b700  5 -- op tracker -- seq: 
> 48045, time: 2016-03-21 17:36:09.047066, event: reached_pg, op: 
> osd_sub_op(unknown.0.0:0 5.ca 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] 
> snapc=0=[])
> -6> 2016-03-21 17:36:09.047086 7f254411b700  5 -- op tracker -- seq: 
> 48045, time: 2016-03-21 17:36:09.047086, event: started, op: 
> osd_sub_op(unknown.0.0:0 5.ca 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] 
> snapc=0=[])
> -5> 2016-03-21 17:36:09.047127 7f254411b700  5 -- op tracker -- seq: 
> 48045, time: 2016-03-21 17:36:09.047127, event: done, op: 
> osd_sub_op(unknown.0.0:0 5.ca 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] 
> snapc=0=[])
> -4> 2016-03-21 17:36:09.047173 7f253f912700  2 osd.13 pg_epoch: 23286 
> pg[5.ca( v 23286'8176779 (23286'8173729,23286'8176779] local-les=23286 n=8132 
> ec=114 les/c 23286/23286 23285/23285/23285) [13,21] r=0 lpr=23285 
> crt=23286'8176777 lcod 23286'8176778 mlcod 23286'8176778 
> active+clean+scrubbing+deep+repair] scrub_compare_maps   osd.13 has 10 items
> -3> 2016-03-21 17:36:09.047377 7f253f912700  2 osd.13 pg_epoch: 23286 
> pg[5.ca( v 23286'8176779 (23286'8173729,23286'8176779] local-les=23286 n=8132 
> ec=114 les/c 23286/23286 23285/23285/23285) [13,21] r=0 lpr=23285 
> crt=23286'8176777 lcod 23286'8176778 mlcod 23286'8176778 
> active+clean+scrubbing+deep+repair] scrub_compare_maps replica 21 has 10 items
> -2> 2016-03-21 17:36:09.047983 7f253f912700  2 osd.13 pg_epoch: 23286 
> pg[5.ca( v 23286'8176779 (23286'8173729,23286'8176779] local-les=23286 n=8132 
> ec=114 les/c 23286/23286 23285/23285/23285) [13,21] r=0 lpr=23285 
> crt=23286'8176777 lcod 23286'8176778 mlcod 23286'8176778 
> active+clean+scrubbing+deep+repair] 5.ca recorded data digest 0xb284fef9 != 
> on disk 0x43d61c5d on 6134ccca/rb
> d_data.86280c78aaf7da.000e0bb5/17//5
>
> -1> 2016-03-21 17:36:09.048201 7f253f912700 -1 log_channel(cluster) log 
> [ERR] : 5.ca recorded data digest 0xb284fef9 != on disk 0x43d61c5d on 
> 6134ccca/rbd_data.86280c78aaf7da.000e0bb5/17//5
>  0> 2016-03-21 17:36:09.050672 7f253f912700 -1 osd/osd_types.cc: In 
> function 'uint64_t SnapSet::get_clone_bytes(snapid_t) const' thread 
> 7f253f912700 time 2016-03-21 17:36:09.048341
> osd/osd_types.cc: 4103: FAILED assert(clone_size.count(clone))

This is the part causing crashes, not the data digest. Searching for
that error led me to http://tracker.ceph.com/issues/12954
-Greg


[ceph-users] Need help for PG problem

2016-03-22 Thread Oliver Dzombic
Hi Zhang,

yeah i saw your answer already.

First of all, you should make sure that there is no clock skew.
It can cause some side effects.



According to

http://docs.ceph.com/docs/master/rados/operations/placement-groups/

you have to:

             (OSDs * 100)
Total PGs = --------------
              pool size


Means:

20 OSD's of you * 100 = 2000

Poolsize is:

Where pool size is either the number of replicas for replicated pools or
the K+M sum for erasure coded pools (as returned by ceph osd
erasure-code-profile get).

--

So let's say you have 2 replicas: you should have 1000 PGs.

If you have 3 replicas, you should have 2000 / 3 = 666 PGs.

But you configured 4096 PGs. That's simply far too much.

Reduce it, or if you cannot, get more OSDs into this cluster.

I don't know any other way.

Good luck!
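[Editor's note: plugging this thread's numbers into the formula is a quick arithmetic check; 20 OSDs is taken from the thread, the two replica sizes are the cases discussed above:]

```shell
# Total PGs = (OSDs * 100) / pool size, for 20 OSDs:
osds=20
for size in 2 3; do
    echo "size=$size -> suggested total PGs = $(( osds * 100 / size ))"
done
```

In practice the result is then usually rounded to the nearest power of two.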

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 22.03.2016 um 19:02 schrieb Zhang Qiang:
> Hi Oliver, 
> 
> Thanks for your reply to my question on Ceph mailing list. I somehow
> wasn't able to receive your reply in my mailbox, but I saw your reply in
> the archive, so I have to mail you personally.
> 
> I have pasted the whole ceph health output on gist:
> https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4 
> 
> Hope this will help. Thank you!


[ceph-users] Teuthology installation issue CentOS 6.5 (Python 2.6)

2016-03-22 Thread Mick McCarthy
Hello All,

I’m experiencing some issues installing Teuthology on CentOS 6.5.
I’ve tried installing it in a number of ways:

  *   Within a python virtual environment
  *   Using "pip install teuthology” directly

The installation fails in both cases.

a) In a Python virtual environment (using pip install -r requirements.txt), I 
encounter the following error which causes the installation to fail:

Installed /home/vagrant/teuthology
Processing dependencies for teuthology==0.1.0
error: Installed distribution setuptools 0.9.8 conflicts with requirement 
setuptools>=11.3

I have tried installing both mentioned versions of setuptools, but the error 
persists in both cases.

b) Using pip install teuthology:

Dependencies are pulled down and installed, but installation fails with the 
following stacktrace:

Running setup.py install for gevent
Complete output from command /usr/bin/python -c "import setuptools, 
tokenize;__file__='/tmp/pip-build-eqBfwh/gevent/setup.py';exec(compile(getattr(tokenize,
 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" 
install --record /tmp/pip-zTkXgi-record/install-record.txt 
--single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.6
creating build/lib.linux-x86_64-2.6/gevent
copying gevent/thread.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/queue.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/timeout.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/pywsgi.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/socket.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/__init__.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/ssl.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/win32util.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/server.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/baseserver.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/local.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/http.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/coros.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/select.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/backdoor.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/dns.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/event.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/wsgi.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/rawgreenlet.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/httplib.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/pool.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/util.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/sslold.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/greenlet.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/hub.py -> build/lib.linux-x86_64-2.6/gevent
copying gevent/monkey.py -> build/lib.linux-x86_64-2.6/gevent
running build_ext
building 'gevent.core' extension
creating build/temp.linux-x86_64-2.6
creating build/temp.linux-x86_64-2.6/gevent
gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall 
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
--param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv 
-DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE 
-fPIC -fwrapv -fPIC -I/usr/include/python2.6 -c gevent/core.c -o 
build/temp.linux-x86_64-2.6/gevent/core.o
In file included from gevent/core.c:225:
gevent/libevent.h:9:19: error: event.h: No such file or directory
gevent/libevent.h:38:20: error: evhttp.h: No such file or directory
gevent/libevent.h:39:19: error: evdns.h: No such file or directory
gevent/core.c:361: error: field ‘ev’ has incomplete type
gevent/core.c:741: warning: parameter names (without types) in function 
declaration
gevent/core.c: In function ‘__pyx_f_6gevent_4core___event_handler’:
gevent/core.c:1619: warning: implicit declaration of function 
‘event_pending’
gevent/core.c:1619: error: ‘EV_READ’ undeclared (first use in this function)
gevent/core.c:1619: error: (Each undeclared identifier is reported only once
gevent/core.c:1619: error: for each function it appears in.)
gevent/core.c:1619: error: ‘EV_WRITE’ undeclared (first use in this 
function)
gevent/core.c:1619: error: ‘EV_SIGNAL’ undeclared (first use in this 
function)
gevent/core.c:1619: error: ‘EV_TIMEOUT’ undeclared (first use in this 
function)
gevent/core.c: In function ‘__pyx_pf_6gevent_4core_5event___init__’:
gevent/core.c:1827: warning: implicit declaration of function ‘evtimer_set’
gevent/core.c:1839: warning: implicit declaration of function ‘event_set’
gevent/core.c: In function 

Re: [ceph-users] About the NFS on RGW

2016-03-22 Thread Ilya Dryomov
On Tue, Mar 22, 2016 at 1:12 PM, Xusangdi  wrote:
> Hi Matt & Cephers,
>
> I am looking for advice on setting up a file system based on Ceph. As CephFS 
> is not yet production ready (or have I missed some breakthroughs?), the new NFS on 
> RadosGW should be a promising alternative, especially for large files, which 
> is what we are most interested in. However, after searching around the Ceph 
> documentation (http://docs.ceph.com/docs/master/) and recent community mails, 
> I cannot find much information about it. Could you please provide some 
> introduction about the new NFS, and (if possible) a raw way to try it? Thank 
> you!

Note that CephFS is declared "stable, at last" in Jewel.  Not all
functionality falls under that heading (e.g. no snapshots, for now),
but if you want a file system based on Ceph, you really shouldn't be
looking at NFS-on-RGW.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Advice

2016-03-22 Thread John Spray
On Tue, Mar 22, 2016 at 2:37 PM, Ben Archuleta  wrote:
> Hello All,
>
> I have experience using Lustre but I am new to the Ceph world, I have some 
> questions to the Ceph users out there.
>
> I am thinking about deploying a Ceph storage cluster that lives in multiple 
> location "Building A" and "Building B”, this cluster will be comprised of two 
> dell servers with 10TB (5 * 2TB Disks) of JBOD storage and a MDS server over 
> a 10GB network. We will be using CephFS to serve multiple operating systems 
> (Windows, Linux, OS X).

A two node Ceph cluster is rarely wise.  If one of your servers goes
down, you're going to be down to a single copy of the data (unless
you've got a whopping 4 replicas to begin with), and so you'd be ill
advised to write anything to the cluster while it's in a degraded
state.  If you've only got one MDS server, your system is going to
have a single point of failure anyway.

You should probably look again at what levels of resilience and
availability you're trying to achieve here and think about whether
what you really want might be two NFS servers backing up to each
other.

> My main question is how well does CephFS work in a multi-operating system 
> environment and how well does it support NFS/CIFS?

Exporting CephFS over NFS works (either kernel NFS or nfs-ganesha),
beyond that CephFS doesn't care too much.  The Samba integration is
less advanced and less tested.  Bug reports are welcome if you try it
out.

> What are the chances of data corruption.

There's no simple answer to a question like that.  It's highly
unlikely to eat your data on a properly configured cluster.

> Also on average how well does CephFS handle variable size files ranging from 
> really small to really large?

Large files just get striped into smaller objects (4MB by default).
Small files have a higher metadata overhead per byte of data, as in any
system.

Cheers,
John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Qemu+RBD recommended cache mode and AIO settings

2016-03-22 Thread Ilya Dryomov
On Tue, Mar 22, 2016 at 4:48 PM, Jason Dillaman  wrote:
>> Hi Jason,
>>
>> Le 22/03/2016 14:12, Jason Dillaman a écrit :
>> >
>> > We actually recommend that OpenStack be configured to use writeback cache
>> > [1].  If the guest OS is properly issuing flush requests, the cache will
>> > still provide crash-consistency.  By default, the cache will automatically
>> > start up in writethrough mode (when configured for writeback) until the
>> > first OS flush is received.
>> >
>>
>> Phew, that was the good reasoning then, thank you for your confirmation. :)
>>
>> >> I interpret native as kernel-managed I/O, and as the RBD through librbd
>> >> isn't exposed as a block device on the hypervisor, I configured threads
>> >> I/O for all our guest VMs.
>> >
>> > While I have nothing to empirically back up the following statement, I
>> > would actually recommend "native".  When set to "threads", QEMU will use a
>> > dispatch thread to invoke librbd IO operations instead of passing the IO
>> > request "directly" from the guest OS.  librbd itself already has its own
>> > IO dispatch thread which is enabled by default (via the
>> > rbd_non_blocking_aio config option), so adding an extra IO dispatching
>> > layer will just add additional latency / thread context switching.
>> >
>>
>> Well, if only that would be possible...
>> Here's the error message from libvirt when starting a VM with
>> native+writeback:
>>
>> """
>> native I/O needs either no disk cache or directsync cache mode, QEMU
>> will fallback to aio=threads
>> """
>>
>
> Learn something new everyday: looking at QEMU's internals, that flag actually 
> only makes a difference for local IO backends (files and devices).  
> Therefore, no need to set it for librbd volumes.

And libvirt's error message is misleading.  What they are after is
O_DIRECT modes, i.e. cache=none or cache=directsync.  cache=none != "no
disk cache"...

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Qemu+RBD recommended cache mode and AIO settings

2016-03-22 Thread Jason Dillaman
> Hi Jason,
> 
> Le 22/03/2016 14:12, Jason Dillaman a écrit :
> >
> > We actually recommend that OpenStack be configured to use writeback cache
> > [1].  If the guest OS is properly issuing flush requests, the cache will
> > still provide crash-consistency.  By default, the cache will automatically
> > start up in writethrough mode (when configured for writeback) until the
> > first OS flush is received.
> >
> 
> Phew, that was the good reasoning then, thank you for your confirmation. :)
> 
> >> I interpret native as kernel-managed I/O, and as the RBD through librbd
> >> isn't exposed as a block device on the hypervisor, I configured threads
> >> I/O for all our guest VMs.
> >
> > While I have nothing to empirically back up the following statement, I
> > would actually recommend "native".  When set to "threads", QEMU will use a
> > dispatch thread to invoke librbd IO operations instead of passing the IO
> > request "directly" from the guest OS.  librbd itself already has its own
> > IO dispatch thread which is enabled by default (via the
> > rbd_non_blocking_aio config option), so adding an extra IO dispatching
> > layer will just add additional latency / thread context switching.
> >
> 
> Well, if only that would be possible...
> Here's the error message from libvirt when starting a VM with
> native+writeback:
> 
> """
> native I/O needs either no disk cache or directsync cache mode, QEMU
> will fallback to aio=threads
> """
> 

Learn something new every day: looking at QEMU's internals, that flag actually 
only makes a difference for local IO backends (files and devices).  Therefore, 
no need to set it for librbd volumes.

> >>> librbd has a setting called 'rbd_op_threads' which seems to be related to
> >>> AIO.
> >>> When does this kick in?
> >
> > This is related to the librbd IO dispatch thread pool.  Keep it at the
> > default value of "1" as higher settings will prevent IO flushes from
> > operating correctly.
> >
> >>>
> >>> Yes, a lot of questions where the internet gives a lot of answers.
> >>>
> >>> Some feedback would be nice!
> >>>
> >>> Thanks,
> >>>
> >>> Wido
> >
> > [1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/#configuring-nova
> >
> >


-- 

Jason Dillaman 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Qemu+RBD recommended cache mode and AIO settings

2016-03-22 Thread Loris Cuoghi

Hi Jason,

Le 22/03/2016 14:12, Jason Dillaman a écrit :


We actually recommend that OpenStack be configured to use writeback cache [1].  
If the guest OS is properly issuing flush requests, the cache will still 
provide crash-consistency.  By default, the cache will automatically start up 
in writethrough mode (when configured for writeback) until the first OS flush 
is received.



Phew, that was the good reasoning then, thank you for your confirmation. :)


I interpret native as kernel-managed I/O, and as the RBD through librbd
isn't exposed as a block device on the hypervisor, I configured threads
I/O for all our guest VMs.


While I have nothing to empirically back up the following statement, I would actually recommend 
"native".  When set to "threads", QEMU will use a dispatch thread to invoke librbd IO 
operations instead of passing the IO request "directly" from the guest OS.  librbd itself already 
has its own IO dispatch thread which is enabled by default (via the rbd_non_blocking_aio config option), so 
adding an extra IO dispatching layer will just add additional latency / thread context switching.



Well, if only that would be possible...
Here's the error message from libvirt when starting a VM with 
native+writeback:


"""
native I/O needs either no disk cache or directsync cache mode, QEMU 
will fallback to aio=threads

"""


librbd has a setting called 'rbd_op_threads' which seems to be related to
AIO.
When does this kick in?


This is related to the librbd IO dispatch thread pool.  Keep it at the default value of 
"1" as higher settings will prevent IO flushes from operating correctly.



Yes, a lot of questions where the internet gives a lot of answers.

Some feedback would be nice!

Thanks,

Wido



[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/#configuring-nova



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Advice

2016-03-22 Thread Ben Archuleta
Hello All,

I have experience using Lustre but I am new to the Ceph world, I have some 
questions to the Ceph users out there. 

I am thinking about deploying a Ceph storage cluster that lives in multiple 
locations, "Building A" and "Building B". This cluster will be comprised of two 
Dell servers with 10TB (5 * 2TB disks) of JBOD storage and an MDS server over a 
10GbE network. We will be using CephFS to serve multiple operating systems 
(Windows, Linux, OS X).

My main question is how well does CephFS work in a multi-operating-system 
environment, and how well does it support NFS/CIFS? What are the chances of data 
corruption? Also, on average, how well does CephFS handle files of variable size, 
ranging from really small to really large?


Regards,
Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS Advice

2016-03-22 Thread Ben Archuleta
Hello All,

I have experience using Lustre but I am new to the Ceph world, I have some 
questions to the Ceph users out there. 

I am thinking about deploying a Ceph storage cluster that lives in multiple 
locations, "Building A" and "Building B". This cluster will be comprised of two 
Dell servers with 10TB (5 * 2TB disks) of JBOD storage and an MDS server over a 
10GbE network. We will be using CephFS to serve multiple operating systems 
(Windows, Linux, OS X).

My main question is how well does CephFS work in a multi-operating-system 
environment, and how well does it support NFS/CIFS? What are the chances of data 
corruption? Also, on average, how well does CephFS handle files of variable size, 
ranging from really small to really large?


Regards,
Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] About the NFS on RGW

2016-03-22 Thread Matt Benjamin
Hi Xusangdi,

NFS on RGW is not intended as an alternative to CephFS.  The basic idea is to 
expose the S3 namespace using Amazon's prefix+delimiter convention (delimiter 
currently limited to '/').  We use opens for atomicity, which implies NFSv4 (or 
4.1).  In addition to limitations by design, there are some limitations in 
Jewel.  For example, clients should use (or emulate) sync mount behavior.  
Also, I/O is proxied--that restriction should be lifted in future releases.  
I'll post here when we have some usage documentation ready.

Matt

- Original Message -
> From: "Xusangdi" 
> To: mbenja...@redhat.com, ceph-us...@ceph.com
> Cc: ceph-de...@vger.kernel.org
> Sent: Tuesday, March 22, 2016 8:12:41 AM
> Subject: About the NFS on RGW
> 
> Hi Matt & Cephers,
> 
> I am looking for advice on setting up a file system based on Ceph. As CephFS
> is not yet production ready (or have I missed some breakthroughs?), the new NFS on
> RadosGW should be a promising alternative, especially for large files, which
> is what we are most interested in. However, after searching around the Ceph
> documentation (http://docs.ceph.com/docs/master/) and recent community
> mails, I cannot find much information about it. Could you please provide
> some introduction about the new NFS, and (if possible) a raw way to try it?
> Thank you!
> 
> Regards,
> ---Sandy

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Qemu+RBD recommended cache mode and AIO settings

2016-03-22 Thread Jason Dillaman
> > I've been looking on the internet regarding two settings which might
> > influence
> > performance with librbd.
> >
> > When attaching a disk with Qemu you can set a few things:
> > - cache
> > - aio
> >
> > The default for libvirt (in both CloudStack and OpenStack) for 'cache' is
> > 'none'. Is that still the recommend value combined with librbd
> > (write)cache?
> >
> 
> We've been using "writeback" since end of last year, looking for an
> explicit writeback policy taking advantage of the librbd cache, but we
> haven't got any problem with "none" before that.
> 

We actually recommend that OpenStack be configured to use writeback cache [1].  
If the guest OS is properly issuing flush requests, the cache will still 
provide crash-consistency.  By default, the cache will automatically start up 
in writethrough mode (when configured for writeback) until the first OS flush 
is received.
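For reference, the cache behavior described here corresponds to these librbd client-side options; this is only an illustrative sketch showing the relevant option names with their defaults, not a change anyone in this thread is asked to make:

```ini
[client]
# enable the librbd in-memory cache
rbd cache = true
# start in writethrough mode and switch to writeback
# only after the first flush arrives from the guest OS
rbd cache writethrough until flush = true
```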

> > In libvirt you can set 'io' to:
> > - native
> > - threads
> >
> > This translates to the 'aio' flags to Qemu. What is recommended here? I
> > found:
> > - io=native for block device based VMs
> > - io=threads for file-based VMs
> >
> > This seems to suggest that 'native' should be used for librbd. Is that
> > still
> > correct?
> >
> 
> I interpret native as kernel-managed I/O, and as the RBD through librbd
> isn't exposed as a block device on the hypervisor, I configured threads
> I/O for all our guest VMs.

While I have nothing to empirically back up the following statement, I would 
actually recommend "native".  When set to "threads", QEMU will use a dispatch 
thread to invoke librbd IO operations instead of passing the IO request 
"directly" from the guest OS.  librbd itself already has its own IO dispatch 
thread which is enabled by default (via the rbd_non_blocking_aio config option), 
so adding an extra IO dispatching layer will just add additional latency / 
thread context switching.
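As an illustration, a libvirt disk element along these lines uses writeback caching and simply omits the io attribute, consistent with the conclusion that aio makes no difference for librbd volumes. The pool/image name, monitor host, and secret UUID below are placeholders, not values from this thread:

```xml
<!-- Hypothetical example: pool/image, monitor host, and UUID are placeholders -->
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <auth username='libvirt'>
    <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
  </auth>
  <source protocol='rbd' name='rbd/vm-disk-1'>
    <host name='mon1.example.com' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```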

> > librbd has a setting called 'rbd_op_threads' which seems to be related to
> > AIO.
> > When does this kick in?

This is related to the librbd IO dispatch thread pool.  Keep it at the default 
value of "1" as higher settings will prevent IO flushes from operating 
correctly.

> >
> > Yes, a lot of questions where the internet gives a lot of answers.
> >
> > Some feedback would be nice!
> >
> > Thanks,
> >
> > Wido

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/#configuring-nova


-- 

Jason Dillaman 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Qemu+RBD recommended cache mode and AIO settings

2016-03-22 Thread Loris Cuoghi

Hi Wido,

Le 22/03/2016 13:52, Wido den Hollander a écrit :

Hi,

I've been looking on the internet regarding two settings which might influence
performance with librbd.

When attaching a disk with Qemu you can set a few things:
- cache
- aio

The default for libvirt (in both CloudStack and OpenStack) for 'cache' is
'none'. Is that still the recommend value combined with librbd (write)cache?



We've been using "writeback" since end of last year, looking for an 
explicit writeback policy taking advantage of the librbd cache, but we 
haven't got any problem with "none" before that.



In libvirt you can set 'io' to:
- native
- threads

This translates to the 'aio' flags to Qemu. What is recommended here? I found:
- io=native for block device based VMs
- io=threads for file-based VMs

This seems to suggest that 'native' should be used for librbd. Is that still
correct?



I interpret native as kernel-managed I/O, and as the RBD through librbd 
isn't exposed as a block device on the hypervisor, I configured threads 
I/O for all our guest VMs.



librbd has a setting called 'rbd_op_threads' which seems to be related to AIO.
When does this kick in?

Yes, a lot of questions where the internet gives a lot of answers.

Some feedback would be nice!

Thanks,

Wido


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Qemu+RBD recommended cache mode and AIO settings

2016-03-22 Thread Wido den Hollander
Hi,

I've been looking on the internet regarding two settings which might influence
performance with librbd.

When attaching a disk with Qemu you can set a few things:
- cache
- aio

The default for libvirt (in both CloudStack and OpenStack) for 'cache' is
'none'. Is that still the recommend value combined with librbd (write)cache?

In libvirt you can set 'io' to:
- native
- threads

This translates to the 'aio' flags to Qemu. What is recommended here? I found:
- io=native for block device based VMs
- io=threads for file-based VMs

This seems to suggest that 'native' should be used for librbd. Is that still
correct?

librbd has a setting called 'rbd_op_threads' which seems to be related to AIO.
When does this kick in?

Yes, a lot of questions where the internet gives a lot of answers.

Some feedback would be nice!

Thanks,

Wido
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] About the NFS on RGW

2016-03-22 Thread Xusangdi
Hi Matt & Cephers,

I am looking for advise on setting up a file system based on Ceph. As CephFS is 
not yet productive ready(or I missed some breakthroughs?), the new NFS on 
RadosGW should be a promising alternative, especially for large files, which is 
what we are most interested in. However, after searching around the Ceph 
documentation (http://docs.ceph.com/docs/master/) and recent community mails, I 
cannot find much information about it. Could you please provide some 
introduction about the new NFS, and (if possible) a raw way to try it? Thank 
you!

Regards,
---Sandy
-
本邮件及其附件含有杭州华三通信技术有限公司的保密信息,仅限于发送给上面地址中列出
的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
邮件!
This e-mail and its attachments contain confidential information from H3C, 
which is
intended only for the person or entity whose address is listed above. Any use 
of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender
by phone or email immediately and delete it!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fresh install - all OSDs remain down and out

2016-03-22 Thread Markus Goldberg

Hi desmond,
this seems like a lot of work for 90 OSDs, and leaves room for a few typing
mistakes.

Every disk change needs extra editing too.
This weighting was done automatically in former versions.
Do you know why and where this changed, or did I make a mistake at some point?

Markus
Am 21.03.2016 um 13:28 schrieb 施柏安:

Hi Markus

You should define the "osd device" and "host" entries to make the Ceph cluster work.
Use the types in your map (osd, host, chassis, root) to design the 
crushmap according to your needs.

Example:
​​
host node1 {
 id -1
 alg straw
 hash 0
 item osd.0 weight 1.00
 item osd.1 weight 1.00
}
host node2 {
 id -2
 alg straw
 hash 0
 item osd.2 weight 1.00
 item osd.3 weight 1.00
}
root default {
 id 0
 alg straw
 hash 0
 item node1 weight 2.00 (sum of its item)
 item node2 weight 2.00
}
​​Then you can use default ruleset. It is set to take the root "default".


2016-03-21 19:50 GMT+08:00 Markus Goldberg >:


Hi desmond,
this is my decompile_map:
root@bd-a:/etc/ceph# cat decompile_map
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
root default {
id -1   # do not change unnecessarily
# weight 0.000
alg straw
hash 0  # rjenkins1
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map
root@bd-a:/etc/ceph#

How should i change It?
I never had to edit anything in this area in former versions of
ceph. Has something changed?
Is any new parameter necessary in ceph.conf while installing?

Thank you,
  Markus

Am 21.03.2016 um 10:34 schrieb 施柏安:

It seems that no weight is set on any of your OSDs, so the
PGs are stuck in creating.
You can use these commands to edit the crushmap and set the weights:

# ceph osd getcrushmap -o map
# crushtool -d map -o decompile_map
# vim decompile_map (then you can change the weight of each of
your OSDs and their host weights)
# crushtool -c decompile_map -o changed_map
# ceph osd setcrushmap -i changed_map

Then, it should work in your situation.


2016-03-21 17:20 GMT+08:00 Markus Goldberg
>:

Hi,
root@bd-a:~# ceph osd tree
ID WEIGHT TYPE NAMEUP/DOWN REWEIGHT PRIMARY-AFFINITY
-1  0 root default
 0  0 osd.0 down0  1.0
 1  0 osd.1 down0  1.0
 2  0 osd.2 down0  1.0
...delete all the other OSDs as they are the same
...
88  0 osd.88 down0  1.0
89  0 osd.89 down0  1.0
root@bd-a:~#

bye,
  Markus

Am 21.03.2016 um 10:10 schrieb 施柏安:

What's your crushmap show? Or command 'ceph osd tree' show.

2016-03-21 16:39 GMT+08:00 Markus Goldberg
>:

Hi,
I have upgraded my hardware and installed Ceph from
scratch as described in
http://docs.ceph.com/docs/master/rados/deployment/
The last job was creating the OSDs
http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-osd/
I have used the create command and after that, the OSDs
should be in and up but they are all down and out.
An additional osd activate command does not help.

Ubuntu 14.04.4 kernel 4.2.1
ceph 10.0.2

What should I do? Where is my mistake?

This is ceph.conf:

[global]
fsid = 122e929a-111b-4067-80e4-3fef39e66ecf
mon_initial_members = bd-0, bd-1, bd-2
mon_host = xxx.xxx.xxx.20,xxx.xxx.xxx.21,xxx.xxx.xxx.22
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = xxx.xxx.xxx.0/24
cluster network = 192.168.1.0/24 
osd_journal_size = 10240
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 333

Re: [ceph-users] Need help for PG problem

2016-03-22 Thread Zhang Qiang
Hi Reddy,
It's over a thousand lines, I pasted it on gist:
https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4

On Tue, 22 Mar 2016 at 18:15 M Ranga Swami Reddy 
wrote:

> Hi,
> Can you please share the "ceph health detail" output?
>
> Thanks
> Swami
>
> On Tue, Mar 22, 2016 at 3:32 PM, Zhang Qiang 
> wrote:
> > Hi all,
> >
> > I have 20 OSDs and 1 pool, and, as recommended by the
> > doc(http://docs.ceph.com/docs/master/rados/operations/placement-groups/),
> I
> > configured pg_num and pgp_num to 4096, size 2, min size 1.
> >
> > But ceph -s shows:
> >
> > HEALTH_WARN
> > 534 pgs degraded
> > 551 pgs stuck unclean
> > 534 pgs undersized
> > too many PGs per OSD (382 > max 300)
> >
> > Why the recommended value, 4096, for 10 ~ 50 OSDs doesn't work?  And what
> > does it mean by "too many PGs per OSD (382 > max 300)"? If per OSD has
> 382
> > PGs I would have had 7640 PGs.
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Need help for PG problem

2016-03-22 Thread Oliver Dzombic
Hi Zhang,

Are you sure that all your 20 OSDs are up and in?

Please provide the complete output of ceph -s, or better, with the detail flag.

Thank you :-)

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 22.03.2016 um 11:02 schrieb Zhang Qiang:
> Hi all, 
> 
> I have 20 OSDs and 1 pool, and, as recommended by the
> doc(http://docs.ceph.com/docs/master/rados/operations/placement-groups/), I
> configured pg_num and pgp_num to 4096, size 2, min size 1. 
> 
> But ceph -s shows:
> 
> HEALTH_WARN
> 534 pgs degraded
> 551 pgs stuck unclean
> 534 pgs undersized
> too many PGs per OSD (382 > max 300)
> 
> Why the recommended value, 4096, for 10 ~ 50 OSDs doesn't work?  And
> what does it mean by "too many PGs per OSD (382 > max 300)"? If per OSD
> has 382 PGs I would have had 7640 PGs.
> 
> 
> 


Re: [ceph-users] Need help for PG problem

2016-03-22 Thread M Ranga Swami Reddy
Hi,
Can you please share the "ceph health detail" output?

Thanks
Swami

On Tue, Mar 22, 2016 at 3:32 PM, Zhang Qiang  wrote:
> Hi all,
>
> I have 20 OSDs and 1 pool, and, as recommended by the
> doc(http://docs.ceph.com/docs/master/rados/operations/placement-groups/), I
> configured pg_num and pgp_num to 4096, size 2, min size 1.
>
> But ceph -s shows:
>
> HEALTH_WARN
> 534 pgs degraded
> 551 pgs stuck unclean
> 534 pgs undersized
> too many PGs per OSD (382 > max 300)
>
> Why the recommended value, 4096, for 10 ~ 50 OSDs doesn't work?  And what
> does it mean by "too many PGs per OSD (382 > max 300)"? If per OSD has 382
> PGs I would have had 7640 PGs.
>
>


[ceph-users] Need help for PG problem

2016-03-22 Thread Zhang Qiang
Hi all,

I have 20 OSDs and 1 pool, and, as recommended by the doc(
http://docs.ceph.com/docs/master/rados/operations/placement-groups/), I
configured pg_num and pgp_num to 4096, size 2, min size 1.

But ceph -s shows:

HEALTH_WARN
534 pgs degraded
551 pgs stuck unclean
534 pgs undersized
too many PGs per OSD (382 > max 300)

Why doesn't the recommended value, 4096, for 10 ~ 50 OSDs work?  And what
does it mean by "too many PGs per OSD (382 > max 300)"? If each OSD has 382
PGs, I would have 7640 PGs in total.
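For what it's worth, the arithmetic behind the warning can be sketched like
this (a back-of-the-envelope check under my own assumptions, not an official
Ceph tool; the ~100-PG-copies-per-OSD target is the usual rule of thumb from
the placement-groups doc):

```python
import math

def pgs_per_osd(pg_num, size, num_osds):
    # Each PG is stored on `size` OSDs, so the average per-OSD count
    # is the total number of PG copies divided by the number of OSDs.
    return pg_num * size / num_osds

def recommended_pg_num(num_osds, size, target_per_osd=100):
    # Rule of thumb: aim for roughly 100 PG copies per OSD,
    # then round up to the next power of two.
    raw = num_osds * target_per_osd / size
    return 2 ** math.ceil(math.log2(raw))

print(pgs_per_osd(4096, 2, 20))   # 409.6 -- well over the default warning threshold of 300
print(recommended_pg_num(20, 2))  # 1024
```

With size 2 every PG lives on two OSDs, so 4096 PGs put roughly 410 PG copies
on each of 20 OSDs; the cluster reports 382 presumably because the 534
undersized PGs currently have only one copy placed. By this rule of thumb,
something near 1024 PGs would have fit 20 OSDs.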


Re: [ceph-users] How to enable civetweb log in Infernalis (or Jewel)

2016-03-22 Thread Mika c
Hi Cephers,
  I didn't notice that the user had already been changed from root to ceph.  After
changing the permissions on the log directory, the problem is fixed. Thank you all.
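
For the archives, the two pieces involved look roughly like this (section name,
port and paths are illustrative; since Infernalis the daemons run as the ceph
user rather than root, which is why the directory ownership matters):

# ceph.conf fragment
[client.rgw.gateway]
rgw frontends = civetweb port=7480 access_log_file=/var/log/civetweb/access.log error_log_file=/var/log/civetweb/error.log

# shell -- make the log directory writable by the ceph user (the actual fix here)
mkdir -p /var/log/civetweb
chown ceph:ceph /var/log/civetweb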



Best wishes,
Mika


2016-03-22 16:50 GMT+08:00 Mika c :

> Hi Cephers,
>   The setting "rgw frontends =
> access_log_file=/var/log/civetweb/access.log
> error_log_file=/var/log/civetweb/error.log" works in Firefly and Giant.
> But in Infernalis and Jewel the setting has no effect and the logs stay
> empty. Does anyone know how to set the civetweb log correctly in newer
> Ceph? Any comments are appreciated.
>
>
> Best wishes,
> Mika
>
>


[ceph-users] recorded data digest != on disk

2016-03-22 Thread Max A. Krasilnikov
Hello!

I have a 3-node cluster running ceph version 0.94.6 
(e832001feaf8c176593e0325c8298e3f16dfb403)
on Ubuntu 14.04. When scrubbing I get this error:

-9> 2016-03-21 17:36:09.047029 7f253a4f6700  5 -- op tracker -- seq: 48045, 
time: 2016-03-21 17:36:09.046984, event: all_read, op: osd_sub_op(unknown.0.0:0 
5.ca 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
-8> 2016-03-21 17:36:09.047035 7f253a4f6700  5 -- op tracker -- seq: 48045, 
time: 0.00, event: dispatched, op: osd_sub_op(unknown.0.0:0 5.ca 0//0//-1 
[scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
-7> 2016-03-21 17:36:09.047066 7f254411b700  5 -- op tracker -- seq: 48045, 
time: 2016-03-21 17:36:09.047066, event: reached_pg, op: 
osd_sub_op(unknown.0.0:0 5.ca 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] 
snapc=0=[])
-6> 2016-03-21 17:36:09.047086 7f254411b700  5 -- op tracker -- seq: 48045, 
time: 2016-03-21 17:36:09.047086, event: started, op: osd_sub_op(unknown.0.0:0 
5.ca 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
-5> 2016-03-21 17:36:09.047127 7f254411b700  5 -- op tracker -- seq: 48045, 
time: 2016-03-21 17:36:09.047127, event: done, op: osd_sub_op(unknown.0.0:0 
5.ca 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[])
-4> 2016-03-21 17:36:09.047173 7f253f912700  2 osd.13 pg_epoch: 23286 
pg[5.ca( v 23286'8176779 (23286'8173729,23286'8176779] local-les=23286 n=8132 
ec=114 les/c 23286/23286 23285/23285/23285) [13,21] r=0 lpr=23285 
crt=23286'8176777 lcod 23286'8176778 mlcod 23286'8176778 
active+clean+scrubbing+deep+repair] scrub_compare_maps   osd.13 has 10 items
-3> 2016-03-21 17:36:09.047377 7f253f912700  2 osd.13 pg_epoch: 23286 
pg[5.ca( v 23286'8176779 (23286'8173729,23286'8176779] local-les=23286 n=8132 
ec=114 les/c 23286/23286 23285/23285/23285) [13,21] r=0 lpr=23285 
crt=23286'8176777 lcod 23286'8176778 mlcod 23286'8176778 
active+clean+scrubbing+deep+repair] scrub_compare_maps replica 21 has 10 items
-2> 2016-03-21 17:36:09.047983 7f253f912700  2 osd.13 pg_epoch: 23286 
pg[5.ca( v 23286'8176779 (23286'8173729,23286'8176779] local-les=23286 n=8132 
ec=114 les/c 23286/23286 23285/23285/23285) [13,21] r=0 lpr=23285 
crt=23286'8176777 lcod 23286'8176778 mlcod 23286'8176778 
active+clean+scrubbing+deep+repair] 5.ca recorded data digest 0xb284fef9 != on 
disk 0x43d61c5d on 6134ccca/rbd_data.86280c78aaf7da.000e0bb5/17//5

-1> 2016-03-21 17:36:09.048201 7f253f912700 -1 log_channel(cluster) log 
[ERR] : 5.ca recorded data digest 0xb284fef9 != on disk 0x43d61c5d on 
6134ccca/rbd_data.86280c78aaf7da.000e0bb5/17//5
 0> 2016-03-21 17:36:09.050672 7f253f912700 -1 osd/osd_types.cc: In 
function 'uint64_t SnapSet::get_clone_bytes(snapid_t) const' thread 
7f253f912700 time 2016-03-21 17:36:09.048341
osd/osd_types.cc: 4103: FAILED assert(clone_size.count(clone))

 ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) 
[0x5606c23633db]
 2: (SnapSet::get_clone_bytes(snapid_t) const+0xb6) [0x5606c1fd4666]
 3: (ReplicatedPG::_scrub(ScrubMap&, std::map > > const&)+0xa1c) 
[0x5606c20b3c6c]
 4: (PG::scrub_compare_maps()+0xec9) [0x5606c2020d49]
 5: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x1ee) [0x5606c20264be]
 6: (PG::scrub(ThreadPool::TPHandle&)+0x1f4) [0x5606c2027d44]
 7: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle&)+0x19) [0x5606c1f0c379]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa56) [0x5606c2353fc6]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x5606c2355070]
 10: (()+0x8182) [0x7f256168e182]
 11: (clone()+0x6d) [0x7f255fbf947d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

Is there any way to recalculate the data digest?
I removed the OSD holding the failed PG and the data was recovered, but the error
now occurs on another OSD. I think I do not have a consistent copy of the data.
What can I do to recover?

The pool size is 2 (not great, I know, but I have no way to increase it for the
next 2 months).
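
For the record, the usual first-response commands for an inconsistent PG look
roughly like this (illustrative, not a guaranteed fix -- with size 2 the repair
takes the primary's copy as authoritative, and the SnapSet assert above may
need a deeper look):

ceph health detail          # lists the inconsistent PGs and errors
ceph pg deep-scrub 5.ca     # re-run the deep scrub to confirm
ceph pg repair 5.ca         # attempt repair from the authoritative copy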

-- 
WBR, Max A. Krasilnikov


Re: [ceph-users] ZFS or BTRFS for performance?

2016-03-22 Thread Mike Almateia

On 20-Mar-16 23:23, Schlacta, Christ wrote:

What do you use as an interconnect between your osds, and your clients?



Two dual-port Mellanox 10Gb SFP NICs = 4 x 10Gbit/s ports on each 
server.
On each server the ports are bonded in pairs, so we have two bonds: one for 
the cluster network and one for the storage network.


The client servers also use a 10Gbit network.
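
For readers building something similar, a minimal sketch of one bond plus the
matching ceph.conf networks might look like this (interface names, bond mode
and subnets are illustrative assumptions, not Mike's actual configuration):

# /etc/network/interfaces fragment (Ubuntu-style) -- one LACP bond of two 10G ports
auto bond0
iface bond0 inet static
    address 10.0.1.11
    netmask 255.255.255.0
    bond-slaves eth2 eth3
    bond-mode 802.3ad
    bond-miimon 100

# ceph.conf fragment -- separate storage (public) and cluster networks
[global]
public network  = 10.0.1.0/24
cluster network = 10.0.2.0/24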

--
Mike.