Preparing hammer v0.94.5

2015-10-03 Thread Loic Dachary
Hi Abhishek,

The v0.94.5 version was added to the list of versions and you should now be 
able to create the issue to track its progress. Since v0.94.4 is in the process 
of being tested, most of the ~50 backports in flight[1] will actually be for 
v0.94.5 and we can start testing them. The worst that can happen is that a few 
of them shift to v0.94.4 because they are needed after all. From the point of 
view of v0.94.5 that's all the same.

The next step is to create the issue[2] and you should have all the credentials 
you need to do so. 

This will be your first time driving a point release but there should be no 
surprises: you already know all about the process :-)

Cheers

[1] https://github.com/ceph/ceph/pulls?q=is%3Aopen+is%3Apr+milestone%3Ahammer
[2] 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_start_working_on_a_new_point_release#Create-new-task

-- 
Loïc Dachary, Artisan Logiciel Libre





Driving the first infernalis point release v9.2.1

2015-10-03 Thread Loic Dachary
Nathan & Abhisheks (are we even allowed to do that? ;-)

Immediately after the first infernalis release v9.2.0 [1], we will start 
preparing the v9.2.1 point release. Would one of you be willing to drive it?

Cheers

[1] Release numbers conventions 
http://docs.ceph.com/docs/master/releases/#release-numbers-conventions

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: [ceph-users] Potential OSD deadlock?

2015-10-03 Thread Robert LeBlanc

We are still struggling with this and have tried a lot of different
things. Unfortunately, Inktank (now Red Hat) no longer provides
consulting services for non-Red Hat systems. If there are any
certified Ceph consultants in the US who can do both remote and
on-site engagements, please let us know.

This certainly seems to be network related, but somewhere in the
kernel. We have tried increasing the network and TCP buffers and the
number of TCP sockets, and reducing the FIN_WAIT2 timeout. There is
about 25% idle on the boxes and the disks are busy, but not constantly
at 100% (they cycle from <10% up to 100%, but not 100% for more than a
few seconds at a time). There seems to be no reasonable explanation
for why I/O is blocked this frequently for longer than 30 seconds. We
have verified jumbo frames by pinging from/to each node with 9000-byte
packets. The network admins have verified that packets are not being
dropped in the switches for these nodes. We have tried different
kernels, including the recent Google patch to CUBIC. This is showing
up on three clusters (two Ethernet and one IPoIB). I booted one
cluster into Debian Jessie (from CentOS 7.1) with similar results.
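
For reference, a rough sketch of the kind of verification described
above, using standard Linux tools (the peer address is a placeholder,
and the sysctl names are the usual Linux ones rather than anything
specific to our setup):

  # Confirm 9000-byte frames pass end-to-end without fragmentation
  # (8972 = 9000 - 20 bytes IP header - 8 bytes ICMP header;
  #  -M do sets the don't-fragment bit)
  ping -M do -s 8972 -c 3 <peer-node>

  # Inspect the buffer and FIN_WAIT2 settings of the kind we tuned
  sysctl net.core.rmem_max net.core.wmem_max \
         net.ipv4.tcp_rmem net.ipv4.tcp_wmem \
         net.ipv4.tcp_fin_timeout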

The messages seem slightly different:
2015-10-03 14:38:23.193082 osd.134 10.208.16.25:6800/1425 439 :
cluster [WRN] 14 slow requests, 1 included below; oldest blocked for >
100.087155 secs
2015-10-03 14:38:23.193090 osd.134 10.208.16.25:6800/1425 440 :
cluster [WRN] slow request 30.041999 seconds old, received at
2015-10-03 14:37:53.151014: osd_op(client.1328605.0:7082862
rbd_data.13fdcb2ae8944a.0001264f [read 975360~4096]
11.6d19c36f ack+read+known_if_redirected e10249) currently no flag
points reached

I don't know what "no flag points reached" means.
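
For anyone who wants to dig into those flag points, the OSD op tracker
can be dumped over the admin socket; a sketch, assuming the default
admin socket setup on the OSD host:

  # Ops currently in flight on osd.134, with the events ("flag
  # points") each one has reached so far
  ceph daemon osd.134 dump_ops_in_flight

  # Recently completed slow ops, with per-event timestamps
  ceph daemon osd.134 dump_historic_ops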

The problem is most pronounced when we have to reboot an OSD node (1
of 13): we will have hundreds of blocked I/Os, some for up to 300
seconds, and it takes a good 15 minutes for things to settle down. The
production cluster is very busy, normally doing 8,000 I/O and peaking
at 15,000. This is all 4TB spindles with SSD journals and the disks
are between 25-50% full. We are currently splitting PGs to distribute
the load better across the disks, but we are having to do this 10 PGs
at a time as we get blocked I/O. We have max_backfills and
max_recovery set to 1, and client op priority is set higher than
recovery priority. We tried increasing the number of op threads, but
this didn't seem to help. It seems that as soon as PGs are finished
being checked they become active, and that could be the cause of slow
I/O while the other PGs are being checked.
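
For completeness, those throttling settings look roughly like this in
ceph.conf (assuming "max_recovery" refers to osd_recovery_max_active;
the priority values are illustrative, the point being that client op
priority is kept above recovery op priority):

  [osd]
      osd max backfills = 1
      osd recovery max active = 1
      osd client op priority = 63
      osd recovery op priority = 1

  # the same can be injected at runtime, e.g.:
  # ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'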

What I don't understand is why the messages are delayed. As soon as
the message is received by the Ceph OSD process, it is very quickly
committed to the journal and a response is sent back to the primary
OSD, which is received very quickly as well. I've adjusted
min_free_kbytes and it seems to keep the OSDs from crashing, but it
doesn't solve the main problem. We don't have swap, and there is 64 GB
of RAM per node for 10 OSDs.
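
The min_free_kbytes change itself is just the usual sysctl (the value
below is a placeholder rather than what we settled on):

  # Keep a larger reserve of free pages so the kernel can still
  # service network allocations under memory pressure
  sysctl -w vm.min_free_kbytes=<value>   # persist via /etc/sysctl.conf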

Is there something that could cause the kernel to receive a packet but
not be able to dispatch it to Ceph, which would explain why we are
seeing this blocked I/O for 30+ seconds? Are there any pointers to
tracing Ceph messages from the network buffer through the kernel to
the Ceph process?
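
Lacking better tooling, a sketch of the kernel-side counters one could
watch while an op is blocked (assuming the OSD listens on port 6800 as
in the log above):

  # TCP retransmit/drop counters, sampled before and after a stall
  netstat -s | grep -iE 'retrans|drop'

  # Send/receive queue depth and socket memory for the OSD connections
  ss -tmi state established '( sport = :6800 or dport = :6800 )'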

We could really use some pointers, no matter how outrageous. We've had
over 6 people looking into this for weeks now and just can't think of
anything else.

Thanks,

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Sep 25, 2015 at 2:40 PM, Robert LeBlanc  wrote:
> We dropped the replication on our cluster from 4 to 3 and it looks
> like all the blocked I/O has stopped (no entries in the log for the
> last 12 hours). This makes me believe that there is some issue with
> the number of sockets or some other TCP issue. We have not messed with
> ephemeral ports and TIME_WAIT at this point. There are 130 OSDs and 8 KVM
> hosts hosting about 150 VMs. Open files is set at 32K for the OSD
> processes 

Re: Preparing hammer v0.94.5

2015-10-03 Thread Abhishek Varshney
Hi Loic,


On Sat, Oct 3, 2015 at 2:09 PM, Loic Dachary  wrote:
> Hi Abhishek,
>
> The v0.94.5 version was added to the list of versions and you should now be 
> able to create the issue to track its progress. Since v0.94.4 is in the 
> process of being tested, most of the ~50 backports in flight[1] will actually 
> be for v0.94.5 and we can start testing them. The worst that can happen is 
> that a few of them shift to v0.94.4 because they are needed after all. From 
> the point of view of v0.94.5 that's all the same.
>
> The next step is to create the issue[2] and you should have all the 
> credentials you need to do so.

I have created a new issue to keep track of hammer v0.94.5 (
http://tracker.ceph.com/issues/13356 ). I would next prepare an
integration branch with all the in-flight backports and update the
tracker for v0.94.5.
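
For the record, the rough shape of that step (remote and branch names
here are placeholders, not the exact commands from the release HOWTO):

  # start the integration branch from the current hammer branch
  git checkout -b hammer-backports ceph/hammer
  # merge each in-flight backport PR from the milestone
  git pull https://github.com/<contributor>/ceph.git <backport-branch>
  # ...repeat per PR, then push so the gitbuilders and teuthology
  # can pick the branch up
  git push <remote> hammer-backports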

>
> This will be your first time driving a point release but there should be no 
> surprises: you already know all about the process :-)

I guess I know the process well now, but I am sure I will need your
support and motivation in getting it through :)

>
> Cheers
>
> [1] https://github.com/ceph/ceph/pulls?q=is%3Aopen+is%3Apr+milestone%3Ahammer
> [2] 
> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_start_working_on_a_new_point_release#Create-new-task
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>


Re: Preparing hammer v0.94.5

2015-10-03 Thread Loic Dachary


On 03/10/2015 19:47, Abhishek Varshney wrote:
> Hi Loic,
> 
> 
> On Sat, Oct 3, 2015 at 2:09 PM, Loic Dachary  wrote:
>> Hi Abhishek,
>>
>> The v0.94.5 version was added to the list of versions and you should now be 
>> able to create the issue to track its progress. Since v0.94.4 is in the 
>> process of being tested, most of the ~50 backports in flight[1] will 
>> actually be for v0.94.5 and we can start testing them. The worst that can 
>> happen is that a few of them shift to v0.94.4 because they are needed after 
>> all. From the point of view of v0.94.5 that's all the same.
>>
>> The next step is to create the issue[2] and you should have all the 
>> credentials you need to do so.
> 
> I have created a new issue to keep track of hammer v0.94.5 (
> http://tracker.ceph.com/issues/13356 ). I would next prepare an
> integration branch with all the in-flight backports and update the
> tracker for v0.94.5.

The hammer-backports integration branch already has them and it compiles ( 
http://ceph.com/gitbuilder.cgi ). I'll schedule suites for you since you do not 
have access to the sepia lab yet. You'll have a lot of fun sorting out the 
results.
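
For reference, scheduling a suite against that branch looks roughly
like this (machine type and email are placeholders, and the exact
flags depend on the teuthology version in use):

  teuthology-suite --suite rados --ceph hammer-backports \
      --machine-type <type> --email <you>
  # ...and similarly for the rbd, rgw and fs suites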

> 
>>
>> This will be your first time driving a point release but there should be no 
>> surprises: you already know all about the process :-)
> 
> I guess I know the process well now, but I am sure I will need your
> support and motivation in getting it through :)

I'll be around :-) 

> 
>>
>> Cheers
>>
>> [1] https://github.com/ceph/ceph/pulls?q=is%3Aopen+is%3Apr+milestone%3Ahammer
>> [2] 
>> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_start_working_on_a_new_point_release#Create-new-task
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre


