Re: [ceph-users] Multiple OSD crashing a lot

2016-08-13 Thread Blade Doyle
Hi HP. Mine was not really a fix, just a hack to get the OSD up long enough to make sure I had a full backup; then I rebuilt the cluster from scratch and restored the data. Though the hack did stop the OSD from crashing, it is probably a symptom of some internal problem, and may not be …

Re: [ceph-users] Scrub Errors

2016-05-04 Thread Blade Doyle
osd_pool_default_min_size = ?? > And what's the setting for the number of copies to create? > osd_pool_default_size = ??? > Please give us the output of > ceph osd pool ls detail > -- > Mit freundlichen Gruessen / Best regards > Oliver Dzombic > IP-Interac…

Re: [ceph-users] Scrub Errors

2016-05-03 Thread Blade Doyle
Suggestions appreciated, Blade. On Sat, Apr 30, 2016 at 9:31 AM, Blade Doyle <blade.do...@gmail.com> wrote: > Hi Ceph-Users, > > Help with how to resolve these would be appreciated. > > 2016-04-30 09:25:58.399634 9b809350 0 log_channel(cluster) log [INF] : > 4.97 deep-scrub …

[ceph-users] Scrub Errors

2016-04-30 Thread Blade Doyle
Hi Ceph-Users, Help with how to resolve these would be appreciated.
2016-04-30 09:25:58.399634 9b809350 0 log_channel(cluster) log [INF] : 4.97 deep-scrub starts
2016-04-30 09:26:00.041962 93009350 0 -- 192.168.2.52:6800/6640 >> 192.168.2.32:0/3983425916 pipe(0x27406000 sd=111 :6800 s=0 pgs=0 …
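Cluster-log lines like the deep-scrub messages above can also be matched programmatically when watching for scrub errors. A minimal sketch; the regex is an assumption modelled on the 0.94.x log format quoted above, not an official parser:

```python
import re

# Matches hammer-era cluster-log scrub lines such as:
#   ... log_channel(cluster) log [INF] : 4.97 deep-scrub starts
# (format assumed from the output quoted above)
SCRUB_RE = re.compile(
    r"log \[(?P<level>\w+)\] : (?P<pgid>\d+\.[0-9a-f]+) "
    r"(?P<kind>deep-scrub|scrub) (?P<result>.+)"
)

def parse_scrub_event(line):
    """Return a dict describing a scrub event, or None if the line is not one."""
    m = SCRUB_RE.search(line)
    if m is None:
        return None
    return m.groupdict()
```

Filtering on `result` values other than "starts"/"ok" is one way to surface the problem PGs without reading the whole log by hand.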

Re: [ceph-users] Multiple OSD crashing a lot

2016-04-23 Thread Blade Doyle
I went ahead and removed the assert and conditionalized the future use of the obc variable on its being non-null, then linked that into a custom ceph-osd binary for use on the most problematic node (8). That got the OSD up and running again! I took the opportunity to use the standard "remove an …
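The workaround described above (drop the assert, guard every later use of obc on it being non-null) is the classic "degrade instead of crash" pattern. A language-neutral sketch in Python, not the actual ceph-osd C++; the function and parameter names here are illustrative, not Ceph's:

```python
def hit_set_trim_safe(get_object_context, oid, log):
    """Illustrative only: trim one hit-set object without asserting.

    The original code effectively did:
        obc = get_object_context(oid)
        assert obc is not None   # crashes the daemon when lookup fails
    The patched version logs and skips instead of aborting.
    """
    obc = get_object_context(oid)
    if obc is None:
        log("hit_set_trim: no object context for %s, skipping" % oid)
        return False
    # ... the rest of the trim logic would use obc here ...
    return True
```

As the thread notes, this only masks the underlying inconsistency; the safe follow-up is exactly what was done here, i.e. back up, rebuild, and restore.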

Re: [ceph-users] Multiple OSD crashing a lot

2016-04-23 Thread Blade Doyle
…b350 -1 *** Caught signal (Aborted) ** in thread 9d0bb350
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0x69764c]
2: (__default_sa_restorer()+0) [0xb694ed10]
3: (gsignal()+0x38) [0xb694daa8]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
Aborted

Re: [ceph-users] Multiple OSD crashing a lot

2016-04-21 Thread Blade Doyle
…, Apr 20, 2016 at 12:37 AM, Blade Doyle <blade.do...@gmail.com> wrote: > I get a lot of OSD crashes with the following stack trace; suggestions please: > > 0> 1969-12-31 16:04:55.455688 83ccf410 -1 osd/ReplicatedPG.cc: In > function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::R…

[ceph-users] Multiple OSD crashing a lot

2016-04-20 Thread Blade Doyle
I get a lot of OSD crashes with the following stack trace; suggestions please:
0> 1969-12-31 16:04:55.455688 83ccf410 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 83ccf410 time 295.324905
osd/ReplicatedPG.cc: 11011: FAILED …

[ceph-users] Losing data in healthy cluster

2016-03-25 Thread Blade Doyle
Help, my Ceph cluster is losing data slowly over time. I keep finding files that are the same length as they should be, but whose content has been lost and replaced by nulls. Here is an example (from a backup I have the original file):
[root@blotter docker]# ls -lart …

Re: [ceph-users] Understanding "ceph -w" output - cluster monitoring

2016-03-15 Thread Blade Doyle
On Mon, Mar 14, 2016 at 3:48 PM, Christian Balzer <ch...@gol.com> wrote: > > Hello, > > On Mon, 14 Mar 2016 09:16:13 -0700 Blade Doyle wrote: > > > Hi Ceph Community, > > > > I am trying to use "ceph -w" output to monitor my ceph cluster. The …

[ceph-users] Understanding "ceph -w" output - cluster monitoring

2016-03-14 Thread Blade Doyle
Hi Ceph Community, I am trying to use "ceph -w" output to monitor my ceph cluster. The basic setup is: a python script runs ceph -w and processes each line of output, finds the data it wants, and reports it to InfluxDB. I view the data using Grafana and the Ceph Dashboard. For the most part …
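For a script like the one described above, each interesting ceph -w line can be reduced to fields with a regex before shipping to InfluxDB. A minimal sketch; the pgmap line shape is an assumption modelled on typical hammer (0.94.x) output, so the pattern would need adjusting per release:

```python
import re

# Assumed shape of a hammer-era pgmap line from `ceph -w`, e.g.:
#   ... pgmap v123: 256 pgs: 256 active+clean; 10 GB data, 20 GB used
PGMAP_RE = re.compile(
    r"pgmap v(?P<version>\d+): (?P<pgs>\d+) pgs: (?P<states>[^;]+)"
)

def pgmap_fields(line):
    """Extract pgmap version, pg count, and state summary, or None."""
    m = PGMAP_RE.search(line)
    if m is None:
        return None
    return {
        "version": int(m.group("version")),
        "pgs": int(m.group("pgs")),
        "states": m.group("states").strip(),
    }
```

The resulting dict maps naturally onto an InfluxDB point (a measurement with version/pgs as fields and the state summary split into per-state tags).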

Re: [ceph-users] lstat() hangs on single file

2016-02-13 Thread Blade Doyle
Greg, That's very useful info. I had not queried the admin sockets before today, so I am learning new things! On the x86_64: mds, mon, and osd, and rbd + cephfs client, ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43). On the arm7 nodes: mon, osd, and rbd + cephfs clients, ceph …

[ceph-users] lstat() hangs on single file

2016-02-11 Thread Blade Doyle
After several months of use without needing any administration at all, I think I finally found something to debug. Attempting to "ls -l" within a directory on CephFS hangs; strace shows it hanging on lstat():
open("/etc/group", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0644, …
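When a stat call can block in the kernel (for example a CephFS client waiting on an unresponsive MDS), a watchdog around os.lstat() keeps a scanning script from wedging on one path. A sketch using SIGALRM; Unix-only, main thread only, and the timeout value is arbitrary:

```python
import os
import signal

class StatTimeout(Exception):
    """Raised when lstat() does not return within the deadline."""

def lstat_with_timeout(path, seconds=5):
    """os.lstat(path), but raise StatTimeout if it blocks too long."""
    def _on_alarm(signum, frame):
        raise StatTimeout(path)
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return os.lstat(path)
    finally:
        signal.alarm(0)                       # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
```

Caveat: a process stuck in uninterruptible sleep (D state) inside the filesystem may not be woken by the signal at all, in which case running each stat in a separate worker process is the only reliable guard.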