Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread hjcho616
This is what it looks like today.  Seems like the ceph-osds are sitting at 0% CPU, 
so all the migrations appear to be done.  Does this look OK to shut down and 
continue when I get the HDD on Thursday?
# ceph health
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 20 pgs backfill_wait; 23 pgs degraded; 6 pgs down; 2 pgs inconsistent; 6 pgs peering; 4 pgs recovering; 3 pgs recovery_wait; 16 pgs stale; 23 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 49 pgs stuck unclean; 16 pgs stuck undersized; 16 pgs undersized; 1 requests are blocked > 32 sec; recovery 221870/2473686 objects degraded (8.969%); recovery 365398/2473686 objects misplaced (14.771%); recovery 147/2251990 unfound (0.007%); 7 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set
# df
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev                10240         0      10240   0% /dev
tmpfs             1584780      9212    1575568   1% /run
/dev/sda1        15247760   9610208    4839960  67% /
tmpfs             3961940         0    3961940   0% /dev/shm
tmpfs                5120         0       5120   0% /run/lock
tmpfs             3961940         0    3961940   0% /sys/fs/cgroup
/dev/sdd1      1952559676 712028032 1240531644  37% /var/lib/ceph/osd/ceph-2
/dev/sde1      1952559676 628862040 1323697636  33% /var/lib/ceph/osd/ceph-6
/dev/sdc1      1952559676 755815036 1196744640  39% /var/lib/ceph/osd/ceph-1
/dev/sdf1       312417560  42551928  269865632  14% /var/lib/ceph/osd/ceph-7
tmpfs              792392         0     792392   0% /run/user/0
I'm not sure I like what I see in fdisk... it doesn't show sdb1.  I hope it 
shows up when I run dd_rescue to the other drive... =P
# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.25.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

/dev/sdb: device contains a valid 'xfs' signature, it's strongly recommended to
wipe the device by command wipefs(8) if this setup is unexpected to avoid
possible collisions.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xe684adb6.

Command (m for help): p
Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xe684adb6

Command (m for help):
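
A minimal sketch of the copy step, assuming GNU ddrescue (a close cousin of the 
dd_rescue tool mentioned above) and with /dev/sdX as a placeholder for the 
equal-or-larger destination drive:

  # first pass: copy the easy areas, skip the slow error zones, keep a map file
  ddrescue -f -n /dev/sdb /dev/sdX rescue.map
  # second pass: retry the bad areas a few times
  ddrescue -f -r3 /dev/sdb /dev/sdX rescue.map

The map file lets ddrescue resume after an interruption instead of re-reading 
the failing disk from the start.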
 

On Tuesday, August 29, 2017 3:29 PM, Tomasz Kusmierz wrote:
 

 Maged, on the second host he has 4 out of 5 OSDs failed … I think he's past 
the point of trying to increase the backfill threshold :) Of course he could try 
to degrade the cluster by letting it mirror within the same host :) 

On 29 Aug 2017, at 21:26, Maged Mokhtar  wrote:

One of the things to watch out for in small clusters is that OSDs can get full 
rather unexpectedly in recovery/backfill cases:

In your case you have 2 OSD nodes with 5 disks each. Since you have a replica 
of 2, each PG will have 1 copy on each host, so if an OSD fails, all its PGs 
will have to be re-created on the same host, meaning they will be distributed 
only among the 4 OSDs on the same host, which will quickly bump their usage by 
nearly 20% each.
The default osd_backfill_full_ratio is 85%, so if any of the 4 OSDs was near 70% 
utilization before the failure, it will easily reach 85% and cause the cluster to 
error with the backfill_toofull message you see.  This is why I suggest you add an 
extra disk, or try your luck raising osd_backfill_full_ratio to 92%; it may fix 
things.

/Maged

On 2017-08-29 21:13, hjcho616 wrote:
Nice!  Thank you for the explanation!  I feel like I can revive that OSD. =)  
That does sound great.  I don't quite have another cluster, so I'm waiting for a 
drive to arrive! =)   After setting size and min_size to 1, it looks like the 
toofull flag is gone... Maybe when I was making that video copy the OSDs were 
already down... and those two OSDs were not enough to take that much extra... and 
on top of it, the last OSD alive was the smaller disk (2TB vs 320GB)... so it 
probably was filling up faster.  I should have captured that message... but I 
turned the machine off and now I am at work. =P  When I get back home, I'll try to 
grab that and share.  Maybe I don't need to add another OSD to that cluster just 
yet!  OSDs are about 50% full on OSD1. So next up, fixing osd0! 

Regards,
Hong

 On Tuesday, August 29, 2017 1:05 PM, David Turner  
wrote:


But it was absolutely awesome to run an osd off of an rbd after the disk failed.
On Tue, Aug 29, 2017, 1:42 PM David Turner wrote:
To addend Steve's success, the rbd was created in a second cluster in the same 
datacenter so it didn't run the risk of deadlocking that mapping rbds on 
machines running osds has.  It is still theoretical to work on the same 
cluster, but more inherently dangerous for a few reasons.

Re: [ceph-users] v12.2.0 Luminous released

2017-08-29 Thread kefu chai
On Wed, Aug 30, 2017 at 11:50 AM, Xiaoxi Chen  wrote:
> The ceph -v for 12.2.0 still goes with RC, which is a little bit confusing
>
> root@slx03c-5zkd:~# ceph -v
> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

https://github.com/ceph/ceph/pull/17359

probably we can have it fixed in 12.2.1 =)


-- 
Regards
Kefu Chai


[ceph-users] Ceph Developers Monthly - September

2017-08-29 Thread Leonardo Vaz
Hey Cephers,

This is just a friendly reminder that the next Ceph Developer Monthly
meeting is coming up:

 http://wiki.ceph.com/Planning

If you have work that you're doing that is feature work, significant
backports, or anything you would like to discuss with the core team,
please add it to the following page:

 http://wiki.ceph.com/CDM_06-SEP-2017

If you have questions or comments, please let us know.

Kindest regards,

Leo

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team


[ceph-users] get error when use prometheus plugin of ceph-mgr

2017-08-29 Thread shawn tim
Hello,
I just want to try the prometheus plugin of ceph-mgr.
Following this doc (http://docs.ceph.com/docs/master/mgr/prometheus/),
I get output like:

[root@ceph01 ~]# curl localhost:9283/metrics/ | head
  % Total% Received % Xferd  Average Speed   TimeTime Time
 Current
 Dload  Upload   Total   SpentLeft
 Speed
  0 00 00 0  0  0 --:--:-- --:--:-- --:--:--
  0
# HELP mds_mem_dir_ Directories closed
# TYPE mds_mem_dir_ counter
mds_mem_dir_{daemon="mds.ceph01"} 0.0
# HELP throttle_msgr_dispatch_throttler_radosclient_max Max value for
throttle
# TYPE throttle_msgr_dispatch_throttler_radosclient_max gauge
throttle_msgr_dispatch_throttler_radosclient_max{daemon="rgw.ceph01"}
104857600.0
# HELP mds_mem_dir+ Directories opened
# TYPE mds_mem_dir+ counter
mds_mem_dir+{daemon="mds.ceph01"} 12.0
...

It seems everything was OK. But when Prometheus connects to it, I get these errors:

WARN[0002] append failed err="no token found" source="scrape.go:648" target="{__address__="10.10.0.125:9283", __metrics_path__="/metrics", __scheme__="http", group="production", instance="10.10.0.125:9283", job="ceph"}"
WARN[0007] append failed err="no token found" source="scrape.go:648" target="{__address__="10.10.0.125:9283", __metrics_path__="/metrics", __scheme__="http", group="production", instance="10.10.0.125:9283", job="ceph"}"
WARN[0012] append failed err="no token found" source="scrape.go:648" target="{__address__="10.10.0.125:9283", __metrics_path__="/metrics", __scheme__="http", group="production", instance="10.10.0.125:9283", job="ceph"}"
^CWARN[0014] Received SIGTERM, exiting gracefully... source="main.go:340"

My prometheus config yml is

  - job_name: 'ceph mgr'
    scrape_interval: 5s
    # metrics_path: /metrics/
    # uncommenting this still gets an error (Get http://10.10.0.125:9283/metrics:
    # net/http: invalid header field value "Bearer ..." for key Authorization)
    # bearer_token_file: /etc/ceph/ceph.mgr.ceph01.keyring
    # bearer_token: AQD6359ZCMK3DxAAyJYErZayMX/CRZyiyk/UGg==

    static_configs:
      - targets: ['10.10.0.125:9283']
        labels:
          group: 'production'
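
For comparison, a minimal scrape config that should work against the ceph-mgr 
prometheus module, which serves unauthenticated plain-text metrics, so no bearer 
token ought to be needed (the address is just the one from this post; this is a 
sketch, not a verified fix):

  - job_name: 'ceph'
    scrape_interval: 5s
    metrics_path: /metrics
    static_configs:
      - targets: ['10.10.0.125:9283']

Also worth noting: the errors above come from a 2.0.0-beta.2 build, so a stable 
Prometheus 1.x release may behave differently.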


My ceph version is

[root@ceph01 ~]# ceph --version
ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)

My prometheus version is

[root@prom01 prometheus-2.0.0-beta.2.linux-amd64]# ./prometheus --version
prometheus, version 2.0.0-beta.2 (branch: HEAD, revision:
a52f082939a566d5269671e98be06fc6bdf61d09)
  build user:   root@41a0740ea598
  build date:   20170818-08:16:50
  go version:   go1.8.3

Are there any steps I missed?

Thanks for the help!


Re: [ceph-users] Help with down OSD with Ceph 12.1.4 on Bluestore back

2017-08-29 Thread Mark Nelson

Hi Bryan,

Check out your SCSI device failures, but if that doesn't pan out, Sage 
and I have been tracking this:


http://tracker.ceph.com/issues/21171

There's a fix in place being tested now!

Mark

On 08/29/2017 05:41 PM, Bryan Banister wrote:

Found some bad stuff in the messages file about SCSI block device fails…
I think I found my smoking gun…

-B



*From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
Of *Bryan Banister
*Sent:* Tuesday, August 29, 2017 5:02 PM
*To:* ceph-users@lists.ceph.com
*Subject:* [ceph-users] Help with down OSD with Ceph 12.1.4 on Bluestore
back



/Note: External Email/



Hi all,



Not sure what to do with this down OSD:



-2> 2017-08-29 16:55:34.588339 72d58700  1 --
7.128.13.57:6979/18818 --> 7.128.13.55:0/52877 -- osd_ping(ping_reply
e935 stamp 2017-08-29 16:55:34.587991) v4 -- 0x67397000 con 0

-1> 2017-08-29 16:55:34.588351 72557700  1 --
7.128.13.57:6978/18818 --> 7.128.13.55:0/52877 -- osd_ping(ping_reply
e935 stamp 2017-08-29 16:55:34.587991) v4 -- 0x67395000 con 0

 0> 2017-08-29 16:55:34.650061 7fffecd93700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/os/bluestore/KernelDevice.cc:
In function 'void KernelDevice::_aio_thread()' thread 7fffecd93700 time
2017-08-29 16:55:34.648642

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/os/bluestore/KernelDevice.cc:
372: FAILED assert(r >= 0)



ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x110) [0x55fb4420]

2: (KernelDevice::_aio_thread()+0x4b5) [0x55f59ce5]

3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x55f5e3cd]

4: (()+0x7dc5) [0x75635dc5]

5: (clone()+0x6d) [0x7472a73d]

NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.



--- logging levels ---

   0/ 5 none

   0/ 1 lockdep

   0/ 1 context

[snip]



Any help with recovery would be greatly appreciated, thanks!

-Bryan








Re: [ceph-users] Help with down OSD with Ceph 12.1.4 on Bluestore back

2017-08-29 Thread Bryan Banister
Found some bad stuff in the messages file about SCSI block device fails... I 
think I found my smoking gun...
-B

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bryan 
Banister
Sent: Tuesday, August 29, 2017 5:02 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Help with down OSD with Ceph 12.1.4 on Bluestore back

Note: External Email

Hi all,

Not sure what to do with this down OSD:

-2> 2017-08-29 16:55:34.588339 72d58700  1 -- 7.128.13.57:6979/18818 
--> 7.128.13.55:0/52877 -- osd_ping(ping_reply e935 stamp 2017-08-29 
16:55:34.587991) v4 -- 0x67397000 con 0
-1> 2017-08-29 16:55:34.588351 72557700  1 -- 7.128.13.57:6978/18818 
--> 7.128.13.55:0/52877 -- osd_ping(ping_reply e935 stamp 2017-08-29 
16:55:34.587991) v4 -- 0x67395000 con 0
 0> 2017-08-29 16:55:34.650061 7fffecd93700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/os/bluestore/KernelDevice.cc:
 In function 'void KernelDevice::_aio_thread()' thread 7fffecd93700 time 
2017-08-29 16:55:34.648642
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/os/bluestore/KernelDevice.cc:
 372: FAILED assert(r >= 0)

ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) 
[0x55fb4420]
2: (KernelDevice::_aio_thread()+0x4b5) [0x55f59ce5]
3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x55f5e3cd]
4: (()+0x7dc5) [0x75635dc5]
5: (clone()+0x6d) [0x7472a73d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
[snip]

Any help with recovery would be greatly appreciated, thanks!
-Bryan





[ceph-users] Help with down OSD with Ceph 12.1.4 on Bluestore back

2017-08-29 Thread Bryan Banister
Hi all,

Not sure what to do with this down OSD:

-2> 2017-08-29 16:55:34.588339 72d58700  1 -- 7.128.13.57:6979/18818 
--> 7.128.13.55:0/52877 -- osd_ping(ping_reply e935 stamp 2017-08-29 
16:55:34.587991) v4 -- 0x67397000 con 0
-1> 2017-08-29 16:55:34.588351 72557700  1 -- 7.128.13.57:6978/18818 
--> 7.128.13.55:0/52877 -- osd_ping(ping_reply e935 stamp 2017-08-29 
16:55:34.587991) v4 -- 0x67395000 con 0
 0> 2017-08-29 16:55:34.650061 7fffecd93700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/os/bluestore/KernelDevice.cc:
 In function 'void KernelDevice::_aio_thread()' thread 7fffecd93700 time 
2017-08-29 16:55:34.648642
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/os/bluestore/KernelDevice.cc:
 372: FAILED assert(r >= 0)

ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) 
[0x55fb4420]
2: (KernelDevice::_aio_thread()+0x4b5) [0x55f59ce5]
3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x55f5e3cd]
4: (()+0x7dc5) [0x75635dc5]
5: (clone()+0x6d) [0x7472a73d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
[snip]

Any help with recovery would be greatly appreciated, thanks!
-Bryan





[ceph-users] Centos7, luminous, cephfs, .snaps

2017-08-29 Thread Marc Roos

Where can I find some examples of creating a snapshot on a directory?
Can I just do mkdir .snaps? I tried with the stock kernel and with 4.12.9-1.
http://docs.ceph.com/docs/luminous/dev/cephfs-snapshots/
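
A minimal sketch of how CephFS directory snapshots usually work: the magic 
directory is .snap (not .snaps), the mount point and snapshot name below are made 
up, and on Luminous the feature still has to be enabled explicitly (the exact 
enable command may vary by release):

  # enable snapshots (still experimental in Luminous)
  ceph mds set allow_new_snaps true --yes-i-really-mean-it
  # create a snapshot of /mnt/cephfs/mydir
  mkdir /mnt/cephfs/mydir/.snap/snap-2017-08-29
  # list and remove snapshots
  ls /mnt/cephfs/mydir/.snap
  rmdir /mnt/cephfs/mydir/.snap/snap-2017-08-29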


Re: [ceph-users] OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1)

2017-08-29 Thread Tomasz Kusmierz
so on IRC I was asked to add this log from OSD that was marked as missing 
during scrub:

https://pastebin.com/raw/YQj3Drzi




Re: [ceph-users] OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1)

2017-08-29 Thread David Zafman


Please file a bug in tracker: http://tracker.ceph.com/projects/ceph

When an OSD is marked down, is there a crash (e.g. an assert, a heartbeat 
timeout, or being declared down by another daemon)?  Please include relevant log 
snippets.  If there is no obvious information, then bump the OSD debug log levels.
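
A minimal sketch of bumping those levels at runtime; osd.12 is a placeholder for 
the flapping OSD:

  ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1'

The same settings can go under [osd] in ceph.conf (debug osd = 20, debug ms = 1) 
if the daemon keeps restarting.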


Luminous LTS release happened today, so 12.2.0 is the best thing to run 
as of now.


See if any existing bugs like http://tracker.ceph.com/issues/21142 are 
related.


David


On 8/29/17 8:24 AM, Tomasz Kusmierz wrote:

So nobody has any clue on this one ???

Should I go to the dev mailing list with this one?


On 27 Aug 2017, at 01:49, Tomasz Kusmierz  wrote:

Hi,
for purposes of experimenting I'm running a home cluster that consists of a 
single node and 4 OSDs (weights in the crush map are true to actual HDD sizes). I 
prefer to test all new stuff on home equipment before getting egg on the face 
at work :)
Anyway, recently I upgraded to Luminous and replaced my ancient 8x 2TB drives 
with 2x 8TB drives (with hopes of getting more in the near future). While doing 
that I converted everything to bluestore, while still on 12.1.1.

Everything was running smooth and performance was good (for ceph).

I’ve decided to upgrade recently to 12.1.2 and this is where everything started 
acting up. I’m aware that
- single node cluster is not a cluster
- in the end I might need more OSD (old joke right ?)
- I need to switch from spinning rust to SSD

Before the upgrade my "cluster" was only switching to WRN when I was pumping a 
lot of data into it, and it would just come up with "slow requests" stuff. Now, 
while completely static, not doing anything (no reads, no writes), OSDs are 
committing suicide due to timeout, and before they commit suicide I can't 
actually access data from the cluster, which makes me think that the PGs are 
inaccessible while a scrub is running. Below I'll attach a log excerpt; please 
notice that it happens on deep scrub and normal scrub as well.

After I discovered that, I tried to play around with sysctl.conf and with 
ceph.conf (up to this point sysctl.conf was stock, and ceph.conf was just 
adjusted to allow greater OSD full capacity and to disable cephx to speed it up).

Also, I'm running 3 pools on top of this cluster (all three have size = 2, 
min_size = 2):
cephfs_data pg=256 (99.99% of data used in cluster)
cephfs_metadata pg=4 (0.01% of data used in cluster)
rbd pg=8 but this pool contains no data and I’m considering removing it since 
in my use case I’ve got nothing for it.

Please note that while these logs were produced CephFS was not even mounted :/



FYI the hardware is an old and trusted HP ProLiant DL180 G6 with 2 Xeons @ 2.2GHz, 
giving 16 cores, 32GB of ECC RAM, and an LSI in HBA mode (2x 6Gb SAS).



(
As a side issue, could somebody explain to me why, with bluestore that was supposed 
to cure cancer, write performance still sucks? I know that filestore did suffer from 
writing everything multiple times to the same drive, and I did experience this first 
hand when, after exhausting the journals, it was just dead slow, but now, within the 
same host in my current configuration, it keeps choking [flaps 70MB/s -> 10MB/s -> 
70MB/s] and I never see it even approach the speed of the single slowest drive. This 
server is not a speed daemon, I know, but when performing simultaneous reads/writes 
on those drives I was getting around 760MB/s sequential R/W speed.
Right now I'm struggling to comprehend where the bottleneck is while performing 
operations within the same host?! The network should not be an issue (correct me if 
I'm wrong here); dumping a single blob into a pool should produce a nice long 
sequence of objects placed onto the drives …
I'm just puzzled why ceph will not exceed a combined 40MB/s while still switching 
the "cluster" into warning state due to "slow responses":
2017-08-24 20:49:34.457191 osd.8 osd.8 192.168.1.240:6814/3393 503 : cluster 
[WRN] slow request 63.878717 seconds old, received at 2017-08-24 
20:48:30.578398: osd_op(client.994130.1:13659 1.9700016d 
1:b68000e9:::10ffeef.0068:head [write 0~4194304 [1@-1]] snapc 1=[] 
ondisk+write+known_if_redirected e4306) currently waiting for active
2017-08-24 20:49:34.457195 osd.8 osd.8 192.168.1.240:6814/3393 504 : cluster 
[WRN] slow request 64.177858 seconds old, received at 2017-08-24 
20:48:30.279257: osd_op(client.994130.1:13568 1.b95e13a4 
1:25c87a9d:::10ffeef.000d:head [write 0~4194304 [1@-1]] snapc 1=[] 
ondisk+write+known_if_redirected e4306) currently waiting for active
2017-08-24 20:49:34.457198 osd.8 osd.8 192.168.1.240:6814/3393 505 : cluster 
[WRN] slow request 64.002653 seconds old, received at 2017-08-24 
20:48:30.454463: osd_op(client.994130.1:13626 1.b426420e 
1:7042642d:::10ffeef.0047:head [write 0~4194304 [1@-1]] snapc 1=[] 
ondisk+write+known_if_redirected e4306) currently waiting for active
2017-08-24 20:49:34.457200 osd.8 osd.8 192.168.1.240:6814/3393 506 : cluster 
[WRN] slow request 63.873519 seconds old, 

[ceph-users] Possible way to clean up leaked multipart objects?

2017-08-29 Thread William Schroeder
Hello!

Our team finally had a chance to take another look at the problem identified by 
Brian Felton in http://tracker.ceph.com/issues/16767.  Basically, if any 
multipart objects are retried before an Abort or Complete, they remain on the 
system, taking up space and leaving their accounting in “radosgw-admin bucket 
stats”.  The problem is confirmed in Hammer and Jewel.

This past week, we succeeded in writing some experimental code to remove those parts.  
I am not sure if this code has any unintended consequences, so **I would 
greatly appreciate reviews of the new tool**!  I have tested it successfully 
against objects created and leaked in the ceph-demo Docker image for Jewel.  
Here is a pull request with the patch:

https://github.com/ceph/ceph/pull/17349

Basically, we added a new subcommand for “bucket” called “fixmpleak”.  This 
lists objects in the “multipart” namespace, and it identifies objects that are 
not associated with current .meta files in that list.  It then deletes those 
objects with a delete op, which results in the accounting being corrected and 
the space being reclaimed on the OSDs.

This is not a preventative measure, which would be a lot more complicated, but 
we figure we'll run this tool hourly against all our buckets to keep things clean.
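
For illustration, assuming the subcommand lands as described in the pull request, 
invocation would presumably follow the other radosgw-admin bucket subcommands (the 
bucket name and exact flags here are assumptions, not confirmed syntax):

  radosgw-admin bucket fixmpleak --bucket=mybucket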



Re: [ceph-users] Broken Ceph Cluster when adding new one - Proxmox 5.0 & Ceph Luminous

2017-08-29 Thread Phil Schwarz

Hi, back to work, I'm facing my problem again.

@Alexandre: AMD Turion, for an HP N54L Microserver.
This server is OSD and LXC only, no mon running on it.

After rebooting the whole cluster and attempting to add the same disk a 
third time:


ceph osd tree
ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.47226 root default
-2 3.65898 host jon
 1 2.2 osd.1  up  1.0  1.0
 3 1.35899 osd.3  up  1.0  1.0
-3 0.34999 host daenerys
 0 0.34999 osd.0  up  1.0  1.0
-4 1.64969 host tyrion
 2 0.44969 osd.2  up  1.0  1.0
 4 1.2 osd.4  up  1.0  1.0
-5 1.81360 host jaime
 5 1.81360 osd.5  up  1.0  1.0
 6   0 osd.6down0  1.0
 7   0 osd.7down0  1.0
 8   0 osd.8down0  1.0

OSDs 6, 7 and 8 are the same issue with the same disk (which isn't faulty).


Any clue ?
I'm gonna try soon to create the osd on this disk in another server.

Thanks.

Best regards
On 26/07/2017 at 15:53, Alexandre DERUMIER wrote:

Hi Phil,


It's possible that rocksdb currently has a bug with some old CPUs (old Xeons 
and some Opterons).
I have the same behaviour with new cluster when creating mons
http://tracker.ceph.com/issues/20529

What is your cpu model ?

in your log:

sh[1869]:  in thread 7f6d85db3c80 thread_name:ceph-osd
sh[1869]:  ceph version 12.1.0 (330b5d17d66c6c05b08ebc129d3e6e8f92f73c60) 
luminous (dev)
sh[1869]:  1: (()+0x9bc562) [0x558561169562]
sh[1869]:  2: (()+0x110c0) [0x7f6d835cb0c0]
sh[1869]:  3: 
(rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x871) 
[0x5585615788b1]
sh[1869]:  4: (rocksdb::VersionSet::Recover(std::vector const&, bool)+0x26bc) 
[0x55856145ca4c]
sh[1869]:  5: (rocksdb::DBImpl::Recover(std::vector const&, bool, bool, bool)+0x11f) 
[0x558561423e6f]
sh[1869]:  6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string const&, std:
sh[1869]:  7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string const&, rocksdb:
sh[1869]:  8: (RocksDBStore::do_open(std::ostream&, bool)+0x68e) 
[0x5585610af76e]
sh[1869]:  9: (RocksDBStore::create_and_open(std::ostream&)+0xd7) 
[0x5585610b0d27]
sh[1869]:  10: (BlueStore::_open_db(bool)+0x326) [0x55856103c6d6]
sh[1869]:  11: (BlueStore::mkfs()+0x856) [0x55856106d406]
sh[1869]:  12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string const&, uuid_d, int)+0x348) 
[0x558560bc98f8]
sh[1869]:  13: (main()+0xe58) [0x558560b1da78]
sh[1869]:  14: (__libc_start_main()+0xf1) [0x7f6d825802b1]
sh[1869]:  15: (_start()+0x2a) [0x558560ba4dfa]
sh[1869]: 2017-07-16 14:46:00.763521 7f6d85db3c80 -1 *** Caught signal (Illegal 
instruction) **
sh[1869]:  in thread 7f6d85db3c80 thread_name:ceph-osd
sh[1869]:  ceph version 12.1.0 (330b5d17d66c6c05b08ebc129d3e6e8f92f73c60) 
luminous (dev)
sh[1869]:  1: (()+0x9bc562) [0x558561169562]

- Original Message -
From: "Phil Schwarz" 
To: "Udo Lembke" , "ceph-users" 
Sent: Sunday, 16 July 2017 15:04:16
Subject: Re: [ceph-users] Broken Ceph Cluster when adding new one - Proxmox 5.0 & 
Ceph Luminous

On 15/07/2017 at 23:09, Udo Lembke wrote:

Hi,

On 15.07.2017 16:01, Phil Schwarz wrote:

Hi,
...

While investigating, i wondered about my config :
Question relative to /etc/hosts file :
Should i use private_replication_LAN Ip or public ones ?

private_replication_LAN!! And the pve-cluster should use another network
(nics) if possible.

Udo


OK, thanks Udo.

After investigation, i did :
- set Noout OSDs
- Stopped CPU-pegging LXC
- Check the cabling
- Restart the whole cluster

Everything went fine !

But, when i tried to add a new OSD :

fdisk /dev/sdc --> Deleted the partition table
parted /dev/sdc --> mklabel msdos (Disk came from a ZFS FreeBSD system)
dd if=/dev/null of=/dev/sdc
ceph-disk zap /dev/sdc
dd if=/dev/zero of=/dev/sdc bs=10M count=1000

And recreated the OSD via Web GUI.
Same result, the OSD is known by the node, but not by the cluster.

Logs seem to show an issue with this bluestore OSD, have a look at the file.

I'm gonna give a try to recreating the OSD using Filestore.

Thanks




Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Tomasz Kusmierz
Maged, on the second host he has 4 out of 5 OSDs failed … I think he's past 
the point of trying to increase the backfill threshold :) Of course he could try 
to degrade the cluster by letting it mirror within the same host :) 
> On 29 Aug 2017, at 21:26, Maged Mokhtar  wrote:
> 
> One of the things to watch out for in small clusters is that OSDs can get full rather 
> unexpectedly in recovery/backfill cases:
> 
> In your case you have 2 OSD nodes with 5 disks each. Since you have a replica 
> of 2, each PG will have 1 copy on each host, so if an OSD fails, all its PGs 
> will have to be re-created on the same host, meaning they will be distributed 
> only among the 4 OSDs on the same host, which will quickly bump their usage 
> by nearly 20% each.
> the default osd_backfill_full_ratio is 85% so if any of the 4 OSDs was near 
> 70% util before the failure, it will easily reach 85% and cause the cluster 
> to error with the backfill_toofull message you see.  This is why I suggest you 
> add an extra disk, or try your luck raising osd_backfill_full_ratio to 92%; it 
> may fix things.
> 
> /Maged
> 
> On 2017-08-29 21:13, hjcho616 wrote:
> 
>> Nice!  Thank you for the explanation!  I feel like I can revive that OSD. =) 
>>  That does sound great.  I don't quite have another cluster so waiting for a 
>> drive to arrive! =)  
>>  
>> After setting size and min_size to 1, it looks like the toofull flag is gone... Maybe 
>> when I was making that video copy OSDs were already down... and those two 
>> OSDs were not enough to take too much extra...  and on top of it that last 
>> OSD alive was smaller disk (2TB vs 320GB)... so it probably was filling up 
>> faster.  I should have captured that message... but turned machine off and 
>> now I am at work. =P  When I get back home, I'll try to grab that and share. 
>>  Maybe I don't need to try to add another OSD to that cluster just yet!  
>> OSDs are about 50% full on OSD1.
>>  
>> So next up, fixing osd0!
>>  
>> Regards,
>> Hong  
>> 
>> 
>> On Tuesday, August 29, 2017 1:05 PM, David Turner  
>> wrote:
>> 
>> 
>> But it was absolutely awesome to run an osd off of an rbd after the disk 
>> failed.
>> 
>> On Tue, Aug 29, 2017, 1:42 PM David Turner > > wrote:
>> To addend Steve's success, the rbd was created in a second cluster in the 
>> same datacenter so it didn't run the risk of deadlocking that mapping rbds 
>> on machines running osds has.  It is still theoretical to work on the same 
>> cluster, but more inherently dangerous for a few reasons.
>> 
>> On Tue, Aug 29, 2017, 1:15 PM Steve Taylor > > wrote:
>> Hong,
>> 
>> Probably your best chance at recovering any data without special,
>> expensive, forensic procedures is to perform a dd from /dev/sdb to
>> somewhere else large enough to hold a full disk image and attempt to
>> repair that. You'll want to use 'conv=noerror' with your dd command
>> since your disk is failing. Then you could either re-attach the OSD
>> from the new source or attempt to retrieve objects from the filestore
>> on it.
>> 
>> I have actually done this before by creating an RBD that matches the
>> disk size, performing the dd, running xfs_repair, and eventually
>> adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
>> temporary arrangement for repair only, but I'm happy to report that it
>> worked flawlessly in my case. I was able to weight the OSD to 0,
>> offload all of its data, then remove it for a full recovery, at which
>> point I just deleted the RBD.
>> 
>> The possibilities afforded by Ceph inception are endless. ☺
>> 
>> 
>> 
>> Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
>> 380 Data Drive Suite 300 | Draper | Utah | 84020
>> Office: 801.871.2799 |
>> 
>> If you are not the intended recipient of this message or received it 
>> erroneously, please notify the sender and delete it, together with any 
>> attachments, and be advised that any dissemination or copying of this 
>> message is prohibited.
>> 
>> 
>> 
>> On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
>> > Rule of thumb with batteries is:
>> > - more "proper temperature" you run them at the more life you get out
>> > of them
>> > - more battery is overpowered for your application the longer it will
>> > survive. 
>> >
>> > Get yourself an LSI 94** controller and use it as an HBA and you will be
>> > fine. But get MORE DRIVES! ... 
>> > > On 28 Aug 2017, at 23:10, hjcho616 > > > > wrote:
>> > >
>> > > Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
>> > > try these out.  Car battery idea is nice!  I may try that.. =)  Do
>> > > they last longer?  Ones that fit the UPS original battery spec
>> > > didn't last very long... part of the reason why I gave up on them..
>> > > =P  My wife probably won't like the idea of car 

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Maged Mokhtar
One of the things to watch out for in small clusters is that OSDs can get full
rather unexpectedly in recovery/backfill cases: 

In your case you have 2 OSD nodes with 5 disks each. Since you have a
replica of 2, each PG will have 1 copy on each host, so if an OSD fails,
all its PGs will have to be re-created on the same host, meaning they
will be distributed only among the 4 OSDs on the same host, which will
quickly bump their usage by nearly 20% each.
The default osd_backfill_full_ratio is 85%, so if any of the 4 OSDs was
near 70% utilization before the failure, it will easily reach 85% and cause the
cluster to error with the backfill_toofull message you see.  This is why I
suggest you add an extra disk, or try your luck raising
osd_backfill_full_ratio to 92%; it may fix things. 
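
If you go the ratio route, a minimal sketch of raising it at runtime, assuming a 
Jewel-era cluster where osd_backfill_full_ratio is an OSD-side option that can be 
injected without a restart:

  ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.92'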

/Maged 

On 2017-08-29 21:13, hjcho616 wrote:

> Nice!  Thank you for the explanation!  I feel like I can revive that OSD. =)  
> That does sound great.  I don't quite have another cluster so waiting for a 
> drive to arrive! =)   
> 
> After setting size and min_size to 1, it looks like the toofull flag is gone... Maybe 
> when I was making that video copy OSDs were already down... and those two 
> OSDs were not enough to take too much extra...  and on top of it that last 
> OSD alive was smaller disk (2TB vs 320GB)... so it probably was filling up 
> faster.  I should have captured that message... but turned machine off and 
> now I am at work. =P  When I get back home, I'll try to grab that and share.  
> Maybe I don't need to try to add another OSD to that cluster just yet!  OSDs 
> are about 50% full on OSD1. 
> 
> So next up, fixing osd0! 
> 
> Regards, 
> Hong   
> 
> On Tuesday, August 29, 2017 1:05 PM, David Turner  
> wrote:
> 
> But it was absolutely awesome to run an osd off of an rbd after the disk 
> failed. 
> 
> On Tue, Aug 29, 2017, 1:42 PM David Turner  wrote: 
> To addend Steve's success, the rbd was created in a second cluster in the 
> same datacenter so it didn't run the risk of deadlocking that mapping rbds on 
> machines running osds has.  It is still theoretical to work on the same 
> cluster, but more inherently dangerous for a few reasons. 
> 
> On Tue, Aug 29, 2017, 1:15 PM Steve Taylor  
> wrote: Hong,
> 
> Probably your best chance at recovering any data without special,
> expensive, forensic procedures is to perform a dd from /dev/sdb to
> somewhere else large enough to hold a full disk image and attempt to
> repair that. You'll want to use 'conv=noerror' with your dd command
> since your disk is failing. Then you could either re-attach the OSD
> from the new source or attempt to retrieve objects from the filestore
> on it.
> 
> I have actually done this before by creating an RBD that matches the
> disk size, performing the dd, running xfs_repair, and eventually
> adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
> temporary arrangement for repair only, but I'm happy to report that it
> worked flawlessly in my case. I was able to weight the OSD to 0,
> offload all of its data, then remove it for a full recovery, at which
> point I just deleted the RBD.
> 
> The possibilities afforded by Ceph inception are endless. ☺
> 
> Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 |
> 
> If you are not the intended recipient of this message or received it 
> erroneously, please notify the sender and delete it, together with any 
> attachments, and be advised that any dissemination or copying of this message 
> is prohibited.
> 
> On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
>> Rule of thumb with batteries is:
>> - more "proper temperature" you run them at the more life you get out
>> of them
>> - more battery is overpowered for your application the longer it will
>> survive. 
>> 
>> Get yourself an LSI 94** controller and use it as an HBA and you will be
>> fine. But get MORE DRIVES! ... 
>>> On 28 Aug 2017, at 23:10, hjcho616  wrote:
>>>
>>> Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
>>> try these out.  Car battery idea is nice!  I may try that.. =)  Do
>>> they last longer?  Ones that fit the UPS original battery spec
>>> didn't last very long... part of the reason why I gave up on them..
>>> =P  My wife probably won't like the idea of car battery hanging out
>>> though ha!
>>>
>>> The OSD1 (one with mostly ok OSDs, except that smart failure)
>>> motherboard doesn't have any additional SATA connectors available.
>>>  Would it be safe to add another OSD host?
>>>
>>> Regards,
>>> Hong
>>>
>>>
>>>
>>> On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz wrote:
>>>
>>>
>>> Sorry for being brutal ... anyway 
>>> 1. get the battery for UPS ( a car battery will do as well, I've
>>> moded on ups in the past with truck battery and it was working like
>>> a 

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Tomasz Kusmierz
Just FYI, setting size and min_size to 1 is a last resort in my mind - to get 
you out of dodge !! 

Before setting that you should have made yourself 105% certain that all the OSDs 
you leave ON have NO bad sectors, no pending sectors, and no errors of any 
kind. 
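
A minimal sketch of that check, with /dev/sdX as a placeholder for each remaining 
OSD disk:

  smartctl -a /dev/sdX | egrep -i 'reallocated|pending|uncorrect'

A non-zero raw value on any of those attributes is a red flag for a drive you 
plan to lean on during recovery.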

Once you can mount the cephfs, just delete everything you don't actually need. 
Trust me, everybody has some data they don't truly need … that pron collection 
you can redownload ;), that set of ISO files you got from Ubuntu but can download 
again later … it might turn out that one of those files contains the missing 
objects, and recovering it would be pointless. 

> On 29 Aug 2017, at 20:49, Willem Jan Withagen  wrote:
> 
> On 29-8-2017 19:12, Steve Taylor wrote:
>> Hong,
>> 
>> Probably your best chance at recovering any data without special,
>> expensive, forensic procedures is to perform a dd from /dev/sdb to
>> somewhere else large enough to hold a full disk image and attempt to
>> repair that. You'll want to use 'conv=noerror' with your dd command
>> since your disk is failing. Then you could either re-attach the OSD
>> from the new source or attempt to retrieve objects from the filestore
>> on it.
> 
> Like somebody else already pointed out:
> in problem cases like this disk, use dd_rescue.
> It really has a far better chance of producing a good copy of your disk.
> 
> --WjW
> 
>> I have actually done this before by creating an RBD that matches the
>> disk size, performing the dd, running xfs_repair, and eventually
>> adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
>> temporary arrangement for repair only, but I'm happy to report that it
>> worked flawlessly in my case. I was able to weight the OSD to 0,
>> offload all of its data, then remove it for a full recovery, at which
>> point I just deleted the RBD.
>> 
>> The possibilities afforded by Ceph inception are endless. ☺
>> 
>> 
>> 
>> Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
>> 380 Data Drive Suite 300 | Draper | Utah | 84020
>> Office: 801.871.2799 | 
>> 
>> If you are not the intended recipient of this message or received it 
>> erroneously, please notify the sender and delete it, together with any 
>> attachments, and be advised that any dissemination or copying of this 
>> message is prohibited.
>> 
>> 
>> 
>> On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
>>> Rule of thumb with batteries is:
>>> - more “proper temperature” you run them at the more life you get out
>>> of them
>>> - more battery is overpowered for your application the longer it will
>>> survive. 
>>> 
>> Get yourself an LSI 94** controller and use it as an HBA and you will be
>> fine. But get MORE DRIVES! … 
 On 28 Aug 2017, at 23:10, hjcho616  wrote:
 
 Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
 try these out.  Car battery idea is nice!  I may try that.. =)  Do
 they last longer?  Ones that fit the UPS original battery spec
 didn't last very long... part of the reason why I gave up on them..
 =P  My wife probably won't like the idea of car battery hanging out
 though ha!
 
 The OSD1 (one with mostly ok OSDs, except that smart failure)
 motherboard doesn't have any additional SATA connectors available.
  Would it be safe to add another OSD host?
 
 Regards,
 Hong
 
 
 
 On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz  wrote:
 
 
 Sorry for being brutal … anyway 
 1. get the battery for the UPS (a car battery will do as well, I've
 modded a UPS in the past with a truck battery and it was working like
 a charm :D )
 2. get spare drives and put those in, because your cluster CANNOT
 get out of error due to lack of space
 3. follow the advice of Ronny Aasen on how to recover data from hard
 drives 
 4. get cooling to the drives or you will lose more! 
 
 
> On 28 Aug 2017, at 22:39, hjcho616  wrote:
> 
> Tomasz,
> 
> Those machines are behind a surge protector.  Doesn't appear to
> be a good one!  I do have a UPS... but it is my fault... no
> battery.  Power was pretty reliable for a while... and UPS was
> just beeping every chance it had, disrupting some sleep.. =P  So
> running on surge protector only.  I am running this in home
> environment.   So far, HDD failures have been very rare for this
> environment. =)  It just doesn't get loaded as much!  I am not
> sure what to expect, seeing that "unfound" and just a feeling of
> possibility of maybe getting OSD back made me excited about it.
> =) Thanks for letting me know what should be the priority.  I
> just lack experience and knowledge in this. =) Please do continue
> to guide me though this. 
> 
> Thank you for the decode of that smart messages!  I do agree that
> 

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Willem Jan Withagen
On 29-8-2017 19:12, Steve Taylor wrote:
> Hong,
> 
> Probably your best chance at recovering any data without special,
> expensive, forensic procedures is to perform a dd from /dev/sdb to
> somewhere else large enough to hold a full disk image and attempt to
> repair that. You'll want to use 'conv=noerror' with your dd command
> since your disk is failing. Then you could either re-attach the OSD
> from the new source or attempt to retrieve objects from the filestore
> on it.

Like somebody else already pointed out:
in problem cases like this disk, use dd_rescue.
It really has a far better chance of producing a good copy of your disk.

--WjW

> I have actually done this before by creating an RBD that matches the
> disk size, performing the dd, running xfs_repair, and eventually
> adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
> temporary arrangement for repair only, but I'm happy to report that it
> worked flawlessly in my case. I was able to weight the OSD to 0,
> offload all of its data, then remove it for a full recovery, at which
> point I just deleted the RBD.
> 
> The possibilities afforded by Ceph inception are endless. ☺
> 
> 
>  
> Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 | 
>  
> If you are not the intended recipient of this message or received it 
> erroneously, please notify the sender and delete it, together with any 
> attachments, and be advised that any dissemination or copying of this message 
> is prohibited.
> 
>  
> 
> On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
>> Rule of thumb with batteries is:
>> - more “proper temperature” you run them at the more life you get out
>> of them
>> - more battery is overpowered for your application the longer it will
>> survive. 
>>
>> Get yourself an LSI 94** controller and use it as an HBA and you will be
>> fine. But get MORE DRIVES! … 
>>> On 28 Aug 2017, at 23:10, hjcho616  wrote:
>>>
>>> Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
>>> try these out.  Car battery idea is nice!  I may try that.. =)  Do
>>> they last longer?  Ones that fit the UPS original battery spec
>>> didn't last very long... part of the reason why I gave up on them..
>>> =P  My wife probably won't like the idea of car battery hanging out
>>> though ha!
>>>
>>> The OSD1 (one with mostly ok OSDs, except that smart failure)
>>> motherboard doesn't have any additional SATA connectors available.
>>>  Would it be safe to add another OSD host?
>>>
>>> Regards,
>>> Hong
>>>
>>>
>>>
>>> On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz wrote:
>>>
>>>
>>> Sorry for being brutal … anyway 
>>> 1. get the battery for the UPS (a car battery will do as well, I've
>>> modded a UPS in the past with a truck battery and it was working like
>>> a charm :D )
>>> 2. get spare drives and put those in, because your cluster CANNOT
>>> get out of error due to lack of space
>>> 3. follow the advice of Ronny Aasen on how to recover data from hard
>>> drives 
>>> 4. get cooling to the drives or you will lose more! 
>>>
>>>
 On 28 Aug 2017, at 22:39, hjcho616  wrote:

 Tomasz,

 Those machines are behind a surge protector.  Doesn't appear to
 be a good one!  I do have a UPS... but it is my fault... no
 battery.  Power was pretty reliable for a while... and UPS was
 just beeping every chance it had, disrupting some sleep.. =P  So
 running on surge protector only.  I am running this in home
 environment.   So far, HDD failures have been very rare for this
 environment. =)  It just doesn't get loaded as much!  I am not
 sure what to expect, seeing that "unfound" and just a feeling of
 possibility of maybe getting OSD back made me excited about it.
 =) Thanks for letting me know what should be the priority.  I
 just lack experience and knowledge in this. =) Please do continue
 to guide me though this. 

 Thank you for the decode of that smart messages!  I do agree that
 looks like it is on its way out.  I would like to know how to get
 good portion of it back if possible. =)

 I think I just set the size and min_size to 1.
 # ceph osd lspools
 0 data,1 metadata,2 rbd,
 # ceph osd pool set rbd size 1
 set pool 2 size to 1
 # ceph osd pool set rbd min_size 1
 set pool 2 min_size to 1

 Seems to be doing some backfilling work.

 # ceph health
 HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2
 pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling;
 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering;
 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs
 stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101
 pgs stuck undersized; 101 pgs undersized; 1 requests are blocked
> 32 sec; 

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread hjcho616
Nice!  Thank you for the explanation!  I feel like I can revive that OSD. =)  
That does sound great.  I don't quite have another cluster, so I'm waiting for a 
drive to arrive! =)  

After setting size and min_size to 1, it looks like the toofull flag is gone... 
Maybe when I was making that video copy the OSDs were already down... and those 
two OSDs were not enough to take that much extra... and on top of it, the last 
OSD alive was the smaller disk (2TB vs 320GB)... so it probably was filling up 
faster.  I should have captured that message... but I turned the machine off and 
now I am at work. =P  When I get back home, I'll try to grab that and share.  
Maybe I don't need to add another OSD to that cluster just yet!  OSDs are about 
50% full on OSD1.

So next up, fixing osd0!

Regards,
Hong

On Tuesday, August 29, 2017 1:05 PM, David Turner wrote:
 

 But it was absolutely awesome to run an osd off of an rbd after the disk 
failed.
On Tue, Aug 29, 2017, 1:42 PM David Turner  wrote:

To addend Steve's success, the rbd was created in a second cluster in the same 
datacenter so it didn't run the risk of deadlocking that mapping rbds on 
machines running osds has.  It is still theoretical to work on the same 
cluster, but more inherently dangerous for a few reasons.
On Tue, Aug 29, 2017, 1:15 PM Steve Taylor  
wrote:

Hong,

Probably your best chance at recovering any data without special,
expensive, forensic procedures is to perform a dd from /dev/sdb to
somewhere else large enough to hold a full disk image and attempt to
repair that. You'll want to use 'conv=noerror' with your dd command
since your disk is failing. Then you could either re-attach the OSD
from the new source or attempt to retrieve objects from the filestore
on it.

I have actually done this before by creating an RBD that matches the
disk size, performing the dd, running xfs_repair, and eventually
adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
temporary arrangement for repair only, but I'm happy to report that it
worked flawlessly in my case. I was able to weight the OSD to 0,
offload all of its data, then remove it for a full recovery, at which
point I just deleted the RBD.
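
A minimal sketch of that workflow; the pool name, image name, and size are made 
up, and as noted elsewhere in the thread the RBD should live in a separate 
cluster to avoid deadlocks:

  # image sized to match the failing 2TB disk
  rbd create --size 2T rescue/osd0-rescue
  rbd map rescue/osd0-rescue            # returns e.g. /dev/rbd0
  # 'sync' pads unreadable blocks so offsets stay aligned
  dd if=/dev/sdb of=/dev/rbd0 bs=1M conv=noerror,sync
  xfs_repair /dev/rbd0
  mount /dev/rbd0 /var/lib/ceph/osd/ceph-0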

The possibilities afforded by Ceph inception are endless. ☺



Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |

If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
> Rule of thumb with batteries is:
> - more “proper temperature” you run them at the more life you get out
> of them
> - more battery is overpowered for your application the longer it will
> survive. 
>
> Get yourself an LSI 94** controller and use it as an HBA and you will be
> fine. But get MORE DRIVES! … 
> > On 28 Aug 2017, at 23:10, hjcho616  wrote:
> >
> > Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
> > try these out.  Car battery idea is nice!  I may try that.. =)  Do
> > they last longer?  Ones that fit the UPS original battery spec
> > didn't last very long... part of the reason why I gave up on them..
> > =P  My wife probably won't like the idea of car battery hanging out
> > though ha!
> >
> > The OSD1 (one with mostly ok OSDs, except that smart failure)
> > motherboard doesn't have any additional SATA connectors available.
> >  Would it be safe to add another OSD host?
> >
> > Regards,
> > Hong
> >
> >
> >
> > On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz wrote:
> >
> >
> > Sorry for being brutal … anyway 
> > 1. get the battery for the UPS (a car battery will do as well, I've
> > modded a UPS in the past with a truck battery and it was working like
> > a charm :D )
> > 2. get spare drives and put those in, because your cluster CANNOT
> > get out of error due to lack of space
> > 3. follow the advice of Ronny Aasen on how to recover data from hard
> > drives 
> > 4. get cooling to the drives or you will lose more! 
> >
> >
> > > On 28 Aug 2017, at 22:39, hjcho616  wrote:
> > >
> > > Tomasz,
> > >
> > > Those machines are behind a surge protector.  Doesn't appear to
> > > be a good one!  I do have a UPS... but it is my fault... no
> > > battery.  Power was pretty reliable for a while... and UPS was
> > > just beeping every chance it had, disrupting some sleep.. =P  So
> > > running on surge protector only.  I am running this in home
> > > environment.   So far, HDD failures have been very rare for this
> > > environment. =)  It just doesn't get loaded as much!  I am not
> > > sure what to expect, seeing that "unfound" and just a feeling of
> > > possibility of maybe getting OSD back 

[ceph-users] v12.2.0 Luminous released

2017-08-29 Thread Abhishek Lekshmanan

We're glad to announce the first release of the Luminous v12.2.x long term
stable release series. There have been major changes since Kraken
(v11.2.z) and Jewel (v10.2.z), and the upgrade process is non-trivial.
Please read the release notes carefully.

For more details, links & changelog please refer to the
complete release notes entry at the Ceph blog:
http://ceph.com/releases/v12-2-0-luminous-released/


Major Changes from Kraken
-------------------------

- *General*:
  * Ceph now has a simple, built-in web-based dashboard for monitoring cluster
status.

- *RADOS*:
  * *BlueStore*:
- The new *BlueStore* backend for *ceph-osd* is now stable and the
  new default for newly created OSDs.  BlueStore manages data
  stored by each OSD by directly managing the physical HDDs or
  SSDs without the use of an intervening file system like XFS.
  This provides greater performance and features.
- BlueStore supports full data and metadata checksums
  of all data stored by Ceph.
- BlueStore supports inline compression using zlib, snappy, or LZ4. (Ceph
  also supports zstd for RGW compression but zstd is not recommended for
  BlueStore for performance reasons.)

  * *Erasure coded* pools now have full support for overwrites
allowing them to be used with RBD and CephFS.

  * *ceph-mgr*:
- There is a new daemon, *ceph-mgr*, which is a required part of
  any Ceph deployment.  Although IO can continue when *ceph-mgr*
  is down, metrics will not refresh and some metrics-related calls
  (e.g., `ceph df`) may block.  We recommend deploying several
  instances of *ceph-mgr* for reliability.  See the notes on
  Upgrading below.
- The *ceph-mgr* daemon includes a REST-based management API.
  The API is still experimental and somewhat limited but
  will form the basis for API-based management of Ceph going forward.
    - ceph-mgr also includes a Prometheus exporter plugin, which can provide
      Ceph perfcounters to Prometheus.
- ceph-mgr now has a Zabbix plugin. Using zabbix_sender it sends trapper
  events to a Zabbix server containing high-level information of the Ceph
  cluster. This makes it easy to monitor a Ceph cluster's status and send
  out notifications in case of a malfunction.

  * The overall *scalability* of the cluster has improved. We have
successfully tested clusters with up to 10,000 OSDs.
  * Each OSD can now have a device class associated with
it (e.g., `hdd` or `ssd`), allowing CRUSH rules to trivially map
data to a subset of devices in the system.  Manually writing CRUSH
rules or manual editing of the CRUSH map is normally not required (see the
examples at the end of this section).
  * There is a new upmap exception mechanism that allows individual PGs to be
moved around to achieve a *perfect distribution* (this requires luminous
clients).
  * Each OSD now adjusts its default configuration based on whether the
backing device is an HDD or SSD. Manual tuning is generally not required.
  * The prototype mClock QoS queueing algorithm is now available.
  * There is now a *backoff* mechanism that prevents OSDs from being
overloaded by requests to objects or PGs that are not currently able to
process IO.
  * There is a simplified OSD replacement process that is more robust.
  * You can query the supported features and (apparent) releases of
all connected daemons and clients with `ceph features`.
  * You can configure the oldest Ceph client version you wish to allow to
connect to the cluster via `ceph osd set-require-min-compat-client` and
Ceph will prevent you from enabling features that will break compatibility
with those clients.
  * Several `sleep` settings, including `osd_recovery_sleep`,
`osd_snap_trim_sleep`, and `osd_scrub_sleep` have been
reimplemented to work efficiently.  (These are used in some cases
to work around issues throttling background work.)
  * Pools are now expected to be associated with the application using them.
Upon completing the upgrade to Luminous, the cluster will attempt to
associate existing pools to known applications (i.e. CephFS, RBD, and RGW).
In-use pools that are not associated to an application will generate a
health warning. Any unassociated pools can be manually associated using
the new `ceph osd pool application enable` command (see the examples
below). For more details see `associate pool to application` in the
documentation.
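
  A few of the commands above, shown as illustrative sketches (the pool and
  rule names here are placeholders, not defaults):

  # ceph osd crush rule create-replicated fast default host ssd
  # ceph osd pool set mypool crush_rule fast
  # ceph osd pool application enable mypool rbd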

- *RGW*:

  * RGW *metadata search* backed by ElasticSearch now supports end
user requests service via RGW itself, and also supports custom
metadata fields. A query language and a set of RESTful APIs were
created for users to be able to search objects by their
metadata. New APIs that allow control of custom metadata fields
were also added.
  * RGW now supports *dynamic bucket index sharding*. This has to be enabled via
the `rgw dynamic resharding` configurable. As the number of objects in a
bucket grows, RGW will 

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread David Turner
But it was absolutely awesome to run an osd off of an rbd after the disk
failed.

On Tue, Aug 29, 2017, 1:42 PM David Turner  wrote:

> To add to Steve's success, the rbd was created in a second cluster in the
> same datacenter so it didn't run the risk of deadlock that mapping rbds
> on machines running osds carries.  It should still work in theory on the
> same cluster, but it is inherently more dangerous for a few reasons.
>
> On Tue, Aug 29, 2017, 1:15 PM Steve Taylor 
> wrote:
>
>> Hong,
>>
>> Probably your best chance at recovering any data without special,
>> expensive, forensic procedures is to perform a dd from /dev/sdb to
>> somewhere else large enough to hold a full disk image and attempt to
>> repair that. You'll want to use 'conv=noerror' with your dd command
>> since your disk is failing. Then you could either re-attach the OSD
>> from the new source or attempt to retrieve objects from the filestore
>> on it.
>>
>> I have actually done this before by creating an RBD that matches the
>> disk size, performing the dd, running xfs_repair, and eventually
>> adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
>> temporary arrangement for repair only, but I'm happy to report that it
>> worked flawlessly in my case. I was able to weight the OSD to 0,
>> offload all of its data, then remove it for a full recovery, at which
>> point I just deleted the RBD.
>>
>> The possibilities afforded by Ceph inception are endless. ☺
>>
>>
>>
>> Steve Taylor | Senior Software Engineer | StorageCraft Technology
>> Corporation
>> 380 Data Drive Suite 300 | Draper | Utah | 84020
>> Office: 801.871.2799 |
>>
>> If you are not the intended recipient of this message or received it
>> erroneously, please notify the sender and delete it, together with any
>> attachments, and be advised that any dissemination or copying of this
>> message is prohibited.
>>
>>
>>
>> On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
>> > Rule of thumb with batteries is:
>> > - more “proper temperature” you run them at the more life you get out
>> > of them
>> > - more battery is overpowered for your application the longer it will
>> > survive.
>> >
>> > Get yourself an LSI 94** controller and use it as an HBA and you will be
>> > fine, but get MORE DRIVES! …
>> > > On 28 Aug 2017, at 23:10, hjcho616  wrote:
>> > >
>> > > Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
>> > > try these out.  Car battery idea is nice!  I may try that.. =)  Do
>> > > they last longer?  Ones that fit the UPS original battery spec
>> > > didn't last very long... part of the reason why I gave up on them..
>> > > =P  My wife probably won't like the idea of car battery hanging out
>> > > though ha!
>> > >
>> > > The OSD1 (one with mostly ok OSDs, except that smart failure)
>> > > motherboard doesn't have any additional SATA connectors available.
>> > >  Would it be safe to add another OSD host?
>> > >
>> > > Regards,
>> > > Hong
>> > >
>> > >
>> > >
>> > > On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz wrote:
>> > >
>> > >
>> > > Sorry for being brutal … anyway
>> > > 1. get the battery for UPS (a car battery will do as well, I’ve
>> > > modded a UPS in the past with a truck battery and it was working like
>> > > a charm :D )
>> > > 2. get spare drives and put those in because your cluster CAN NOT
>> > > get out of error due to lack of space
>> > > 3. Follow advice of Ronny Aasen on how to recover data from hard
>> > > drives
>> > > 4. get cooling to the drives or you will lose more!
>> > >
>> > >
>> > > > On 28 Aug 2017, at 22:39, hjcho616  wrote:
>> > > >
>> > > > Tomasz,
>> > > >
>> > > > Those machines are behind a surge protector.  Doesn't appear to
>> > > > be a good one!  I do have a UPS... but it is my fault... no
>> > > > battery.  Power was pretty reliable for a while... and UPS was
>> > > > just beeping every chance it had, disrupting some sleep.. =P  So
>> > > > running on surge protector only.  I am running this in home
>> > > > environment.   So far, HDD failures have been very rare for this
>> > > > environment. =)  It just doesn't get loaded as much!  I am not
>> > > > sure what to expect, seeing that "unfound" and just a feeling of
>> > > > possibility of maybe getting OSD back made me excited about it.
>> > > > =) Thanks for letting me know what should be the priority.  I
>> > > > just lack experience and knowledge in this. =) Please do continue
>> > > > to guide me though this.
>> > > >
>> > > > Thank you for the decode of that smart messages!  I do agree that
>> > > > looks like it is on its way out.  I would like to know how to get
>> > > > good portion of it back if possible. =)
>> > > >
>> > > > I think I just set the size and min_size to 1.
>> > > > # ceph osd lspools
>> > > > 0 data,1 metadata,2 rbd,
>> > > > # ceph osd pool set rbd size 1
>> > > > set pool 2 size to 1
>> > > > 

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread David Turner
To add to Steve's success, the rbd was created in a second cluster in the
same datacenter so it didn't run the risk of deadlock that mapping rbds
on machines running osds carries.  It should still work in theory on the
same cluster, but it is inherently more dangerous for a few reasons.

On Tue, Aug 29, 2017, 1:15 PM Steve Taylor 
wrote:

> Hong,
>
> Probably your best chance at recovering any data without special,
> expensive, forensic procedures is to perform a dd from /dev/sdb to
> somewhere else large enough to hold a full disk image and attempt to
> repair that. You'll want to use 'conv=noerror' with your dd command
> since your disk is failing. Then you could either re-attach the OSD
> from the new source or attempt to retrieve objects from the filestore
> on it.
>
> I have actually done this before by creating an RBD that matches the
> disk size, performing the dd, running xfs_repair, and eventually
> adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
> temporary arrangement for repair only, but I'm happy to report that it
> worked flawlessly in my case. I was able to weight the OSD to 0,
> offload all of its data, then remove it for a full recovery, at which
> point I just deleted the RBD.
>
> The possibilities afforded by Ceph inception are endless. ☺
>
>
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 |
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
>
>
>
> On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
> > Rule of thumb with batteries is:
> > - more “proper temperature” you run them at the more life you get out
> > of them
> > - more battery is overpowered for your application the longer it will
> > survive.
> >
> > Get yourself an LSI 94** controller and use it as an HBA and you will be
> > fine, but get MORE DRIVES! …
> > > On 28 Aug 2017, at 23:10, hjcho616  wrote:
> > >
> > > Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
> > > try these out.  Car battery idea is nice!  I may try that.. =)  Do
> > > they last longer?  Ones that fit the UPS original battery spec
> > > didn't last very long... part of the reason why I gave up on them..
> > > =P  My wife probably won't like the idea of car battery hanging out
> > > though ha!
> > >
> > > The OSD1 (one with mostly ok OSDs, except that smart failure)
> > > motherboard doesn't have any additional SATA connectors available.
> > >  Would it be safe to add another OSD host?
> > >
> > > Regards,
> > > Hong
> > >
> > >
> > >
> > > On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz wrote:
> > >
> > >
> > > Sorry for being brutal … anyway
> > > 1. get the battery for UPS (a car battery will do as well, I’ve
> > > modded a UPS in the past with a truck battery and it was working like
> > > a charm :D )
> > > 2. get spare drives and put those in because your cluster CAN NOT
> > > get out of error due to lack of space
> > > 3. Follow advice of Ronny Aasen on how to recover data from hard
> > > drives
> > > 4. get cooling to the drives or you will lose more!
> > >
> > >
> > > > On 28 Aug 2017, at 22:39, hjcho616  wrote:
> > > >
> > > > Tomasz,
> > > >
> > > > Those machines are behind a surge protector.  Doesn't appear to
> > > > be a good one!  I do have a UPS... but it is my fault... no
> > > > battery.  Power was pretty reliable for a while... and UPS was
> > > > just beeping every chance it had, disrupting some sleep.. =P  So
> > > > running on surge protector only.  I am running this in home
> > > > environment.   So far, HDD failures have been very rare for this
> > > > environment. =)  It just doesn't get loaded as much!  I am not
> > > > sure what to expect, seeing that "unfound" and just a feeling of
> > > > possibility of maybe getting OSD back made me excited about it.
> > > > =) Thanks for letting me know what should be the priority.  I
> > > > just lack experience and knowledge in this. =) Please do continue
> > > > to guide me though this.
> > > >
> > > > Thank you for the decode of that smart messages!  I do agree that
> > > > looks like it is on its way out.  I would like to know how to get
> > > > good portion of it back if possible. =)
> > > >
> > > > I think I just set the size and min_size to 1.
> > > > # ceph osd lspools
> > > > 0 data,1 metadata,2 rbd,
> > > > # ceph osd pool set rbd size 1
> > > > set pool 2 size to 1
> > > > # ceph osd pool set rbd min_size 1
> > > > set pool 2 min_size to 1
> > > >
> > > > Seems to be doing some backfilling work.
> > > >
> > > > # ceph health
> > > > HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2
> > > > pgs backfill_toofull; 74 pgs 

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Steve Taylor
Hong,

Probably your best chance at recovering any data without special,
expensive, forensic procedures is to perform a dd from /dev/sdb to
somewhere else large enough to hold a full disk image and attempt to
repair that. You'll want to use 'conv=noerror' with your dd command
since your disk is failing. Then you could either re-attach the OSD
from the new source or attempt to retrieve objects from the filestore
on it.
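
For concreteness, a minimal sketch of that dd, assuming /dev/sdb is the
failing disk and /dev/sdc is a blank disk of at least the same size (adjust
device names to your system). Adding 'sync' pads unreadable blocks with
zeros so offsets stay aligned, and a smaller block size limits how much
data each read error throws away:

# dd if=/dev/sdb of=/dev/sdc bs=64K conv=noerror,sync

GNU ddrescue is arguably a better fit for a failing disk, since it retries
and logs bad sectors, but plain dd as above works in a pinch.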

I have actually done this before by creating an RBD that matches the
disk size, performing the dd, running xfs_repair, and eventually
adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
temporary arrangement for repair only, but I'm happy to report that it
worked flawlessly in my case. I was able to weight the OSD to 0,
offload all of its data, then remove it for a full recovery, at which
point I just deleted the RBD.
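
A rough sketch of that sequence, assuming a healthy second cluster with a
pool named 'rescue' (the pool, image, and mount details are illustrative):

# rbd create rescue/osd-sdb --size 2T
# rbd map rescue/osd-sdb            # returns a device such as /dev/rbd0
# dd if=/dev/sdb of=/dev/rbd0 bs=64K conv=noerror,sync
# xfs_repair /dev/rbd0
# mount /dev/rbd0 /var/lib/ceph/osd/ceph-0    # hypothetical OSD mount point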

The possibilities afforded by Ceph inception are endless. ☺


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
> Rule of thumb with batteries is:
> - more “proper temperature” you run them at the more life you get out
> of them
> - more battery is overpowered for your application the longer it will
> survive. 
> 
> Get yourself an LSI 94** controller and use it as an HBA and you will be
> fine, but get MORE DRIVES! …
> > On 28 Aug 2017, at 23:10, hjcho616  wrote:
> > 
> > Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
> > try these out.  Car battery idea is nice!  I may try that.. =)  Do
> > they last longer?  Ones that fit the UPS original battery spec
> > didn't last very long... part of the reason why I gave up on them..
> > =P  My wife probably won't like the idea of car battery hanging out
> > though ha!
> > 
> > The OSD1 (one with mostly ok OSDs, except that smart failure)
> > motherboard doesn't have any additional SATA connectors available.
> >  Would it be safe to add another OSD host?
> > 
> > Regards,
> > Hong
> > 
> > 
> > 
> > On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz wrote:
> > 
> > 
> > Sorry for being brutal … anyway 
> > 1. get the battery for UPS (a car battery will do as well, I’ve
> > modded a UPS in the past with a truck battery and it was working like
> > a charm :D )
> > 2. get spare drives and put those in because your cluster CAN NOT
> > get out of error due to lack of space
> > 3. Follow advice of Ronny Aasen on how to recover data from hard
> > drives 
> > 4. get cooling to the drives or you will lose more!
> > 
> > 
> > > On 28 Aug 2017, at 22:39, hjcho616  wrote:
> > > 
> > > Tomasz,
> > > 
> > > Those machines are behind a surge protector.  Doesn't appear to
> > > be a good one!  I do have a UPS... but it is my fault... no
> > > battery.  Power was pretty reliable for a while... and UPS was
> > > just beeping every chance it had, disrupting some sleep.. =P  So
> > > running on surge protector only.  I am running this in home
> > > environment.   So far, HDD failures have been very rare for this
> > > environment. =)  It just doesn't get loaded as much!  I am not
> > > sure what to expect, seeing that "unfound" and just a feeling of
> > > possibility of maybe getting OSD back made me excited about it.
> > > =) Thanks for letting me know what should be the priority.  I
> > > just lack experience and knowledge in this. =) Please do continue
> > > to guide me though this. 
> > > 
> > > Thank you for the decode of that smart messages!  I do agree that
> > > looks like it is on its way out.  I would like to know how to get
> > > good portion of it back if possible. =)
> > > 
> > > I think I just set the size and min_size to 1.
> > > # ceph osd lspools
> > > 0 data,1 metadata,2 rbd,
> > > # ceph osd pool set rbd size 1
> > > set pool 2 size to 1
> > > # ceph osd pool set rbd min_size 1
> > > set pool 2 min_size to 1
> > > 
> > > Seems to be doing some backfilling work.
> > > 
> > > # ceph health
> > > HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2
> > > pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling;
> > > 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering;
> > > 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs
> > > stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101
> > > pgs stuck undersized; 101 pgs undersized; 1 requests are blocked
> > > > 32 sec; recovery 1790657/4502340 objects degraded (39.772%);
> > > recovery 641906/4502340 objects misplaced (14.257%); recovery
> > > 147/2251990 unfound (0.007%); 50 scrub errors; mds cluster is
> > > degraded; no legacy OSD 

Re: [ceph-users] OSD's flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1)

2017-08-29 Thread Tomasz Kusmierz
So nobody has any clue on this one?

Should I take this one to the dev mailing list?
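
For anyone hitting this later: the settings usually involved when OSDs hit
suicide timeouts during scrub are sketched below; the values are
illustrative starting points, not recommendations:

[osd]
osd scrub sleep = 0.1
osd scrub chunk max = 5
osd op thread suicide timeout = 300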

> On 27 Aug 2017, at 01:49, Tomasz Kusmierz  wrote:
> 
> Hi,
> for purposes of experimenting I’m running a home cluster that consists of a
> single node and 4 OSDs (weights in the crush map are true to actual hdd size). I
> prefer to test all new stuff on home equipment before getting egg on the face
> at work :)
> Anyway recently I’ve upgraded to Luminous, and replaced my ancient 8x 2TB
> drives with 2x 8TB drives (with hopes of getting more in the near future). While
> doing that I’ve converted everything to bluestore, while still on 12.1.1.
> 
> Everything was running smooth and performance was good (for ceph).
> 
> I’ve decided to upgrade recently to 12.1.2 and this is where everything 
> started acting up. I’m aware that 
> - single node cluster is not a cluster
> - in the end I might need more OSDs (old joke, right?)
> - I need to switch from spinning rust to SSD 
> 
> Before the upgrade my “cluster” was switching to WRN only when I was pumping
> a lot of data into it, and it would just come up with “slow requests” stuff.
> Now, while completely static, not doing anything (no read, no write), OSDs are
> committing suicide due to timeouts; also, before they commit suicide I
> can’t actually access data from the cluster, which makes me think that those
> OSDs are inaccessible while performing a scrub. Below I’ll attach a log excerpt;
> please notice that it happens on deep scrub and normal scrub as well.
> 
> After I discovered that, I tried to play around with sysctl.conf and
> with ceph.conf (up to this point sysctl.conf was stock, and ceph.conf was
> just adjusted to allow greater OSD full capacity and to disable cephx to speed
> things up).
> 
> also I’m running 3 pools on top of this cluster (all three have size = 2 
> min_size = 2):
> cephfs_data pg=256 (99.99% of data used in cluster)
> cephfs_metadata pg=4 (0.01% of data used in cluster)
> rbd pg=8 but this pool contains no data and I’m considering removing it since 
> in my use case I’ve got nothing for it.
> 
> Please note that while these logs were produced CephFS was not even mounted :/
> 
> 
> 
> FYI the hardware is an old and trusted HP ProLiant DL180 G6 with 2 Xeons @2.2GHz
> giving 16 cores, 32GB of ECC RAM, and an LSI in HBA mode (2x 6Gb SAS).
> 
> 
> 
> ( 
> As a side issue, could somebody explain to me why, with bluestore that was
> supposed to cure cancer, write performance still sucks? I know that filestore
> did suffer from writing everything multiple times to the same drive, and I did
> experience this first hand when, after exhausting journals, it was just dead
> slow, but now, within the same host in my current configuration, it keeps
> choking [flaps 70MB/s -> 10MB/s -> 70MB/s] and I have never seen it even
> approach the speed of the single slowest drive. This server is not a speed
> demon, I know, but when performing simultaneous reads/writes on those drives
> I was getting around 760MB/s sequential R/W speed.
> Right now I’m struggling to comprehend where the bottleneck is while
> performing operations within the same host?! The network should not be an issue
> (correct me if I’m wrong here), and dumping a single blob into a pool should
> produce a nice long sequence of objects placed onto the drives …
> I’m just puzzled why ceph will not exceed 40MB/s combined while still
> switching the “cluster” into warning state due to “slow responses”.
> 2017-08-24 20:49:34.457191 osd.8 osd.8 192.168.1.240:6814/3393 503 : cluster 
> [WRN] slow request 63.878717 seconds old, received at 2017-08-24 
> 20:48:30.578398: osd_op(client.994130.1:13659 1.9700016d 
> 1:b68000e9:::10ffeef.0068:head [write 0~4194304 [1@-1]] snapc 1=[] 
> ondisk+write+known_if_redirected e4306) currently waiting for active
> 2017-08-24 20:49:34.457195 osd.8 osd.8 192.168.1.240:6814/3393 504 : cluster 
> [WRN] slow request 64.177858 seconds old, received at 2017-08-24 
> 20:48:30.279257: osd_op(client.994130.1:13568 1.b95e13a4 
> 1:25c87a9d:::10ffeef.000d:head [write 0~4194304 [1@-1]] snapc 1=[] 
> ondisk+write+known_if_redirected e4306) currently waiting for active
> 2017-08-24 20:49:34.457198 osd.8 osd.8 192.168.1.240:6814/3393 505 : cluster 
> [WRN] slow request 64.002653 seconds old, received at 2017-08-24 
> 20:48:30.454463: osd_op(client.994130.1:13626 1.b426420e 
> 1:7042642d:::10ffeef.0047:head [write 0~4194304 [1@-1]] snapc 1=[] 
> ondisk+write+known_if_redirected e4306) currently waiting for active
> 2017-08-24 20:49:34.457200 osd.8 osd.8 192.168.1.240:6814/3393 506 : cluster 
> [WRN] slow request 63.873519 seconds old, received at 2017-08-24 
> 20:48:30.583596: osd_op(client.994130.1:13661 1.31551a8 
> 1:158aa8c0:::10ffeef.006a:head [write 0~4194304 [1@-1]] snapc 1=[] 
> ondisk+write+known_if_redirected e4306) currently waiting for active
> 2017-08-24 20:49:34.457206 osd.8 osd.8 192.168.1.240:6814/3393 507 : cluster 
> [WRN] slow request 

Re: [ceph-users] RGW Multisite metadata sync init

2017-08-29 Thread Orit Wasserman
Hi David,

On Mon, Aug 28, 2017 at 8:33 PM, David Turner  wrote:

> The vast majority of the sync error list is "failed to sync bucket
> instance: (16) Device or resource busy".  I can't find anything on Google
> about this error message in relation to Ceph.  Does anyone have any idea
> what this means? and/or how to fix it?
>

Those are intermediate errors resulting from several radosgw trying to
acquire the same sync log shard lease. It doesn't affect the sync progress.
Are there any other errors?

Orit
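
For anyone following along, the commands referenced in this thread look
roughly like this (flags may vary by version; the bucket and zone names
are placeholders):

# radosgw-admin sync error list
# radosgw-admin bucket sync init --bucket=mybucket --source-zone=us-east
# radosgw-admin bucket sync run --bucket=mybucket --source-zone=us-east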

>
> On Fri, Aug 25, 2017 at 2:48 PM Casey Bodley  wrote:
>
>> Hi David,
>>
>> The 'data sync init' command won't touch any actual object data, no.
>> Resetting the data sync status will just cause a zone to restart a full
>> sync of the --source-zone's data changes log. This log only lists which
>> buckets/shards have changes in them, which causes radosgw to consider them
>> for bucket sync. So while the command may silence the warnings about data
>> shards being behind, it's unlikely to resolve the issue with missing
>> objects in those buckets.
>>
>> When data sync is behind for an extended period of time, it's usually
>> because it's stuck retrying previous bucket sync failures. The 'sync error
>> list' may help narrow down where those failures are.
>>
>> There is also a 'bucket sync init' command to clear the bucket sync
>> status. Following that with a 'bucket sync run' should restart a full sync
>> on the bucket, pulling in any new objects that are present on the
>> source-zone. I'm afraid that those commands haven't seen a lot of polish or
>> testing, however.
>>
>> Casey
>>
>> On 08/24/2017 04:15 PM, David Turner wrote:
>>
>> Apparently the data shards that are behind go in both directions, but
>> only one zone is aware of the problem.  Each cluster has objects in their
>> data pool that the other doesn't have.  I'm thinking about initiating a
>> `data sync init` on both sides (one at a time) to get them back on the same
>> page.  Does anyone know if that command will overwrite any local data that
>> the zone has that the other doesn't if you run `data sync init` on it?
>>
>> On Thu, Aug 24, 2017 at 1:51 PM David Turner 
>> wrote:
>>
>>> After restarting the 2 RGW daemons on the second site again, everything
>>> caught up on the metadata sync.  Is there something about having 2 RGW
>>> daemons on each side of the multisite that might be causing an issue with
>>> the sync getting stale?  I have another realm set up the same way that is
>>> having a hard time with its data shards being behind.  I haven't told them
>>> to resync, but yesterday I noticed 90 shards were behind.  It's caught back
>>> up to only 17 shards behind, but the oldest change not applied is 2 months
>>> old and no order of restarting RGW daemons is helping to resolve this.
>>>
>>> On Thu, Aug 24, 2017 at 10:59 AM David Turner 
>>> wrote:
>>>
 I have a RGW Multisite 10.2.7 set up for bi-directional syncing.  This
 has been operational for 5 months and working fine.  I recently created a
 new user on the master zone, used that user to create a bucket, and put in
 a public-acl object in there.  The Bucket created on the second site, but
 the user did not and the object errors out complaining about the access_key
 not existing.

 That led me to think that the metadata isn't syncing, while bucket and
 data both are.  I've also confirmed that data is syncing for other buckets
 as well in both directions. The sync status from the second site was this.


      metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: f4c12327-4721-47c9-a365-86332d84c227 (public-atl01)
                syncing
                full sync: 0/128 shards
                incremental sync: 128/128 shards
                data is caught up with source



 Sync status leads me to think that the second site believes it is up to
 date, even though it is missing a freshly created user.  I restarted all of
 the rgw daemons for the zonegroup, but it didn't trigger anything to fix
 the missing user in the second site.  I did some googling and found the
 sync init commands mentioned in a few ML posts and used metadata sync init
 and now have this as the sync status.


      metadata sync preparing for full sync
                full sync: 64/64 shards
                full sync: 0 entries to sync

Re: [ceph-users] Grafana Dashboard

2017-08-29 Thread Félix Barbeira
Hi,

You can check the official site: https://grafana.com/dashboards?search=ceph

2017-08-29 3:08 GMT+02:00 Shravana Kumar.S :

> All,
> I am looking for a Grafana dashboard to monitor CEPH. I am using telegraf to
> collect the metrics and influxDB to store the values.
>
> Does anyone have the dashboard JSON file?
>
> Thanks,
> Saravans
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

2017-08-29 Thread Marc Roos
 
nfs-ganesha-2.5.2-.el7.x86_64.rpm 
 ^
Is this correct?

-Original Message-
From: Marc Roos 
Sent: dinsdag 29 augustus 2017 11:40
To: amaredia; wooertim
Cc: ceph-users
Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

 
Ali, very very nice! I was creating the rpms based on an old rpm source 
spec. It was a hassle to get them to build, and I am not sure if I 
even used the correct compile settings.



-Original Message-
From: Ali Maredia [mailto:amare...@redhat.com]
Sent: maandag 28 augustus 2017 22:29
To: TYLin
Cc: Marc Roos; ceph-us...@ceph.com
Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

Marc,

These rpms (and debs) are built with the latest ganesha 2.5 stable 
release and the latest luminous release on download.ceph.com:

http://download.ceph.com/nfs-ganesha/

I just put them up late last week, and I will be maintaining them in the 
future.

-Ali

- Original Message -
> From: "TYLin" 
> To: "Marc Roos" 
> Cc: ceph-us...@ceph.com
> Sent: Sunday, August 20, 2017 11:58:05 PM
> Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7
> 
> You can get rpm from here
> 
> https://download.gluster.org/pub/gluster/glusterfs/nfs-ganesha/old/2.3
> .0/CentOS/nfs-ganesha.repo
> 
> You have to fix the path mismatch error in the repo file manually.
> 
> > On Aug 20, 2017, at 5:38 AM, Marc Roos 
wrote:
> > 
> > 
> > 
> > Where can you get the nfs-ganesha-ceph rpm? Is there a repository 
> > that has these?
> > 
> > 
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

2017-08-29 Thread Marc Roos
 
Ali, very very nice! I was creating the rpms based on an old rpm source 
spec. It was a hassle to get them to build, and I am not sure if I 
even used the correct compile settings.



-Original Message-
From: Ali Maredia [mailto:amare...@redhat.com] 
Sent: maandag 28 augustus 2017 22:29
To: TYLin
Cc: Marc Roos; ceph-us...@ceph.com
Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

Marc,

These rpms (and debs) are built with the latest ganesha 2.5 stable 
release and the latest luminous release on download.ceph.com:

http://download.ceph.com/nfs-ganesha/

I just put them up late last week, and I will be maintaining them in the 
future.

-Ali

- Original Message -
> From: "TYLin" 
> To: "Marc Roos" 
> Cc: ceph-us...@ceph.com
> Sent: Sunday, August 20, 2017 11:58:05 PM
> Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7
> 
> You can get rpm from here
> 
> https://download.gluster.org/pub/gluster/glusterfs/nfs-ganesha/old/2.3
> .0/CentOS/nfs-ganesha.repo
> 
> You have to fix the path mismatch error in the repo file manually.
> 
> > On Aug 20, 2017, at 5:38 AM, Marc Roos  
wrote:
> > 
> > 
> > 
> > Where can you get the nfs-ganesha-ceph rpm? Is there a repository 
> > that has these?
> > 
> > 
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: mount fs - single point of failure

2017-08-29 Thread Oscar Segarra
Hi, thanks a lot...
 I apologize for my simple question!
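
For the archives: a plain fstab entry can also list all three monitors,
which removes the client-side single point of failure without a VIP. A
sketch, assuming the admin key sits in /etc/ceph/admin.secret:

192.168.100.101,192.168.100.102,192.168.100.103:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0 0

The kernel client connects to the first monitor it can reach.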

On 29 Aug 2017 at 1:38, "Michael Kuriger"  wrote:

> I use automount (/etc/auto.auto)
>
>
>
> Example:
>
> ceph  -fstype=ceph,name=admin,secretfile=/etc/ceph/admin.secret,noatime  10.1.40.11,10.1.40.12,10.1.40.13:/cephfs1
>
>
>
>
>
>
>
>
>
> *From: *ceph-users  on behalf of LOPEZ
> Jean-Charles 
> *Date: *Monday, August 28, 2017 at 3:40 PM
> *To: *Oscar Segarra 
> *Cc: *"ceph-users@lists.ceph.com" 
> *Subject: *Re: [ceph-users] CephFS: mount fs - single point of failure
>
>
>
> Hi Oscar,
>
>
>
> the mount command accepts multiple MON addresses.
>
>
>
> mount -t ceph monhost1,monhost2,monhost3:/ /mnt/foo
>
>
>
> If not specified the port by default is 6789.
>
>
>
> JC
>
>
>
> On Aug 28, 2017, at 13:54, Oscar Segarra  wrote:
>
>
>
> Hi,
>
>
>
> In Ceph, by design there is no single point of failure in terms of server
> roles; nevertheless, from the client's point of view, one might exist.
>
>
>
> In my environment:
>
> Mon1: 192.168.100.101:6789
> 
>
> Mon2: 192.168.100.102:6789
> 
>
> Mon3: 192.168.100.103:6789
> 
>
>
>
> Client: 192.168.100.104
>
>
>
> I have created a line in /etc/fstab referencing Mon1 but, of course, if
> Mon1 fails, the mount point gets stuck.
>
>
>
> I'd like to create a vip assigned to any host with tcp port 6789 UP and,
> in the client, mount the CephFS using that VIP.
>
>
>
> Is there any way to achieve this?
>
>
>
> Thanks a lot in advance!
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] State of play for RDMA on Luminous

2017-08-29 Thread Haomai Wang
On Tue, Aug 29, 2017 at 12:01 AM, Florian Haas  wrote:
> Sorry, I worded my questions poorly in the last email, so I'm asking
> for clarification here:
>
> On Mon, Aug 28, 2017 at 6:04 PM, Haomai Wang  wrote:
>> On Mon, Aug 28, 2017 at 7:54 AM, Florian Haas  wrote:
>>> On Mon, Aug 28, 2017 at 4:21 PM, Haomai Wang  wrote:
 On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas  wrote:
> Hello everyone,
>
> I'm trying to get a handle on the current state of the async messenger's
> RDMA transport in Luminous, and I've noticed that the information
> available is a little bit sparse (I've found
> https://community.mellanox.com/docs/DOC-2693 and
> https://community.mellanox.com/docs/DOC-2721, which are a great start
> but don't look very complete). So I'm kicking off this thread that might
> hopefully bring interested parties and developers together.
>
> Could someone in the know please confirm that the following assumptions
> of mine are accurate:
>
> - RDMA support for the async messenger is available in Luminous.

 to be precise, rdma in luminous is available but lacks memory
 control when under pressure. It would be ok to run for test purposes.
>>>
>>> OK, thanks! Assuming async+rdma will become fully supported some time
>>> in the next release or two, are there plans to backport async+rdma
>>> related features to Luminous? Or will users likely need to wait for
>>> the next release to get a production-grade Ceph/RDMA stack?
>>
>> I think so
>
> OK, so just to clarify:
>
> (1) production RDMA support *will* be in the next LTS. Correct?
>
> (2) Users should *not* expect production RDMA support in any Luminous
> point release. Correct?

From my view, we are still contributing to a stable rdma impl. I can't make
a promise about this.

>
> - You enable it globally by setting ms_type to "async+rdma", and by
> setting appropriate values for the various ms_async_rdma* options (most
> importantly, ms_async_rdma_device_name).
>
> - You can also set RDMA messaging just for the public or cluster
> network, via ms_public_type and ms_cluster_type.
>
> - Users have to make a global async+rdma vs. async+posix decision on
> either network. For example, if either ms_type or ms_public_type is
> configured to async+rdma on cluster nodes, then a client configured with
> ms_type = async+posix can't communicate.
>
> Based on those assumptions, I have the following questions:
>
> - What is the current state of RDMA support in kernel libceph? In other
> words, is there currently a way to map RBDs, or mount CephFS, if a Ceph
> cluster uses RDMA messaging?

 no planning on kernel side so far. rbd-nbd, cephfs-fuse should be 
 supported now.
>>>
>>> Understood — are there plans to support async+rdma in the kernel at
>>> all, or is there something in the kernel that precludes this?
>>
>> no.
>
> Do you mean:
>
> (1) There is a plan to support async+rdma in the kernel client eventually, or
>
> (2) There are no plans to bring async+rdma support to the kernel
> client, even though it *would* be possible to implement from a kernel
> perspective, or
>
> (3) There are no plans to bring async+rdma support to the kernel
> client *because* something deep in the kernel prevents it?

yes, async is totally a userspace impl. The kernel client needs to make use
of existing kernel infra.

>
> Thanks again!
>
> Cheers,
> Florian
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] State of play for RDMA on Luminous

2017-08-29 Thread Florian Haas
Sorry, I worded my questions poorly in the last email, so I'm asking
for clarification here:

On Mon, Aug 28, 2017 at 6:04 PM, Haomai Wang  wrote:
> On Mon, Aug 28, 2017 at 7:54 AM, Florian Haas  wrote:
>> On Mon, Aug 28, 2017 at 4:21 PM, Haomai Wang  wrote:
>>> On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas  wrote:
 Hello everyone,

 I'm trying to get a handle on the current state of the async messenger's
 RDMA transport in Luminous, and I've noticed that the information
 available is a little bit sparse (I've found
 https://community.mellanox.com/docs/DOC-2693 and
 https://community.mellanox.com/docs/DOC-2721, which are a great start
 but don't look very complete). So I'm kicking off this thread that might
 hopefully bring interested parties and developers together.

 Could someone in the know please confirm that the following assumptions
 of mine are accurate:

 - RDMA support for the async messenger is available in Luminous.
>>>
>>> to be precise, rdma in luminous is available but lacks memory
>>> control when under pressure. It would be ok to run for test purposes.
>>
>> OK, thanks! Assuming async+rdma will become fully supported some time
>> in the next release or two, are there plans to backport async+rdma
>> related features to Luminous? Or will users likely need to wait for
>> the next release to get a production-grade Ceph/RDMA stack?
>
> I think so

OK, so just to clarify:

(1) production RDMA support *will* be in the next LTS. Correct?

(2) Users should *not* expect production RDMA support in any Luminous
point release. Correct?

 - You enable it globally by setting ms_type to "async+rdma", and by
 setting appropriate values for the various ms_async_rdma* options (most
 importantly, ms_async_rdma_device_name).

 - You can also set RDMA messaging just for the public or cluster
 network, via ms_public_type and ms_cluster_type.

 - Users have to make a global async+rdma vs. async+posix decision on
 either network. For example, if either ms_type or ms_public_type is
 configured to async+rdma on cluster nodes, then a client configured with
 ms_type = async+posix can't communicate.

 Based on those assumptions, I have the following questions:

 - What is the current state of RDMA support in kernel libceph? In other
 words, is there currently a way to map RBDs, or mount CephFS, if a Ceph
 cluster uses RDMA messaging?
>>>
>>> no planning on kernel side so far. rbd-nbd, cephfs-fuse should be supported 
>>> now.
>>
>> Understood — are there plans to support async+rdma in the kernel at
>> all, or is there something in the kernel that precludes this?
>
> no.

Do you mean:

(1) There is a plan to support async+rdma in the kernel client eventually, or

(2) There are no plans to bring async+rdma support to the kernel
client, even though it *would* be possible to implement from a kernel
perspective, or

(3) There are no plans to bring async+rdma support to the kernel
client *because* something deep in the kernel prevents it?

Thanks again!

Cheers,
Florian
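
For reference, the options discussed in this thread would look roughly like
this in ceph.conf. This is a sketch only, since per the thread RDMA in
Luminous is not yet production-ready, and the device name is just an
example:

[global]
ms_type = async+rdma
ms_async_rdma_device_name = mlx4_0

ms_public_type and ms_cluster_type can instead select RDMA for only one of
the two networks.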
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com