[ceph-users] One Mon out of Quorum

2020-01-12 Thread nokia ceph
Hi, When installing Nautilus on a five node cluster, we tried to install one node first and then the remaining four nodes. After that we saw that the fifth node was out of quorum, and we found that the fsid was different on the 5th node. When we replaced the ceph.conf file from the four nodes to the

[ceph-users] rados_ioctx_selfmanaged_snap_set_write_ctx examples

2019-12-02 Thread nokia ceph
Hi Team, We would like to create multiple snapshots inside the ceph cluster, initiating the request from the librados client, and came across the rados api rados_ioctx_selfmanaged_snap_set_write_ctx. Can someone give us sample code on how to use this api? Thanks, Muthu
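
A minimal sketch of the usual calling pattern for this API, assuming an already-open rados_ioctx_t on the target pool and with error handling trimmed to the bare minimum:

    #include <rados/librados.h>

    /* io is an open rados_ioctx_t on the target pool. */
    int write_with_selfmanaged_snap(rados_ioctx_t io, const char *oid,
                                    const char *buf, size_t len)
    {
        rados_snap_t snap_id;
        /* Allocate a new self-managed snapshot id from the cluster. */
        int r = rados_ioctx_selfmanaged_snap_create(io, &snap_id);
        if (r < 0)
            return r;

        /* Snap context for subsequent writes: newest snap first,
         * seq set to the most recent snap id. */
        rados_snap_t snaps[1] = { snap_id };
        r = rados_ioctx_selfmanaged_snap_set_write_ctx(io, snap_id, snaps, 1);
        if (r < 0)
            return r;

        /* Writes through this ioctx now carry the snap context, so the
         * previous object contents are preserved under snap_id. */
        return rados_write_full(io, oid, buf, len);
    }

Reading an older version back is then a matter of rados_ioctx_snap_set_read() with the saved snap id, and rados_ioctx_selfmanaged_snap_remove() drops it; the application has to keep track of the snap ids itself, since self-managed snapshots do not appear in pool-level snapshot listings.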

[ceph-users] Ceph osd's crashing repeatedly

2019-11-13 Thread nokia ceph
Hi, We have upgraded a 5 node ceph cluster from Luminous to Nautilus and the cluster was running fine. Yesterday when we tried to add one more osd into the ceph cluster, we found that the OSD was created in the cluster, but suddenly some of the other OSDs started to crash and we are not able to

[ceph-users] Ceph Osd operation slow

2019-11-12 Thread nokia ceph
Hi Team, In one of our ceph clusters we observe many slow IOPS on all our OSDs, and most of the latency occurs between the two operations shown below. { "time": "2019-11-12 08:29:58.128669",
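
The fragment quoted is from the OSD's per-op event trail, which can be pulled from the admin socket with something like ceph daemon osd.<id> dump_historic_ops; each op lists timestamped events, and the gap between two consecutive "time" entries is where the latency between the two operations shows up.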

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-10 Thread nokia ceph
Nov 8 23:39:32 UTC 2018", "kernel_version": "3.10.0-957.el7.x86_64", "mem_swap_kb": "0", "mem_total_kb": "272036636", "network_numa_unknown_ifaces": "dss-client,dss-private", "objectstore_nu

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-09 Thread nokia ceph
-mon 1/5' injectargs: cn1.chn8be1c1.cdn ~# ceph daemon /var/run/ceph/ceph-mon.cn1.asok config show|grep debug_mon "debug_mon": "1/5", "debug_monc": "0/0", On Sun, Nov 10, 2019 at 11:05 AM huang jun wrote: > good, please send me the mon and o

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-09 Thread nokia ceph
: > The mon log shows that the all mismatch fsid osds are from node > 10.50.11.45, > maybe that the fith node? > BTW i don't found the osd.0 boot message in ceph-mon.log > do you set debug_mon=20 first and then restart osd.0 process, and make > sure the osd.0 is restarted. > > > no

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-09 Thread nokia ceph
Hi, Please find the ceph osd tree output in the pastebin https://pastebin.com/Gn93rE6w On Fri, Nov 8, 2019 at 7:58 PM huang jun wrote: > can you post your 'ceph osd tree' in pastebin? > do you mean the osds report fsid mismatch is from old removed nodes? > > nokia ceph 于2019年11月8日

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread nokia ceph
0ef4479) > the osd boot will be ignored if the fsid mismatch > what do you do before this happen? > > nokia ceph 于2019年11月8日周五 下午8:29写道: > > > > Hi, > > > > Please find the osd.0 which is restarted after the debug_mon is > increased to 20. > > > > c

Re: [ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread nokia ceph
sds in 'ceph osd tree', and to see > what happened? > > nokia ceph 于2019年11月8日周五 下午6:24写道: > > > > Adding my official mail id > > > > -- Forwarded message - > > From: nokia ceph > > Date: Fri, Nov 8, 2019 at 3:57 PM > > Subje

[ceph-users] Fwd: OSD's not coming up in Nautilus

2019-11-08 Thread nokia ceph
Adding my official mail id -- Forwarded message - From: nokia ceph Date: Fri, Nov 8, 2019 at 3:57 PM Subject: OSD's not coming up in Nautilus To: Ceph Users Hi Team, There is one 5 node ceph cluster which we have upgraded from Luminous to Nautilus and everything was going

[ceph-users] OSD's not coming up in Nautilus

2019-11-08 Thread nokia ceph
Hi Team, There is one 5 node ceph cluster which we have upgraded from Luminous to Nautilus and everything was going well until yesterday, when we noticed that the ceph OSDs are marked down and not recognized by the monitors as running, even though the osd processes are running. We noticed that the

[ceph-users] Increase of Ceph-mon memory usage - Luminous

2019-10-16 Thread nokia ceph
Hi Team, We have noticed that the memory usage of the ceph-monitor processes increased by 1GB in 4 days. We monitored the ceph-monitor memory usage every minute and we can see it increase and decrease by a few hundred MB at any point; but over time, the memory usage increases. We also noticed some monitor

[ceph-users] ceph stats on the logs

2019-10-08 Thread nokia ceph
Hi Team, With default log settings, the ceph stats will be logged like: cluster [INF] pgmap v30410386: 8192 pgs: 8192 active+clean; 445 TB data, 1339 TB used, 852 TB / 2191 TB avail; 188 kB/s rd, 217 MB/s wr, 1618 op/s Jewel: in mon logs; Nautilus: in mgr logs; Luminous: not able to view

Re: [ceph-users] Nautilus : ceph dashboard ssl not working

2019-09-24 Thread nokia ceph
board.crt > $ ceph config-key set mgr/dashboard/key -i dashboard.key > > The above commands will emit a deprecation warning that you can ignore. > > Thanks, > Ricardo Dias > > ____ > From: ceph-users on behalf of nokia > ceph > Sent

[ceph-users] Nautilus : ceph dashboard ssl not working

2019-09-16 Thread nokia ceph
Hi Team, In ceph 14.2.2, ceph dashboard does not have set-ssl-certificate. We are trying to enable the ceph dashboard, and while using the ssl certificate and key it is not working. cn5.chn5au1c1.cdn ~# ceph dashboard set-ssl-certificate -i dashboard.crt no valid command found; 10 closest

[ceph-users] multiple RESETSESSION messages

2019-09-13 Thread nokia ceph
Hi, We have a 5 node Luminous cluster on which we see multiple RESETSESSION messages for OSDs on the last node alone. 's=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=2613 cs=1 l=0).handle_connect_reply connect got RESETSESSION' We found the below fix for this issue, but are not able to identify the

Re: [ceph-users] mon db change from rocksdb to leveldb

2019-08-22 Thread nokia ceph
d one, > let it sync, etc. > Still a bad idea. > > Paul > > -- > Paul Emmerich > > Looking for help with your Ceph cluster? Contact us at https://croit.io > > croit GmbH > Freseniusstr. 31h > 81247 München > www.croit.io > Tel: +49 89 1896585 9

[ceph-users] mon db change from rocksdb to leveldb

2019-08-21 Thread nokia ceph
Hi Team, One of our old customers had Kraken and they are going to upgrade to Luminous. In the process they are also requesting a downgrade procedure. Kraken used leveldb for ceph-mon data; from Luminous it changed to rocksdb, and the upgrade works without any issues. When we downgrade, the ceph-mon

Re: [ceph-users] bluestore write iops calculation

2019-08-06 Thread nokia ceph
On Mon, Aug 5, 2019 at 6:35 PM wrote: > > Hi Team, > > @vita...@yourcmc.ru , thank you for information and could you please > > clarify on the below quires as well, > > > > 1. Average object size we use will be 256KB to 512KB , will there be > > deferred write queue ? > > With the default

Re: [ceph-users] bluestore write iops calculation

2019-08-05 Thread nokia ceph
Hi Team, @vita...@yourcmc.ru , thank you for the information; could you please clarify the below queries as well: 1. The average object size we use will be 256KB to 512KB, will there be a deferred write queue? 2. Share the link of the existing rocksdb ticket which does 2 write + syncs. 3. Any

Re: [ceph-users] details about cloning objects using librados

2019-08-02 Thread nokia ceph
Thank you Greg, it is now clear for us; the option is only available in C++, so we need to rewrite the client code in C++. Thanks, Muthu On Fri, Aug 2, 2019 at 1:05 AM Gregory Farnum wrote: > On Wed, Jul 31, 2019 at 10:31 PM nokia ceph > wrote: > > > > Thank you Gre
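
A minimal C++ sketch of the copy path discussed in this thread, librados::ObjectWriteOperation::copy_from; the pool and object names are illustrative only, and the exact copy_from overload (including the fadvise flags Greg refers to) should be checked against the librados.hpp of the release in use:

    #include <rados/librados.hpp>
    #include <iostream>

    int main()
    {
        librados::Rados cluster;
        cluster.init("admin");               // client name is an assumption
        cluster.conf_read_file(nullptr);     // read the default ceph.conf
        if (cluster.connect() < 0) {
            std::cerr << "failed to connect to cluster" << std::endl;
            return 1;
        }

        librados::IoCtx io;
        cluster.ioctx_create("mypool", io);  // hypothetical pool name

        // Ask the cluster to copy "src-object" into "dst-object" server-side.
        // Arguments: source oid, source IoCtx, source version (0 = current),
        // and source fadvise flags (left unset, as suggested in the thread).
        librados::ObjectWriteOperation op;
        op.copy_from("src-object", io, 0, 0);
        int r = io.operate("dst-object", &op);
        std::cout << "copy_from returned " << r << std::endl;

        io.close();
        cluster.shutdown();
        return r < 0 ? 1 : 0;
    }

The point of the call is that the destination's primary OSD pulls the data from the source object directly, so the object body never has to round-trip through the client.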

[ceph-users] bluestore write iops calculation

2019-08-02 Thread nokia ceph
Hi Team, Could you please help us in understanding the write iops inside the ceph cluster? There seems to be a mismatch in iops between the theoretical value and what we see in the disk status. Our platform: a 5 node cluster with 120 OSDs, each node having 24 HDDs (data, rocksdb and rocksdb.WAL all reside in
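
A rough model that may help reconcile the numbers, assuming the EC 4+1 layout described in the other threads and the default HDD deferred-write threshold of 32 KiB: a 256 KiB object write is split into four 64 KiB data chunks plus one coding chunk; each chunk is above the deferred threshold, so it is written directly to the block device with an additional small rocksdb commit for the metadata, which works out to roughly two physical writes per chunk, i.e. on the order of ten disk writes spread over five OSDs for a single client write, before any rocksdb compaction traffic is counted.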

Re: [ceph-users] details about cloning objects using librados

2019-07-31 Thread nokia ceph
use librados.h in our client to communicate with ceph cluster. Also any equivalent librados api for the command rados -p poolname Thanks, Muthu On Wed, Jul 31, 2019 at 11:13 PM Gregory Farnum wrote: > > > On Wed, Jul 31, 2019 at 1:32 AM nokia ceph > wrote: > >> Hi Greg,

Re: [ceph-users] details about cloning objects using librados

2019-07-31 Thread nokia ceph
Hi Greg, We were trying to implement this, however we are having issues assigning the destination object name with this api. There is a rados command "rados -p cp ", is there any librados api equivalent to this? Thanks, Muthu On Fri, Jul 5, 2019 at 4:00 PM nokia ceph wrote: > T

Re: [ceph-users] Nautilus:14.2.2 Legacy BlueStore stats reporting detected

2019-07-24 Thread nokia ceph
: > bluestore warn on legacy statfs = false > > -- > Paul Emmerich > > Looking for help with your Ceph cluster? Contact us at https://croit.io > > croit GmbH > Freseniusstr. 31h > 81247 München > www.croit.io > Tel: +49 89 1896585 90 > > > On Fri, Jul 19, 2019

Re: [ceph-users] Nautilus:14.2.2 Legacy BlueStore stats reporting detected

2019-07-21 Thread nokia ceph
chen > www.croit.io > Tel: +49 89 1896585 90 > > > On Fri, Jul 19, 2019 at 1:35 PM nokia ceph > wrote: > >> Hi Team, >> >> After upgrading our cluster from 14.2.1 to 14.2.2 , the cluster moved to >> warning state with following error >> >> cn

[ceph-users] Nautilus:14.2.2 Legacy BlueStore stats reporting detected

2019-07-19 Thread nokia ceph
Hi Team, After upgrading our cluster from 14.2.1 to 14.2.2, the cluster moved to a warning state with the following error: cn1.chn6m1c1ru1c1.cdn ~# ceph status cluster: id: e9afb5f3-4acf-421a-8ae6-caaf328ef888 health: HEALTH_WARN Legacy BlueStore stats reporting detected on

Re: [ceph-users] details about cloning objects using librados

2019-07-05 Thread nokia ceph
vise flags we have in various > places that let you specify things like not to cache the data. > Probably leave them unset. > > -Greg > > > > On Wed, Jul 3, 2019 at 2:47 AM nokia ceph > wrote: > > > > Hi Greg, > > > > Can you please share the api deta

Re: [ceph-users] details about cloning objects using librados

2019-07-03 Thread nokia ceph
ect class will still need to > > > connect to the relevant primary osd and send the write (presumably in > > > some situations though this will be the same machine). > > > > > > On Tue, Jul 2, 2019 at 4:08 PM nokia ceph > wrote: > > > > >

Re: [ceph-users] details about cloning objects using librados

2019-07-02 Thread nokia ceph
oes this by default. For each replicated pool, you can set > the 'size' which is the number of copies you want Ceph to maintain. The > accepted norm for replicas is 3, but you can set it higher if you want to > incur the performance penalty. > > On Mon, Jul 1, 2019, 6:01 AM nokia ceph w

Re: [ceph-users] details about cloning objects using librados

2019-07-01 Thread nokia ceph
will clone/copy multiple objects and stores inside the cluster. Thanks, Muthu On Fri, Jun 28, 2019 at 9:23 AM Brad Hubbard wrote: > On Thu, Jun 27, 2019 at 8:58 PM nokia ceph > wrote: > > > > Hi Team, > > > > We have a requirement to create multiple copies o

[ceph-users] details about cloning objects using librados

2019-06-27 Thread nokia ceph
Hi Team, We have a requirement to create multiple copies of an object; currently we handle it on the client side by writing them as separate objects, and this causes huge network traffic between the client and the cluster. Is there a possibility of cloning an object into multiple copies using the librados api?

Re: [ceph-users] ceph nautilus deep-scrub health error

2019-05-15 Thread nokia ceph
> > > > For disable deep-scrub you can use “ceph osd set nodeep-scrub” , Also you > can setup deep-scrub with threshold . > > #Start Scrub 22:00 > > osd scrub begin hour = 22 > > #Stop Scrub 8 > > osd scrub end hour = 8 > > #Scrub Load 0.5 > > osd scr

[ceph-users] ceph nautilus deep-scrub health error

2019-05-14 Thread nokia ceph
Hi Team, After upgrading from Luminous to Nautilus, we see a "654 pgs not deep-scrubbed in time" error in ceph status. How can we disable this warning? In our setup we disable deep-scrubbing for performance reasons. Thanks, Muthu
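
The reply above covers the usual knobs: 'ceph osd set nodeep-scrub' stops deep scrubs cluster-wide, and the osd scrub begin hour / end hour / load threshold options confine them to a window. If deep scrubbing is intentionally left off, the Nautilus warning itself can be quieted by raising osd_deep_scrub_interval or, assuming the option is present in the 14.2.x build in use, setting mon_warn_pg_not_deep_scrubbed_ratio to 0.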

[ceph-users] Kraken - Pool storage MAX AVAIL drops by 30TB after disk failure

2019-04-11 Thread nokia ceph
Hi, We have a 5 node EC 4+1 cluster with 335 OSDs running Kraken Bluestore 11.2.0. There was a disk failure on one of the OSDs and the disk was replaced, after which it was noticed that there was a ~30TB drop in the MAX AVAIL value for the pool in the storage details of the 'ceph df' output. Even though

[ceph-users] ceph bluestore data cache on osd

2018-07-23 Thread nokia ceph
Hi Team, We need a mechanism to have some data cache on OSDs built on bluestore. Is there an option available to enable a data cache? With the default configuration, the OSD logs state that the data cache is disabled by default: bluestore(/var/lib/ceph/osd/ceph-66) _set_cache_sizes cache_size 1073741824

Re: [ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-24 Thread nokia ceph
please suggest other options which we can try. thanks, Muthu On Wed, May 23, 2018 at 4:51 PM, nokia ceph <nokiacephus...@gmail.com> wrote: > yes it is 68 disks , and will this mon_osd_reporter_subtree_level = host > have any impact on mon_osd_ min_down_reporters ? > > And r

Re: [ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-23 Thread nokia ceph
which has been mentioned and discussed multiple times on the ML. > > > On Wed, May 23, 2018, 3:39 AM nokia ceph <nokiacephus...@gmail.com> wrote: > >> Hi David Turner, >> >> This is our ceph config under mon section , we have EC 4+1 and set the >> failure d

Re: [ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-23 Thread nokia ceph
the osd process stops or the network comes back up. > There might be a seeing for how long an odd will try telling the mons it's > up, but this isn't really a situation I've come across after initial > testing and installation of nodes. > > On Tue, May 22, 2018, 1:47 AM nokia ceph &

[ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-21 Thread nokia ceph
Hi Ceph users, We have a cluster with 5 nodes (67 disks) and an EC 4+1 configuration with min_size set to 4. Ceph version: 12.2.5. While executing one of our resilience use cases, taking the private interface down on one of the nodes, up to kraken we saw only a short outage in rados (60s). Now with luminous, we

Re: [ceph-users] Luminous : mark_unfound_lost for EC pool

2018-05-08 Thread nokia ceph
t; > 2018-05-08 9:26 GMT+02:00 nokia ceph <nokiacephus...@gmail.com>: > >> Hi Team, >> >> I was trying to forcefully lost the unfound objects using the below >> commands mentioned in the documentation , it is not working in the latest >> release , any pre

[ceph-users] Luminous : mark_unfound_lost for EC pool

2018-05-08 Thread nokia ceph
Hi Team, I was trying to forcefully mark the unfound objects lost using the below commands mentioned in the documentation, but it is not working in the latest release; are any prerequisites required for an EC pool? cn1.chn6m1c1ru1c1.cdn ~# *ceph pg 4.1206 mark_unfound_lost revert|delete* -bash: delete:

Re: [ceph-users] ceph-mgr not able to modify max_misplaced in 12.2.4

2018-05-03 Thread nokia ceph
active+remapped+backfilling Thanks, Muthu On Fri, Apr 27, 2018 at 7:54 PM, John Spray <jsp...@redhat.com> wrote: > On Fri, Apr 27, 2018 at 7:03 AM, nokia ceph <nokiacephus...@gmail.com> > wrote: > > Hi Team, > > > > I was trying to modify the max_

[ceph-users] ceph-mgr not able to modify max_misplaced in 12.2.4

2018-04-27 Thread nokia ceph
Hi Team, I was trying to modify the max_misplaced parameter in 12.2.4 as per the documentation, however I am not able to modify it and get the following error: #ceph config set mgr mgr/balancer/max_misplaced .06 Invalid command: unused arguments: [u'.06'] config set : Set a configuration option at runtime
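
A note reconstructed from the Luminous-era balancer documentation rather than from the truncated reply above: in 12.2.x the mgr balancer module reads its settings from the config-key store, so the equivalent command would look like 'ceph config-key set mgr/balancer/max_misplaced .06'; the centralized 'ceph config set' database only arrived in a later release.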

Re: [ceph-users] scalability new node to the existing cluster

2018-04-20 Thread nokia ceph
> > > > > >> On Apr 18, 2018, at 1:32 PM, Serkan Çoban <cobanser...@gmail.com> > wrote: > >> > >> You can add new OSDs with 0 weight and edit below script to increase > >> the osd weights instead of decreasing. > >> > >> https://gi

Re: [ceph-users] scalability new node to the existing cluster

2018-04-18 Thread nokia ceph
ser...@gmail.com> wrote: > You can add new OSDs with 0 weight and edit below script to increase > the osd weights instead of decreasing. > > https://github.com/cernceph/ceph-scripts/blob/master/ > tools/ceph-gentle-reweight > > > On Wed, Apr 18, 2018 at 2:16 PM, nokia

[ceph-users] scalability new node to the existing cluster

2018-04-18 Thread nokia ceph
Hi All, We have a 5 node cluster with EC 4+1. Each node has 68 HDDs. Now we are trying to add a new node with 68 disks to the cluster. We tried to add the new node and created all OSDs in one go; the cluster stopped all client traffic and did only backfilling. Is there any procedure to add the new
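
The approach suggested in the replies above is to create the new OSDs with an initial crush weight of 0, so nothing moves when they join, and then raise the weights in small steps (the linked cernceph ceph-gentle-reweight script automates this), keeping backfill throttled enough that client traffic can continue.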

Re: [ceph-users] Luminous : performance degrade while read operations (ceph-volume)

2018-02-25 Thread nokia ceph
. Thanks, Muthu On Wed, Feb 21, 2018 at 6:57 PM, Alfredo Deza <ad...@redhat.com> wrote: > > > On Tue, Feb 20, 2018 at 9:33 PM, nokia ceph <nokiacephus...@gmail.com> > wrote: > >> Hi Alfredo Deza, >> >> I understand the point between lvm and

Re: [ceph-users] Luminous : performance degrade while read operations (ceph-volume)

2018-02-20 Thread nokia ceph
. If we consider only the lvm based system, is this high iops because of the dm-cache created for each osd? Meanwhile I will update some graphs to show this once I have them. Thanks, Muthu On Tuesday, February 20, 2018, Alfredo Deza <ad...@redhat.com> wrote: > > > On Mon, Feb 19, 2018 at 9:29

Re: [ceph-users] Luminous : performance degrade while read operations (ceph-volume)

2018-02-19 Thread nokia ceph
performance. During rocksdb compaction the situation is worse. Meanwhile we are building another platform creating osd using ceph-disk and analyse on this. Thanks, Muthu On Tuesday, February 20, 2018, Alfredo Deza <ad...@redhat.com> wrote: > > > On Mon, Feb 19, 2018 at 2:01

[ceph-users] Luminous : performance degrade while read operations (ceph-volume)

2018-02-19 Thread nokia ceph
Hi All, We have 5 node clusters with EC 4+1 and have used bluestore since last year, from Kraken. Recently we migrated all our platforms to Luminous 12.2.2; finally all OSDs were migrated to the ceph-volume simple type, and on a few platforms we installed ceph using ceph-volume. Now we see two times more traffic

Re: [ceph-users] Luminous : All OSDs not starting when ceph.target is started

2018-01-24 Thread nokia ceph
root 41 Jan 23 09:36 ceph-osd@12.service -> /usr/lib/systemd/system/ceph-osd@.service . . . On Mon, Jan 8, 2018 at 3:49 PM, nokia ceph <nokiacephus...@gmail.com> wrote: > Hello, > > i have installed Luminous 12.2.2 on a 5 node cluster with logical volume > OSDs. >

[ceph-users] Luminous : All OSDs not starting when ceph.target is started

2018-01-08 Thread nokia ceph
Hello, I have installed Luminous 12.2.2 on a 5 node cluster with logical volume OSDs. I am trying to stop and start ceph on one of the nodes using systemctl commands. *systemctl stop ceph.target; systemctl start ceph.target* When I stop ceph, all OSDs are stopped on the node properly. But when I

Re: [ceph-users] ceph-disk activation issue in 12.2.2

2017-12-10 Thread nokia ceph
Created tracker for this issue -- > http://tracker.ceph.com/issues/22354 Thanks Jayaram On Fri, Dec 8, 2017 at 9:49 PM, nokia ceph <nokiacephus...@gmail.com> wrote: > Hello Team, > > We aware that ceph-disk which is deprecated in 12.2.2 . As part of my > testing, I can

[ceph-users] ceph-disk activation issue in 12.2.2

2017-12-08 Thread nokia ceph
Hello Team, We are aware that ceph-disk is deprecated in 12.2.2. As part of my testing, I can still use the ceph-disk utility for creating OSDs in 12.2.2. Here I'm getting an activation error from the second attempt onwards; on the first occurrence the OSDs are created without any issue.

Re: [ceph-users] upgrade from kraken 11.2.0 to 12.2.2 bluestore EC

2017-12-08 Thread nokia ceph
After that, your 24 down OSDs should come back up. > > On Fri, Dec 8, 2017 at 10:51 AM nokia ceph <nokiacephus...@gmail.com> > wrote: > >> Hello Team, >> >> I having a 5 node cluster running with kraken 11.2.0 EC 4+1. >> >> My plan is to upgrade all

[ceph-users] upgrade from kraken 11.2.0 to 12.2.2 bluestore EC

2017-12-08 Thread nokia ceph
Hello Team, I have a 5 node cluster running kraken 11.2.0 with EC 4+1. My plan is to upgrade all 5 nodes to 12.2.2 Luminous without any downtime. I tried the below procedure on the first node: commented out the below directive from ceph.conf enable experimental unrecoverable data corrupting features =

Re: [ceph-users] ceph-volume lvm for bluestore for newer disk

2017-12-01 Thread nokia ceph
Thanks Brad, that worked.. :) On Fri, Dec 1, 2017 at 12:18 PM, Brad Hubbard <bhubb...@redhat.com> wrote: > > > On Thu, Nov 30, 2017 at 5:30 PM, nokia ceph <nokiacephus...@gmail.com> > wrote: > > Hello, > > > > I'm following > > http://docs

[ceph-users] ceph-volume lvm for bluestore for newer disk

2017-11-29 Thread nokia ceph
Hello, I'm following http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#ceph-volume-lvm-prepare-bluestore to create new OSD's. I took the latest branch from https://shaman.ceph.com/repos/ceph/luminous/ # ceph -v ceph version 12.2.1-851-g6d9f216 What I did, formatted the device. #sgdisk

Re: [ceph-users] v12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping reques "

2017-09-20 Thread nokia ceph
;li...@kirneh.eu> wrote: > On 17-09-20 08:06, nokia ceph wrote: > > Hello, > > Env:- RHEL 7.2 , 3.10.0-327.el7.x86_64 , EC 4+1 , bluestore > > We are writing to ceph via librados C API . Testing with rados no issues. > > > The same we tested with Jewel/kraken without any

[ceph-users] v12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping reques "

2017-09-19 Thread nokia ceph
Hello, Env:- RHEL 7.2, 3.10.0-327.el7.x86_64, EC 4+1, bluestore. We are writing to ceph via the librados C API; testing with rados shows no issues. We tested the same with Jewel/Kraken without any issue. Need your view on how to debug this issue. >> OSD.log == ~~~ 2017-09-18 14:51:59.895746

[ceph-users] Unable to remove osd from crush map. - leads remapped pg's v11.2.0

2017-07-28 Thread nokia ceph
Hello, Recently we got an underlying issue with osd.10, which mapped to /dev/sde. So we tried to remove it from the crush map: === #systemctl stop ceph-osd@10.service #for x in {10..10}; do ceph osd out $x;ceph osd crush remove osd.$x;ceph auth del osd.$x;ceph osd rm osd.$x ;done marked out osd.10.

Re: [ceph-users] Ceph upgrade kraken -> luminous without deploy

2017-07-03 Thread nokia ceph
Hello, How do we view the latest OSD epoch value in luminous? Normally this can be found with the below commands: #ceph -s | grep osd or #ceph osd stat Need to know how to find this from v12.1.0. Thanks Jayaram On Sun, Jul 2, 2017 at 6:11 PM, Marc Roos wrote: > > I have updated a

Re: [ceph-users] v11.2.0 Disk activation issue while booting

2017-06-14 Thread nokia ceph
h user. You can test by > chowning the journal block device and try to start the OSD again. > > Alternatively if you want to see more information, you can start the > daemon manually as opposed to starting it through systemd and see what its > output looks like. > > On Tue, Jun 1

[ceph-users] v11.2.0 Disk activation issue while booting

2017-06-13 Thread nokia ceph
Hello, Some OSDs are not getting activated after a reboot operation, which causes those particular OSDs to land in a failed state. Here you can see the mount points were not getting updated to the osd-num and were mounted at an incorrect mount point, so we are not able to mount/activate the OSDs. Env:-

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-08 Thread nokia ceph
>> /root/osd_restart_log > echo "OSD" $OSD "is down, restarting.." > OSDHOST=`ceph osd find $OSD | grep host | awk -F '"' '{print $4}'` > ssh $OSDHOST systemctl restart ceph-osd@$OSD > sleep 30 > else > echo -

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-08 Thread nokia ceph
d help us > reproduce the problem would be much appreciated! > > Mark > > On 06/08/2017 06:08 AM, nokia ceph wrote: > >> Hello Mark, >> >> Raised tracker for the issue -- http://tracker.ceph.com/issues/20222 >> >> Jake can you share the restart_OSD_and_log-

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-06-08 Thread nokia ceph
Hello Mark, Raised a tracker for the issue -- http://tracker.ceph.com/issues/20222 Jake, can you share the restart_OSD_and_log-this.sh script? Thanks Jayaram On Wed, Jun 7, 2017 at 9:40 PM, Jake Grimmett wrote: > Hi Mark & List, > > Unfortunately, even when using

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-31 Thread nokia ceph
PM, nokia ceph <nokiacephus...@gmail.com> wrote: > Hello Mark, > > Yes this issue happens once the test/write started after 60 secs which > correspond config value -- "threadpool_default_timeout = 60 " . Do you > require the down OSD coredump to analyse tp_osd_tp sta

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-30 Thread nokia ceph
e , #gcore or using wallclock profiler, I'm not much aware how to use this tool. Thanks Jayaram On Tue, May 30, 2017 at 6:57 PM, Mark Nelson <mnel...@redhat.com> wrote: > On 05/30/2017 05:07 AM, nokia ceph wrote: > >> Hello Mark, >> >> I can able to reproduce

Re: [ceph-users] Lumionous: bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60

2017-05-30 Thread nokia ceph
Hello Mark, I am able to reproduce this problem every time. Env:-- 5 node, v12.0.3, EC 4+1 bluestore, RHEL 7.3 - 3.10.0-514.el7.x86_64 Tested with debug bluestore = 20... From ceph watch === 2017-05-30 08:57:33.510794 mon.0 [INF] pgmap v15649: 8192 pgs: 8192 active+clean; 774 GB

[ceph-users] Troubleshooting remapped PG's + OSD flaps

2017-05-18 Thread nokia ceph
Hello, Env:- Bluestore EC 4+1 v11.2.0 RHEL7.3 16383 PGs. We did our resiliency testing and found the OSDs keep flapping and the cluster went into an error state. What we did:- 1. we have a 5 node cluster 2. poweroff/stop ceph.target on the last node and waited; everything seems to reach back to normal. 3.

[ceph-users] bluestore - OSD booting issue continuosly

2017-04-05 Thread nokia ceph
Hello, Env:- 11.2.0 bluestore, EC 4+1, RHEL7.2. We are facing one OSD booting again and again, which drove the cluster crazy :( . As you can see, one PG got into an inconsistent state while we tried to repair that particular PG, as its primary OSD went down. After some time we found some

Re: [ceph-users] Troubleshooting incomplete PG's

2017-04-04 Thread nokia ceph
_lost delete { data loss } Need your views on this: how to clear the unfound issues without data loss. Thanks Jayaram On Mon, Apr 3, 2017 at 6:50 PM, Sage Weil <sw...@redhat.com> wrote: > On Fri, 31 Mar 2017, nokia ceph wrote: > > Hello Brad, > > Many thanks of t

[ceph-users] v11.2.0 OSD crashing "src/os/bluestore/KernelDevice.cc: 541: FAILED assert((uint64_t)r == len) "

2017-04-01 Thread nokia ceph
Hello, We are getting the below trace on failed OSDs. Can you please explain, from the below code, why this issue is happening? We suspect it could be because of an underlying HW issue, but we can't find anything in the syslogs and all the OSD disks are in healthy condition. Link :-

Re: [ceph-users] Troubleshooting incomplete PG's

2017-03-30 Thread nokia ceph
y understanding I'm aware about this command. === #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph --pgid 1.e4b --op remove === Awaiting for your suggestions to proceed. Thanks On Thu, Mar 30, 2017 at 7:32 AM, Brad Hubbard <bhubb...@redhat.com> wrote: > > > On Thu

[ceph-users] Troubleshooting incomplete PG's

2017-03-29 Thread nokia ceph
Hello, Env:- 5 node, EC 4+1 bluestore kraken v11.2.0, RHEL7.2. As part of our resiliency testing with kraken bluestore, we found many PGs were in incomplete+remapped state. We tried to repair each PG using "ceph pg repair ", still no luck. Then we planned to remove the incomplete PGs using the below

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-27 Thread nokia ceph
ompiled like this. Thanks On Mon, Mar 27, 2017 at 5:04 AM, Brad Hubbard <bhubb...@redhat.com> wrote: > > > On Fri, Mar 24, 2017 at 6:49 PM, nokia ceph <nokiacephus...@gmail.com> > wrote: > > Brad, cool now we are on the same track :) > > > > So whatever

[ceph-users] PG Calculation query

2017-03-27 Thread nokia ceph
Hello, We are facing some performance issues with rados benchmarking on a 5 node cluster with PG num 4096 vs 8192. As per the PG calculation, below is our specification:

Size  OSD  %Data  Targets  PG count
5     340  100    100      8192
5     340  100    50       4096

With the 8192 PG count we got good performance with

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-24 Thread nokia ceph
Piotr, thanks for the info. Yes, this method is time saving, but we have not started testing with the build-from-source method. We will consider this for our next part of testing :) On Fri, Mar 24, 2017 at 1:17 PM, Piotr Dałek <piotr.da...@corp.ovh.com> wrote: > On 03/23/2017 06:10 PM, n

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-24 Thread nokia ceph
Hubbard <bhubb...@redhat.com> wrote: > Oh wow, I completely misunderstood your question. > > Yes, src/osd/PG.cc and src/osd/PG.h are compiled into the ceph-osd binary > which > is included in the ceph-osd rpm as you said in your OP. > > On Fri, Mar 24, 2017 at 3:10

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread nokia ceph
;piotr.da...@corp.ovh.com> wrote: > On 03/23/2017 02:02 PM, nokia ceph wrote: > > Hello Piotr, >> >> We do customizing ceph code for our testing purpose. It's a part of our >> R :) >> >> Recompiling source code will create 38 rpm's out of these I need to find

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread nokia ceph
at 6:18 PM, Piotr Dałek <piotr.da...@corp.ovh.com> wrote: > On 03/23/2017 01:41 PM, nokia ceph wrote: > >> Hey brad, >> >> Thanks for the info. >> >> Yea we know that these are test rpm's. >> >> The idea behind my question is if I made an

Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-23 Thread nokia ceph
to achieve, maybe > you > could have another go at describing your objective? > > On Wed, Mar 22, 2017 at 12:26 AM, nokia ceph <nokiacephus...@gmail.com> > wrote: > > Hello, > > > > I made some changes in the below file on ceph kraken v11.2.0 source code

[ceph-users] Recompiling source code - to find exact RPM

2017-03-21 Thread nokia ceph
Hello, I made some changes to the below files in the ceph kraken v11.2.0 source code, as per this article https://github.com/ceph/ceph-ci/commit/wip-prune-past-intervals-kraken : src/osd/PG.cc and src/osd/PG.h. Is there any way to find which rpm is affected by these two files? I believe it should be

Re: [ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-16 Thread nokia ceph
Sounds good :), Brad many thanks for the explanation . On Thu, Mar 16, 2017 at 12:42 PM, Brad Hubbard <bhubb...@redhat.com> wrote: > On Thu, Mar 16, 2017 at 4:33 PM, nokia ceph <nokiacephus...@gmail.com> > wrote: > > Hello Brad, > > > > I meant for this param

Re: [ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-16 Thread nokia ceph
16, 2017 at 4:15 PM, nokia ceph <nokiacephus...@gmail.com> > wrote: > > Hello, > > > > We are running latest kernel - 3.10.0-514.2.2.el7.x86_64 { RHEL 7.3 } > > > > Sure I will try to alter this directive - bdev_aio_max_queue_depth and > will > >

Re: [ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-16 Thread nokia ceph
t;s...@newdream.net> wrote: > On Wed, 15 Mar 2017, Brad Hubbard wrote: > > +ceph-devel > > > > On Wed, Mar 15, 2017 at 5:25 PM, nokia ceph <nokiacephus...@gmail.com> > wrote: > > > Hello, > > > > > > We suspect these messages not only at the time o

Re: [ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-15 Thread nokia ceph
it retries 11* 2017-03-14 20:13:04.291160 7fee05294700 4 rocksdb: reusing log 85 from recycle list 2017-03-14 20:13:04.291254 7fee05294700 4 rocksdb: [default] New memtable created with log file: #89. Immutable memtables: 0. = Thanks On Wed, Mar 15, 2017 at 11:18 AM, nokia ceph <noki

Re: [ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-14 Thread nokia ceph
Hello, Can we get any update for this problem? Thanks On Thu, Mar 2, 2017 at 2:16 PM, nokia ceph <nokiacephus...@gmail.com> wrote: > Hello, > > Env:- v11.2.0 - bluestore - EC 3 + 1 > > We are getting below entries both in /var/log/messages and osd logs. May I >

[ceph-users] Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

2017-03-02 Thread nokia ceph
Hello, Env:- v11.2.0 - bluestore - EC 3 + 1 We are getting the below entries in both /var/log/messages and the osd logs. May I know what the impact of the below message is, as these messages are flooding the osd and sys logs? ~~~ 2017-03-01 13:00:59.938839 7f6c96915700 -1

Re: [ceph-users] "STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY" showing in ceph -s

2017-03-01 Thread nokia ceph
have any other suggestion to how to skip this warning? Thanks On Mon, Feb 27, 2017 at 8:47 PM, Gregory Farnum <gfar...@redhat.com> wrote: > On Sun, Feb 26, 2017 at 10:41 PM, nokia ceph <nokiacephus...@gmail.com> > wrote: > > Hello, > > > > On a fresh installatio

[ceph-users] "STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY" showing in ceph -s

2017-02-26 Thread nokia ceph
Hello, On a fresh installation of ceph kraken 11.2.0, we are facing the below error in the "ceph -s" output. 0 -- 10.50.62.152:0/675868622 >> 10.50.62.152:6866/13884 conn(0x7f576c002750 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect claims to be