[ceph-users] Cephalocon 2018 APAC

2018-01-12 Thread Leonardo Vaz
Hey Cephers,

As many of you know, we've just announced Cephalocon APAC 2018[1],
which takes place in Beijing on March 22 and 23 at the JW Marriott Hotel,
as well as the Call for Proposals, which is accepting talks until January
31, 2018.

This is a project we've been working on for a long time now, and one that
we unfortunately had to cancel in 2017 due to several reasons. Fortunately
this community is made up of people driven by a great passion for this
project, and a brave group of volunteers in China decided to take up
the challenge and organize the very first Cephalocon the world will
see!

Having Cephalocon happen in China will allow us to reach the
vibrant community of contributors, users and companies that have
been helping this project grow fast, and to develop a closer
relationship with them. This will bring a lot of benefits to the Ceph
project and also provide a great experience to everyone joining us at
the conference.

At the moment we have a lot of work to do, including helping the
organization team in Beijing with tasks like finding sponsors (around
70% of all sponsorship packages have been sold so far, but we need to
keep working to meet the goal and have the conference expenses covered),
providing travel information (visas, flights, hotels, etc.) and,
obviously, promoting the event so we can reach more people and have
great attendance.

The team in Beijing is working with a local company called DoIT, which
created a website[2] containing details about the conference. This
information is being progressively synced to Ceph.com, so you will see
a lot of changes and announcements in the upcoming weeks.

If you have any questions about the conference feel free to contact me
through this email, IRC (Lvaz on OFTC and Freenode) or any social
media channel used by Ceph. I will be happy to answer all questions.

Kindest regards,

Leo

[1] https://ceph.com/cephalocon
[2] http://cephalocon.doit.com.cn/index_en.html

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous RGW Metadata Search

2018-01-12 Thread Yehuda Sadeh-Weinraub
The errors you're seeing there don't look related to
Elasticsearch. It's a generic radosgw error that says it
failed to reach the RADOS (Ceph) backend. You can try bumping up the
messenger log (debug ms = 1) and see if there's any hint in there.
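
For example, something along these lines (a rough sketch reusing the admin
keyring and gateway command quoted below; --debug-ms/--debug-rgw are the
generic debug overrides):

# does librados reach the cluster with the same keyring the gateway is using?
rados -k /etc/ceph/ceph.client.admin.keyring lspools

# re-run the failing gateway with messenger (and rgw) debugging turned up
sudo radosgw --keyring /etc/ceph/ceph.client.admin.keyring -f --rgw-zone=zone1-b \
    --rgw-frontends="civetweb port=8002" --debug-ms=1 --debug-rgw=20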

Yehuda

On Fri, Jan 12, 2018 at 12:54 PM, Youzhong Yang  wrote:
> So I did the exact same thing using Kraken and the same set of VMs, no
> issue. What is the magic to make it work in Luminous? Anyone lucky enough to
> have this RGW ElasticSearch working using Luminous?
>
> On Mon, Jan 8, 2018 at 10:26 AM, Youzhong Yang  wrote:
>>
>> Hi Yehuda,
>>
>> Thanks for replying.
>>
>> >radosgw failed to connect to your ceph cluster. Does the rados command
>> >with the same connection params work?
>>
>> I am not quite sure how to run the rados command to test this.
>>
>> So I tried again, could you please take a look and check what could have
>> gone wrong?
>>
>> Here is what I did:
>>
>>  On the ceph admin node, I removed the installation on ceph-rgw1 and
>> ceph-rgw2, reinstalled rgw on ceph-rgw1, stopped the rgw service, and removed
>> all rgw pools. Elasticsearch is running on the ceph-rgw2 node on port 9200.
>>
>> ceph-deploy purge ceph-rgw1
>> ceph-deploy purge ceph-rgw2
>> ceph-deploy purgedata ceph-rgw2
>> ceph-deploy purgedata ceph-rgw1
>> ceph-deploy install --release luminous ceph-rgw1
>> ceph-deploy admin ceph-rgw1
>> ceph-deploy rgw create ceph-rgw1
>> ssh ceph-rgw1 sudo systemctl stop ceph-radosgw@rgw.ceph-rgw1
>> rados rmpool default.rgw.log default.rgw.log --yes-i-really-really-mean-it
>> rados rmpool default.rgw.meta default.rgw.meta
>> --yes-i-really-really-mean-it
>> rados rmpool default.rgw.control default.rgw.control
>> --yes-i-really-really-mean-it
>> rados rmpool .rgw.root .rgw.root --yes-i-really-really-mean-it
>>
>>  On ceph-rgw1 node:
>>
>> export RGWHOST="ceph-rgw1"
>> export ELASTICHOST="ceph-rgw2"
>> export REALM="demo"
>> export ZONEGRP="zone1"
>> export ZONE1="zone1-a"
>> export ZONE2="zone1-b"
>> export SYNC_AKEY="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 20 |
>> head -n 1 )"
>> export SYNC_SKEY="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 40 |
>> head -n 1 )"
>>
>> radosgw-admin realm create --rgw-realm=${REALM} --default
>> radosgw-admin zonegroup create --rgw-realm=${REALM}
>> --rgw-zonegroup=${ZONEGRP} --endpoints=http://${RGWHOST}:8000 --master
>> --default
>> radosgw-admin zone create --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP}
>> --rgw-zone=${ZONE1} --endpoints=http://${RGWHOST}:8000
>> --access-key=${SYNC_AKEY} --secret=${SYNC_SKEY} --master --default
>> radosgw-admin user create --uid=sync --display-name="zone sync"
>> --access-key=${SYNC_AKEY} --secret=${SYNC_SKEY} --system
>> radosgw-admin period update --commit
>> sudo systemctl start ceph-radosgw@rgw.${RGWHOST}
>>
>> radosgw-admin zone create --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP}
>> --rgw-zone=${ZONE2} --access-key=${SYNC_AKEY} --secret=${SYNC_SKEY}
>> --endpoints=http://${RGWHOST}:8002
>> radosgw-admin zone modify --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP}
>> --rgw-zone=${ZONE2} --tier-type=elasticsearch
>> --tier-config=endpoint=http://${ELASTICHOST}:9200,num_replicas=1,num_shards=10
>> radosgw-admin period update --commit
>>
>> sudo systemctl restart ceph-radosgw@rgw.${RGWHOST}
>> sudo radosgw --keyring /etc/ceph/ceph.client.admin.keyring -f
>> --rgw-zone=${ZONE2} --rgw-frontends="civetweb port=8002"
>> 2018-01-08 00:21:54.389432 7f0fe9cd2e80 -1 Couldn't init storage provider
>> (RADOS)
>>
>>  As you can see, starting rgw on port 8002 failed, but rgw on port
>> 8000 was started successfully.
>>  Here is some more info which may be useful for diagnosis:
>>
>> $ cat /etc/ceph/ceph.conf
>> [global]
>> fsid = 3e5a32d4-e45e-48dd-a3c5-f6f28fef8edf
>> mon_initial_members = ceph-mon1, ceph-osd1, ceph-osd2, ceph-osd3
>> mon_host = 172.30.212.226,172.30.212.227,172.30.212.228,172.30.212.250
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> osd_pool_default_size = 2
>> osd_pool_default_min_size = 2
>> osd_pool_default_pg_num = 100
>> osd_pool_default_pgp_num = 100
>> bluestore_compression_algorithm = zlib
>> bluestore_compression_mode = force
>> rgw_max_put_size = 21474836480
>> [osd]
>> osd_max_object_size = 1073741824
>> [mon]
>> mon_allow_pool_delete = true
>> [client.rgw.ceph-rgw1]
>> host = ceph-rgw1
>> rgw frontends = civetweb port=8000
>>
>> $ wget -O - -q http://ceph-rgw2:9200/
>> {
>>   "name" : "Hippolyta",
>>   "cluster_name" : "elasticsearch",
>>   "version" : {
>> "number" : "2.3.1",
>> "build_hash" : "bd980929010aef404e7cb0843e61d0665269fc39",
>> "build_timestamp" : "2016-04-04T12:25:05Z",
>> "build_snapshot" : false,
>> "lucene_version" : "5.5.0"
>>   },
>>   "tagline" : "You Know, for Search"
>> }
>>
>> $ ceph df
>> GLOBAL:
>> SIZE AVAIL RAW USED %RAW USED
>> 719G  705G   14473M   

[ceph-users] Bluestore - possible to grow PV/LV and utilize additional space?

2018-01-12 Thread Jared Biel
Hello,

I'm wondering if it's possible to grow a volume (such as in a cloud/VM
environment) and use pvresize/lvextend to utilize the extra space in my
pool.

I am testing with the following environment:

* Running on cloud provider (Google Cloud)
* 3 nodes, 1 OSD each
* 1 storage pool with "size" of 3 (data replicated on all nodes)
* Initial disk size of 100 GB on each node, initialized as bluestore OSDs

I grew all three volumes (100 GB -> 150 GB) being used as OSDs in the
Google console, then used pvresize/lvextend on all devices and rebooted all
nodes one by one. In the end, the nodes somewhat recognize the
additional space, but it shows up as being utilized.
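
Roughly, the per-node resize amounted to the following (device, VG and LV
names are placeholders, not the real ones):

pvresize /dev/sdb                      # let LVM see the grown 150 GB disk
lvextend -l +100%FREE ceph-vg/osd-lv   # grow the LV backing the bluestore OSD
lvs -o lv_name,lv_size                 # confirm the new LV size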

Before resize (there's ~1 GB of data in my pool):

$ ceph -s
  cluster:
id: 553ca7bd-925a-4dc5-a928-563b520842de
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph01,ceph02,ceph03
mgr: ceph01(active), standbys: ceph02, ceph03
mds: cephfs-1/1/1 up  {0=ceph01=up:active}, 2 up:standby
osd: 3 osds: 3 up, 3 in

  data:
pools:   2 pools, 200 pgs
objects: 281 objects, 1024 MB
usage:   6316 MB used, 293 GB / 299 GB avail
pgs: 200 active+clean

After resize:

$ ceph -s
  cluster:
id: 553ca7bd-925a-4dc5-a928-563b520842de
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph01,ceph02,ceph03
mgr: ceph01(active), standbys: ceph02, ceph03
mds: cephfs-1/1/1 up  {0=ceph02=up:active}, 2 up:standby
osd: 3 osds: 3 up, 3 in

  data:
pools:   2 pools, 200 pgs
objects: 283 objects, 1024 MB
usage:   156 GB used, 293 GB / 449 GB avail
pgs: 200 active+clean

So, after "growing" all OSDs by 50 GB (and object size remaining the same),
the new 50 GB of additional space shows up as used space. Also, the pool
max available size stays the same.

$ ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
449G  293G 156G 34.70
POOLS:
NAME                ID     USED      %USED     MAX AVAIL     OBJECTS
cephfs_data          2     1024M      1.09        92610M         261
cephfs_metadata      3      681k         0        92610M          22


I've tried searching around on the Internet and looked through
documentation to see if/how growing bluestore volume OSDs is possible and
haven't come up with anything. I'd greatly appreciate any help in this area
if anyone has experience. Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous RGW Metadata Search

2018-01-12 Thread Youzhong Yang
So I did the exact same thing using Kraken and the same set of VMs, no
issue. What is the magic to make it work in Luminous? Anyone lucky enough
to have this RGW ElasticSearch working using Luminous?

On Mon, Jan 8, 2018 at 10:26 AM, Youzhong Yang  wrote:

> Hi Yehuda,
>
> Thanks for replying.
>
> >radosgw failed to connect to your ceph cluster. Does the rados command
> >with the same connection params work?
>
> I am not quite sure how to run the rados command to test this.
>
> So I tried again, could you please take a look and check what could have
> gone wrong?
>
> Here is what I did:
>
>  On the ceph admin node, I removed the installation on ceph-rgw1 and
> ceph-rgw2, reinstalled rgw on ceph-rgw1, stopped the rgw service, and removed
> all rgw pools. Elasticsearch is running on the ceph-rgw2 node on port 9200.
>
> ceph-deploy purge ceph-rgw1
> ceph-deploy purge ceph-rgw2
> ceph-deploy purgedata ceph-rgw2
> ceph-deploy purgedata ceph-rgw1
> ceph-deploy install --release luminous ceph-rgw1
> ceph-deploy admin ceph-rgw1
> ceph-deploy rgw create ceph-rgw1
> ssh ceph-rgw1 sudo systemctl stop ceph-radosgw@rgw.ceph-rgw1
> rados rmpool default.rgw.log default.rgw.log --yes-i-really-really-mean-it
> rados rmpool default.rgw.meta default.rgw.meta --yes-i-really-really-mean-it
> rados rmpool default.rgw.control default.rgw.control --yes-i-really-really-mean-it
> rados rmpool .rgw.root .rgw.root --yes-i-really-really-mean-it
>
>  On ceph-rgw1 node:
>
> export RGWHOST="ceph-rgw1"
> export ELASTICHOST="ceph-rgw2"
> export REALM="demo"
> export ZONEGRP="zone1"
> export ZONE1="zone1-a"
> export ZONE2="zone1-b"
> export SYNC_AKEY="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 20 | head -n 1 )"
> export SYNC_SKEY="$( cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 40 | head -n 1 )"
>
>
> radosgw-admin realm create --rgw-realm=${REALM} --default
> radosgw-admin zonegroup create --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP} --endpoints=http://${RGWHOST}:8000 --master --default
> radosgw-admin zone create --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP} --rgw-zone=${ZONE1} --endpoints=http://${RGWHOST}:8000 --access-key=${SYNC_AKEY} --secret=${SYNC_SKEY} --master --default
> radosgw-admin user create --uid=sync --display-name="zone sync" --access-key=${SYNC_AKEY} --secret=${SYNC_SKEY} --system
> radosgw-admin period update --commit
>
> sudo systemctl start ceph-radosgw@rgw.${RGWHOST}
>
> radosgw-admin zone create --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP} --rgw-zone=${ZONE2} --access-key=${SYNC_AKEY} --secret=${SYNC_SKEY} --endpoints=http://${RGWHOST}:8002
> radosgw-admin zone modify --rgw-realm=${REALM} --rgw-zonegroup=${ZONEGRP} --rgw-zone=${ZONE2} --tier-type=elasticsearch --tier-config=endpoint=http://${ELASTICHOST}:9200,num_replicas=1,num_shards=10
> radosgw-admin period update --commit
>
> sudo systemctl restart ceph-radosgw@rgw.${RGWHOST}
>
> sudo radosgw --keyring /etc/ceph/ceph.client.admin.keyring -f --rgw-zone=${ZONE2} --rgw-frontends="civetweb port=8002"
> 2018-01-08 00:21:54.389432 7f0fe9cd2e80 -1 Couldn't init storage provider (RADOS)
>
>  As you can see, starting rgw on port 8002 failed, but rgw on port
> 8000 was started successfully.
>  Here is some more info which may be useful for diagnosis:
>
> $ cat /etc/ceph/ceph.conf
> [global]
> fsid = 3e5a32d4-e45e-48dd-a3c5-f6f28fef8edf
> mon_initial_members = ceph-mon1, ceph-osd1, ceph-osd2, ceph-osd3
> mon_host = 172.30.212.226,172.30.212.227,172.30.212.228,172.30.212.250
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> osd_pool_default_size = 2
> osd_pool_default_min_size = 2
> osd_pool_default_pg_num = 100
> osd_pool_default_pgp_num = 100
> bluestore_compression_algorithm = zlib
> bluestore_compression_mode = force
> rgw_max_put_size = 21474836480
> [osd]
> osd_max_object_size = 1073741824
> [mon]
> mon_allow_pool_delete = true
> [client.rgw.ceph-rgw1]
> host = ceph-rgw1
> rgw frontends = civetweb port=8000
>
> $ wget -O - -q http://ceph-rgw2:9200/
> {
>   "name" : "Hippolyta",
>   "cluster_name" : "elasticsearch",
>   "version" : {
> "number" : "2.3.1",
> "build_hash" : "bd980929010aef404e7cb0843e61d0665269fc39",
> "build_timestamp" : "2016-04-04T12:25:05Z",
> "build_snapshot" : false,
> "lucene_version" : "5.5.0"
>   },
>   "tagline" : "You Know, for Search"
> }
>
> $ ceph df
> GLOBAL:
> SIZE AVAIL RAW USED %RAW USED
> 719G  705G   14473M  1.96
> POOLS:
> NAME                 ID     USED     %USED     MAX AVAIL     OBJECTS
> .rgw.root            17     6035         0          333G          19
> zone1-a.rgw.control  18        0         0          333G           8
> zone1-a.rgw.meta     19      350         0          333G           2
> zone1-a.rgw.log      20       50         0          333G

Re: [ceph-users] Trying to increase number of PGs throws "Error E2BIG" though PGs/OSD < mon_max_pg_per_osd

2018-01-12 Thread Subhachandra Chandra
Thank you for the explanation, Brad. I will change that setting and see how
it goes.
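
For anyone hitting the same error, the check quoted below works out like this
with the numbers from my original mail (my arithmetic, using the 32768 new PGs
from the error message):

new PGs being created = 65536 - 32768 = 32768
per-step allowance    = mon_osd_max_split_count * OSDs = 32 * 540 = 17280
32768 > 17280         -> Error E2BIG

so the options appear to be either raising mon_osd_max_split_count on the mons
or increasing pg_num in several smaller steps.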

Subhachandra

On Thu, Jan 11, 2018 at 10:38 PM, Brad Hubbard  wrote:

> On Fri, Jan 12, 2018 at 11:27 AM, Subhachandra Chandra
>  wrote:
> > Hello,
> >
> >  We are running experiments on a Ceph cluster before we move data on
> it.
> > While trying to increase the number of PGs on one of the pools it threw
> the
> > following error
> >
> > root@ctrl1:/# ceph osd pool set data pg_num 65536
> > Error E2BIG: specified pg_num 65536 is too large (creating 32768 new PGs
> on
> > ~540 OSDs exceeds per-OSD max of 32)
>
> That comes from here:
>
> https://github.com/ceph/ceph/blob/5d7813f612aea59239c8375aaa00919ae32f952f/src/mon/OSDMonitor.cc#L6027
>
> So the warning is triggered because new_pgs (65536) >
> g_conf->mon_osd_max_split_count (32) * expected_osds (540)
>
> >
> > There are 2 pools named "data" and "metadata". "data" is an erasure coded
> > pool (6,3) and "metadata" is a replicated pool with a replication factor
> of
> > 3.
> >
> > root@ctrl1:/# ceph osd lspools
> > 1 metadata,2 data,
> > root@ctrl1:/# ceph osd pool get metadata pg_num
> > pg_num: 512
> > root@ctrl1:/# ceph osd pool get data pg_num
> > pg_num: 32768
> >
> > osd: 540 osds: 540 up, 540 in
> >  flags noout,noscrub,nodeep-scrub
> >
> >   data:
> > pools:   2 pools, 33280 pgs
> > objects: 7090k objects, 1662 TB
> > usage:   2501 TB used, 1428 TB / 3929 TB avail
> > pgs: 33280 active+clean
> >
> > The current PG/OSD ratio according to my calculation should be 549
>  (32768 * 9 + 512 * 3 ) / 540.0
> > 548.97778
> >
> > Increasing the number of PGs in the "data" pool should increase the
> PG/OSD
> > ratio to about 1095
>  (65536 * 9 + 512 * 3 ) / 540.0
> > 1095.
> >
> > In the config, settings related to PG/OSD ratio look like
> > mon_max_pg_per_osd = 1500
> > osd_max_pg_per_osd_hard_ratio = 1.0
> >
> > Trying to increase the number of PGs to 65536 throws the previously
> > mentioned error. The new PG/OSD ratio is still under the configured
> limit.
> > Why do we see the error? Further, there seems to be a bug in the error
> > message where it says "exceeds per-OSD max of 32" in terms of where does
> > "32" comes from?
>
> Maybe the wording could be better. Perhaps "exceeds per-OSD max with
> mon_osd_max_split_count of 32". I'll submit this and see how it goes.
>
> >
> > P.S. I understand that the PG/OSD ratio configured on this cluster far
> > exceeds the recommended values. The experiment is to find scaling limits
> and
> > try out expansion scenarios.
> >
> > Thanks
> > Subhachandra
> >
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Cheers,
> Brad
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 4 incomplete PGs causing RGW to go offline?

2018-01-12 Thread Brent Kennedy
Rgw.buckets (which is where the data is being sent). I am just surprised
that a few incomplete PGs would grind three gateways to a halt. Granted, the
incomplete PGs are part of a large hardware failure situation we had, and having
a min_size setting of 1 didn't help the situation. We are not completely
innocent, but I would hope that the system as a whole would work together to
skip those incomplete PGs. Fixing them doesn't appear to be an easy task at
this point, hence why we haven't fixed them yet (I wish that were easier, but I
understand the counter argument).
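
For reference, mapping the incomplete PGs to their pool goes something like
this (the PG id below is just a placeholder):

ceph health detail | grep incomplete   # lists the incomplete PG ids, e.g. 24.7f
ceph osd lspools                       # the number before the dot in a PG id is the pool id
ceph pg 24.7f query                    # peering details, including which OSDs the PG is waiting on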

 

-Brent

 

From: David Turner [mailto:drakonst...@gmail.com] 
Sent: Thursday, January 11, 2018 8:22 PM
To: Brent Kennedy 
Cc: Ceph Users 
Subject: Re: [ceph-users] 4 incomplete PGs causing RGW to go offline?

 

Which pools are the incomplete PGs a part of? I would say it's very likely that 
if some of the RGW metadata was incomplete that the daemons wouldn't be happy.

 

On Thu, Jan 11, 2018, 6:17 PM Brent Kennedy wrote:

We have 3 RadosGW servers running behind HAProxy to enable clients to connect
to the ceph cluster like an Amazon bucket.  After all the failures and upgrade
issues were resolved, I cannot get the RadosGW servers to stay online.  They
were upgraded to Luminous, and I even upgraded the OS to Ubuntu 16 on them
(before upgrading to Luminous).  They used to have Apache on them, as they ran
Hammer and before that Firefly.  I removed Apache before upgrading to Luminous.
They start up and run for about 4-6 hours before all three start to go offline.
Client traffic is light right now as we are just testing file read/write before
we reactivate them (they switched back to Amazon while we fix them).

 

Could the 4 incomplete PGs be causing them to go offline?  The last time I saw
an issue like this was when recovery wasn't working 100%, so it seems related,
since they haven't been stable since we upgraded (but that was also after the
failures we had, which is why I am not trying to specifically blame the
upgrade).

 

When I look at the radosgw log, this is what I see (the first two lines show up
plenty before this; they are health checks by the HAProxy server. The next two
are file requests that fail with a 404, I am guessing, and then the last one is
me restarting the service):

 

2018-01-11 20:14:36.640577 7f5826aa3700  1 == req done req=0x7f5826a9d1f0 
op status=0 http_status=200 ==

2018-01-11 20:14:36.640602 7f5826aa3700  1 civetweb: 0x56202c567000: 
192.168.120.21 - - [11/Jan/2018:20:14:36 +] "HEAD / HTTP/1.0" 1 0 - -

2018-01-11 20:14:36.640835 7f5816282700  1 == req done req=0x7f581627c1f0 
op status=0 http_status=200 ==

2018-01-11 20:14:36.640859 7f5816282700  1 civetweb: 0x56202c61: 
192.168.120.22 - - [11/Jan/2018:20:14:36 +] "HEAD / HTTP/1.0" 1 0 - -

2018-01-11 20:14:36.761917 7f5835ac1700  1 == starting new request 
req=0x7f5835abb1f0 =

2018-01-11 20:14:36.763936 7f5835ac1700  1 == req done req=0x7f5835abb1f0 
op status=0 http_status=404 ==

2018-01-11 20:14:36.763983 7f5835ac1700  1 civetweb: 0x56202c4ce000: 
192.168.120.21 - - [11/Jan/2018:20:14:36 +] "HEAD 
/Jobimages/vendor05/10/3962896/3962896_cover.pdf HTTP/1.1" 1 0 - 
aws-sdk-dotnet-35/2

.0.2.2 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 FileIO

2018-01-11 20:14:36.772611 7f5808266700  1 == starting new request 
req=0x7f58082601f0 =

2018-01-11 20:14:36.773733 7f5808266700  1 == req done req=0x7f58082601f0 
op status=0 http_status=404 ==

2018-01-11 20:14:36.773769 7f5808266700  1 civetweb: 0x56202c6aa000: 
192.168.120.21 - - [11/Jan/2018:20:14:36 +] "HEAD 
/Jobimages/vendor05/10/3962896/3962896_cover.pdf HTTP/1.1" 1 0 - 
aws-sdk-dotnet-35/2

.0.2.2 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 FileIO

2018-01-11 20:14:38.163617 7f5836ac3700  1 == starting new request 
req=0x7f5836abd1f0 =

2018-01-11 20:14:38.165352 7f5836ac3700  1 == req done req=0x7f5836abd1f0 
op status=0 http_status=404 ==

2018-01-11 20:14:38.165401 7f5836ac3700  1 civetweb: 0x56202c4e2000: 
192.168.120.21 - - [11/Jan/2018:20:14:38 +] "HEAD 
/Jobimages/vendor05/10/3445645/3445645_cover.pdf HTTP/1.1" 1 0 - 
aws-sdk-dotnet-35/2

.0.2.2 .NET Runtime/4.0 .NET Framework/4.0 OS/6.2.9200.0 FileIO

2018-01-11 20:14:38.170551 7f5807a65700  1 == starting new request 
req=0x7f5807a5f1f0 =

2018-01-11 20:14:40.322236 7f58352c0700  1 == starting new request 
req=0x7f58352ba1f0 =

2018-01-11 20:14:40.323468 7f5834abf700  1 == starting new request 
req=0x7f5834ab91f0 =

2018-01-11 20:14:41.643365 7f58342be700  1 == starting new request 
req=0x7f58342b81f0 =

2018-01-11 20:14:41.643358 7f58312b8700  1 == starting new request 
req=0x7f58312b21f0 =

2018-01-11 20:14:50.324196 7f5829aa9700  1 == starting new request 
req=0x7f5829aa31f0 =

2018-01-11 

Re: [ceph-users] replace failed disk in Luminous v12.2.2

2018-01-12 Thread Dietmar Rieder
Hi,

can someone comment on/confirm my planned OSD replacement procedure?

It would be very helpful for me.

Dietmar
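
Concretely, based on the documented steps quoted below, the sequence I have in
mind is roughly this (OSD id, data disk and nvme partitions are placeholders;
whether the old block.db/block.wal partitions can simply be passed back in like
this is exactly the open question):

ceph osd destroy 23 --yes-i-really-mean-it
ceph-disk zap /dev/sdX
ceph-disk prepare --bluestore /dev/sdX --block.db /dev/nvme0n1pY --block.wal /dev/nvme0n1pZ --osd-id 23 --osd-uuid `uuidgen`
ceph-disk activate /dev/sdX1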

On 11 January 2018 17:47:50 CET, Dietmar Rieder wrote:
>Hi Alfredo,
>
>thanks for your coments, see my answers inline.
>
>On 01/11/2018 01:47 PM, Alfredo Deza wrote:
>> On Thu, Jan 11, 2018 at 4:30 AM, Dietmar Rieder
>>  wrote:
>>> Hello,
>>>
>>> we have failed OSD disk in our Luminous v12.2.2 cluster that needs
>to
>>> get replaced.
>>>
>>> The cluster was initially deployed using ceph-deploy on Luminous
>>> v12.2.0. The OSDs were created using
>>>
>>> ceph-deploy osd create --bluestore cephosd-${osd}:/dev/sd${disk}
>>> --block-wal /dev/nvme0n1 --block-db /dev/nvme0n1
>>>
>>> Note we separated the bluestore data, wal and db.
>>>
>>> We updated to Luminous v12.2.1 and further to Luminous v12.2.2.
>>>
>>> With the last update we also let ceph-volume take over the OSDs
>using
>>> "ceph-volume simple scan  /var/lib/ceph/osd/$osd" and "ceph-volume
>>> simple activate ${osd} ${id}". All of this went smoothly.
>> 
>> That is good to hear!
>> 
>>>
>>> Now wonder what is the correct way to replace a failed OSD block
>disk?
>>>
>>> The docs for luminous [1] say:
>>>
>>> REPLACING AN OSD
>>>
>>> 1. Destroy the OSD first:
>>>
>>> ceph osd destroy {id} --yes-i-really-mean-it
>>>
>>> 2. Zap a disk for the new OSD, if the disk was used before for other
>>> purposes. It’s not necessary for a new disk:
>>>
>>> ceph-disk zap /dev/sdX
>>>
>>>
>>> 3. Prepare the disk for replacement by using the previously
>destroyed
>>> OSD id:
>>>
>>> ceph-disk prepare --bluestore /dev/sdX  --osd-id {id} --osd-uuid
>`uuidgen`
>>>
>>>
>>> 4. And activate the OSD:
>>>
>>> ceph-disk activate /dev/sdX1
>>>
>>>
>>> Initially this seems to be straight forward, but
>>>
>>> 1. I'm not sure if there is something to do with the still existing
>>> bluefs db and wal partitions on the nvme device for the failed OSD.
>Do
>>> they have to be zapped ? If yes, what is the best way? There is
>nothing
>>> mentioned in the docs.
>> 
>> What is your concern here if the activation seems to work?
>
>I guess on the nvme partitions for bluefs db and bluefs wal there is
>still data related to the failed OSD  block device. I was thinking that
>this data might "interfere" with the new replacement OSD block device,
>which is empty.
>
>So you are saying that this is no concern, right?
>Are they automatically reused and assigned to the replacement OSD block
>device, or do I have to specify them when running ceph-disk prepare?
>If I need to specify the wal and db partition, how is this done?
>
>I'm asking this since from the logs of the initial cluster deployment I
>got the following warning:
>
>[cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
>block.db is not the same device as the osd data
>[...]
>[cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
>block.wal is not the same device as the osd data
>
>
>>>
>>> 2. Since we already let "ceph-volume simple" take over our OSDs I'm
>not
>>> sure if we should now use ceph-volume or again ceph-disk (followed
>by
>>> "ceph-vloume simple" takeover) to prepare and activate the OSD?
>> 
>> The `simple` sub-command is meant to help with the activation of OSDs
>> at boot time, supporting ceph-disk (or manual) created OSDs.
>
>OK, got this...
>
>> 
>> There is no requirement to use `ceph-volume lvm` which is intended
>for
>> new OSDs using LVM as devices.
>
>Fine...
>
>>>
>>> 3. If we should use ceph-volume, then by looking at the luminous
>>> ceph-volume docs [2] I find for both,
>>>
>>> ceph-volume lvm prepare
>>> ceph-volume lvm activate
>>>
>>> that the bluestore option is either NOT implemented or NOT supported
>>>
>>> activate: [--bluestore] filestore (IS THIS A TYPO???) objectstore (not
>>> yet implemented)
>>> prepare: [--bluestore] Use the bluestore objectstore (not currently
>>> supported)
>> 
>> These might be a typo on the man page, will get that addressed.
>Ticket
>> opened at http://tracker.ceph.com/issues/22663
>
>Thanks
>
>> bluestore as of 12.2.2 is fully supported and it is the default. The
>> --help output in ceph-volume does have the flags updated and
>correctly
>> showing this.
>
>OK
>
>>>
>>>
>>> So, now I'm completely lost. How is all of this fitting together in
>>> order to replace a failed OSD?
>> 
>> You would need to keep using ceph-disk. Unless you want ceph-volume
>to
>> take over, in which case you would need to follow the steps to deploy
>> a new OSD
>> with ceph-volume.
>
>OK
>
>> Note that although --osd-id is supported, there is an issue with that
>> on 12.2.2 that would prevent you from correctly deploying it
>> http://tracker.ceph.com/issues/22642
>> 
>> The recommendation, if you want to use ceph-volume, would be to omit
>> --osd-id and let the cluster give you the ID.
>> 
>>>
>>> 4. More: after reading some recent threads on this list, additional
>>> questions are coming up:

[ceph-users] mons segmentation faults New 12.2.2 cluster

2018-01-12 Thread Kenneth Waegeman

Hi all,

I installed a new Luminous 12.2.2 cluster. The monitors were up at 
first, but quickly started failing, segfaulting.


I only installed some mons, mgr and mds with ceph-deploy, and osds with
ceph-volume. No pools or fs were created yet.


When I start all mons again, there is a short window in which I can see the
cluster state:



[root@ceph001 ~]# ceph status
  cluster:
    id: 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
    health: HEALTH_WARN
    1/3 mons down, quorum ceph002,ceph003

  services:
    mon: 3 daemons, quorum ceph002,ceph003, out of quorum: ceph001
    mgr: ceph001(active), standbys: ceph002, ceph003
    osd: 7 osds: 4 up, 4 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   4223 MB used, 14899 GB / 14904 GB avail
    pgs:


But this is only until I lose quorum again.

What could be the problem here?


Thanks!!

Kenneth


2018-01-12 13:08:36.912832 7f794f513e80  0 set uid:gid to 167:167 
(ceph:ceph)
2018-01-12 13:08:36.912859 7f794f513e80  0 ceph version 12.2.2 
(cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process 
(unknown), pid 28726
2018-01-12 13:08:36.913016 7f794f513e80  0 pidfile_write: ignore empty 
--pid-file
2018-01-12 13:08:36.951556 7f794f513e80  0 load: jerasure load: lrc 
load: isa
2018-01-12 13:08:36.951703 7f794f513e80  0  set rocksdb option 
compression = kNoCompression
2018-01-12 13:08:36.951716 7f794f513e80  0  set rocksdb option 
write_buffer_size = 33554432
2018-01-12 13:08:36.951742 7f794f513e80  0  set rocksdb option 
compression = kNoCompression
2018-01-12 13:08:36.951749 7f794f513e80  0  set rocksdb option 
write_buffer_size = 33554432

2018-01-12 13:08:36.951936 7f794f513e80  4 rocksdb: RocksDB version: 5.4.0

2018-01-12 13:08:36.951947 7f794f513e80  4 rocksdb: Git sha 
rocksdb_build_git_sha:@0@

2018-01-12 13:08:36.951951 7f794f513e80  4 rocksdb: Compile date Nov 30 2017
2018-01-12 13:08:36.951954 7f794f513e80  4 rocksdb: DB SUMMARY

2018-01-12 13:08:36.952011 7f794f513e80  4 rocksdb: CURRENT file: CURRENT

2018-01-12 13:08:36.952016 7f794f513e80  4 rocksdb: IDENTITY file:  IDENTITY

2018-01-12 13:08:36.952020 7f794f513e80  4 rocksdb: MANIFEST file:  
MANIFEST-64 size: 219 Bytes


2018-01-12 13:08:36.952023 7f794f513e80  4 rocksdb: SST files in 
/var/lib/ceph/mon/ceph-ceph001/store.db dir, Total Num: 3, files: 
48.sst 50.sst 60.sst


2018-01-12 13:08:36.952025 7f794f513e80  4 rocksdb: Write Ahead Log file 
in /var/lib/ceph/mon/ceph-ceph001/store.db: 65.log size: 0 ;


2018-01-12 13:08:36.952028 7f794f513e80  4 
rocksdb: Options.error_if_exists: 0
2018-01-12 13:08:36.952029 7f794f513e80  4 
rocksdb:   Options.create_if_missing: 0
2018-01-12 13:08:36.952031 7f794f513e80  4 
rocksdb: Options.paranoid_checks: 1
2018-01-12 13:08:36.952032 7f794f513e80  4 
rocksdb: Options.env: 0x5617a10fa040
2018-01-12 13:08:36.952033 7f794f513e80  4 
rocksdb:    Options.info_log: 0x5617a24ce1c0
2018-01-12 13:08:36.952034 7f794f513e80  4 
rocksdb:  Options.max_open_files: -1
2018-01-12 13:08:36.952035 7f794f513e80  4 rocksdb: 
Options.max_file_opening_threads: 16
2018-01-12 13:08:36.952035 7f794f513e80  4 
rocksdb:   Options.use_fsync: 0
2018-01-12 13:08:36.952037 7f794f513e80  4 
rocksdb:   Options.max_log_file_size: 0
2018-01-12 13:08:36.952038 7f794f513e80  4 rocksdb:  
Options.max_manifest_file_size: 18446744073709551615
2018-01-12 13:08:36.952039 7f794f513e80  4 rocksdb:   
Options.log_file_time_to_roll: 0
2018-01-12 13:08:36.952040 7f794f513e80  4 
rocksdb:   Options.keep_log_file_num: 1000
2018-01-12 13:08:36.952041 7f794f513e80  4 rocksdb:    
Options.recycle_log_file_num: 0
2018-01-12 13:08:36.952042 7f794f513e80  4 
rocksdb: Options.allow_fallocate: 1
2018-01-12 13:08:36.952043 7f794f513e80  4 
rocksdb:    Options.allow_mmap_reads: 0
2018-01-12 13:08:36.952044 7f794f513e80  4 
rocksdb:   Options.allow_mmap_writes: 0
2018-01-12 13:08:36.952045 7f794f513e80  4 
rocksdb:    Options.use_direct_reads: 0
2018-01-12 13:08:36.952046 7f794f513e80  4 rocksdb: 
Options.use_direct_io_for_flush_and_compaction: 0
2018-01-12 13:08:36.952047 7f794f513e80  4 rocksdb: 
Options.create_missing_column_families: 0
2018-01-12 13:08:36.952048 7f794f513e80  4 
rocksdb:  Options.db_log_dir:
2018-01-12 13:08:36.952049 7f794f513e80  4 
rocksdb: Options.wal_dir: 
/var/lib/ceph/mon/ceph-ceph001/store.db
2018-01-12 13:08:36.952050 7f794f513e80  4 rocksdb: 
Options.table_cache_numshardbits: 6
2018-01-12 13:08:36.952050 7f794f513e80  4 rocksdb:  
Options.max_subcompactions: 1
2018-01-12 13:08:36.952062 

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-12 Thread Sean Redmond
Hi David,

To follow up on this, I had a 4th drive fail (out of 12) and have opted to
order the below disks as a replacement. I have an ongoing case with Intel
via the supplier and will report back anything useful, but I am going to
avoid the Intel S4600 2TB SSDs for the moment.

1.92TB Samsung SM863a 2.5" Enterprise SSD, SATA3 6Gb/s, 2-bit MLC V-NAND

Regards
Sean Redmond

On Wed, Jan 10, 2018 at 11:08 PM, Sean Redmond 
wrote:

> Hi David,
>
> Thanks for your email, they are connected inside Dell R730XD (2.5 inch 24
> disk model) in None RAID mode via a perc RAID card.
>
> The version of ceph is Jewel with kernel 4.13.X and ubuntu 16.04.
>
> Thanks for your feedback on the HGST disks.
>
> Thanks
>
> On Wed, Jan 10, 2018 at 10:55 PM, David Herselman  wrote:
>
>> Hi Sean,
>>
>>
>>
>> No, Intel’s feedback has been… Pathetic… I have yet to receive anything
>> more than a request to ‘sign’ a non-disclosure agreement, to obtain beta
>> firmware. No official answer as to whether or not one can logically unlock
>> the drives, no answer to my question whether or not Intel publish serial
>> numbers anywhere pertaining to recalled batches and no information
>> pertaining to whether or not firmware updates would address any known
>> issues.
>>
>>
>>
>> This with us being an accredited Intel Gold partner…
>>
>>
>>
>>
>>
>> We’ve returned the lot and ended up with 9/12 of the drives failing in
>> the same manner. The replaced drives, which had different serial number
>> ranges, also failed. Very frustrating is that the drives fail in a way that
>> result in unbootable servers, unless one adds ‘rootdelay=240’ to the kernel.
>>
>>
>>
>>
>>
>> I would be interested to know what platform your drives were in and
>> whether or not they were connected to a RAID module/card.
>>
>>
>>
>> PS: After much searching we’ve decided to order the NVMe conversion kit
>> and have ordered HGST UltraStar SN200 2.5 inch SFF drives with a 3 DWPD
>> rating.
>>
>>
>>
>>
>>
>> Regards
>>
>> David Herselman
>>
>>
>>
>> *From:* Sean Redmond [mailto:sean.redmo...@gmail.com]
>> *Sent:* Thursday, 11 January 2018 12:45 AM
>> *To:* David Herselman 
>> *Cc:* Christian Balzer ; ceph-users@lists.ceph.com
>>
>> *Subject:* Re: [ceph-users] Many concurrent drive failures - How do I
>> activate pgs?
>>
>>
>>
>> Hi,
>>
>>
>>
>> I have a case where 3 out of 12 of these Intel S4600 2TB models failed
>> within a matter of days after being burn-in tested then placed into
>> production.
>>
>>
>>
>> I am interested to know, did you ever get any further feedback from the
>> vendor on your issue?
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Thu, Dec 21, 2017 at 1:38 PM, David Herselman  wrote:
>>
>> Hi,
>>
>> I assume this can only be a physical manufacturing flaw or a firmware
>> bug? Do Intel publish advisories on recalled equipment? Should others be
>> concerned about using Intel DC S4600 SSD drives? Could this be an
>> electrical issue on the Hot Swap Backplane or BMC firmware issue? Either
>> way, all pure Intel...
>>
>> The hole is only 1.3 GB (4 MB x 339 objects) but perfectly striped
>> through images, so file systems are subsequently severely damaged.
>>
>> Is it possible to get Ceph to read in partial data shards? It would
>> provide between 25-75% more yield...
>>
>>
>> Is there anything wrong with how we've proceeded thus far? Would be nice
>> to reference examples of using ceph-objectstore-tool but documentation is
>> virtually non-existent.
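
[For reference, since worked examples are hard to find: the basic
ceph-objectstore-tool export/import pattern looks like the following. Paths and
the PG id are placeholders, the OSDs must be stopped while running it, and
filestore OSDs additionally need --journal-path.]

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-24 --op list-pgs
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-24 --pgid 5.7f --op export --file /tmp/5.7f.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-25 --pgid 5.7f --op import --file /tmp/5.7f.export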
>>
>> We used another SSD drive to simulate bringing all the SSDs back online.
>> We carved up the drive to provide equal partitions to essentially simulate
>> the original SSDs:
>>   # Partition a drive to provide 12 x 150GB partitions, eg:
>> sdd   8:48   0   1.8T  0 disk
>> |-sdd18:49   0   140G  0 part
>> |-sdd28:50   0   140G  0 part
>> |-sdd38:51   0   140G  0 part
>> |-sdd48:52   0   140G  0 part
>> |-sdd58:53   0   140G  0 part
>> |-sdd68:54   0   140G  0 part
>> |-sdd78:55   0   140G  0 part
>> |-sdd88:56   0   140G  0 part
>> |-sdd98:57   0   140G  0 part
>> |-sdd10   8:58   0   140G  0 part
>> |-sdd11   8:59   0   140G  0 part
>> +-sdd12   8:60   0   140G  0 part
>>
>>
>>   Pre-requisites:
>> ceph osd set noout;
>> apt-get install uuid-runtime;
>>
>>
>>   for ID in `seq 24 35`; do
>> UUID=`uuidgen`;
>> OSD_SECRET=`ceph-authtool --gen-print-key`;
>> DEVICE='/dev/sdd'$[$ID-23]; # 24-23 = /dev/sdd1, 35-23 = /dev/sdd12
>> echo "{\"cephx_secret\": \"$OSD_SECRET\"}" | ceph osd new $UUID $ID
>> -i - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring;
>> mkdir /var/lib/ceph/osd/ceph-$ID;
>> mkfs.xfs $DEVICE;
>> mount $DEVICE /var/lib/ceph/osd/ceph-$ID;
>> ceph-authtool --create-keyring /var/lib/ceph/osd/ceph-$ID/keyring
>> --name osd.$ID --add-key $OSD_SECRET;
>> ceph-osd -i $ID --mkfs 

Re: [ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-12 Thread ceph
Well, if a stranger has access to my whole Ceph data (that is, all my VMs
& rgw data), I don't mind if he gets root access too :)
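
That said, for anyone who does want to weigh it up, the mitigation state can be
checked, and KPTI disabled at boot, roughly like this (sysfs paths and boot
flags as provided by early-2018 kernels, so treat the exact names as
assumptions):

grep . /sys/devices/system/cpu/vulnerabilities/*   # per-CPU-bug mitigation status
dmesg | grep -i isolation                          # shows whether page table isolation (KPTI) is active
# to boot without the Meltdown mitigation, add 'nopti' (or 'pti=off') to the kernel command line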


On 01/12/2018 10:18 AM, Van Leeuwen, Robert wrote:

Ceph runs on dedicated hardware, there is nothing there except Ceph,
and the ceph daemons already have full power over Ceph's data.
And there is no arbitrary code execution allowed on this node.

Thus, Spectre & Meltdown are meaningless for a Ceph node, and
mitigations should be disabled.

Is this wrong?


In principle, I would say yes:
This means that if someone gets even half a foot in the door for whatever
reason, you will have to assume they will be able to escalate to root.
Looking at Meltdown and Spectre is already a good indication of creativity in
gaining (more) access.
So I would not assume people are unable to ever gain access to your network, or
that the ceph/ssh/etc. daemons have no bugs to exploit.

I would phrase it more as:
Is the performance decrease big enough that you are willing to risk running a
less secure server?

The answer to that depends on a lot of things, like:
Performance impact of the patch
Costs of extra hardware to mitigate the performance impact
Impact of a possible breach (e.g. GDPR fines or reputation damage can be
extremely expensive)
Who/what is allowed on your network
How likely you are to be a hacker target
How well you will sleep knowing there is a potential hole in security :)
Etc.

Cheers,
Robert van Leeuwen



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Linux Meltdown (KPTI) fix and how it affects performance?

2018-01-12 Thread Van Leeuwen, Robert
> Ceph runs on dedicated hardware, there is nothing there except Ceph,
>and the ceph daemons already have full power over Ceph's data.
>And there is no arbitrary code execution allowed on this node.
>
>Thus, Spectre & Meltdown are meaningless for a Ceph node, and
>mitigations should be disabled.
>
>Is this wrong?

In principle, I would say yes:
This means that if someone gets even half a foot in the door for whatever
reason, you will have to assume they will be able to escalate to root.
Looking at Meltdown and Spectre is already a good indication of creativity in
gaining (more) access.
So I would not assume people are unable to ever gain access to your network, or
that the ceph/ssh/etc. daemons have no bugs to exploit.

I would phrase it more as:
Is the performance decrease big enough that you are willing to risk running a
less secure server?

The answer to that depends on a lot of things, like:
Performance impact of the patch
Costs of extra hardware to mitigate the performance impact
Impact of a possible breach (e.g. GDPR fines or reputation damage can be
extremely expensive)
Who/what is allowed on your network
How likely you are to be a hacker target
How well you will sleep knowing there is a potential hole in security :)
Etc.

Cheers,
Robert van Leeuwen


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] issue adding OSDs

2018-01-12 Thread Luis Periquito
"ceph versions" returned all daemons as running 12.2.1.

On Fri, Jan 12, 2018 at 8:00 AM, Janne Johansson  wrote:
> Running "ceph mon versions" and "ceph osd versions" and so on as you do the
> upgrades would have helped I guess.
>
>
> 2018-01-11 17:28 GMT+01:00 Luis Periquito :
>>
>> this was a bit weird, but is now working... Writing for future
>> reference if someone faces the same issue.
>>
>> this cluster was upgraded from jewel to luminous following the
>> recommended process. When it was finished I just set the require_osd
>> to luminous. However I hadn't restarted the daemons since. So just
>> restarting all the OSDs made the problem go away.
>>
>> How to check if that was the case? The OSDs now have a "class" associated.
>>
>>
>>
>> On Wed, Jan 10, 2018 at 7:16 PM, Luis Periquito 
>> wrote:
>> > Hi,
>> >
>> > I'm running a cluster with 12.2.1 and adding more OSDs to it.
>> > Everything is running version 12.2.1 and require_osd is set to
>> > luminous.
>> >
>> > one of the pools is replicated with size 2 min_size 1, and is
>> > seemingly blocking IO while recovering. I have no slow requests,
>> > looking at the output of "ceph osd perf" it seems brilliant (all
>> > numbers are lower than 10).
>> >
>> > clients are RBD (OpenStack VM in KVM) and using (mostly) 10.2.7. I've
>> > tagged those OSDs as out and the RBD just came back to life. I did
>> > have some objects degraded:
>> >
>> > 2018-01-10 18:23:52.081957 mon.mon0 mon.0 x.x.x.x:6789/0 410414 :
>> > cluster [WRN] Health check update: 9926354/49526500 objects misplaced
>> > (20.043%) (OBJECT_MISPLACED)
>> > 2018-01-10 18:23:52.081969 mon.mon0 mon.0 x.x.x.x:6789/0 410415 :
>> > cluster [WRN] Health check update: Degraded data redundancy:
>> > 5027/49526500 objects degraded (0.010%), 1761 pgs unclean, 27 pgs
>> > degraded (PG_DEGRADED)
>> >
>> > any thoughts as to what might be happening? I've run such operations
>> > many a times...
>> >
>> > thanks for all help, as I'm grasping at straws trying to figure out what's
>> > happening...
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rocksdb Segmentation fault during compaction (on OSD)

2018-01-12 Thread Stefan Kooman
Hi,

While trying to get an OSD back in the test cluster, which had been
dropped out for unknown reason, we see a RocksDB Segmentation fault
during "compaction". I increased debugging to 20/20 for OSD / RocksDB,
see part of the logfile below:

... 49477, 49476, 49475, 49474, 49473, 49472, 49471, 49470, 49469, 49468,
49467], "files_L1": [49465], "score": 1138.25, "input_data_size": 82872298}
-1> 2018-01-12 08:48:23.915753 7f91eaf89e40  1 freelist init
 0> 2018-01-12 08:48:45.630418 7f91eaf89e40 -1 *** Caught signal 
(Segmentation fault) **
 in thread 7f91eaf89e40 thread_name:ceph-osd

 ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous 
(stable)
 1: (()+0xa65824) [0x55a124693824]
 2: (()+0x11390) [0x7f91e9238390]
 3: (()+0x1f8af) [0x7f91eab658af]
 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, 
rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions 
const&, rocksdb::ImmutableCFOptions const&, 
rocksdb::BlockBasedTable::CachableEntry*, rocksdb::Block*, 
unsigned int, rocksdb::Slice const&, unsigned long, bool, 
rocksdb::Cache::Priority)+0x1d9) [0x55a124a64e49]
 5: 
(rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*,
 rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, 
rocksdb::BlockBasedTable::CachableEntry*, bool)+0x3b7) 
[0x55a124a66827]
 6: 
(rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, 
rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, 
bool, rocksdb::Status)+0x2ac) [0x55a124a66b6c]
 7: 
(rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice
 const&)+0x97) [0x55a124a6f2e7]
 8: (()+0xe6c48e) [0x55a124a9a48e]
 9: (()+0xe6ca06) [0x55a124a9aa06]
 10: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0x126) 
[0x55a124a7bc86]
 11: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x20a) [0x55a124b1bdaa]
 12: 
(RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::__cxx11::basic_string const&, 
std::__cxx11::basic_string 
const&)+0x46) [0x55a1245d4676]
 13: (BitmapFreelistManager::init(unsigned long)+0x2dc) [0x55a12463976c]
 14: (BlueStore::_open_fm(bool)+0xc00) [0x55a124526c50]
 15: (BlueStore::_mount(bool)+0x3dc) [0x55a12459aa1c]
 16: (OSD::init()+0x3e2) [0x55a1241064e2]
 17: (main()+0x2f07) [0x55a1240181d7]
 18: (__libc_start_main()+0xf0) [0x7f91e81be830]
 19: (_start()+0x29) [0x55a1240a37f9]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

The disk in question is very old (powered on ~ 8 years), so it might be that
part of the data is corrupt. Would RocksDB throw an error like this in
that case?
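
A couple of checks that can help rule out bad media in a case like this (the
OSD path is a placeholder, and the OSD must be stopped for the fsck):

smartctl -a /dev/sdX                                       # SMART error log and reallocated-sector counters
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-13  # bluestore/bluefs consistency check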

Gr. Stefan

P.s. We're trying to learn as much as possible when things do not go according
to plan. There is way more debug info available in case anyone is interested. 



-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] issue adding OSDs

2018-01-12 Thread Janne Johansson
Running "ceph mon versions" and "ceph osd versions" and so on as you do the
upgrades would have helped I guess.


2018-01-11 17:28 GMT+01:00 Luis Periquito :

> this was a bit weird, but is now working... Writing for future
> reference if someone faces the same issue.
>
> this cluster was upgraded from jewel to luminous following the
> recommended process. When it was finished I just set the require_osd
> to luminous. However I hadn't restarted the daemons since. So just
> restarting all the OSDs made the problem go away.
>
> How to check if that was the case? The OSDs now have a "class" associated.
>
>
>
> On Wed, Jan 10, 2018 at 7:16 PM, Luis Periquito 
> wrote:
> > Hi,
> >
> > I'm running a cluster with 12.2.1 and adding more OSDs to it.
> > Everything is running version 12.2.1 and require_osd is set to
> > luminous.
> >
> > one of the pools is replicated with size 2 min_size 1, and is
> > seemingly blocking IO while recovering. I have no slow requests,
> > looking at the output of "ceph osd perf" it seems brilliant (all
> > numbers are lower than 10).
> >
> > clients are RBD (OpenStack VM in KVM) and using (mostly) 10.2.7. I've
> > tagged those OSDs as out and the RBD just came back to life. I did
> > have some objects degraded:
> >
> > 2018-01-10 18:23:52.081957 mon.mon0 mon.0 x.x.x.x:6789/0 410414 :
> > cluster [WRN] Health check update: 9926354/49526500 objects misplaced
> > (20.043%) (OBJECT_MISPLACED)
> > 2018-01-10 18:23:52.081969 mon.mon0 mon.0 x.x.x.x:6789/0 410415 :
> > cluster [WRN] Health check update: Degraded data redundancy:
> > 5027/49526500 objects degraded (0.010%), 1761 pgs unclean, 27 pgs
> > degraded (PG_DEGRADED)
> >
> > any thoughts as to what might be happening? I've run such operations
> > many a times...
> >
> > thanks for all help, as I'm grasping at straws trying to figure out what's happening...
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com