Re: [ceph-users] Ceph release cadence

2017-09-09 Thread Sasha Litvak
As a user, I would like to add that I would like to see real 2-year support
for LTS releases.  Hammer releases were sketchy at best in 2017.  When
luminous was released, the outstanding bugs were auto-closed: goodbye and
good riddance.

Also, the decision to drop support for certain OSes created a barrier to
upgrading, and looking at the jewel and luminous upgrade path, where you
cannot easily go back once the upgrade is completed, doesn't add confidence.

So making upgrades less radical may help production support be more
consistent and the update process less dangerous.

I would say 9 months is a good reference point, but for me it is ready when
it is really ready and tested.

Keeping development releases may be better for devs and early adopters.  I
don't believe production admins would go for intermediate ones as they are
being released now.

This is only my humble opinion, and I may be wrong.

On Sep 9, 2017 15:32, "Christian Theune"  wrote:

> Hi,
>
> have been using Ceph for multiple years now. It’s unclear to me which of
> your options fits best, but here are my preferences:
>
> * Updates are risky in a way that we tend to rather not do them every
> year. Also, having seen jewel, we’ve done well to avoid two
>   major issues that would have bitten us and will upgrade from hammer in
> the next month or so.
>
> * Non-production releases are of not much value to me, as I have to keep
> our dev/staging/prod clusters in sync to work on our stuff.
>   As you can never downgrade, there’s no value in it for me to try
> non-production releases (without frying dev for everyone).
>
> * I’d prefer stability over new features. *Specifically* that new features
> can be properly recombined with existing features (and each
>   other) without leading to surprises. (E.g. cache tiering breaking with
> snapshots and then no way going back and a general notion of
>   “that combination wasn’t really well tested”).
>
> * I’d prefer the versions I run to be maintained for production-critical
> issues for maybe 2 years, so I can have some time after a new
>   production release during which my current release still receives
> important bug fixes until I switch.
>
> Maybe this is close to what your "Drop the odd releases, and aim for a ~9
> month cadence.” option would mean. Waiting for a feature for a year is a pain, but
> my personal goal for Ceph is that it first has to work properly, meaning:
> not lose your data, not "stopping the show”, and not drawing you into a
> corner you can’t get out of.
>
> That’s my perspective as a user. As a fellow developer I feel your pain
> about wanting to release faster and reducing maintenance load, so thanks
> for asking!
>
> Hope this helps,
> Christian
>
> > On Sep 6, 2017, at 5:23 PM, Sage Weil  wrote:
> >
> > Hi everyone,
> >
> > Traditionally, we have done a major named "stable" release twice a year,
> > and every other such release has been an "LTS" release, with fixes
> > backported for 1-2 years.
> >
> > With kraken and luminous we missed our schedule by a lot: instead of
> > releasing in October and April we released in January and August.
> >
> > A few observations:
> >
> > - Not a lot of people seem to run the "odd" releases (e.g., infernalis,
> > kraken).  This limits the value of actually making them.  It also means
> > that those who *do* run them are running riskier code (fewer users ->
> more
> > bugs).
> >
> > - The more recent requirement that upgrading clusters must make a stop at
> > each LTS (e.g., hammer -> luminous not supported, must go hammer -> jewel
> > -> luminous) has been hugely helpful on the development side by reducing
> > the amount of cross-version compatibility code to maintain and reducing
> > the number of upgrade combinations to test.
> >
> > - When we try to do a time-based "train" release cadence, there always
> > seems to be some "must-have" thing that delays the release a bit.  This
> > doesn't happen as much with the odd releases, but it definitely happens
> > with the LTS releases.  When the next LTS is a year away, it is hard to
> > suck it up and wait that long.
> >
> > A couple of options:
> >
> > * Keep even/odd pattern, and continue being flexible with release dates
> >
> >  + flexible
> >  - unpredictable
> >  - odd releases of dubious value
> >
> > * Keep even/odd pattern, but force a 'train' model with a more regular
> > cadence
> >
> >  + predictable schedule
> >  - some features will miss the target and be delayed a year
> >
> > * Drop the odd releases but change nothing else (i.e., 12-month release
> > cadence)
> >
> >  + eliminate the confusing odd releases with dubious value
> >
> > * Drop the odd releases, and aim for a ~9 month cadence. This splits the
> > difference with the current even/odd pattern we've been doing.
> >
> >  + eliminate the confusing odd releases with dubious value
> >  + waiting for the next release isn't quite as bad
> >  - required upgrades every 9 months instead of every 12 months
> >
> > * Drop 

Re: [ceph-users] Is the StupidAllocator supported in Luminous?

2017-09-09 Thread Eric Eastman
Opened: http://tracker.ceph.com/issues/21332

On Sat, Sep 9, 2017 at 10:03 PM, Gregory Farnum  wrote:

> Yes. Please open a ticket!
>
>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is the StupidAllocator supported in Luminous?

2017-09-09 Thread Gregory Farnum
Yes. Please open a ticket!

On Sat, Sep 9, 2017 at 11:16 AM Eric Eastman 
wrote:

> I am seeing OOM issues with some of my OSD nodes that I am testing with
> Bluestore on 12.2.0, so I decided to try the StupidAllocator to see if it
> has a smaller memory footprint, by setting the following in my ceph.conf:
>
> bluefs_allocator = stupid
> bluestore_cache_size_hdd = 1073741824
> bluestore_cache_size_ssd = 1073741824
>
> With these settings I am no longer seeing OOM errors, but on the node with
> these settings, overnight I have seen multiple Aborted messages in my log
> files:
>
> grep Abort *log
> ceph-osd.10.log:2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught
> signal (Aborted) **
> ceph-osd.10.log: 0> 2017-09-09 12:39:28.573034 7f2816f45700 -1 ***
> Caught signal (Aborted) **
> ceph-osd.11.log:2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught
> signal (Aborted) **
> ceph-osd.11.log: 0> 2017-09-09 11:39:16.835793 7fdcf6b08700 -1 ***
> Caught signal (Aborted) **
> ceph-osd.3.log:2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught
> signal (Aborted) **
> ceph-osd.3.log:2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught
> signal (Aborted) **
> ceph-osd.3.log: 0> 2017-09-09 07:49:56.256899 7f89edf90700 -1 ***
> Caught signal (Aborted) **
> ceph-osd.3.log:2017-09-09 08:13:16.919887 7f82f315e700 -1 *** Caught
> signal (Aborted) **
> ceph-osd.7.log:2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught
> signal (Aborted) **
> ceph-osd.7.log: 0> 2017-09-09 09:19:17.281950 7f77824cf700 -1 ***
> Caught signal (Aborted) **
>
> Before I open a ticket, I just want to know if the StupidAllocator is
> supported in Luminous.
>
> A couple of examples of the Aborts are:
>
> 2017-09-09 12:39:27.044074 7f27f5f20700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1504975167035909, "job": 86, "event": "flush_started",
> "num_memtables": 1, "num_entries": 1015543, "num_deletes": 345553,
> "memory_usage": 260049176}
> 2017-09-09 12:39:27.044088 7f27f5f20700  4 rocksdb:
> [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 86]
> Level-0 flush table #1825: started
> 2017-09-09 12:39:28.234651 7f27fff34700 -1 osd.10 pg_epoch: 3521 pg[1.3c7(
> v 3521'372186 (3456'369135,3521'372186] local-lis/les=3488/3490 n=2842
> ec=578/66 lis/c 3488/3488 les/c/f 3490/3500/0 3488/3488/3477) [10,8,16] r=0
> lpr=3488 crt=3521'372186 lcod 3521'372184 mlcod 3521'372184
> active+clean+snaptrim snaptrimq=[111~2,115~2,13a~1,13c~3]] removing snap
> head
> 2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
>  in thread 7f2816f45700 thread_name:msgr-worker-2
>
>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
> (rc)
>  1: (()+0xa562f4) [0x5634e14882f4]
>  2: (()+0x11390) [0x7f281b2c5390]
>  3: (gsignal()+0x38) [0x7f281a261428]
>  4: (abort()+0x16a) [0x7f281a26302a]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f281aba384d]
>  6: (()+0x8d6b6) [0x7f281aba16b6]
>  7: (()+0x8d701) [0x7f281aba1701]
>  8: (()+0xb8d38) [0x7f281abccd38]
>  9: (()+0x76ba) [0x7f281b2bb6ba]
>  10: (clone()+0x6d) [0x7f281a33282d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
> --- begin dump of recent events ---
> -1> 2017-09-09 12:39:05.878006 7f2817746700  1 --
> 172.16.2.133:6804/1327479 <== osd.2 172.16.2.131:6800/1710 37506 
> osd_repop(mds.0.19:101159707 1.2f1 e3521/3477) v2  998+0+46 (52256346 0
> 1629833233) 0x56359eb29000 con 0x563510c02000
>  -> 2017-09-09 12:39:05.878065 7f2816f45700  1 --
> 10.15.2.133:6805/327479 <== mds.0 10.15.2.123:6800/2942775562 55580 
> osd_op(mds.0.19:101159714 1.ec 1.ffad68ec (undecoded)
> ondisk+write+known_if_redirected+full_force e3521) v8  305+0+366
> (2883828331 0 2609552142) 0x56355d9eb0c0 con 0x56355f455000
>
>
> Second example:
> 2017-09-09 07:10:58.135527 7fa2d56a0700  4 rocksdb:
> [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:264] [default] [JOB 10]
> Flushing memtable with next log file: 2773
>
> 2017-09-09 07:10:58.262058 7fa2d56a0700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1504955458135538, "job": 10, "event": "flush_started",
> "num_memtables": 1, "num_entries": 935059, "num_deletes": 175946,
> "memory_usage": 260049888}
> 2017-09-09 07:10:58.262077 7fa2d56a0700  4 rocksdb:
> [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 10]
> Level-0 flush table #2774: started
> 2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
>  in thread 7fa2e96c8700 thread_name:bstore_kv_sync
>
>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
> (rc)
>  1: (()+0xa562f4) [0x5579585362f4]
>  2: (()+0x11390) [0x7fa2faa45390]
>  3: (gsignal()+0x38) [0x7fa2f99e1428]
>  4: (abort()+0x16a) [0x7fa2f99e302a]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fa2fa32384d]
>  6: (()+0x8d6b6) [0x7fa2fa3216b6]
>  7: (()+0x8d701) [0x7fa2fa321701]
>  8: (()+0x8d919) [0x7fa2fa321919]
>  9: (()+0x1230f) 

[ceph-users] [Luminous] rgw not deleting object

2017-09-09 Thread Jack
Hi,

I face a wild issue: I cannot remove an object from rgw (via s3 API)

My steps:
s3cmd ls s3://bucket/object -> it exists
s3cmd rm s3://bucket/object -> success
s3cmd ls s3://bucket/object -> it still exists

At this point, I can curl and get the object (thus, it does exist)

Doing the same via boto leads to the same behavior

Log sample:
2017-09-10 01:18:42.502486 7fd189e7d700  1 == starting new request
req=0x7fd189e77300 =
2017-09-10 01:18:42.504028 7fd189e7d700  1 == req done
req=0x7fd189e77300 op status=-2 http_status=204 ==
2017-09-10 01:18:42.504076 7fd189e7d700  1 civetweb: 0x560ebc275000:
10.42.43.6 - - [10/Sep/2017:01:18:38 +0200] "DELETE /bucket/object
HTTP/1.1" 1 0 - Boto/2.44.0 Python/3.5.4 Linux/4.12.0-1-amd64

What can I do?
What data shall I provide to debug this issue?
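
In the meantime, here is what I can already collect on my side (a sketch; the
admin socket path below is a guess for my setup, adjust as needed):

# look at the object and the bucket index from the rgw side
radosgw-admin object stat --bucket=bucket --object=object
radosgw-admin bucket check --bucket=bucket --check-objects

# bump logging on the running radosgw, then retry the DELETE
ceph daemon /var/run/ceph/ceph-client.rgw.$(hostname).asok config set debug_rgw 20
ceph daemon /var/run/ceph/ceph-client.rgw.$(hostname).asok config set debug_ms 1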

Regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph release cadence

2017-09-09 Thread Christian Theune
Hi,

have been using Ceph for multiple years now. It’s unclear to me which of your 
options fits best, but here are my preferences:

* Updates are risky in a way that we tend to rather not do them every year. 
Also, having seen jewel, we’ve done well to avoid two
  major issues that would have bitten us and will upgrade from hammer in the 
next month or so.

* Non-production releases are of not much value to me, as I have to keep our 
dev/staging/prod clusters in sync to work on our stuff.
  As you can never downgrade, there’s no value in it for me to try 
non-production releases (without frying dev for everyone).

* I’d prefer stability over new features. *Specifically* that new features can 
be properly recombined with existing features (and each
  other) without leading to surprises. (E.g. cache tiering breaking with 
snapshots and then no way going back and a general notion of
  “that combination wasn’t really well tested”).

* I’d prefer the versions I run to be maintained for production-critical 
issues for maybe 2 years, so I can have some time after a new
  production release during which my current release still receives 
important bug fixes until I switch.

Maybe this is close to what your "Drop the odd releases, and aim for a ~9 month 
cadence.” option would mean. Waiting for a feature for a year is a pain, but my 
personal goal for Ceph is that it first has to work properly, meaning: not 
lose your data, not "stopping the show”, and not drawing you into a corner you 
can’t get out of.

That’s my perspective as a user. As a fellow developer I feel your pain about 
wanting to release faster and reducing maintenance load, so thanks for asking!

Hope this helps,
Christian

> On Sep 6, 2017, at 5:23 PM, Sage Weil  wrote:
> 
> Hi everyone,
> 
> Traditionally, we have done a major named "stable" release twice a year,
> and every other such release has been an "LTS" release, with fixes
> backported for 1-2 years.
> 
> With kraken and luminous we missed our schedule by a lot: instead of
> releasing in October and April we released in January and August.
> 
> A few observations:
> 
> - Not a lot of people seem to run the "odd" releases (e.g., infernalis,
> kraken).  This limits the value of actually making them.  It also means
> that those who *do* run them are running riskier code (fewer users -> more
> bugs).
> 
> - The more recent requirement that upgrading clusters must make a stop at
> each LTS (e.g., hammer -> luminous not supported, must go hammer -> jewel
> -> luminous) has been hugely helpful on the development side by reducing
> the amount of cross-version compatibility code to maintain and reducing
> the number of upgrade combinations to test.
> 
> - When we try to do a time-based "train" release cadence, there always
> seems to be some "must-have" thing that delays the release a bit.  This
> doesn't happen as much with the odd releases, but it definitely happens
> with the LTS releases.  When the next LTS is a year away, it is hard to
> suck it up and wait that long.
> 
> A couple of options:
> 
> * Keep even/odd pattern, and continue being flexible with release dates
> 
>  + flexible
>  - unpredictable
>  - odd releases of dubious value
> 
> * Keep even/odd pattern, but force a 'train' model with a more regular
> cadence
> 
>  + predictable schedule
>  - some features will miss the target and be delayed a year
> 
> * Drop the odd releases but change nothing else (i.e., 12-month release
> cadence)
> 
>  + eliminate the confusing odd releases with dubious value
> 
> * Drop the odd releases, and aim for a ~9 month cadence. This splits the
> difference with the current even/odd pattern we've been doing.
> 
>  + eliminate the confusing odd releases with dubious value
>  + waiting for the next release isn't quite as bad
>  - required upgrades every 9 months instead of every 12 months
> 
> * Drop the odd releases, but relax the "must upgrade through every LTS" to
> allow upgrades across 2 versions (e.g., luminous -> mimic or luminous ->
> nautilus).  Shorten release cycle (~6-9 months).
> 
>  + more flexibility for users
>  + downstreams have greater choice in adopting an upstream release
>  - more LTS branches to maintain
>  - more upgrade paths to consider
> 
> Other options we should consider?  Other thoughts?
> 
> Thanks!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Kind regards,
Christian Theune

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] PCIe journal benefit for SSD OSDs

2017-09-09 Thread Stefan Priebe - Profihost AG
Hi Alexandre,

Am 07.09.2017 um 19:31 schrieb Alexandre DERUMIER:
> Hi Stefan
> 
>>> Have you already done tests on how the performance changes with bluestore 
>>> while putting all 3 block devices on the same ssd?
> 
> 
> I'm going to test bluestore with 3 nodes, 18 x Intel S3610 1.6TB, in the 
> coming weeks.
> 
> I'll send the results to the mailing list.

Thanks!

Greets,
Stefan
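
P.S.: For the comparison, these are the two ceph-disk layouts I have in mind
(just a sketch - device names are examples and I haven't run either yet):

# all three parts (block, db, wal) colocated on one SSD
ceph-disk prepare --bluestore /dev/sdb

# block on the SSD, db and wal split out to a shared NVMe device
ceph-disk prepare --bluestore /dev/sdb --block.db /dev/nvme0n1 --block.wal /dev/nvme0n1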

> - Original Message -
> From: "Stefan Priebe, Profihost AG" 
> To: "Christian Balzer" , "ceph-users" 
> Sent: Thursday, September 7, 2017 08:03:31
> Subject: Re: [ceph-users] PCIe journal benefit for SSD OSDs
> 
> Hello, 
> Am 07.09.2017 um 03:53 schrieb Christian Balzer: 
>>
>> Hello, 
>>
>> On Wed, 6 Sep 2017 09:09:54 -0400 Alex Gorbachev wrote: 
>>
>>> We are planning a Jewel filestore based cluster for a performance 
>>> sensitive healthcare client, and the conservative OSD choice is 
>>> Samsung SM863A. 
>>>
>>
>> While I totally see where you're coming from, and I have stated that 
>> I'll give Luminous and Bluestore some time to mature, I'd also be looking 
>> into that if I were in the planning phase now, with like 3 months 
>> before deployment. 
>> The inherent performance increase with Bluestore (and having something 
>> that hopefully won't need touching/upgrading for a while) shouldn't be 
>> ignored. 
> 
> Yes and that's the point where i'm currently as well. Thinking about how 
> to design a new cluster based on bluestore. 
> 
>> The SSDs are fine, I've been starting to use those recently (though not 
>> with Ceph yet) as Intel DC S36xx or 37xx are impossible to get. 
>> They're a bit slower in the write IOPS department, but good enough for me. 
> 
> I've never used the Intel DC ones, only the Samsungs. Are the Intels 
> really faster? Have you disabled the FLUSH command for the Samsung ones? 
> They don't skip the command automatically like the Intels do. Sadly the 
> Samsung SM863 got more expensive over the last months. They were a lot 
> cheaper in the first months of 2016. Maybe the 2.5" Optane Intel SSDs 
> will change the game. 
> 
>>> but was wondering if anyone has seen a positive 
>>> impact from also using PCIe journals (e.g. Intel P3700 or even the 
>>> older 910 series) in front of such SSDs? 
>>>
>> NVMe journals (or WAL and DB space for Bluestore) are nice and can 
>> certainly help, especially if Ceph is tuned accordingly. 
>> Avoid non DC NVMes, I doubt you can still get 910s, they are officially 
>> EOL. 
>> You want to match capabilities and endurances, a DC P3700 800GB would be 
>> an OK match for 3-4 SM863a 960GB for example. 
> 
> That's a good point but makes the cluster more expensive. Currently, 
> while using filestore, I use one SSD for journal and data, which works fine. 
> 
> With bluestore we have block, db and wal, so we need 3 block devices per 
> OSD. If we need one PCIe or NVMe device per 3-4 OSDs it gets much 
> more expensive per host - currently running 10 OSDs / SSDs per node. 
> 
> Have you already done tests on how the performance changes with bluestore 
> while putting all 3 block devices on the same ssd? 
> 
> Greets, 
> Stefan 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is the StupidAllocator supported in Luminous?

2017-09-09 Thread Eric Eastman
I am seeing OOM issues with some of my OSD nodes that I am testing with
Bluestore on 12.2.0, so I decided to try the StupidAllocator to see if it
has a smaller memory footprint, by setting the following in my ceph.conf:

bluefs_allocator = stupid
bluestore_cache_size_hdd = 1073741824
bluestore_cache_size_ssd = 1073741824

With these settings I am no longer seeing OOM errors, but on the node with
these settings, overnight I have seen multiple Aborted messages in my log
files:

grep Abort *log
ceph-osd.10.log:2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught
signal (Aborted) **
ceph-osd.10.log: 0> 2017-09-09 12:39:28.573034 7f2816f45700 -1 ***
Caught signal (Aborted) **
ceph-osd.11.log:2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught
signal (Aborted) **
ceph-osd.11.log: 0> 2017-09-09 11:39:16.835793 7fdcf6b08700 -1 ***
Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal
(Aborted) **
ceph-osd.3.log:2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught signal
(Aborted) **
ceph-osd.3.log: 0> 2017-09-09 07:49:56.256899 7f89edf90700 -1 ***
Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 08:13:16.919887 7f82f315e700 -1 *** Caught signal
(Aborted) **
ceph-osd.7.log:2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught signal
(Aborted) **
ceph-osd.7.log: 0> 2017-09-09 09:19:17.281950 7f77824cf700 -1 ***
Caught signal (Aborted) **

Before I open a ticket, I just want to know if the StupidAllocator is
supported in Luminous.
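
To double-check that the setting actually took effect, I queried the running
value over the admin socket (a quick sketch, assuming osd.10 runs on the
local node):

ceph daemon osd.10 config get bluefs_allocator
# returns something like: { "bluefs_allocator": "stupid" }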

A couple of examples of the Aborts are:

2017-09-09 12:39:27.044074 7f27f5f20700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1504975167035909, "job": 86, "event": "flush_started",
"num_memtables": 1, "num_entries": 1015543, "num_deletes": 345553,
"memory_usage": 260049176}
2017-09-09 12:39:27.044088 7f27f5f20700  4 rocksdb:
[/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293]
[default] [JOB 86] Level-0 flush table #1825: started
2017-09-09 12:39:28.234651 7f27fff34700 -1 osd.10 pg_epoch: 3521 pg[1.3c7(
v 3521'372186 (3456'369135,3521'372186] local-lis/les=3488/3490 n=2842
ec=578/66 lis/c 3488/3488 les/c/f 3490/3500/0 3488/3488/3477) [10,8,16] r=0
lpr=3488 crt=3521'372186 lcod 3521'372184 mlcod 3521'372184
active+clean+snaptrim snaptrimq=[111~2,115~2,13a~1,13c~3]] removing snap
head
2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
 in thread 7f2816f45700 thread_name:msgr-worker-2

 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
(rc)
 1: (()+0xa562f4) [0x5634e14882f4]
 2: (()+0x11390) [0x7f281b2c5390]
 3: (gsignal()+0x38) [0x7f281a261428]
 4: (abort()+0x16a) [0x7f281a26302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f281aba384d]
 6: (()+0x8d6b6) [0x7f281aba16b6]
 7: (()+0x8d701) [0x7f281aba1701]
 8: (()+0xb8d38) [0x7f281abccd38]
 9: (()+0x76ba) [0x7f281b2bb6ba]
 10: (clone()+0x6d) [0x7f281a33282d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- begin dump of recent events ---
-1> 2017-09-09 12:39:05.878006 7f2817746700  1 --
172.16.2.133:6804/1327479 <== osd.2 172.16.2.131:6800/1710 37506 
osd_repop(mds.0.19:101159707 1.2f1 e3521/3477) v2  998+0+46 (52256346 0
1629833233) 0x56359eb29000 con 0x563510c02000
 -> 2017-09-09 12:39:05.878065 7f2816f45700  1 --
10.15.2.133:6805/327479 <== mds.0 10.15.2.123:6800/2942775562 55580 
osd_op(mds.0.19:101159714 1.ec 1.ffad68ec (undecoded)
ondisk+write+known_if_redirected+full_force
e3521) v8  305+0+366 (2883828331 0 2609552142) 0x56355d9eb0c0 con
0x56355f455000


Second example:
2017-09-09 07:10:58.135527 7fa2d56a0700  4 rocksdb:
[/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:264]
[default] [JOB 10] Flushing memtable with next log file: 2773

2017-09-09 07:10:58.262058 7fa2d56a0700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1504955458135538, "job": 10, "event": "flush_started",
"num_memtables": 1, "num_entries": 935059, "num_deletes": 175946,
"memory_usage": 260049888}
2017-09-09 07:10:58.262077 7fa2d56a0700  4 rocksdb:
[/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293]
[default] [JOB 10] Level-0 flush table #2774: started
2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
 in thread 7fa2e96c8700 thread_name:bstore_kv_sync

 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
(rc)
 1: (()+0xa562f4) [0x5579585362f4]
 2: (()+0x11390) [0x7fa2faa45390]
 3: (gsignal()+0x38) [0x7fa2f99e1428]
 4: (abort()+0x16a) [0x7fa2f99e302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fa2fa32384d]
 6: (()+0x8d6b6) [0x7fa2fa3216b6]
 7: (()+0x8d701) [0x7fa2fa321701]
 8: (()+0x8d919) [0x7fa2fa321919]
 9: (()+0x1230f) [0x7fa2fb60b30f]
 10: (operator new[](unsigned long)+0x4e7) [0x7fa2fb62f4b7]
 11: (rocksdb::Arena::AllocateNewBlock(unsigned long)+0x70) [0x557958939150]
 12: (rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x45)
[0x5579589392d5]
 13: (rocksdb::Arena::AllocateAligned(unsigned long, unsigned long,

Re: [ceph-users] librados for MacOS

2017-09-09 Thread kefu chai
On Thu, Aug 3, 2017 at 4:41 PM, Willem Jan Withagen  wrote:
> On 03/08/2017 09:36, Brad Hubbard wrote:
>> On Thu, Aug 3, 2017 at 5:21 PM, Martin Palma  wrote:
>>> Hello,
>>>
>>> is there a way to get librados for MacOS? Has anybody tried to build
>>> librados for MacOS? Is this even possible?

Yes, once upon a time librados and even ceph-fuse compiled and ran[0]
fine on OSX. But since we have not worked on the port for a while, the
build is broken in master. But with this patch[1], at least librados
should build now.
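
If you want to try it before the fix lands, something along these lines
should work (untested outside my own setup; do_cmake.sh also pulls in the
submodules):

git clone https://github.com/ceph/ceph.git && cd ceph
git fetch origin pull/17615/head && git checkout -b osx-librados FETCH_HEAD
./do_cmake.sh
cd build && make rados    # the rados CLI target links against librados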


--
[0] https://github.com/ceph/ceph/pull/9371
[1] https://github.com/ceph/ceph/pull/17615

>>
>> Yes, it is eminently possible, but would require a dedicated effort.
>>
>> As far as I know there is no one working on this atm.
>
> Looking at the code I've come across a few #ifdefs for OSX and the like.
> So attempts have been made, but I think that code has rotted.
> Now FreeBSD and MacOS have a partially similar background, so ATM I would
> expect a MacOS port not to be all that complex, and it could build on some
> of the stuff I've done for FreeBSD. Not sure if the native compiler on Mac
> is Clang, but all Clang issues are already fixed (if Clang on Mac is at
> least 3.8).
>
> Like Brad says: it does require persistence and testing. But most
> importantly, it will also require maintenance.
>
> --WjW
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Regards
Kefu Chai
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] MAX AVAIL in ceph df

2017-09-09 Thread Sinan Polat
Hi,

 

How is MAX AVAIL calculated in 'ceph df'? I seem to be missing some space.

 

I have 26 OSDs, each 1484GB (according to df). I have 3 replicas.
Shouldn't the MAX AVAIL be: (26 * 1484) / 3 = 12,861 GB?

Instead, 'ceph df' is showing 7545G for the pool that is using the 26 OSDs.

 

What is wrong with my calculation?
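
My current guess (please correct me if I am wrong): MAX AVAIL is not total
capacity divided by replicas, but an estimate of how much more data can be
written before the first OSD fills up. Roughly:

MAX AVAIL ~= min over OSDs( osd_free / osd_share_of_pool ) / replica_count

With perfectly balanced and empty OSDs that would indeed come out near
(26 * 1484) / 3 ~= 12,861 GB, but any data already stored and any imbalance
between OSDs shrinks it - which might explain the 7545G I am seeing.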

 

Thanks!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [PVE-User] OSD won't start, even created ??

2017-09-09 Thread Phil Schwarz

Did a few more tests:

On an older Ceph server, with a pveceph createosd command:

pveceph createosd /dev/sdb

equivalent to:

ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid 
a5c0cfed-...4bf939ed70 /dev/sdb


sgdisk --print /dev/sdd

Disk /dev/sdd: 2930277168 sectors, 1.4 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 638646CF-..-62296C871132
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 2930277134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)   Size       Code  Name
   1          10487808      2930277134   1.4 TiB    F800  ceph data
   2              2048        10487807   5.0 GiB    F802  ceph journal


On a newer Ceph server (dpkg -l: version 12.2.0-pve1):

sgdisk --print /dev/sdb

Disk /dev/sdb: 1465149168 sectors, 698.6 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): D63886B6-0.26-BCBCD6FFCA3C
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1465149134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)   Size       Code  Name
   1              2048          206847   100.0 MiB  F800  ceph data
   2            206848      1465149134   698.5 GiB        ceph block


Related to the ceph-osd.admin log, I think I used an OSD creation process 
leading to a bluestore OSD (instead of a filestore one).
And it seems that afterwards the ceph server is unable to use the new 
bluestore:


( bluestore(/dev/sdb2) _read_bdev_label unable to decode label at offset 
102: buffer::malformed_input: void 
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode 
past end of struct encoding

)
just before trying to use it as a filestore one :

( probe_block_device_fsid /dev/sdb2 is filestore )


Tried to use the --bluestore 0 flag when creating the osd, but the flag 
is unknown.
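
Next test on my list: forcing a filestore OSD explicitly with ceph-disk,
bypassing pveceph (flags as I understand the luminous ceph-disk; not tried
on this box yet):

ceph-disk zap /dev/sdb
ceph-disk prepare --filestore --fs-type xfs /dev/sdb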


Thanks in advance for any hint.
I'm ready to do a few more tests.
Best regards.

On 08/09/2017 at 17:25, Phil Schwarz wrote:

Hi,
any help would be really useful.
Does anyone have a clue about my issue?

Thanks in advance.
Best regards.


On 05/09/2017 at 20:25, Phil Schwarz wrote:

Hi,
I come back with the same issue as seen in a previous thread (link given),

trying to add a 2TB SATA disk as an OSD:
Using the Proxmox GUI or CLI (command given) gives the same (bad) result.

I didn't want to use a direct 'ceph osd create', thus bypassing the pmxcfs
redundant filesystem.

I tried to build an OSD with the same disk on another machine (a stronger one
with an Opteron quad-core), failing in the same way.


Sorry for cross-posting, but I think I'm failing against the pveceph wrapper.


Any help or clue would be really useful.

Thanks
Best regards.










-- Link to previous thread (but same problem):
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg38897.html


-- commands :
fdisk /dev/sdc ( mklabel msdos, w, q)
ceph-disk zap /dev/sdc
pveceph createosd /dev/sdc

-- dpkg -l

 dpkg -l |grep ceph
ii  ceph 12.1.2-pve1 amd64
distributed storage and file system
ii  ceph-base12.1.2-pve1 amd64common
ceph daemon libraries and management tools
ii  ceph-common  12.1.2-pve1 amd64common
utilities to mount and interact with a ceph storage cluster
ii  ceph-mgr 12.1.2-pve1 amd64
manager for the ceph distributed storage system
ii  ceph-mon 12.1.2-pve1 amd64
monitor server for the ceph storage system
ii  ceph-osd 12.1.2-pve1 amd64OSD
server for the ceph storage system
ii  libcephfs1   10.2.5-7.2 amd64Ceph
distributed file system client library
ii  libcephfs2   12.1.2-pve1 amd64Ceph
distributed file system client library
ii  python-cephfs12.1.2-pve1 amd64Python
2 libraries for the Ceph libcephfs library

-- tail -f /var/log/ceph/ceph-osd.admin.log

2017-09-03 18:28:20.856641 7fad97e45e00  0 ceph version 12.1.2
(cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process
(unknown), pid 5493
2017-09-03 18:28:20.857104 7fad97e45e00 -1 bluestore(/dev/sdc2)
_read_bdev_label unable to decode label at offset 102:
buffer::malformed_input: void
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode
past end of struct encoding
2017-09-03 18:28:20.857200 7fad97e45e00  1 journal _open /dev/sdc2 fd 4:
2000293007360 bytes, block size 4096 bytes, directio = 0, aio = 0
2017-09-03 18:28:20.857366 7fad97e45e00  1 journal close /dev/sdc2
2017-09-03 18:28:20.857431 7fad97e45e00  0 probe_block_device_fsid
/dev/sdc2 is filestore, ----
2017-09-03 18:28:21.937285 7fa5766a5e00  0 ceph version 12.1.2
(cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process

Re: [ceph-users] ceph OSD journal (with dmcrypt) replacement

2017-09-09 Thread Дробышевский , Владимир
AFAIK, in the case of dm-crypt LUKS (the default), ceph-disk keeps each OSD
partition's key in the ceph mon config-key store and uses the OSD partition
uuid as the ID for this key.

So you can get all your keys running:

/usr/bin/ceph config-key ls

You'll get something like:

[
...
"dm-crypt/osd/50250ade-500a-44c4-8a47-00224d76594a/luks",
"dm-crypt/osd/940b5b1c-5926-4aa5-8cd7-ce2f22371d6a/luks",
"dm-crypt/osd/dd28c6ba-c101-4874-bc1c-401b34cb2f9b/luks",
...
]

These uuid are partition uuids.

You can check your *OSD* partition uuid and get the particular key as follows:

# change path to your OSD (*not journal*) partition path
OSD_PATH=/dev/sdXN
OSD_UUID=`blkid -s PARTUUID -o value $OSD_PATH`

/usr/bin/ceph config-key get dm-crypt/osd/$OSD_UUID/luks
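
From there, a rough sketch of preparing a replacement journal partition with
the retrieved key (osd id, device and uuid below are placeholders; untested,
so use with care):

# with the OSD stopped, dump the key to a temporary file
/usr/bin/ceph config-key get dm-crypt/osd/$OSD_UUID/luks > /tmp/keyfile

# format the new journal partition with the same key, and open it under the
# journal partition uuid so the osd's journal symlink keeps working
cryptsetup luksFormat /dev/sdYM /tmp/keyfile
cryptsetup --key-file /tmp/keyfile luksOpen /dev/sdYM $JOURNAL_UUID

# recreate the journal, clean up the key file, start the OSD again
ceph-osd -i $OSD_ID --mkjournal
rm -f /tmp/keyfile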



2017-09-08 18:18 GMT+05:00 M Ranga Swami Reddy :

> When I create a dmcrypted journal using the cryptsetup command, it's asking
> for a passphrase. Can I use an empty passphrase?
>
> On Wed, Sep 6, 2017 at 11:23 PM, M Ranga Swami Reddy
>  wrote:
> > Thank you. I am able to replace the dmcrypt journal successfully.
> >
> > On Sep 5, 2017 18:14, "David Turner"  wrote:
> >>
> >> Did the journal drive fail during operation? Or was it taken out during
> >> pre-failure? If it fully failed, then most likely you can't guarantee the
> >> consistency of the underlying osds. In this case, you just remove the
> >> affected osds and add them back in as new osds.
> >>
> >> In the case of having good data on the osds, you follow the standard
> >> process of closing the journal, creating the new partition, and setting up
> >> all of the partition metadata so that the ceph udev rules will know what
> >> the journal is, and just create a new dmcrypt volume on it. I would
> >> recommend using the same uuid as the old journal so that you don't need to
> >> update the symlinks and such on the osd. After everything is done, run the
> >> journal create command for the osd and start the osd.
> >>
> >>
> >> On Tue, Sep 5, 2017, 2:47 AM M Ranga Swami Reddy 
> >> wrote:
> >>>
> >>> Hello,
> >>> How to replace an OSD's journal created with dmcrypt, from one drive
> >>> to another drive, in case of current journal drive failed.
> >>>
> >>> Thanks
> >>> Swami
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Best regards,
Vladimir Drobyshevskiy
"АйТи Город" company
+7 343 192

IT consulting
Turnkey project delivery
IT services outsourcing
IT infrastructure outsourcing
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com