Re: [ceph-users] v10.1.2 Jewel release candidate release

2016-04-14 Thread Milosz Tanski
On Thu, Apr 14, 2016 at 6:32 AM, John Spray <jsp...@redhat.com> wrote:
> On Thu, Apr 14, 2016 at 8:31 AM, Vincenzo Pii
> <vincenzo@teralytics.ch> wrote:
>>
>> On 14 Apr 2016, at 00:09, Gregory Farnum <gfar...@redhat.com> wrote:
>>
>> On Wed, Apr 13, 2016 at 3:02 PM, Sage Weil <s...@redhat.com> wrote:
>>
>> Hi everyone,
>>
>> The third (and likely final) Jewel release candidate is out.  We have a
>> very small number of remaining blocker issues and a bit of final polish
>> before we publish Jewel 10.2.0, probably next week.
>>
>> There are no known issues with this release that are serious enough to
>> warn about here.  Greg is adding some CephFS checks so that admins don't
>> accidentally start using less-stable features,
>>
>>
>> s/is adding/has added/
>>
>>http://docs.ceph.com/docs/master/release-notes/
>>
>>
>> As noted in another thread, there's still a big CephFS warning in the
>> online docs. We'll be cleaning those up, since we now have the
>> recovery tools we desire! Some things are known to still be slow or
>> sub-optimal, but we consider CephFS stable and safe at this time when
>> run in the default single-MDS configuration. (It won't let you do
>> anything bad without very explicitly setting flags and acknowledging
>> they're dangerous.)
>> :)
>> -Greg
>>
>>
>> Hi Greg,
>>
>> A clarification:
>>
>> When you say that things will be safe in “single-MDS” configuration, do you
>> also exclude the HA setup with one active MDS and some passive (standby)
>> ones? Or would this be safe as well?
>
> Yes, having standbys is fine (including "standby replay" daemons).  We
> should really say "single active MDS configuration", but it's a bit of
> a mouthful!

master - hot standby(s) is okay
multi-master is not supported

I feel like this is terminology that's more familiar (to me) from
other systems (e.g. databases).
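
To make that mapping concrete, here is a minimal sketch of keeping a filesystem
in the "one active MDS plus standbys" shape. The filesystem name, the exact
Jewel-era command spellings, and the standby-replay options are assumptions on
my part, so check them against the docs for your release:

#!/usr/bin/env python3
"""Sketch: pin a CephFS filesystem to a single active MDS while keeping
hot standbys. Filesystem name and command spellings are assumptions --
verify against the docs for your Ceph release."""
import subprocess

def ceph(*args):
    """Run a ceph CLI command and return its stdout."""
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout

# One active rank; any additional ceph-mds daemons become standbys.
ceph("fs", "set", "cephfs", "max_mds", "1")

# Multi-active MDS stays off unless you explicitly acknowledge the risk,
# e.g. (Jewel-era flag, verify the spelling):
#   ceph fs set cephfs allow_multimds true --yes-i-really-mean-it

# For "hot" standbys, a standby daemon can be configured for standby-replay
# in ceph.conf, e.g.:
#   [mds.b]
#   mds standby replay = true
#   mds standby for rank = 0

print(ceph("fs", "get", "cephfs"))  # inspect the resulting MDS map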

>
> John
>
>>
>> Vincenzo Pii | TERALYTICS
>> DevOps Engineer
>>
>>
>>



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: mil...@adfin.com


Re: [ceph-users] O_DIRECT on deep-scrub read

2015-10-09 Thread Milosz Tanski
On Thu, Oct 8, 2015 at 4:11 AM, Paweł Sadowski <c...@sadziu.pl> wrote:
>
> On 10/07/2015 10:52 PM, Sage Weil wrote:
> > On Wed, 7 Oct 2015, David Zafman wrote:
> >> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after
> >> deep-scrub reads for objects not recently accessed by clients.
> > Yeah, it's the 'except for stuff already in cache' part that we don't do
> > (and the kernel doesn't give us a good interface for).  IIRC there was a
> > patch that guessed based on whether the obc was already in cache, which
> > seems like a pretty decent heuristic, but I forget if that was in the
> > final version.
>
> I've run some tests and it looks like on XFS the cache is discarded on
> O_DIRECT write and read, but on EXT4 it is discarded only on O_DIRECT write.
> I've found some patches to add support for "read only if in page cache"
> (preadv2/RWF_NONBLOCK) but can't find them in kernel source. Maybe
> Milosz Tanski can tell more about that. I think it could help a bit
> during deep scrub.


After a fair amount of bikeshedding on the API (and dropping pwritev2), it
looked like we (Christoph and I) had enough consensus to get it upstream.
Sadly, akpm preferred a different approach (fincore), and with enough
roadblocks the effort died :/
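
Even without preadv2, the fadvise side of the idea is easy to sketch. This is
just an illustration of POSIX_FADV_DONTNEED on a sequential scan, not Ceph's
actual deep-scrub code path:

#!/usr/bin/env python3
"""Tiny illustration of the POSIX_FADV_DONTNEED idea discussed above:
read a file sequentially, then tell the kernel we won't need those pages,
so a scrub-style scan doesn't evict hotter data. A sketch only."""
import os
import sys

CHUNK = 4 * 1024 * 1024  # read in 4 MiB chunks

def scrub_like_read(path):
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            buf = os.pread(fd, CHUNK, offset)
            if not buf:
                break
            # ... checksum/verify buf here ...
            offset += len(buf)
        # Hint that the whole range can be dropped from the page cache.
        os.posix_fadvise(fd, 0, offset, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

if __name__ == "__main__":
    scrub_like_read(sys.argv[1])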

>
>
> >> I see the NewStore objectstore sometimes using the O_DIRECT flag for writes.
> >> This concerns me because the open(2) man page says:
> >>
> >> "Applications should avoid mixing O_DIRECT and normal I/O to the same file,
> >> and especially to overlapping byte regions in the same file.  Even when the
> >> filesystem correctly handles the coherency issues in this situation, 
> >> overall
> >> I/O throughput is likely to be slower than using either mode alone."
> > Yeah: an O_DIRECT write will do a cache flush on the write range, so if
> > there was already dirty data in cache you'll write twice.  There's
> > similarly an invalidate on read.  I need to go back through the newstore
> > code and see how the modes are being mixed and how it can be avoided...
> >
> > sage
> >
> >
> >> On 10/7/15 7:50 AM, Sage Weil wrote:
> >>> It's not, but it would not be hard to do this.  There are fadvise-style
> >>> hints being passed down that could trigger O_DIRECT reads in this case.
> >>> That may not be the best choice, though--it won't use data that happens
> >>> to be in cache and it'll also throw it out.
> >>>
> >>> On Wed, 7 Oct 2015, Paweł Sadowski wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Can anyone tell if deep scrub is done using the O_DIRECT flag or not? I'm
> >>>> not able to verify that in the source code.
> >>>>
> >>>> If not, would it be possible to add such a feature (maybe as a config
> >>>> option) to help keep the Linux page cache in better shape?
> >>>>
> >>>> Thanks,
>
> --
> PS




-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: mil...@adfin.com


Re: [ceph-users] O_DIRECT on deep-scrub read

2015-10-07 Thread Milosz Tanski
On Wed, Oct 7, 2015 at 10:50 AM, Sage Weil <s...@newdream.net> wrote:
> It's not, but it would not be hard to do this.  There are fadvise-style
> hints being passed down that could trigger O_DIRECT reads in this case.
> That may not be the best choice, though--it won't use data that happens
> to be in cache and it'll also throw it out.
>
> On Wed, 7 Oct 2015, Paweł Sadowski wrote:
>
>> Hi,
>>
>> Can anyone tell if deep scrub is done using the O_DIRECT flag or not? I'm
>> not able to verify that in the source code.
>>
>> If not, would it be possible to add such a feature (maybe as a config
>> option) to help keep the Linux page cache in better shape?
>>
>> Thanks,

When I was working on preadv2, somebody brought up a per-operation
O_DIRECT flag. There wasn't a clear use case at the time (beyond
saying Linus would "love that").
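
Lacking a per-operation flag, the usual workaround today is a separate O_DIRECT
descriptor with properly aligned buffers. A rough, Linux-only sketch (the
alignment size is an assumption about the underlying device):

#!/usr/bin/env python3
"""Sketch of an uncached ("direct") read of one region while the rest of
the application keeps using buffered I/O -- an approximation of the
per-operation O_DIRECT flag mentioned above. Linux-only; assumes the
device's logical block size divides 4096."""
import mmap
import os
import sys

ALIGN = 4096  # O_DIRECT needs offset, length and buffer address aligned

def direct_read(path, offset, length):
    # Anonymous mmap memory is page-aligned, which satisfies O_DIRECT.
    buf = mmap.mmap(-1, length)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        n = os.preadv(fd, [buf], offset)
        return bytes(buf[:n])
    finally:
        os.close(fd)
        buf.close()

if __name__ == "__main__":
    data = direct_read(sys.argv[1], 0, ALIGN)
    print(f"read {len(data)} bytes bypassing the page cache")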

>>
>> --
>> PS
>>
>>
>>



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: mil...@adfin.com


Re: [ceph-users] Discuss: New default recovery config settings

2015-05-29 Thread Milosz Tanski
On Fri, May 29, 2015 at 5:47 PM, Samuel Just sj...@redhat.com wrote:
 Many people have reported that they need to lower the osd recovery config 
 options to minimize the impact of recovery on client io.  We are talking 
 about changing the defaults as follows:

 osd_max_backfills to 1 (from 10)
 osd_recovery_max_active to 3 (from 15)
 osd_recovery_op_priority to 1 (from 10)
 osd_recovery_max_single_start to 1 (from 5)

 We'd like a bit of feedback first though.  Is anyone happy with the current 
 configs?  Is anyone using something between these values and the current 
 defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills 
 to 1 is probably a good idea, but I wonder whether lowering 
 osd_recovery_max_active and osd_recovery_max_single_start will cause small 
 objects to recover unacceptably slowly.

 Thoughts?
 -Sam
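
For anyone who wants to try the proposed values on a live cluster before they
become defaults, here is a hedged sketch of pushing them to all OSDs at
runtime. The injectargs invocation is the usual pattern, but verify the exact
syntax for your release and persist the settings in ceph.conf as well:

#!/usr/bin/env python3
"""Sketch: inject the recovery settings discussed above into all running
OSDs. Option names are taken from the thread; the `ceph tell ... injectargs`
call is my assumption of the usual way to do this at runtime."""
import subprocess

PROPOSED = {
    "osd_max_backfills": 1,
    "osd_recovery_max_active": 3,
    "osd_recovery_op_priority": 1,
    "osd_recovery_max_single_start": 1,
}

args = " ".join(f"--{k.replace('_', '-')} {v}" for k, v in PROPOSED.items())
subprocess.run(["ceph", "tell", "osd.*", "injectargs", args], check=True)
print("injected:", args)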

Sam, I was thinking about this recently. We recently ended up hitting a
recovery storm and a scrub storm; both happened at a time of high client
activity. While changing the defaults down will make these kinds of
disruptions less likely to occur, it also makes recovery (rebalancing)
very slow.
What I would be happy to see is more of a QoS-style tunable along the
lines of networking traffic shaping, where we can guarantee a minimum
amount of recovery "load" (and I put it in quotes since there's more
than one resource involved) when the cluster is busy with client IO; or,
vice versa, a minimum amount of client IO that's guaranteed. Then during
lower periods of client activity the recovery (and other background
work) can proceed at full speed. Many workloads are cyclical or seasonal
(in the statistical sense, e.g. intra-day seasonality).

QoS-style management should lead to a more dynamic system where we can
maximize available utilization, minimize disruptions, and not play
whack-a-mole with many conf knobs. I'm aware that this is much harder
to implement, but thankfully there's a lot of literature, implementations,
and practical experience out there to draw upon.
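
As a toy illustration of what such a reservation scheme could look like (all
numbers, names, and the interface here are invented purely for illustration;
this is not how Ceph schedules work):

#!/usr/bin/env python3
"""Toy sketch of the QoS idea above: recovery gets a guaranteed minimum
share of IO "slots", client IO gets the rest, and whatever a class doesn't
use is lent to the other."""
import random

SLOTS_PER_TICK = 100   # pretend each tick we can issue 100 ops
RECOVERY_MIN = 10      # guaranteed floor for recovery, even under load
CLIENT_MIN = 60        # guaranteed floor for client IO

def schedule(client_demand, recovery_demand):
    """Return (client_ops, recovery_ops) granted this tick."""
    client = min(client_demand, CLIENT_MIN)
    recovery = min(recovery_demand, RECOVERY_MIN)
    spare = SLOTS_PER_TICK - client - recovery
    # Lend spare capacity to whoever still has demand, clients first.
    extra_client = min(spare, client_demand - client)
    spare -= extra_client
    extra_recovery = min(spare, recovery_demand - recovery)
    return client + extra_client, recovery + extra_recovery

for tick in range(5):
    cd = random.randint(0, 150)   # bursty client demand
    rd = random.randint(0, 80)    # backlog of recovery work
    c, r = schedule(cd, rd)
    print(f"tick {tick}: client {c}/{cd}, recovery {r}/{rd}")

During quiet ticks recovery soaks up the spare slots; during busy ticks it
still keeps its floor, which is the behavior described above.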

- Milosz

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: mil...@adfin.com


Re: [ceph-users] strange benchmark problem : restarting osd daemon improve performance from 100k iops to 300k iops

2015-04-24 Thread Milosz Tanski
On Fri, Apr 24, 2015 at 12:38 PM, Alexandre DERUMIER
aderum...@odiso.com wrote:

 Hi,

 I have finished to rebuild ceph with jemalloc,

 all seem to working fine.

 I got a constant 300k iops for the moment, so no speed regression.

 I'll do more long benchmark next week.

 Regards,

 Alexandre


In my experience jemalloc is much more proactive at returning memory
to the OS, whereas tcmalloc in its default setting is much greedier
about keeping/reusing memory. jemalloc tends to do better if your
application benefits from a large page cache. Also, jemalloc's aggressive
behavior is better if you're running a lot of applications per host,
because you're less likely to trigger a kernel dirty writeout when
allocating space (since you're not keeping a large free cache around per
application).

Howard of Symas and LMDB fame did some benchmarking and comparison
here: http://symas.com/mdb/inmem/malloc/ He came to somewhat similar
conclusions.

It would be helpful if you could reproduce the issue with tcmalloc...
Turn on tcmalloc stats logging (every 1 GB allocated or so), then
compare the size claimed by tcmalloc to the process RSS. If you can
account for a large difference, especially multiplied across a number
of OSDs, that may be the culprit.
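
A rough way to do the RSS side of that comparison from the outside (the
heap-stats command and its output wording are assumptions and vary by
tcmalloc/Ceph version):

#!/usr/bin/env python3
"""Sketch of the RSS-vs-allocator comparison suggested above: print each
ceph-osd's RSS from /proc so it can be held up against what the allocator
itself reports."""
import subprocess

def osd_pids():
    """PIDs of running ceph-osd processes (assumes pgrep is available)."""
    out = subprocess.run(["pgrep", "-x", "ceph-osd"],
                         capture_output=True, text=True).stdout
    return [int(p) for p in out.split()]

def rss_kib(pid):
    """Resident set size of a process in KiB, from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

for pid in osd_pids():
    print(f"ceph-osd pid {pid}: RSS {rss_kib(pid) / 1024:.0f} MiB")
    # Compare against the allocator's own accounting, e.g.:
    #   ceph tell osd.<id> heap stats
    # and look for the "bytes in use by application" vs total lines.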

I know things have gotten better in tcmalloc. As in, they fixed a few
bugs where really large allocations were never returned to the OS, and
they turned down the default greediness. Sadly, distros have been slow
at picking these up in the past. If this is the problem, it might be
worth having an option to build tcmalloc (using a version known to be
good) into Ceph at build time.

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: mil...@adfin.com


Re: [ceph-users] Establishing the Ceph Board

2015-04-01 Thread Milosz Tanski
Patrick,

I'm not sure where you are at with forming this. I would love to be considered.

Previously I've contributed commits (mostly on the CephFS kernel side) and
I'd love to contribute more there. Usually the hardest part about these
kinds of things is finding the time to participate. Since Ceph is a project
I believe in and whose success I care about, rest assured I can make the
time for this to happen. I am currently the CTO of Adfin, but for the
purposes of this I'd rather be unaffiliated / an independent community
member.

Thanks,
- Milosz

On Fri, Mar 6, 2015 at 12:00 PM, Patrick McGarry pmcga...@redhat.com wrote:
 Hey Cephers,

 It looks like the governance work that has been going on for so long
 is (finally!) getting ready to come to fruition. If you would like
 more information feel free to check out the docs from CDS available
 at:

 http://wiki.ceph.com/Planning/Blueprints/Infernalis/Ceph_Governance

 What I need now is to start gathering the people/orgs that would be
 interested in participating in a Ceph Board. My goal is to involve the
 founding members in finalizing the documents and procedures so that it
 isn't just arbitrary stuff made up by one monkey. If you are
 interested in being a part of the Ceph Board please email me the
 following information:

 Name
 Organization/Affiliation
 Resources you might contribute (money, time, developers, your own commits, 
 etc)

 That should be enough to start a dialogue. My goal here is to get a
 small group of people that can help refine the governance docs, build
 the board, and ensure the longterm health of Ceph. As always if you
 have any questions, please don't hesitate to contact me. Thanks!


 --

 Best Regards,

 Patrick McGarry
 Director Ceph Community || Red Hat
 http://ceph.com  ||  http://community.redhat.com
 @scuttlemonkey || @ceph



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: mil...@adfin.com


Re: [ceph-users] Ceph User Committee monthly meeting #1 : executive summary

2014-04-04 Thread Milosz Tanski
Loic,

The writeup has been helpful.

What I'm curious about (and hasn't been mentioned) is whether we can use
erasure coding with CephFS. What steps have to be taken to set up erasure
coding for CephFS?

In our case we'd like to take advantage of the savings, since a large
chunk of our data is written once, read many times, and after a while
seldom accessed. It looks like by default the MDS already sets up default
pools for data and metadata, so I'm guessing this requires some
preparation in advance.
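
For what it's worth, the preparation I'd expect looks roughly like the sketch
below. Pool names, k/m values and PG counts are placeholders, and depending on
the release an EC pool may only be usable for CephFS data behind a cache tier,
so treat it purely as a sketch and check the docs for your version:

#!/usr/bin/env python3
"""Sketch of the kind of preparation the question above is about: create an
erasure-coded pool and make it usable as an additional CephFS data pool.
Names and parameters are placeholders."""
import subprocess

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

ceph("osd", "erasure-code-profile", "set", "ecprofile", "k=4", "m=2")
ceph("osd", "pool", "create", "ecdata", "64", "64", "erasure", "ecprofile")

# Make the new pool available to CephFS as an additional data pool
# (command spelling differs between releases -- e.g. `ceph mds add_data_pool`
# on older ones, `ceph fs add_data_pool <fs> <pool>` on newer ones).
ceph("mds", "add_data_pool", "ecdata")

# Then point a directory at it with a file layout, e.g.:
#   setfattr -n ceph.dir.layout.pool -v ecdata /mnt/cephfs/coldstore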

Best,
- Milosz

On Fri, Apr 4, 2014 at 12:34 PM, Loic Dachary l...@dachary.org wrote:
 Hi Ceph,

 This month Ceph User Committee meeting was about:

 Tiering, erasure code
 Using http://tracker.ceph.com/
 CephFS
 Miscellaneous

 You will find an executive summary at:


 https://wiki.ceph.com/Community/Meetings/Ceph_User_Committee_meeting_2014-04-03

 The full log of the IRC conversation is also included to provide more context 
 when needed. Feel free to edit if you see a mistake.

 Cheers

 --
 Loïc Dachary, Artisan Logiciel Libre




-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: mil...@adfin.com


Re: [ceph-users] firefly timing

2014-03-18 Thread Milosz Tanski
I think it's good now, explicit (and detailed).

On Tue, Mar 18, 2014 at 4:12 PM, Sage Weil s...@inktank.com wrote:
 On Tue, 18 Mar 2014, Sage Weil wrote:
 On Tue, 18 Mar 2014, Milosz Tanski wrote:
  Is this statement in the documentation still valid: "Stale data is
  expired from the cache pools based on some as-yet undetermined
  policy"? That sounds a bit scary.

 I'll update the docs :).  The policy is pretty simple but not described
 anywhere yet.

 I've updated the doc; please let me know what is/isn't clear so we can
 make sure the final doc is useful.

 John, we need to figure out where this is going to fit in the overall
 IA...

 sage




 sage


 
  - Milosz
 
  On Tue, Mar 18, 2014 at 12:06 PM, Sage Weil s...@inktank.com wrote:
   On Tue, 18 Mar 2014, Stefan Priebe - Profihost AG wrote:
   Hi Sage,
  
   i really would like to test the tiering. Is there any detailed
   documentation about it and how it works?
  
   Great!  Here is a quick synopsis on how to set it up:
  
   http://ceph.com/docs/master/dev/cache-pool/
  
   sage
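
The synopsis in that doc boils down to a handful of commands; roughly (pool
names are placeholders, and the flags should be checked against the
cache-pool doc for your release):

#!/usr/bin/env python3
"""Rough sketch of the cache-tier setup the linked doc describes: put a
fast "cachepool" in writeback mode in front of a slower "basepool"."""
import subprocess

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

ceph("osd", "tier", "add", "basepool", "cachepool")
ceph("osd", "tier", "cache-mode", "cachepool", "writeback")
ceph("osd", "tier", "set-overlay", "basepool", "cachepool")

# Plus sizing/eviction knobs on the cache pool, e.g.:
#   ceph osd pool set cachepool hit_set_type bloom
#   ceph osd pool set cachepool target_max_bytes 100000000000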
  
  
  
  
   Greets,
   Stefan
  
   On 18.03.2014 05:45, Sage Weil wrote:
Hi everyone,
   
It's taken longer than expected, but the tests for v0.78 are calming down
and it looks like we'll be able to get the release out this week.

However, we've decided NOT to make this release firefly.  It will be a
normal development release.  This will be the first release that includes
some key new functionality (erasure coding and cache tiering) and although
it is passing our tests we'd like to have some operational experience with
it in more users' hands before we commit to supporting it long term.

The tentative plan is to freeze and then release v0.79 after a normal two
week cycle.  This will serve as a 'release candidate' that shaves off a
few rough edges from the pending release (including some improvements with
the API for setting up erasure coded pools).  It is possible that 0.79
will turn into firefly, but more likely that we will opt for another two
weeks of hardening and make 0.80 the release we name firefly and maintain
for the long term.

Long story short: 0.78 will be out soon, and you should test it!  It will
vary from the final firefly in a few subtle ways, but any feedback or
usability and bug reports at this point will be very helpful in shaping
things.
   
Thanks!
sage
   
  
  
 
 
 
  --
  Milosz Tanski
  CTO
  10 East 53rd Street, 37th floor
  New York, NY 10022
 
  p: 646-253-9055
  e: mil...@adfin.com
 
 





-- 
Milosz Tanski
CTO
10 East 53rd Street, 37th floor
New York, NY 10022

p: 646-253-9055
e: mil...@adfin.com