Re: [ceph-users] Ceph cache pool full

2017-10-09 Thread Gregory Farnum
On Fri, Oct 6, 2017 at 2:22 PM Shawfeng Dong  wrote:

> Here is a quick update. I found that a CephFS client process was accessing
> the big 1TB file, which I think had a lock on the file, preventing the
> flushing of objects to the underlying data pool. Once I killed that
> process, objects started to flush to the data pool automatically (with
> target_max_bytes & target_max_objects set); and I can force the flushing
> with 'rados -p cephfs_cache cache-flush-evict-all' as well. So David
> appears to be right in saying that "it can only hold full files and not
> flush partial files". This will be problematic if we want to transfer a
> file that is bigger in size than the cache pool!
>

Hmm. I can say that there is definitely no explicit locking of cached objects
by the filesystem; no such mechanism exists.
I also can't think of any activity that would be going on to keep the
objects active in the cache.

However, if you had a CephFS client actively reading from or writing to the
file, any objects it was looking at would certainly be kept in the cache; I
think there's a minimum time since last activity to prevent RADOS from
flushing out stuff that's in use. If that was the issue, you just have hot
data sets bigger than your cache size. And we know our cache tiering system
doesn't work in those cases.
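
If those are the knobs at play, the relevant per-pool settings to check are
cache_min_flush_age and cache_min_evict_age; a quick way to see what the
cache pool in this thread is currently using would be something like:

# ceph osd pool get cephfs_cache cache_min_flush_age
# ceph osd pool get cephfs_cache cache_min_evict_age

Both are in seconds; objects written or read more recently than these values
will not be flushed or evicted by the tiering agent.
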
-Greg


>
> We did this whole scheme (EC data pool plus NVMe cache tier) just for
> experimentation. I've learned a lot from the experiment and from you guys.
> Thank you very much!
>
> For production, I think I'll simply use a replicated pool for data on the
> HDDs (with bluestore WAL and DB on the 1st NVMe), and a replicated pool for
> metadata on the 2nd NVMe.  Please let me know if you have any further
> advice / suggestion.
>
> Best,
> Shaw
>
>
>
> On Fri, Oct 6, 2017 at 10:07 AM, David Turner 
> wrote:
>
>> All of this data is test data, yeah?  I would start by removing the
>> cache-tier and pool, recreate it and attach it, configure all of the
>> settings including the maximums, and start testing things again.  I would
>> avoid doing the 1.3TB file test until after you've confirmed that the
>> smaller files are being flushed appropriately to the data pool (manually
>> flushing/evicting it) and then scale up your testing to the larger files.
>> On Fri, Oct 6, 2017 at 12:54 PM Shawfeng Dong  wrote:
>>
>>> Curiously, it has been quite a while, but there is still no object in
>>> the underlying data pool:
>>> # rados -p cephfs_data ls
>>>
>>> Any advice?
>>>
>>> On Fri, Oct 6, 2017 at 9:45 AM, David Turner 
>>> wrote:
>>>
 Notice in the URL for the documentation the use of "luminous".  When
 you looked a few weeks ago, you might have been looking at the
 documentation for a different version of Ceph.  You can change that to
 jewel, hammer, kraken, master, etc depending on which version of Ceph you
 are running or reading about.  Google gets confused and will pull up random
 versions of the ceph documentation for a page. It's on us to make sure that
 the url is pointing to the version of Ceph that we are using.

 While it's sitting there in the flush command, can you see if there are
 any objects in the underlying data pool?  Hopefully the count will be
 growing.

 On Fri, Oct 6, 2017 at 12:39 PM Shawfeng Dong  wrote:

> Hi Christian,
>
> I set those via CLI:
> # ceph osd pool set cephfs_cache target_max_bytes 1099511627776
> # ceph osd pool set cephfs_cache target_max_objects 100
>
> but manual flushing doesn't appear to work:
> # rados -p cephfs_cache cache-flush-evict-all
> 100046a.0ca6
>
> it just gets stuck there for a long time.
>
> Any suggestion? Do I need to restart the daemons or reboot the nodes?
>
> Thanks,
> Shaw
>
>
>
> On Fri, Oct 6, 2017 at 9:31 AM, Christian Balzer 
> wrote:
>
>> On Fri, 6 Oct 2017 09:14:40 -0700 Shawfeng Dong wrote:
>>
>> > I found the command: rados -p cephfs_cache cache-flush-evict-all
>> >
>> That's not what you want/need.
>> Though it will fix your current "full" issue.
>>
>> > The documentation (
>> > http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/)
>> has
>> > been improved a lot since I last checked it a few weeks ago!
>> >
>> The need to set max_bytes and max_objects has been documented for ages
>> (since Hammer).
>>
>> more below...
>>
>> > -Shaw
>> >
>> > On Fri, Oct 6, 2017 at 9:10 AM, Shawfeng Dong 
>> wrote:
>> >
>> > > Thanks, Luis.
>> > >
>> > > I've just set max_bytes and max_objects:
>> How?
>> Editing the conf file won't help until a restart.
>>
>> > > target_max_objects: 100 (1M)
>> > > target_max_bytes: 1099511627776 (1TB)
>> >

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread David Turner
You can still use EC for CephFS without a cache tier since you are using
Luminous. This is new functionality in Luminous, while the majority of guides
you will see are for setups on Jewel and older versions of Ceph. Here are the
docs regarding this, including how to do it.

http://docs.ceph.com/docs/luminous/rados/operations/erasure-code/#erasure-coding-with-overwrites
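
In short, the key step from that page is enabling overwrites on the EC pool
(pool name taken from earlier in this thread; the docs note this requires
BlueStore OSDs, which this cluster is using):

# ceph osd pool set cephfs_data allow_ec_overwrites true

The EC pool can then serve directly as the CephFS data pool; only the
metadata pool still has to be replicated.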

On Fri, Oct 6, 2017, 5:22 PM Shawfeng Dong  wrote:

> Here is a quick update. I found that a CephFS client process was accessing
> the big 1TB file, which I think had a lock on the file, preventing the
> flushing of objects to the underlying data pool. Once I killed that
> process, objects started to flush to the data pool automatically (with
> target_max_bytes & target_max_objects set); and I can force the flushing
> with 'rados -p cephfs_cache cache-flush-evict-all' as well. So David
> appears to be right in saying that "it can only hold full files and not
> flush partial files". This will be problematic if we want to transfer a
> file that is bigger in size than the cache pool!
>
> We did this whole scheme (EC data pool plus NVMe cache tier) just for
> experimentation. I've learned a lot from the experiment and from you guys.
> Thank you very much!
>
> For production, I think I'll simply use a replicated pool for data on the
> HDDs (with bluestore WAL and DB on the 1st NVMe), and a replicated pool for
> metadata on the 2nd NVMe.  Please let me know if you have any further
> advice / suggestion.
>
> Best,
> Shaw
>
>
>
> On Fri, Oct 6, 2017 at 10:07 AM, David Turner 
> wrote:
>
>> All of this data is test data, yeah?  I would start by removing the
>> cache-tier and pool, recreate it and attach it, configure all of the
>> settings including the maximums, and start testing things again.  I would
>> avoid doing the 1.3TB file test until after you've confirmed that the
>> smaller files are being flushed appropriately to the data pool (manually
>> flushing/evicting it) and then scale up your testing to the larger files.
>> On Fri, Oct 6, 2017 at 12:54 PM Shawfeng Dong  wrote:
>>
>>> Curiously, it has been quite a while, but there is still no object in
>>> the underlying data pool:
>>> # rados -p cephfs_data ls
>>>
>>> Any advice?
>>>
>>> On Fri, Oct 6, 2017 at 9:45 AM, David Turner 
>>> wrote:
>>>
 Notice in the URL for the documentation the use of "luminous".  When
 you looked a few weeks ago, you might have been looking at the
 documentation for a different version of Ceph.  You can change that to
 jewel, hammer, kraken, master, etc depending on which version of Ceph you
 are running or reading about.  Google gets confused and will pull up random
 versions of the ceph documentation for a page. It's on us to make sure that
 the url is pointing to the version of Ceph that we are using.

 While it's sitting there in the flush command, can you see if there are
 any objects in the underlying data pool?  Hopefully the count will be
 growing.

 On Fri, Oct 6, 2017 at 12:39 PM Shawfeng Dong  wrote:

> Hi Christian,
>
> I set those via CLI:
> # ceph osd pool set cephfs_cache target_max_bytes 1099511627776
> # ceph osd pool set cephfs_cache target_max_objects 100
>
> but manual flushing doesn't appear to work:
> # rados -p cephfs_cache cache-flush-evict-all
> 100046a.0ca6
>
> it just gets stuck there for a long time.
>
> Any suggestion? Do I need to restart the daemons or reboot the nodes?
>
> Thanks,
> Shaw
>
>
>
> On Fri, Oct 6, 2017 at 9:31 AM, Christian Balzer 
> wrote:
>
>> On Fri, 6 Oct 2017 09:14:40 -0700 Shawfeng Dong wrote:
>>
>> > I found the command: rados -p cephfs_cache cache-flush-evict-all
>> >
>> That's not what you want/need.
>> Though it will fix your current "full" issue.
>>
>> > The documentation (
>> > http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/)
>> has
>> > been improved a lot since I last checked it a few weeks ago!
>> >
>> The need to set max_bytes and max_objects has been documented for ages
>> (since Hammer).
>>
>> more below...
>>
>> > -Shaw
>> >
>> > On Fri, Oct 6, 2017 at 9:10 AM, Shawfeng Dong 
>> wrote:
>> >
>> > > Thanks, Luis.
>> > >
>> > > I've just set max_bytes and max_objects:
>> How?
>> Editing the conf file won't help until a restart.
>>
>> > > target_max_objects: 100 (1M)
>> > > target_max_bytes: 1099511627776 (1TB)
>> >
>> I'd lower that or the cache_target_full_ratio by another 10%.
>>
>> Christian
>> > >
>> > > but nothing appears to be happening. Is there a way to force
>> flushing?
>> > >
>> > > Thanks,
>> > > Shaw
>> > >

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread Shawfeng Dong
Here is a quick update. I found that a CephFS client process was accessing
the big 1TB file, which I think had a lock on the file, preventing the
flushing of objects to the underlying data pool. Once I killed that
process, objects started to flush to the data pool automatically (with
target_max_bytes & target_max_objects set); and I can force the flushing
with 'rados -p cephfs_cache cache-flush-evict-all' as well. So David
appears to be right in saying that "it can only hold full files and not
flush partial files". This will be problematic if we want to transfer a
file that is bigger in size than the cache pool!

We did this whole scheme (EC data pool plus NVMe cache tier) just for
experimentation. I've learned a lot from the experiment and from you guys.
Thank you very much!

For production, I think I'll simply use a replicated pool for data on the
HDDs (with bluestore WAL and DB on the 1st NVMe), and a replicated pool for
metadata on the 2nd NVMe.  Please let me know if you have any further
advice / suggestion.
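
For reference, one way to express that HDD/NVMe split on Luminous is with
device-class CRUSH rules; a rough sketch (the rule names are arbitrary, and
the hdd/nvme class names should match what 'ceph osd tree' reports):

# ceph osd crush rule create-replicated replicated-hdd default host hdd
# ceph osd crush rule create-replicated replicated-nvme default host nvme
# ceph osd pool set cephfs_data crush_rule replicated-hdd
# ceph osd pool set cephfs_metadata crush_rule replicated-nvme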

Best,
Shaw



On Fri, Oct 6, 2017 at 10:07 AM, David Turner  wrote:

> All of this data is test data, yeah?  I would start by removing the
> cache-tier and pool, recreate it and attach it, configure all of the
> settings including the maximums, and start testing things again.  I would
> avoid doing the 1.3TB file test until after you've confirmed that the
> smaller files are being flushed appropriately to the data pool (manually
> flushing/evicting it) and then scale up your testing to the larger files.
> On Fri, Oct 6, 2017 at 12:54 PM Shawfeng Dong  wrote:
>
>> Curiously, it has been quite a while, but there is still no object in the
>> underlying data pool:
>> # rados -p cephfs_data ls
>>
>> Any advice?
>>
>> On Fri, Oct 6, 2017 at 9:45 AM, David Turner 
>> wrote:
>>
>>> Notice in the URL for the documentation the use of "luminous".  When you
>>> looked a few weeks ago, you might have been looking at the documentation
>>> for a different version of Ceph.  You can change that to jewel, hammer,
>>> kraken, master, etc depending on which version of Ceph you are running or
>>> reading about.  Google gets confused and will pull up random versions of
>>> the ceph documentation for a page. It's on us to make sure that the url is
>>> pointing to the version of Ceph that we are using.
>>>
>>> While it's sitting there in the flush command, can you see if there are
>>> any objects in the underlying data pool?  Hopefully the count will be
>>> growing.
>>>
>>> On Fri, Oct 6, 2017 at 12:39 PM Shawfeng Dong  wrote:
>>>
 Hi Christian,

 I set those via CLI:
 # ceph osd pool set cephfs_cache target_max_bytes 1099511627776
 # ceph osd pool set cephfs_cache target_max_objects 100

 but manual flushing doesn't appear to work:
 # rados -p cephfs_cache cache-flush-evict-all
 100046a.0ca6

 it just gets stuck there for a long time.

 Any suggestion? Do I need to restart the daemons or reboot the nodes?

 Thanks,
 Shaw



 On Fri, Oct 6, 2017 at 9:31 AM, Christian Balzer  wrote:

> On Fri, 6 Oct 2017 09:14:40 -0700 Shawfeng Dong wrote:
>
> > I found the command: rados -p cephfs_cache cache-flush-evict-all
> >
> That's not what you want/need.
> Though it will fix your current "full" issue.
>
> > The documentation (
> > http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/)
> has
> > been improved a lot since I last checked it a few weeks ago!
> >
> The need to set max_bytes and max_objects has been documented for ages
> (since Hammer).
>
> more below...
>
> > -Shaw
> >
> > On Fri, Oct 6, 2017 at 9:10 AM, Shawfeng Dong  wrote:
> >
> > > Thanks, Luis.
> > >
> > > I've just set max_bytes and max_objects:
> How?
> Editing the conf file won't help until a restart.
>
> > > target_max_objects: 100 (1M)
> > > target_max_bytes: 1099511627776 (1TB)
> >
> I'd lower that or the cache_target_full_ratio by another 10%.
>
> Christian
> > >
> > > but nothing appears to be happening. Is there a way to force
> flushing?
> > >
> > > Thanks,
> > > Shaw
> > >
> > > On Fri, Oct 6, 2017 at 8:55 AM, Luis Periquito <
> periqu...@gmail.com>
> > > wrote:
> > >
> > >> Not looking at anything else, you didn't set the max_bytes or
> > >> max_objects for it to start flushing...
> > >>
> > >> On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong 
> wrote:
> > >> > Dear all,
> > >> >
> > >> > Thanks a lot for the very insightful comments/suggestions!
> > >> >
> > >> > There are 3 OSD servers in our pilot Ceph cluster, each with 2x
> 1TB SSDs
> > >> > (boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe 

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread David Turner
All of this data is test data, yeah?  I would start by removing the
cache-tier and pool, recreating and re-attaching it, configuring all of the
settings including the maximums, and then testing things again.  I would
avoid doing the 1.3TB file test until after you've confirmed that the
smaller files are being flushed appropriately to the data pool (manually
flushing/evicting it) and then scale up your testing to the larger files.
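
Roughly, the detach/re-attach sequence from the Luminous cache-tiering docs
looks like this (pool names from this thread; double-check each step before
running it on anything you care about):

# ceph osd tier cache-mode cephfs_cache forward --yes-i-really-mean-it
# rados -p cephfs_cache cache-flush-evict-all
# ceph osd tier remove-overlay cephfs_data
# ceph osd tier remove cephfs_data cephfs_cache

and then re-attach it with the limits set up front (the byte/object/ratio
values below are only examples):

# ceph osd tier add cephfs_data cephfs_cache
# ceph osd tier cache-mode cephfs_cache writeback
# ceph osd tier set-overlay cephfs_data cephfs_cache
# ceph osd pool set cephfs_cache hit_set_type bloom
# ceph osd pool set cephfs_cache target_max_bytes 1099511627776
# ceph osd pool set cephfs_cache target_max_objects 1000000
# ceph osd pool set cephfs_cache cache_target_dirty_ratio 0.4
# ceph osd pool set cephfs_cache cache_target_full_ratio 0.8
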
On Fri, Oct 6, 2017 at 12:54 PM Shawfeng Dong  wrote:

> Curiously, it has been quite a while, but there is still no object in the
> underlying data pool:
> # rados -p cephfs_data ls
>
> Any advice?
>
> On Fri, Oct 6, 2017 at 9:45 AM, David Turner 
> wrote:
>
>> Notice in the URL for the documentation the use of "luminous".  When you
>> looked a few weeks ago, you might have been looking at the documentation
>> for a different version of Ceph.  You can change that to jewel, hammer,
>> kraken, master, etc depending on which version of Ceph you are running or
>> reading about.  Google gets confused and will pull up random versions of
>> the ceph documentation for a page. It's on us to make sure that the url is
>> pointing to the version of Ceph that we are using.
>>
>> While it's sitting there in the flush command, can you see if there are
>> any objects in the underlying data pool?  Hopefully the count will be
>> growing.
>>
>> On Fri, Oct 6, 2017 at 12:39 PM Shawfeng Dong  wrote:
>>
>>> Hi Christian,
>>>
>>> I set those via CLI:
>>> # ceph osd pool set cephfs_cache target_max_bytes 1099511627776
>>> # ceph osd pool set cephfs_cache target_max_objects 100
>>>
>>> but manual flushing doesn't appear to work:
>>> # rados -p cephfs_cache cache-flush-evict-all
>>> 100046a.0ca6
>>>
>>> it just gets stuck there for a long time.
>>>
>>> Any suggestion? Do I need to restart the daemons or reboot the nodes?
>>>
>>> Thanks,
>>> Shaw
>>>
>>>
>>>
>>> On Fri, Oct 6, 2017 at 9:31 AM, Christian Balzer  wrote:
>>>
 On Fri, 6 Oct 2017 09:14:40 -0700 Shawfeng Dong wrote:

 > I found the command: rados -p cephfs_cache cache-flush-evict-all
 >
 That's not what you want/need.
 Though it will fix your current "full" issue.

 > The documentation (
 > http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/)
 has
 > been improved a lot since I last checked it a few weeks ago!
 >
 The need to set max_bytes and max_objects has been documented for ages
 (since Hammer).

 more below...

 > -Shaw
 >
 > On Fri, Oct 6, 2017 at 9:10 AM, Shawfeng Dong  wrote:
 >
 > > Thanks, Luis.
 > >
 > > I've just set max_bytes and max_objects:
 How?
 Editing the conf file won't help until a restart.

 > > target_max_objects: 100 (1M)
 > > target_max_bytes: 1099511627776 (1TB)
 >
 I'd lower that or the cache_target_full_ratio by another 10%.

 Christian
 > >
 > > but nothing appears to be happening. Is there a way to force
 flushing?
 > >
 > > Thanks,
 > > Shaw
 > >
 > > On Fri, Oct 6, 2017 at 8:55 AM, Luis Periquito 
 > > wrote:
 > >
 > >> Not looking at anything else, you didn't set the max_bytes or
 > >> max_objects for it to start flushing...
 > >>
 > >> On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong 
 wrote:
 > >> > Dear all,
 > >> >
 > >> > Thanks a lot for the very insightful comments/suggestions!
 > >> >
 > >> > There are 3 OSD servers in our pilot Ceph cluster, each with 2x
 1TB SSDs
 > >> > (boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe SSDs. We use
 the
 > >> bluestore
 > >> > backend, with the first NVMe as the WAL and DB devices for OSDs
 on the
 > >> HDDs.
 > >> > And we try to create a cache tier out of the second NVMes.
 > >> >
 > >> > Here are the outputs of the commands suggested by David:
 > >> >
 > >> > 1) # ceph df
 > >> > GLOBAL:
 > >> > SIZE AVAIL RAW USED %RAW USED
 > >> > 265T  262T2847G  1.05
 > >> > POOLS:
 > >> > NAMEID USED  %USED  MAX AVAIL
 > >>  OBJECTS
 > >> > cephfs_data 1  0  0  248T
 > >>  0
 > >> > cephfs_metadata 2  8515k  0  248T
 > >> 24
 > >> > cephfs_cache3  1381G 100.00 0
 > >> 355385
 > >> >
 > >> > 2) # ceph osd df
 > >> >  0   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 174
 > >> >  1   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 169
 > >> >  2   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
 > >> >  3   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 159
 > >> >  4   hdd 7.27829  1.0 7452G 2072M  7450G 

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread Shawfeng Dong
Curiously, it has been quite a while, but there is still no object in the
underlying data pool:
# rados -p cephfs_data ls

Any advice?

On Fri, Oct 6, 2017 at 9:45 AM, David Turner  wrote:

> Notice in the URL for the documentation the use of "luminous".  When you
> looked a few weeks ago, you might have been looking at the documentation
> for a different version of Ceph.  You can change that to jewel, hammer,
> kraken, master, etc depending on which version of Ceph you are running or
> reading about.  Google gets confused and will pull up random versions of
> the ceph documentation for a page. It's on us to make sure that the url is
> pointing to the version of Ceph that we are using.
>
> While it's sitting there in the flush command, can you see if there are
> any objects in the underlying data pool?  Hopefully the count will be
> growing.
>
> On Fri, Oct 6, 2017 at 12:39 PM Shawfeng Dong  wrote:
>
>> Hi Christian,
>>
>> I set those via CLI:
>> # ceph osd pool set cephfs_cache target_max_bytes 1099511627776
>> # ceph osd pool set cephfs_cache target_max_objects 100
>>
>> but manual flushing doesn't appear to work:
>> # rados -p cephfs_cache cache-flush-evict-all
>> 100046a.0ca6
>>
>> it just gets stuck there for a long time.
>>
>> Any suggestion? Do I need to restart the daemons or reboot the nodes?
>>
>> Thanks,
>> Shaw
>>
>>
>>
>> On Fri, Oct 6, 2017 at 9:31 AM, Christian Balzer  wrote:
>>
>>> On Fri, 6 Oct 2017 09:14:40 -0700 Shawfeng Dong wrote:
>>>
>>> > I found the command: rados -p cephfs_cache cache-flush-evict-all
>>> >
>>> That's not what you want/need.
>>> Though it will fix your current "full" issue.
>>>
>>> > The documentation (
>>> > http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/)
>>> has
>>> > been improved a lot since I last checked it a few weeks ago!
>>> >
>>> The need to set max_bytes and max_objects has been documented for ages
>>> (since Hammer).
>>>
>>> more below...
>>>
>>> > -Shaw
>>> >
>>> > On Fri, Oct 6, 2017 at 9:10 AM, Shawfeng Dong  wrote:
>>> >
>>> > > Thanks, Luis.
>>> > >
>>> > > I've just set max_bytes and max_objects:
>>> How?
>>> Editing the conf file won't help until a restart.
>>>
>>> > > target_max_objects: 100 (1M)
>>> > > target_max_bytes: 1099511627776 (1TB)
>>> >
>>> I'd lower that or the cache_target_full_ratio by another 10%.
>>>
>>> Christian
>>> > >
>>> > > but nothing appears to be happening. Is there a way to force
>>> flushing?
>>> > >
>>> > > Thanks,
>>> > > Shaw
>>> > >
>>> > > On Fri, Oct 6, 2017 at 8:55 AM, Luis Periquito 
>>> > > wrote:
>>> > >
>>> > >> Not looking at anything else, you didn't set the max_bytes or
>>> > >> max_objects for it to start flushing...
>>> > >>
>>> > >> On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong 
>>> wrote:
>>> > >> > Dear all,
>>> > >> >
>>> > >> > Thanks a lot for the very insightful comments/suggestions!
>>> > >> >
>>> > >> > There are 3 OSD servers in our pilot Ceph cluster, each with 2x
>>> 1TB SSDs
>>> > >> > (boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe SSDs. We use the
>>> > >> bluestore
>>> > >> > backend, with the first NVMe as the WAL and DB devices for OSDs
>>> on the
>>> > >> HDDs.
>>> > >> > And we try to create a cache tier out of the second NVMes.
>>> > >> >
>>> > >> > Here are the outputs of the commands suggested by David:
>>> > >> >
>>> > >> > 1) # ceph df
>>> > >> > GLOBAL:
>>> > >> > SIZE AVAIL RAW USED %RAW USED
>>> > >> > 265T  262T2847G  1.05
>>> > >> > POOLS:
>>> > >> > NAMEID USED  %USED  MAX AVAIL
>>> > >>  OBJECTS
>>> > >> > cephfs_data 1  0  0  248T
>>> > >>  0
>>> > >> > cephfs_metadata 2  8515k  0  248T
>>> > >> 24
>>> > >> > cephfs_cache3  1381G 100.00 0
>>> > >> 355385
>>> > >> >
>>> > >> > 2) # ceph osd df
>>> > >> >  0   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 174
>>> > >> >  1   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 169
>>> > >> >  2   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
>>> > >> >  3   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 159
>>> > >> >  4   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
>>> > >> >  5   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
>>> > >> >  6   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
>>> > >> >  7   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 179
>>> > >> >  8   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 163
>>> > >> >  9   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 194
>>> > >> > 10   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 185
>>> > >> > 11   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
>>> > >> > 36  nvme 1.09149  1.0 1117G  855G   262G 76.53 73.01  79
>>> > >> > 12   hdd 7.27829  1.0 7452G 2072M  

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread David Turner
Notice in the URL for the documentation the use of "luminous".  When you
looked a few weeks ago, you might have been looking at the documentation
for a different version of Ceph.  You can change that to jewel, hammer,
kraken, master, etc depending on which version of Ceph you are running or
reading about.  Google gets confused and will pull up random versions of
the ceph documentation for a page. It's on us to make sure that the url is
pointing to the version of Ceph that we are using.

While it's sitting there in the flush command, can you see if there are any
objects in the underlying data pool?  Hopefully the count will be growing.
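
For example (data pool name from earlier in the thread):

# rados -p cephfs_data ls | wc -l

or simply watch the OBJECTS column for cephfs_data in 'ceph df' while the
flush runs.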

On Fri, Oct 6, 2017 at 12:39 PM Shawfeng Dong  wrote:

> Hi Christian,
>
> I set those via CLI:
> # ceph osd pool set cephfs_cache target_max_bytes 1099511627776
> # ceph osd pool set cephfs_cache target_max_objects 100
>
> but manual flushing doesn't appear to work:
> # rados -p cephfs_cache cache-flush-evict-all
> 100046a.0ca6
>
> it just gets stuck there for a long time.
>
> Any suggestion? Do I need to restart the daemons or reboot the nodes?
>
> Thanks,
> Shaw
>
>
>
> On Fri, Oct 6, 2017 at 9:31 AM, Christian Balzer  wrote:
>
>> On Fri, 6 Oct 2017 09:14:40 -0700 Shawfeng Dong wrote:
>>
>> > I found the command: rados -p cephfs_cache cache-flush-evict-all
>> >
>> That's not what you want/need.
>> Though it will fix your current "full" issue.
>>
>> > The documentation (
>> > http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/) has
>> > been improved a lot since I last checked it a few weeks ago!
>> >
>> The need to set max_bytes and max_objects has been documented for ages
>> (since Hammer).
>>
>> more below...
>>
>> > -Shaw
>> >
>> > On Fri, Oct 6, 2017 at 9:10 AM, Shawfeng Dong  wrote:
>> >
>> > > Thanks, Luis.
>> > >
>> > > I've just set max_bytes and max_objects:
>> How?
>> Editing the conf file won't help until a restart.
>>
>> > > target_max_objects: 100 (1M)
>> > > target_max_bytes: 1099511627776 (1TB)
>> >
>> I'd lower that or the cache_target_full_ratio by another 10%.
>>
>> Christian
>> > >
>> > > but nothing appears to be happening. Is there a way to force flushing?
>> > >
>> > > Thanks,
>> > > Shaw
>> > >
>> > > On Fri, Oct 6, 2017 at 8:55 AM, Luis Periquito 
>> > > wrote:
>> > >
>> > >> Not looking at anything else, you didn't set the max_bytes or
>> > >> max_objects for it to start flushing...
>> > >>
>> > >> On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong  wrote:
>> > >> > Dear all,
>> > >> >
>> > >> > Thanks a lot for the very insightful comments/suggestions!
>> > >> >
>> > >> > There are 3 OSD servers in our pilot Ceph cluster, each with 2x
>> 1TB SSDs
>> > >> > (boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe SSDs. We use the
>> > >> bluestore
>> > >> > backend, with the first NVMe as the WAL and DB devices for OSDs on
>> the
>> > >> HDDs.
>> > >> > And we try to create a cache tier out of the second NVMes.
>> > >> >
>> > >> > Here are the outputs of the commands suggested by David:
>> > >> >
>> > >> > 1) # ceph df
>> > >> > GLOBAL:
>> > >> > SIZE AVAIL RAW USED %RAW USED
>> > >> > 265T  262T2847G  1.05
>> > >> > POOLS:
>> > >> > NAMEID USED  %USED  MAX AVAIL
>> > >>  OBJECTS
>> > >> > cephfs_data 1  0  0  248T
>> > >>  0
>> > >> > cephfs_metadata 2  8515k  0  248T
>> > >> 24
>> > >> > cephfs_cache3  1381G 100.00 0
>> > >> 355385
>> > >> >
>> > >> > 2) # ceph osd df
>> > >> >  0   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 174
>> > >> >  1   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 169
>> > >> >  2   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
>> > >> >  3   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 159
>> > >> >  4   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
>> > >> >  5   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
>> > >> >  6   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
>> > >> >  7   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 179
>> > >> >  8   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 163
>> > >> >  9   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 194
>> > >> > 10   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 185
>> > >> > 11   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
>> > >> > 36  nvme 1.09149  1.0 1117G  855G   262G 76.53 73.01  79
>> > >> > 12   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 180
>> > >> > 13   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
>> > >> > 14   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 178
>> > >> > 15   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 170
>> > >> > 16   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
>> > >> > 17   hdd 7.27829  1.0 7452G 2072M  

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread Christian Balzer
On Fri, 6 Oct 2017 09:14:40 -0700 Shawfeng Dong wrote:

> I found the command: rados -p cephfs_cache cache-flush-evict-all
> 
That's not what you want/need.
Though it will fix your current "full" issue.

> The documentation (
> http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/) has
> been improved a lot since I last checked it a few weeks ago!
>
The need to set max_bytes and max_objects has been documented for ages
(since Hammer).

more below...

> -Shaw
> 
> On Fri, Oct 6, 2017 at 9:10 AM, Shawfeng Dong  wrote:
> 
> > Thanks, Luis.
> >
> > I've just set max_bytes and max_objects:
How?
Editing the conf file won't help until a restart.

> > target_max_objects: 100 (1M)
> > target_max_bytes: 1099511627776 (1TB)
>
I'd lower that or the cache_target_full_ratio by another 10%.
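
For example, dropping the full ratio from its usual 0.8 default to 0.7:

# ceph osd pool set cephfs_cache cache_target_full_ratio 0.7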

Christian
> >
> > but nothing appears to be happening. Is there a way to force flushing?
> >
> > Thanks,
> > Shaw
> >
> > On Fri, Oct 6, 2017 at 8:55 AM, Luis Periquito 
> > wrote:
> >  
> >> Not looking at anything else, you didn't set the max_bytes or
> >> max_objects for it to start flushing...
> >>
> >> On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong  wrote:  
> >> > Dear all,
> >> >
> >> > Thanks a lot for the very insightful comments/suggestions!
> >> >
> >> > There are 3 OSD servers in our pilot Ceph cluster, each with 2x 1TB SSDs
> >> > (boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe SSDs. We use the  
> >> bluestore  
> >> > backend, with the first NVMe as the WAL and DB devices for OSDs on the  
> >> HDDs.  
> >> > And we try to create a cache tier out of the second NVMes.
> >> >
> >> > Here are the outputs of the commands suggested by David:
> >> >
> >> > 1) # ceph df
> >> > GLOBAL:
> >> > SIZE AVAIL RAW USED %RAW USED
> >> > 265T  262T2847G  1.05
> >> > POOLS:
> >> > NAMEID USED  %USED  MAX AVAIL  
> >>  OBJECTS  
> >> > cephfs_data 1  0  0  248T  
> >>  0  
> >> > cephfs_metadata 2  8515k  0  248T  
> >> 24  
> >> > cephfs_cache3  1381G 100.00 0  
> >> 355385  
> >> >
> >> > 2) # ceph osd df
> >> >  0   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 174
> >> >  1   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 169
> >> >  2   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> >> >  3   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 159
> >> >  4   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> >> >  5   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
> >> >  6   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
> >> >  7   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 179
> >> >  8   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 163
> >> >  9   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 194
> >> > 10   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 185
> >> > 11   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
> >> > 36  nvme 1.09149  1.0 1117G  855G   262G 76.53 73.01  79
> >> > 12   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 180
> >> > 13   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
> >> > 14   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 178
> >> > 15   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 170
> >> > 16   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
> >> > 17   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 203
> >> > 18   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> >> > 19   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 158
> >> > 20   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 154
> >> > 21   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 160
> >> > 22   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 167
> >> > 23   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 188
> >> > 37  nvme 1.09149  1.0 1117G 1061G 57214M 95.00 90.63  98
> >> > 24   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 187
> >> > 25   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 200
> >> > 26   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 147
> >> > 27   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 171
> >> > 28   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
> >> > 29   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 152
> >> > 30   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 174
> >> > 31   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 176
> >> > 32   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 182
> >> > 33   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 155
> >> > 34   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 166
> >> > 35   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 176
> >> > 38  nvme 1.09149  1.0 1117G  857G   260G 76.71 73.18  79
> >> > TOTAL  265T 2847G   262T  1.05
> >> > MIN/MAX VAR: 0.03/90.63  STDDEV: 

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread Shawfeng Dong
I found the command: rados -p cephfs_cache cache-flush-evict-all

The documentation (
http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/) has
been improved a lot since I last checked it a few weeks ago!

-Shaw

On Fri, Oct 6, 2017 at 9:10 AM, Shawfeng Dong  wrote:

> Thanks, Luis.
>
> I've just set max_bytes and max_objects:
> target_max_objects: 100 (1M)
> target_max_bytes: 1099511627776 (1TB)
>
> but nothing appears to be happening. Is there a way to force flushing?
>
> Thanks,
> Shaw
>
> On Fri, Oct 6, 2017 at 8:55 AM, Luis Periquito 
> wrote:
>
>> Not looking at anything else, you didn't set the max_bytes or
>> max_objects for it to start flushing...
>>
>> On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong  wrote:
>> > Dear all,
>> >
>> > Thanks a lot for the very insightful comments/suggestions!
>> >
>> > There are 3 OSD servers in our pilot Ceph cluster, each with 2x 1TB SSDs
>> > (boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe SSDs. We use the
>> bluestore
>> > backend, with the first NVMe as the WAL and DB devices for OSDs on the
>> HDDs.
>> > And we try to create a cache tier out of the second NVMes.
>> >
>> > Here are the outputs of the commands suggested by David:
>> >
>> > 1) # ceph df
>> > GLOBAL:
>> > SIZE AVAIL RAW USED %RAW USED
>> > 265T  262T2847G  1.05
>> > POOLS:
>> > NAMEID USED  %USED  MAX AVAIL
>>  OBJECTS
>> > cephfs_data 1  0  0  248T
>>  0
>> > cephfs_metadata 2  8515k  0  248T
>> 24
>> > cephfs_cache3  1381G 100.00 0
>> 355385
>> >
>> > 2) # ceph osd df
>> >  0   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 174
>> >  1   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 169
>> >  2   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
>> >  3   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 159
>> >  4   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
>> >  5   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
>> >  6   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
>> >  7   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 179
>> >  8   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 163
>> >  9   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 194
>> > 10   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 185
>> > 11   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
>> > 36  nvme 1.09149  1.0 1117G  855G   262G 76.53 73.01  79
>> > 12   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 180
>> > 13   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
>> > 14   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 178
>> > 15   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 170
>> > 16   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
>> > 17   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 203
>> > 18   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
>> > 19   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 158
>> > 20   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 154
>> > 21   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 160
>> > 22   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 167
>> > 23   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 188
>> > 37  nvme 1.09149  1.0 1117G 1061G 57214M 95.00 90.63  98
>> > 24   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 187
>> > 25   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 200
>> > 26   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 147
>> > 27   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 171
>> > 28   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
>> > 29   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 152
>> > 30   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 174
>> > 31   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 176
>> > 32   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 182
>> > 33   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 155
>> > 34   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 166
>> > 35   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 176
>> > 38  nvme 1.09149  1.0 1117G  857G   260G 76.71 73.18  79
>> > TOTAL  265T 2847G   262T  1.05
>> > MIN/MAX VAR: 0.03/90.63  STDDEV: 22.81
>> >
>> > 3) # ceph osd tree
>> > -1   265.29291 root default
>> > -388.43097 host pulpo-osd01
>> >  0   hdd   7.27829 osd.0up  1.0 1.0
>> >  1   hdd   7.27829 osd.1up  1.0 1.0
>> >  2   hdd   7.27829 osd.2up  1.0 1.0
>> >  3   hdd   7.27829 osd.3up  1.0 1.0
>> >  4   hdd   7.27829 osd.4up  1.0 1.0
>> >  5   hdd   7.27829 osd.5up  1.0 1.0
>> >  6   hdd   7.27829 osd.6up  

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread Christian Balzer
On Fri, 6 Oct 2017 16:55:31 +0100 Luis Periquito wrote:

> Not looking at anything else, you didn't set the max_bytes or
> max_objects for it to start flushing...
> 
Precisely!
He says, cackling, as he goes to cash in his bet. ^o^


> On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong  wrote:
> > Dear all,
> >
> > Thanks a lot for the very insightful comments/suggestions!
> >
> > There are 3 OSD servers in our pilot Ceph cluster, each with 2x 1TB SSDs
> > (boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe SSDs. We use the bluestore
> > backend, with the first NVMe as the WAL and DB devices for OSDs on the HDDs.
> > And we try to create a cache tier out of the second NVMes.
> >
> > Here are the outputs of the commands suggested by David:
> >
> > 1) # ceph df
> > GLOBAL:
> > SIZE AVAIL RAW USED %RAW USED
> > 265T  262T2847G  1.05
> > POOLS:
> > NAMEID USED  %USED  MAX AVAIL OBJECTS
> > cephfs_data 1  0  0  248T   0
> > cephfs_metadata 2  8515k  0  248T  24
> > cephfs_cache3  1381G 100.00 0  355385
> >
> > 2) # ceph osd df
> >  0   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 174
> >  1   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 169
> >  2   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> >  3   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 159
> >  4   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> >  5   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
> >  6   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
> >  7   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 179
> >  8   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 163
> >  9   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 194
> > 10   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 185
> > 11   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
> > 36  nvme 1.09149  1.0 1117G  855G   262G 76.53 73.01  79
> > 12   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 180
> > 13   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
> > 14   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 178
> > 15   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 170
> > 16   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
> > 17   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 203
> > 18   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> > 19   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 158
> > 20   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 154
> > 21   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 160
> > 22   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 167
> > 23   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 188
> > 37  nvme 1.09149  1.0 1117G 1061G 57214M 95.00 90.63  98
> > 24   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 187
> > 25   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 200
> > 26   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 147
> > 27   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 171
> > 28   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
> > 29   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 152
> > 30   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 174
> > 31   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 176
> > 32   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 182
> > 33   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 155
> > 34   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 166
> > 35   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 176
> > 38  nvme 1.09149  1.0 1117G  857G   260G 76.71 73.18  79
> > TOTAL  265T 2847G   262T  1.05
> > MIN/MAX VAR: 0.03/90.63  STDDEV: 22.81
> >
> > 3) # ceph osd tree
> > -1   265.29291 root default
> > -388.43097 host pulpo-osd01
> >  0   hdd   7.27829 osd.0up  1.0 1.0
> >  1   hdd   7.27829 osd.1up  1.0 1.0
> >  2   hdd   7.27829 osd.2up  1.0 1.0
> >  3   hdd   7.27829 osd.3up  1.0 1.0
> >  4   hdd   7.27829 osd.4up  1.0 1.0
> >  5   hdd   7.27829 osd.5up  1.0 1.0
> >  6   hdd   7.27829 osd.6up  1.0 1.0
> >  7   hdd   7.27829 osd.7up  1.0 1.0
> >  8   hdd   7.27829 osd.8up  1.0 1.0
> >  9   hdd   7.27829 osd.9up  1.0 1.0
> > 10   hdd   7.27829 osd.10   up  1.0 1.0
> > 11   hdd   7.27829 osd.11   up  1.0 1.0
> > 36  nvme   1.09149 osd.36   up  1.0 1.0
> > -588.43097 host pulpo-osd02
> > 12   hdd   7.27829 osd.12   up  1.0 1.0
> > 13   hdd   7.27829

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread Shawfeng Dong
Thanks, Luis.

I've just set max_bytes and max_objects:
target_max_objects: 100 (1M)
target_max_bytes: 1099511627776 (1TB)

but nothing appears to be happening. Is there a way to force flushing?

Thanks,
Shaw

On Fri, Oct 6, 2017 at 8:55 AM, Luis Periquito  wrote:

> Not looking at anything else, you didn't set the max_bytes or
> max_objects for it to start flushing...
>
> On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong  wrote:
> > Dear all,
> >
> > Thanks a lot for the very insightful comments/suggestions!
> >
> > There are 3 OSD servers in our pilot Ceph cluster, each with 2x 1TB SSDs
> > (boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe SSDs. We use the
> bluestore
> > backend, with the first NVMe as the WAL and DB devices for OSDs on the
> HDDs.
> > And we try to create a cache tier out of the second NVMes.
> >
> > Here are the outputs of the commands suggested by David:
> >
> > 1) # ceph df
> > GLOBAL:
> > SIZE AVAIL RAW USED %RAW USED
> > 265T  262T2847G  1.05
> > POOLS:
> > NAMEID USED  %USED  MAX AVAIL OBJECTS
> > cephfs_data 1  0  0  248T   0
> > cephfs_metadata 2  8515k  0  248T  24
> > cephfs_cache3  1381G 100.00 0  355385
> >
> > 2) # ceph osd df
> >  0   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 174
> >  1   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 169
> >  2   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> >  3   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 159
> >  4   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> >  5   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
> >  6   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
> >  7   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 179
> >  8   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 163
> >  9   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 194
> > 10   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 185
> > 11   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
> > 36  nvme 1.09149  1.0 1117G  855G   262G 76.53 73.01  79
> > 12   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 180
> > 13   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
> > 14   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 178
> > 15   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 170
> > 16   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
> > 17   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 203
> > 18   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> > 19   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 158
> > 20   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 154
> > 21   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 160
> > 22   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 167
> > 23   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 188
> > 37  nvme 1.09149  1.0 1117G 1061G 57214M 95.00 90.63  98
> > 24   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 187
> > 25   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 200
> > 26   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 147
> > 27   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 171
> > 28   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
> > 29   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 152
> > 30   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 174
> > 31   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 176
> > 32   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 182
> > 33   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 155
> > 34   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 166
> > 35   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 176
> > 38  nvme 1.09149  1.0 1117G  857G   260G 76.71 73.18  79
> > TOTAL  265T 2847G   262T  1.05
> > MIN/MAX VAR: 0.03/90.63  STDDEV: 22.81
> >
> > 3) # ceph osd tree
> > -1   265.29291 root default
> > -388.43097 host pulpo-osd01
> >  0   hdd   7.27829 osd.0up  1.0 1.0
> >  1   hdd   7.27829 osd.1up  1.0 1.0
> >  2   hdd   7.27829 osd.2up  1.0 1.0
> >  3   hdd   7.27829 osd.3up  1.0 1.0
> >  4   hdd   7.27829 osd.4up  1.0 1.0
> >  5   hdd   7.27829 osd.5up  1.0 1.0
> >  6   hdd   7.27829 osd.6up  1.0 1.0
> >  7   hdd   7.27829 osd.7up  1.0 1.0
> >  8   hdd   7.27829 osd.8up  1.0 1.0
> >  9   hdd   7.27829 osd.9up  1.0 1.0
> > 10   hdd   7.27829 osd.10   up  1.0 1.0
> > 11   hdd   7.27829 osd.11   up  1.0 1.0
> > 36  nvme   1.09149 

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread Luis Periquito
Not looking at anything else, you didn't set the max_bytes or
max_objects for it to start flushing...

On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong  wrote:
> Dear all,
>
> Thanks a lot for the very insightful comments/suggestions!
>
> There are 3 OSD servers in our pilot Ceph cluster, each with 2x 1TB SSDs
> (boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe SSDs. We use the bluestore
> backend, with the first NVMe as the WAL and DB devices for OSDs on the HDDs.
> And we try to create a cache tier out of the second NVMes.
>
> Here are the outputs of the commands suggested by David:
>
> 1) # ceph df
> GLOBAL:
> SIZE AVAIL RAW USED %RAW USED
> 265T  262T2847G  1.05
> POOLS:
> NAMEID USED  %USED  MAX AVAIL OBJECTS
> cephfs_data 1  0  0  248T   0
> cephfs_metadata 2  8515k  0  248T  24
> cephfs_cache3  1381G 100.00 0  355385
>
> 2) # ceph osd df
>  0   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 174
>  1   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 169
>  2   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
>  3   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 159
>  4   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
>  5   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
>  6   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
>  7   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 179
>  8   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 163
>  9   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 194
> 10   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 185
> 11   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
> 36  nvme 1.09149  1.0 1117G  855G   262G 76.53 73.01  79
> 12   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 180
> 13   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
> 14   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 178
> 15   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 170
> 16   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
> 17   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 203
> 18   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
> 19   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 158
> 20   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 154
> 21   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 160
> 22   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 167
> 23   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 188
> 37  nvme 1.09149  1.0 1117G 1061G 57214M 95.00 90.63  98
> 24   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 187
> 25   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 200
> 26   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 147
> 27   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 171
> 28   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
> 29   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 152
> 30   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 174
> 31   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 176
> 32   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 182
> 33   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 155
> 34   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 166
> 35   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 176
> 38  nvme 1.09149  1.0 1117G  857G   260G 76.71 73.18  79
> TOTAL  265T 2847G   262T  1.05
> MIN/MAX VAR: 0.03/90.63  STDDEV: 22.81
>
> 3) # ceph osd tree
> -1   265.29291 root default
> -388.43097 host pulpo-osd01
>  0   hdd   7.27829 osd.0up  1.0 1.0
>  1   hdd   7.27829 osd.1up  1.0 1.0
>  2   hdd   7.27829 osd.2up  1.0 1.0
>  3   hdd   7.27829 osd.3up  1.0 1.0
>  4   hdd   7.27829 osd.4up  1.0 1.0
>  5   hdd   7.27829 osd.5up  1.0 1.0
>  6   hdd   7.27829 osd.6up  1.0 1.0
>  7   hdd   7.27829 osd.7up  1.0 1.0
>  8   hdd   7.27829 osd.8up  1.0 1.0
>  9   hdd   7.27829 osd.9up  1.0 1.0
> 10   hdd   7.27829 osd.10   up  1.0 1.0
> 11   hdd   7.27829 osd.11   up  1.0 1.0
> 36  nvme   1.09149 osd.36   up  1.0 1.0
> -588.43097 host pulpo-osd02
> 12   hdd   7.27829 osd.12   up  1.0 1.0
> 13   hdd   7.27829 osd.13   up  1.0 1.0
> 14   hdd   7.27829 osd.14   up  1.0 1.0
> 15   hdd   7.27829 osd.15   up  1.0 1.0
> 16   hdd   7.27829 osd.16   up  1.0 1.0
> 17   hdd   7.27829 osd.17   up  1.0 

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread Shawfeng Dong
Dear all,

Thanks a lot for the very insightful comments/suggestions!

There are 3 OSD servers in our pilot Ceph cluster, each with 2x 1TB SSDs
(boot disks), 12x 8TB SATA HDDs and 2x 1.2TB NVMe SSDs. We use the
bluestore backend, with the first NVMe as the WAL and DB devices for OSDs
on the HDDs. And we try to create a cache tier out of the second NVMes.

Here are the outputs of the commands suggested by David:

1) # ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
265T  262T2847G  1.05
POOLS:
NAMEID USED  %USED  MAX AVAIL OBJECTS
cephfs_data 1  0  0  248T   0
cephfs_metadata 2  8515k  0  248T  24
cephfs_cache3  1381G 100.00 0  355385

2) # ceph osd df
 0   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 174
 1   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 169
 2   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
 3   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 159
 4   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
 5   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
 6   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
 7   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 179
 8   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 163
 9   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 194
10   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 185
11   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
36  nvme 1.09149  1.0 1117G  855G   262G 76.53 73.01  79
12   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 180
13   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 168
14   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 178
15   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 170
16   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 149
17   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 203
18   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 173
19   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 158
20   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 154
21   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 160
22   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 167
23   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 188
37  nvme 1.09149  1.0 1117G 1061G 57214M 95.00 90.63  98
24   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 187
25   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 200
26   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 147
27   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 171
28   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 162
29   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 152
30   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 174
31   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 176
32   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 182
33   hdd 7.27829  1.0 7452G 2072M  7450G  0.03  0.03 155
34   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 166
35   hdd 7.27829  1.0 7452G 2076M  7450G  0.03  0.03 176
38  nvme 1.09149  1.0 1117G  857G   260G 76.71 73.18  79
TOTAL  265T 2847G   262T  1.05
MIN/MAX VAR: 0.03/90.63  STDDEV: 22.81

3) # ceph osd tree
-1   265.29291 root default
-3    88.43097 host pulpo-osd01
 0   hdd   7.27829 osd.0    up  1.0 1.0
 1   hdd   7.27829 osd.1    up  1.0 1.0
 2   hdd   7.27829 osd.2    up  1.0 1.0
 3   hdd   7.27829 osd.3    up  1.0 1.0
 4   hdd   7.27829 osd.4    up  1.0 1.0
 5   hdd   7.27829 osd.5    up  1.0 1.0
 6   hdd   7.27829 osd.6    up  1.0 1.0
 7   hdd   7.27829 osd.7    up  1.0 1.0
 8   hdd   7.27829 osd.8    up  1.0 1.0
 9   hdd   7.27829 osd.9    up  1.0 1.0
10   hdd   7.27829 osd.10   up  1.0 1.0
11   hdd   7.27829 osd.11   up  1.0 1.0
36  nvme   1.09149 osd.36   up  1.0 1.0
-5    88.43097 host pulpo-osd02
12   hdd   7.27829 osd.12   up  1.0 1.0
13   hdd   7.27829 osd.13   up  1.0 1.0
14   hdd   7.27829 osd.14   up  1.0 1.0
15   hdd   7.27829 osd.15   up  1.0 1.0
16   hdd   7.27829 osd.16   up  1.0 1.0
17   hdd   7.27829 osd.17   up  1.0 1.0
18   hdd   7.27829 osd.18   up  1.0 1.0
19   hdd   7.27829 osd.19   up  1.0 1.0
20   hdd   7.27829 osd.20   up  1.0 1.0
21   hdd   7.27829 osd.21   up  1.0 1.0
22   hdd   7.27829 osd.22   up  1.0 1.0
23   hdd   

Re: [ceph-users] Ceph cache pool full

2017-10-06 Thread David Turner
On Fri, Oct 6, 2017, 1:05 AM Christian Balzer  wrote:

>
> Hello,
>
> On Fri, 06 Oct 2017 03:30:41 + David Turner wrote:
>
> > You're missing most all of the important bits. What the osds in your
> > cluster look like, your tree, and your cache pool settings.
> >
> > ceph df
> > ceph osd df
> > ceph osd tree
> > ceph osd pool get cephfs_cache all
> >
> Especially the last one.
>
> My money is on not having set target_max_objects and target_max_bytes to
> sensible values along with the ratios.
> In short, not having read the (albeit spotty) documentation.
>
> > You have your writeback cache on 3 nvme drives. It looks like you have
> > 1.6TB available between them for the cache. I don't know the behavior of
> a
> > writeback cache tier on cephfs for large files, but I would guess that it
> > can only hold full files and not flush partial files.
>
> I VERY much doubt that, if so it would be a massive flaw.
> One assumes that cache operations work on the RADOS object level, no
> matter what.
>
I hope that it is on the rados level, but not a single object had been
flushed to the backing pool. So I hazarded a guess. Seeing his settings
will shed more light.

>
> > That would mean your
> > cache needs to have enough space for any file being written to the
> cluster.
> > In this case a 1.3TB file with 3x replication would require 3.9TB (more
> > than double what you have available) of available space in your writeback
> > cache.
> >
> > There are very few use cases that benefit from a cache tier. The docs for
> > Luminous warn as much.
> You keep repeating that like a broken record.
>
> And while certainly not false I for one wouldn't be able to use (justify
> using) Ceph w/o cache tiers in our main use case.


> In this case I assume they were following an old cheat sheet or such,
> suggesting the previously required cache tier with EC pools.
>

http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/

I know I keep repeating it, especially recently as there have been a lot of
people asking about it. The Luminous docs added a large section about how
it is probably not what you want. Like me, it is not saying that there are
no use cases for it. There was no information provided about the use case
and I made some suggestions/guesses. I'm also guessing that they are
following a guide where a writeback cache was necessary for CephFS to use
EC prior to Luminous. I also usually add that people should test it out and
find what works best for them. I will always defer to your practical use of
cache tiers as well, especially when using rbds.

I manage a cluster on which I intend to keep running a writeback cache in
front of CephFS, on the same drives as the EC pool. The use case benefits
enough from the cache tier that it doesn't even need flash media to show
it. The cluster is used for video editing; files are usually modified and
read within the first 24 hours and then left in cold storage until
deleted. I have the cache set to keep everything for 24 hours and then
evict it, by setting the minimum time to flush and evict to 24 hours and
target max bytes to 0. Every file sits in the cache for that window, and
the tier never has to decide what to keep because it keeps nothing longer
than that. Luckily, read performance from cold storage is not a
requirement for this cluster, since any read has to first read the object
from EC storage, write it to replica storage, and then read it from
replica storage... Yuck.
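
For reference, that setup amounts to something like the following on the
cache pool (values are illustrative; 86400 seconds = 24 hours, and the
pool name is assumed):

# ceph osd pool set cephfs_cache cache_min_flush_age 86400
# ceph osd pool set cephfs_cache cache_min_evict_age 86400
# ceph osd pool set cephfs_cache target_max_bytes 0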

>
> Christian
>
> >What is your goal by implementing this cache? If the
> > answer is to utilize extra space on the nvmes, then just remove it and
> say
> > thank you. The better use of nvmes in that case are as a part of the
> > bluestore stack and give your osds larger DB partitions. Keeping your
> > metadata pool on nvmes is still a good idea.
> >
> > On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong  wrote:
> >
> > > Dear all,
> > >
> > > We just set up a Ceph cluster, running the latest stable release Ceph
> > > v12.2.0 (Luminous):
> > > # ceph --version
> > > ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
> > > (rc)
> > >
> > > The goal is to serve Ceph filesystem, for which we created 3 pools:
> > > # ceph osd lspools
> > > 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> > > where
> > > * cephfs_data is the data pool (36 OSDs on HDDs), which is
> erased-coded;
> > > * cephfs_metadata is the metadata pool
> > > * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
> > > cache-mode is writeback.
> > >
> > > Everything had worked fine, until today when we tried to copy a 1.3TB
> file
> > > to the CephFS.  We got the "No space left on device" error!
> > >
> > > 'ceph -s' says some OSDs are full:
> > > # ceph -s
> > >   cluster:
> > > id: e18516bf-39cb-4670-9f13-88ccb7d19769
> > > health: HEALTH_ERR
> > > full flag(s) set
> > > 1 full osd(s)
> > > 1 pools have many more objects per pg than average

Re: [ceph-users] Ceph cache pool full

2017-10-05 Thread Christian Wuerdig
The default file size limit for CephFS is 1TB; see also here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-May/018208.html
(it also includes a pointer on how to increase it).
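
So even with the cache tier sorted out, a 1.3TB file would still hit that
cap. Roughly, checking and raising it looks like this (untested here; the
filesystem appears to be named pulpos, judging by the ceph -s output, and
the size is in bytes, 2199023255552 being 2TiB):

# ceph fs get pulpos | grep max_file_size
# ceph fs set pulpos max_file_size 2199023255552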

On Fri, Oct 6, 2017 at 12:45 PM, Shawfeng Dong  wrote:
> Dear all,
>
> We just set up a Ceph cluster, running the latest stable release Ceph
> v12.2.0 (Luminous):
> # ceph --version
> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>
> The goal is to serve Ceph filesystem, for which we created 3 pools:
> # ceph osd lspools
> 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> where
> * cephfs_data is the data pool (36 OSDs on HDDs), which is erased-coded;
> * cephfs_metadata is the metadata pool
> * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
> cache-mode is writeback.
>
> Everything had worked fine, until today when we tried to copy a 1.3TB file
> to the CephFS.  We got the "No space left on device" error!
>
> 'ceph -s' says some OSDs are full:
> # ceph -s
>   cluster:
> id: e18516bf-39cb-4670-9f13-88ccb7d19769
> health: HEALTH_ERR
> full flag(s) set
> 1 full osd(s)
> 1 pools have many more objects per pg than average
>
>   services:
> mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
> mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
> mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
> osd: 39 osds: 39 up, 39 in
>  flags full
>
>   data:
> pools:   3 pools, 2176 pgs
> objects: 347k objects, 1381 GB
> usage:   2847 GB used, 262 TB / 265 TB avail
> pgs: 2176 active+clean
>
>   io:
> client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr
>
> And indeed the cache pool is full:
> # rados df
> POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED   RD_OPS    RD  WR_OPS     WR
> cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063  1611G
> cephfs_data         0       0      0      0                  0       0        0        0     0       0      0
> cephfs_metadata 8515k      24      0     72                  0       0        0        3    30  723953 10541k
>
> total_objects 355409
> total_used   2847G
> total_avail  262T
> total_space  265T
>
> However, the data pool is completely empty! So it seems that data has only
> been written to the cache pool, but not written back to the data pool.
>
> I am really at a loss whether this is due to a setup error on my part, or a
> Luminous bug. Could anyone shed some light on this? Please let me know if
> you need any further info.
>
> Best,
> Shaw
>


Re: [ceph-users] Ceph cache pool full

2017-10-05 Thread Christian Balzer

Hello,

On Fri, 06 Oct 2017 03:30:41 + David Turner wrote:

> You're missing most all of the important bits. What the osds in your
> cluster look like, your tree, and your cache pool settings.
> 
> ceph df
> ceph osd df
> ceph osd tree
> ceph osd pool get cephfs_cache all
>
Especially the last one.

My money is on not having set target_max_objects and target_max_bytes to
sensible values along with the ratios.
In short, not having read the (albeit spotty) documentation.
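
For the record, those are per-pool settings on the cache pool, along the
lines of (numbers purely illustrative for a ~1.6TB tier; pool name taken
from the original post):

# ceph osd pool set cephfs_cache target_max_bytes 1099511627776
# ceph osd pool set cephfs_cache target_max_objects 1000000
# ceph osd pool set cephfs_cache cache_target_dirty_ratio 0.4
# ceph osd pool set cephfs_cache cache_target_dirty_high_ratio 0.6
# ceph osd pool set cephfs_cache cache_target_full_ratio 0.8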
 
> You have your writeback cache on 3 nvme drives. It looks like you have
> 1.6TB available between them for the cache. I don't know the behavior of a
> writeback cache tier on cephfs for large files, but I would guess that it
> can only hold full files and not flush partial files. 

I VERY much doubt that, if so it would be a massive flaw.
One assumes that cache operations work on the RADOS object level, no
matter what.
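
For reference, CephFS stripes each file into RADOS objects (4MB by
default) named <inode in hex>.<object index>, so the tier operates on
those objects, never on whole files. A quick, illustrative way to list
the objects belonging to one file (paths and pool name are examples, and
the grep pattern is a placeholder for the hex inode printed by the first
command):

# printf '%x\n' $(stat -c %i /mnt/cephfs/bigfile)
# rados -p cephfs_cache ls | grep '^<hex inode>\.' | head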

> That would mean your
> cache needs to have enough space for any file being written to the cluster.
> In this case a 1.3TB file with 3x replication would require 3.9TB (more
> than double what you have available) of available space in your writeback
> cache.
> 
> There are very few use cases that benefit from a cache tier. The docs for
> Luminous warn as much. 
You keep repeating that like a broken record.

And while certainly not false I for one wouldn't be able to use (justify
using) Ceph w/o cache tiers in our main use case.

In this case I assume they were following an old cheat sheet or some such,
suggesting the previously required cache tier with EC pools.

Christian

>What is your goal by implementing this cache? If the
> answer is to utilize extra space on the nvmes, then just remove it and say
> thank you. The better use of nvmes in that case are as a part of the
> bluestore stack and give your osds larger DB partitions. Keeping your
> metadata pool on nvmes is still a good idea.
> 
> On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong  wrote:
> 
> > Dear all,
> >
> > We just set up a Ceph cluster, running the latest stable release Ceph
> > v12.2.0 (Luminous):
> > # ceph --version
> > ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
> > (rc)
> >
> > The goal is to serve Ceph filesystem, for which we created 3 pools:
> > # ceph osd lspools
> > 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> > where
> > * cephfs_data is the data pool (36 OSDs on HDDs), which is erased-coded;
> > * cephfs_metadata is the metadata pool
> > * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
> > cache-mode is writeback.
> >
> > Everything had worked fine, until today when we tried to copy a 1.3TB file
> > to the CephFS.  We got the "No space left on device" error!
> >
> > 'ceph -s' says some OSDs are full:
> > # ceph -s
> >   cluster:
> > id: e18516bf-39cb-4670-9f13-88ccb7d19769
> > health: HEALTH_ERR
> > full flag(s) set
> > 1 full osd(s)
> > 1 pools have many more objects per pg than average
> >
> >   services:
> > mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
> > mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
> > mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
> > osd: 39 osds: 39 up, 39 in
> >  flags full
> >
> >   data:
> > pools:   3 pools, 2176 pgs
> > objects: 347k objects, 1381 GB
> > usage:   2847 GB used, 262 TB / 265 TB avail
> > pgs: 2176 active+clean
> >
> >   io:
> > client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr
> >
> > And indeed the cache pool is full:
> > # rados df
> > POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED   RD_OPS    RD  WR_OPS     WR
> > cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063  1611G
> > cephfs_data         0       0      0      0                  0       0        0        0     0       0      0
> > cephfs_metadata 8515k      24      0     72                  0       0        0        3    30  723953 10541k
> >
> > total_objects 355409
> > total_used   2847G
> > total_avail  262T
> > total_space  265T
> >
> > However, the data pool is completely empty! So it seems that data has only
> > been written to the cache pool, but not written back to the data pool.
> >
> > I am really at a loss whether this is due to a setup error on my part, or
> > a Luminous bug. Could anyone shed some light on this? Please let me know if
> > you need any further info.
> >
> > Best,
> > Shaw


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications

Re: [ceph-users] Ceph cache pool full

2017-10-05 Thread David Turner
You're missing almost all of the important bits: what the OSDs in your
cluster look like, your tree, and your cache pool settings.

ceph df
ceph osd df
ceph osd tree
ceph osd pool get cephfs_cache all

You have your writeback cache on 3 nvme drives. It looks like you have
1.6TB available between them for the cache. I don't know the behavior of a
writeback cache tier on cephfs for large files, but I would guess that it
can only hold full files and not flush partial files. That would mean your
cache needs to have enough space for any file being written to the cluster.
In this case a 1.3TB file with 3x replication would require 3.9TB (more
than double what you have available) of available space in your writeback
cache.

There are very few use cases that benefit from a cache tier; the docs for
Luminous warn as much. What is your goal in implementing this cache? If
the answer is to utilize the extra space on the NVMes, then just remove it
and say thank you. The better use of the NVMes in that case is as part of
the BlueStore stack, giving your OSDs larger DB partitions. Keeping your
metadata pool on NVMes is still a good idea.
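
As a rough illustration of that alternative (untested; the device paths
and rule name are made up, and exact ceph-volume flags vary by Luminous
point release): put each OSD's BlueStore DB on an NVMe partition and pin
the metadata pool to the NVMe device class:

# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
# ceph osd crush rule create-replicated nvme-rule default host nvme
# ceph osd pool set cephfs_metadata crush_rule nvme-rule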

On Thu, Oct 5, 2017, 7:45 PM Shawfeng Dong  wrote:

> Dear all,
>
> We just set up a Ceph cluster, running the latest stable release Ceph
> v12.2.0 (Luminous):
> # ceph --version
> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
> (rc)
>
> The goal is to serve Ceph filesystem, for which we created 3 pools:
> # ceph osd lspools
> 1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
> where
> * cephfs_data is the data pool (36 OSDs on HDDs), which is erased-coded;
> * cephfs_metadata is the metadata pool
> * cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
> cache-mode is writeback.
>
> Everything had worked fine, until today when we tried to copy a 1.3TB file
> to the CephFS.  We got the "No space left on device" error!
>
> 'ceph -s' says some OSDs are full:
> # ceph -s
>   cluster:
> id: e18516bf-39cb-4670-9f13-88ccb7d19769
> health: HEALTH_ERR
> full flag(s) set
> 1 full osd(s)
> 1 pools have many more objects per pg than average
>
>   services:
> mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
> mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
> mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
> osd: 39 osds: 39 up, 39 in
>  flags full
>
>   data:
> pools:   3 pools, 2176 pgs
> objects: 347k objects, 1381 GB
> usage:   2847 GB used, 262 TB / 265 TB avail
> pgs: 2176 active+clean
>
>   io:
> client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr
>
> And indeed the cache pool is full:
> # rados df
> POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED   RD_OPS    RD  WR_OPS     WR
> cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063  1611G
> cephfs_data         0       0      0      0                  0       0        0        0     0       0      0
> cephfs_metadata 8515k      24      0     72                  0       0        0        3    30  723953 10541k
>
> total_objects 355409
> total_used   2847G
> total_avail  262T
> total_space  265T
>
> However, the data pool is completely empty! So it seems that data has only
> been written to the cache pool, but not written back to the data pool.
>
> I am really at a loss whether this is due to a setup error on my part, or
> a Luminous bug. Could anyone shed some light on this? Please let me know if
> you need any further info.
>
> Best,
> Shaw


[ceph-users] Ceph cache pool full

2017-10-05 Thread Shawfeng Dong
Dear all,

We just set up a Ceph cluster, running the latest stable release Ceph
v12.2.0 (Luminous):
# ceph --version
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

The goal is to serve Ceph filesystem, for which we created 3 pools:
# ceph osd lspools
1 cephfs_data,2 cephfs_metadata,3 cephfs_cache,
where
* cephfs_data is the data pool (36 OSDs on HDDs), which is erasure-coded;
* cephfs_metadata is the metadata pool
* cephfs_cache is the cache tier (3 OSDs on NVMes) for cephfs_data. The
cache-mode is writeback.

Everything had worked fine, until today when we tried to copy a 1.3TB file
to the CephFS.  We got the "No space left on device" error!

'ceph -s' says some OSDs are full:
# ceph -s
  cluster:
id: e18516bf-39cb-4670-9f13-88ccb7d19769
health: HEALTH_ERR
full flag(s) set
1 full osd(s)
1 pools have many more objects per pg than average

  services:
mon: 3 daemons, quorum pulpo-admin,pulpo-mon01,pulpo-mds01
mgr: pulpo-mds01(active), standbys: pulpo-admin, pulpo-mon01
mds: pulpos-1/1/1 up  {0=pulpo-mds01=up:active}
osd: 39 osds: 39 up, 39 in
 flags full

  data:
pools:   3 pools, 2176 pgs
objects: 347k objects, 1381 GB
usage:   2847 GB used, 262 TB / 265 TB avail
pgs: 2176 active+clean

  io:
client:   19301 kB/s rd, 2935 op/s rd, 0 op/s wr

And indeed the cache pool is full:
# rados df
POOL_NAME       USED  OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED   RD_OPS    RD  WR_OPS     WR
cephfs_cache    1381G  355385      0 710770                  0       0        0 10004954 1522G 1398063  1611G
cephfs_data         0       0      0      0                  0       0        0        0     0       0      0
cephfs_metadata 8515k      24      0     72                  0       0        0        3    30  723953 10541k

total_objects 355409
total_used   2847G
total_avail  262T
total_space  265T

However, the data pool is completely empty! So it seems that data has only
been written to the cache pool, but not written back to the data pool.

I am really at a loss whether this is due to a setup error on my part, or a
Luminous bug. Could anyone shed some light on this? Please let me know if
you need any further info.

Best,
Shaw