Re: [ceph-users] bluestore compression enabled but no data compressed

2019-03-14 Thread Ragan, Tj (Dr.)
Hi Frank,

Did you ever get the 0.5 compression ratio thing figured out?

Thanks
-TJ Ragan


On 23 Oct 2018, at 16:56, Igor Fedotov <ifedo...@suse.de> wrote:

Hi Frank,


On 10/23/2018 2:56 PM, Frank Schilder wrote:
Dear David and Igor,

thank you very much for your help. I have one more question about chunk sizes 
and data granularity on bluestore and will summarize the information I got on 
bluestore compression at the end.

1) Compression ratio
---

Following Igor's explanation, I tried to understand the numbers for 
compressed_allocated and compressed_original and am somewhat stuck with 
figuring out how bluestore arithmetic works. I created a 32GB file of zeros 
using dd with write size bs=8M on a cephfs with

ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 
pool=con-fs-data-test"
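(For reference, the layout and the test file were created roughly like this; 
the mount point and directory name are just examples:)

  setfattr -n ceph.dir.layout.stripe_unit  -v 4194304 /mnt/cephfs/comp-test
  setfattr -n ceph.dir.layout.stripe_count -v 1       /mnt/cephfs/comp-test
  setfattr -n ceph.dir.layout.object_size  -v 4194304 /mnt/cephfs/comp-test
  setfattr -n ceph.dir.layout.pool -v con-fs-data-test /mnt/cephfs/comp-test
  dd if=/dev/zero of=/mnt/cephfs/comp-test/zeros.bin bs=8M count=4096  # 32 GB of zeros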

The data pool is an 8+2 erasure coded pool with properties

pool 37 'con-fs-data-test' erasure size 10 min_size 9 crush_rule 11 
object_hash rjenkins pg_num 900 pgp_num 900 last_change 9970 flags 
hashpspool,ec_overwrites stripe_width 32768 compression_mode aggressive 
application cephfs

As I understand EC pools, a 4M object is split into 8x0.5M data shards that are 
stored together with 2x0.5M coding shards on one OSD each. So, I would expect a 
full object write to put a 512K chunk on each OSD in the PG. Looking at some 
config options of one of the OSDs, I see:

"bluestore_compression_max_blob_size_hdd": "524288",
"bluestore_compression_min_blob_size_hdd": "131072",
"bluestore_max_blob_size_hdd": "524288",
"bluestore_min_alloc_size_hdd": "65536",

From this, I would conclude that the largest chunk size is 512K, which also 
equals compression_max_blob_size. The minimum allocation size is 64K for any 
object. What I would expect now is that the full object writes to cephfs 
create chunk sizes of 512K per OSD in the PG, meaning that with an all-zero 
file I should observe a compressed_allocated ratio of 64K/512K=0.125 instead 
of the 0.5 reported below. It looks like chunks of 128K are written instead 
of 512K. I'm happy with the 64K granularity, but the observed maximum chunk 
size seems a factor of 4 too small.

Where am I going wrong, what am I overlooking?
Please note how the selection between compression_max_blob_size and 
compression_min_blob_size is performed.

The max blob size threshold is used mainly for objects tagged with flags 
indicating non-random access, e.g. sequential read and/or write, immutable, 
append-only, etc.
Here is how it's determined in the code:
  if ((alloc_hints & CEPH_OSD_ALLOC_HINT_FLAG_SEQUENTIAL_READ) &&
  (alloc_hints & CEPH_OSD_ALLOC_HINT_FLAG_RANDOM_READ) == 0 &&
  (alloc_hints & (CEPH_OSD_ALLOC_HINT_FLAG_IMMUTABLE |
  CEPH_OSD_ALLOC_HINT_FLAG_APPEND_ONLY)) &&
  (alloc_hints & CEPH_OSD_ALLOC_HINT_FLAG_RANDOM_WRITE) == 0) {
dout(20) << __func__ << " will prefer large blob and csum sizes" << dendl;

This is done to minimize the overhead during future random access, since 
reading from a compressed blob requires decompressing the whole blob.
Hence min blob size is used for regular random I/O, which is probably your 
case as well.
You can check the bluestore log (once its level is raised to 20) to confirm 
this, e.g. by looking for the following output:
  dout(20) << __func__ << " prefer csum_order " << wctx->csum_order
   << " target_blob_size 0x" << std::hex << wctx->target_blob_size
   << std::dec << dendl;
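For example, the log level can be raised at runtime on a single OSD (the OSD 
id is just an example):

  ceph tell osd.0 injectargs '--debug_bluestore 20/20'
  # or, on the OSD host, via the admin socket:
  ceph daemon osd.0 config set debug_bluestore 20/20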

So you can simply increase bluestore_compression_min_blob_size_hdd if you want 
longer compressed chunks, with the above-mentioned penalty on subsequent random 
access though.
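A minimal sketch of that change, assuming you simply want the min blob size to 
match the current max blob size (the value is just an example; restart the OSDs 
afterwards, and only newly written data is affected):

  # ceph.conf on the OSD hosts
  [osd]
  bluestore_compression_min_blob_size_hdd = 524288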

2) Bluestore compression configuration
---

If I understand David correctly, pool and OSD settings do *not* override each 
other, but are rather *combined* into a resulting setting as follows. Let

0 - (n)one
1 - (p)assive
2 - (a)ggressive
3 - (f)orce

? - (u)nset

be the 4+1 possible settings of compression modes with numeric values assigned 
as shown. Then, the resulting numeric compression mode for data in a pool on a 
specific OSD is

res_compr_mode = min(mode OSD, mode pool)

or in form of a table:

                 pool
         | n  p  a  f  u
      ---+---------------
       n | n  n  n  n  n
  O    p | n  p  p  p  ?
  S    a | n  p  a  a  ?
  D    f | n  p  a  f  ?
       u | n  ?  ?  ?  u

which would allow for the flexible configuration as mentioned by David below.
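For example, the per-pool part of such a configuration can be applied like this 
(pool name taken from above; the algorithm is just an example):

  ceph osd pool set con-fs-data-test compression_mode aggressive
  ceph osd pool set con-fs-data-test compression_algorithm snappy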

I'm actually not sure I can confirm this. I have some pools where 
compression_mode is not set and which reside on separate OSDs with compression 
enabled, yet there is compressed data on those OSDs. I'm wondering whether I 
polluted my test with the "ceph config set bluestore_compression_mode 
aggressive" that I executed earlier, or whether my interpretation above is 
still wrong. Does the setting issued with "ceph config set 
bluestore_compression_mode aggressive" apply to

[ceph-users] Compression never better than 50%

2019-03-13 Thread Ragan, Tj (Dr.)
Hi All,

I’ve been investigating compression and (long story short) found that I can 
never get better than a 50% compression ratio.

My setup:
Mimic 13.2.2
OSDs: BlueStore, sparse files attached to loop devices (/dev/loop0), LVM to 
create logical volumes.  bluestore_compression_mode: passive
Pool: 3-replica, compression_mode: force, compression_algorithm: zstd
Data: dd if=/dev/zero of=dd_test bs=2M count=1

Before anything is added to rados there are zero objects in the pool, and a 
perf dump of one of the OSDs contains:
"bluestore_allocated": 165281792,
"bluestore_stored": 54092500,
"bluestore_compressed": 0,
"bluestore_compressed_allocated": 0,
"bluestore_compressed_original": 0,
After `sudo rados -p rep3_testing put dd_test dd_test`, `rados ls` shows only 
the one object, and the same OSD as above reports:
"bluestore_allocated": 166330368,
"bluestore_stored": 56189652,
"bluestore_compressed": 592,
"bluestore_compressed_allocated": 1048576,
"bluestore_compressed_original": 2097152,

… which means that bluestore_stored went up by 2097152 bytes (exactly the 
file size of dd_test) and bluestore_allocated went up by 1048576 - which is 
exactly half as much.  (Note that these deltas match the 
bluestore_compressed_* values.)  When I repeated this test using zlib or 
snappy, I got exactly 50% compression with both as well.  If instead I use the 
zstd command line program, this file ends up being just 207 bytes - which is 
what you’d expect for a file of just zeros.
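(Roughly the command-line comparison I mean; these are plain zstd defaults:)

  zstd -k dd_test      # keeps the input and writes dd_test.zst alongside it
  ls -l dd_test.zst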

Can someone please explain why I’m not getting better compression, or ideally 
tell me which setting I’ve forgotten and need to change?

Thank you,

-TJ Ragan





[ceph-users] Files in CephFS data pool

2019-02-15 Thread Ragan, Tj (Dr.)
Is there any way to find out which files are stored in a CephFS data pool?  I 
know you can reference the extended attributes, but those are only relevant for 
files created after the ceph.dir.layout.pool or ceph.file.layout.pool attributes 
are set - I need to know about all the files in a pool.
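The closest I have come up with is a brute-force scan that maps the object 
names in the pool back to inodes and then to paths, roughly like the sketch 
below (pool name and mount point are examples, and it will be slow on a large 
pool):

  rados -p con-fs-data ls | cut -d. -f1 | sort -u | while read ino_hex; do
      find /mnt/cephfs -inum $((16#$ino_hex)) 2>/dev/null
  done

Is there anything better?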

Thanks!

-TJ Ragan




Re: [ceph-users] ceph-deploy won't install luminous

2017-11-15 Thread Ragan, Tj (Dr.)
Yes, I’ve done that.  I’ve also tried changing the priority field from 1 to 2, 
with no effect.


On 15 Nov 2017, at 09:58, Hans van den Bogert <hansbog...@gmail.com> wrote:

Never mind, you already said you are on the latest ceph-deploy, so that can’t 
be it.
I’m not familiar with deploying on CentOS, but I can imagine that the last part 
of the checklist is important:

http://docs.ceph.com/docs/luminous/start/quick-start-preflight/#priorities-preferences

Can you verify that you did that part?
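From memory, that part of the preflight boils down to roughly this on each 
node (so worth double-checking):

  sudo yum install yum-plugin-priorities
  grep priority /etc/yum.repos.d/ceph.repo   # the Ceph repo entries should carry a priority line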

On Nov 15, 2017, at 10:41 AM, Hans van den Bogert <hansbog...@gmail.com> wrote:

Hi,

Can you show the contents of the file, /etc/yum.repos.d/ceph.repo ?

Regards,

Hans
On Nov 15, 2017, at 10:27 AM, Ragan, Tj (Dr.) <tj.ra...@leicester.ac.uk> wrote:

Hi All,

I feel like I’m doing something silly.  I’m spinning up a new cluster, and 
followed the instructions on the pre-flight and quick start here:

http://docs.ceph.com/docs/luminous/start/quick-start-preflight/
http://docs.ceph.com/docs/luminous/start/quick-ceph-deploy/

but ceph-deploy always installs Jewel.

ceph-deploy is version 1.5.39 and I’m running CentOS 7 (7-4.1708)

Any help would be appreciated.

-TJ Ragan





Re: [ceph-users] ceph-deploy won't install luminous

2017-11-15 Thread Ragan, Tj (Dr.)
$ cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
baseurl=http://download.ceph.com/rpm-jewel/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=1

[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-jewel/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=1

[ceph-source]
name=Ceph source packages
baseurl=http://download.ceph.com/rpm-jewel/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=1



It’s worth noting that if I change these to rpm-luminous then it breaks too:

$ sudo sed -i 's/jewel/luminous/' /etc/yum.repos.d/ceph.repo
$ cat !$
cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
baseurl=http://download.ceph.com/rpm-luminous/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=1

[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-luminous/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=1

[ceph-source]
name=Ceph source packages
baseurl=http://download.ceph.com/rpm-luminous/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=1

$ ceph-deploy install --release luminous admin1
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/ceph-deploy/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /bin/ceph-deploy install --release 
luminous admin1
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  testing   : None
[ceph_deploy.cli][INFO  ]  cd_conf   : 

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  dev_commit: None
[ceph_deploy.cli][INFO  ]  install_mds   : False
[ceph_deploy.cli][INFO  ]  stable: None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  adjust_repos  : True
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  install_all   : False
[ceph_deploy.cli][INFO  ]  repo  : False
[ceph_deploy.cli][INFO  ]  host  : ['admin1']
[ceph_deploy.cli][INFO  ]  install_rgw   : False
[ceph_deploy.cli][INFO  ]  install_tests : False
[ceph_deploy.cli][INFO  ]  repo_url  : None
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  install_osd   : False
[ceph_deploy.cli][INFO  ]  version_kind  : stable
[ceph_deploy.cli][INFO  ]  install_common: False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  dev   : master
[ceph_deploy.cli][INFO  ]  nogpgcheck: False
[ceph_deploy.cli][INFO  ]  local_mirror  : None
[ceph_deploy.cli][INFO  ]  release   : luminous
[ceph_deploy.cli][INFO  ]  install_mon   : False
[ceph_deploy.cli][INFO  ]  gpg_url   : None
[ceph_deploy.install][DEBUG ] Installing stable version luminous on cluster 
ceph hosts admin1
[ceph_deploy.install][DEBUG ] Detecting platform for host admin1 ...
[admin1][DEBUG ] connection detected need for sudo
[admin1][DEBUG ] connected to host: admin1
[admin1][DEBUG ] detect platform information from remote host
[admin1][DEBUG ] detect machine type
[ceph_deploy.install][INFO  ] Distro info: CentOS Linux 7.4.1708 Core
[admin1][INFO  ] installing Ceph on admin1
[admin1][INFO  ] Running command: sudo yum clean all
[admin1][DEBUG ] Loaded plugins: fastestmirror, langpacks, priorities
[admin1][DEBUG ] Cleaning repos: Ceph Ceph-noarch base ceph-noarch ceph-source epel extras
[admin1][DEBUG ]   : updates
[admin1][DEBUG ] Cleaning up everything
[admin1][DEBUG ] Maybe you want: rm -rf /var/cache/yum, to also free up space 
taken by orphaned data from disabled or removed repos
[admin1][DEBUG ] Cleaning up list of fastest mirrors
[admin1][INFO  ] Running command: sudo yum -y install epel-release
[admin1][DEBUG ] Loaded plugins: fastestmirror, langpacks, priorities
[admin1][DEBUG ] Determining fastest mirrors
[admin1][DEBUG ]  * base: mirror.netw.io
[admin1][DEBUG ]  * epel: anorien.csc.warwick.ac.uk
[admin1][DEBUG ]  * extras: anorien.csc.warwick.ac.uk

[ceph-users] ceph-deploy won't install luminous

2017-11-15 Thread Ragan, Tj (Dr.)
Hi All,

I feel like I’m doing something silly.  I’m spinning up a new cluster, and 
followed the instructions on the pre-flight and quick start here:

http://docs.ceph.com/docs/luminous/start/quick-start-preflight/
http://docs.ceph.com/docs/luminous/start/quick-ceph-deploy/

but ceph-deploy always installs Jewel.

ceph-deploy is version 1.5.39 and I’m running CentOS 7 (7-4.1708)

Any help would be appreciated.

-TJ Ragan


Re: [ceph-users] ceph zstd not for bluestore due to performance reasons

2017-10-27 Thread Ragan, Tj (Dr.)
Hi Haomai,

According to the documentation, and a brief test to confirm, the lz4 
compression plugin isn’t distributed in the official release.  I’ve tried 
asking Google how to add it back, to no avail, so how have you added the 
plugin?  Is it simply a matter of putting a symlink in the right place, or 
will I have to recompile?
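(For what it's worth, I would expect the installed plugins to show up in the 
compressor plugin directory on the OSD hosts; the path below assumes an 
RPM-based install:)

  ls /usr/lib64/ceph/compressor/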

Any suggestions or pointers would be gratefully received.

-TJ Ragan



On 26 Oct 2017, at 07:44, Haomai Wang wrote:


Stefan Priebe - Profihost AG wrote on Thursday, 26 October 2017 at 17:06:
Hi Sage,

On 25.10.2017 at 21:54, Sage Weil wrote:
> On Wed, 25 Oct 2017, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> in the luminous release notes it is stated that zstd is not supported by
>> bluestore due to performance reasons. I'm wondering about this, since btrfs
>> states that zstd is as fast as lz4 but compresses as well as zlib.
>>
>> Why is zlib then supported by bluestore? And why do btrfs / Facebook
>> behave differently?
>>
>> "BlueStore supports inline compression using zlib, snappy, or LZ4. (Ceph
>> also supports zstd for RGW compression but zstd is not recommended for
>> BlueStore for performance reasons.)"
>
> zstd will work but in our testing the performance wasn't great for
> bluestore in particular.  The problem was that for each compression run
> there is a relatively high start-up cost initializing the zstd
> context/state (IIRC a memset of a huge memory buffer) that dominated the
> execution time... primarily because bluestore is generally compressing
> pretty small chunks of data at a time, not big buffers or streams.
>
> Take a look at unittest_compression timings on compressing 16KB buffers
> (smaller than bluestore usually needs, but illustrative of the problem):
>
> [ RUN  ] Compressor/CompressorTest.compress_16384/0
> [plugin zlib (zlib/isal)]
> [   OK ] Compressor/CompressorTest.compress_16384/0 (294 ms)
> [ RUN  ] Compressor/CompressorTest.compress_16384/1
> [plugin zlib (zlib/noisal)]
> [   OK ] Compressor/CompressorTest.compress_16384/1 (1755 ms)
> [ RUN  ] Compressor/CompressorTest.compress_16384/2
> [plugin snappy (snappy)]
> [   OK ] Compressor/CompressorTest.compress_16384/2 (169 ms)
> [ RUN  ] Compressor/CompressorTest.compress_16384/3
> [plugin zstd (zstd)]
> [   OK ] Compressor/CompressorTest.compress_16384/3 (4528 ms)
>
> It's an order of magnitude slower than zlib or snappy, which probably
> isn't acceptable--even if it is a bit smaller.
>
> We just updated to a newer zstd the other day but I haven't been paying
> attention to the zstd code changes.  When I was working on this the plugin
> was initially also misusing the zstd API, but it was also pointed out
> that the size of the memset is dependent on the compression level.
> Maybe a different (default) choice there would help.
>
> https://github.com/facebook/zstd/issues/408#issuecomment-252163241

Thanks for the fast reply. Btrfs uses a default compression level of 3,
but I think that is zstd's default anyway.

Does the zstd plugin of Ceph already use the mentioned ZSTD_resetCStream
instead of creating and initializing a new context every time?

So if performance matters, would Ceph recommend snappy?


In our tests, lz4 is better than snappy.

Greets,
Stefan
