Re: [ceph-users] RBD cache being filled up in small increases instead of 4MB

2017-07-14 Thread Gregory Farnum
On Fri, Jul 14, 2017 at 3:43 PM, Ruben Rodriguez  wrote:
>
> I'm having an issue with small sequential reads (such as searching
> through source code files, etc), and I found that multiple small reads
> within a 4MB boundary would fetch the same object from the OSD multiple
> times, as it gets inserted into the RBD cache partially.
>
> How to reproduce: rbd image accessed from a Qemu vm using virtio-scsi,
> writethrough cache on. Monitor with perf dump on the rbd client. The
> image is filled up with zeroes in advance. Rbd readahead is off.
>
> 1 - Small read from a previously unread section of the disk:
> dd if=/dev/sdb ibs=512 count=1 skip=41943040 iflag=skip_bytes
> Notes: dd cannot read less than 512 bytes. The skip is arbitrary to
> avoid the beginning of the disk, which would have been read at boot.
>
> Expected outcomes: perf dump should show a +1 increase on values rd,
> cache_ops_miss and op_r. This happens correctly.
> It should show a 4194304 increase in data_read as a whole object is put
> into the cache. Instead it increases by 4096. (not sure why 4096, btw).
>
> 2 - Small read from less than 4MB distance (in the example, +5000b).
> dd if=/dev/sdb ibs=512 count=1 skip=41948040 iflag=skip_bytes
> Expected outcomes: perf dump should show a +1 increase on cache_ops_hit.
> Instead cache_ops_miss increases.
> It should show a 4194304 increase in data_read as a whole object is put
> into the cache. Instead it increases by 4096.
> op_r should not increase. Instead it increases by one, indicating that
> the object was fetched again.
>
> My tests show that this could be causing a 6 to 20-fold performance loss
> in small sequential reads.
>
> Is it by design that the RBD cache only inserts the portion requested by
> the client instead of the whole last object fetched? Could it be a
> tunable in any of my layers (fs, block device, qemu, rbd...) that is
> preventing this?

I don't know the exact readahead default values in that stack, but
there's no general reason to think RBD (or any Ceph component) will
read a whole object at a time. In this case, you're asking for 512
bytes and it appears to have turned that into a 4KB read (probably the
virtual block size in use?), which seems pretty reasonable — if you
were asking for 512 bytes out of every 4MB and it was reading 4MB each
time, you'd probably be wondering why you were only getting 1/8192 the
expected bandwidth. ;)
-Greg
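
For reference, the librbd readahead and cache knobs are plain ceph.conf
options on the client side; a minimal sketch, with purely illustrative
values:

    [client]
    rbd cache = true
    rbd cache size = 33554432                   # per-image cache, example value
    rbd readahead trigger requests = 10         # sequential reads before readahead starts
    rbd readahead max bytes = 4194304           # 0 disables readahead entirely
    rbd readahead disable after bytes = 52428800

Since Ruben ran with readahead off, these only matter if you want librbd
(rather than the guest kernel) to do the prefetching.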

>
> Regards,
> --
> Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
> GPG Key: 05EF 1D2F FE61 747D 1FC8  27C3 7FAC 7D26 472F 4409
> https://fsf.org | https://gnu.org
>
>


[ceph-users] v10.2.9 Jewel released

2017-07-14 Thread Nathan Cutler

v10.2.9 Jewel released
==

This point release fixes a regression introduced in v10.2.8.

We recommend that all Jewel users upgrade.

For more detailed information, see the complete changelog[1]
and release notes[2].

Notable Changes
---

* cephfs: Damaged MDS with 10.2.8 (pr#16282, Nathan Cutler)

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-10.2.9.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* For ceph-deploy, see 
http://docs.ceph.com/docs/master/install/install-ceph-deploy

* Release SHA1: 2ee413f77150c0f375ff6f10edd6c8f9c7d060d0

[1]: http://docs.ceph.com/docs/master/_downloads/v10.2.9.txt
[2]: http://ceph.com/releases/v10-2-9-jewel-released/


[ceph-users] v10.2.8 Jewel released

2017-07-14 Thread Nathan Cutler

v10.2.8 Jewel released
==

This point release brought a number of important bugfixes in all major
components of Ceph. However, it also introduced a regression that
could cause MDS damage, and a new release, v10.2.9, was published to
address this.  Therefore, Jewel users should not upgrade to this
version – instead, we recommend upgrading directly to v10.2.9.

That being said, the v10.2.8 release notes do contain important
information, so please read on.

For more detailed information, refer to the complete changelog[1] and
the release notes[2].

OSD Removal Caveat
--

There was a bug introduced in Jewel (#19119) that broke the mapping
behavior when an “out” OSD that still existed in the CRUSH map was
removed with ‘osd rm’.  This could result in ‘misdirected op’ and
other errors. The bug is now fixed, but the fix itself introduces the
same risk because the behavior may vary between clients and OSDs. To
avoid problems, please ensure that all OSDs are removed from the CRUSH
map before deleting them. That is, be sure to do:

   ceph osd crush rm osd.123

before:

   ceph osd rm osd.123
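
One way to double-check before that final 'ceph osd rm' (illustrative only;
osd.123 is the example id above) is to confirm the id no longer appears in
the CRUSH map:

   ceph osd crush dump | grep '"osd.123"' || echo "osd.123 is gone from the CRUSH map"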

Snap Trimmer Improvements
-

This release greatly improves control and throttling of the snap
trimmer. It introduces the “osd max trimming pgs” option (defaulting
to 2), which limits how many PGs on an OSD can be trimming snapshots
at a time. And it restores the safe use of the “osd snap trim sleep”
option, which defaults to 0 but otherwise adds the given number of
seconds of delay between every dispatch of trim operations to the
underlying system.
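
Both options can also be adjusted at runtime with injectargs; a minimal
sketch (the values shown are examples only, not recommendations):

   ceph tell osd.* injectargs '--osd_max_trimming_pgs 2 --osd_snap_trim_sleep 0.1'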

Other Notable Changes
-

* build/ops: “osd marked itself down” will not recognised if host runs
  mon + osd on shutdown/reboot (pr#13492, Boris Ranto)
* build/ops: ceph-base package missing dependency for psmisc
  (pr#13786, Nathan Cutler)
* build/ops: enable build of ceph-resource-agents package on rpm-based
  os (pr#13606, Nathan Cutler)
* build/ops: rbdmap.service not included in debian packaging
  (jewel-only) (pr#14383, Ken Dreyer)
* cephfs: Journaler may execute on_safe contexts prematurely
  (pr#15468, “Yan, Zheng”)
* cephfs: MDS assert failed when shutting down (issue#19204, pr#14683,
  John Spray)
* cephfs: MDS goes readonly writing backtrace for a file whose data
  pool has been removed (pr#14682, John Spray)
* cephfs: MDS server crashes due to inconsistent metadata (pr#14676,
  John Spray)
* cephfs: No output for ceph mds rmfailed 0 –yes-i-really-mean-it
  command (pr#14674, John Spray)
* cephfs: Test failure: test_data_isolated
  (tasks.cephfs.test_volume_client.TestVolumeClient) (pr#14685, “Yan,
  Zheng”)
* cephfs: Test failure: test_open_inode (issue#18661, pr#14669, John
  Spray)
* cephfs: The mount point break off when mds switch hanppened
  (pr#14679, Guan yunfei)
* cephfs: ceph-fuse does not recover after lost connection to MDS
  (pr#14698, Kefu Chai, Henrik Korkuc, Patrick Donnelly)
* cephfs: client: fix the cross-quota rename boundary check conditions
  (pr#14667, Greg Farnum)
* cephfs: mds is crushed, after I set about 400 64KB xattr kv pairs to
  a file (pr#14684, Yang Honggang)
* cephfs: non-local quota changes not visible until some IO is done
  (pr#15466, John Spray, Nathan Cutler)
* cephfs: normalize file open flags internally used by cephfs
  (pr#15000, Jan Fajerski, “Yan, Zheng”)
* common: monitor creation with IPv6 public network segfaults
  (pr#14324, Fabian Grünbichler)
* common: radosstriper: protect aio_write API from calls with 0 bytes
  (pr#13254, Sebastien Ponce)
* core: Objecter::epoch_barrier isn’t respected in _op_submit()
  (pr#14332, Ilya Dryomov)
* core: clear divergent_priors set off disk (issue#17916, pr#14596,
  Greg Farnum)
* core: improve snap trimming, enable restriction of parallelism
  (pr#14492, Samuel Just, Greg Farnum)
* core: os/filestore/HashIndex: be loud about splits (pr#13788, Dan
  van der Ster)
* core: os/filestore: fix clang static check warn use-after-free
  (pr#14044, liuchang0812, yaoning)
* core: transient jerasure unit test failures (issue#18070,
  issue#17951, pr#14701, Kefu Chai, Pan Liu, Loic Dachary, Jason
  Dillaman)
* core: two instances of omap_digest mismatch (issue#18533, pr#14204,
  Samuel Just, David Zafman)
* doc: Improvements to crushtool manpage (issue#19649, pr#14635, Loic
  Dachary, Nathan Cutler)
* doc: PendingReleaseNotes: note about 19119 (issue#19119, pr#13732,
  Sage Weil)
* doc: admin ops: fix the quota section (issue#19397, pr#14654, Chu,
  Hua-Rong)
* doc: radosgw-admin: add the ‘object stat’ command to usage
  (pr#13872, Pavan Rallabhandi)
* doc: rgw S3 create bucket should not do response in json (pr#13874,
  Abhishek Lekshmanan)
* fs: Invalid error code returned by MDS is causing a kernel client
  WARNING (pr#13831, Jan Fajerski, xie xingguo)
* librbd: Incomplete declaration for ContextWQ in librbd/Journal.h
  (pr#14152, Boris Ranto)
* librbd: Issues with C API image metadata retrieval functions
  (pr#14666, 

Re: [ceph-users] how to list and reset the scrub schedules

2017-07-14 Thread Gregory Farnum
On Fri, Jul 14, 2017 at 5:41 AM Dan van der Ster  wrote:

> Hi,
>
> Occasionally we want to change the scrub schedule for a pool or whole
> cluster, but we want to do this by injecting new settings without
> restarting every daemon.
>
> I've noticed that in jewel, changes to scrub_min/max_interval and
> deep_scrub_interval do not take immediate effect, presumably because
> the scrub schedules are calculated in advance for all the PGs on an
> OSD.
>
> Does anyone know how to list that scrub schedule for a given OSD?
>

I'm not aware of any "scrub schedule" as such, just the constraints around
when new scrubbing happens. What exactly were you doing previously that
isn't working now?
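
For what it's worth, the runtime injection Dan describes would look roughly
like this (option names as in Jewel; the values are only examples):

  ceph tell osd.* injectargs '--osd_scrub_min_interval 86400 --osd_scrub_max_interval 604800 --osd_deep_scrub_interval 1209600'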


>
> And better yet, does anyone know a way to reset that schedule, so that
> the OSD generates a new one with the new configuration?
>
> (I've noticed that by chance setting sortbitwise triggers many scrubs
> -- maybe a new peering interval resets the scrub schedules?) Any
> non-destructive way to trigger a new peering interval on demand?
>
> Cheers,
>
> Dan


Re: [ceph-users] ceph-deploy mgr create error No such file or directory:

2017-07-14 Thread Vasu Kulkarni
On Fri, Jul 14, 2017 at 10:37 AM, Oscar Segarra 
wrote:

> I'm testing on the latest Jewel version I've found in the repositories:
>
You can skip that command then; I will fix the document to add a note for
Jewel or pre-Luminous builds.
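
For the luminous case in the log below, assuming the authentication error
just means the bootstrap-mgr key was never created in the cluster, an
untested sketch of a manual workaround is to create it (with admin
credentials, on the target node) in the path ceph-deploy expects before
re-running the command:

  ceph auth get-or-create client.bootstrap-mgr mon 'allow profile bootstrap-mgr' \
      -o /var/lib/ceph/bootstrap-mgr/ceph.keyring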


>
> [root@vdicnode01 yum.repos.d]# ceph --version
> ceph version 10.2.8 (f5b1f1fd7c0be0506ba73502a675de9d048b744e)
>
> thanks a lot!
>
> 2017-07-14 19:21 GMT+02:00 Vasu Kulkarni :
>
>> It is tested for master and is working fine, I will run those same tests
>> on luminous and check if there is an issue and update here. mgr create is
>> needed for luminous+ builds only.
>>
>> On Fri, Jul 14, 2017 at 10:18 AM, Roger Brown 
>> wrote:
>>
>>> I've been trying to work through similar mgr issues for
>>> Xenial-Luminous...
>>>
>>> roger@desktop:~/ceph-cluster$ ceph-deploy mgr create mon1 nuc2
>>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>>> /home/roger/.cephdeploy.conf
>>> [ceph_deploy.cli][INFO  ] Invoked (1.5.38): /usr/bin/ceph-deploy mgr
>>> create mon1 nuc2
>>> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>>> [ceph_deploy.cli][INFO  ]  username  : None
>>> [ceph_deploy.cli][INFO  ]  verbose   : False
>>> [ceph_deploy.cli][INFO  ]  mgr   : [('mon1',
>>> 'mon1'), ('nuc2', 'nuc2')]
>>> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
>>> [ceph_deploy.cli][INFO  ]  subcommand: create
>>> [ceph_deploy.cli][INFO  ]  quiet : False
>>> [ceph_deploy.cli][INFO  ]  cd_conf   :
>>> 
>>> [ceph_deploy.cli][INFO  ]  cluster   : ceph
>>> [ceph_deploy.cli][INFO  ]  func  : >> at 0x7f25b4772668>
>>> [ceph_deploy.cli][INFO  ]  ceph_conf : None
>>> [ceph_deploy.cli][INFO  ]  default_release   : False
>>> [ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts mon1:mon1
>>> nuc2:nuc2
>>> [mon1][DEBUG ] connection detected need for sudo
>>> [mon1][DEBUG ] connected to host: mon1
>>> [mon1][DEBUG ] detect platform information from remote host
>>> [mon1][DEBUG ] detect machine type
>>> [ceph_deploy.mgr][INFO  ] Distro info: Ubuntu 16.04 xenial
>>> [ceph_deploy.mgr][DEBUG ] remote host will use systemd
>>> [ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to mon1
>>> [mon1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>>> [mon1][DEBUG ] create path if it doesn't exist
>>> [mon1][INFO  ] Running command: sudo ceph --cluster ceph --name
>>> client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring
>>> auth get-or-create mgr.mon1 mon allow profile mgr osd allow * mds allow *
>>> -o /var/lib/ceph/mgr/ceph-mon1/keyring
>>> [mon1][ERROR ] 2017-07-14 11:17:19.667418 7f309613f700  0 librados:
>>> client.bootstrap-mgr authentication error (22) Invalid argument
>>> [mon1][ERROR ] (22, 'error connecting to the cluster')
>>> [mon1][ERROR ] exit code from command was: 1
>>> [ceph_deploy.mgr][ERROR ] could not create mgr
>>> [nuc2][DEBUG ] connection detected need for sudo
>>> [nuc2][DEBUG ] connected to host: nuc2
>>> [nuc2][DEBUG ] detect platform information from remote host
>>> [nuc2][DEBUG ] detect machine type
>>> [ceph_deploy.mgr][INFO  ] Distro info: Ubuntu 16.04 xenial
>>> [ceph_deploy.mgr][DEBUG ] remote host will use systemd
>>> [ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to nuc2
>>> [nuc2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>>> [nuc2][DEBUG ] create path if it doesn't exist
>>> [nuc2][INFO  ] Running command: sudo ceph --cluster ceph --name
>>> client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring
>>> auth get-or-create mgr.nuc2 mon allow profile mgr osd allow * mds allow *
>>> -o /var/lib/ceph/mgr/ceph-nuc2/keyring
>>> [nuc2][ERROR ] 2017-07-14 17:17:21.800166 7fe344f32700  0 librados:
>>> client.bootstrap-mgr authentication error (22) Invalid argument
>>> [nuc2][ERROR ] (22, 'error connecting to the cluster')
>>> [nuc2][ERROR ] exit code from command was: 1
>>> [ceph_deploy.mgr][ERROR ] could not create mgr
>>> [ceph_deploy][ERROR ] GenericError: Failed to create 2 MGRs
>>> roger@desktop:~/ceph-cluster$
>>>
>>>
>>>
>>> On Fri, Jul 14, 2017 at 11:01 AM Oscar Segarra 
>>> wrote:
>>>
 Hi,

 I'm following the instructions of the web (
 http://docs.ceph.com/docs/master/start/quick-ceph-deploy/) and I'm
 trying to create a manager on my first node.

 In my environment I have 2 nodes:

 - vdicnode01 (mon, mgr and osd)
 - vdicnode02 (osd)

 Each server has two NICs, the public and the private, where all ceph
 traffic will go over.

 I have created .local entries in /etc/hosts:

 192.168.100.101   vdicnode01.local
 192.168.100.102   vdicnode02.local

 Public names are resolved via DNS.

 When I try to create the mgr in a 

Re: [ceph-users] PG stuck inconsistent, but appears ok?

2017-07-14 Thread Dan van der Ster
You probably have osd_max_scrubs=1 and the PG just isn't getting a
slot to start.
Here's a little trick to get that going right away:

ceph osd set noscrub
ceph osd set nodeep-scrub
ceph tell osd.* injectargs -- --osd_max_scrubs 2
ceph pg deep-scrub 22.1611
... wait until it starts scrubbing ...
ceph tell osd.* injectargs -- --osd_max_scrubs 1
ceph osd unset nodeep-scrub
ceph osd unset noscrub

.. Dan
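
To confirm the deep-scrub actually starts (same example PG id as above), you
can query the PG or watch the cluster log:

  ceph pg 22.1611 query | grep -i scrub   # last_scrub_stamp / last_deep_scrub_stamp
  ceph -w | grep 22.1611                  # look for "deep-scrub starts"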


On Fri, Jul 14, 2017 at 7:45 PM, Aaron Bassett
 wrote:
> I issued the pg deep scrub command ~24 hours ago and nothing has changed. I
> see nothing in the active osd's log about kicking off the scrub.
>
> On Jul 13, 2017, at 2:24 PM, David Turner  wrote:
>
> # ceph pg deep-scrub 22.1611
>
> On Thu, Jul 13, 2017 at 1:00 PM Aaron Bassett 
> wrote:
>>
>> I'm not sure if I'm doing something wrong, but when I run this:
>>
>> # ceph osd deep-scrub 294
>>
>>
>> All i get in the osd log is:
>>
>> 2017-07-13 16:57:53.782841 7f40d089f700  0 log_channel(cluster) log [INF]
>> : 21.1ae9 deep-scrub starts
>> 2017-07-13 16:57:53.785261 7f40ce09a700  0 log_channel(cluster) log [INF]
>> : 21.1ae9 deep-scrub ok
>>
>>
>> each time I run it, it's the same pg.
>>
>> Is there some reason its not scrubbing all the pgs?
>>
>> Aaron
>>
>> > On Jul 13, 2017, at 10:29 AM, Aaron Bassett
>> >  wrote:
>> >
>> > Ok good to hear, I just kicked one off on the acting primary so I guess
>> > I'll be patient now...
>> >
>> > Thanks,
>> > Aaron
>> >
>> >> On Jul 13, 2017, at 10:28 AM, Dan van der Ster 
>> >> wrote:
>> >>
>> >> On Thu, Jul 13, 2017 at 4:23 PM, Aaron Bassett
>> >>  wrote:
>> >>> Because it was a read error I checked SMART stats for that osd's disk
>> >>> and sure enough, it had some uncorrected read errors. In order to stop it
>> >>> from causing more problems I stopped the daemon to let ceph recover 
>> >>> from
>> >>> the other osds. The cluster has now finished rebalancing, but remains in 
>> >>> ERR
>> >>> state as it still thinks this pg is inconsistent.
>> >>
>> >> It should clear up after you trigger another deep-scrub on that PG.
>> >>
>> >> Cheers, Dan
>> >
>>
>>


Re: [ceph-users] PG stuck inconsistent, but appears ok?

2017-07-14 Thread Aaron Bassett
I issued the pg deep scrub command ~24 hours ago and nothing has changed. I see 
nothing in the active osd's log about kicking off the scrub.

On Jul 13, 2017, at 2:24 PM, David Turner 
> wrote:

# ceph pg deep-scrub 22.1611

On Thu, Jul 13, 2017 at 1:00 PM Aaron Bassett 
> wrote:
I'm not sure if I'm doing something wrong, but when I run this:

# ceph osd deep-scrub 294


All i get in the osd log is:

2017-07-13 16:57:53.782841 7f40d089f700  0 log_channel(cluster) log [INF] : 
21.1ae9 deep-scrub starts
2017-07-13 16:57:53.785261 7f40ce09a700  0 log_channel(cluster) log [INF] : 
21.1ae9 deep-scrub ok


each time I run it, it's the same pg.

Is there some reason its not scrubbing all the pgs?

Aaron

> On Jul 13, 2017, at 10:29 AM, Aaron Bassett 
> > wrote:
>
> Ok good to hear, I just kicked one off on the acting primary so I guess I'll 
> be patient now...
>
> Thanks,
> Aaron
>
>> On Jul 13, 2017, at 10:28 AM, Dan van der Ster 
>> > wrote:
>>
>> On Thu, Jul 13, 2017 at 4:23 PM, Aaron Bassett
>> > wrote:
>>> Because it was a read error I checked SMART stats for that osd's disk and 
>>> sure enough, it had some uncorrected read errors. In order to stop it from 
>>> causing more problems I stopped the daemon to let ceph recover from the 
>>> other osds. The cluster has now finished rebalancing, but remains in ERR 
>>> state as it still thinks this pg is inconsistent.
>>
>> It should clear up after you trigger another deep-scrub on that PG.
>>
>> Cheers, Dan
>




Re: [ceph-users] ceph-deploy mgr create error No such file or directory:

2017-07-14 Thread Oscar Segarra
I'm testing on the latest Jewel version I've found in the repositories:

[root@vdicnode01 yum.repos.d]# ceph --version
ceph version 10.2.8 (f5b1f1fd7c0be0506ba73502a675de9d048b744e)

thanks a lot!

2017-07-14 19:21 GMT+02:00 Vasu Kulkarni :

> It is tested for master and is working fine, I will run those same tests
> on luminous and check if there is an issue and update here. mgr create is
> needed for luminous+ bulids only.
>
> On Fri, Jul 14, 2017 at 10:18 AM, Roger Brown 
> wrote:
>
>> I've been trying to work through similar mgr issues for Xenial-Luminous...
>>
>> roger@desktop:~/ceph-cluster$ ceph-deploy mgr create mon1 nuc2
>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>> /home/roger/.cephdeploy.conf
>> [ceph_deploy.cli][INFO  ] Invoked (1.5.38): /usr/bin/ceph-deploy mgr
>> create mon1 nuc2
>> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>> [ceph_deploy.cli][INFO  ]  username  : None
>> [ceph_deploy.cli][INFO  ]  verbose   : False
>> [ceph_deploy.cli][INFO  ]  mgr   : [('mon1',
>> 'mon1'), ('nuc2', 'nuc2')]
>> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
>> [ceph_deploy.cli][INFO  ]  subcommand: create
>> [ceph_deploy.cli][INFO  ]  quiet : False
>> [ceph_deploy.cli][INFO  ]  cd_conf   :
>> 
>> [ceph_deploy.cli][INFO  ]  cluster   : ceph
>> [ceph_deploy.cli][INFO  ]  func  : > at 0x7f25b4772668>
>> [ceph_deploy.cli][INFO  ]  ceph_conf : None
>> [ceph_deploy.cli][INFO  ]  default_release   : False
>> [ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts mon1:mon1
>> nuc2:nuc2
>> [mon1][DEBUG ] connection detected need for sudo
>> [mon1][DEBUG ] connected to host: mon1
>> [mon1][DEBUG ] detect platform information from remote host
>> [mon1][DEBUG ] detect machine type
>> [ceph_deploy.mgr][INFO  ] Distro info: Ubuntu 16.04 xenial
>> [ceph_deploy.mgr][DEBUG ] remote host will use systemd
>> [ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to mon1
>> [mon1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>> [mon1][DEBUG ] create path if it doesn't exist
>> [mon1][INFO  ] Running command: sudo ceph --cluster ceph --name
>> client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring
>> auth get-or-create mgr.mon1 mon allow profile mgr osd allow * mds allow *
>> -o /var/lib/ceph/mgr/ceph-mon1/keyring
>> [mon1][ERROR ] 2017-07-14 11:17:19.667418 7f309613f700  0 librados:
>> client.bootstrap-mgr authentication error (22) Invalid argument
>> [mon1][ERROR ] (22, 'error connecting to the cluster')
>> [mon1][ERROR ] exit code from command was: 1
>> [ceph_deploy.mgr][ERROR ] could not create mgr
>> [nuc2][DEBUG ] connection detected need for sudo
>> [nuc2][DEBUG ] connected to host: nuc2
>> [nuc2][DEBUG ] detect platform information from remote host
>> [nuc2][DEBUG ] detect machine type
>> [ceph_deploy.mgr][INFO  ] Distro info: Ubuntu 16.04 xenial
>> [ceph_deploy.mgr][DEBUG ] remote host will use systemd
>> [ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to nuc2
>> [nuc2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>> [nuc2][DEBUG ] create path if it doesn't exist
>> [nuc2][INFO  ] Running command: sudo ceph --cluster ceph --name
>> client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring
>> auth get-or-create mgr.nuc2 mon allow profile mgr osd allow * mds allow *
>> -o /var/lib/ceph/mgr/ceph-nuc2/keyring
>> [nuc2][ERROR ] 2017-07-14 17:17:21.800166 7fe344f32700  0 librados:
>> client.bootstrap-mgr authentication error (22) Invalid argument
>> [nuc2][ERROR ] (22, 'error connecting to the cluster')
>> [nuc2][ERROR ] exit code from command was: 1
>> [ceph_deploy.mgr][ERROR ] could not create mgr
>> [ceph_deploy][ERROR ] GenericError: Failed to create 2 MGRs
>> roger@desktop:~/ceph-cluster$
>>
>>
>>
>> On Fri, Jul 14, 2017 at 11:01 AM Oscar Segarra 
>> wrote:
>>
>>> Hi,
>>>
>>> I'm following the instructions of the web (http://docs.ceph.com/docs/mas
>>> ter/start/quick-ceph-deploy/) and I'm trying to create a manager on my
>>> first node.
>>>
>>> In my environment I have 2 nodes:
>>>
>>> - vdicnode01 (mon, mgr and osd)
>>> - vdicnode02 (osd)
>>>
>>> Each server has two NICs, the public and the private, where all ceph traffic
>>> will go over.
>>>
>>> I have created .local entries in /etc/hosts:
>>>
>>> 192.168.100.101   vdicnode01.local
>>> 192.168.100.102   vdicnode02.local
>>>
>>> Public names are resolved via DNS.
>>>
>>> When I try to create the mgr in a fresh install I get the following
>>> error:
>>>
>>> [vdicceph@vdicnode01 ceph]$ ceph-deploy --username vdicceph mgr create
>>> vdicnode01.local
>>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>>> /home/vdicceph/.cephdeploy.conf
>>> [ceph_deploy.cli][INFO  ] Invoked (1.5.38): 

Re: [ceph-users] ceph-deploy mgr create error No such file or directory:

2017-07-14 Thread Vasu Kulkarni
It is tested for master and is working fine, I will run those same tests on
luminous and check if there is an issue and update here. mgr create is
needed for luminous+ builds only.

On Fri, Jul 14, 2017 at 10:18 AM, Roger Brown  wrote:

> I've been trying to work through similar mgr issues for Xenial-Luminous...
>
> roger@desktop:~/ceph-cluster$ ceph-deploy mgr create mon1 nuc2
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /home/roger/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.38): /usr/bin/ceph-deploy mgr
> create mon1 nuc2
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username  : None
> [ceph_deploy.cli][INFO  ]  verbose   : False
> [ceph_deploy.cli][INFO  ]  mgr   : [('mon1',
> 'mon1'), ('nuc2', 'nuc2')]
> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
> [ceph_deploy.cli][INFO  ]  subcommand: create
> [ceph_deploy.cli][INFO  ]  quiet : False
> [ceph_deploy.cli][INFO  ]  cd_conf   :
> 
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
> [ceph_deploy.cli][INFO  ]  func  :  at 0x7f25b4772668>
> [ceph_deploy.cli][INFO  ]  ceph_conf : None
> [ceph_deploy.cli][INFO  ]  default_release   : False
> [ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts mon1:mon1
> nuc2:nuc2
> [mon1][DEBUG ] connection detected need for sudo
> [mon1][DEBUG ] connected to host: mon1
> [mon1][DEBUG ] detect platform information from remote host
> [mon1][DEBUG ] detect machine type
> [ceph_deploy.mgr][INFO  ] Distro info: Ubuntu 16.04 xenial
> [ceph_deploy.mgr][DEBUG ] remote host will use systemd
> [ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to mon1
> [mon1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> [mon1][DEBUG ] create path if it doesn't exist
> [mon1][INFO  ] Running command: sudo ceph --cluster ceph --name
> client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring
> auth get-or-create mgr.mon1 mon allow profile mgr osd allow * mds allow *
> -o /var/lib/ceph/mgr/ceph-mon1/keyring
> [mon1][ERROR ] 2017-07-14 11:17:19.667418 7f309613f700  0 librados:
> client.bootstrap-mgr authentication error (22) Invalid argument
> [mon1][ERROR ] (22, 'error connecting to the cluster')
> [mon1][ERROR ] exit code from command was: 1
> [ceph_deploy.mgr][ERROR ] could not create mgr
> [nuc2][DEBUG ] connection detected need for sudo
> [nuc2][DEBUG ] connected to host: nuc2
> [nuc2][DEBUG ] detect platform information from remote host
> [nuc2][DEBUG ] detect machine type
> [ceph_deploy.mgr][INFO  ] Distro info: Ubuntu 16.04 xenial
> [ceph_deploy.mgr][DEBUG ] remote host will use systemd
> [ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to nuc2
> [nuc2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> [nuc2][DEBUG ] create path if it doesn't exist
> [nuc2][INFO  ] Running command: sudo ceph --cluster ceph --name
> client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring
> auth get-or-create mgr.nuc2 mon allow profile mgr osd allow * mds allow *
> -o /var/lib/ceph/mgr/ceph-nuc2/keyring
> [nuc2][ERROR ] 2017-07-14 17:17:21.800166 7fe344f32700  0 librados:
> client.bootstrap-mgr authentication error (22) Invalid argument
> [nuc2][ERROR ] (22, 'error connecting to the cluster')
> [nuc2][ERROR ] exit code from command was: 1
> [ceph_deploy.mgr][ERROR ] could not create mgr
> [ceph_deploy][ERROR ] GenericError: Failed to create 2 MGRs
> roger@desktop:~/ceph-cluster$
>
>
>
> On Fri, Jul 14, 2017 at 11:01 AM Oscar Segarra 
> wrote:
>
>> Hi,
>>
>> I'm following the instructions of the web (http://docs.ceph.com/docs/
>> master/start/quick-ceph-deploy/) and I'm trying to create a manager on
>> my first node.
>>
>> In my environment I have 2 nodes:
>>
>> - vdicnode01 (mon, mgr and osd)
>> - vdicnode02 (osd)
>>
>> Each server has two NICs, the public and the private, where all ceph traffic
>> will go over.
>>
>> I have created .local entries in /etc/hosts:
>>
>> 192.168.100.101   vdicnode01.local
>> 192.168.100.102   vdicnode02.local
>>
>> Public names are resolved via DNS.
>>
>> When I try to create the mgr in a fresh install I get the following error:
>>
>> [vdicceph@vdicnode01 ceph]$ ceph-deploy --username vdicceph mgr create
>> vdicnode01.local
>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>> /home/vdicceph/.cephdeploy.conf
>> [ceph_deploy.cli][INFO  ] Invoked (1.5.38): /bin/ceph-deploy --username
>> vdicceph mgr create vdicnode01.local
>> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>> [ceph_deploy.cli][INFO  ]  username  : vdicceph
>> [ceph_deploy.cli][INFO  ]  verbose   : False
>> [ceph_deploy.cli][INFO  ]  mgr   :
>> [('vdicnode01.local', 'vdicnode01.local')]
>> 

Re: [ceph-users] ceph-deploy mgr create error No such file or directory:

2017-07-14 Thread Roger Brown
I've been trying to work through similar mgr issues for Xenial-Luminous...

roger@desktop:~/ceph-cluster$ ceph-deploy mgr create mon1 nuc2
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/roger/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.38): /usr/bin/ceph-deploy mgr create
mon1 nuc2
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  mgr   : [('mon1',
'mon1'), ('nuc2', 'nuc2')]
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   :

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts mon1:mon1
nuc2:nuc2
[mon1][DEBUG ] connection detected need for sudo
[mon1][DEBUG ] connected to host: mon1
[mon1][DEBUG ] detect platform information from remote host
[mon1][DEBUG ] detect machine type
[ceph_deploy.mgr][INFO  ] Distro info: Ubuntu 16.04 xenial
[ceph_deploy.mgr][DEBUG ] remote host will use systemd
[ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to mon1
[mon1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[mon1][DEBUG ] create path if it doesn't exist
[mon1][INFO  ] Running command: sudo ceph --cluster ceph --name
client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring
auth get-or-create mgr.mon1 mon allow profile mgr osd allow * mds allow *
-o /var/lib/ceph/mgr/ceph-mon1/keyring
[mon1][ERROR ] 2017-07-14 11:17:19.667418 7f309613f700  0 librados:
client.bootstrap-mgr authentication error (22) Invalid argument
[mon1][ERROR ] (22, 'error connecting to the cluster')
[mon1][ERROR ] exit code from command was: 1
[ceph_deploy.mgr][ERROR ] could not create mgr
[nuc2][DEBUG ] connection detected need for sudo
[nuc2][DEBUG ] connected to host: nuc2
[nuc2][DEBUG ] detect platform information from remote host
[nuc2][DEBUG ] detect machine type
[ceph_deploy.mgr][INFO  ] Distro info: Ubuntu 16.04 xenial
[ceph_deploy.mgr][DEBUG ] remote host will use systemd
[ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to nuc2
[nuc2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[nuc2][DEBUG ] create path if it doesn't exist
[nuc2][INFO  ] Running command: sudo ceph --cluster ceph --name
client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring
auth get-or-create mgr.nuc2 mon allow profile mgr osd allow * mds allow *
-o /var/lib/ceph/mgr/ceph-nuc2/keyring
[nuc2][ERROR ] 2017-07-14 17:17:21.800166 7fe344f32700  0 librados:
client.bootstrap-mgr authentication error (22) Invalid argument
[nuc2][ERROR ] (22, 'error connecting to the cluster')
[nuc2][ERROR ] exit code from command was: 1
[ceph_deploy.mgr][ERROR ] could not create mgr
[ceph_deploy][ERROR ] GenericError: Failed to create 2 MGRs
roger@desktop:~/ceph-cluster$



On Fri, Jul 14, 2017 at 11:01 AM Oscar Segarra 
wrote:

> Hi,
>
> I'm following the instructions of the web (
> http://docs.ceph.com/docs/master/start/quick-ceph-deploy/) and I'm trying
> to create a manager on my first node.
>
> In my environment I have 2 nodes:
>
> - vdicnode01 (mon, mgr and osd)
> - vdicnode02 (osd)
>
> Each server has two NICs, the public and the private, where all ceph traffic
> will go over.
>
> I have created .local entries in /etc/hosts:
>
> 192.168.100.101   vdicnode01.local
> 192.168.100.102   vdicnode02.local
>
> Public names are resolved via DNS.
>
> When I try to create the mgr in a fresh install I get the following error:
>
> [vdicceph@vdicnode01 ceph]$ ceph-deploy --username vdicceph mgr create
> vdicnode01.local
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /home/vdicceph/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.38): /bin/ceph-deploy --username
> vdicceph mgr create vdicnode01.local
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username  : vdicceph
> [ceph_deploy.cli][INFO  ]  verbose   : False
> [ceph_deploy.cli][INFO  ]  mgr   :
> [('vdicnode01.local', 'vdicnode01.local')]
> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
> [ceph_deploy.cli][INFO  ]  subcommand: create
> [ceph_deploy.cli][INFO  ]  quiet : False
> [ceph_deploy.cli][INFO  ]  cd_conf   :
> 
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
> [ceph_deploy.cli][INFO  ]  func  :  at 0x1916848>
> [ceph_deploy.cli][INFO  ]  ceph_conf  

Re: [ceph-users] Stealth Jewel release?

2017-07-14 Thread David Turner
Is there going to be an announcement for 10.2.9 either? I haven't seen
anything other than users noticing the packages.

On Fri, Jul 14, 2017, 10:30 AM Martin Palma  wrote:

> Thank you for the clarification and yes we saw that v10.2.9 was just
> released. :-)
>
> Best,
> Martin
>
> On Fri, Jul 14, 2017 at 3:53 PM, Patrick Donnelly 
> wrote:
> > On Fri, Jul 14, 2017 at 12:26 AM, Martin Palma  wrote:
> >> So only the ceph-mds is affected? Let's say if we have mons and osds
> >> on 10.2.8 and the MDS on 10.2.6 or 10.2.7 we would be "safe"?
> >
> > Yes, only the MDS was affected.
> >
> > As Udo mentioned, v10.2.9 is out so feel free to upgrade to that instead.
> >
> > --
> > Patrick Donnelly


Re: [ceph-users] cluster network question

2017-07-14 Thread David Turner
Only the osds use the dedicated cluster network.  Pinging the mons and mds
services on that network will do nothing.
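
So a cluster network used only by the OSDs is fine; a minimal ceph.conf
sketch (the subnets are made up for illustration):

  [global]
  public network  = 192.168.1.0/24   # mons, mds, clients and osds
  cluster network = 192.168.2.0/24   # osd<->osd replication and heartbeats only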

On Fri, Jul 14, 2017, 11:39 AM Laszlo Budai  wrote:

> Dear all,
>
> I'm reading the docs at
> http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/
> regarding the cluster network and I wonder which nodes are connected to the
> dedicated cluster network?
>
> The diagram on the mentioned page only shows the OSDs connected to the
> cluster network, while the text says: "To support two networks, each Ceph
> Node will need to have more than one NIC." - which would mean that OSD +
> MON + MDS all should be connected to the dedicated cluster network. Which
> one is correct? Can I have the dedicated cluster network only for the OSDs?
> while the MONs are only connected to the public net?
>
> Thank you!
> Laszlo


Re: [ceph-users] Ceph mount rbd

2017-07-14 Thread Nick Fisk


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Jason Dillaman
> Sent: 14 July 2017 16:40
> To: li...@marcelofrota.info
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph mount rbd
> 
> On Fri, Jul 14, 2017 at 9:44 AM,   wrote:
> > Gonzalo,
> >
> >
> >
> > You are right, i told so much about my enviroment actual and maybe i
> > didn't know explain my problem the better form, with ceph in the
> > moment, mutiple hosts clients can mount and write datas in my system
> > and this is one problem, because i could have filesystem corruption.
> >
> >
> >
> > Example, today, if runing the comand in two machines in the same time,
> > it will work.
> >
> >
> >
> > mount /dev/rbd0 /mnt/veeamrepo
> >
> > cd /mnt/veeamrepo ; touch testfile.txt
> >
> >
> >
> > I need ensure, only one machine will can execute this.
> >
> 
> A user could do the same thing with any number of remote block devices (i.e. 
> I could map an iSCSI target multiple times). As I said
> before, you can use the "exclusive" option available since kernel 4.12, roll 
> your own solution using the advisory locks available from
> the rbd CLI, or just use CephFS if you want to be able to access a file 
> system on multiple hosts.

Pacemaker will also prevent an RBD from being mounted multiple times, if you want to 
manage the fencing outside of Ceph.

> 
> >
> > Thanks a lot,
> >
> > Marcelo
> >
> >
> > Em 14/07/2017, Gonzalo Aguilar Delgado 
> > escreveu:
> >
> >
> >> Hi,
> >>
> >> Why you would like to maintain copies by yourself. You replicate on
> >> ceph and then on different files inside ceph? Let ceph take care of 
> >> counting.
> >> Create a pool with 3 or more copies and let ceph take care of what's
> >> stored and where.
> >>
> >> Best regards,
> >>
> >>
> >> El 13/07/17 a las 17:06, li...@marcelofrota.info escribió:
> >> >
> >> > I will explain More about my system actual, in the moment i have 2
> >> > machines using drbd in mode master/slave and i running the
> >> > aplication in machine master, but existing 2 questions importants
> >> > in my enviroment with drbd actualy :
> >> >
> >> > 1 - If machine one is master and mounting partitions, the slave
> >> > don't can mount the system, Unless it happens one problem in
> >> > machine master, this is one mode, to prevent write in filesystem
> >> > incorrect
> >> >
> >> > 2 - When i write data in machine master in drbd, the drbd write
> >> > datas in slave machine Automatically, with this, if one problem
> >> > happens in node master, the machine slave have coppy the data.
> >> >
> >> > In the moment, in my enviroment testing with ceph, using the
> >> > version
> >> > 4.10 of kernel and i mount the system in two machines in the same
> >> > time, in production enviroment, i could serious problem with this
> >> > comportament.
> >> >
> >> > How can i use the ceph and Ensure that I could get these 2
> >> > behaviors kept in a new environment with Ceph?
> >> >
> >> > Thanks a lot,
> >> >
> >> > Marcelo
> >> >
> >> >
> >> > Em 28/06/2017, Jason Dillaman  escreveu:
> >> > > ... additionally, the forthcoming 4.12 kernel release will
> >> > > support non-cooperative exclusive locking. By default, since 4.9,
> >> > > when the exclusive-lock feature is enabled, only a single client
> >> > > can write to
> >> > the
> >> > > block device at a time -- but they will cooperatively pass the
> >> > > lock
> >> > back
> >> > > and forth upon write request. With the new "rbd map" option, you
> >> > > can
> >> > map a
> >> > > image on exactly one host and prevent other hosts from mapping
> >> > > the
> >> > image.
> >> > > If that host should die, the exclusive-lock will automatically
> >> > > become available to other hosts for mapping.
> >> > >
> >> > > Of course, I always have to ask the use-case behind mapping the
> >> > > same
> >> > image
> >> > > on multiple hosts. Perhaps CephFS would be a better fit if you
> >> > > are
> >> > trying
> >> > > to serve out a filesystem?
> >> > >
> >> > > On Wed, Jun 28, 2017 at 6:25 PM, Maged Mokhtar
> >> >  wrote:
> >> > >
> >> > > > On 2017-06-28 22:55, li...@marcelofrota.info wrote:
> >> > > >
> >> > > > Hi People,
> >> > > >
> >> > > > I am testing the new enviroment, with ceph + rbd with ubuntu
> >> > 16.04, and i
> >> > > > have one question.
> >> > > >
> >> > > > I have my cluster ceph and mount the using the comands to ceph
> >> > > > in
> >> > my linux
> >> > > > enviroment :
> >> > > >
> >> > > > rbd create veeamrepo --size 20480 rbd --image veeamrepo info
> >> > > > modprobe rbd rbd map veeamrepo rbd feature disable veeamrepo
> >> > > > exclusive-lock object-map fast-diff deep-flatten mkdir
> >> > > > /mnt/veeamrepo mount /dev/rbd0 /mnt/veeamrepo
> >> > > >
> >> > > > The comands work fine, but i have one problem, in the moment, i
> >> > can mount
> >> > > > the /mnt/veeamrepo in the same 

Re: [ceph-users] Ceph mount rbd

2017-07-14 Thread Jason Dillaman
On Fri, Jul 14, 2017 at 9:44 AM,   wrote:
> Gonzalo,
>
>
>
> You are right, i told so much about my enviroment actual and maybe i didn't
> know explain my problem the better form, with ceph in the moment, mutiple
> hosts clients can mount and write datas in my system and this is one
> problem, because i could have filesystem corruption.
>
>
>
> Example, today, if runing the comand in two machines in the same time, it
> will work.
>
>
>
> mount /dev/rbd0 /mnt/veeamrepo
>
> cd /mnt/veeamrepo ; touch testfile.txt
>
>
>
> I need ensure, only one machine will can execute this.
>

A user could do the same thing with any number of remote block devices
(i.e. I could map an iSCSI target multiple times). As I said before,
you can use the "exclusive" option available since kernel 4.12, roll
your own solution using the advisory locks available from the rbd CLI,
or just use CephFS if you want to be able to access a file system on
multiple hosts.
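
For illustration, the two rbd-CLI-level approaches look roughly like this
(image name taken from your earlier example; the map option needs kernel >= 4.12):

  # advisory locking, enforced only by your own tooling:
  rbd lock add veeamrepo mylock         # fails if the image is already locked
  rbd lock list veeamrepo
  rbd lock remove veeamrepo mylock <locker>   # <locker> comes from 'rbd lock list'

  # kernel >= 4.12: refuse to map if another client holds the exclusive lock
  rbd map veeamrepo -o exclusive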

>
> Thanks a lot,
>
> Marcelo
>
>
> Em 14/07/2017, Gonzalo Aguilar Delgado 
> escreveu:
>
>
>> Hi,
>>
>> Why you would like to maintain copies by yourself. You replicate on ceph
>> and then on different files inside ceph? Let ceph take care of counting.
>> Create a pool with 3 or more copies and let ceph take care of what's
>> stored and where.
>>
>> Best regards,
>>
>>
>> El 13/07/17 a las 17:06, li...@marcelofrota.info escribió:
>> >
>> > I will explain More about my system actual, in the moment i have 2
>> > machines using drbd in mode master/slave and i running the aplication
>> > in machine master, but existing 2 questions importants in my
>> > enviroment with drbd actualy :
>> >
>> > 1 - If machine one is master and mounting partitions, the slave don't
>> > can mount the system, Unless it happens one problem in machine master,
>> > this is one mode, to prevent write in filesystem incorrect
>> >
>> > 2 - When i write data in machine master in drbd, the drbd write datas
>> > in slave machine Automatically, with this, if one problem happens in
>> > node master, the machine slave have coppy the data.
>> >
>> > In the moment, in my enviroment testing with ceph, using the version
>> > 4.10 of kernel and i mount the system in two machines in the same
>> > time, in production enviroment, i could serious problem with this
>> > comportament.
>> >
>> > How can i use the ceph and Ensure that I could get these 2 behaviors
>> > kept in a new environment with Ceph?
>> >
>> > Thanks a lot,
>> >
>> > Marcelo
>> >
>> >
>> > Em 28/06/2017, Jason Dillaman  escreveu:
>> > > ... additionally, the forthcoming 4.12 kernel release will support
>> > > non-cooperative exclusive locking. By default, since 4.9, when the
>> > > exclusive-lock feature is enabled, only a single client can write to
>> > the
>> > > block device at a time -- but they will cooperatively pass the lock
>> > back
>> > > and forth upon write request. With the new "rbd map" option, you can
>> > map a
>> > > image on exactly one host and prevent other hosts from mapping the
>> > image.
>> > > If that host should die, the exclusive-lock will automatically become
>> > > available to other hosts for mapping.
>> > >
>> > > Of course, I always have to ask the use-case behind mapping the same
>> > image
>> > > on multiple hosts. Perhaps CephFS would be a better fit if you are
>> > trying
>> > > to serve out a filesystem?
>> > >
>> > > On Wed, Jun 28, 2017 at 6:25 PM, Maged Mokhtar
>> >  wrote:
>> > >
>> > > > On 2017-06-28 22:55, li...@marcelofrota.info wrote:
>> > > >
>> > > > Hi People,
>> > > >
>> > > > I am testing the new enviroment, with ceph + rbd with ubuntu
>> > 16.04, and i
>> > > > have one question.
>> > > >
>> > > > I have my cluster ceph and mount the using the comands to ceph in
>> > my linux
>> > > > enviroment :
>> > > >
>> > > > rbd create veeamrepo --size 20480
>> > > > rbd --image veeamrepo info
>> > > > modprobe rbd
>> > > > rbd map veeamrepo
>> > > > rbd feature disable veeamrepo exclusive-lock object-map fast-diff
>> > > > deep-flatten
>> > > > mkdir /mnt/veeamrepo
>> > > > mount /dev/rbd0 /mnt/veeamrepo
>> > > >
>> > > > The comands work fine, but i have one problem, in the moment, i
>> > can mount
>> > > > the /mnt/veeamrepo in the same time in 2 machines, and this is a
>> > bad option
>> > > > for me in the moment, because this could generate one filesystem
>> > corrupt.
>> > > >
>> > > > I need only one machine to be allowed to mount and write at a time.
>> > > >
>> > > > Example if machine1 mount the /mnt/veeamrepo and machine2 try
>> > mount, one
>> > > > error would be displayed, show message the machine can not mount,
>> > because
>> > > > the system already mounted in machine1.
>> > > >
>> > > > Someone, could help-me with this or give some tips, for solution my
>> > > > problem. ?
>> > > >
>> > > > Thanks a lot
>> > > >

[ceph-users] cluster network question

2017-07-14 Thread Laszlo Budai

Dear all,

I'm reading the docs at 
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/ 
regarding the cluster network and I wonder which nodes are connected to the 
dedicated cluster network?

The diagram on the mentioned page only shows the OSDs connected to the cluster network, 
while the text says: "To support two networks, each Ceph Node will need to have more 
than one NIC." - which would mean that OSD + MON + MDS all should be connected to 
the dedicated cluster network. Which one is correct? Can I have the dedicated cluster 
network only for the OSDs? while the MONs are only connected to the public net?

Thank you!
Laszlo


Re: [ceph-users] upgrade procedure to Luminous

2017-07-14 Thread Sage Weil
On Fri, 14 Jul 2017, Lars Marowsky-Bree wrote:
> On 2017-07-14T14:12:08, Sage Weil  wrote:
> 
> > > Any thoughts on how to mitigate this, or on whether I got this all wrong 
> > > and
> > > am missing a crucial detail that blows this wall of text away, please let 
> > > me
> > > know.
> > I don't know; the requirement that mons be upgraded before OSDs doesn't 
> > seem that unreasonable to me.  That might be slightly more painful in a 
> > hyperconverged scenario (osds and mons on the same host), but it should 
> > just require some admin TLC (restart mon daemons instead of 
> > rebooting).
> 
> I think it's quite unreasonable, to be quite honest. Collocated MONs
> with OSDs is very typical for smaller cluster environments.

Yes, but how many of those clusters can only upgrade by updating the 
packages and rebooting?  Our documented procedures have always recommended 
upgrading the packages, then restarting either mons or osds first and to 
my recollection nobody has complained.  TBH my first encounter with the 
"reboot on upgrade" procedure in the Linux world was with Fedora (which I 
just recently switched to for my desktop)--and FWIW it felt very 
anachronistic.

But regardless, the real issue is this is a trade-off between the testing 
and software complexity burden vs user flexibility.  Enforcing an upgrade 
order means we have less to test and have greater confidence the user 
won't see something we haven't.  It also means, in this case, that we can 
rip out a ton of legacy code in luminous without having to keep 
compatibility workarounds in place for another whole LTS cycle (a year!).  
That reduces code complexity, improves quality, and improves velocity.  
The downside is that the upgrade procedure has to be done in a particular 
order.

Honestly, though, I think it is a good idea for operators to be 
careful with their upgrades anyway.  They should upgrade just mons, let 
cluster stabilize, and make sure things are okay (e.g., no new 
health warnings saying they have to 'ceph osd set sortbitwise') before 
continuing.
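
Concretely, between the mon and OSD phases that can be as simple as
(sortbitwise being the example warning above):

    ceph health detail        # expect no new warnings before continuing
    ceph osd set sortbitwise  # only if the cluster asks for it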

Also, although I think it's a good idea to do the mon upgrade relatively 
quickly (one after the other until they are upgraded), the OSD upgrade can 
be stretched out longer.  (We do pretty thorough thrashing tests with 
mixed-version OSD clusters, but go through the mon upgrades pretty 
quickly.)
 
> > Is there something in some distros that *requires* a reboot in order to 
> > upgrade packages?
> 
> Not necessarily.
> 
> *But* once we've upgraded the packages, a failure or reboot might
> trigger this.

True, but this is rare, and even so the worst that can happen in this 
case is the OSDs don't come up until the other mons are upgraded.  If the 
admin plans to upgrade the mons in succession without lingering with 
mixed-version mons, the worst-case downtime window is very small--and only 
kicks in if *more than one* of the mon nodes fails (taking out OSDs in 
more than one failure domain).

> And customers don't always upgrade all nodes at once in a short period
> (the benefit of a supposed rolling upgrade cycle), increasing the risk.

I think they should plan to do this for the mons.  We can make a note 
stating as much in the upgrade procedure docs?
 
> I wish we'd already be fully containerized so indeed the MONs were truly
> independent of everything else going on on the cluster, but ...

Indeed!  Next time around...

> > Also, this only seems like it will affect users that are getting their 
> > ceph packages from the distro itself and not from a ceph.com channel or a 
> > special subscription/product channel (this is how the RHEL stuff works, I 
> > think).
> 
> Even there, upgrading only the MON daemons and not the OSDs is tricky?

I mean you would upgrade all of the packages, but only restart the mon 
daemons.  The deb packages have skipped the auto-restart in the postinst 
(or whatever) stage for years.  I'm pretty sure the rpms do the same?
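
In other words, something along these lines on each mon host (package manager
and unit names are assumptions for a typical Debian/Ubuntu systemd setup, and
the mon id is assumed to be the short hostname):

    apt-get update && apt-get install --only-upgrade ceph ceph-common ceph-mon ceph-osd   # upgrade packages, no restarts
    systemctl restart ceph-mon@$(hostname -s)       # restart only the mon
    # later, once all mons are upgraded and health looks good:
    systemctl restart ceph-osd@<id>                 # one OSD (or host) at a time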

Anyway, does that make sense?  Yes, it means that you can't just reboot in 
succession if your mons are mixed with OSDs.  But this time adding that 
restriction let us do the SnapSet and snapdir conversion in a single 
release, which is a *huge* win and will let us rip out a bunch of ugly OSD 
code.  We might not have a need for it next time around (and can try to 
avoid it), but I'm guessing something will come up and it will again be a 
hard call to make balancing between sloppy/easy upgrades vs simpler 
code...

sage




Re: [ceph-users] upgrade procedure to Luminous

2017-07-14 Thread Mike Lowe
It was required for Bobtail to Cuttlefish and Cuttlefish to Dumpling.  

Exactly how many mons do you have such that you are concerned about failure?  
If you have let’s say 3 mons, you update all the bits, then it shouldn’t take 
you more than 2 minutes to restart the mons one by one.  You can take your time 
updating/restarting the osd’s.  I generally consider it bad practice to save 
your system updates for a major ceph upgrade. How exactly can you parse the 
difference between a ceph bug and a kernel regression if you do them all at 
once?  You have a resilient system why wouldn’t you take advantage of that 
property to change one thing at a time?  So what we are really talking about 
here is a hardware failure in the short period it takes to restart mon services 
because you shouldn’t be rebooting.  If the ceph mon doesn’t come back from a 
restart, then you have a bug, which in all likelihood will happen on the first 
mon, and at that point you have options to roll back or run with degraded mons 
until Sage et al. put out a fix.  My only significant downtime was due to a bug 
in a new release having to do with pg splitting; 8 hours later I had my fix.

> On Jul 14, 2017, at 10:39 AM, Lars Marowsky-Bree  wrote:
> 
> On 2017-07-14T10:34:35, Mike Lowe  wrote:
> 
>> Having run ceph clusters in production for the past six years and upgrading 
>> from every stable release starting with argonaut to the next, I can honestly 
>> say being careful about order of operations has not been a problem.
> 
> This requirement did not exist as a mandatory one for previous releases.
> 
> The problem is not the sunshine-all-is-good path. It's about what to do
> in case of failures during the upgrade process.
> 
> 
> 
> -- 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 



Re: [ceph-users] upgrade procedure to Luminous

2017-07-14 Thread Joao Eduardo Luis

On 07/14/2017 03:12 PM, Sage Weil wrote:

On Fri, 14 Jul 2017, Joao Eduardo Luis wrote:

On top of this all, I found during my tests that any OSD, running luminous
prior to the luminous quorum, will need to be restarted before it can properly
boot into the cluster. I'm guessing this is a bug rather than a feature
though.


That sounds like a bug.. probably didn't subscribe to map updates from
_start_boot() or something.  Can you open an immediate ticket?


http://tracker.ceph.com/issues/20631

  -Joao


Re: [ceph-users] upgrade procedure to Luminous

2017-07-14 Thread Lars Marowsky-Bree
On 2017-07-14T10:34:35, Mike Lowe  wrote:

> Having run ceph clusters in production for the past six years and upgrading 
> from every stable release starting with argonaut to the next, I can honestly 
> say being careful about order of operations has not been a problem.

This requirement did not exist as a mandatory one for previous releases.

The problem is not the sunshine-all-is-good path. It's about what to do
in case of failures during the upgrade process.



-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



Re: [ceph-users] upgrade procedure to Luminous

2017-07-14 Thread Mike Lowe
Having run ceph clusters in production for the past six years and upgrading 
from every stable release starting with argonaut to the next, I can honestly 
say being careful about order of operations has not been a problem.

> On Jul 14, 2017, at 10:27 AM, Lars Marowsky-Bree  wrote:
> 
> On 2017-07-14T14:12:08, Sage Weil  wrote:
> 
>>> Any thoughts on how to mitigate this, or on whether I got this all wrong and
>>> am missing a crucial detail that blows this wall of text away, please let me
>>> know.
>> I don't know; the requirement that mons be upgraded before OSDs doesn't 
>> seem that unreasonable to me.  That might be slightly more painful in a 
>> hyperconverged scenario (osds and mons on the same host), but it should 
>> just require some admin TLC (restart mon daemons instead of 
>> rebooting).
> 
> I think it's quite unreasonable, to be quite honest. Collocated MONs
> with OSDs is very typical for smaller cluster environments.
> 
>> Is there something in some distros that *requires* a reboot in order to 
>> upgrade packages?
> 
> Not necessarily.
> 
> *But* once we've upgraded the packages, a failure or reboot might
> trigger this.
> 
> And customers don't always upgrade all nodes at once in a short period
> (the benefit of a supposed rolling upgrade cycle), increasing the risk.
> 
> I wish we'd already be fully containerized so indeed the MONs were truly
> independent of everything else going on on the cluster, but ...
> 
>> Also, this only seems like it will affect users that are getting their 
>> ceph packages from the distro itself and not from a ceph.com channel or a 
>> special subscription/product channel (this is how the RHEL stuff works, I 
>> think).
> 
> Even there, upgrading only the MON daemons and not the OSDs is tricky?
> 
> 
> 
> 
> -- 
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stealth Jewel release?

2017-07-14 Thread Martin Palma
Thank you for the clarification and yes we saw that v10.2.9 was just
released. :-)

Best,
Martin

On Fri, Jul 14, 2017 at 3:53 PM, Patrick Donnelly  wrote:
> On Fri, Jul 14, 2017 at 12:26 AM, Martin Palma  wrote:
>> So only the ceph-mds is affected? Let's say if we have mons and osds
>> on 10.2.8 and the MDS on 10.2.6 or 10.2.7 we would be "safe"?
>
> Yes, only the MDS was affected.
>
> As Udo mentioned, v10.2.9 is out so feel free to upgrade to that instead.
>
> --
> Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] upgrade procedure to Luminous

2017-07-14 Thread Lars Marowsky-Bree
On 2017-07-14T14:12:08, Sage Weil  wrote:

> > Any thoughts on how to mitigate this, or on whether I got this all wrong and
> > am missing a crucial detail that blows this wall of text away, please let me
> > know.
> I don't know; the requirement that mons be upgraded before OSDs doesn't 
> seem that unreasonable to me.  That might be slightly more painful in a 
> hyperconverged scenario (osds and mons on the same host), but it should 
> just require some admin TLC (restart mon daemons instead of 
> rebooting).

I think it's quite unreasonable, to be quite honest. Collocated MONs
with OSDs is very typical for smaller cluster environments.

> Is there something in some distros that *requires* a reboot in order to 
> upgrade packages?

Not necessarily.

*But* once we've upgraded the packages, a failure or reboot might
trigger this.

And customers don't always upgrade all nodes at once in a short period
(the benefit of a supposed rolling upgrade cycle), increasing the risk.

I wish we'd already be fully containerized so indeed the MONs were truly
independent of everything else going on on the cluster, but ...

> Also, this only seems like it will affect users that are getting their 
> ceph packages from the distro itself and not from a ceph.com channel or a 
> special subscription/product channel (this is how the RHEL stuff works, I 
> think).

Even there, upgrading only the MON daemons and not the OSDs is tricky?




-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] upgrade procedure to Luminous

2017-07-14 Thread Joao Eduardo Luis

On 07/14/2017 03:12 PM, Sage Weil wrote:

On Fri, 14 Jul 2017, Joao Eduardo Luis wrote:

Dear all,


The current upgrade procedure to jewel, as stated by the RC's release notes,


You mean (jewel or kraken) -> luminous, I assume...


Yeah. *sigh*

  -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] upgrade procedure to Luminous

2017-07-14 Thread Sage Weil
On Fri, 14 Jul 2017, Joao Eduardo Luis wrote:
> Dear all,
> 
> 
> The current upgrade procedure to jewel, as stated by the RC's release notes,

You mean (jewel or kraken) -> luminous, I assume...

> can be boiled down to
> 
> - upgrade all monitors first
> - upgrade osds only after we have a **full** quorum, comprised of all the
> monitors in the monmap, of luminous monitors (i.e., once we have the
> 'luminous' feature enabled in the monmap).
> 
> While this is a reasonable idea in principle, reducing a lot of the possible
> upgrade testing combinations, and a simple enough procedure from Ceph's
> point-of-view, it seems it's not a widespread upgrade procedure.
> 
> As far as I can tell, it's not uncommon for users to take this maintenance
> window to perform system-wide upgrades, including kernel and glibc for
> instance, and finishing the upgrade with a reboot.
> 
> The problem with our current upgrade procedure is that once the first server
> reboots, the osds in that server will be unable to boot, as the monitor quorum
> is not yet 'luminous'.
> 
> The only way to minimize potential downtime is to upgrade and restart all the
> nodes at the same time, which can be daunting and it basically defeats the
> purpose of a rolling upgrade. And in this scenario, there is an expectation of
> downtime, something Ceph is built to prevent.
> 
> Additionally, requiring the `luminous` feature to be enabled in the quorum
> becomes even less realistic in the face of possible failures. God forbid that
> in the middle of upgrading, the last remaining monitor server dies a horrible
> death - e.g., power, network. We'll still be left with a 'not-luminous'
> quorum, and a bunch of OSDs waiting for this flag to be flipped. And now it's
> a race to either get that monitor up, or remove it from the monmap.
> 
> Even if one were to make the decision of only upgrading system packages,
> reboot, and then upgrade Ceph packages, there is the unfortunate possibility
> that library interdependencies would require Ceph's binaries to be updated, so
> this may be a show-stopper as well.
> 
> Alternatively, if one is to simply upgrade the system and not reboot, and then
> proceed to perform the upgrade procedure, one would still be in a fragile
> position: if, for some reason, one of the nodes reboots, we're in the same
> precarious situation as before.
> 
> Personally, I can see two ways out of this, at different positions in the
> reasonability spectrum:
> 
> 1. add temporary monitor nodes to the cluster, be they on VMs or bare
> hardware, already running Luminous, and then remove the same amount of
> monitors from the cluster. This leaves us to upgrade a single monitor node.
> This has the drawback of folks not having spare nodes to run the monitors on,
> or running monitors on VMs -- which may affect their performance during the
> upgrade window, and increase complexity in terms of firewall and routing
> rules.
> 
> 2. migrate/upgrade all nodes on which Monitors are located first, then only
> restart them after we've gotten all nodes upgraded. If anything goes wrong,
> one can hurry through this step or fall-back to 3.
> 
> 3. Reducing the monitor quorum to 1. This pains me to even think about, and it
> bothers me to bits that I'm finding myself even considering this as a
> reasonable possibility. It shouldn't, because it isn't. But it's a lot more
> realistic than expecting OSD downtime during an upgrade procedure.
> 
> On top of this all, I found during my tests that any OSD, running luminous
> prior to the luminous quorum, will need to be restarted before it can properly
> boot into the cluster. I'm guessing this is a bug rather than a feature
> though.

That sounds like a bug.. probably didn't subscribe to map updates from 
_start_boot() or something.  Can you open an immediate ticket?

> Any thoughts on how to mitigate this, or on whether I got this all wrong and
> am missing a crucial detail that blows this wall of text away, please let me
> know.

I don't know; the requirement that mons be upgraded before OSDs doesn't 
seem that unreasonable to me.  That might be slightly more painful in a 
hyperconverged scenario (osds and mons on the same host), but it should 
just require some admin TLC (restart mon daemons instead of 
rebooting).

Also, for large clusters, users often have mons on dedicated hosts.  And 
for small clusters even the sloppy "just reboot" approach will have a 
smaller impact.

Is there something in some distros that *requires* a reboot in order to 
upgrade packages?

Also, this only seems like it will affect users that are getting their 
ceph packages from the distro itself and not from a ceph.com channel or a 
special subscription/product channel (this is how the RHEL stuff works, I 
think).

sage

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] upgrade procedure to Luminous

2017-07-14 Thread Joao Eduardo Luis

Dear all,


The current upgrade procedure to jewel, as stated by the RC's release 
notes, can be boiled down to


- upgrade all monitors first
- upgrade osds only after we have a **full** quorum, comprised of all 
the monitors in the monmap, of luminous monitors (i.e., once we have the 
'luminous' feature enabled in the monmap).


While this is a reasonable idea in principle, reducing a lot of the 
possible upgrade testing combinations, and a simple enough procedure 
from Ceph's point-of-view, it seems it's not a widespread upgrade procedure.


As far as I can tell, it's not uncommon for users to take this 
maintenance window to perform system-wide upgrades, including kernel and 
glibc for instance, and finishing the upgrade with a reboot.


The problem with our current upgrade procedure is that once the first 
server reboots, the osds in that server will be unable to boot, as the 
monitor quorum is not yet 'luminous'.


The only way to minimize potential downtime is to upgrade and restart 
all the nodes at the same time, which can be daunting and it basically 
defeats the purpose of a rolling upgrade. And in this scenario, there is 
an expectation of downtime, something Ceph is built to prevent.


Additionally, requiring the `luminous` feature to be enabled in the 
quorum becomes even less realistic in the face of possible failures. God 
forbid that in the middle of upgrading, the last remaining monitor 
server dies a horrible death - e.g., power, network. We'll be left with 
still a 'not-luminous' quorum, and a bunch of OSDs waiting for this flag 
to be flipped. And not it's a race to either get that monitor up, or 
remove it from the monmap.


Even if one were to make the decision of only upgrading system packages, 
reboot, and then upgrade Ceph packages, there is the unfortunate 
possibility that library interdependencies would require Ceph's binaries 
to be updated, so this may be a show-stopper as well.


Alternatively, if one is to simply upgrade the system and not reboot, 
and then proceed to perform the upgrade procedure, one would still be in 
a fragile position: if, for some reason, one of the nodes reboots, we're 
in the same precarious situation as before.


Personally, I can see two ways out of this, at different positions in 
the reasonability spectrum:


1. add temporary monitor nodes to the cluster, be they on VMs or 
bare hardware, already running Luminous, and then remove the same amount 
of monitors from the cluster. This leaves us to upgrade a single monitor 
node. This has the drawback of folks not having spare nodes to run the 
monitors on, or running monitors on VMs -- which may affect their 
performance during the upgrade window, and increase complexity in terms 
of firewall and routing rules.


2. migrate/upgrade all nodes on which Monitors are located first, then 
only restart them after we've gotten all nodes upgraded. If anything 
goes wrong, one can hurry through this step or fall-back to 3.


3. Reducing the monitor quorum to 1. This pains me to even think about, 
and it bothers me to bits that I'm finding myself even considering this 
as a reasonable possibility. It shouldn't, because it isn't. But it's a 
lot more realistic than expecting OSD downtime during an upgrade procedure.


On top of this all, I found during my tests that any OSD, running 
luminous prior to the luminous quorum, will need to be restarted before 
it can properly boot into the cluster. I'm guessing this is a bug rather 
than a feature though.


Any thoughts on how to mitigate this, or on whether I got this all wrong 
and am missing a crucial detail that blows this wall of text away, 
please let me know.



  -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stealth Jewel release?

2017-07-14 Thread Patrick Donnelly
On Fri, Jul 14, 2017 at 12:26 AM, Martin Palma  wrote:
> So only the ceph-mds is affected? Let's say if we have mons and osds
> on 10.2.8 and the MDS on 10.2.6 or 10.2.7 we would be "safe"?

Yes, only the MDS was affected.

As Udo mentioned, v10.2.9 is out so feel free to upgrade to that instead.

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reply: calculate past_intervals wrong, lead to choose wrong authority osd, then osd assert(newhead >= log.tail)

2017-07-14 Thread Sage Weil
On Fri, 14 Jul 2017, Chenyehua wrote:
> Thanks, Sage.
> 
> It doesn't happen every time, but the probability is high 
> 
> Reproduce as Follows:
> HOST-A   HOST-B  HOST-C
> osd 7  osd 21   osd11
>  1. osdmap epoch95, pg 1.20f on osd acting set [11,7]/ up set[11,7],then 
> shutdown HOST-C 
>  2. for a long time, cluster has only HOST A and HOST B, write data 
>  3. shutdown HOST-A , then start HOST-C, restart HOST-B about 4 times 
>  4. start HOST-A, osd 7 assert
>  And set pg log configuration:
> "osd_min_pg_log_entries": "100",
> "osd_max_pg_log_entries": "300",
> "osd_pg_log_trim_min": "100"
> 
> After analyzing ceph-osd.log, I think the root cause is "some osds compute the 
> wrong past interval".
> 
> In my test, osd 11 had been down for a long time and had very old data; when it 
> came up, it first received the full_1001 osdmap and generated past_intervals as 
> follows:
> 92~1000 [11,7]/[11,7]    <- this is the wrong interval; actually, 
> during osdmap 92 to 1000 the pg had already become active+clean on [7,21], and 
> the data had been updated a lot.
> This past interval is inappropriate and could lead osd11 to think "I've been 
> alive and been the primary during osdmap 92~1000, and I already have the same 
> data epoch as osd 7".
> In the next osdmap epoch, the pg mapped only to [11], so it could become active 
> and modify last_epoch_start; later, when another osd that has the newest data 
> comes back, find_best_info chooses the one with the bigger last_epoch_start, 
> which is osd 11; so the osd that has the older data unexpectedly becomes the 
> best.

Ooh... I think this is because the past intervals generation can't really 
deal with the osdmap gap.  You can make it easier to trigger this case 
by setting mon_min_osdmap_epochs = 50 or some other smaller number (the 
default is 500).  (You can, conversely, make this very hard to trigger by 
setting it to a value larger than 500, at the expense of some mon disk 
space.)

Can you try reproducing with a lower value?  If we can reliably reproduce 
this I'm confident we can come up with a real fix (for hammer and jewel, 
if it is also affected).  I'm pretty sure luminous is not since the past 
intervals code was almost completely restructured, but with a method to 
reproduce we can confirm.

Thank you!
sage


> 
> Besides, after using the ceph-objectstore-tool import/export, the cluster 
> became healthy and all pgs were active+clean; however, the client io still 
> failed.
> 
> I have not tested on jewel, but having looked at the jewel code I think this 
> problem still exists; I will test it later.
> 
> I want to solve this problem in the hammer branch; what should I do? Could you 
> give me some advice? Thanks.
> 
> 
> 
> -----Original Message-----
> From: Sage Weil [mailto:sw...@redhat.com] 
> Sent: 13 July 2017 22:51
> To: chenyehua 11692 (RD)
> Cc: 'ceph-us...@ceph.com'
> Subject: Re: calculate past_intervals wrong, lead to choose wrong authority osd, 
> then osd assert(newhead >= log.tail) 
> 
> Hi Chenyehua,
> 
> This looks a lot like a problem we've seen several times on hammer and never 
> managed to find a root cause for.  Is this something that happened once or 
> can you reproduce it?  To my knowledge it has not happened on jewel, so my 
> first suggestion is to upgrade (hammer is pretty old now :).
> 
> Are you looking for help on resolving this specific issue for this cluster?  
> In the past we've used a combination of ceph-objectstore-tool import/export 
> and/or osd_find_best_info_ignore_history_les on the primary to resolve it.
> 
> sage
> 
> 
> 
> On Thu, 13 Jul 2017, Chenyehua wrote:
> 
> > 
> > Hi Sage
> > 
> > I find  the osd assert due to the  wrongly  generated past_intervals, 
> > could you give me some advice and solutions to this problem?
> > 
> >  
> > 
> > Here is the detail:
> > 
> >  
> > 
> > Ceph version: 0.94.5
> > 
> >  
> > 
> > HOST-A   HOST-B    HOST-C
> > 
> > osd 7    osd 21   osd11
> > 
> > 1. osdmap epoch95, pg 1.20f on osd acting set [11,7]/ up 
> > set[11,7],then shutdown HOST-C
> > 
> > 2. for a long time, cluster has only HOST A and HOST B, write data
> > 
> > 3. shutdown HOST-A , then start HOST-C, restart HOST-B about 4 times
> > 
> > 4. start HOST-A, osd 21 assert
> > 
> >  
> > 
> > Analysis:
> > 
> > when osd 11 start, it generate past_intervals wrongly, make [92~1000] 
> > in the same interval
> > 
> > pg map 1673,osd11 become the primary,and pg 1.20f change from peering 
> > to
> > activating+undersized+degraded , modified last_epoch_start;
> > 
> > osd7 start, find_best_info will choose out bigger
> > last_epoch_start,althought osd7 has the latest data;
> > 
> > past_intervals on osd 7:
> > 
> > ~95     [11,7]/[11,7]
> > 
> > 96~100    [7]/[7]
> > 
> > 101     [7,21]/[7,21]
> > 
> > 102~178     [7,21]/[7]
> > 
> > 179~1663  [7,21]/[7,21]
> > 
> > 1664~1672  [21]/[21]
> > 
> > 1673~1692  [11]/[11]
> > 
> >  
> > 
> > past_intervals on osd11:
> > 
> > 

Re: [ceph-users] Ceph mount rbd

2017-07-14 Thread lista

Gonzalo,

You are right, I said a lot about my current environment and maybe I didn't 
explain my problem in the best way. With ceph, at the moment, multiple client 
hosts can mount and write data to my system, and this is a problem, because I 
could end up with filesystem corruption.

For example, today, if I run these commands on two machines at the same time, 
they will work:

mount /dev/rbd0 /mnt/veeamrepo
cd /mnt/veeamrepo ; touch testfile.txt

I need to ensure that only one machine can do this at a time.

Thanks a lot,
Marcelo

On 14/07/2017, Gonzalo Aguilar Delgado gagui...@aguilardelgado.com
wrote:
 Hi, 
 
 Why would you want to maintain copies yourself? You replicate on ceph 
 and then on different files inside ceph? Let ceph take care of counting. 
 Create a pool with 3 or more copies and let ceph take care of what's 
 stored and where. 
 
 Best regards, 
 
 
  On 13/07/17 at 17:06, li...@marcelofrota.info wrote: 
  
  I will explain more about my current system. At the moment I have 2 
  machines using drbd in master/slave mode and I run the application 
  on the master machine, but there are 2 important points in my 
  environment with drbd today: 
  
  1 - If machine one is the master and has the partitions mounted, the 
  slave cannot mount the system unless a problem happens on the master 
  machine; this is one way to prevent incorrect writes to the filesystem. 
  
  2 - When I write data on the master machine in drbd, drbd writes the 
  data to the slave machine automatically; with this, if a problem 
  happens on the master node, the slave machine has a copy of the data. 
  
  At the moment, in my test environment with ceph, using kernel version 
  4.10, I can mount the system on two machines at the same time; in a 
  production environment I could have serious problems with this 
  behaviour. 
  
  How can I use ceph and ensure that I keep these 2 behaviors in a new 
  environment with Ceph? 
  
  Thanks a lot, 
  
  Marcelo 
  
  
  On 28/06/2017, Jason Dillaman jdill...@redhat.com wrote: 
   ... additionally, the forthcoming 4.12 kernel release will support 
   non-cooperative exclusive locking. By default, since 4.9, when the 
   exclusive-lock feature is enabled, only a single client can write to 
  the 
   block device at a time -- but they will cooperatively pass the lock 
  back 
   and forth upon write request. With the new "rbd map" option, you can 
  map a 
   image on exactly one host and prevent other hosts from mapping the 
  image. 
   If that host should die, the exclusive-lock will automatically become 
   available to other hosts for mapping. 
   
   Of course, I always have to ask the use-case behind mapping the same 
  image 
   on multiple hosts. Perhaps CephFS would be a better fit if you are 
  trying 
   to serve out a filesystem? 
   
   On Wed, Jun 28, 2017 at 6:25 PM, Maged Mokhtar 
  mmokh...@petasan.org wrote: 
   
On 2017-06-28 22:55, li...@marcelofrota.info wrote: 

Hi People, 

I am testing the new enviroment, with ceph + rbd with ubuntu 
  16.04, and i 
have one question. 

I have my cluster ceph and mount the using the comands to ceph in 
  my linux 
enviroment : 

rbd create veeamrepo --size 20480 
rbd --image veeamrepo info 
modprobe rbd 
rbd map veeamrepo 
rbd feature disable veeamrepo exclusive-lock object-map fast-diff 
deep-flatten 
mkdir /mnt/veeamrepo 
mount /dev/rbd0 /mnt/veeamrepo 

The comands work fine, but i have one problem, in the moment, i 
  can mount 
the /mnt/veeamrepo in the same time in 2 machines, and this is a 
  bad option 
for me in the moment, because this could generate one filesystem 
  corrupt. 

I need only one machine to be allowed to mount and write at a time. 

Example if machine1 mount the /mnt/veeamrepo and machine2 try 
  mount, one 
error would be displayed, show message the machine can not mount, 
  because 
the system already mounted in machine1. 

Someone, could help-me with this or give some tips, for solution my 
problem. ? 

Thanks a lot 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 



You can use Pacemaker to map the rbd and mount the filesystem on 1 
  server 
and in case of failure switch to another server. 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


   
   
   -- 
   Jason 
  
  
  
  ___ 
  ceph-users mailing list 
  ceph-users@lists.ceph.com 
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 ___ 
 ceph-users mailing list 
 ceph-users@lists.ceph.com 
 

Re: [ceph-users] Reply: Reply: No "snapset" attribute for clone object

2017-07-14 Thread Jason Dillaman
The only people that have experienced it seem to be using cache
tiering. I don't know if anyone has deeply investigated it yet. You
could attempt to evict those objects from the cache tier so that the
snapdir request is proxied down to the base tier to see if that works.
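Below is a rough sketch (in Python, driving the rados CLI) of flushing and
evicting one image's objects from the cache pool so that the next read is
proxied to the base tier. The pool name and object-name prefix are
placeholders: take block_name_prefix from 'rbd info <image>', and note that
the per-object cache-flush/cache-evict subcommands may refuse objects that
still have in-cache clones, in which case 'rados -p <cachepool>
cache-flush-evict-all' is the blunter alternative.

#!/usr/bin/env python
# Sketch: flush + evict an image's objects from a cache tier pool so that
# subsequent snapdir reads are proxied down to the base tier.
# Assumes the rados CLI is available and that the image's data objects all
# start with the block_name_prefix reported by 'rbd info'.
import subprocess
import sys

cache_pool = sys.argv[1]   # e.g. 'rbd-cache' (placeholder pool name)
prefix = sys.argv[2]       # e.g. 'rbd_data.102674b0dc51' from 'rbd info'

# Note: listing a large cache pool can take a while.
objects = subprocess.check_output(['rados', '-p', cache_pool, 'ls']).split()

for obj in objects:
    if isinstance(obj, bytes):
        obj = obj.decode()
    if not obj.startswith(prefix):
        continue
    # write any dirty data back to the base tier, then drop the cached copy
    subprocess.call(['rados', '-p', cache_pool, 'cache-flush', obj])
    subprocess.call(['rados', '-p', cache_pool, 'cache-evict', obj])
    print('flushed and evicted %s' % obj)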

On Fri, Jul 14, 2017 at 3:02 AM, 许雪寒  wrote:
> Yes, I believe so. Is there any workarounds?
>
> -----Original Message-----
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: 13 July 2017 21:13
> To: 许雪寒
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Reply: No "snapset" attribute for clone object
>
> Quite possibly the same as this issue? [1]
>
> [1] http://tracker.ceph.com/issues/17445
>
> On Thu, Jul 13, 2017 at 8:13 AM, 许雪寒  wrote:
>> By the way, we are using hammer version's rbd command to export-diff rbd 
>> images on Jewel version's cluster.
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] on behalf of 许雪寒
>> Sent: 13 July 2017 19:54
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] No "snapset" attribute for clone object
>>
>> We are using rbd for block devices of VMs, and recently we found that after 
>> we created snapshots for some rbd images, there existed objects whose 
>> clone objects don't have the "snapset" extended 
>> attribute on them.
>>
>> It seems that the lack of "snapset" attributes for clone objects has led to 
>> segmentation faults when we try to do "export-diff".
>>
>> Is this a bug?
>> We are using 10.2.5, jewel version.
>>
>> Thank you:-)
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to list and reset the scrub schedules

2017-07-14 Thread Dan van der Ster
Hi,

Occasionally we want to change the scrub schedule for a pool or whole
cluster, but we want to do this by injecting new settings without
restarting every daemon.

I've noticed that in jewel, changes to scrub_min/max_interval and
deep_scrub_interval do not take immediate effect, presumably because
the scrub schedules are calculated in advance for all the PGs on an
OSD.

Does anyone know how to list that scrub schedule for a given OSD?

And better yet, does anyone know a way to reset that schedule, so that
the OSD generates a new one with the new configuration?

(I've noticed that by chance setting sortbitwise triggers many scrubs
-- maybe a new peering interval resets the scrub schedules?) Any
non-destructive way to trigger a new peering interval on demand?

Cheers,

Dan
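
There doesn't seem to be an obvious documented way in jewel to dump the
computed scrub schedule itself, but the per-PG last-scrub timestamps in 'pg
dump' at least show what each OSD has been doing and roughly what is due next.
A minimal sketch in Python, assuming the ceph CLI is in PATH and that the pg
dump JSON carries a 'pg_stats' array with 'pgid', 'acting_primary',
'last_scrub_stamp' and 'last_deep_scrub_stamp' fields (adjust the key names if
your version reports something different):

#!/usr/bin/env python
# Sketch: print last scrub / deep-scrub stamps for every PG whose acting
# primary is the given OSD, to get a rough view of its scrub backlog.
import json
import subprocess
import sys

osd_id = int(sys.argv[1])    # e.g. 12

dump = json.loads(subprocess.check_output(
    ['ceph', 'pg', 'dump', '--format', 'json']))

for pg in dump['pg_stats']:  # field names assumed, see note above
    if pg.get('acting_primary') == osd_id:
        print('%-10s last_scrub=%s  last_deep_scrub=%s' % (
            pg['pgid'], pg['last_scrub_stamp'], pg['last_deep_scrub_stamp']))

'ceph pg scrub <pgid>' / 'ceph pg deep-scrub <pgid>' can then queue one-off
scrubs for PGs that look overdue, though that only adds work rather than
resetting the internal schedule.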
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer -> jewel 10.2.8 upgrade and setting sortbitwise

2017-07-14 Thread Dan van der Ster
On Mon, Jul 10, 2017 at 5:06 PM, Sage Weil  wrote:
> On Mon, 10 Jul 2017, Luis Periquito wrote:
>> Hi Dan,
>>
>> I've enabled it in a couple of big-ish clusters and had the same
>> experience - a few seconds disruption caused by a peering process
>> being triggered, like any other crushmap update does. Can't remember
>> if it triggered data movement, but I have a feeling it did...
>
> That's consistent with what one should expect.
>
> The flag triggers a new peering interval, which means the PGs will peer,
> but there is no change in the mapping or data layout or anything else.
> The only thing that is potentially scary here is that *every* PG will
> repeer at the same time.

Thanks Sage & Luis. I confirm that setting sortbitwise on a large
cluster is basically a non-event... nothing to worry about.

(Btw, we just upgraded our biggest prod clusters to jewel -- that also
went totally smooth!)

-- Dan

> sage
>
>
>>
>>
>>
>> On Mon, Jul 10, 2017 at 3:17 PM, Dan van der Ster  
>> wrote:
>> > Hi all,
>> >
>> > With 10.2.8, ceph will now warn if you didn't yet set sortbitwise.
>> >
>> > I just updated a test cluster, saw that warning, then did the necessary
>> >   ceph osd set sortbitwise
>> >
>> > I noticed a short re-peering which took around 10s on this small
>> > cluster with very little data.
>> >
>> > Has anyone done this already on a large cluster with lots of objects?
>> > It would be nice to hear that it isn't disruptive before running it on
>> > our big production instances.
>> >
>> > Cheers, Dan
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] FW: Regarding Ceph Debug Logs

2017-07-14 Thread Roshni Chatterjee
Hi All,

I am new to ceph and I am trying to debug a few scenarios . I have 2 queries as 
listed below  -
1.Regarding enabling debug logs for ceph
2.Regarding internal processes of ceph

QUERY 1
I have enabled the logs by setting the log level in ceph conf file attached 
above -

But none of this is generating information that could be used in debugging .
Also ,
The following CLI generates log as attached in cli_log attachment -
What are the following codes in the log like -
7f12a474c700 , 7f12895af700

QUERY 2

While running Ceph on LTTng, I came across multiple processes spawned, like 
tp_osd_tp, tp_osd_recov, tp_osd_disk, osd_op, tp_osd_cmd.
Why are these processes being spawned so many times and what is the function of 
each?
If someone can share any document/link to understand the ceph internals 
(processes/functions/workings), it would be highly appreciated.

Regards,
Roshni
[global]
fsid = ebc528fa-61f5-44be-b3df-5af92fce9379
mon_initial_members = ceph-admin
mon_host = 10.0.4.118
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 1  # Write an object n times.
osd pool default min size = 1 # Allow writing n copy in a degraded state.
osd pool default pg num = 128
osd pool default pgp num = 128
debug ms = 1/5
debug tp log = 1/20
osd_tracing = true
osd_objectstore_tracing = true
rados_tracing = true
rbd_tracing = true

[mon]
debug mon = 20
debug paxos = 1/5
debug auth = 2

[osd]
debug osd = 1/5
debug filestore = 1/5
debug journal = 1
debug monc = 5/20

[mds]
debug mds = 1
debug mds balancer = 1
debug mds log = 1
debug mds migrator = 1

log to syslog = true
err to syslog = true
root@ceph-admin:/home/cephuser/cluster# rbd --pool mypool snap create --snap 
mysnap_myimage18 myimage
2017-07-13 17:20:41.107210 7f12a4759100  1 -- :/0 messenger.start
2017-07-13 17:20:41.108308 7f12a4759100  1 -- :/782351370 --> 10.0.4.118:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x557b971d75a0 con 0x557b971d7020
2017-07-13 17:20:41.111972 7f12a474c700  1 -- 10.0.4.118:0/782351370 learned my 
addr 10.0.4.118:0/782351370
2017-07-13 17:20:41.112625 7f12895af700  1 -- 10.0.4.118:0/782351370 <== mon.0 
10.0.4.118:6789/0 1  mon_map magic: 0 v1  200+0+0 (2014826046 0 0) 
0x7f1274000bc0 con 0x557b971d7020
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Regarding Ceph Debug Logs

2017-07-14 Thread Roshni Chatterjee

Hi All,

I am new to ceph and I am trying to debug a few scenarios . I have 2 queries as 
listed below  -
1.Regarding enabling debug logs for ceph
2.Regarding internal processes of ceph

QUERY 1 >>
I have enabled the logs by setting the log level in /etc/ceph/ceph.conf 
attached above -

But none of this is generating information that could be used in debugging .
Also ,
The following CLI generates log as attached in cli_log attachment -
What are the following codes in the log like -
7f12a474c700 , 7f12895af700

QUERY 2>>

While running Ceph on LTTng, I came across multiple processes spawned, like 
tp_osd_tp, tp_osd_recov, tp_osd_disk, osd_op, tp_osd_cmd.
Why are these processes being spawned so many times and what is the function of 
each?
If someone can share any document/link to understand the ceph internals 
(processes/functions/workings), it would be highly appreciated.
Regards,
Roshni
[global]
fsid = ebc528fa-61f5-44be-b3df-5af92fce9379
mon_initial_members = ceph-admin
mon_host = 10.0.4.118
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 1  # Write an object n times.
osd pool default min size = 1 # Allow writing n copy in a degraded state.
osd pool default pg num = 128
osd pool default pgp num = 128
debug ms = 1/5
debug tp log = 1/20
osd_tracing = true
osd_objectstore_tracing = true
rados_tracing = true
rbd_tracing = true

[mon]
debug mon = 20
debug paxos = 1/5
debug auth = 2

[osd]
debug osd = 1/5
debug filestore = 1/5
debug journal = 1
debug monc = 5/20

[mds]
debug mds = 1
debug mds balancer = 1
debug mds log = 1
debug mds migrator = 1

log to syslog = true
err to syslog = true
root@ceph-admin:/home/cephuser/cluster# rbd --pool mypool snap create --snap 
mysnap_myimage18 myimage
2017-07-13 17:20:41.107210 7f12a4759100  1 -- :/0 messenger.start
2017-07-13 17:20:41.108308 7f12a4759100  1 -- :/782351370 --> 10.0.4.118:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x557b971d75a0 con 0x557b971d7020
2017-07-13 17:20:41.111972 7f12a474c700  1 -- 10.0.4.118:0/782351370 learned my 
addr 10.0.4.118:0/782351370
2017-07-13 17:20:41.112625 7f12895af700  1 -- 10.0.4.118:0/782351370 <== mon.0 
10.0.4.118:6789/0 1  mon_map magic: 0 v1  200+0+0 (2014826046 0 0) 
0x7f1274000bc0 con 0x557b971d7020
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Regarding Ceph Debug Logs

2017-07-14 Thread Roshni Chatterjee
Hi All,

I am new to ceph and I am trying to debug a few scenarios . I have 2 queries as 
listed below  -
1.Regarding enabling debug logs for ceph
2.Regarding internal processes of ceph

QUERY 1

I have enabled the logs by setting the log level in /etc/ceph/ceph.conf 
attached above -

But none of this is generating information that could be used in debugging .
Also ,
The following CLI generates log as attached in cli_log attachment -
What are the following codes in the log like -
7f12a474c700 , 7f12895af700

QUERY 2
While running Ceph on LTTng, I came across multiple processes spawned, like 
tp_osd_tp, tp_osd_recov, tp_osd_disk, osd_op, tp_osd_cmd.
Why are these processes being spawned so many times and what is the function of 
each?
If someone can share any document/link to understand the ceph internals 
(processes/functions/workings), it would be highly appreciated.


Regards,
Roshni
[global]
fsid = ebc528fa-61f5-44be-b3df-5af92fce9379
mon_initial_members = ceph-admin
mon_host = 10.0.4.118
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 1  # Write an object n times.
osd pool default min size = 1 # Allow writing n copy in a degraded state.
osd pool default pg num = 128
osd pool default pgp num = 128
debug ms = 1/5
debug tp log = 1/20
osd_tracing = true
osd_objectstore_tracing = true
rados_tracing = true
rbd_tracing = true

[mon]
debug mon = 20
debug paxos = 1/5
debug auth = 2

[osd]
debug osd = 1/5
debug filestore = 1/5
debug journal = 1
debug monc = 5/20

[mds]
debug mds = 1
debug mds balancer = 1
debug mds log = 1
debug mds migrator = 1

log to syslog = true
err to syslog = true
root@ceph-admin:/home/cephuser/cluster# rbd --pool mypool snap create --snap 
mysnap_myimage18 myimage
2017-07-13 17:20:41.107210 7f12a4759100  1 -- :/0 messenger.start
2017-07-13 17:20:41.108308 7f12a4759100  1 -- :/782351370 --> 10.0.4.118:6789/0 
-- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x557b971d75a0 con 0x557b971d7020
2017-07-13 17:20:41.111972 7f12a474c700  1 -- 10.0.4.118:0/782351370 learned my 
addr 10.0.4.118:0/782351370
2017-07-13 17:20:41.112625 7f12895af700  1 -- 10.0.4.118:0/782351370 <== mon.0 
10.0.4.118:6789/0 1  mon_map magic: 0 v1  200+0+0 (2014826046 0 0) 
0x7f1274000bc0 con 0x557b971d7020
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] missing feature 400000000000000 ?

2017-07-14 Thread Richard Hesketh
On 14/07/17 11:03, Ilya Dryomov wrote:
> On Fri, Jul 14, 2017 at 11:29 AM, Riccardo Murri
>  wrote:
>> Hello,
>>
>> I am trying to install a test CephFS "Luminous" system on Ubuntu 16.04.
>>
>> Everything looks fine, but the `mount.ceph` command fails (error 110, 
>> timeout);
>> kernel logs show a number of messages like these before the `mount`
>> prog gives up:
>>
>> libceph: ... feature set mismatch, my 107b84a842aca < server's
>> 40107b84a842aca, missing 400
>>
>> I read in [1] that this is feature
>> CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING which is only supported in
>> kernels 4.5 and up -- whereas Ubuntu 16.04 runs Linux 4.4.
>>
>> Is there some tunable or configuration file entry that I can set,
>> which will make Luminous FS mounting work on the std Ubuntu 16.04
>> Linux kernel?  I.e., is there a way I can avoid upgrading the kernel?
>>
>> [1]: 
>> http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client
> 
> Yes, you should be able to set your CRUSH tunables profile to hammer
> with "ceph osd crush tunables hammer".
> 
> Thanks,
> 
> Ilya

Alternatively, keep in mind you can install ceph-fuse and mount the FS using 
that userland client instead, if you'd prefer the tunables in your cluster to 
be up to date.

Rich



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Reply: calculate past_intervals wrong, lead to choose wrong authority osd, then osd assert(newhead >= log.tail)

2017-07-14 Thread Chenyehua
Thanks, Sage.

It doesn't happen every time, but the probability is high 

Reproduce as Follows:
HOST-A   HOST-B  HOST-C
osd 7  osd 21   osd11
 1. osdmap epoch95, pg 1.20f on osd acting set [11,7]/ up set[11,7],then 
shutdown HOST-C 
 2. for a long time, cluster has only HOST A and HOST B, write data 
 3. shutdown HOST-A , then start HOST-C, restart HOST-B about 4 times 
 4. start HOST-A, osd 7 assert
 And set pg log configuration:
"osd_min_pg_log_entries": "100",
"osd_max_pg_log_entries": "300",
"osd_pg_log_trim_min": "100"

After analyzing ceph-osd.log, I think the root cause is "some osds compute the 
wrong past interval".

In my test, osd 11 had been down for a long time and had very old data; when it 
came up, it first received the full_1001 osdmap and generated past_intervals as 
follows:
92~1000 [11,7]/[11,7]    <- this is the wrong interval; actually, 
during osdmap 92 to 1000 the pg had already become active+clean on [7,21], and 
the data had been updated a lot.
This past interval is inappropriate and could lead osd11 to think "I've been 
alive and been the primary during osdmap 92~1000, and I already have the same 
data epoch as osd 7".
In the next osdmap epoch, the pg mapped only to [11], so it could become active 
and modify last_epoch_start; later, when another osd that has the newest data 
comes back, find_best_info chooses the one with the bigger last_epoch_start, 
which is osd 11; so the osd that has the older data unexpectedly becomes the 
best.

Besides, after using the ceph-objectstore-tool import/export, the cluster 
became healthy and all pgs were active+clean; however, the client io still 
failed.

I have not tested on jewel, but having looked at the jewel code I think this 
problem still exists; I will test it later.

I want to solve this problem in the hammer branch; what should I do? Could you 
give me some advice? Thanks.



-----Original Message-----
From: Sage Weil [mailto:sw...@redhat.com] 
Sent: 13 July 2017 22:51
To: chenyehua 11692 (RD)
Cc: 'ceph-us...@ceph.com'
Subject: Re: calculate past_intervals wrong, lead to choose wrong authority osd, 
then osd assert(newhead >= log.tail) 

Hi Chenyehua,

This looks a lot like a problem we've seen several times on hammer and never 
managed to find a root cause for.  Is this something that happened once or can 
you reproduce it?  To my knowledge it has not happened on jewel, so my first 
suggestion is to upgrade (hammer is pretty old now :).

Are you looking for help on resolving this specific issue for this cluster?  In 
the past we've used a combination of ceph-objectstore-tool import/export and/or 
osd_find_best_info_ignore_history_les on the primary to resolve it.

sage



On Thu, 13 Jul 2017, Chenyehua wrote:

> 
> Hi Sage
> 
> I find  the osd assert due to the  wrongly  generated past_intervals, 
> could you give me some advice and solutions to this problem?
> 
>  
> 
> Here is the detail:
> 
>  
> 
> Ceph version: 0.94.5
> 
>  
> 
> HOST-A   HOST-B    HOST-C
> 
> osd 7    osd 21   osd11
> 
> 1. osdmap epoch95, pg 1.20f on osd acting set [11,7]/ up 
> set[11,7],then shutdown HOST-C
> 
> 2. for a long time, cluster has only HOST A and HOST B, write data
> 
> 3. shutdown HOST-A , then start HOST-C, restart HOST-B about 4 times
> 
> 4. start HOST-A, osd 21 assert
> 
>  
> 
> Analysis:
> 
> when osd 11 start, it generate past_intervals wrongly, make [92~1000] 
> in the same interval
> 
> pg map 1673,osd11 become the primary,and pg 1.20f change from peering 
> to
> activating+undersized+degraded , modified last_epoch_start;
> 
> osd7 start, find_best_info will choose out bigger
> last_epoch_start,althought osd7 has the latest data;
> 
> past_intervals on osd 7:
> 
> ~95     [11,7]/[11,7]
> 
> 96~100    [7]/[7]
> 
> 101     [7,21]/[7,21]
> 
> 102~178     [7,21]/[7]
> 
> 179~1663  [7,21]/[7,21]
> 
> 1664~1672  [21]/[21]
> 
> 1673~1692  [11]/[11]
> 
>  
> 
> past_intervals on osd11:
> 
> 92~1000     [11,7]/[11,7]    <- the wrong pi
> 
> 1001~1663   [7,21]/[7,21] no rw
> 
> 1664~1672   [21]/[21] no rw
> 
> 1673~1692    [11]/[11]
> 
>  
> 
>  
> 
>  
> 
> Logs:
> 
> Assert on osd7:
> 
> 2017-07-10 16:08:29.836722 7f4fac24a700 -1 osd/PGLog.cc: In function 
> 'void PGLog::rewind_divergent_log(ObjectStore::Transaction&, 
> eversion_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread 
> 7f4fac24a700 time
> 2017-07-10 16:08:29.833699
> 
> osd/PGLog.cc: 503: FAILED assert(newhead >= log.tail)
> 
> ceph version 0.94.5 (664cc0b54fdb496233a81ab19d42df3f46dcda50)
> 
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x8b) [0xbd1ebb]
> 
> 2: (PGLog::rewind_divergent_log(ObjectStore::Transaction&, eversion_t, 
> pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x60b) [0x7840fb]
> 
> 3: (PG::rewind_divergent_log(ObjectStore::Transaction&, 
> eversion_t)+0x97) [0x7df4b7]
> 
> 4: (PG::RecoveryState::Stray::react(PG::MInfoRec const&)+0x22f) 
> [0x80109f]

Re: [ceph-users] libceph: auth method 'x' error -1

2017-07-14 Thread Ilya Dryomov
On Wed, Jul 12, 2017 at 7:11 PM,   wrote:
> Hi!
>
> I have installed Ceph using ceph-deploy.
> The Ceph Storage Cluster setup includes these nodes:
> ld4257 Monitor0 + Admin
> ld4258 Montor1
> ld4259 Monitor2
> ld4464 OSD0
> ld4465 OSD1
>
> Ceph Health status is OK.
>
> However, I cannot mount Ceph FS.
> When I enter this command on ld4257
> mount -t ceph ldcephmon1,ldcephmon2,ldcephmon3:/ /mnt/cephfs/ -o
> name=client.openattic,secret=[secretkey]
> I get this error:
> mount error 1 = Operation not permitted
> In syslog I find this entries:
> [ 3657.493337] libceph: client264233 fsid
> 5f6f168d-2ade-4d16-a7e6-3704f93ad94e
> [ 3657.493542] libceph: auth method 'x' error -1
>
> When I use another mount command on ld4257
> mount.ceph ld4257,ld4258,ld4259:/cephfs /mnt/cephfs/ -o
> name=client.openattic,secretfile=/etc/ceph/ceph.client.openattic.keyring
> I get this error:
> secret is not valid base64: Invalid argument.
> adding ceph secret key to kernel failed: Invalid argument.
> failed to parse ceph_options
>
> Question:
> Is mount-option "secretfile" not supported anymore?

It is.  The secret file should contain the base64-encoded key and
nothing else:

$ cat /tmp/secret
AQBgL15ZHHfVHhAAFz0Us4uGdIBvwfxEJ96OQQ==
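
In other words, point secretfile= at a file containing only that key, not at
the keyring. A minimal sketch in Python of pulling the key out of the keyring
from this thread (the paths and the 'openattic' entity are just the ones used
above; 'ceph auth get-key client.openattic' achieves the same thing):

#!/usr/bin/env python
# Sketch: extract the bare base64 key from a keyring file so it can be fed
# to mount.ceph via 'secretfile=' (which expects only the key, nothing else).
keyring_path = '/etc/ceph/ceph.client.openattic.keyring'  # path from the thread
secret_path = '/etc/ceph/openattic.secret'

key = None
with open(keyring_path) as f:
    for line in f:
        line = line.strip()
        if line.startswith('key'):          # 'key = AQ...=='
            key = line.split('=', 1)[1].strip()
            break

if key is None:
    raise SystemExit('no key found in %s' % keyring_path)

with open(secret_path, 'w') as f:
    f.write(key + '\n')
print('wrote %s' % secret_path)

Note also that the name= mount option normally takes the bare id, i.e.
name=openattic rather than name=client.openattic.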

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] autoconfigured haproxy service?

2017-07-14 Thread Wido den Hollander

> Op 11 juli 2017 om 22:35 schreef Sage Weil :
> 
> 
> On Tue, 11 Jul 2017, Wido den Hollander wrote:
> > > Op 11 juli 2017 om 17:03 schreef Sage Weil :
> > > 
> > > 
> > > Hi all,
> > > 
> > > Luminous features a new 'service map' that lets rgw's (and rgw nfs 
> > > gateways and iscsi gateways and rbd mirror daemons and ...) advertise 
> > > themselves to the cluster along with some metadata (like the addresses 
> > > they are binding to and the services the provide).
> > > 
> > > It should be pretty straightforward to build a service that 
> > > auto-configures haproxy based on this information so that you can deploy 
> > > an rgw front-end that dynamically reconfigures itself when additional 
> > > rgw's are deployed or removed.  haproxy has a facility to adjust its 
> > > backend configuration at runtime[1].
> > > 
> > > Anybody interested in tackling this?  Setting up the load balancer in 
> > > front of rgw is one of the more annoying pieces of getting ceph up and 
> > > running in production and until now has been mostly treated as out of 
> > > scope.  It would be awesome if there was an autoconfigured service that 
> > > did it out of the box (and had all the right haproxy options set).
> > > 
> > 
> > Are there easy Python bindings for this? I mean querying the service map.
> 
> Yes and no.  There are no special librados hooks (or python wrappers) to 
> get the map, but you can issue a mon_command for 'service dump' and get it 
> in JSON, which works just as well for python users.
> 
> > I'm personally a fan of running Varnish (with Hitch for SSL) in front of 
> > RGW. Some people might also prefer Traefik [0] since that also supports 
> > dynamic configs.
> 
> How would you go about autoconfiguring varnish via the rgw service map in 
> this case?
> 

Something like this works with RGW: 
https://gist.github.com/wido/d93f18810f40ecf405a5be0272821999

You see two backends configured there, but you can have more.

You can also replace that by:

include "backends.vcl"

Where the backends.vcl would then contain:

backend rgw1 {
.host = "rgw1";
.port = "7480";
.connect_timeout = 5s;
.first_byte_timeout = 15s;
.between_bytes_timeout = 5s;
.probe = {
.timeout   = 30s;
.interval  = 3s;
.window= 10;
.threshold = 3;
.request =
"GET / HTTP/1.1"
"Host: localhost"
"User-Agent: Varnish-health-check"
"Connection: close";
}
}

backend rgw2 {
.host = "rgw2";
.port = "7480";
.connect_timeout = 5s;
.first_byte_timeout = 15s;
.between_bytes_timeout = 5s;
.probe = {
.timeout   = 30s;
.interval  = 3s;
.window= 10;
.threshold = 3;
.request =
"GET / HTTP/1.1"
"Host: localhost"
"User-Agent: Varnish-health-check"
"Connection: close";
}
}

A very simple piece of code would generate these backends based on the 
servicemap in Ceph.
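
A minimal sketch of such a generator in Python, using the rados bindings'
mon_command() to fetch the Luminous service map. The layout assumed for the
'service dump' output (a services -> rgw -> daemons map whose entries carry a
'metadata' dict with a 'hostname') and the civetweb port are assumptions --
dump the JSON once on a real cluster and adapt the address extraction; the
probe section from the stanzas above is omitted for brevity:

#!/usr/bin/env python
# Sketch: render a backends.vcl for varnish from Ceph's Luminous service map.
# Requires python-rados and a readable /etc/ceph/ceph.conf with a client key.
import json
import rados

TEMPLATE = '''backend %(name)s {
    .host = "%(host)s";
    .port = "%(port)s";
    .connect_timeout = 5s;
    .first_byte_timeout = 15s;
    .between_bytes_timeout = 5s;
}
'''

def backend_addr(daemon):
    # ASSUMPTION: the daemon entry carries a 'metadata' dict with 'hostname';
    # the port is simply the civetweb default here.  Adapt to your dump.
    return daemon.get('metadata', {}).get('hostname', 'unknown'), '7480'

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ret, outbuf, outs = cluster.mon_command(
    json.dumps({'prefix': 'service dump', 'format': 'json'}), b'')
cluster.shutdown()

dump = json.loads(outbuf)
rgw_daemons = dump.get('services', {}).get('rgw', {}).get('daemons', {})

with open('backends.vcl', 'w') as f:
    for name, daemon in rgw_daemons.items():
        if not isinstance(daemon, dict):   # skip e.g. the 'summary' entry
            continue
        host, port = backend_addr(daemon)
        vcl_name = 'rgw_%s' % name.replace('.', '_').replace('-', '_')
        f.write(TEMPLATE % {'name': vcl_name, 'host': host, 'port': port})

Run it from cron or a small loop, rewrite backends.vcl and reload varnish when
it changes; the same dump could just as easily be templated into haproxy
'server' lines.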

Wido

> sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] missing feature 400000000000000 ?

2017-07-14 Thread Ilya Dryomov
On Fri, Jul 14, 2017 at 11:29 AM, Riccardo Murri
 wrote:
> Hello,
>
> I am trying to install a test CephFS "Luminous" system on Ubuntu 16.04.
>
> Everything looks fine, but the `mount.ceph` command fails (error 110, 
> timeout);
> kernel logs show a number of messages like these before the `mount`
> prog gives up:
>
> libceph: ... feature set mismatch, my 107b84a842aca < server's
> 40107b84a842aca, missing 400
>
> I read in [1] that this is feature
> CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING which is only supported in
> kernels 4.5 and up -- whereas Ubuntu 16.04 runs Linux 4.4.
>
> Is there some tunable or configuration file entry that I can set,
> which will make Luminous FS mounting work on the std Ubuntu 16.04
> Linux kernel?  I.e., is there a way I can avoid upgrading the kernel?
>
> [1]: 
> http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client

Yes, you should be able to set your CRUSH tunables profile to hammer
with "ceph osd crush tunables hammer".

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] missing feature 400000000000000 ?

2017-07-14 Thread Peter Maloney

according to some slide in https://www.youtube.com/watch?v=gp6if858HUI
the support is:
> TUNABLE  RELEASE   CEPH_VERSION  KERNEL
> CRUSH_TUNABLES   argonaut  v0.48.1   v3.6
> CRUSH_TUNABLES2  bobtail   v0.55 v3.9
> CRUSH_TUNABLES3  firefly   v0.78 v3.15
> CRUSH_V4 hammerv0.94 v4.1
> CRUSH_TUNABLES5  Jewel v10.0.2   v4.5

So go to hammer tunables:
> ceph osd crush tunables hammer


On 07/14/17 11:29, Riccardo Murri wrote:
> Hello,
>
> I am trying to install a test CephFS "Luminous" system on Ubuntu 16.04.
>
> Everything looks fine, but the `mount.ceph` command fails (error 110, 
> timeout);
> kernel logs show a number of messages like these before the `mount`
> prog gives up:
>
> libceph: ... feature set mismatch, my 107b84a842aca < server's
> 40107b84a842aca, missing 400
>
> I read in [1] that this is feature
> CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING which is only supported in
> kernels 4.5 and up -- whereas Ubuntu 16.04 runs Linux 4.4.
>
> Is there some tunable or configuration file entry that I can set,
> which will make Luminous FS mounting work on the std Ubuntu 16.04
> Linux kernel?  I.e., is there a way I can avoid upgrading the kernel?
>
> [1]: 
> http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client
>
> Thanks,
> Riccardo
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] missing feature 400000000000000 ?

2017-07-14 Thread Riccardo Murri
Hello,

I am trying to install a test CephFS "Luminous" system on Ubuntu 16.04.

Everything looks fine, but the `mount.ceph` command fails (error 110, timeout);
kernel logs show a number of messages like these before the `mount`
prog gives up:

libceph: ... feature set mismatch, my 107b84a842aca < server's
40107b84a842aca, missing 400

I read in [1] that this is feature
CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING which is only supported in
kernels 4.5 and up -- whereas Ubuntu 16.04 runs Linux 4.4.

Is there some tunable or configuration file entry that I can set,
which will make Luminous FS mounting work on the std Ubuntu 16.04
Linux kernel?  I.e., is there a way I can avoid upgrading the kernel?

[1]: 
http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client

Thanks,
Riccardo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PGs per OSD guidance

2017-07-14 Thread Adrian Saul
Hi All,
   I have been reviewing the sizing of our PGs with a view to some intermittent 
performance issues.  When we have scrubs running, even when only a few are, we 
can sometimes get severe impacts on the performance of RBD images, enough to 
start causing VMs to appear stalled or unresponsive. When some of these 
scrubs are running I can see very high latency on some disks which I suspect is 
what is impacting the performance.  We currently have around 70 PGs per SATA 
OSD, and 140 PGs per SSD OSD.   These numbers are probably not really 
reflective as most of the data is in only really half of the pools, so some PGs 
would be fairly heavy while others are practically empty.   From what I have 
read we should be able to go significantly higher though.We are running 
10.2.1 if that matters in this context.

 My question is: if we increase the number of PGs, is that likely to help 
reduce the scrub impact or spread it wider?  For example, does the mere act of 
scrubbing one PG mean the underlying disk is going to be hammered and so we 
will impact more PGs with that load, or would having more PGs mean the time to 
scrub the PG should be reduced and so the impact will be more disbursed?

I am also curious from a performance stand of view are we better off with more 
PGs to reduce PG lock contention etc?

Cheers,
 Adrian


Confidentiality: This email and any attachments are confidential and may be 
subject to copyright, legal or some other professional privilege. They are 
intended solely for the attention and use of the named addressee(s). They may 
only be copied, distributed or disclosed with the consent of the copyright 
owner. If you have received this email by mistake or by breach of the 
confidentiality clause, please notify the sender immediately by return email 
and delete or destroy all copies of the email. Any confidentiality, privilege 
or copyright is not waived or lost because this email has been sent to you by 
mistake.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stealth Jewel release?

2017-07-14 Thread ulembke

Hi,
10.2.9 is there:
apt list --upgradable
Listing... Done
ceph/stable 10.2.9-1~bpo80+1 amd64 [upgradable from: 10.2.8-1~bpo80+1]

Change-File??

Udo

Am 2017-07-14 09:26, schrieb Martin Palma:

So only the ceph-mds is affected? Let's say if we have mons and osds
on 10.2.8 and the MDS on 10.2.6 or 10.2.7 we would be "safe"?

I'm asking since we need to add new storage nodes to our production 
cluster.


Best,
Martin

On Wed, Jul 12, 2017 at 10:44 PM, Patrick Donnelly 
 wrote:
On Wed, Jul 12, 2017 at 11:31 AM, Dan van der Ster 
 wrote:

On Wed, Jul 12, 2017 at 5:51 PM, Abhishek L
 wrote:
On Wed, Jul 12, 2017 at 9:13 PM, Xiaoxi Chen 
 wrote:
+However, it also introduced a regression that could cause MDS 
damage.
+Therefore, we do *not* recommend that Jewel users upgrade to this 
version -
+instead, we recommend upgrading directly to v10.2.9 in which the 
regression is

+fixed.

It looks like this version is NOT production ready. Curious why we
want a not-recommended version to be released?


We found a regression in MDS right after packages were built, and 
the release
was about to be announced. This is why we didn't announce the 
release.

We're  currently running tests after the fix for MDS was merged.

So when we do announce the release we'll announce 10.2.9 so that 
users

can upgrade from 10.2.7->10.2.9


Suppose some users already upgraded their CephFS to 10.2.8 -- what is
the immediate recommended course of action? Downgrade or wait for the
10.2.9 ?


I'm not aware of or see any changes that would make downgrading back
to 10.2.7 a problem but the safest thing to do would be to replace the
v10.2.8 ceph-mds binaries with the v10.2.7 binary. If that's not
practical, I would recommend a cluster-wide downgrade to 10.2.7.

--
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mount rbd

2017-07-14 Thread Gonzalo Aguilar Delgado

Hi,

Why would you want to maintain copies yourself? You replicate on ceph 
and then on different files inside ceph? Let ceph take care of counting. 
Create a pool with 3 or more copies and let ceph take care of what's 
stored and where.


Best regards,


On 13/07/17 at 17:06, li...@marcelofrota.info wrote:


I will explain more about my current system. At the moment I have 2 
machines using drbd in master/slave mode and I run the application 
on the master machine, but there are 2 important points in my 
environment with drbd today:


1 - If machine one is the master and has the partitions mounted, the 
slave cannot mount the system unless a problem happens on the master 
machine; this is one way to prevent incorrect writes to the filesystem.


2 - When I write data on the master machine in drbd, drbd writes the 
data to the slave machine automatically; with this, if a problem 
happens on the master node, the slave machine has a copy of the data.


At the moment, in my test environment with ceph, using kernel version 
4.10, I can mount the system on two machines at the same 
time; in a production environment I could have serious problems with 
this behaviour.


How can I use ceph and ensure that I keep these 2 behaviors in a new 
environment with Ceph?


Thanks a lot,

Marcelo


On 28/06/2017, Jason Dillaman  wrote:
> ... additionally, the forthcoming 4.12 kernel release will support
> non-cooperative exclusive locking. By default, since 4.9, when the
> exclusive-lock feature is enabled, only a single client can write to 
the
> block device at a time -- but they will cooperatively pass the lock 
back
> and forth upon write request. With the new "rbd map" option, you can 
map a
> image on exactly one host and prevent other hosts from mapping the 
image.

> If that host should die, the exclusive-lock will automatically become
> available to other hosts for mapping.
>
> Of course, I always have to ask the use-case behind mapping the same 
image
> on multiple hosts. Perhaps CephFS would be a better fit if you are 
trying

> to serve out a filesystem?
>
> On Wed, Jun 28, 2017 at 6:25 PM, Maged Mokhtar 
 wrote:

>
> > On 2017-06-28 22:55, li...@marcelofrota.info wrote:
> >
> > Hi People,
> >
> > I am testing the new enviroment, with ceph + rbd with ubuntu 16.04, and i
> > have one question.
> >
> > I have my cluster ceph and mount the using the comands to ceph in my linux
> > enviroment :
> >
> > rbd create veeamrepo --size 20480
> > rbd --image veeamrepo info
> > modprobe rbd
> > rbd map veeamrepo
> > rbd feature disable veeamrepo exclusive-lock object-map fast-diff
> > deep-flatten
> > mkdir /mnt/veeamrepo
> > mount /dev/rbd0 /mnt/veeamrepo
> >
> > The comands work fine, but i have one problem, in the moment, i can mount
> > the /mnt/veeamrepo in the same time in 2 machines, and this is a bad option
> > for me in the moment, because this could generate one filesystem corrupt.
> >
> > I need only one machine to be allowed to mount and write at a time.
> >
> > Example if machine1 mount the /mnt/veeamrepo and machine2 try mount, one
> > error would be displayed, show message the machine can not mount, because
> > the system already mounted in machine1.
> >
> > Someone, could help-me with this or give some tips, for solution my
> > problem. ?
> >
> > Thanks a lot
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > You can use Pacemaker to map the rbd and mount the filesystem on 1 server
> > and in case of failure switch to another server.
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> --
> Jason
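
For what it's worth, a rough sketch of the non-cooperative locking described
above, assuming a >= 4.12 kernel and an image that still has the
exclusive-lock feature enabled (the pool/image names are invented, and the
exact map option name should be checked against rbd(8) on your kernel):

# host A: acquire the exclusive lock at map time and hold it until unmap
rbd map -o exclusive rbdpool/veeamrepo
mount /dev/rbd0 /mnt/veeamrepo

# host B: while host A holds the lock, this map attempt should fail
rbd map -o exclusive rbdpool/veeamrepo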



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Stealth Jewel release?

2017-07-14 Thread Martin Palma
So only ceph-mds is affected? Say we have mons and OSDs on 10.2.8 and
the MDS on 10.2.6 or 10.2.7 -- would we be "safe"?

I'm asking since we need to add new storage nodes to our production cluster.

Best,
Martin

On Wed, Jul 12, 2017 at 10:44 PM, Patrick Donnelly  wrote:
> On Wed, Jul 12, 2017 at 11:31 AM, Dan van der Ster  wrote:
>> On Wed, Jul 12, 2017 at 5:51 PM, Abhishek L  wrote:
>>> On Wed, Jul 12, 2017 at 9:13 PM, Xiaoxi Chen  wrote:
>>>> +However, it also introduced a regression that could cause MDS damage.
>>>> +Therefore, we do *not* recommend that Jewel users upgrade to this version -
>>>> +instead, we recommend upgrading directly to v10.2.9 in which the regression is
>>>> +fixed.
>>>>
>>>> It looks like this version is NOT production ready. Curious why we
>>>> want a not-recommended version to be released?
>>>
>>> We found a regression in MDS right after the packages were built, and the
>>> release was about to be announced. This is why we didn't announce the release.
>>> We're currently running tests after the fix for MDS was merged.
>>>
>>> So when we do announce the release we'll announce 10.2.9 so that users
>>> can upgrade from 10.2.7->10.2.9
>>
>> Suppose some users already upgraded their CephFS to 10.2.8 -- what is
>> the immediate recommended course of action? Downgrade or wait for the
>> 10.2.9 ?
>
> I'm not aware of or see any changes that would make downgrading back
> to 10.2.7 a problem but the safest thing to do would be to replace the
> v10.2.8 ceph-mds binaries with the v10.2.7 binary. If that's not
> practical, I would recommend a cluster-wide downgrade to 10.2.7.
>
> --
> Patrick Donnelly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re: Re: No "snapset" attribute for clone object

2017-07-14 Thread 许雪寒
Yes, I believe so. Are there any workarounds?

-----Original Message-----
From: Jason Dillaman [mailto:jdill...@redhat.com]
Sent: 13 July 2017 21:13
To: 许雪寒
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Re: No "snapset" attribute for clone object

Quite possibly the same as this issue? [1]

[1] http://tracker.ceph.com/issues/17445

On Thu, Jul 13, 2017 at 8:13 AM, 许雪寒  wrote:
> By the way, we are using hammer version's rbd command to export-diff rbd 
> images on Jewel version's cluster.
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 许雪寒
> Sent: 13 July 2017 19:54
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] No "snapset" attribute for clone object
>
> We are using rbd for the block devices of VMs, and recently we found that,
> after we created snapshots for some rbd images, there existed objects whose
> clone objects don't have the "snapset" extended attribute on them.
>
> It seems that the lack of "snapset" attributes for clone objects has led to 
> segmentation faults when we try to do "export-diff".
>
> Is this a bug?
> We are using 10.2.5, jewel version.
>
> Thank you:-)
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
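
For context, the kind of invocation being discussed would look roughly like
this (the pool, image and snapshot names are invented for illustration):

rbd snap create rbdpool/vm-disk@snap2                                  # take the new snapshot
rbd export-diff --from-snap snap1 rbdpool/vm-disk@snap2 vm-disk.diff   # dump extents changed between snap1 and snap2
rbd import-diff vm-disk.diff backuppool/vm-disk                        # replay them onto a backup image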
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Pg inactive when back filling?

2017-07-14 Thread Su, Zhan
Hi Ceph users,

I found that some PGs are inactive after I added some OSDs and PGs.

ceph pg dump_stuck inactive:

PG_STAT STATE                                              UP      UP_PRIMARY ACTING ACTING_PRIMARY
10.9b   undersized+degraded+remapped+backfilling+peered    [8,9]   8          [3]    3
10.167  undersized+degraded+remapped+backfilling+peered    [2,0]   2          [3]    3
10.1c3  undersized+degraded+remapped+backfilling+peered    [9,5]   9          [1]    1
10.15c  undersized+degraded+remapped+backfill_wait+peered  [0,2]   0          [6]    6
10.187  undersized+degraded+remapped+backfill_wait+peered  [9,5]   9          [6]    6
10.1bb  undersized+degraded+remapped+backfilling+peered    [0,3]   0          [3]    3
10.1f7  undersized+degraded+remapped+backfilling+peered    [2,1]   2          [0]    0
10.87   undersized+degraded+remapped+backfill_wait+peered  [0,3]   0          [6]    6
10.1ae  undersized+degraded+remapped+backfilling+peered    [8,3]   8          [0]    0
10.e2   undersized+degraded+remapped+backfilling+peered    [5,8]   5          [1]    1
10.17e  undersized+degraded+remapped+backfill_wait+peered  [1,3]   1          [6]    6
10.11c  undersized+degraded+remapped+backfilling+peered    [5,3]   5          [1]    1
10.1d2  undersized+degraded+remapped+backfill_wait+peered  [5,2]   5          [9]    9
10.13d  undersized+degraded+remapped+backfilling+peered    [3,1]   3          [0]    0
10.1a2  undersized+degraded+remapped+backfilling+peered    [5,1]   5          [1]    1
10.153  undersized+degraded+remapped+backfilling+peered    [1,8]   1          [0]    0
10.13c  undersized+degraded+remapped+backfilling+peered    [5,9]   5          [0]    0
10.133  undersized+degraded+remapped+backfilling+peered    [6,5]   6          [8]    8
10.dc   undersized+degraded+remapped+backfill_wait+peered  [8,9]   8          [6]    6
10.1ef  undersized+degraded+remapped+backfilling+peered    [0,1]   0          [3]    3
10.123  undersized+degraded+remapped+backfill_wait+peered  [5,8]   5          [8]    8
8.36    remapped+peering                                   [8,2,3] 8          [8,15] 8
10.47   undersized+degraded+remapped+backfilling+peered    [1,16]  1          [1]    1

According to my understanding, undersized+degraded+remapped+backfilling
means a PG lacks enough replicas, but at least one copy is still present on
some OSD. Shouldn't Ceph be able to serve that PG from the surviving OSD
while it is backfilling? Or is there something I need to do to activate
these PGs?

This particular pool is replicated with size=2.

Thanks!
Zhan
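
If it helps, one way to see why a PG stays peered instead of active is to
query it and check the pool's min_size; a rough sketch (the PG id is taken
from the listing above, and "rbdpool" is a placeholder for the affected
pool's name):

ceph pg 10.9b query                  # inspect the "recovery_state" / "blocked_by" sections
ceph osd pool ls detail              # shows size and min_size for every pool
ceph osd pool get rbdpool min_size   # a PG whose acting set is smaller than min_size will not go active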
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com