Re: [ceph-users] Socket errors, CRC, lossy con messages

2017-04-13 Thread Ilya Dryomov
On Thu, Apr 13, 2017 at 5:39 AM, Alex Gorbachev  
wrote:
> On Wed, Apr 12, 2017 at 10:51 AM, Ilya Dryomov  wrote:
>> On Wed, Apr 12, 2017 at 4:28 PM, Alex Gorbachev  
>> wrote:
>>> Hi Ilya,
>>>
>>> On Wed, Apr 12, 2017 at 4:58 AM Ilya Dryomov  wrote:

 On Tue, Apr 11, 2017 at 3:10 PM, Alex Gorbachev 
 wrote:
 > Hi Ilya,
 >
 > On Tue, Apr 11, 2017 at 4:06 AM, Ilya Dryomov 
 > wrote:
 >> On Tue, Apr 11, 2017 at 4:01 AM, Alex Gorbachev
 >>  wrote:
 >>> On Mon, Apr 10, 2017 at 2:16 PM, Alex Gorbachev
 >>>  wrote:
  I am trying to understand the cause of a problem we started
  encountering a few weeks ago.  There are 30 or so per hour messages
  on
  OSD nodes of type:
 
  ceph-osd.33.log:2017-04-10 13:42:39.935422 7fd7076d8700  0 bad crc in
  data 2227614508 != exp 2469058201
 
  and
 
  2017-04-10 13:42:39.939284 7fd722c42700  0 -- 10.80.3.25:6826/5752
  submit_message osd_op_reply(1826606251
  rbd_data.922d95238e1f29.000101bf [set-alloc-hint object_size
  16777216 write_size 16777216,write 6328320~12288] v103574'18626765
  uv18626765 ondisk = 0) v6 remote, 10.80.3.216:0/1934733503, failed
  lossy con, dropping message 0x3b55600
 
  On a client sometimes, but not corresponding to the above:
 
  Apr 10 11:53:15 roc-5r-scd216 kernel: [4906599.023174] libceph: osd96
  10.80.3.25:6822 socket error on write
 
  And from time to time, slow requests:
 
  2017-04-10 13:00:04.280686 osd.91 10.80.3.45:6808/5665 231 : cluster
  [WRN] slow request 30.108325 seconds old, received at 2017-04-10
  12:59:34.172283: osd_op(client.11893449.1:324079247
  rbd_data.8fcdfb238e1f29.000187e7 [set-alloc-hint object_size
  16777216 write_size 16777216,write 10772480~8192] 14.ed0bcdec
  ondisk+write e103545) currently waiting for subops from 2,104
  2017-04-10 13:00:06.280949 osd.91 10.80.3.45:6808/5665 232 : cluster
  [WRN] 2 slow requests, 1 included below; oldest blocked for >
  32.108610 secs
 
  Questions:
 
  1. Is there any way to drill further into the "bad crc" message?
  Sometimes they have nothing before or after them, but how do I
  determine where it came from - another OSD, a client, which one?
 
  2. Network seems OK - no errors on NICs, regression testing does not
  show any issues.  I realize this can be disk response, but using
  Christian Balzer's atop recommendation shows a pretty normal system.
  What is my best course of troubleshooting here - dump historic ops on
  OSD, wireshark the links or anything else?
 
  3. Christian, if you are looking at this, what would be your red
  flags in atop?
 
 >>>
 >>> One more note: OSD nodes are running kernel 4.10.2-041002-generic and
 >>> clients - 4.4.23-040423-generic
 >>
 >> Hi Alex,
 >>
 >> Did you upgrade the kernel client from < 4.4 to 4.4 somewhere in that
 >> time
 >> frame by any chance?
 >
 > Yes, they were upgraded from 4.2.8 to 4.4.23 in October.

 There is a block layer bug in 4.4 and later kernels [1].  It
 effectively undoes krbd commit [2], which prevents pages from being
 further updated while in-flight.  The timeline doesn't fit though, so
 it's probably unrelated -- not every workload can trigger it...

 [1] http://tracker.ceph.com/issues/19275
 [2]
 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bae818ee1577c27356093901a0ea48f672eda514
>>>
>>>
>>> Would the workaround be to drop to 4.3?
>>
>> Unfortunately, short of installing a custom kernel or disabling data
>> CRCs entirely, yes.  I'll poke the respective maintainer again later
>> today...
>
> I downgraded to 4.3.6 and so far all the CRC messages and lossy con
> went away completely.

Thanks for the confirmation.  I'll update #19275 with stable kernel
versions when it's fixed.
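
For anyone who needs a stopgap until then, disabling data CRCs would look
roughly like this (a sketch only; it trades away on-the-wire corruption
detection, and assumes the ms_crc_data option on the OSD side and the
libceph "nocrc" map option on the krbd side -- pool/image names are
placeholders):

    # ceph.conf on the OSD nodes, then restart the OSDs
    [osd]
    ms_crc_data = false

    # on the client, map the image with data CRCs disabled
    rbd map rbd/myimage -o nocrc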

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-13 Thread Jogi Hofmüller
Dear David,

Am Mittwoch, den 12.04.2017, 13:46 + schrieb David Turner:
> I can almost guarantee what you're seeing is PG subfolder splitting. 

Every day there's something new to learn about Ceph ;)

> When the subfolders in a PG get X number of objects, it splits into
> 16 subfolders.  Every cluster I manage has blocked requests and OSDs
> that get marked down while this is happening.  To stop the OSDs
> getting marked down, I increase the osd_heartbeat_grace until the
> OSDs no longer mark themselves down during this process.

Thanks for the hint. I adjusted the values accordingly and will monitor
our cluster. This morning there were no troubles at all btw. Still
wondering what caused yesterday's mayhem ...

Regards,
-- 
J.Hofmüller

   Nisiti
   - Abie Nathan, 1927-2008



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-13 Thread Peter Maloney
On 04/13/17 10:34, Jogi Hofmüller wrote:
> Dear David,
>
> Am Mittwoch, den 12.04.2017, 13:46 + schrieb David Turner:
>> I can almost guarantee what you're seeing is PG subfolder splitting. 
> Every day there's something new to learn about Ceph ;)
>
>> When the subfolders in a PG get X number of objects, it splits into
>> 16 subfolders.  Every cluster I manage has blocked requests and OSDs
>> that get marked down while this is happening.  To stop the OSDs
>> getting marked down, I increase the osd_heartbeat_grace until the
>> OSDs no longer mark themselves down during this process.
> Thanks for the hint. I adjusted the values accordingly and will monitor
> our cluster. This morning there were no troubles at all btw. Still
> wondering what caused yesterday's mayhem ...
>
> Regards,
Also more things to consider...

Ceph snapshots really slow things down. They aren't efficient like on
zfs and btrfs. Having one might take away some % of performance, and having
2 snaps potentially takes double, etc., until it is crawling. And it's
not just the CoW... even rbd snap rm, rbd diff, etc. start to take
many times longer. See http://tracker.ceph.com/issues/10823 for an
explanation of the CoW. My goal is to keep at most 1 long-term snapshot.

Also there's snap trimming, which I found to be far worse than directory
splitting. The settings I have for this and splitting are:
osd_pg_max_concurrent_snap_trims=1
osd_snap_trim_sleep=0
filestore_split_multiple=8

osd_snap_trim_sleep is bugged (it holds a lock while sleeping), so make
sure it's 0.
filestore_split_multiple makes splits happen less often, I think... I'm not
sure how much it helps, but subjectively it improves things.
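
If you want to try those without restarting OSDs, injecting them should look
roughly like this (a sketch; some filestore options are only consulted when a
split actually happens, so they may not take effect immediately, and you still
need to persist the values in ceph.conf under [osd] to survive restarts):

    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0'
    ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 1'
    ceph tell osd.* injectargs '--filestore_split_multiple 8'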

And I find that bcache turns small metadata operations like that (and
xattrs, leveldb, the xfs journal, etc.) into much less load on the OSD disk.

I have not changed any timeouts and don't get any OSDs marked down, but
I didn't before I tried bcache and the other settings either. I just got
blocked requests (and still do, but fewer), and hanging librbd client VMs
(disabling exclusive-lock fixes that).
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] failed lossy con, dropping message

2017-04-13 Thread Laszlo Budai

Hello Greg,

Thank you for the answer.
I'm still in doubt about "lossy". What does it mean in this context? I can
think of different variants:
1. The designer of the protocol considers the connection to be "lossy" from the start,
so connection errors are handled in a higher layer. The layer that has observed the
failure of the connection just logs this event and lets the upper layer handle it. This
would support your statement 'since it's a "lossy" connection we don't need to remember
the message and resend it.'

2. A connection is not declared "lossy" as long as it is working properly. Once it has lost
some packets or some error threshold is reached, we declare the connection lossy, inform the
higher layer, and let it decide what to do next. Compared with point 1 the actions are quite
similar, but the usage of "lossy" is different. In point 1 a connection is always "lossy"
even if it is not actually losing any packets. In the second case the connection only becomes
"lossy" when errors appear, so "lossy" is a runtime state of the connection.

Maybe both are wrong and the truth is a third variant ... :) This is what I 
would like to understand.

Kind regards,
Laszlo


On 13.04.2017 00:36, Gregory Farnum wrote:

On Wed, Apr 12, 2017 at 3:00 AM, Laszlo Budai  wrote:

Hello,

yesterday one of our compute nodes has recorded the following message for
one of the ceph connections:

submit_message osd_op(client.28817736.0:690186
rbd_data.15c046b11ab57b7.00c4 [read 2097152~380928] 3.6f81364a
ack+read+known_if_redirected e3617) v5 remote, 10.12.68.71:6818/6623, failed
lossy con, dropping message


A read message, sent to the OSD at IP 10.12.68.71:6818/6623, is being
dropped because the connection has somehow failed; since it's a
"lossy" connection we don't need to remember the message and resend
it. That failure could be an actual TCP/IP stack error; it could be
because a different thread killed the connection and it's now closed.

If you've just got one of these and didn't see other problems, it's
innocuous — I expect the most common cause for this is an OSD getting
marked down while IO is pending to it. :)
-Greg



Can someone "decode" the above message, or direct me to some document where
I could read more about it?

We have ceph 0.94.10.

Thank you,
Laszlo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph with Clos IP fabric

2017-04-13 Thread Jan Marquardt
Hi,

I am currently working on Ceph with an underlying Clos IP fabric and I
am hitting some issues.

The setup looks as follows: There are 3 Ceph nodes which are running
OSDs and MONs. Each server has one /32 loopback ip, which it announces
via BGP to its uplink switches. Besides the loopback ip each server has
a management interface with a public (not to be confused with ceph's
public network) ip address. For BGP, both switches and servers are running
quagga/frr.

Loopback ips:

10.10.100.1 # switch1
10.10.100.2 # switch2
10.10.100.21# server1
10.10.100.22# server2
10.10.100.23# server3

Ceph's public network is 10.10.100.0/24.

Here comes the current main problem: There are two options for
configuring the loopback address.

1.) Configure it on lo. In this case the routing works as intended, but,
as far as I found out, Ceph can not be run on the lo interface.

root@server1:~# ip route get 10.10.100.22
10.10.100.22 via 169.254.0.1 dev enp4s0f1  src 10.10.100.21
cache

2.) Configure it on dummy0. In this case Ceph is able to start, but
quagga installs the learned routes with the wrong source address - the public
management address of each host. This results in network problems,
because Ceph then uses the management ips to communicate with the other Ceph
servers.

root@server1:~# ip route get 10.10.100.22
10.10.100.22 via 169.254.0.1 dev enp4s0f1  src a.b.c.d
cache

(where a.b.c.d is the machine's public ip address on its management
interface)
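
A possible workaround, assuming quagga's zebra supports "set src" in
route-maps (untested here; interface and address values are taken from the
setup above), would be to keep the address on dummy0 and have zebra rewrite
the source of the learned routes:

    # /etc/network/interfaces on server1 (Debian-style)
    auto dummy0
    iface dummy0 inet static
        address 10.10.100.21/32

    # zebra configuration: force the loopback as source for BGP routes
    route-map SET-SRC permit 10
     set src 10.10.100.21
    !
    ip protocol bgp route-map SET-SRC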

Has anyone already done something similar?

Please let me know, if you need any further information. Any help would
really be appreciated.

Best Regards

Jan

-- 
Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
E-Mail: supp...@artfiles.de | Web: http://www.artfiles.de
Geschäftsführer: Harald Oltmanns | Tim Evers
Eingetragen im Handelsregister Hamburg - HRB 81478



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph activation error

2017-04-13 Thread gjprabu
Hi All,



  Is anybody facing a similar issue?



Regards

Prabu GJ




 On Sat, 04 Mar 2017 09:50:35 +0530 gjprabu  
wrote 




Hi Team,



  I am installing a new ceph setup (jewel) and while activating the OSDs
it is throwing the error below.



  I am using a directory-based osd like /home/osd1, not an entire disk.
An earlier installation one month back worked fine, but this time I am
getting the error below.





[root@cphadmin mycluster]# ceph-deploy osd activate cphosd1:/home/osd1 
cphosd2:/home/osd2 cphosd3:/home/osd3

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf

[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy osd activate 
cphosd1:/home/osd1 cphosd2:/home/osd2 cphosd3:/home/osd3

[ceph_deploy.cli][INFO  ] ceph-deploy options:

[ceph_deploy.cli][INFO  ]  username  : None

[ceph_deploy.cli][INFO  ]  verbose   : False

[ceph_deploy.cli][INFO  ]  overwrite_conf: False

[ceph_deploy.cli][INFO  ]  subcommand: activate

[ceph_deploy.cli][INFO  ]  quiet : False

[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph

[ceph_deploy.cli][INFO  ]  func  : 

[ceph_deploy.cli][INFO  ]  ceph_conf : None

[ceph_deploy.cli][INFO  ]  default_release   : False

[ceph_deploy.cli][INFO  ]  disk  : [('cphosd1', 
'/home/osd1', None), ('cphosd2', '/home/osd2', None), ('cphosd3', '/home/osd3', 
None)]

[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks cphosd1:/home/osd1: 
cphosd2:/home/osd2: cphosd3:/home/osd3:

[cphosd1][DEBUG ] connected to host: cphosd1

[cphosd1][DEBUG ] detect platform information from remote host

[cphosd1][DEBUG ] detect machine type

[cphosd1][DEBUG ] find the location of an executable

[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.1.1503 Core

[ceph_deploy.osd][DEBUG ] activating host cphosd1 disk /home/osd1

[ceph_deploy.osd][DEBUG ] will use init type: systemd

[cphosd1][DEBUG ] find the location of an executable

[cphosd1][INFO  ] Running command: /usr/sbin/ceph-disk -v activate --mark-init 
systemd --mount /home/osd1

[cphosd1][WARNIN] main_activate: path = /home/osd1

[cphosd1][WARNIN] activate: Cluster uuid is 62b4f8c7-c00c-48d0-8262-549c9ef6074c

[cphosd1][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph 
--show-config-value=fsid

[cphosd1][WARNIN] activate: Cluster name is ceph

[cphosd1][WARNIN] activate: OSD uuid is 241b30d8-b2ba-4380-81f8-2e30e6913bb2

[cphosd1][WARNIN] allocate_osd_id: Allocating OSD id...

[cphosd1][WARNIN] command: Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd 
create --concise 241b30d8-b2ba-4380-81f8-2e30e6913bb2

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/whoami.22462.tmp

[cphosd1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/osd1/whoami.22462.tmp

[cphosd1][WARNIN] activate: OSD id is 0

[cphosd1][WARNIN] activate: Initializing OSD...

[cphosd1][WARNIN] command_check_call: Running command: /usr/bin/ceph --cluster 
ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o 
/home/osd1/activate.monmap

[cphosd1][WARNIN] got monmap epoch 2

[cphosd1][WARNIN] command: Running command: /usr/bin/timeout 300 ceph-osd 
--cluster ceph --mkfs --mkkey -i 0 --monmap /home/osd1/activate.monmap 
--osd-data /home/osd1 --osd-journal /home/osd1/journal --osd-uuid 
241b30d8-b2ba-4380-81f8-2e30e6913bb2 --keyring /home/osd1/keyring --setuser 
ceph --setgroup ceph

[cphosd1][WARNIN] activate: Marking with init system systemd

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/systemd

[cphosd1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/osd1/systemd

[cphosd1][WARNIN] activate: Authorizing OSD key...

[cphosd1][WARNIN] command_check_call: Running command: /usr/bin/ceph --cluster 
ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /home/osd1/keyring 
osd allow * mon allow profile osd

[cphosd1][WARNIN] added key for osd.0

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/active.22462.tmp

[cphosd1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/osd1/active.22462.tmp

[cphosd1][WARNIN] activate: ceph osd.0 data dir is ready at /home/osd1

[cphosd1][WARNIN] activate_dir: Creating symlink /var/lib/ceph/osd/ceph-0 -> 
/home/osd1

[cphosd1][WARNIN] start_daemon: Starting ceph osd.0...

[cphosd1][WARNIN] command_check_call: Running command: /usr/bin/systemctl 
enable ceph-osd@0

[c

Re: [ceph-users] ceph activation error

2017-04-13 Thread gjprabu
Hi Tom,



Yes, it's mounted. I am using CentOS 7 and kernel version
3.10.0-229.el7.x86_64.



  /dev/xvda3 xfs   138G   33M  138G   1% /home
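
In case it helps narrow this down, these are the kinds of checks that apply to
a directory-backed OSD under systemd (a sketch; paths and the osd id are taken
from the ceph-deploy log above):

    ls -ld /home/osd1                # on jewel this should be owned by ceph:ceph
    ls -l /var/lib/ceph/osd/ceph-0   # symlink created by ceph-disk, should point to /home/osd1
    systemctl status ceph-osd@0      # does the unit actually start?
    journalctl -u ceph-osd@0 --no-pager | tail -50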





Regards

Prabu GJ





 On Thu, 13 Apr 2017 17:20:34 +0530 Tom Verhaeg 
 wrote 




Hi,



Is your OSD mounted correctly on the OS? 



Tom




From: ceph-users  on behalf of gjprabu 

 Sent: Thursday, April 13, 2017 1:13:34 PM
 To: ceph-users
 Subject: Re: [ceph-users] ceph activation error
 


Hi All,



  Is anybody facing a similar issue?



Regards

Prabu GJ





 On Sat, 04 Mar 2017 09:50:35 +0530 gjprabu  
wrote 




Hi Team,



  I am installing a new ceph setup (jewel) and while activating the OSDs
it is throwing the error below.



  I am using a directory-based osd like /home/osd1, not an entire disk.
An earlier installation one month back worked fine, but this time I am
getting the error below.





[root@cphadmin mycluster]# ceph-deploy osd activate cphosd1:/home/osd1 
cphosd2:/home/osd2 cphosd3:/home/osd3

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf

[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy osd activate 
cphosd1:/home/osd1 cphosd2:/home/osd2 cphosd3:/home/osd3

[ceph_deploy.cli][INFO  ] ceph-deploy options:

[ceph_deploy.cli][INFO  ]  username  : None

[ceph_deploy.cli][INFO  ]  verbose   : False

[ceph_deploy.cli][INFO  ]  overwrite_conf: False

[ceph_deploy.cli][INFO  ]  subcommand: activate

[ceph_deploy.cli][INFO  ]  quiet : False

[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph

[ceph_deploy.cli][INFO  ]  func  : 

[ceph_deploy.cli][INFO  ]  ceph_conf : None

[ceph_deploy.cli][INFO  ]  default_release   : False

[ceph_deploy.cli][INFO  ]  disk  : [('cphosd1', 
'/home/osd1', None), ('cphosd2', '/home/osd2', None), ('cphosd3', '/home/osd3', 
None)]

[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks cphosd1:/home/osd1: 
cphosd2:/home/osd2: cphosd3:/home/osd3:

[cphosd1][DEBUG ] connected to host: cphosd1

[cphosd1][DEBUG ] detect platform information from remote host

[cphosd1][DEBUG ] detect machine type

[cphosd1][DEBUG ] find the location of an executable

[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.1.1503 Core

[ceph_deploy.osd][DEBUG ] activating host cphosd1 disk /home/osd1

[ceph_deploy.osd][DEBUG ] will use init type: systemd

[cphosd1][DEBUG ] find the location of an executable

[cphosd1][INFO  ] Running command: /usr/sbin/ceph-disk -v activate --mark-init 
systemd --mount /home/osd1

[cphosd1][WARNIN] main_activate: path = /home/osd1

[cphosd1][WARNIN] activate: Cluster uuid is 62b4f8c7-c00c-48d0-8262-549c9ef6074c

[cphosd1][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph 
--show-config-value=fsid

[cphosd1][WARNIN] activate: Cluster name is ceph

[cphosd1][WARNIN] activate: OSD uuid is 241b30d8-b2ba-4380-81f8-2e30e6913bb2

[cphosd1][WARNIN] allocate_osd_id: Allocating OSD id...

[cphosd1][WARNIN] command: Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd 
create --concise 241b30d8-b2ba-4380-81f8-2e30e6913bb2

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/whoami.22462.tmp

[cphosd1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/osd1/whoami.22462.tmp

[cphosd1][WARNIN] activate: OSD id is 0

[cphosd1][WARNIN] activate: Initializing OSD...

[cphosd1][WARNIN] command_check_call: Running command: /usr/bin/ceph --cluster 
ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o 
/home/osd1/activate.monmap

[cphosd1][WARNIN] got monmap epoch 2

[cphosd1][WARNIN] command: Running command: /usr/bin/timeout 300 ceph-osd 
--cluster ceph --mkfs --mkkey -i 0 --monmap /home/osd1/activate.monmap 
--osd-data /home/osd1 --osd-journal /home/osd1/journal --osd-uuid 
241b30d8-b2ba-4380-81f8-2e30e6913bb2 --keyring /home/osd1/keyring --setuser 
ceph --setgroup ceph

[cphosd1][WARNIN] activate: Marking with init system systemd

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/systemd

[cphosd1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/osd1/systemd

[cphosd1][WARNIN] activate: Authorizing OSD key...

[cphosd1][WARNIN] command_check_call: Running command: /usr/bin/ceph --cluster 
ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /home/osd1/keyring 
osd allow * mon allow

Re: [ceph-users] failed lossy con, dropping message

2017-04-13 Thread Gregory Farnum
On Thu, Apr 13, 2017 at 2:17 AM Laszlo Budai 
wrote:

> Hello Greg,
>
> Thank you for the answer.
> I'm still in doubt about "lossy". What does it mean in this context? I
> can think of different variants:
> 1. The designer of the protocol considers the connection to be "lossy"
> from the start, so connection errors are handled in a higher layer. The
> layer that has observed the failure of the connection just logs this
> event and lets the upper layer handle it. This would support
> your statement 'since it's a "lossy" connection we don't need to remember
> the message and resend it.'


This one. :)
The messenger subsystem can be configured as lossy or non-lossy; all the
RADOS connections are lossy, since a failure frequently means we'll have to
retarget the operation anyway (to a different OSD). CephFS uses the stateful
connections a bit more.
-Greg



>
> 2. A connection is not declared "lossy" as long as it is working properly.
> Once it has lost some packets or some error threshold is reached, we declare
> the connection lossy, inform the higher layer, and let it decide
> what to do next. Compared with point 1 the actions are quite similar, but the
> usage of "lossy" is different. In point 1 a connection is always
> "lossy" even if it is not actually losing any packets. In the second case
> the connection only becomes "lossy" when errors appear, so "lossy"
> is a runtime state of the connection.
>
> Maybe both are wrong and the truth is a third variant ... :) This is what
> I would like to understand.
>
> Kind regards,
> Laszlo
>
>
> On 13.04.2017 00:36, Gregory Farnum wrote:
> > On Wed, Apr 12, 2017 at 3:00 AM, Laszlo Budai 
> wrote:
> >> Hello,
> >>
> >> yesterday one of our compute nodes has recorded the following message
> for
> >> one of the ceph connections:
> >>
> >> submit_message osd_op(client.28817736.0:690186
> >> rbd_data.15c046b11ab57b7.00c4 [read 2097152~380928]
> 3.6f81364a
> >> ack+read+known_if_redirected e3617) v5 remote, 10.12.68.71:6818/6623,
> failed
> >> lossy con, dropping message
> >
> > A read message, sent to the OSD at IP 10.12.68.71:6818/6623, is being
> > dropped because the connection has somehow failed; since it's a
> > "lossy" connection we don't need to remember the message and resend
> > it. That failure could be an actual TCP/IP stack error; it could be
> > because a different thread killed the connection and it's now closed.
> >
> > If you've just got one of these and didn't see other problems, it's
> > innocuous — I expect the most common cause for this is an OSD getting
> > marked down while IO is pending to it. :)
> > -Greg
> >
> >>
> >> Can someone "decode" the above message, or direct me to some document
> where
> >> I could read more about it?
> >>
> >> We have ceph 0.94.10.
> >>
> >> Thank you,
> >> Laszlo
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] IO pausing during failures

2017-04-13 Thread Matthew Stroud
When our cluster hits a failure (e.g. a node going down or an osd dying), our vms
pause all IO for about 10-20 seconds. I'm curious if there is a way to fix or
mitigate this?

Here is my ceph.conf:

[global]
fsid = fb991e48-c425-4f82-a70e-5ce748ae186b
mon_initial_members = mon01, mon02, mon03
mon_host = 10.20.57.10,10.20.57.11,10.20.57.12
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.20.57.0/24
cluster_network = 10.20.58.0/24
filestore_xattr_use_omap = true
mon_clock_drift_allowed = .15
mon_clock_drift_warn_backoff = 30
mon_osd_down_out_interval = 30
mon_osd_report_timeout = 300
mon_osd_full_ratio = .95
mon_osd_nearfull_ratio = .85
mon_osd_allow_primary_affinity = true
osd_backfill_full_ratio = .90
osd_journal_size = 1
osd_pool_default_size = 3
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096
osd_crush_chooseleaf_type = 1
max_open_files = 131072
osd_op_threads = 10
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
osd_client_op_priority = 63

[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = true

And here is our osd tree:

ID WEIGHT   TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 15.91589 root default
-2  3.97897 host osd01
 0  1.98949 osd.0   up  1.0  1.0
 3  1.98949 osd.3   up  1.0  1.0
-3  3.97897 host osd02
 1  1.98949 osd.1   up  1.0  1.0
 4  1.98949 osd.4   up  1.0  1.0
-4  3.97897 host osd03
 2  1.98949 osd.2   up  1.0  1.0
 5  1.98949 osd.5   up  1.0  1.0
-5  3.97897 host osd04
 7  1.98949 osd.7   up  1.0  1.0
 6  1.98949 osd.6   up  1.0  1.0


Thanks ahead of time.
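
(For reference, the 10-20 second pause usually corresponds to the OSD
failure-detection window rather than to recovery itself. The options below are
the ones that typically govern it; the values shown are roughly the defaults of
that era, and this is only a sketch to experiment with -- lowering
osd_heartbeat_grace detects failures sooner at the cost of more false
positives.)

    [global]
    # how often OSDs ping their peers, and how long a peer can go
    # unanswered before it is reported down to the monitors
    osd_heartbeat_interval = 6
    osd_heartbeat_grace = 20
    # how many distinct OSDs must report a peer down before the
    # monitors mark it down
    mon_osd_min_down_reporters = 2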



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-13 Thread Lionel Bouton
Hi,

Le 13/04/2017 à 10:51, Peter Maloney a écrit :
> [...]
> Also more things to consider...
>
> Ceph snapshots relly slow things down.

We use rbd snapshots on Firefly (and Hammer now) and I didn't see any
measurable impact on performance... until we tried to remove them. We
usually have at least one snapshot per VM image, often 3 or 4.
Note that we use BTRFS filestores where IIRC the CoW is handled by the
filesystem so it might be faster compared to the default/recommended XFS
filestores.

>  They aren't efficient like on
> zfs and btrfs. Having one might take away some % performance, and having
> 2 snaps takes potentially double, etc. until it is crawling. And it's
> not just the CoW... even just rbd snap rm, rbd diff, etc. starts to take
> many times longer. See http://tracker.ceph.com/issues/10823 for
> explanation of CoW. My goal is just to keep max 1 long term snapshot.[...]

In my experience with BTRFS filestores, snap rm impact is proportional
to the amount of data specific to the snapshot being removed (ie: not
present on any other snapshot) but completely unrelated to the number of
existing snapshots. For example the first one removed can be handled
very fast and it can be the last one removed that takes the most time
and impacts performance the most.

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-13 Thread David Turner
I wouldn't set the default for osd_heartbeat_grace to 5 minutes, but inject
it when you see this happening.  It's good to know what your cluster is
up to.  The fact that you aren't seeing the blocked requests any more tells
me that this was your issue.  It will go through, split everything, run fine
for a while, and then do it again months from now.
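
Injecting it on the fly would look something like this (a sketch; pick a grace
value that comfortably covers your split times, set it on the mons as well as
the OSDs since both evaluate failure reports, and revert it once the splitting
is done -- mon.a below is a placeholder name):

    ceph tell osd.* injectargs '--osd_heartbeat_grace 240'
    ceph tell mon.a injectargs '--osd_heartbeat_grace 240'   # repeat per mon
    # back to the default once things settle
    ceph tell osd.* injectargs '--osd_heartbeat_grace 20'
    ceph tell mon.a injectargs '--osd_heartbeat_grace 20'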

On Thu, Apr 13, 2017 at 4:43 AM Jogi Hofmüller  wrote:

> Dear David,
>
> Am Mittwoch, den 12.04.2017, 13:46 + schrieb David Turner:
> > I can almost guarantee what you're seeing is PG subfolder splitting.
>
> Every day there's something new to learn about Ceph ;)
>
> > When the subfolders in a PG get X number of objects, it splits into
> > 16 subfolders.  Every cluster I manage has blocked requests and OSDs
> > that get marked down while this is happening.  To stop the OSDs
> > getting marked down, I increase the osd_heartbeat_grace until the
> > OSDs no longer mark themselves down during this process.
>
> Thanks for the hint. I adjusted the values accordingly and will monitor
> our cluster. This morning there were no troubles at all btw. Still
> wondering what caused yesterday's mayhem ...
>
> Regards,
> --
> J.Hofmüller
>
>Nisiti
>- Abie Nathan, 1927-2008
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-13 Thread mj

Hi,

On 04/13/2017 04:53 PM, Lionel Bouton wrote:

We use rbd snapshots on Firefly (and Hammer now) and I didn't see any
measurable impact on performance... until we tried to remove them.


What exactly do you mean with that?

MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests and short OSD failures in small cluster

2017-04-13 Thread Lionel Bouton
Le 13/04/2017 à 17:47, mj a écrit :
> Hi,
>
> On 04/13/2017 04:53 PM, Lionel Bouton wrote:
>> We use rbd snapshots on Firefly (and Hammer now) and I didn't see any
>> measurable impact on performance... until we tried to remove them.
>
> What exactly do you mean with that?

Just what I said : having snapshots doesn't impact performance, only
removing them (obviously until Ceph is finished cleaning up).

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] failed lossy con, dropping message

2017-04-13 Thread Laszlo Budai

Hello Greg,

Thank you for the clarification. One last thing: can you point me to some
documents that describe these? I would like to better understand what's going
on behind the curtains ...

Kind regards,
Laszlo

On 13.04.2017 16:22, Gregory Farnum wrote:


On Thu, Apr 13, 2017 at 2:17 AM Laszlo Budai <las...@componentsoft.eu> wrote:

Hello Greg,

Thank you for the answer.
I'm still in doubt about "lossy". What does it mean in this context? I
can think of different variants:
1. The designer of the protocol considers the connection to be "lossy" from the
start, so connection errors are handled in a higher layer. The layer that has
observed the failure of the connection just logs this event and lets the upper layer
handle it. This would support your statement 'since it's a "lossy" connection we don't
need to remember the message and resend it.'


This one. :)
The messenger subsystem can be configured as lossy or non-lossy; all the RADOS
connections are lossy, since a failure frequently means we'll have to retarget
the operation anyway (to a different OSD). CephFS uses the stateful
connections a bit more.
-Greg




2. A connection is not declared "lossy" as long as it is working properly. Once it has lost some packets or some
error threshold is reached, we declare the connection lossy, inform the higher layer, and let it decide what to do next.
Compared with point 1 the actions are quite similar, but the usage of "lossy" is different. In point 1 a
connection is always "lossy" even if it is not actually losing any packets. In the second case the connection only
becomes "lossy" when errors appear, so "lossy" is a runtime state of the connection.

Maybe both are wrong and the truth is a third variant ... :) This is what I 
would like to understand.

Kind regards,
Laszlo


On 13.04.2017 00:36, Gregory Farnum wrote:
> On Wed, Apr 12, 2017 at 3:00 AM, Laszlo Budai <las...@componentsoft.eu> wrote:
>> Hello,
>>
>> yesterday one of our compute nodes has recorded the following message for
>> one of the ceph connections:
>>
>> submit_message osd_op(client.28817736.0:690186
>> rbd_data.15c046b11ab57b7.00c4 [read 2097152~380928] 
3.6f81364a
>> ack+read+known_if_redirected e3617) v5 remote, 10.12.68.71:6818/6623 
, failed
>> lossy con, dropping message
>
> A read message, sent to the OSD at IP 10.12.68.71:6818/6623 
, is being
> dropped because the connection has somehow failed; since it's a
> "lossy" connection we don't need to remember the message and resend
> it. That failure could be an actual TCP/IP stack error; it could be
> because a different thread killed the connection and it's now closed.
>
> If you've just got one of these and didn't see other problems, it's
> innocuous — I expect the most common cause for this is an OSD getting
> marked down while IO is pending to it. :)
> -Greg
>
>>
>> Can someone "decode" the above message, or direct me to some document 
where
>> I could read more about it?
>>
>> We have ceph 0.94.10.
>>
>> Thank you,
>> Laszlo
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG calculator improvement

2017-04-13 Thread Michael Kidd
Hello Frédéric,
  Thank you very much for the input.  I would like to ask for some feedback
from you, as well as the ceph-users list at large.

The PGCalc tool was created to help steer new Ceph users in the right
direction, but it's certainly difficult to account for every possible
scenario.  I'm struggling to find a way to implement something that would
work better for the scenario that you (Frédéric) describe, while still
being a useful starting point for the novice / more mainstream use cases.
I've also gotten complaints at the other end of the spectrum, that the tool
expects the user to know too much already, so accounting for the number of
objects is bound to add to this sentiment.
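
For context, the tool's current recommendation boils down to roughly this
(rounding rules and per-pool minimums aside), using the 144-OSD cluster from
the quoted message below as a worked example:

    PGs per pool ~= (target PGs per OSD x number of OSDs x %data) / replica or EC size,
                    then rounded to a power of two

    e.g. with 144 OSDs, EC 5+4 (size 9) and a target of ~100-200 PGs per OSD:
        .rgw.buckets at 80% data  -> on the order of 1300-2600 raw -> 2048
        zimbra data  at 20% data  -> on the order of  320-640 raw  ->  512
        metadata pool at ~0% data -> falls through to the minimum  ->  128

Object count never enters that formula, which is exactly the gap being
described.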

As the Ceph user base expands and the use cases diverge, we are definitely
finding more edge cases that are causing pain.  I'd love to make something
to help prevent these types of issues, but again, I worry about the
complexity introduced.

With this, I see a few possible ways forward:
* Simply re-wording the %data to be % object count -- but this seems more
abstract, again leading to more confusion for new users.
* Increase complexity of the PG Calc tool, at the risk of further
alienating novice/mainstream users
* Add a disclaimer about the tool being a base for decision making, but
that certain edge cases require adjustments to the recommended PG count
and/or ceph.conf & sysctl values.
* Add a disclaimer urging the end user to secure storage consulting if
their use case falls into certain categories or they are new to Ceph to
ensure the cluster will meet their needs.

Having been on the storage consulting team and knowing the expertise they
have, I strongly believe that newcomers to Ceph (or new use cases inside of
established customers) should secure consulting before final decisions are
made on hardware... let alone the cluster is deployed.  I know it seems a
bit self-serving to make this suggestion as I work at Red Hat, but there is
a lot on the line when any establishment is storing potentially business
critical data.

I suspect the answer lies in a combination of the above or in something
I've not thought of.  Please do weigh in as any and all suggestions are
more than welcome.

Thanks,
Michael J. Kidd
Principal Software Maintenance Engineer
Red Hat Ceph Storage
+1 919-442-8878


On Wed, Apr 12, 2017 at 6:35 AM, Frédéric Nass <
frederic.n...@univ-lorraine.fr> wrote:

>
> Hi,
>
> I wanted to share a bad experience we had due to how the PG calculator
> works.
>
> When we set our production cluster months ago, we had to decide on the
> number of PGs to give to each pool in the cluster.
> As you know, the PG calc recommends giving a lot of PGs to pools that are
> heavy in size, regardless of the number of objects in the pools. How bad...
>
> We essentially had 3 pools to set on 144 OSDs :
>
> 1. a EC5+4 pool for the radosGW (.rgw.buckets) that would hold 80% of all
> datas in the cluster. PG calc recommended 2048 PGs.
> 2. a EC5+4 pool for zimbra's data (emails) that would hold 20% of all
> datas. PG calc recommended 512 PGs.
> 3. a replicated pool for zimbra's metadata (null size objects holding
> xattrs - used for deduplication) that would hold 0% of all datas. PG calc
> recommended 128 PGs, but we decided on 256.
>
> With 120M of objects in pool #3, as soon as we upgraded to Jewel, we hit
> the Jewel scrubbing bug (OSDs flapping).
> Before we could upgrade to a patched Jewel and scrub the whole cluster again
> prior to increasing the number of PGs on this pool, we had to take more
> than a hundred snapshots (for backup/restoration purposes), with the
> number of objects still increasing in the pool. Then when a snapshot was
> removed, we hit the current Jewel snap trimming bug affecting pools with
> too many objects for the number of PGs. The only way we could stop the
> trimming was to stop OSDs resulting in PGs being degraded and not trimming
> anymore (snap trimming only happens on active+clean PGs).
>
> We're now just getting out of this hole, thanks to Nick's post regarding
> osd_snap_trim_sleep and RHCS support expertise.
>
> If the PG calc had considered not only the pools' weight but also the
> number of expected objects in the pool (which we knew by that time), we
> wouldn't have hit these 2 bugs.
> We hope this will help improve the ceph.com and RHCS PG calculators.
>
> Regards,
>
> Frédéric.
>
> --
>
> Frédéric Nass
>
> Sous-direction Infrastructures
> Direction du Numérique
> Université de Lorraine
>
> Tél : +33 3 72 74 11 35
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hammer upgrade stuck, all OSDs down

2017-04-13 Thread Siniša Denić
Hi Richard, thank you for the answer.

It seems I can't go back.
I removed all luminous packages from all 3 nodes and installed the jewel ones; on
the monitor node the mon won't start due to an on-disk feature incompatibility:

root@ceph-node01:~# /usr/bin/ceph-mon -f --cluster ceph --id ceph-node01 
--setuser ceph --setgroup ceph
2017-04-13 14:50:18.174662 7fe9ade9c700 -1 ERROR: on disk data includes 
unsupported features: compat={},rocompat={},incompat={8=support monmap features}
2017-04-13 14:50:18.174670 7fe9ade9c700 -1 error checking features: (1) 
Operation not permitted

If I try to get something from OSDs it goes like:

root@ceph-node01:~# export ms="/home/ceph/monstore"; for osd in 
/var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path $osd --op 
update-mon-db --mon-store-path "$ms";done
On-disk OSD incompatible features set 
compat={},rocompat={},incompat={14=explicit missing set,15=fastinfo pg attr}
On-disk OSD incompatible features set 
compat={},rocompat={},incompat={14=explicit missing set,15=fastinfo pg attr}
On-disk OSD incompatible features set 
compat={},rocompat={},incompat={14=explicit missing set,15=fastinfo pg attr}
On-disk OSD incompatible features set 
compat={},rocompat={},incompat={14=explicit missing set,15=fastinfo pg attr}

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG calculator improvement

2017-04-13 Thread David Turner
I think what fits the need of Frédéric while not impacting the complexity
of the tool for new users would be a list of known "gotchas" in PG counts.
Like not having a Base2 count of PGs will cause each PG to be variable
sized (for each PG past the last Base2, you have 2 PGs that are half the
size of the others); Having less than X number of PG's for so much data on
your amount of OSDs will cause balance problems; Having more than X number
of objects for the PG's selected will cause issues; Having more than X
number of PG's per OSD total (not just per pool) can cause high memory
requirements (this is especially important for people setting up multiple
RGW zones); etc.

On Thu, Apr 13, 2017 at 12:58 PM Michael Kidd  wrote:

> Hello Frédéric,
>   Thank you very much for the input.  I would like to ask for some
> feedback from you, as well as the ceph-users list at large.
>
> The PGCalc tool was created to help steer new Ceph users in the right
> direction, but it's certainly difficult to account for every possible
> scenario.  I'm struggling to find a way to implement something that would
> work better for the scenario that you (Frédéric) describe, while still
> being a useful starting point for the novice / more mainstream use cases.
> I've also gotten complaints at the other end of the spectrum, that the tool
> expects the user to know too much already, so accounting for the number of
> objects is bound to add to this sentiment.
>
> As the Ceph user base expands and the use cases diverge, we are definitely
> finding more edge cases that are causing pain.  I'd love to make something
> to help prevent these types of issues, but again, I worry about the
> complexity introduced.
>
> With this, I see a few possible ways forward:
> * Simply re-wording the %data to be % object count -- but this seems more
> abstract, again leading to more confusion of new users.
> * Increase complexity of the PG Calc tool, at the risk of further
> alienating novice/mainstream users
> * Add a disclaimer about the tool being a base for decision making, but
> that certain edge cases require adjustments to the recommended PG count
> and/or ceph.conf & sysctl values.
> * Add a disclaimer urging the end user to secure storage consulting if
> their use case falls into certain categories or they are new to Ceph to
> ensure the cluster will meet their needs.
>
> Having been on the storage consulting team and knowing the expertise they
> have, I strongly believe that newcomers to Ceph (or new use cases inside of
> established customers) should secure consulting before final decisions are
> made on hardware... let alone the cluster is deployed.  I know it seems a
> bit self-serving to make this suggestion as I work at Red Hat, but there
> is a lot on the line when any establishment is storing potentially business
> critical data.
>
> I suspect the answer lies in a combination of the above or in something
> I've not thought of.  Please do weigh in as any and all suggestions are
> more than welcome.
>
> Thanks,
> Michael J. Kidd
> Principal Software Maintenance Engineer
> Red Hat Ceph Storage
> +1 919-442-8878
>
>
> On Wed, Apr 12, 2017 at 6:35 AM, Frédéric Nass <
> frederic.n...@univ-lorraine.fr> wrote:
>
>>
>> Hi,
>>
>> I wanted to share a bad experience we had due to how the PG calculator
>> works.
>>
>> When we set our production cluster months ago, we had to decide on the
>> number of PGs to give to each pool in the cluster.
>> As you know, the PG calc recommends giving a lot of PGs to pools that are
>> heavy in size, regardless of the number of objects in the pools. How bad...
>>
>> We essentially had 3 pools to set on 144 OSDs :
>>
>> 1. a EC5+4 pool for the radosGW (.rgw.buckets) that would hold 80% of all
>> datas in the cluster. PG calc recommended 2048 PGs.
>> 2. a EC5+4 pool for zimbra's data (emails) that would hold 20% of all
>> datas. PG calc recommended 512 PGs.
>> 3. a replicated pool for zimbra's metadata (null size objects holding
>> xattrs - used for deduplication) that would hold 0% of all datas. PG calc
>> recommended 128 PGs, but we decided on 256.
>>
>> With 120M of objects in pool #3, as soon as we upgraded to Jewel, we hit
>> the Jewel scrubbing bug (OSDs flapping).
>> Before we could upgrade to patched Jewel, scrub all the cluster again
>> prior to increasing the number of PGs on this pool, we had to take more
>> than a hundred of snapshots (for backup/restoration purposes), with the
>> number of objects still increasing in the pool. Then when a snapshot was
>> removed, we hit the current Jewel snap trimming bug affecting pools with
>> too many objects for the number of PGs. The only way we could stop the
>> trimming was to stop OSDs resulting in PGs being degraded and not trimming
>> anymore (snap trimming only happens on active+clean PGs).
>>
>> We're now just getting out of this hole, thanks to Nick's post regarding
>> osd_snap_trim_sleep and RHCS support expertise.
>>
>> If the PG calc had considered not only the pools' weight but also the
>> number of expected objects in the pool (which we knew by that time), we
>> wouldn't have hit these 2 bugs.

Re: [ceph-users] failed lossy con, dropping message

2017-04-13 Thread Gregory Farnum
On Thu, Apr 13, 2017 at 9:27 AM, Laszlo Budai  wrote:
> Hello Greg,
>
> Thank you for the clarification. One last thing: can you point me to some
> documents that describe these? I would like to better understand what's
> going on behind the curtains ...

Unfortunately I don't think anything like that really exists outside
of developer discussions and the source code itself.

>
> Kind regards,
> Laszlo
>
> On 13.04.2017 16:22, Gregory Farnum wrote:
>>
>>
>> On Thu, Apr 13, 2017 at 2:17 AM Laszlo Budai > > wrote:
>>
>> Hello Greg,
>>
>> Thank you for the answer.
>> I'm still in doubt about "lossy". What does it mean in this
>> context? I can think of different variants:
>> 1. The designer of the protocol considers the connection to be "lossy"
>> from the start, so connection errors are handled in a higher
>> layer. The layer that has observed the failure of the connection just
>> logs this event and lets the upper layer handle it. This would
>> support your statement 'since it's a "lossy" connection we don't need to
>> remember the message and resend it.'
>>
>>
>> This one. :)
>> The messenger subsystem can be configured as lossy or non-lossy; all the
>> RADOS connections are lossy, since a failure frequently means we'll have to
>> retarget the operation anyway (to a different OSD). CephFS uses the stateful
>> connections a bit more.
>> -Greg
>>
>>
>>
>>
>> 2. A connection is not declared "lossy" as long as it is working
>> properly. Once it has lost some packets or some error threshold is reached,
>> we declare the connection lossy, inform the higher layer, and let
>> it decide what to do next. Compared with point 1 the actions are quite similar,
>> but the usage of "lossy" is different. In point 1 a connection is
>> always "lossy" even if it is not actually losing any packets. In the second
>> case the connection only becomes "lossy" when errors appear, so
>> "lossy" is a runtime state of the connection.
>>
>> Maybe both are wrong and the truth is a third variant ... :) This is
>> what I would like to understand.
>>
>> Kind regards,
>> Laszlo
>>
>>
>> On 13.04.2017 00:36, Gregory Farnum wrote:
>> > On Wed, Apr 12, 2017 at 3:00 AM, Laszlo Budai
>> mailto:las...@componentsoft.eu>> wrote:
>> >> Hello,
>> >>
>> >> yesterday one of our compute nodes has recorded the following
>> message for
>> >> one of the ceph connections:
>> >>
>> >> submit_message osd_op(client.28817736.0:690186
>> >> rbd_data.15c046b11ab57b7.00c4 [read 2097152~380928]
>> 3.6f81364a
>> >> ack+read+known_if_redirected e3617) v5 remote,
>> 10.12.68.71:6818/6623 , failed
>> >> lossy con, dropping message
>> >
>> > A read message, sent to the OSD at IP 10.12.68.71:6818/6623
>> , is being
>> > dropped because the connection has somehow failed; since it's a
>> > "lossy" connection we don't need to remember the message and resend
>> > it. That failure could be an actual TCP/IP stack error; it could be
>> > because a different thread killed the connection and it's now
>> closed.
>> >
>> > If you've just got one of these and didn't see other problems, it's
>> > innocuous — I expect the most common cause for this is an OSD
>> getting
>> > marked down while IO is pending to it. :)
>> > -Greg
>> >
>> >>
>> >> Can someone "decode" the above message, or direct me to some
>> document where
>> >> I could read more about it?
>> >>
>> >> We have ceph 0.94.10.
>> >>
>> >> Thank you,
>> >> Laszlo
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com 
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about RadosGW subusers

2017-04-13 Thread ceph . novice
Hey Trey.

Sounds great, we were discussing the same kind of requirements and couldn't 
agree on/find something "useful"... so THANK YOU for sharing!!!

It would be great if you could provide some more details or an example of how you
configure the "bucket user" and sub-users and all that stuff.
Even more interesting for me, how do the "different people or services" access
those buckets/objects afterwards?! I mean via which tools (s3cmd, boto,
cyberduck, a mix of some, ...) and are there any ACLs set/in use as well?!
 
(sorry if this all sounds somehow dumb but I'm just a novice ;) )
 
best
 Anton
 

Gesendet: Dienstag, 11. April 2017 um 00:17 Uhr
Von: "Trey Palmer" 
An: ceph-us...@ceph.com
Betreff: [ceph-users] Question about RadosGW subusers

Probably a question for @yehuda :
 

We have fairly strict user accountability requirements.  The best way we have 
found to meet them with S3 object storage on Ceph is by using RadosGW subusers.
 
If we set up one user per bucket, then set up subusers to provide separate 
individual S3 keys and access rights for different people or services using 
that bucket, then we can track who did what via access key in the RadosGW logs 
(at debug_rgw = 10/10).
 
Of course, this is not a documented use case for subusers.  I'm wondering if 
Yehuda or anyone else could estimate our risk of future incompatibility if we 
implement user/key management around subusers in this manner?
 
Thanks,
 
Trey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] fsping, why you no work no mo?

2017-04-13 Thread Dan van der Ster
Dear ceph-*,

A couple weeks ago I wrote this simple tool to measure the round-trip
latency of a shared filesystem.

   https://github.com/dvanders/fsping

In our case, the tool is to be run from two clients who mount the same
CephFS.

First, start the server (a.k.a. the ping reflector) on one machine in a
CephFS directory:

   ./fsping --server

Then, from another client machine and in the same directory, start the
fsping client (aka the ping emitter):

./fsping --prefix 

The idea is that the "client" writes a syn file, the reflector notices it,
and writes an ack file. The time for the client to notice the ack file is
what I call the rtt.

And the output looks like normal ping, so that's neat. (The README.md shows
a working example)


Anyway, two weeks ago when I wrote this, it was working very well on my
CephFS clusters (running 10.2.5, IIRC). I was seeing ~20ms rtt for small
files, which is more or less what I was expecting on my test cluster.

But when I run fsping today, it does one of two misbehaviours:

  1. Most of the time it just hangs, both on the reflector and on the
emitter. The fsping processes are stuck in some uninterruptible state --
only an MDS failover breaks them out. I tried with and without
fuse_disable_pagecache -- no big difference.

  2. When I increase the fsping --size to 512kB, it works a bit more
reliably. But there is a weird bimodal distribution with most "packets"
having 20-30ms rtt, some ~20% having ~5-6 seconds rtt, and some ~5% taking
~10-11s. I suspected the mds_tick_interval -- but decreasing that didn't
help.
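
For anyone reproducing the hang, something along these lines should show where
a request is stuck (a sketch; the mds id and the client admin-socket path are
placeholders for whatever your deployment uses):

    # on the active MDS: anything stuck in flight?
    ceph daemon mds.<id> dump_ops_in_flight

    # on the ceph-fuse clients, via the client admin socket:
    ceph daemon /var/run/ceph/ceph-client.<name>.<pid>.asok mds_requests
    ceph daemon /var/run/ceph/ceph-client.<name>.<pid>.asok objecter_requests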


In summary, if someone is curious, please give this tool a try on your
CephFS cluster -- let me know if its working or not (and what rtt you can
achieve with which configuration).
And perhaps a dev would understand why it is not working with latest jewel
ceph-fuse / ceph MDS's?

Best Regards,

Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] python3-rados

2017-04-13 Thread Josh Durgin

On 04/12/2017 09:26 AM, Gerald Spencer wrote:

Ah, I'm running Jewel. Is there any information online about python3-rados
with Kraken? I'm having difficulty finding more than I initially posted.


What info are you looking for?

The interface for the python bindings is the same for python 2 and 3.
The python3-rados package was added in kraken to a) compile the cython
against python3 and b) put the module where python3 will find it.

The python-rados package still installs the python2 version of the
bindings.
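
A quick way to check which interpreter sees which bindings, assuming the
Kraken-era Debian/Ubuntu package names:

    sudo apt-get install python3-rados
    python3 -c "import rados; print(rados.__file__)"
    python  -c "import rados; print(rados.__file__)"   # python2 module from python-rados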

Josh


On Mon, Apr 10, 2017 at 10:37 PM, Wido den Hollander  wrote:




Op 8 april 2017 om 4:03 schreef Gerald Spencer :


Do the rados bindings exist for python3?
I see this sprinkled in various areas..
https://github.com/ceph/ceph/pull/7621
https://github.com/ceph/ceph/blob/master/debian/python3-rados.install

This being said, I can not find said package


What version of Ceph do you have installed? You need at least Kraken for
Python 3.

Wido


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW lifecycle bucket stuck processing?

2017-04-13 Thread Ben Hines
I initiated a manual lifecycle cleanup with:

radosgw-admin lc process

It took over a day working on my bucket called 'bucket1' (with 2 million
objects) and it seems like it eventually got stuck with about 1.7 million objects
left, with uninformative errors like these (notice the timestamps):


2017-04-12 18:50:15.706952 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:16.841254 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:17.153323 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:20.752924 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-12 18:50:25.400460 7f90aa5dcc80  0 ERROR: rgw_remove_object
2017-04-13 03:19:30.027773 7f9099069700  0 -- 10.29.16.57:0/3392796805 >>
10.29.16.53:6801/20291 conn(0x7f9084002990 :-1 s=STATE_OPEN pgs=167140106
cs=1 l=0).fault initiating reconnect
2017-04-13 03:36:30.721085 7f9099069700  0 -- 10.29.16.57:0/3392796805 >>
10.29.16.53:6801/20291 conn(0x7f90841d6ef0 :-1 s=STATE_OPEN pgs=167791627
cs=1 l=0).fault initiating reconnect
2017-04-13 03:46:46.143055 7f90aa5dcc80  0 ERROR: rgw_remove_object


This morning I aborted it with control-c. Now 'lc list' still shows the
bucket as processing, and 'lc process' returns quickly, as if the bucket is
still locked:



radosgw-admin lc list

...
{
"bucket": ":bucket1:default.42048218.4",
"status": "PROCESSING"
},


-bash-4.2$ time radosgw-admin lc process
2017-04-13 11:07:48.482671 7f4fbeb87c80  0 System already converted

real0m17.785s



Is it possible it left behind a stale lock on the bucket due to the
control-c?
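
If it is a stale lock, the lifecycle shards are plain RADOS objects, so
something like the following might confirm it (a sketch; I'm assuming the
shard objects are named lc.0 through lc.31 and live in the zone's log pool,
e.g. default.rgw.log -- adjust names for your zone):

    # list any advisory locks held on the lifecycle shard objects
    for i in $(seq 0 31); do
        echo "== lc.$i =="
        rados -p default.rgw.log lock list lc.$i
    done

    # a stale lock could then be broken once its name and locker are known:
    # rados -p default.rgw.log lock break lc.<n> <lock-name> <locker-id>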


-Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about RadosGW subusers

2017-04-13 Thread Trey Palmer
Anton,

It turns out that Adam Emerson is trying to get bucket policies and roles
merged in time for Luminous:

https://github.com/ceph/ceph/pull/14307

Given this, I think we will only be using subusers temporarily as a method
to track which human or service did what in which bucket.  This seems to us
much easier than trying to deal with ACL's without any concept of groups,
roles, or policies, in buckets that can often have millions of objects.

Here is the general idea:


1.  Each bucket has a user ("master user"), but we don't use or issue that
set of keys at all.

radosgw-admin user create --uid=mybucket --display-name="My Bucket"

You can of course have multiple buckets per user but so far for us it has
been simple to have one user per bucket, with the username the same as the
bucket name.   If a human needs access to more than one bucket, we will
create multiple subusers for them.   That's not convenient, but it's
temporary.

So what we're doing is effectively making the user into the group, with the
subusers being the users, and each user only capable of being in one group.
  Very suboptimal, but better than the total chaos that would result from
giving everyone the same set of keys for a given bucket.


2.  For each human user or service/machine user of that bucket, we create
subusers.  You can do this via:

## full-control ops user
radosgw-admin subuser create --uid=mybucket --subuser=mybucket:alice
--access=full --gen-access-key --gen-secret --key-type=s3

## write-only server user
radosgw-admin subuser create --uid=mybucket --subuser=mybucket:daemon
--access=write --gen-access-key --gen-secret --key-type=s3

If you then do a "radosgw-admin metadata get user:mybucket", the JSON
output contains the subusers and their keys.
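
Any standard S3 client can then use one of those subuser key pairs. As a
rough sketch with boto3 (the endpoint URL, keys and object name below are
placeholders, not values from our setup):

# Rough sketch only: use one subuser's generated key pair from boto3.
# The endpoint URL and key values are placeholders.
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',
    aws_access_key_id='ALICE_ACCESS_KEY',
    aws_secret_access_key='ALICE_SECRET_KEY',
)

s3.put_object(Bucket='mybucket', Key='hello.txt', Body=b'written as mybucket:alice')
resp = s3.list_objects_v2(Bucket='mybucket')
print([obj['Key'] for obj in resp.get('Contents', [])])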


3.  Raise the RGW log level in ceph.conf to make an "access key id" line
available for each request, which you can then map to a subuser if/when you
need to track who did what after the fact.  In ceph.conf:

debug_rgw = 10/10

This will cause the logs to be VERY verbose, an order of magnitude and some
change more verbose than default.   We plan to discard most of the logs
while feeding them into ElasticSearch.
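
As a rough illustration of the kind of post-processing we have in mind (the
exact log line format and the key-to-subuser table below are assumptions, so
treat this as a sketch only):

# Rough sketch only: the log line pattern and the key->subuser table
# below are assumptions, not taken from a real RGW log.
import re

KEY_TO_SUBUSER = {
    'ALICE_ACCESS_KEY': 'mybucket:alice',
    'DAEMON_ACCESS_KEY': 'mybucket:daemon',
}

pattern = re.compile(r'access key id\s*[=:]?\s*(\S+)', re.IGNORECASE)

with open('/var/log/ceph/client.radosgw.log') as log:
    for line in log:
        match = pattern.search(line)
        if match:
            key = match.group(1)
            print(KEY_TO_SUBUSER.get(key, 'unknown key ' + key), '|', line.strip())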

We might not need this much log verbosity once we have policies and are
using unique users rather than subusers.

Nevertheless, I hope we can eventually reduce the log level of the "access
key id" line, as we have a pretty mainstream use case and I'm certain that
tracking S3 request users will be required for many organizations for
accounting and forensic purposes just as it is for us.

-- Trey

On Thu, Apr 13, 2017 at 1:29 PM,  wrote:

> Hey Trey.
>
> Sounds great, we were discussing the same kind of requirements and
> couldn't agree on/find something "useful"... so THANK YOU for sharing!!!
>
> It would be great if you could provide some more details or an example of
> how you configure the "bucket user" and sub-users and all that stuff.
> Even more interesting for me: how do the "different people or services"
> access those buckets/objects afterwards?! I mean via which tools (s3cmd,
> boto, Cyberduck, a mix of some, ...), and are there any ACLs set/in use as
> well?!
>
> (sorry if this all sounds somehow dumb, but I'm just a novice ;) )
>
> best
>  Anton
>
>
> Sent: Tuesday, 11 April 2017 at 00:17
> From: "Trey Palmer" 
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Question about RadosGW subusers
>
> Probably a question for @yehuda :
>
>
> We have fairly strict user accountability requirements.  The best way we
> have found to meet them with S3 object storage on Ceph is by using RadosGW
> subusers.
>
> If we set up one user per bucket, then set up subusers to provide separate
> individual S3 keys and access rights for different people or services using
> that bucket, then we can track who did what via access key in the RadosGW
> logs (at debug_rgw = 10/10).
>
> Of course, this is not a documented use case for subusers.  I'm wondering
> if Yehuda or anyone else could estimate our risk of future incompatibility
> if we implement user/key management around subusers in this manner?
>
> Thanks,
>
> Trey
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about RadosGW subusers

2017-04-13 Thread ceph . novice
Thanks a lot, Trey.

I'll try that stuff next week, once back from Easter holidays.
And some "multi site" and "metasearch" is also still on my to-be-tested list. 
Need badly to free up some time for all the interesting "future of storage" 
things.

BTW., we are on Kraken and I'd hope to see more of the new and shiny stuff here 
soon (something like 11.2.X) instead of waiting for Luminous late 2017. Not 
sure how the CEPH release policy is usually?!

Anyhow, thanks and happy Easter everyone!
Anton
 


Re: [ceph-users] fsping, why you no work no mo?

2017-04-13 Thread Andras Pataki

Hi Dan,

I don't have a solution to the problem; I can only second that we've
also been seeing strange problems when more than one node accesses the
same file in Ceph and at least one of them opens it for writing.  I've
tried verbose logging on the client (fuse), and it seems that the fuse
client sometimes sends a cap request to the MDS and does not get a
response.  There also appears to be a roughly 5 second polling interval,
which sometimes (but not always) saves the day, and the client continues
after a 5 second-ish delay.  This does not happen when multiple processes
open the file for reading, but it does when processes open it for
writing (even if they never write to the file and only read
afterwards).  I have some earlier mailing list messages from a week or
two ago describing what we see in more detail (including log outputs).
I think the issue has something to do with cap requests being
lost/miscommunicated between the client and the MDS.


Andras


On 04/13/2017 01:41 PM, Dan van der Ster wrote:

Dear ceph-*,

A couple weeks ago I wrote this simple tool to measure the round-trip 
latency of a shared filesystem.


https://github.com/dvanders/fsping

In our case, the tool is to be run from two clients who mount the same 
CephFS.


First, start the server (a.k.a. the ping reflector) on one machine in 
a CephFS directory:


   ./fsping --server

Then, from another client machine and in the same directory, start the 
fsping client (aka the ping emitter):


./fsping --prefix 

The idea is that the "client" writes a syn file, the reflector notices 
it, and writes an ack file. The time for the client to notice the ack 
file is what I call the rtt.
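
For anyone who doesn't want to read the repo, a stripped-down sketch of that
loop (not the actual fsping code) looks roughly like this:

# Stripped-down sketch of the syn/ack idea (not the real fsping code):
# the emitter writes a syn file, the reflector on the other client
# replies by creating a matching ack file, and the rtt is how long it
# takes the emitter to see that ack appear.
import os
import time

def emit_one(seq, directory='.', poll_interval=0.001):
    syn = os.path.join(directory, 'syn.%d' % seq)
    ack = os.path.join(directory, 'ack.%d' % seq)
    start = time.time()
    with open(syn, 'w') as f:
        f.write(str(start))
        f.flush()
        os.fsync(f.fileno())        # make sure the other client can see it
    while not os.path.exists(ack):  # the reflector creates this file
        time.sleep(poll_interval)
    return (time.time() - start) * 1000.0   # rtt in milliseconds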


And the output looks like normal ping, so that's neat. (The README.md 
shows a working example)



Anyway, two weeks ago when I wrote this, it was working very well on 
my CephFS clusters (running 10.2.5, IIRC). I was seeing ~20ms rtt for 
small files, which is more or less what I was expecting on my test 
cluster.


But when I run fsping today, it does one of two misbehaviours:

  1. Most of the time it just hangs, both on the reflector and on the 
emitter. The fsping processes are stuck in some uninterruptible state 
-- only an MDS failover breaks them out. I tried with and without 
fuse_disable_pagecache -- no big difference.


  2. When I increase the fsping --size to 512kB, it works a bit more 
reliably. But there is a weird bimodal distribution with most 
"packets" having 20-30ms rtt, some ~20% having ~5-6 seconds rtt, and 
some ~5% taking ~10-11s. I suspected the mds_tick_interval -- but 
decreasing that didn't help.



In summary, if someone is curious, please give this tool a try on your 
CephFS cluster -- let me know if its working or not (and what rtt you 
can achieve with which configuration).
And perhaps a dev would understand why it is not working with latest 
jewel ceph-fuse / ceph MDS's?


Best Regards,

Dan




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-13 Thread ceph . novice

Oops... thanks for your efforts, Ben!

 

This could explain some bits. Still, I have lots of questions, as it seems different S3 tools/clients behave differently. We need to stick to CyberDuck on Windows and s3cmd and boto on Linux, and many things are not the same with RadosGW :|

 

And more on my to-test-list ;)

 

Regards

 Anton

 

Sent: Wednesday, 12 April 2017 at 06:49
From: "Ben Hines" 
To: ceph.nov...@habmalnefrage.de
Cc: ceph-users , "Yehuda Sadeh-Weinraub" 
Subject: Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."


After much banging on this and reading through the Ceph RGW source, I figured out that Ceph RadosGW returns -13 (EACCES - AccessDenied) if you don't pass in a 'Prefix' in your S3 lifecycle configuration. It also returns EACCES if the XML is invalid in any way, which is probably not the most correct / user-friendly result.
 

http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTlifecycle.html specifies 'Prefix' as optional, so I'll file a bug for this.
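
In the meantime, the workaround seems to be to always send a Prefix element
and to stick to v2 signatures. A rough sketch with boto3 (endpoint, keys and
rule values are placeholders, untested against this exact setup):

# Rough sketch with boto3: force v2 signatures (see the v4 issue at
# http://tracker.ceph.com/issues/17076 further down this thread) and
# always send a Prefix element. Endpoint, keys and bucket are placeholders.
import boto3
from botocore.client import Config

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    config=Config(signature_version='s3'),   # v2 signing
)

s3.put_bucket_lifecycle_configuration(
    Bucket='bentest',
    LifecycleConfiguration={'Rules': [{
        'ID': 'expire-everything',
        'Prefix': '',    # RGW wants this element present; use a real
                         # prefix if an empty one is still rejected
        'Status': 'Enabled',
        'Expiration': {'Days': 365},
    }]},
)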

 

-Ben

 

 
On Mon, Apr 3, 2017 at 12:14 PM, Ben Hines  wrote:


Interesting.
I'm wondering what the -13 return code for the op execution in my debug output is (I can't find it in the source...).

 

 

 

I just tried out setting the lifecycle with Cyberduck and got this error, which is probably the other bug with AWS v4 auth, http://tracker.ceph.com/issues/17076. Not sure if Cyberduck can be forced to use v2.


 


2017-04-03 12:07:15.093235 7f5617024700 10 op=20RGWPutLC_ObjStore_S3

2017-04-03 12:07:15.093248 7f5617024700  2 req 14:0.000438:s3:PUT /bentest/:put_lifecycle:authorizing


.


2017-04-03 12:07:15.093637 7f5617024700 10 delaying v4 auth

2017-04-03 12:07:15.093643 7f5617024700 10 ERROR: AWS4 completion for this operation NOT IMPLEMENTED

2017-04-03 12:07:15.093652 7f5617024700 10 failed to authorize request

2017-04-03 12:07:15.093658 7f5617024700 20 handler->ERRORHANDLER: err_no=-2201 new_err_no=-2201

2017-04-03 12:07:15.093844 7f5617024700  2 req 14:0.001034:s3:PUT /bentest/:put_lifecycle:op status=0

2017-04-03 12:07:15.093859 7f5617024700  2 req 14:0.001050:s3:PUT /bentest/:put_lifecycle:http status=501

2017-04-03 12:07:15.093884 7f5617024700  1 == req done req=0x7f561701e340 op status=0 http_status=501 ==

 

 


 

-Ben







 
On Mon, Apr 3, 2017 at 7:16 AM,  wrote:





... hmm, "modify" gives no error and may be the option to use, but I don't see anything related to an "expires" meta field

 


[root s3cmd-master]# ./s3cmd --no-ssl --verbose modify s3://Test/INSTALL --expiry-days=365
INFO: Summary: 1 remote files to modify
modify: 's3://Test/INSTALL'


[root s3cmd-master]# ./s3cmd --no-ssl --verbose info s3://Test/INSTALL
s3://Test/INSTALL (object):
   File size: 3123
   Last mod:  Mon, 03 Apr 2017 12:35:28 GMT
   MIME type: text/plain
   Storage:   STANDARD
   MD5 sum:   63834dbb20b32968505c4ebe768fc8c4
   SSE:   none
   policy:    http://s3.amazonaws.com/doc/2006-03-01/">Test1000falseINSTALL2017-04-03T12:35:28.533Z"63834dbb20b32968505c4ebe768fc8c4"3123STANDARD666First UserREADME.TXT2017-03-31T22:36:38.380Z"708efc3b9184c8b112e36062804aca1e"88STANDARD666First User
   cors:    none
   ACL:   First User: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root

 


 

Sent: Monday, 3 April 2017 at 14:13
From: ceph.nov...@habmalnefrage.de
To: ceph-users 



Betreff: Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."





... additional strange but a bit different info related to the "permission denied"
 
[root s3cmd-master]# ./s3cmd --no-ssl put INSTALL s3://Test/ --expiry-days=5
upload: 'INSTALL' -> 's3://Test/INSTALL' [1 of 1]
3123 of 3123 100% in 0s 225.09 kB/s done

[root s3cmd-master]# ./s3cmd info s3://Test/INSTALL
s3://Test/INSTALL (object):
File size: 3123
Last mod: Mon, 03 Apr 2017 12:01:47 GMT
MIME type: text/plain
Storage: STANDARD
MD5 sum: 63834dbb20b32968505c4ebe768fc8c4
SSE: none
policy: http://s3.amazonaws.com/doc/2006-03-01/">Test1000falseINSTALL2017-04-03T12:01:47.745Z"63834dbb20b32968505c4ebe768fc8c4"3123STANDARD666First UserREADME.TXT2017-03-31T22:36:38.380Z"708efc3b9184c8b112e36062804aca1e"88STANDARD666First User
cors: none
ACL: First User: FULL_CONTROL
x-amz-meta-s3cmd-attrs: atime:1491218263/ctime:1490998096/gid:0/gname:root/md5:63834dbb20b32968505c4ebe768fc8c4/mode:33188/mtime:1488021707/uid:0/uname:root

[root s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/ --expiry-days=365
ERROR: Access to bucket 'Test' was denied
ERROR: S3 error: 403 (AccessDenied)

[root s3cmd-master]# ./s3cmd --no-ssl expire s3://Test/INSTALL --expiry-days=365
ERROR: Parameter problem: Expecting S3 U

Re: [ceph-users] saving file on cephFS mount using vi takes pause/time

2017-04-13 Thread Deepak Naidu
OK, I tried strace to check why vi slows down or pauses. It seems to stall on fsync(3).

I didn't see the issue with the nano editor.

--
Deepak


From: Deepak Naidu
Sent: Wednesday, April 12, 2017 2:18 PM
To: 'ceph-users'
Subject: saving file on cephFS mount using vi takes pause/time

Folks,

This is a bit of a weird issue. I am using the CephFS volume to read and write
files etc., and it's quick, under a second. But when editing a file on the
CephFS volume using vi, saving the file takes a couple of seconds, something
like a sync (flush). The same doesn't happen on a local filesystem.

Any pointers is appreciated.

--
Deepak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] saving file on cephFS mount using vi takes pause/time

2017-04-13 Thread Chris Sarginson
Is it related to the recovery behaviour of vim creating a swap file,
which I think nano does not do?

http://vimdoc.sourceforge.net/htmldoc/recover.html

A sync into CephFS, I think, needs the write to be confirmed all the way
down by the OSDs performing the write before the confirmation is returned
to the client calling the sync, though I stand to be corrected on that.
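
A quick way to confirm it's the fsync itself rather than anything
vim-specific is to time a bare write plus fsync on the CephFS mount. A rough
sketch, with the mount path below being an assumption:

# Rough sketch: time a bare write vs. the fsync on a CephFS mount
# (the path below is an assumption; point it at your own mount).
import os
import time

path = '/mnt/cephfs/fsync-test.tmp'
fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC)

t0 = time.time()
os.write(fd, b'x' * 4096)
t1 = time.time()
os.fsync(fd)                      # the call vi is waiting on
t2 = time.time()

os.close(fd)
os.unlink(path)
print('write: %.1f ms, fsync: %.1f ms' % ((t1 - t0) * 1e3, (t2 - t1) * 1e3))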

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about RadosGW subusers

2017-04-13 Thread Ben Hines
Based on past LTS release dates I would predict Luminous much sooner than
that, possibly even in May: http://docs.ceph.com/docs/master/releases/

The docs also say "Spring": http://docs.ceph.com/docs/master/release-notes/

-Ben


Re: [ceph-users] saving file on cephFS mount using vi takes pause/time

2017-04-13 Thread Deepak Naidu
Yes, vi creates a swap file and nano doesn't. But when I try fio to write, I
don't see this happening.

--
Deepak

From: Chris Sarginson [mailto:csarg...@gmail.com]
Sent: Thursday, April 13, 2017 2:26 PM
To: Deepak Naidu; ceph-users
Subject: Re: [ceph-users] saving file on cephFS mount using vi takes pause/time

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] python3-rados

2017-04-13 Thread Gerald Spencer
We've written our own python3 bindings for rados, but would rather use a
community-supported version. I'm looking to import rados in python3.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph activation error

2017-04-13 Thread gjprabu
Hi Tom,

Is there any solution for this issue?

Regards
Prabu GJ


 On Thu, 13 Apr 2017 18:31:36 +0530 gjprabu wrote 

Hi Tom,

Yes, it's mounted. I am using CentOS 7 and kernel version 3.10.0-229.el7.x86_64.

  /dev/xvda3 xfs   138G   33M  138G   1% /home

Regards
Prabu GJ


 On Thu, 13 Apr 2017 17:20:34 +0530 Tom Verhaeg wrote 

Hi,

Is your OSD mounted correctly on the OS?

Tom




From: ceph-users  on behalf of gjprabu
Sent: Thursday, April 13, 2017 1:13:34 PM
To: ceph-users
Subject: Re: [ceph-users] ceph activation error

Hi All,

  Is anybody facing this similar issue?

Regards
Prabu GJ


 On Sat, 04 Mar 2017 09:50:35 +0530 gjprabu wrote 

Hi Team,

  I am installing a new Ceph setup (jewel), and while activating the OSDs it
is throwing the error below.

  I am using partition-based OSDs like /home/osd1, not an entire disk. An
earlier installation a month back worked fine, but this time I am getting the
error below.





[root@cphadmin mycluster]# ceph-deploy osd activate cphosd1:/home/osd1 
cphosd2:/home/osd2 cphosd3:/home/osd3

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf

[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy osd activate 
cphosd1:/home/osd1 cphosd2:/home/osd2 cphosd3:/home/osd3

[ceph_deploy.cli][INFO  ] ceph-deploy options:

[ceph_deploy.cli][INFO  ]  username  : None

[ceph_deploy.cli][INFO  ]  verbose   : False

[ceph_deploy.cli][INFO  ]  overwrite_conf: False

[ceph_deploy.cli][INFO  ]  subcommand: activate

[ceph_deploy.cli][INFO  ]  quiet : False

[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph

[ceph_deploy.cli][INFO  ]  func  : 

[ceph_deploy.cli][INFO  ]  ceph_conf : None

[ceph_deploy.cli][INFO  ]  default_release   : False

[ceph_deploy.cli][INFO  ]  disk  : [('cphosd1', 
'/home/osd1', None), ('cphosd2', '/home/osd2', None), ('cphosd3', '/home/osd3', 
None)]

[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks cphosd1:/home/osd1: 
cphosd2:/home/osd2: cphosd3:/home/osd3:

[cphosd1][DEBUG ] connected to host: cphosd1

[cphosd1][DEBUG ] detect platform information from remote host

[cphosd1][DEBUG ] detect machine type

[cphosd1][DEBUG ] find the location of an executable

[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.1.1503 Core

[ceph_deploy.osd][DEBUG ] activating host cphosd1 disk /home/osd1

[ceph_deploy.osd][DEBUG ] will use init type: systemd

[cphosd1][DEBUG ] find the location of an executable

[cphosd1][INFO  ] Running command: /usr/sbin/ceph-disk -v activate --mark-init 
systemd --mount /home/osd1

[cphosd1][WARNIN] main_activate: path = /home/osd1

[cphosd1][WARNIN] activate: Cluster uuid is 62b4f8c7-c00c-48d0-8262-549c9ef6074c

[cphosd1][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph 
--show-config-value=fsid

[cphosd1][WARNIN] activate: Cluster name is ceph

[cphosd1][WARNIN] activate: OSD uuid is 241b30d8-b2ba-4380-81f8-2e30e6913bb2

[cphosd1][WARNIN] allocate_osd_id: Allocating OSD id...

[cphosd1][WARNIN] command: Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd 
create --concise 241b30d8-b2ba-4380-81f8-2e30e6913bb2

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/whoami.22462.tmp

[cphosd1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/osd1/whoami.22462.tmp

[cphosd1][WARNIN] activate: OSD id is 0

[cphosd1][WARNIN] activate: Initializing OSD...

[cphosd1][WARNIN] command_check_call: Running command: /usr/bin/ceph --cluster 
ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o 
/home/osd1/activate.monmap

[cphosd1][WARNIN] got monmap epoch 2

[cphosd1][WARNIN] command: Running command: /usr/bin/timeout 300 ceph-osd 
--cluster ceph --mkfs --mkkey -i 0 --monmap /home/osd1/activate.monmap 
--osd-data /home/osd1 --osd-journal /home/osd1/journal --osd-uuid 
241b30d8-b2ba-4380-81f8-2e30e6913bb2 --keyring /home/osd1/keyring --setuser 
ceph --setgroup ceph

[cphosd1][WARNIN] activate: Marking with init system systemd

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/systemd

[cphosd1][WARNIN] command: Runn