[ceph-users] Cache data consistency among multiple RGW instances

2015-01-18 Thread ZHOU Yuan
Hi list,

I'm trying to understand the RGW cache consistency model. My Ceph
cluster has multiple RGW instances behind HAProxy as the load balancer,
and HAProxy picks one RGW instance to serve each request (round-robin).

The question is: if the RGW cache is enabled, which is the default
behavior, there seems to be a cache inconsistency issue. For example,
object0 is cached in RGW-0 and RGW-1 at the same time. Some time later
it is updated through RGW-0. If the next read is then issued to RGW-1,
the outdated cached copy would be served, since RGW-1 isn't aware of the
update, and the data would be inconsistent. Is this behavior expected,
or is there something I missed?
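
(For reference, the cache settings I'm referring to are, as far as I can tell,
these ceph.conf options in the radosgw client section; please correct me if the
names or defaults are off:

    rgw cache enabled = true    # on by default
    rgw cache lru size = 10000  # max number of cached metadata entries
)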

Sincerely, Yuan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS aborted after recovery and active, FAILED assert (r =0)

2015-01-18 Thread Mohd Bazli Ab Karim
Hi John,

Good shot!
I've increased osd_max_write_size to 1 GB (still smaller than the OSD journal
size) and the MDS is still running fine after an hour.
I'm now checking whether the filesystem is still accessible. Will update from
time to time.

Thanks again John.

Regards,
Bazli


-Original Message-
From: john.sp...@inktank.com [mailto:john.sp...@inktank.com] On Behalf Of John 
Spray
Sent: Friday, January 16, 2015 11:58 PM
To: Mohd Bazli Ab Karim
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: Re: MDS aborted after recovery and active, FAILED assert (r =0)

It has just been pointed out to me that you can also work around this issue on
your existing system by increasing the osd_max_write_size setting on your OSDs
(default 90 MB) to something higher, but still smaller than your OSD journal
size.  That might get you on a path to having an accessible filesystem before
you consider an upgrade.
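
For example, something along these lines should apply it at runtime (the value
is in MB; please sanity-check it against your journal size first):

    ceph tell osd.* injectargs '--osd_max_write_size 1024'

To make it persistent, set "osd max write size = 1024" under [osd] in
ceph.conf and restart the OSDs.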

John

On Fri, Jan 16, 2015 at 10:57 AM, John Spray john.sp...@redhat.com wrote:
 Hmm, upgrading should help here, as the problematic data structure
 (anchortable) no longer exists in the latest version.  I haven't
 checked, but hopefully we don't try to write it during upgrades.

 The bug you're hitting is more or less the same as a similar one we
 have with the sessiontable in the latest ceph, but you won't hit it
 there unless you're very unlucky!

 John

 On Fri, Jan 16, 2015 at 7:37 AM, Mohd Bazli Ab Karim
 bazli.abka...@mimos.my wrote:
 Dear Ceph-Users, Ceph-Devel,

 Apologies if you receive this email twice.

 I am running a Ceph cluster, version 0.72.2, with one MDS up at the moment (in
 fact there are three MDSs, but two are down and only one is up).
 I also have one CephFS client mounted to it.

 Now, the MDS always aborts about 4 seconds after it recovers and goes active.
 Some parts of the log are below:

     -3> 2015-01-15 14:10:28.464706 7fbcc8226700  1 -- 10.4.118.21:6800/5390 <== osd.19 10.4.118.32:6821/243161 73 ==== osd_op_reply(3742 1000240c57e. [create 0~0,setxattr (99)] v56640'1871414 uv1871414 ondisk = 0) v6 ==== 221+0+0 (261801329 0 0) 0x7770bc80 con 0x69c7dc0
     -2> 2015-01-15 14:10:28.464730 7fbcc8226700  1 -- 10.4.118.21:6800/5390 <== osd.18 10.4.118.32:6818/243072 67 ==== osd_op_reply(3645 107941c. [tmapup 0~0] v56640'1769567 uv1769567 ondisk = 0) v6 ==== 179+0+0 (3759887079 0 0) 0x7757ec80 con 0x1c6bb00
     -1> 2015-01-15 14:10:28.464754 7fbcc8226700  1 -- 10.4.118.21:6800/5390 <== osd.47 10.4.118.35:6809/8290 79 ==== osd_op_reply(3419 mds_anchortable [writefull 0~94394932] v0'0 uv0 ondisk = -90 (Message too long)) v6 ==== 174+0+0 (3942056372 0 0) 0x69f94a00 con 0x1c6b9a0
      0> 2015-01-15 14:10:28.471684 7fbcc8226700 -1 mds/MDSTable.cc: In function 'void MDSTable::save_2(int, version_t)' thread 7fbcc8226700 time 2015-01-15 14:10:28.46
 mds/MDSTable.cc: 83: FAILED assert(r >= 0)

  ceph version  ()
  1: (MDSTable::save_2(int, unsigned long)+0x325) [0x769e25]
  2: (Context::complete(int)+0x9) [0x568d29]
  3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x1097) [0x7c15d7]
  4: (MDS::handle_core_message(Message*)+0x5a0) [0x588900]
  5: (MDS::_dispatch(Message*)+0x2f) [0x58908f]
  6: (MDS::ms_dispatch(Message*)+0x1e3) [0x58ab93]
  7: (DispatchQueue::entry()+0x549) [0x975739]
  8: (DispatchQueue::DispatchThread::entry()+0xd) [0x8902dd]
  9: (()+0x7e9a) [0x7fbcccb0de9a]
  10: (clone()+0x6d) [0x7fbccb4ba3fd]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.

 Is there any workaround/patch to fix this issue? Let me know if you need to
 see the log at a particular debug-mds level as well.
 Any help would be very much appreciated.

 Thanks.
 Bazli
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel
 in the body of a message to majord...@vger.kernel.org More majordomo
 info at  http://vger.kernel.org/majordomo-info.html



[ceph-users] rgw-agent copy file failed

2015-01-18 Thread baijia...@126.com
When I write an object named 1234% in the master region, radosgw-agent sends a
copy-object request containing "x-amz-copy-source: nofilter_bucket_1/1234%" to
the replica region, and it fails with a 404 error.

My analysis is that radosgw-agent does not URL-encode the
x-amz-copy-source: nofilter_bucket_1/1234% header, while RGW does URL-decode
x-amz-copy-source in the function RGWCopyObj::parse_copy_location.
So 1234% is decoded to 1234, and the copy fails.
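
(If that analysis is right, I think the agent would need to send the source
percent-encoded, i.e. something like:

    x-amz-copy-source: nofilter_bucket_1/1234%25

where %25 is the encoding of the literal % character. This is just my guess at
the intended fix.)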

Could you look into this issue?




baijia...@126.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache pool tiering SSD journal

2015-01-18 Thread lidc...@redhat.com
No, if you use cache tiering, there is no need to use an SSD journal as well.

From: Florent MONTHEL
Date: 2015-01-17 23:43
To: ceph-users
Subject: [ceph-users] Cache pool tiering  SSD journal
Hi list,

With the cache pool tiering enhancement (in writeback mode), should I keep
using SSD journals?
Can we have one big SSD pool for caching in front of all the low-cost storage
pools?
Thanks

Florent Monthel





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH Expansion

2015-01-18 Thread Jiri Kanicky

Hi George,

List disks available:
# $ ceph-deploy disk list {node-name [node-name]...}

Add OSD using osd create:
# $ ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]

Or you can use the manual steps to prepare and activate the disks, described at
http://ceph.com/docs/master/start/quick-ceph-deploy/#expanding-your-cluster
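
For example, assuming the new host is called node2 and has a data disk sdb
with a journal partition on an SSD at /dev/sdk1 (names are placeholders for
your setup):

# $ ceph-deploy disk list node2
# $ ceph-deploy osd create node2:sdb:/dev/sdk1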


Jiri

On 15/01/2015 06:36, Georgios Dimitrakakis wrote:

Hi all!

I would like to expand our CEPH Cluster and add a second OSD node.

In this node I will have ten 4TB disks dedicated to CEPH.

What is the proper way of adding them to the already running CEPH
cluster?


I guess that the first thing to do is to prepare them with ceph-deploy 
and mark them as out at preparation.


I should then restart the services and add (mark as in) one of them.
Afterwards, I have to wait for the rebalance to finish, and then add the
second one, and so on. Is this safe enough?
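
(Roughly, the commands I have in mind for that are the following; the host
name, disk and OSD id are just placeholders, and corrections are welcome:

    ceph osd set noin                   # new OSDs come up but are not marked in
    ceph-deploy osd create newnode:sdb  # prepare/activate each disk
    ceph osd in 20                      # bring the new OSDs in one at a time
    ceph osd unset noin                 # once everything has been added
)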



How long do you expect the rebalancing procedure to take?


I already have ten more 4TB disks at another node and the amount of 
data is around 40GB with 2x replication factor.

The connection is over Gigabit.


Best,


George


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache pool tiering SSD journal

2015-01-18 Thread Mark Nelson

On 01/17/2015 08:17 PM, lidc...@redhat.com wrote:

No, if you use cache tiering, there is no need to use an SSD journal as well.


Cache tiering and SSD journals serve somewhat different purposes. 
In Ceph, all of the data for every single write is written to both the 
journal and to the data storage device.  SSD journals allow you to avoid 
additional coalesced O_DSYNC sequential writes to the data disk.  In 
some situations this can provide up to a 2X write performance 
improvement on the base tier OSDs.  Cache pool tiering may also provide 
some coalescing of writes to the base pool, but doesn't help you avoid 
the additional journal write penalty on the base tier OSDs.  It does 
however provide the benefit of allowing you to read hot data from the 
cache tier and potentially avoid read/write head seek contention if you 
have spinning disks on the base tier.
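
For anyone wanting to run both together: the SSD journal is still configured
per-OSD (e.g. via the journal argument to ceph-deploy osd create), while the
cache tier is wired up separately, roughly like this, assuming an SSD-backed
pool called ssd-cache sitting in front of a spinning pool called cold-data
(pool names are placeholders):

    ceph osd tier add cold-data ssd-cache
    ceph osd tier cache-mode ssd-cache writeback
    ceph osd tier set-overlay cold-data ssd-cache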


Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Giant on Centos 7 with custom cluster name

2015-01-18 Thread Jiri Kanicky

Hi,

I have upgraded Firefly to Giant on Debian Wheezy and it went without 
any problems.


Jiri


On 16/01/2015 06:49, Erik McCormick wrote:

Hello all,

I've got an existing Firefly cluster on Centos 7 which I deployed with 
ceph-deploy. In the latest version of ceph-deploy, it refuses to 
handle commands issued with a cluster name.


[ceph_deploy.install][ERROR ] custom cluster names are not supported 
on sysvinit hosts


This is a production cluster. Small, but still production. Is it safe 
to go through manually upgrading the packages? I'd hate to do the 
upgrade and find out I can no longer start the cluster because it 
can't be called anything other than ceph.


Thanks,
Erik




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cache pool tiering SSD journal

2015-01-18 Thread Lindsay Mathieson
On Sun, 18 Jan 2015 10:17:50 AM lidc...@redhat.com wrote:
 No, if you use cache tiering, there is no need to use an SSD journal as well.


Really? Are writes as fast as with SSD journals?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] two mount points, two diffrent data

2015-01-18 Thread RafaƂ Michalak

 Because you are not using a cluster-aware filesystem, the respective mounts
 don't know when changes are made to the underlying block device (RBD) by the
 other mount. What you are doing *will* lead to file corruption.

 You need to use a distributed filesystem such as GFS2 or CephFS.

 CephFS would probably be the easiest to set up.


Thanks for the help!
I use CephFS and it's working great!
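
In case it helps anyone else, both clients simply use the kernel CephFS client
pointed at the monitors, something like this (the monitor address and key path
here are just from my setup):

    mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret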
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH Expansion

2015-01-18 Thread Georgios Dimitrakakis

Hi Jiri,

thanks for the feedback.

My main concern is whether it's better to add the OSDs one by one, waiting for
the cluster to rebalance each time, or to add them all at once.


Furthermore, an estimate of the time it will take to rebalance would be great!
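
(I realise the time will depend on the hardware and on the recovery throttles;
for now I'm just planning to watch the recovery percentage with

    ceph -s
    ceph -w

while the new OSDs are being added.)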

Regards,


George


Hi George,

 List disks available:
 # $ ceph-deploy disk list {node-name [node-name]...}

 Add OSD using osd create:
 # $ ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]

 Or you can use the manual steps to prepare and activate the disks, described
at
http://ceph.com/docs/master/start/quick-ceph-deploy/#expanding-your-cluster

 Jiri

On 15/01/2015 06:36, Georgios Dimitrakakis wrote:


Hi all!

I would like to expand our CEPH Cluster and add a second OSD node.

In this node I will have ten 4TB disks dedicated to CEPH.

What is the proper way of adding them to the already running CEPH
cluster?

I guess that the first thing to do is to prepare them with
ceph-deploy and mark them as out at preparation.

I should then restart the services and add (mark as in) one of them.
Afterwards, I have to wait for the rebalance to finish, and then add the
second one, and so on. Is this safe enough?

How long do you expect the rebalancing procedure to take?

I already have ten more 4TB disks at another node and the amount of
data is around 40GB with 2x replication factor.
The connection is over Gigabit.

Best,

George






--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com