Re: cephfs survey results

2014-11-04 Thread Sage Weil
On Tue, 4 Nov 2014, Blair Bethwaite wrote:
 On 4 November 2014 01:50, Sage Weil s...@newdream.net wrote:
  In the Ceph session at the OpenStack summit someone asked what the CephFS
  survey results looked like.
 
 Thanks Sage, that was me!
 
   Here's the link:
 
  https://www.surveymonkey.com/results/SM-L5JV7WXL/
 
  In short, people want
 
  fsck
  multimds
  snapshots
  quotas
 
 TBH I'm a bit surprised by a couple of these and hope maybe you guys
 will apply a certain amount of filtering on this...
 
 fsck and quotas were there for me, but multimds and snapshots are what
 I'd consider icing features - they're nice to have but not on the
 critical path to using cephfs instead of e.g. nfs in a production
 setting. I'd have thought stuff like small file performance and
 gateway support was much more relevant to uptake and
 positive/pain-free UX. Interested to hear others rationale here.

Yeah, I agree, and am taking the results with a grain of salt.  I 
think the results are heavily influenced by the order they were 
originally listed (I wish surveymonkey would randomize it for each 
person or something).

fsck is a clear #1.  Everybody wants multimds, but I think very few 
actually need it at this point.  We'll be merging a soft quota patch 
shortly, and things like performance (adding the inline data support to 
the kernel client, for instance) will probably compete with getting 
snapshots working (as part of a larger subvolume infrastructure).  That's 
my guess at least; for now, we're really focused on fsck and hard 
usability edges and haven't set priorities beyond that.

We're definitely interested in hearing feedback on this strategy, and on 
people's experiences with giant so far...

sage
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] libceph: message signature support

2014-11-04 Thread Yan, Zheng
Signed-off-by: Yan, Zheng z...@redhat.com
---
 fs/ceph/mds_client.c   | 16 +++
 include/linux/ceph/auth.h  | 26 +
 include/linux/ceph/ceph_features.h |  1 +
 include/linux/ceph/messenger.h |  9 +-
 include/linux/ceph/msgr.h  |  8 ++
 net/ceph/auth_x.c  | 58 ++
 net/ceph/messenger.c   | 32 +++--
 net/ceph/osd_client.c  | 16 +++
 8 files changed, 162 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 2eab332..14ca763 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3668,6 +3668,20 @@ static struct ceph_msg *mds_alloc_msg(struct 
ceph_connection *con,
return msg;
 }
 
+static int sign_message(struct ceph_connection *con, struct ceph_msg *msg)
+{
+   struct ceph_mds_session *s = con->private;
+   struct ceph_auth_handshake *auth = &s->s_auth;
+   return ceph_auth_sign_message(auth, msg);
+}
+
+static int check_message_signature(struct ceph_connection *con, struct 
ceph_msg *msg)
+{
+   struct ceph_mds_session *s = con->private;
+   struct ceph_auth_handshake *auth = &s->s_auth;
+   return ceph_auth_check_message_signature(auth, msg);
+}
+
 static const struct ceph_connection_operations mds_con_ops = {
.get = con_get,
.put = con_put,
@@ -3677,6 +3691,8 @@ static const struct ceph_connection_operations 
mds_con_ops = {
.invalidate_authorizer = invalidate_authorizer,
.peer_reset = peer_reset,
.alloc_msg = mds_alloc_msg,
+   .sign_message = sign_message,
+   .check_message_signature = check_message_signature,
 };
 
 /* eof */
diff --git a/include/linux/ceph/auth.h b/include/linux/ceph/auth.h
index 5f33868..260d78b 100644
--- a/include/linux/ceph/auth.h
+++ b/include/linux/ceph/auth.h
@@ -13,6 +13,7 @@
 
 struct ceph_auth_client;
 struct ceph_authorizer;
+struct ceph_msg;
 
 struct ceph_auth_handshake {
struct ceph_authorizer *authorizer;
@@ -20,6 +21,10 @@ struct ceph_auth_handshake {
size_t authorizer_buf_len;
void *authorizer_reply_buf;
size_t authorizer_reply_buf_len;
+   int (*sign_message)(struct ceph_auth_handshake *auth,
+   struct ceph_msg *msg);
+   int (*check_message_signature)(struct ceph_auth_handshake *auth,
+  struct ceph_msg *msg);
 };
 
 struct ceph_auth_client_ops {
@@ -66,6 +71,11 @@ struct ceph_auth_client_ops {
void (*reset)(struct ceph_auth_client *ac);
 
void (*destroy)(struct ceph_auth_client *ac);
+
+   int (*sign_message)(struct ceph_auth_handshake *auth,
+   struct ceph_msg *msg);
+   int (*check_message_signature)(struct ceph_auth_handshake *auth,
+  struct ceph_msg *msg);
 };
 
 struct ceph_auth_client {
@@ -113,4 +123,20 @@ extern int ceph_auth_verify_authorizer_reply(struct 
ceph_auth_client *ac,
 extern void ceph_auth_invalidate_authorizer(struct ceph_auth_client *ac,
int peer_type);
 
+static inline int ceph_auth_sign_message(struct ceph_auth_handshake *auth,
+struct ceph_msg *msg)
+{
+   if (auth->sign_message)
+   return auth->sign_message(auth, msg);
+   return 0;
+}
+
+static inline
+int ceph_auth_check_message_signature(struct ceph_auth_handshake *auth,
+ struct ceph_msg *msg)
+{
+   if (auth->check_message_signature)
+   return auth->check_message_signature(auth, msg);
+   return 0;
+}
 #endif
diff --git a/include/linux/ceph/ceph_features.h 
b/include/linux/ceph/ceph_features.h
index d12659c..71e05bb 100644
--- a/include/linux/ceph/ceph_features.h
+++ b/include/linux/ceph/ceph_features.h
@@ -84,6 +84,7 @@ static inline u64 ceph_sanitize_features(u64 features)
 CEPH_FEATURE_PGPOOL3 | \
 CEPH_FEATURE_OSDENC |  \
 CEPH_FEATURE_CRUSH_TUNABLES |  \
+CEPH_FEATURE_MSG_AUTH |\
 CEPH_FEATURE_CRUSH_TUNABLES2 | \
 CEPH_FEATURE_REPLY_CREATE_INODE |  \
 CEPH_FEATURE_OSDHASHPSPOOL |   \
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index 40ae58e..d9d396c 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -42,6 +42,10 @@ struct ceph_connection_operations {
struct ceph_msg * (*alloc_msg) (struct ceph_connection *con,
struct ceph_msg_header *hdr,
int *skip);
+   int (*sign_message) (struct ceph_connection *con, struct ceph_msg *msg);
+
+   int (*check_message_signature) (struct ceph_connection *con,
+   struct ceph_msg *msg);
 };
 
 /* use 
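[Editorial note: the net/ceph/messenger.c hunk of this patch is truncated above. As
context, here is a minimal sketch of how the messenger core would presumably invoke
the new connection hooks around send/receive; the call sites, helper names, and error
handling below are assumptions for illustration, not the actual patch.]

static int maybe_sign_outgoing(struct ceph_connection *con,
			       struct ceph_msg *msg)
{
	/* Sign just before the message goes on the wire, if the owner
	 * of the connection (MDS/OSD client) provided a sign hook. */
	if (con->ops->sign_message)
		return con->ops->sign_message(con, msg);
	return 0;
}

static int maybe_check_incoming(struct ceph_connection *con,
				struct ceph_msg *msg)
{
	/* Verify the signature once the full message and its footer have
	 * been read; a non-zero return would cause the message to be
	 * dropped or the connection faulted. */
	if (con->ops->check_message_signature)
		return con->ops->check_message_signature(con, msg);
	return 0;
}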

[PATCH 1/2] libceph: store session key in cephx authorizer

2014-11-04 Thread Yan, Zheng
The session key is required when calculating the message signature. Save the
session key in the authorizer; this avoids looking up the ticket handler for
each message.

Signed-off-by: Yan, Zheng z...@redhat.com
---
 net/ceph/auth_x.c | 18 +++---
 net/ceph/auth_x.h |  1 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
index de6662b..8da8568 100644
--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -298,6 +298,11 @@ static int ceph_x_build_authorizer(struct ceph_auth_client 
*ac,
	dout("build_authorizer for %s %p\n",
 ceph_entity_type_name(th->service), au);
 
+   ceph_crypto_key_destroy(&au->session_key);
+   ret = ceph_crypto_key_clone(&au->session_key, &th->session_key);
+   if (ret)
+   return ret;
+
maxlen = sizeof(*msg_a) + sizeof(msg_b) +
ceph_x_encrypt_buflen(ticket_blob_len);
	dout("  need len %d\n", maxlen);
@@ -307,8 +312,10 @@ static int ceph_x_build_authorizer(struct ceph_auth_client 
*ac,
}
	if (!au->buf) {
		au->buf = ceph_buffer_new(maxlen, GFP_NOFS);
-   if (!au->buf)
+   if (!au->buf) {
+   ceph_crypto_key_destroy(&au->session_key);
return -ENOMEM;
+   }
}
	au->service = th->service;
	au->secret_id = th->secret_id;
@@ -334,7 +341,7 @@ static int ceph_x_build_authorizer(struct ceph_auth_client 
*ac,
	get_random_bytes(&au->nonce, sizeof(au->nonce));
	msg_b.struct_v = 1;
	msg_b.nonce = cpu_to_le64(au->nonce);
-   ret = ceph_x_encrypt(&th->session_key, &msg_b, sizeof(msg_b),
+   ret = ceph_x_encrypt(&au->session_key, &msg_b, sizeof(msg_b),
 p, end - p);
	if (ret < 0)
goto out_buf;
@@ -593,17 +600,13 @@ static int ceph_x_verify_authorizer_reply(struct 
ceph_auth_client *ac,
  struct ceph_authorizer *a, size_t len)
 {
struct ceph_x_authorizer *au = (void *)a;
-   struct ceph_x_ticket_handler *th;
int ret = 0;
struct ceph_x_authorize_reply reply;
void *preply = reply;
	void *p = au->reply_buf;
	void *end = p + sizeof(au->reply_buf);
 
-   th = get_ticket_handler(ac, au->service);
-   if (IS_ERR(th))
-   return PTR_ERR(th);
-   ret = ceph_x_decrypt(&th->session_key, &p, end, &preply, sizeof(reply));
+   ret = ceph_x_decrypt(&au->session_key, &p, end, &preply, sizeof(reply));
	if (ret < 0)
return ret;
if (ret != sizeof(reply))
@@ -623,6 +626,7 @@ static void ceph_x_destroy_authorizer(struct 
ceph_auth_client *ac,
 {
struct ceph_x_authorizer *au = (void *)a;
 
+   ceph_crypto_key_destroy(&au->session_key);
	ceph_buffer_put(au->buf);
kfree(au);
 }
diff --git a/net/ceph/auth_x.h b/net/ceph/auth_x.h
index 65ee720..e8b7c69 100644
--- a/net/ceph/auth_x.h
+++ b/net/ceph/auth_x.h
@@ -26,6 +26,7 @@ struct ceph_x_ticket_handler {
 
 
 struct ceph_x_authorizer {
+   struct ceph_crypto_key session_key;
struct ceph_buffer *buf;
unsigned int service;
u64 nonce;
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: The Async messenger benchmark with latest master

2014-11-04 Thread Haomai Wang
OK, I will solve it when I'm free. It's a problem when one connection wants
to replace an existing connection to the same endpoint.

Thank you!


On Tue, Nov 4, 2014 at 10:41 AM, Alexandre DERUMIER aderum...@odiso.com wrote:
could you run ceph -w --debug-ms=10/10 ?

 Here's the output.  (monitor IPs are 10.11.1.27, 10.11.1.28, 10.11.1.29)

 # ceph -w --debug-ms=10/10
 2014-11-04 10:38:16.155461 7fdf4414b700 10 EpollDriver.add_event add event to 
 fd=4 mask=1
 2014-11-04 10:38:16.155474 7fdf4414b700 10 Event create_file_event create 
 event fd=4 mask=1 now mask is 1
 2014-11-04 10:38:16.155626 7fdf4414b700 10 EpollDriver.add_event add event to 
 fd=7 mask=1
 2014-11-04 10:38:16.155630 7fdf4414b700 10 Event create_file_event create 
 event fd=7 mask=1 now mask is 1
 2014-11-04 10:38:16.155774 7fdf4414b700 10 -- :/0 ready :/0
 2014-11-04 10:38:16.155783 7fdf4414b700  1  Processor -- start start
 2014-11-04 10:38:16.155785 7fdf4414b700  1 -- :/0 start start
 2014-11-04 10:38:16.155841 7fdf419e2700 10 --entry starting
 2014-11-04 10:38:16.155834 7fdf411e1700 10 --entry starting
 2014-11-04 10:38:16.155883 7fdf419e2700 10 Event process_events wait second 
 30 usec 0
 2014-11-04 10:38:16.155899 7fdf411e1700 10 Event process_events wait second 
 30 usec 0
 2014-11-04 10:38:16.156711 7fdf4414b700 10 -- :/1009064 create_connect 
 10.11.1.29:6789/0, creating connection and registering
 2014-11-04 10:38:16.156747 7fdf4414b700 10 -- :/1009064  10.11.1.29:6789/0 
 conn(0x1f56090 sd=-1 :0 s=STATE_NONE pgs=0 cs=0 l=1)._connect 0
 2014-11-04 10:38:16.156761 7fdf4414b700  1 Event wakeup
 2014-11-04 10:38:16.156774 7fdf4414b700 10 -- :/1009064 get_connection mon.2 
 10.11.1.29:6789/0 new 0x1f56090
 2014-11-04 10:38:16.156793 7fdf4414b700  1 Event wakeup
 2014-11-04 10:38:16.156812 7fdf4414b700 10 -- :/1009064  10.11.1.29:6789/0 
 conn(0x1f56090 sd=-1 :0 s=STATE_CONNECTING pgs=0 cs=0 l=1).send_message
 2014-11-04 10:38:16.157084 7fdf419e2700 10 EpollDriver.add_event add event to 
 fd=9 mask=1
 2014-11-04 10:38:16.157106 7fdf419e2700 10 Event create_file_event create 
 event fd=9 mask=1 now mask is 1
 2014-11-04 10:38:16.157136 7fdf419e2700 10 -- :/1009064  10.11.1.29:6789/0 
 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_BANNER pgs=0 cs=0 
 l=1).handle_write started.
 2014-11-04 10:38:16.157199 7fdf419e2700 10 Event process_events wait second 
 30 usec 0
 2014-11-04 10:38:16.157206 7fdf4414b700  1 Event wakeup
 2014-11-04 10:38:16.157259 7fdf419e2700 10 -- :/1009064  10.11.1.29:6789/0 
 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_BANNER pgs=0 cs=0 
 l=1).handle_write started.
 2014-11-04 10:38:16.157284 7fdf419e2700 10 EpollDriver.add_event add event to 
 fd=9 mask=3
 2014-11-04 10:38:16.157286 7fdf419e2700 10 Event create_file_event create 
 event fd=9 mask=2 now mask is 3
 2014-11-04 10:38:16.157290 7fdf419e2700 10 Event process_events wait second 
 30 usec 0
 2014-11-04 10:38:16.157306 7fdf419e2700 10 -- :/1009064  10.11.1.29:6789/0 
 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_BANNER pgs=0 cs=0 
 l=1)._process_connection get banner, ready to send banner
 2014-11-04 10:38:16.157348 7fdf419e2700 10 -- :/1009064  10.11.1.29:6789/0 
 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_IDENTIFY_PEER pgs=0 cs=0 
 l=1)._process_connection connect write banner done: 10.11.1.29:6789/0
 2014-11-04 10:38:16.157376 7fdf419e2700  1 -- 10.11.1.27:0/1009064 
 learned_addr learned my addr 10.11.1.27:0/1009064
 2014-11-04 10:38:16.157394 7fdf419e2700 10 -- 10.11.1.27:0/1009064  
 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_SEND_CONNECT_MSG 
 pgs=0 cs=0 l=1)._process_connection connect sent my addr 10.11.1.27:0/1009064
 2014-11-04 10:38:16.157415 7fdf419e2700 10 -- 10.11.1.27:0/1009064  
 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_SEND_CONNECT_MSG 
 pgs=0 cs=0 l=1)._process_connection connect sending gseq=1 cseq=0 proto=15
 2014-11-04 10:38:16.157434 7fdf419e2700 10 -- 10.11.1.27:0/1009064  
 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 
 s=STATE_CONNECTING_WAIT_CONNECT_REPLY pgs=0 cs=0 l=1).handle_write started.
 2014-11-04 10:38:16.157442 7fdf419e2700 10 Event process_events wait second 
 30 usec 0
 2014-11-04 10:38:16.157446 7fdf419e2700 10 -- 10.11.1.27:0/1009064  
 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 
 s=STATE_CONNECTING_WAIT_CONNECT_REPLY pgs=0 cs=0 l=1).handle_write started.
 2014-11-04 10:38:16.157451 7fdf419e2700 10 Event process_events wait second 
 30 usec 0
 2014-11-04 10:38:16.157560 7fdf419e2700 10 -- 10.11.1.27:0/1009064  
 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 
 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 
 l=1).handle_connect_replygot CEPH_MSGR_TAG_SEQ, reading acked_seq and writing 
 in_seq
 2014-11-04 10:38:16.157580 7fdf419e2700  2 -- 10.11.1.27:0/1009064  
 10.11.1.29:6789/0 conn(0x1f56090 sd=9 :0 s=STATE_CONNECTING_WAIT_ACK_SEQ 
 pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 18446744073709551615 
 vs out_seq 0
 2014-11-04 10:38:16.157591 7fdf419e2700  2 -- 
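[Editorial note: one detail worth flagging in that trace is the newly_acked_seq value
of 18446744073709551615, which is simply (u64)-1. Reading it as an uninitialized or
underflowed sequence counter during the connection-replacement problem Haomai
mentions above is an assumption, but the arithmetic itself is easy to check:]

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* 18446744073709551615 == UINT64_MAX == (uint64_t)-1, i.e. what you
	 * get when 0 is decremented (or -1 assigned) in unsigned 64-bit
	 * arithmetic -- a typical symptom of an uninitialized counter. */
	uint64_t acked_seq = (uint64_t)-1;

	printf("%llu\n", (unsigned long long)acked_seq);
	return 0;
}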

[PATCH 2/7] ceph: remove unused `map_waiters` from osdc client

2014-11-04 Thread John Spray
This is initialized but never used.

Signed-off-by: John Spray john.sp...@redhat.com
---
 include/linux/ceph/osd_client.h | 1 -
 net/ceph/osd_client.c   | 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 03aeb27..7cb5cea 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -197,7 +197,6 @@ struct ceph_osd_client {
 
struct ceph_osdmap *osdmap;   /* current map */
struct rw_semaphoremap_sem;
-   struct completion  map_waiters;
u64last_requested_map;
 
struct mutex   request_mutex;
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 5a75395..75ab07c 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2537,7 +2537,6 @@ int ceph_osdc_init(struct ceph_osd_client *osdc, struct 
ceph_client *client)
	osdc->client = client;
	osdc->osdmap = NULL;
	init_rwsem(&osdc->map_sem);
-   init_completion(&osdc->map_waiters);
	osdc->last_requested_map = 0;
	mutex_init(&osdc->request_mutex);
	osdc->last_tid = 0;
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] ceph: update ceph_msg_header structure

2014-11-04 Thread John Spray
Two bytes of what was reserved space are now used by
userspace for the compat_version field.

Signed-off-by: John Spray john.sp...@redhat.com
---
 include/linux/ceph/msgr.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/ceph/msgr.h b/include/linux/ceph/msgr.h
index cac4b28..1c18872 100644
--- a/include/linux/ceph/msgr.h
+++ b/include/linux/ceph/msgr.h
@@ -152,7 +152,8 @@ struct ceph_msg_header {
 receiver: mask against ~PAGE_MASK */
 
struct ceph_entity_name src;
-   __le32 reserved;
+   __le16 compat_version;
+   __le16 reserved;
__le32 crc;   /* header crc32c */
 } __attribute__ ((packed));
 
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] ceph: include osd epoch barrier in debugfs

2014-11-04 Thread John Spray
This is useful in our automated testing, so that
we can verify that the barrier is propagating
correctly between servers and clients.

Signed-off-by: John Spray john.sp...@redhat.com
---
 fs/ceph/debugfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 5d5a4c8..60db629 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -174,6 +174,9 @@ static int mds_sessions_show(struct seq_file *s, void *ptr)
/* The -o name mount argument */
	seq_printf(s, "name \"%s\"\n", opt->name ? opt->name : "");
 
+   /* The latest OSD epoch barrier known to this client */
+   seq_printf(s, "osd_epoch_barrier \"%d\"\n", mdsc->cap_epoch_barrier);
+
/* The list of MDS session rank+state */
	for (mds = 0; mds < mdsc->max_sessions; mds++) {
struct ceph_mds_session *session =
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] ceph: update CAPRELEASE message format

2014-11-04 Thread John Spray
Version 2 includes the new osd epoch barrier
field.

This allows clients to inform servers that their
released caps may not be used until a particular
OSD map epoch.
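[Editorial note: as an aside, a rough sketch of the resulting v2 message body,
inferred from the diff below; the field names are illustrative only and not taken
from the wire-format documentation.]

/*
 * Illustrative layout of a v2 CAPRELEASE body, as implied by this patch:
 *
 *   struct ceph_mds_cap_release  head;          // includes the item count
 *   struct ceph_mds_cap_item     items[n];      // the released caps
 *   __le32                       epoch_barrier; // trailer appended here
 *
 * hdr.version is bumped to 2 while hdr.compat_version stays 1, so a
 * receiver that only understands v1 can presumably still decode the
 * message and ignore the trailing field.
 */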

Signed-off-by: John Spray john.sp...@redhat.com
---
 fs/ceph/mds_client.c | 13 +
 fs/ceph/mds_client.h |  8 ++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index dce7977..3f5bc23 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1508,12 +1508,25 @@ void ceph_send_cap_releases(struct ceph_mds_client 
*mdsc,
struct ceph_mds_session *session)
 {
struct ceph_msg *msg;
+   u32 *cap_barrier;
 
	dout("send_cap_releases mds%d\n", session->s_mds);
	spin_lock(&session->s_cap_lock);
	while (!list_empty(&session->s_cap_releases_done)) {
	msg = list_first_entry(&session->s_cap_releases_done,
 struct ceph_msg, list_head);
+
+   BUG_ON(msg->front.iov_len + sizeof(*cap_barrier) > \
+  PAGE_CACHE_SIZE);
+
+   // Append cap_barrier field
+   cap_barrier = msg->front.iov_base + msg->front.iov_len;
+   *cap_barrier = cpu_to_le32(mdsc->cap_epoch_barrier);
+   msg->front.iov_len += sizeof(*cap_barrier);
+
+   msg->hdr.version = cpu_to_le16(2);
+   msg->hdr.compat_version = cpu_to_le16(1);
+
	list_del_init(&msg->list_head);
	spin_unlock(&session->s_cap_lock);
	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 7b40568..b9412a8 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -92,10 +92,14 @@ struct ceph_mds_reply_info_parsed {
 
 /*
  * cap releases are batched and sent to the MDS en masse.
+ *
+ * Account for per-message overhead of mds_cap_release header
+ * and u32 for osd epoch barrier trailing field.
  */
 #define CEPH_CAPS_PER_RELEASE ((PAGE_CACHE_SIZE -  \
-   sizeof(struct ceph_mds_cap_release)) /  \
-  sizeof(struct ceph_mds_cap_item))
+   sizeof(struct ceph_mds_cap_release) -   \
+   sizeof(u32)) /  \
+   sizeof(struct ceph_mds_cap_item))
 
 
 /*
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] ceph: add ceph_osdc_cancel_writes

2014-11-04 Thread John Spray
To allow us to abort writes in ENOSPC conditions, instead
of having them block indefinitely.

Signed-off-by: John Spray john.sp...@redhat.com
---
 include/linux/ceph/osd_client.h |  8 +
 net/ceph/osd_client.c   | 67 +
 2 files changed, 75 insertions(+)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 7cb5cea..f82000c 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -21,6 +21,7 @@ struct ceph_authorizer;
 /*
  * completion callback for async writepages
  */
+typedef void (*ceph_osdc_full_callback_t)(struct ceph_osd_client *, void *);
 typedef void (*ceph_osdc_callback_t)(struct ceph_osd_request *,
 struct ceph_msg *);
 typedef void (*ceph_osdc_unsafe_callback_t)(struct ceph_osd_request *, bool);
@@ -226,6 +227,9 @@ struct ceph_osd_client {
u64 event_count;
 
struct workqueue_struct *notify_wq;
+
+ceph_osdc_full_callback_t map_cb;
+void *map_p;
 };
 
 extern int ceph_osdc_setup(void);
@@ -331,6 +335,7 @@ extern void ceph_osdc_put_request(struct ceph_osd_request 
*req);
 extern int ceph_osdc_start_request(struct ceph_osd_client *osdc,
   struct ceph_osd_request *req,
   bool nofail);
+extern u32 ceph_osdc_cancel_writes(struct ceph_osd_client *osdc, int r);
 extern void ceph_osdc_cancel_request(struct ceph_osd_request *req);
 extern int ceph_osdc_wait_request(struct ceph_osd_client *osdc,
  struct ceph_osd_request *req);
@@ -361,5 +366,8 @@ extern int ceph_osdc_create_event(struct ceph_osd_client 
*osdc,
  void *data, struct ceph_osd_event **pevent);
 extern void ceph_osdc_cancel_event(struct ceph_osd_event *event);
 extern void ceph_osdc_put_event(struct ceph_osd_event *event);
+
+extern void ceph_osdc_register_map_cb(struct ceph_osd_client *osdc,
+ ceph_osdc_full_callback_t cb, void *data);
 #endif
 
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 75ab07c..eb7e735 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -836,6 +836,59 @@ __lookup_request_ge(struct ceph_osd_client *osdc,
return NULL;
 }
 
+/*
+ * Drop all pending write/modify requests and complete
+ * them with the `r` as return code.
+ *
+ * Returns the highest OSD map epoch of a request that was
+ * cancelled, or 0 if none were cancelled.
+ */
+u32 ceph_osdc_cancel_writes(
+struct ceph_osd_client *osdc,
+int r)
+{
+struct ceph_osd_request *req;
+struct rb_node *n = osdc->requests.rb_node;
+u32 latest_epoch = 0;
+
+   dout("enter cancel_writes r=%d", r);
+
+mutex_lock(&osdc->request_mutex);
+
+while (n) {
+req = rb_entry(n, struct ceph_osd_request, r_node);
+n = rb_next(n);
+
+ceph_osdc_get_request(req);
+if (req->r_flags & CEPH_OSD_FLAG_WRITE) {
+req->r_result = r;
+complete_all(&req->r_completion);
+complete_all(&req->r_safe_completion);
+
+if (req->r_callback) {
+// Requires callbacks used for write ops are 
+// amenable to being called with NULL msg
+// (e.g. writepages_finish)
+req->r_callback(req, NULL);
+}
+
+__unregister_request(osdc, req);
+
+if (*req->r_request_osdmap_epoch > latest_epoch) {
+latest_epoch = *req->r_request_osdmap_epoch;
+}
+}
+ceph_osdc_put_request(req);
+}
+
+mutex_unlock(&osdc->request_mutex);
+
+   dout("complete cancel_writes latest_epoch=%ul", latest_epoch);
+
+return latest_epoch;
+}
+EXPORT_SYMBOL(ceph_osdc_cancel_writes);
+
 static void __kick_linger_request(struct ceph_osd_request *req)
 {
	struct ceph_osd_client *osdc = req->r_osdc;
@@ -2102,6 +2155,10 @@ done:
	downgrade_write(&osdc->map_sem);
	ceph_monc_got_osdmap(&osdc->client->monc, osdc->osdmap->epoch);
 
+   if (osdc->map_cb) {
+   osdc->map_cb(osdc, osdc->map_p);
+   }
+
/*
 * subscribe to subsequent osdmap updates if full to ensure
 * we find out when we are no longer full and stop returning
@@ -2125,6 +2182,14 @@ bad:
	up_write(&osdc->map_sem);
 }
 
+void ceph_osdc_register_map_cb(struct ceph_osd_client *osdc,
+ceph_osdc_full_callback_t cb, void *data)
+{
+osdc->map_cb = cb;
+osdc->map_p = data;
+}
+EXPORT_SYMBOL(ceph_osdc_register_map_cb);
+
 /*
  * watch/notify callback event infrastructure
  *
@@ -2553,6 +2618,8 @@ int ceph_osdc_init(struct ceph_osd_client *osdc, struct 
ceph_client *client)
	spin_lock_init(&osdc->event_lock);
	osdc->event_tree = RB_ROOT;
	osdc->event_count = 0;
+   osdc->map_cb = NULL;
+   osdc->map_p = NULL;
 
	schedule_delayed_work(&osdc->osds_timeout_work,
 

[PATCH 4/7] ceph: handle full condition by cancelling ops

2014-11-04 Thread John Spray
While cancelling, we store the OSD epoch at the time
of cancellation; this will be used later in
CAPRELEASE messages.

Signed-off-by: John Spray john.sp...@redhat.com
---
 fs/ceph/mds_client.c | 21 +
 fs/ceph/mds_client.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 9f00853..dce7977 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3265,6 +3265,23 @@ static void delayed_work(struct work_struct *work)
schedule_delayed(mdsc);
 }
 
+/**
+ * Call this with map_sem held for read
+ */
+static void handle_osd_map(struct ceph_osd_client *osdc, void *p)
+{
+   struct ceph_mds_client *mdsc = (struct ceph_mds_client*)p;
+   u32 cancelled_epoch = 0;
+
+   if (osdc->osdmap->flags & CEPH_OSDMAP_FULL) {
+   cancelled_epoch = ceph_osdc_cancel_writes(osdc, -ENOSPC);
+   if (cancelled_epoch) {
+   mdsc->cap_epoch_barrier = max(cancelled_epoch + 1,
+ mdsc->cap_epoch_barrier);
+   }
+   }
+}
+
 int ceph_mdsc_init(struct ceph_fs_client *fsc)
 
 {
@@ -3311,6 +3328,10 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 
ceph_caps_init(mdsc);
	ceph_adjust_min_caps(mdsc, fsc->min_caps);
+   mdsc->cap_epoch_barrier = 0;
+
+   ceph_osdc_register_map_cb(&fsc->client->osdc,
+ handle_osd_map, (void*)mdsc);
 
return 0;
 }
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 230bda7..7b40568 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -298,6 +298,7 @@ struct ceph_mds_client {
int   num_cap_flushing; /* # caps we are flushing */
spinlock_tcap_dirty_lock;   /* protects above items */
wait_queue_head_t cap_flushing_wq;
+   u32   cap_epoch_barrier;
 
/*
 * Cap reservations
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/7] ceph: add ceph_osdc_cancel_writes

2014-11-04 Thread Ilya Dryomov
On Tue, Nov 4, 2014 at 5:34 PM, John Spray john.sp...@redhat.com wrote:
 To allow us to abort writes in ENOSPC conditions, instead
 of having them block indefinitely.

I just saw the word "cancel", and as we've had trouble in this area in
libceph in the past, a couple of nitpicks.

First, in my mind at least, "cancel" is sort of reserved for the "get
rid of this request and don't call any completions or callbacks" kind
of thing - see ceph_osdc_cancel_request().  Here you do both of those,
the distinction?

Second, you should go through Documentation/CodingStyle in the kernel
tree.  Mostly indentation is what's wrong; also don't indent dout()s,
drop braces around single-statement ifs, and the two parameters of
ceph_osdc_cancel_writes() will fit on a single line.

Third, this patch should have a libceph: prefix.

Finally, this patch seems to also introduce the concept of an osdmap
callback.  Either it should be a separate libceph patch or you should
mention this somewhere in the description.
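[Editorial note: for readers following along, here is a rough sketch of how the
function might look after applying the style comments above (parameters on one line,
tab indentation, unindented dout()s, no braces around single statements, and the
suggested rename). It is an illustration of the review feedback under those
assumptions, not a revised version of the patch.]

u32 ceph_osdc_complete_writes(struct ceph_osd_client *osdc, int r)
{
	struct ceph_osd_request *req;
	struct rb_node *n = osdc->requests.rb_node;
	u32 latest_epoch = 0;

	dout("%s r=%d\n", __func__, r);

	mutex_lock(&osdc->request_mutex);
	while (n) {
		req = rb_entry(n, struct ceph_osd_request, r_node);
		n = rb_next(n);

		ceph_osdc_get_request(req);
		if (req->r_flags & CEPH_OSD_FLAG_WRITE) {
			req->r_result = r;
			complete_all(&req->r_completion);
			complete_all(&req->r_safe_completion);
			/* callbacks used for write ops must tolerate a
			 * NULL msg (e.g. writepages_finish) */
			if (req->r_callback)
				req->r_callback(req, NULL);
			__unregister_request(osdc, req);
			if (*req->r_request_osdmap_epoch > latest_epoch)
				latest_epoch = *req->r_request_osdmap_epoch;
		}
		ceph_osdc_put_request(req);
	}
	mutex_unlock(&osdc->request_mutex);

	dout("%s latest_epoch=%u\n", __func__, latest_epoch);
	return latest_epoch;
}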


 Signed-off-by: John Spray john.sp...@redhat.com
 ---
  include/linux/ceph/osd_client.h |  8 +
  net/ceph/osd_client.c   | 67 
 +
  2 files changed, 75 insertions(+)

 diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
 index 7cb5cea..f82000c 100644
 --- a/include/linux/ceph/osd_client.h
 +++ b/include/linux/ceph/osd_client.h
 @@ -21,6 +21,7 @@ struct ceph_authorizer;
  /*
   * completion callback for async writepages
   */
 +typedef void (*ceph_osdc_full_callback_t)(struct ceph_osd_client *, void *);
  typedef void (*ceph_osdc_callback_t)(struct ceph_osd_request *,
  struct ceph_msg *);
  typedef void (*ceph_osdc_unsafe_callback_t)(struct ceph_osd_request *, bool);
 @@ -226,6 +227,9 @@ struct ceph_osd_client {
 u64 event_count;

 struct workqueue_struct *notify_wq;
 +
 +ceph_osdc_full_callback_t map_cb;
 +void *map_p;
  };

  extern int ceph_osdc_setup(void);
 @@ -331,6 +335,7 @@ extern void ceph_osdc_put_request(struct ceph_osd_request 
 *req);
  extern int ceph_osdc_start_request(struct ceph_osd_client *osdc,
struct ceph_osd_request *req,
bool nofail);
 +extern u32 ceph_osdc_cancel_writes(struct ceph_osd_client *osdc, int r);
  extern void ceph_osdc_cancel_request(struct ceph_osd_request *req);
  extern int ceph_osdc_wait_request(struct ceph_osd_client *osdc,
   struct ceph_osd_request *req);
 @@ -361,5 +366,8 @@ extern int ceph_osdc_create_event(struct ceph_osd_client 
 *osdc,
   void *data, struct ceph_osd_event **pevent);
  extern void ceph_osdc_cancel_event(struct ceph_osd_event *event);
  extern void ceph_osdc_put_event(struct ceph_osd_event *event);
 +
 +extern void ceph_osdc_register_map_cb(struct ceph_osd_client *osdc,
 + ceph_osdc_full_callback_t cb, void *data);
  #endif

 diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
 index 75ab07c..eb7e735 100644
 --- a/net/ceph/osd_client.c
 +++ b/net/ceph/osd_client.c
 @@ -836,6 +836,59 @@ __lookup_request_ge(struct ceph_osd_client *osdc,
 return NULL;
  }

 +/*
 + * Drop all pending write/modify requests and complete
 + * them with the `r` as return code.
 + *
 + * Returns the highest OSD map epoch of a request that was
 + * cancelled, or 0 if none were cancelled.
 + */
 +u32 ceph_osdc_cancel_writes(
 +struct ceph_osd_client *osdc,
 +int r)
 +{
 +struct ceph_osd_request *req;
 +struct rb_node *n = osdc->requests.rb_node;
 +u32 latest_epoch = 0;
 +
 +   dout("enter cancel_writes r=%d", r);
 +
 +mutex_lock(&osdc->request_mutex);
 +
 +while (n) {
 +req = rb_entry(n, struct ceph_osd_request, r_node);
 +n = rb_next(n);
 +
 +ceph_osdc_get_request(req);
 +if (req->r_flags & CEPH_OSD_FLAG_WRITE) {

req->r_flags & CEPH_OSD_FLAG_WRITE ?

 +req->r_result = r;
 +complete_all(&req->r_completion);
 +complete_all(&req->r_safe_completion);
 +
 +if (req->r_callback) {
 +// Requires callbacks used for write ops are
 +// amenable to being called with NULL msg
 +// (e.g. writepages_finish)
 +req->r_callback(req, NULL);
 +}
 +
 +__unregister_request(osdc, req);
 +
 +if (*req->r_request_osdmap_epoch > latest_epoch) {
 +latest_epoch = *req->r_request_osdmap_epoch;
 +}
 +}
 +ceph_osdc_put_request(req);
 +}
 +
 +mutex_unlock(&osdc->request_mutex);
 +
 +   dout("complete cancel_writes latest_epoch=%ul", latest_epoch);
 +
 +return latest_epoch;
 +}
 +EXPORT_SYMBOL(ceph_osdc_cancel_writes);
 +
  static void 

Re: 11/4/2014 Weekly Ceph Performance Meeting IS ON!

2014-11-04 Thread Alexandre DERUMIER
Hi Mark,

Is it today? (11/4/2014, from your mail subject)

or tomorrow? (2014-11-05 in the etherpad)


- Mail original - 

De: Mark Nelson mark.nel...@inktank.com 
À: ceph-devel@vger.kernel.org 
Envoyé: Mardi 4 Novembre 2014 16:16:32 
Objet: 11/4/2014 Weekly Ceph Performance Meeting IS ON! 

Hi All, 

8AM PST as usual! We are going to try a little experiment and leave the 
Agenda blank for you guys to fill in. If you are planning on attending 
and have something you want to discuss, please add it! 

We've also added a projects and backlog section at the top of the 
etherpad for on-going efforts and the folks that are working on them. 
For any projects or names of people we've missed, please update! 

Here's the links: 

Etherpad URL: 
http://pad.ceph.com/p/performance_weekly 

To join the Meeting: 
https://bluejeans.com/268261044 

To join via Browser: 
https://bluejeans.com/268261044/browser 

To join with Lync: 
https://bluejeans.com/268261044/lync 


To join via Room System: 
Video Conferencing System: bjn.vc -or- 199.48.152.152 
Meeting ID: 268261044 

To join via Phone: 
1) Dial: 
+1 408 740 7256 
+1 888 240 2560(US Toll Free) 
+1 408 317 9253(Alternate Number) 
(see all numbers - http://bluejeans.com/numbers) 
2) Enter Conference ID: 268261044 

Mark 
-- 
To unsubscribe from this list: send the line unsubscribe ceph-devel in 
the body of a message to majord...@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 11/4/2014 Weekly Ceph Performance Meeting IS ON!

2014-11-04 Thread Mark Nelson

Hi Alex,

Doh!  Sorry everyone, Alex is right!  11/5/2014!

I will resend and fix.

Mark

On 11/04/2014 09:21 AM, Alexandre DERUMIER wrote:

Hi Mark,

Is it today? (11/4/2014, from your mail subject)

or tomorrow? (2014-11-05 in the etherpad)


- Mail original -

De: Mark Nelson mark.nel...@inktank.com
À: ceph-devel@vger.kernel.org
Envoyé: Mardi 4 Novembre 2014 16:16:32
Objet: 11/4/2014 Weekly Ceph Performance Meeting IS ON!

Hi All,

8AM PST as usual! We are going to try a little experiment and leave the
Agenda blank for you guys to fill in. If you are planning on attending
and have something you want to discuss, please add it!

We've also added a projects and backlog section at the top of the
etherpad for on-going efforts and the folks that are working on them.
For any projects or names of people we've missed, please update!

Here's the links:

Etherpad URL:
http://pad.ceph.com/p/performance_weekly

To join the Meeting:
https://bluejeans.com/268261044

To join via Browser:
https://bluejeans.com/268261044/browser

To join with Lync:
https://bluejeans.com/268261044/lync


To join via Room System:
Video Conferencing System: bjn.vc -or- 199.48.152.152
Meeting ID: 268261044

To join via Phone:
1) Dial:
+1 408 740 7256
+1 888 240 2560(US Toll Free)
+1 408 317 9253(Alternate Number)
(see all numbers - http://bluejeans.com/numbers)
2) Enter Conference ID: 268261044

Mark



--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


FIXED: 11/5/2014 Weekly Ceph Performance Meeting IS ON!

2014-11-04 Thread Mark Nelson

Hi All,

Got the date wrong the first time.  11/5/2014 is correct!  8AM PST as 
usual.  We are going to try a little experiment and leave the Agenda 
blank for you guys to fill in. If you are planning on attending and have 
something you want to discuss, please add it!


We've also added a projects and backlog section at the top of the 
etherpad for on-going efforts and the folks that are working on them. 
For any projects or names of people we've missed, please update!


Here's the links:

Etherpad URL:
http://pad.ceph.com/p/performance_weekly

To join the Meeting:
https://bluejeans.com/268261044

To join via Browser:
https://bluejeans.com/268261044/browser

To join with Lync:
https://bluejeans.com/268261044/lync


To join via Room System:
Video Conferencing System: bjn.vc -or- 199.48.152.152
Meeting ID: 268261044

To join via Phone:
1) Dial:
  +1 408 740 7256
  +1 888 240 2560(US Toll Free)
  +1 408 317 9253(Alternate Number)
  (see all numbers - http://bluejeans.com/numbers)
2) Enter Conference ID: 268261044

Mark
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] cephfs survey results

2014-11-04 Thread Mariusz Gronczewski
On Tue, 4 Nov 2014 10:36:07 +1100, Blair Bethwaite
blair.bethwa...@gmail.com wrote:

 
 TBH I'm a bit surprised by a couple of these and hope maybe you guys
 will apply a certain amount of filtering on this...
 
 fsck and quotas were there for me, but multimds and snapshots are what
 I'd consider icing features - they're nice to have but not on the
 critical path to using cephfs instead of e.g. nfs in a production
 setting. I'd have thought stuff like small file performance and
 gateway support was much more relevant to uptake and
 positive/pain-free UX. Interested to hear others rationale here.
 

Those are related; if small-file performance is good enough for one
MDS to handle a high load with a lot of small files (the typical
webserver case), having multiple active MDSes will be less of a priority.

And if someone currently has OSDs on a bunch of relatively weak nodes,
then again, an active-active MDS setup will be more interesting to
them than to someone who can just buy a new fast machine for it.


-- 
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew...@efigence.com
mailto:mariusz.gronczew...@efigence.com


signature.asc
Description: PGP signature


Re: [ceph-users] cephfs survey results

2014-11-04 Thread Scottix
Agreed, multi-MDS is a nice-to-have but not required for full production use.
TBH, stability and recovery will win over any IT person dealing with filesystems.

On Tue, Nov 4, 2014 at 7:33 AM, Mariusz Gronczewski
mariusz.gronczew...@efigence.com wrote:
 On Tue, 4 Nov 2014 10:36:07 +1100, Blair Bethwaite
 blair.bethwa...@gmail.com wrote:


 TBH I'm a bit surprised by a couple of these and hope maybe you guys
 will apply a certain amount of filtering on this...

 fsck and quotas were there for me, but multimds and snapshots are what
 I'd consider icing features - they're nice to have but not on the
 critical path to using cephfs instead of e.g. nfs in a production
 setting. I'd have thought stuff like small file performance and
 gateway support was much more relevant to uptake and
 positive/pain-free UX. Interested to hear others rationale here.


 Those are related; if small-file performance is good enough for one
 MDS to handle a high load with a lot of small files (the typical
 webserver case), having multiple active MDSes will be less of a priority.

 And if someone currently has OSDs on a bunch of relatively weak nodes,
 then again, an active-active MDS setup will be more interesting to
 them than to someone who can just buy a new fast machine for it.


 --
 Mariusz Gronczewski, Administrator

 Efigence S. A.
 ul. Wołoska 9a, 02-583 Warszawa
 T: [+48] 22 380 13 13
 F: [+48] 22 380 13 14
 E: mariusz.gronczew...@efigence.com
 mailto:mariusz.gronczew...@efigence.com

 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
Follow Me: @Scottix
http://about.me/scottix
scot...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cephfs survey results

2014-11-04 Thread Patrick Hahn
On Mon, Nov 3, 2014 at 6:36 PM, Blair Bethwaite
blair.bethwa...@gmail.com wrote:
 On 4 November 2014 01:50, Sage Weil s...@newdream.net wrote:
 In the Ceph session at the OpenStack summit someone asked what the CephFS
 survey results looked like.

 Thanks Sage, that was me!

  Here's the link:

 https://www.surveymonkey.com/results/SM-L5JV7WXL/

 In short, people want

 fsck
 multimds
 snapshots
 quotas

 TBH I'm a bit surprised by a couple of these and hope maybe you guys
 will apply a certain amount of filtering on this...

 fsck and quotas were there for me, but multimds and snapshots are what
 I'd consider icing features - they're nice to have but not on the
 critical path to using cephfs instead of e.g. nfs in a production
 setting. I'd have thought stuff like small file performance and
 gateway support was much more relevant to uptake and
 positive/pain-free UX. Interested to hear others rationale here.

For the use case we're looking at cephfs for at $dayjob, we really need
snapshots. I think anyone building a cheap-and-deep cluster for
archival storage would like to be more than one errant rm -rf away
from a *very* long weekend.

Thanks,
-- 
Patrick Hahn
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cephfs survey results

2014-11-04 Thread Mark Kirkwood

On 04/11/14 22:02, Sage Weil wrote:

On Tue, 4 Nov 2014, Blair Bethwaite wrote:

On 4 November 2014 01:50, Sage Weil s...@newdream.net wrote:

In the Ceph session at the OpenStack summit someone asked what the CephFS
survey results looked like.


Thanks Sage, that was me!


  Here's the link:

 https://www.surveymonkey.com/results/SM-L5JV7WXL/

In short, people want

fsck
multimds
snapshots
quotas


TBH I'm a bit surprised by a couple of these and hope maybe you guys
will apply a certain amount of filtering on this...

fsck and quotas were there for me, but multimds and snapshots are what
I'd consider icing features - they're nice to have but not on the
critical path to using cephfs instead of e.g. nfs in a production
setting. I'd have thought stuff like small file performance and
gateway support was much more relevant to uptake and
positive/pain-free UX. Interested to hear others rationale here.


Yeah, I agree, and am taking the results with a grain of salt.  I
think the results are heavily influenced by the order they were
originally listed (I wish surveymonkey would randomize it for each
person or something).

fsck is a clear #1.  Everybody wants multimds, but I think very few
actually need it at this point.  We'll be merging a soft quota patch
shortly, and things like performance (adding the inline data support to
the kernel client, for instance) will probably compete with getting
snapshots working (as part of a larger subvolume infrastructure).  That's
my guess at least; for now, we're really focused on fsck and hard
usability edges and haven't set priorities beyond that.

We're definitely interested in hearing feedback on this strategy, and on
people's experiences with giant so far...



Heh, not necessarily - I put multi MDS in there, as we want the cephfs 
part to be similar to the rest of ceph in its availability.


Maybe it's because we are looking at plugging it in with an OpenStack 
setup, and for that you want everything to 'just look after itself'. If, 
on the other hand, we merely wanted an nfs replacement, then sure, 
multi MDS would not be so important there.


regards

Mark

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Consul

2014-11-04 Thread Loic Dachary
Hi Ceph,

While at the OpenStack summit, Dan Bode spoke highly of Consul ( 
https://consul.io/intro/index.html ). Its scope is new to me. Each individual 
feature is familiar, but I'm not entirely sure that combining them into a single 
piece of software is necessary. And I wonder how it could relate to Ceph. It is 
entirely possible that it does not even make sense to ask these questions ;-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


ceph.devel

2014-11-04 Thread jianpeng
subject all ceph-dev maillist

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Consul

2014-11-04 Thread Hunter Nield
I concur with Dan on Consul. It's a great tool.

We use Consul in our Ceph environments, but only as a layer above an
existing Ceph installation. Health checks (for the mon/osd
processes and ceph health) and service discovery (for the
apps/services that run in Docker containers on top). We've started on
an alerting tool if anyone has use for it -
https://github.com/AcalephStorage/consul-alerts

There is definitely some overlap on the cluster consensus side (Paxos
vs Raft), and it would be nice to remove another moving part from our
cluster, but I would imagine the projects are too different internally
to really combine the two of them.

The one thing we'd wished for in Ceph before Consul existed was
an easily accessible distributed KV store. Ceph has parts of this, but
exposing something like that with an easy CLI/REST API might provide
the primitives for building the higher-level functionality that Consul
provides. More than likely a distraction, though, since Consul does such
a good job now.
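[Editorial note: for anyone curious what "an easy CLI/REST API" for such a KV store
looks like in practice, here is a minimal sketch against Consul's HTTP KV endpoint,
assuming a local agent on the default port 8500 and libcurl; the key name is made up,
and this is illustrative only, not a suggestion for how Ceph should expose one.]

/* Minimal sketch: PUT then GET a key via Consul's KV HTTP API.
 * Build with: cc consul_kv.c -lcurl */
#include <curl/curl.h>
#include <stdio.h>

static void kv_request(const char *method, const char *url, const char *body)
{
	CURL *curl = curl_easy_init();

	if (!curl)
		return;
	curl_easy_setopt(curl, CURLOPT_URL, url);
	curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, method);
	if (body)
		curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
	if (curl_easy_perform(curl) != CURLE_OK)
		fprintf(stderr, "%s %s failed\n", method, url);
	curl_easy_cleanup(curl);
}

int main(void)
{
	curl_global_init(CURL_GLOBAL_DEFAULT);
	/* Store a value under an example key... */
	kv_request("PUT", "http://127.0.0.1:8500/v1/kv/ceph/cluster_name",
		   "mycluster");
	/* ...and read it back; ?raw returns the bare value instead of JSON. */
	kv_request("GET", "http://127.0.0.1:8500/v1/kv/ceph/cluster_name?raw",
		   NULL);
	curl_global_cleanup();
	return 0;
}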

On a side note, I haven't spoken to Dan in a while, but I'm curious about
his thoughts on how Consul overlaps with config management land. Service
discovery, remote execution, etc. have some overlap with Puppet, Chef,
etc. Related to Ceph, we're pondering it as an alternative for deploying
mons/osds (a larger-scale ceph-deploy, perhaps).

On Wed, Nov 5, 2014 at 8:38 AM, Loic Dachary l...@dachary.org wrote:
 Hi Ceph,

 While at the OpenStack summit Dan Bode spoke highly of Consul ( 
 https://consul.io/intro/index.html ). Its scope is new to me. Each individual 
 feature is familiar but I'm not entirely sure if combining them into a single 
 software is necessary. And I wonder how it could relate to Ceph. It is 
 entirely possible that it does not even make sense to ask theses questions ;-)

 Cheers

 --
 Loïc Dachary, Artisan Logiciel Libre

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question about Transaction::get_data_alignment

2014-11-04 Thread Dong Yuan
Hi Sage,

I am now working on the blueprint "osd: update Transaction encoding", but
Transaction::get_data_alignment confuses me.

This method gives the alignment which is used by FileJournal to build a
better buffer. It calculates the alignment from largest_data_off
and get_data_offset(), where the first is an offset into some object and
the second is an offset into the transaction encode result. I am not
sure there is any reason to do a calculation between them.

The code works fine, probably because any result is acceptable for
Transaction::get_data_alignment, since
FileJournal::prepare_single_write can use any alignment value to build
the logical bufferlist.

Can you give me some explanation?  Thank you.
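[Editorial note: for readers unfamiliar with the mechanism being questioned, the
general idea of such an alignment value is to pad the journal entry so that the data
payload lands at the same offset modulo the page size as its destination offset in
the object, which allows page-aligned (and potentially zero-copy) writes later. A
generic sketch of that kind of calculation follows, purely as an illustration under
that assumption; it is not the actual Ceph code.]

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

/*
 * Illustration only: how many padding bytes to insert after `header_len`
 * bytes so that the payload starts at the same offset-within-a-page as
 * `target_off`, the offset the data will eventually have in the object.
 */
static uint32_t data_align_padding(uint64_t target_off, uint64_t header_len)
{
	uint32_t want = target_off & (PAGE_SIZE - 1); /* desired in-page offset */
	uint32_t have = header_len & (PAGE_SIZE - 1); /* current in-page offset */

	return (want - have) & (PAGE_SIZE - 1);       /* padding bytes needed */
}

int main(void)
{
	/* e.g. a write at object offset 8704 after a 100-byte journal header */
	printf("pad=%u\n", data_align_padding(8704, 100)); /* prints pad=412 */
	return 0;
}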


-- 
Dong Yuan
UnitedStack
Room 302, Block C, Building. 4, Zongguancun Software Park, 100193
Haidian Dist. Beijing. China
yuand...@unitedstack.com


--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html