Re: [Gluster-devel] [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-11-30 Thread Raghavendra Gowdappa
adding gluster-users and gluster-devel as the discussion has some generic
points

+Gluster-users  +Gluster Devel


On Mon, Mar 4, 2019 at 11:43 PM Raghavendra Gowdappa 
wrote:

>
>
> On Mon, Mar 4, 2019 at 11:26 PM Yaniv Kaul  wrote:
>
>> Is it that busy that it cannot reply for so many seconds to a simple
>> ping? That starved of CPU resources / threads, or are locks taken?
>>
>
While I had answered with some specific examples, my previous answer
missed some general observations.
The ping timer is designed with the following goals:

1. identifying ungraceful shutdown of bricks (hard power-off, pulling of a
network cable, etc.)
2. identifying slow bricks and avoiding traffic to them (if the connection is
shut down, clients will not use it till reconnection happens)

When we consider point 2, the ping packet is not a special packet. Instead it
experiences the same delays as any other regular fs traffic. So, the question
asked above doesn't apply (as ping is not treated "specially" wrt fs
traffic; if the brick is overloaded processing fs traffic, it is too
overloaded to respond to a "simple ping" as well).

While the implementation/design neatly caters to the first goal, the second
goal is not completely met and is only partly implemented/designed. If goal
2 is to be completely fulfilled, I can think of at least 2 things that need
to be done:

1. the reconnect logic in the client should make an intelligent decision on
whether to reconnect to an overloaded brick, and I guess identifying whether
the brick is no longer overloaded is not a trivial task
2. As a "fix" for ping-timer expiries, we introduced a "request queue" to the
glusterfs program. Event threads, which read requests from the network, only
queue them to this "request queue", and processing is done in
different threads. This (at least in theory) lets event threads read
and respond to ping packets faster. However, this means ping responses are no
longer indicative of "load" on the brick. With hindsight, to me this fix is
more symptomatic (users no longer see ENOTCONN errors - which pushes the
responsibility out of the module handling "ping" :), but they might experience
slow performance) than a fix of the underlying root cause of what made
bricks unresponsive in the first place. For example, Manoj suggested a
suboptimal number of event-threads might be one reason, and in fact that was
a correct observation (at least for some setups).


> It looks to be the case. I remember following scenarios from various
> users/customers:
>
> * One customer implemented the suggestion of increasing the number of
> event-threads (that read msgs from network) and reported back it fixed
> ping-timer-expiry issues
> * Once Pranith and Krutika identified that an xattrop (basically a
> getxattr and setxattr done atomically holding a global inode lock) took
> more than 42 seconds (default ping timeout value) - identified through
> strace output - ping timer expiries were co-related at around the same
> time. In this case, event-threads were contending on the same lock and
> hence were blocked from reading further messages from network (that
> included ping requests). This issue (of global critical sections) is now
> fixed.
> * Preventing event-threads from executing any code from the glusterfs brick
> stack seems to have solved the issue. This code is present in master and is
> targeted for 3.4.4
> * I've heard reports saying when rebalance/quota heal is in progress ping
> timer expiries are more likely to occur.
>
>> Y.
>>
>> On Mon, Mar 4, 2019, 7:47 PM Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>> +Gluster Devel , +Gluster-users
>>> 
>>>
>>> I would like to point out another issue. Even if what I suggested
>>> prevents disconnects, part of the solution would be only symptomatic
>>> treatment and doesn't address the root cause of the problem. In most of the
>>> ping-timer-expiry issues, the root cause is the increased load on bricks
>>> and the inability of bricks to be responsive under high load. So, the
>>> actual solution would be doing any or both of the following:
>>> * identify the source of increased load and if possible throttle it.
>>> Internal heal processes like self-heal, rebalance, quota heal are known to
>>> pump traffic into bricks without much throttling (io-threads _might_ do
>>> some throttling, but my understanding is it's not sufficient).
>>> * identify the reason for bricks to become unresponsive during load.
>>> These may be fixable issues like not enough event-threads to read from the
>>> network, difficult-to-fix issues like fsync on the backend fs freezing the
>>> process, or semi-fixable issues (in code) like lock contention.
>>>
>>> So any genuine effort to fix ping-time

Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

2019-05-08 Thread Raghavendra Gowdappa
Thanks!!

On Thu, May 9, 2019 at 8:34 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi,
>
> Ok, It is posted to https://review.gluster.org/#/c/glusterfs/+/22687/
>
>
>
>
>
>
>
> *From:* Raghavendra Gowdappa 
> *Sent:* Wednesday, May 08, 2019 7:35 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* Amar Tumballi Suryanarayan ;
> gluster-devel@gluster.org
> *Subject:* Re: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
>
>
>
>
> On Wed, May 8, 2019 at 1:29 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Hi 'Milind Changire' ,
>
> The leak is getting clearer to me now. The unsolved memory
> leak occurs because, in glusterfs version 3.12.15 (in my env), the SSL
> context is a shared one. When we do SSL_accept, SSL allocates read/write
> buffers for the SSL object; however, on SSL_free in socket_reset or the
> fini function of socket.c, the buffers are returned to the SSL context's
> free list instead of being completely freed.
>
>
>
> Thanks Cynthia for your efforts in identifying and fixing the leak. If you
> post a patch to gerrit, I'll be happy to merge it and get the fix into the
> codebase.
>
>
>
>
>
> So the following patch is able to fix the memory leak issue
> completely (created for the gluster master branch):
>
>
>
> --- a/rpc/rpc-transport/socket/src/socket.c
> +++ b/rpc/rpc-transport/socket/src/socket.c
> @@ -446,6 +446,7 @@ ssl_setup_connection_postfix(rpc_transport_t *this)
>  gf_log(this->name, GF_LOG_DEBUG,
> "SSL verification succeeded (client: %s) (server: %s)",
> this->peerinfo.identifier, this->myinfo.identifier);
> +X509_free(peer);
>  return gf_strdup(peer_CN);
>
>  /* Error paths. */
> @@ -1157,7 +1158,21 @@ __socket_reset(rpc_transport_t *this)
>  memset(&priv->incoming, 0, sizeof(priv->incoming));
>
>  event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
> -
> +    if (priv->use_ssl && priv->ssl_ssl)
> +    {
> +        gf_log(this->name, GF_LOG_TRACE,
> +               "clear and reset for socket(%d), free ssl ",
> +               priv->sock);
> +        if (priv->ssl_ctx)
> +        {
> +            SSL_CTX_free(priv->ssl_ctx);
> +            priv->ssl_ctx = NULL;
> +        }
> +        SSL_shutdown(priv->ssl_ssl);
> +        SSL_clear(priv->ssl_ssl);
> +        SSL_free(priv->ssl_ssl);
> +        priv->ssl_ssl = NULL;
> +    }
>  priv->sock = -1;
>  priv->idx = -1;
>  priv->connected = -1;
> @@ -4675,6 +4690,21 @@ fini(rpc_transport_t *this)
>  pthread_mutex_destroy(&priv->out_lock);
>  pthread_mutex_destroy(&priv->cond_lock);
>  pthread_cond_destroy(&priv->cond);
> +    if (priv->use_ssl && priv->ssl_ssl)
> +    {
> +        gf_log(this->name, GF_LOG_TRACE,
> +               "clear and reset for socket(%d), free ssl ",
> +               priv->sock);
> +        if (priv->ssl_ctx)
> +        {
> +            SSL_CTX_free(priv->ssl_ctx);
> +            priv->ssl_ctx = NULL;
> +        }
> +        SSL_shutdown(priv->ssl_ssl);
> +        SSL_clear(priv->ssl_ssl);
> +        SSL_free(priv->ssl_ssl);
>
> *From:* Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Monday, May 06, 2019 2:12 PM
> *To:* 'Amar Tumballi Suryanarayan' 
> *Cc:* 'Milind Changire' ; 'gluster-devel@gluster.org'
> 
> *Subject:* RE: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> Hi,
>
> From our test valgrind and libleak all blame ssl3_accept
>
> ///from valgrind attached to
> glusterfds///
>
> ==16673== 198,720 bytes in 12 blocks are definitely lost in loss record
> 1,114 of 1,123
> ==16673==at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
> ==16673==by 0x63E1977: CRYPTO_malloc (in /usr/lib64/
> *libcrypto.so.1.0.2p*)
> ==16673==by 0xA855E0C: ssl3_setup_write_buffer (in /usr/lib64/
> *libssl.so.1.0.2p*)
> ==16673==by 0xA855E77: ssl3_setup_buffers (in /usr/lib64/
> *libssl.so.1.0.2p*)
> ==16673==by 0xA8485D9: ssl3_accept (in /usr/lib64/*libssl.so.1.0.2p*)
> ==16673==by 0xA610DDF: ssl_complete_connection (socket.c:400)
> ==16673==by 0xA617F38: ssl_handle_server_connection_attempt
> (socket.c:2409)
> ==16673==by 0xA61842

Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

2019-05-08 Thread Raghavendra Gowdappa
On Wed, May 8, 2019 at 1:29 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi 'Milind Changire' ,
>
> The leak is getting clearer to me now. The unsolved memory
> leak occurs because, in glusterfs version 3.12.15 (in my env), the SSL
> context is a shared one. When we do SSL_accept, SSL allocates read/write
> buffers for the SSL object; however, on SSL_free in socket_reset or the
> fini function of socket.c, the buffers are returned to the SSL context's
> free list instead of being completely freed.
>

Thanks Cynthia for your efforts in identifying and fixing the leak. If you
post a patch to gerrit, I'll be happy to merge it and get the fix into the
codebase.


>
> So the following patch is able to fix the memory leak issue
> completely (created for the gluster master branch):
>
>
>
> --- a/rpc/rpc-transport/socket/src/socket.c
> +++ b/rpc/rpc-transport/socket/src/socket.c
> @@ -446,6 +446,7 @@ ssl_setup_connection_postfix(rpc_transport_t *this)
>  gf_log(this->name, GF_LOG_DEBUG,
> "SSL verification succeeded (client: %s) (server: %s)",
> this->peerinfo.identifier, this->myinfo.identifier);
> +X509_free(peer);
>  return gf_strdup(peer_CN);
>
>  /* Error paths. */
> @@ -1157,7 +1158,21 @@ __socket_reset(rpc_transport_t *this)
>  memset(&priv->incoming, 0, sizeof(priv->incoming));
>
>  event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
> -
> +    if (priv->use_ssl && priv->ssl_ssl)
> +    {
> +        gf_log(this->name, GF_LOG_TRACE,
> +               "clear and reset for socket(%d), free ssl ",
> +               priv->sock);
> +        if (priv->ssl_ctx)
> +        {
> +            SSL_CTX_free(priv->ssl_ctx);
> +            priv->ssl_ctx = NULL;
> +        }
> +        SSL_shutdown(priv->ssl_ssl);
> +        SSL_clear(priv->ssl_ssl);
> +        SSL_free(priv->ssl_ssl);
> +        priv->ssl_ssl = NULL;
> +    }
>  priv->sock = -1;
>  priv->idx = -1;
>  priv->connected = -1;
> @@ -4675,6 +4690,21 @@ fini(rpc_transport_t *this)
>  pthread_mutex_destroy(&priv->out_lock);
>  pthread_mutex_destroy(&priv->cond_lock);
>  pthread_cond_destroy(&priv->cond);
> +    if (priv->use_ssl && priv->ssl_ssl)
> +    {
> +        gf_log(this->name, GF_LOG_TRACE,
> +               "clear and reset for socket(%d), free ssl ",
> +               priv->sock);
> +        if (priv->ssl_ctx)
> +        {
> +            SSL_CTX_free(priv->ssl_ctx);
> +            priv->ssl_ctx = NULL;
> +        }
> +        SSL_shutdown(priv->ssl_ssl);
> +        SSL_clear(priv->ssl_ssl);
> +        SSL_free(priv->ssl_ssl);
>
> *From:* Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Monday, May 06, 2019 2:12 PM
> *To:* 'Amar Tumballi Suryanarayan' 
> *Cc:* 'Milind Changire' ; 'gluster-devel@gluster.org'
> 
> *Subject:* RE: [Gluster-devel] glusterfsd memory leak issue found after
> enable ssl
>
>
>
> Hi,
>
> From our test valgrind and libleak all blame ssl3_accept
>
> ///from valgrind attached to
> glusterfds///
>
> ==16673== 198,720 bytes in 12 blocks are definitely lost in loss record
> 1,114 of 1,123
> ==16673==at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
> ==16673==by 0x63E1977: CRYPTO_malloc (in /usr/lib64/
> *libcrypto.so.1.0.2p*)
> ==16673==by 0xA855E0C: ssl3_setup_write_buffer (in /usr/lib64/
> *libssl.so.1.0.2p*)
> ==16673==by 0xA855E77: ssl3_setup_buffers (in /usr/lib64/
> *libssl.so.1.0.2p*)
> ==16673==by 0xA8485D9: ssl3_accept (in /usr/lib64/*libssl.so.1.0.2p*)
> ==16673==by 0xA610DDF: ssl_complete_connection (socket.c:400)
> ==16673==by 0xA617F38: ssl_handle_server_connection_attempt
> (socket.c:2409)
> ==16673==by 0xA618420: socket_complete_connection (socket.c:2554)
> ==16673==by 0xA618788: socket_event_handler (socket.c:2613)
> ==16673==by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
> ==16673==by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
> ==16673==by 0x615C5D9: start_thread (in /usr/lib64/*libpthread-2.27.so
> *)
> ==16673==
> ==16673== 200,544 bytes in 12 blocks are definitely lost in loss record
> 1,115 of 1,123
> ==16673==at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
> ==16673==by 0x63E1977: CRYPTO_malloc (in /usr/lib64/
> *libcrypto.so.1.0.2p*)
> ==16673==by 0xA855D12: ssl3_setup_read_buffer (in /usr/lib64/
> *libssl.so.1.0.2p*)
> ==16673==by 0xA855E68: ssl3_setup_buffers (in /usr/lib64/
> *libssl.so.1.0.2p*)
> ==16673==by 0xA8485D9: ssl3_accept (in /usr/lib64/*libssl.so.1.0.2p*)
> ==16673==by 0xA610DDF: ssl_complete_connection (socket.c:400)
> ==16673==by 0xA617F38: ssl_handle_server_connection_attempt
> (socket.c:2409)
> ==16673==by 0xA618420: 

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-25 Thread Raghavendra Gowdappa
On Mon, Apr 15, 2019 at 12:52 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi,
>
> The reason I moved event_handled to the end of socket_event_poll_in is
> that if event_handled is called before rpc_transport_pollin_destroy, it
> allows another round of event_dispatch_epoll_handler to happen before
> rpc_transport_pollin_destroy. In this way, when the latter poll-in gets to
> rpc_transport_pollin_destroy, there is a chance that the pollin->iobref
> has already been destroyed by the first one (there is no lock destroy for
> iobref->lock in iobref_destroy, by the way). That may cause it to get
> stuck in "LOCK(&iobref->lock);".
>

But priv->incoming.iobref (from which pollin->iobref is initialized)
is set to NULL in __socket_proto_state_machine:

if (in->record_state == SP_STATE_COMPLETE) {
in->record_state = SP_STATE_NADA;
__socket_reset_priv (priv);
}

And since pollin is an allocated object, only one instance of
socket_event_poll_in will be aware of this object. IOW, multiple instances
of socket_event_poll_in will get different pollin objects. So, the only way
pollin->iobref could be shared across multiple invocations of
socket_event_poll_in is through the common shared object
priv->incoming.iobref. But that too is sanitized by the time
__socket_proto_state_machine completes, and __socket_proto_state_machine is
executed under lock. So, I don't see how two different concurrent codepaths
can get hold of the same iobref.

> I found one recent patch:
>
> SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
>
>
>
> ** socket: fix issue on concurrent handle of a socket*
>
>
>
> I think the point is to avoid concurrent handling of the same socket
> at the same time, but after my test with this patch the problem still
> exists, so I think event_handled is still called too early, allowing
> concurrent handling of the same socket to happen; after moving it to the
> end of socket_event_poll this glusterd stuck issue disappeared.
>
> cynthia
>
> *From:* Raghavendra Gowdappa 
> *Sent:* Monday, April 15, 2019 2:36 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* gluster-devel@gluster.org
> *Subject:* Re: glusterd stuck for glusterfs with version 3.12.15
>
>
>
>
>
>
>
> On Mon, Apr 15, 2019 at 11:08 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Ok, thanks for your comment!
>
>
>
> cynthia
>
>
>
> *From:* Raghavendra Gowdappa 
> *Sent:* Monday, April 15, 2019 11:52 AM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* gluster-devel@gluster.org
> *Subject:* Re: glusterd stuck for glusterfs with version 3.12.15
>
>
>
> Cynthia,
>
>
>
> On Mon, Apr 15, 2019 at 8:10 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Hi,
>
> I made a patch, and according to my test this glusterd stuck issue
> disappears with it. One only needs to move event_handled to the end of the
> socket_event_poll_in function.
>
>
>
> --- a/rpc/rpc-transport/socket/src/socket.c
>
> +++ b/rpc/rpc-transport/socket/src/socket.c
>
> @@ -2305,9 +2305,9 @@ socket_event_poll_in (rpc_transport_t *this,
> gf_boolean_t notify_handled)
>
>  }
>
>
>
> -if (notify_handled && (ret != -1))
>
> -event_handled (ctx->event_pool, priv->sock, priv->idx,
>
> -   priv->gen);
>
> @@ -2330,6 +2327,9 @@ socket_event_poll_in (rpc_transport_t *this,
> gf_boolean_t notify_handled)
>
>  }
>
>  pthread_mutex_unlock (&priv->notify.lock);
>
>  }
>
> +if (notify_handled && (ret != -1))
>
> +event_handled (ctx->event_pool, priv->sock, priv->idx,
>
> +   priv->gen);
>
>
>
> Thanks for this tip. Though this helps in fixing the hang, this change has
> a performance impact. Moving event_handled to the end of poll_in means the
> socket will be added back for polling of new events only _after_ the rpc
> msg is processed by higher layers (like EC), and higher layers can have
> significant latency for processing the msg. This means the socket will be
> out of polling for longer periods of time, which decreases the throughput
> (number of msgs read per second), affecting performance. However, this
> experiment definitely indicates there is a codepath where event_handled is
> not called (and hence causes the hang). I'll go through this codepath again.
>
>
>
> Can you check whether patch [1] fixes the issue you are seeing?
>
>
>
> [1] https://review.gluster.org/#/c/glusterfs/+/22566
>

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-15 Thread Raghavendra Gowdappa
On Mon, Apr 15, 2019 at 12:52 PM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi,
>
> The reason I moved event_handled to the end of socket_event_poll_in is
> that if event_handled is called before rpc_transport_pollin_destroy, it
> allows another round of event_dispatch_epoll_handler to happen before
> rpc_transport_pollin_destroy. In this way, when the latter poll-in gets to
> rpc_transport_pollin_destroy, there is a chance that the pollin->iobref
> has already been destroyed by the first one (there is no lock destroy for
> iobref->lock in iobref_destroy, by the way). That may cause it to get
> stuck in "LOCK(&iobref->lock);".
>
> I found one recent patch:
>
> SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
>
>
>
> ** socket: fix issue on concurrent handle of a socket*
>
>
>
> I think the point is to avoid concurrent handling of the same socket
> at the same time, but after my test with this patch the problem still
> exists, so I think event_handled is still called too early, allowing
> concurrent handling of the same socket to happen,
>

But concurrent handling is required for performance. So, we cannot
serialize it.

and after moving it to the end of socket_event_poll this glusterd stuck issue
> disappeared.
>

Ideally, no single data-structure instance should be shared between two
instances of pollin handling. My initial code reading didn't find any
issues with the way iobref is handled, even when a new message is read
concurrently while the previous message is still not notified. I'll
continue to investigate how objects are shared across two instances of
pollin handling. Will post if I find anything interesting.

cynthia
>
> *From:* Raghavendra Gowdappa 
> *Sent:* Monday, April 15, 2019 2:36 PM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* gluster-devel@gluster.org
> *Subject:* Re: glusterd stuck for glusterfs with version 3.12.15
>
>
>
>
>
>
>
> On Mon, Apr 15, 2019 at 11:08 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Ok, thanks for your comment!
>
>
>
> cynthia
>
>
>
> *From:* Raghavendra Gowdappa 
> *Sent:* Monday, April 15, 2019 11:52 AM
> *To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
> *Cc:* gluster-devel@gluster.org
> *Subject:* Re: glusterd stuck for glusterfs with version 3.12.15
>
>
>
> Cynthia,
>
>
>
> On Mon, Apr 15, 2019 at 8:10 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.z...@nokia-sbell.com> wrote:
>
> Hi,
>
> I made a patch, and according to my test this glusterd stuck issue
> disappears with it. One only needs to move event_handled to the end of the
> socket_event_poll_in function.
>
>
>
> --- a/rpc/rpc-transport/socket/src/socket.c
>
> +++ b/rpc/rpc-transport/socket/src/socket.c
>
> @@ -2305,9 +2305,9 @@ socket_event_poll_in (rpc_transport_t *this,
> gf_boolean_t notify_handled)
>
>  }
>
>
>
> -if (notify_handled && (ret != -1))
>
> -event_handled (ctx->event_pool, priv->sock, priv->idx,
>
> -   priv->gen);
>
> @@ -2330,6 +2327,9 @@ socket_event_poll_in (rpc_transport_t *this,
> gf_boolean_t notify_handled)
>
>  }
>
>  pthread_mutex_unlock (&priv->notify.lock);
>
>  }
>
> +if (notify_handled && (ret != -1))
>
> +event_handled (ctx->event_pool, priv->sock, priv->idx,
>
> +   priv->gen);
>
>
>
> Thanks for this tip. Though this helps in fixing the hang, this change has
> a performance impact. Moving event_handled to the end of poll_in means the
> socket will be added back for polling of new events only _after_ the rpc
> msg is processed by higher layers (like EC), and higher layers can have
> significant latency for processing the msg. This means the socket will be
> out of polling for longer periods of time, which decreases the throughput
> (number of msgs read per second), affecting performance. However, this
> experiment definitely indicates there is a codepath where event_handled is
> not called (and hence causes the hang). I'll go through this codepath again.
>
>
>
> Can you check whether patch [1] fixes the issue you are seeing?
>
>
>
> [1] https://review.gluster.org/#/c/glusterfs/+/22566
>
>
>
>
>
> Thanks for that experiment :).
>
>
>
>  return ret;
>
> }
>
>
>
> cynthia
>
> *From:* Zhou, Cynthia (NSB - CN/Hangzhou)
> *Sent:* Tuesday, April 09, 2019 3:57 PM
> *To:* 'Raghavendra Gowdappa' 
> *Cc:* gluster-devel@gluster.org
> *Subject:* RE: glusterd stuck for gluster

Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-09 Thread Raghavendra Gowdappa
On Mon, Apr 8, 2019 at 7:42 AM Zhou, Cynthia (NSB - CN/Hangzhou) <
cynthia.z...@nokia-sbell.com> wrote:

> Hi glusterfs experts,
>
> Good day!
>
> In my test env, sometimes a glusterd stuck issue happens, and glusterd is
> not responding to any gluster commands. When I checked this issue I found
> that glusterd thread 9 and thread 8 are dealing with the same socket. I
> thought the following patch should be able to solve this issue; however,
> after I merged this patch the issue still exists. When I looked into the
> code, it seems socket_event_poll_in calls event_handled before
> rpc_transport_pollin_destroy; I think this gives the chance for another
> poll-in on exactly the same socket, and that caused this glusterd stuck
> issue. Also, I find there is no LOCK_DESTROY(&iobref->lock)
>
> in iobref_destroy; I think it is better to add the lock destroy.
>
> Following is the gdb info from when this issue happened. I would like to
> know your opinion on this issue, thanks!
>
>
>
> SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
>
>
>
> ** socket: fix issue on concurrent handle of a socket*
>
>
>
>
>
>
>
> *GDB INFO:*
>
> Thread 8 is blocked on pthread_cond_wait, and thread 9 is blocked in
> iobref_unref, I think
>
> Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x7f9ee4fbf932 in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2471
>
> #6  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from
> /lib64/libpthread.so.0
>
> #2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
>
> #3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0,
> gen=4, idx=27) at socket.c:1201
>
> #4  0x7f9ee4fbf99c in socket_event_handler (*fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0*) at socket.c:2480
>
> #5  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583
>
> #6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at
> event-epoll.c:659
>
> #7  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #8  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
>
>
> (gdb) thread 9
>
> [Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> (gdb) bt
>
> #0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
>
> #1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> #3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy
> (pollin=0x7f9ed00452d0) at rpc-transport.c:123
>
> #4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0,
> notify_handled=_gf_true) at socket.c:2322
>
> #5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4,
> data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
>
> #6  0x7f9eeb2825d4 in event_dispatch_epoll_handler
> (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
>
> #7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at
> event-epoll.c:659
>
> #8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
>
> #9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
>
> (gdb) frame 2
>
> #2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at
> iobuf.c:944
>
> 944 iobuf.c: No such file or directory.
>
> (gdb) print *iobref
>
> $1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222,
> __owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
>
> __elision = 0, __list = {__prev = 0x4000, __next =
> 0x7f9ed00063b000}},
>
>   __size =
> "\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@\000\000\000\000\000\000\000\260c\000О\177",
> __align = 953482739714}}, ref = -256, iobrefs = 0x, alloced
> = -1, used = -1}
>

looks like the iobref 

Re: [Gluster-devel] [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-28 Thread Raghavendra Gowdappa
On Thu, Mar 28, 2019 at 2:37 PM Xavi Hernandez  wrote:

> On Thu, Mar 28, 2019 at 3:05 AM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Wed, Mar 27, 2019 at 8:38 PM Xavi Hernandez 
>> wrote:
>>
>>> On Wed, Mar 27, 2019 at 2:20 PM Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez 
>>>> wrote:
>>>>
>>>>> On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri <
>>>>> pkara...@redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez 
>>>>>> wrote:
>>>>>>
>>>>>>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa <
>>>>>>> rgowd...@redhat.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez <
>>>>>>>> jaher...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Raghavendra,
>>>>>>>>>
>>>>>>>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <
>>>>>>>>> rgowd...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> All,
>>>>>>>>>>
>>>>>>>>>> Glusterfs cleans up POSIX locks held on an fd when the
>>>>>>>>>> client/mount through which those locks are held disconnects from
>>>>>>>>>> bricks/server. This helps Glusterfs to not run into a stale lock 
>>>>>>>>>> problem
>>>>>>>>>> later (For eg., if application unlocks while the connection was still
>>>>>>>>>> down). However, this means the lock is no longer exclusive as other
>>>>>>>>>> applications/clients can acquire the same lock. To communicate that 
>>>>>>>>>> locks
>>>>>>>>>> are no longer valid, we are planning to mark the fd (which has POSIX 
>>>>>>>>>> locks)
>>>>>>>>>> bad on a disconnect so that any future operations on that fd will 
>>>>>>>>>> fail,
>>>>>>>>>> forcing the application to re-open the fd and re-acquire locks it 
>>>>>>>>>> needs [1].
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Wouldn't it be better to retake the locks when the brick is
>>>>>>>>> reconnected if the lock is still in use ?
>>>>>>>>>
>>>>>>>>
>>>>>>>> There is also  a possibility that clients may never reconnect.
>>>>>>>> That's the primary reason why bricks assume the worst (client will not
>>>>>>>> reconnect) and cleanup the locks.
>>>>>>>>
>>>>>>>
>>>>>>> True, so it's fine to cleanup the locks. I'm not saying that locks
>>>>>>> shouldn't be released on disconnect. The assumption is that if the 
>>>>>>> client
>>>>>>> has really died, it will also disconnect from other bricks, who will
>>>>>>> release the locks. So, eventually, another client will have enough 
>>>>>>> quorum
>>>>>>> to attempt a lock that will succeed. In other words, if a client gets
>>>>>>> disconnected from too many bricks simultaneously (loses Quorum), then 
>>>>>>> that
>>>>>>> client can be considered as bad and can return errors to the 
>>>>>>> application.
>>>>>>> This should also cause to release the locks on the remaining connected
>>>>>>> bricks.
>>>>>>>
>>>>>>> On the other hand, if the disconnection is very short and the client
>>>>>>> has not died, it will keep enough locked files (it has quorum) to avoid
>>>>>>> other clients to successfully acquire a lock. In this case, if the 
>>>>>>> brick is
>>>>>>> reconnected, all existing locks should be reacquired to recover the
>>>>>>> original stat

Re: [Gluster-devel] [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Raghavendra Gowdappa
On Wed, Mar 27, 2019 at 8:38 PM Xavi Hernandez  wrote:

> On Wed, Mar 27, 2019 at 2:20 PM Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Wed, Mar 27, 2019 at 6:38 PM Xavi Hernandez 
>> wrote:
>>
>>> On Wed, Mar 27, 2019 at 1:13 PM Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez 
>>>> wrote:
>>>>
>>>>> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa <
>>>>> rgowd...@redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Raghavendra,
>>>>>>>
>>>>>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <
>>>>>>> rgowd...@redhat.com> wrote:
>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>>>>>>>> through which those locks are held disconnects from bricks/server. This
>>>>>>>> helps Glusterfs to not run into a stale lock problem later (For eg., if
>>>>>>>> application unlocks while the connection was still down). However, this
>>>>>>>> means the lock is no longer exclusive as other applications/clients can
>>>>>>>> acquire the same lock. To communicate that locks are no longer valid, 
>>>>>>>> we
>>>>>>>> are planning to mark the fd (which has POSIX locks) bad on a 
>>>>>>>> disconnect so
>>>>>>>> that any future operations on that fd will fail, forcing the 
>>>>>>>> application to
>>>>>>>> re-open the fd and re-acquire locks it needs [1].
>>>>>>>>
>>>>>>>
>>>>>>> Wouldn't it be better to retake the locks when the brick is
>>>>>>> reconnected if the lock is still in use ?
>>>>>>>
>>>>>>
>>>>>> There is also a possibility that clients may never reconnect. That's
>>>>>> the primary reason why bricks assume the worst (that the client will
>>>>>> not reconnect) and clean up the locks.
>>>>>>
>>>>>
>>>>> True, so it's fine to clean up the locks. I'm not saying that locks
>>>>> shouldn't be released on disconnect. The assumption is that if the client
>>>>> has really died, it will also disconnect from the other bricks, which will
>>>>> release the locks. So, eventually, another client will have enough quorum
>>>>> to attempt a lock that will succeed. In other words, if a client gets
>>>>> disconnected from too many bricks simultaneously (loses quorum), then that
>>>>> client can be considered bad and can return errors to the application.
>>>>> This should also cause the locks on the remaining connected bricks to be
>>>>> released.
>>>>>
>>>>> On the other hand, if the disconnection is very short and the client
>>>>> has not died, it will keep enough locked files (it has quorum) to prevent
>>>>> other clients from successfully acquiring a lock. In this case, if the
>>>>> brick is reconnected, all existing locks should be reacquired to recover
>>>>> the original state before the disconnection.
>>>>>
>>>>>
>>>>>>
>>>>>>> BTW, the referenced bug is not public. Should we open another bug to
>>>>>>> track this ?
>>>>>>>
>>>>>>
>>>>>> I've just opened up the comment to give enough context. I'll open a
>>>>>> bug upstream too.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Note that with AFR/replicate in picture we can prevent errors to
>>>>>>>> application as long as Quorum number of children "never ever" lost
>>>>>>>> connection with bricks after locks have been acquired. I am using the 
>>>>>>>> term
>>>>>>>> "never

Re: [Gluster-devel] [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Raghavendra Gowdappa
On Wed, Mar 27, 2019 at 4:22 PM Raghavendra Gowdappa 
wrote:

>
>
> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez 
> wrote:
>
>> Hi Raghavendra,
>>
>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa 
>> wrote:
>>
>>> All,
>>>
>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>>> through which those locks are held disconnects from bricks/server. This
>>> helps Glusterfs avoid a stale lock problem later (e.g., if the
>>> application unlocks while the connection is still down). However, this
>>> means the lock is no longer exclusive, as other applications/clients can
>>> acquire the same lock. To communicate that the locks are no longer valid,
>>> we are planning to mark the fd (which has POSIX locks) bad on a disconnect
>>> so that any future operations on that fd will fail, forcing the
>>> application to re-open the fd and re-acquire the locks it needs [1].
>>>
>>
>> Wouldn't it be better to retake the locks when the brick is reconnected
>> if the lock is still in use?
>>
>
> There is also a possibility that clients may never reconnect. That's the
> primary reason why bricks assume the worst (that the client will not
> reconnect) and clean up the locks.
>
>
>> BTW, the referenced bug is not public. Should we open another bug to
>> track this ?
>>
>
> I've just opened up the comment to give enough context. I'll open a bug
> upstream too.
>
>
>>
>>
>>>
>>> Note that with AFR/replicate in the picture, we can prevent errors to
>>> the application as long as a quorum number of children "never ever" lost
>>> their connection with bricks after the locks were acquired. I am using the
>>> term "never ever" because locks are not healed back after re-connection;
>>> the first disconnect would've marked the fd bad, and the fd remains so
>>> even after re-connection happens. So, it's not just a quorum number of
>>> children "currently online", but a quorum number of children "never having
>>> disconnected from bricks after the locks were acquired".
>>>
>>
>> I think this requirement is not feasible. In a distributed file system,
>> sooner or later all bricks will be disconnected. It could be because of
>> failures or because an upgrade is done, but it will happen.
>>
>> The difference here is how long fds are kept open. If applications open
>> and close files frequently enough (i.e. an fd is not kept open longer than
>> it takes for more than a quorum of bricks to disconnect), then there's no
>> problem. The problem can only appear with applications that open files for
>> a long time and also use POSIX locks. In this case, the only good solution
>> I see is to retake the locks on brick reconnection.
>>
>
> Agree. But lock healing should be done only by HA layers like AFR/EC, as
> only they know whether there are enough online bricks to have prevented any
> conflicting lock. Protocol/client itself doesn't have enough information to
> do that. If it's a plain distribute volume, I don't see a way to heal locks
> without losing the property of exclusivity of locks.
>
> What I proposed is a short-term solution. The mid-to-long-term solution
> should be a lock-healing feature implemented in AFR/EC. In fact, I had this
> conversation with +Karampuri, Pranith  before
> posting this msg to ML.
>
>
>>
>>> However, this use case is not affected if the application doesn't acquire
>>> any POSIX locks. So, I am interested in knowing:
>>> * whether your use cases use POSIX locks?
>>> * is it feasible for your application to re-open fds and re-acquire
>>> locks on seeing EBADFD errors?
>>>
>>
>> I think that many applications are not prepared to handle that.
>>
>
> I too suspected that, and in fact I'm not too happy with the solution. But
> I went ahead with this mail as I heard implementing lock healing in AFR
> will take time, and hence there are no alternative short-term solutions.
>

Also, failing loudly is preferable to silently dropping locks.


>
>
>> Xavi
>>
>>
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>>>
>>> regards,
>>> Raghavendra
>>>
>>> ___
>>> Gluster-users mailing list
>>> gluster-us...@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] POSIX locks and disconnections between clients and bricks

2019-03-27 Thread Raghavendra Gowdappa
On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez  wrote:

> Hi Raghavendra,
>
> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> Glusterfs cleans up POSIX locks held on an fd when the client/mount
>> through which those locks are held disconnects from bricks/server. This
>> helps Glusterfs avoid a stale lock problem later (e.g., if the
>> application unlocks while the connection is still down). However, this
>> means the lock is no longer exclusive, as other applications/clients can
>> acquire the same lock. To communicate that the locks are no longer valid,
>> we are planning to mark the fd (which has POSIX locks) bad on a disconnect
>> so that any future operations on that fd will fail, forcing the
>> application to re-open the fd and re-acquire the locks it needs [1].
>>
>
> Wouldn't it be better to retake the locks when the brick is reconnected if
> the lock is still in use?
>

There is also a possibility that clients may never reconnect. That's the
primary reason why bricks assume the worst (that the client will not
reconnect) and clean up the locks.


> BTW, the referenced bug is not public. Should we open another bug to track
> this ?
>

I've just opened up the comment to give enough context. I'll open a bug
upstream too.


>
>
>>
>> Note that with AFR/replicate in the picture, we can prevent errors to
>> the application as long as a quorum number of children "never ever" lost
>> their connection with bricks after the locks were acquired. I am using the
>> term "never ever" because locks are not healed back after re-connection;
>> the first disconnect would've marked the fd bad, and the fd remains so
>> even after re-connection happens. So, it's not just a quorum number of
>> children "currently online", but a quorum number of children "never having
>> disconnected from bricks after the locks were acquired".
>>
>
> I think this requirement is not feasible. In a distributed file system,
> sooner or later all bricks will be disconnected. It could be because of
> failures or because an upgrade is done, but it will happen.
>
> The difference here is how long fds are kept open. If applications open
> and close files frequently enough (i.e. an fd is not kept open longer than
> it takes for more than a quorum of bricks to disconnect), then there's no
> problem. The problem can only appear with applications that open files for
> a long time and also use POSIX locks. In this case, the only good solution
> I see is to retake the locks on brick reconnection.
>

Agree. But lock healing should be done only by HA layers like AFR/EC, as
only they know whether there are enough online bricks to have prevented any
conflicting lock. Protocol/client itself doesn't have enough information to
do that. If it's a plain distribute volume, I don't see a way to heal locks
without losing the property of exclusivity of locks.

What I proposed is a short-term solution. The mid-to-long-term solution
should be a lock-healing feature implemented in AFR/EC. In fact, I had this
conversation with +Karampuri, Pranith  before posting
this msg to ML.
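The quorum rule described above (lose quorum: fail the fd; brick comes back
while quorum was retained: re-acquire the locks) can be sketched as a small
decision function. This is a toy model only; the names and the
simple-majority quorum are my assumptions, not the actual AFR/EC API:

```c
#include <assert.h>

/* Toy model of the quorum-based lock policy discussed above.
 * Names and the simple-majority quorum are illustrative assumptions,
 * not the actual AFR/EC implementation. */
enum lock_action {
    LOCKS_KEEP,      /* locks remain valid, nothing to do */
    LOCKS_REACQUIRE, /* short disconnect within quorum: heal the locks */
    LOCKS_FAIL_FD    /* quorum lost: mark the fd bad, app must recover */
};

static enum lock_action
on_child_event(int connected_bricks, int total_bricks, int just_reconnected)
{
    int quorum = total_bricks / 2 + 1; /* simple majority */

    if (connected_bricks < quorum)
        return LOCKS_FAIL_FD;
    if (just_reconnected)
        return LOCKS_REACQUIRE;
    return LOCKS_KEEP;
}
```

With 3 bricks, dropping to 1 connection loses quorum and the fd must be
failed, while a brick reconnecting when 2 of 3 stayed connected should
trigger lock re-acquisition instead of marking the fd bad.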


>
>> However, this use case is not affected if the application doesn't acquire
>> any POSIX locks. So, I am interested in knowing:
>> * whether your use cases use POSIX locks?
>> * is it feasible for your application to re-open fds and re-acquire locks
>> on seeing EBADFD errors?
>>
>
> I think that many applications are not prepared to handle that.
>

I too suspected that, and in fact I'm not too happy with the solution. But
I went ahead with this mail as I heard implementing lock healing in AFR
will take time, and hence there are no alternative short-term solutions.


> Xavi
>
>
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>>
>> regards,
>> Raghavendra
>>
>
>

[Gluster-devel] POSIX locks and disconnections between clients and bricks

2019-03-26 Thread Raghavendra Gowdappa
All,

Glusterfs cleans up POSIX locks held on an fd when the client/mount through
which those locks are held disconnects from bricks/server. This helps
Glusterfs avoid a stale lock problem later (e.g., if the application
unlocks while the connection is still down). However, this means the lock
is no longer exclusive, as other applications/clients can acquire the same
lock. To communicate that the locks are no longer valid, we are planning to
mark the fd (which has POSIX locks) bad on a disconnect, so that any future
operations on that fd will fail, forcing the application to re-open the fd
and re-acquire the locks it needs [1].

Note that with AFR/replicate in the picture, we can prevent errors to the
application as long as a quorum number of children "never ever" lost their
connection with bricks after the locks were acquired. I am using the term
"never ever" because locks are not healed back after re-connection; the
first disconnect would've marked the fd bad, and the fd remains so even
after re-connection happens. So, it's not just a quorum number of children
"currently online", but a quorum number of children "never having
disconnected from bricks after the locks were acquired".

However, this use case is not affected if the application doesn't acquire
any POSIX locks. So, I am interested in knowing:
* whether your use cases use POSIX locks?
* is it feasible for your application to re-open fds and re-acquire locks
on seeing EBADFD errors?
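For applications that can cope with it, the re-open-and-relock recovery
asked about in the second question might look roughly like the sketch
below. This is an application-side sketch under my own assumptions: the
helper names are mine, and it keys off any failed fcntl() on the old fd
(such as the standard EBADF) rather than a gluster-specific error:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Acquire a whole-file POSIX write lock on fd (non-blocking). */
static int lock_whole_file(int fd)
{
    struct flock lk;

    memset(&lk, 0, sizeof(lk));
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0; /* 0 == lock the whole file */
    return fcntl(fd, F_SETLK, &lk);
}

/* Return a usable fd holding the lock. If the old fd has gone bad
 * (e.g. marked bad after a disconnect), re-open the path and
 * re-acquire the lock. Returns -1 on failure. */
static int ensure_locked_fd(int fd, const char *path)
{
    if (fd >= 0 && lock_whole_file(fd) == 0)
        return fd; /* fd still good, lock held */

    int nfd = open(path, O_RDWR); /* re-open the file ... */
    if (nfd < 0)
        return -1;
    if (lock_whole_file(nfd) != 0) { /* ... and re-acquire the lock */
        close(nfd);
        return -1;
    }
    return nfd;
}
```

The application would call ensure_locked_fd() before retrying the failed
operation, switching to the returned fd if it differs from the old one.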

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7

regards,
Raghavendra

Re: [Gluster-devel] [Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

2019-03-21 Thread Raghavendra Gowdappa
On Thu, Mar 21, 2019 at 4:16 PM Atin Mukherjee  wrote:

> All,
>
> In the last few releases of glusterfs, with stability as a primary theme,
> there have been many changes aimed at code optimization, with the
> expectation that they would make gluster perform better. While many of
> these changes do help, of late we have started seeing some adverse effects
> from them, one especially being the calloc-to-malloc conversions. While I
> do understand that malloc eliminates the extra memset overhead that calloc
> bears, with the strong optimizations in modern toolchains I am not sure
> whether that makes any significant difference; but, as I mentioned
> earlier, if this isn't done carefully it can potentially introduce a lot
> of bugs, and I'm writing this email to share one such experience.
>
> Sanju & I were struggling for the last two days to figure out why
> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
> Sanju's system while the same fix ran without problems in my gluster
> containers. After spending a significant amount of time, what we figured
> out is that a malloc call [1] (which was a calloc earlier) is the culprit
> here. As you can see, in this function we allocate txn_id and copy
> event->txn_id into it through gf_uuid_copy(). But when we stepped through
> this in gdb, txn_id wasn't an exact copy of event->txn_id; it contained
> junk values, which caused glusterd_clear_txn_opinfo to be invoked with a
> wrong txn_id later on, so the leaks the fix was originally intended to
> plug remained.
>
> This was quite painful to debug, and we had to spend some time to figure
> it out. Considering we have converted many such calls in the past, I'd
> urge that we review all such conversions and see if there are any side
> effects. Otherwise we might end up running into many potential
> memory-related bugs later on. OTOH, going forward I'd request every patch
> owner/maintainer to pay special attention to these conversions and ensure
> they are really beneficial and error-free. IMO, the general guideline
> should be: for bigger buffers, malloc makes better sense but has to be
> done carefully; for smaller sizes, we stick to calloc.
>
> What do others think about it?
>

I too am afraid of the unknown effects of this change, as much of the
codebase relies on the assumption of zero-initialized data structures. I
vote for reverting these patches unless it can be demonstrated that the
performance benefits are indeed significant. Otherwise the trade-off in
stability is not worth the cost.
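To make the hazard concrete, here is a minimal sketch (the struct and
function names are mine, not glusterd's) of why a blind calloc-to-malloc
conversion is risky when surrounding code assumes zero-initialized memory:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative struct: code elsewhere assumes in_use == 0 means "free". */
struct txn_info {
    unsigned char txn_id[16]; /* uuid-like buffer */
    int in_use;
};

/* calloc zeroes every byte, so the implicit invariant holds for free. */
static struct txn_info *txn_new_calloc(void)
{
    return calloc(1, sizeof(struct txn_info));
}

/* A malloc conversion is only safe if EVERY field the callers rely on
 * is explicitly initialized; forgetting one leaves junk values behind,
 * exactly the class of bug described in the mail above. */
static struct txn_info *txn_new_malloc(void)
{
    struct txn_info *t = malloc(sizeof(struct txn_info));

    if (t) {
        memset(t->txn_id, 0, sizeof(t->txn_id));
        t->in_use = 0;
    }
    return t;
}
```

Dropping either line of the explicit initialization in txn_new_malloc()
compiles cleanly but reintroduces the junk-value behaviour seen in gdb.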


>
> [1]
> https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>

Re: [Gluster-devel] [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-03-21 Thread Raghavendra Gowdappa
On Thu, Mar 21, 2019 at 4:10 PM Mauro Tridici  wrote:

> Hi Raghavendra,
>
> the number of errors has reduced, but during the last few days I received
> some error notifications from the Nagios server similar to the following
> one:
>
> ** Nagios **
> Notification Type: PROBLEM
> Service: Brick - /gluster/mnt5/brick
> Host: s04
> Address: s04-stg
> State: CRITICAL
> Date/Time: Mon Mar 18 19:56:36 CET 2019
> Additional Info: CHECK_NRPE STATE CRITICAL: Socket timeout after 10
> seconds.
>
> The error was related only to the s04 gluster server.
>
> So, following your suggestions, I executed the top command on the s04 node.
> You can find the related output attached.
>

The top output doesn't contain command/thread names. Was there anything wrong?


> Thank you very much for your help.
> Regards,
> Mauro
>
>
>
> On 14 Mar 2019, at 13:31, Raghavendra Gowdappa 
> wrote:
>
> Thanks Mauro.
>
> On Thu, Mar 14, 2019 at 3:38 PM Mauro Tridici 
> wrote:
>
>> Hi Raghavendra,
>>
>> I just changed the client option value to 8.
>> I will check the volume behaviour during the next hours.
>>
>> The GlusterFS version is 3.12.14.
>>
>> I will provide the logs as soon as the activity load is high.
>> Thank you,
>> Mauro
>>
>> On 14 Mar 2019, at 04:57, Raghavendra Gowdappa 
>> wrote:
>>
>>
>>
>> On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici 
>> wrote:
>>
>>> Hi Raghavendra,
>>>
>>> Yes, server.event-thread has been changed from 4 to 8.
>>>
>>
>> Was the client.event-threads value also changed to 8? If not, I would
>> like to know the results of including this tuning too. Also, if possible,
>> can you get the output of the following command from problematic clients
>> and bricks (during periods when load tends to be high and ping-timer
>> expiry is seen)?
>>
>> # top -bHd 3
>>
>> This will help us to know the CPU utilization of the event-threads.
>>
>> And I forgot to ask, what version of Glusterfs are you using?
>>
>> During last days, I noticed that the error events are still here although
>>> they have been considerably reduced.
>>>
>>> So, I used the grep command against the log files in order to give you a
>>> global view of the warning, error and critical events that appeared today
>>> at 06:xx (it may be useful, I hope).
>>> I collected the info from the s06 gluster server, but the behaviour is
>>> almost the same on the other gluster servers.
>>>
>>> *ERRORS:  *
>>> *CWD: /var/log/glusterfs *
>>> *COMMAND: grep " E " *.log |grep "2019-03-13 06:"*
>>>
>>> (I can see a lot of this kind of message in the same period but I'm
>>> notifying you only one record for each type of error)
>>>
>>> glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042]
>>> [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of
>>> /var/run/gluster/tier2_quota_list/
>>>
>>> glustershd.log:[2019-03-13 06:14:28.666562] E
>>> [rpc-clnt.c:350:saved_frames_unwind] (-->
>>> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (-->
>>> /lib64/libgfr
>>> pc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (-->
>>> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (-->
>>> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup
>>> +0x90)[0x7f4a71ba3640] (-->
>>> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] )
>>> 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3)
>>> op(INODELK(29))
>>> called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50)
>>>
>>> glustershd.log:[2019-03-13 06:17:48.883825] E
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>>> 192.168.0.55:49158 failed (Connection timed out); disco
>>> nnecting socket
>>> glustershd.log:[2019-03-13 06:19:58.931798] E
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>>> 192.168.0.55:49158 failed (Connection timed out); disco
>>> nnecting socket
>>> glustershd.log:[2019-03-13 06:22:08.979829] E
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>>> 192.168.0.55:49158 failed (Connection timed out); disco
>>> nnecting socket
>>> glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031]
>>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>>> operation failed [T

Re: [Gluster-devel] [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-03-14 Thread Raghavendra Gowdappa
Thanks Mauro.

On Thu, Mar 14, 2019 at 3:38 PM Mauro Tridici  wrote:

> Hi Raghavendra,
>
> I just changed the client option value to 8.
> I will check the volume behaviour during the next hours.
>
> The GlusterFS version is 3.12.14.
>
> I will provide the logs as soon as the activity load is high.
> Thank you,
> Mauro
>
> On 14 Mar 2019, at 04:57, Raghavendra Gowdappa 
> wrote:
>
>
>
> On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici 
> wrote:
>
>> Hi Raghavendra,
>>
>> Yes, server.event-thread has been changed from 4 to 8.
>>
>
> Was the client.event-threads value also changed to 8? If not, I would
> like to know the results of including this tuning too. Also, if possible,
> can you get the output of the following command from problematic clients
> and bricks (during periods when load tends to be high and ping-timer
> expiry is seen)?
>
> # top -bHd 3
>
> This will help us to know the CPU utilization of the event-threads.
>
> And I forgot to ask, what version of Glusterfs are you using?
>
> During last days, I noticed that the error events are still here although
>> they have been considerably reduced.
>>
>> So, I used the grep command against the log files in order to give you a
>> global view of the warning, error and critical events that appeared today
>> at 06:xx (it may be useful, I hope).
>> I collected the info from the s06 gluster server, but the behaviour is
>> almost the same on the other gluster servers.
>>
>> *ERRORS:  *
>> *CWD: /var/log/glusterfs *
>> *COMMAND: grep " E " *.log |grep "2019-03-13 06:"*
>>
>> (I can see a lot of this kind of message in the same period but I'm
>> notifying you only one record for each type of error)
>>
>> glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042]
>> [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of
>> /var/run/gluster/tier2_quota_list/
>>
>> glustershd.log:[2019-03-13 06:14:28.666562] E
>> [rpc-clnt.c:350:saved_frames_unwind] (-->
>> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (-->
>> /lib64/libgfr
>> pc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (-->
>> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (-->
>> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup
>> +0x90)[0x7f4a71ba3640] (-->
>> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] )
>> 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3)
>> op(INODELK(29))
>> called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50)
>>
>> glustershd.log:[2019-03-13 06:17:48.883825] E
>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>> 192.168.0.55:49158 failed (Connection timed out); disco
>> nnecting socket
>> glustershd.log:[2019-03-13 06:19:58.931798] E
>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>> 192.168.0.55:49158 failed (Connection timed out); disco
>> nnecting socket
>> glustershd.log:[2019-03-13 06:22:08.979829] E
>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>> 192.168.0.55:49158 failed (Connection timed out); disco
>> nnecting socket
>> glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031]
>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>> operation failed [Transport endpoint
>> is not connected]
>> glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031]
>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>> operation failed [Transport endpoint
>> is not connected]
>> glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031]
>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>> operation failed [Transport endpoint
>> is not connected]
>>
>> *WARNINGS:*
>> *CWD: /var/log/glusterfs *
>> *COMMAND: grep " W " *.log |grep "2019-03-13 06:"*
>>
>> (I can see a lot of this kind of message in the same period but I'm
>> notifying you only one record for each type of warnings)
>>
>> glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031]
>> [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 0-tier2-client-55: remote
>> operation failed. Path: > 0f-f34d-4c25-bbe8-74bde0248d7e> (b6b35d0f-f34d-4c25-bbe8-74bde0248d7e).
>> Key: (null) [Transport endpoint is not connected]
>>
>> glustershd.log:[2019-03-13 06:14:31.421576] W [MSGID: 122035]
>> [ec-common.c:571:ec_child_select] 0-tier2-disperse-9: Executing operation
>> with some subvolumes una

Re: [Gluster-devel] [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-03-13 Thread Raghavendra Gowdappa
luste
> rfs_sigwaiter+0xe5) [0x55ef010164b5]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55ef0101632b] ) 0-:
> received signum (15), shutting down
>
> *CRITICALS:*
> *CWD: /var/log/glusterfs *
> *COMMAND: grep " C " *.log |grep "2019-03-13 06:"*
>
> no critical errors at 06:xx
> only one critical error during the day
>
> *[root@s06 glusterfs]# grep " C " *.log |grep "2019-03-13"*
> glustershd.log:[2019-03-13 02:21:29.126279] C
> [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-tier2-client-55: server
> 192.168.0.55:49158 has not responded in the last 42 seconds,
> disconnecting.
>
>
> Thank you very much for your help.
> Regards,
> Mauro
>
> On 12 Mar 2019, at 05:17, Raghavendra Gowdappa 
> wrote:
>
> Was the suggestion to increase the server.event-threads value tried? If
> yes, what were the results?
>
> On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici 
> wrote:
>
>> Dear All,
>>
>> do you have any suggestions about the right way to "debug" this issue?
>> In attachment, the updated logs of “s06" gluster server.
>>
>> I noticed a lot of intermittent warning and error messages.
>>
>> Thank you in advance,
>> Mauro
>>
>>
>>
>> On 4 Mar 2019, at 18:45, Raghavendra Gowdappa 
>> wrote:
>>
>>
>> +Gluster Devel , +Gluster-users
>> 
>>
>> I would like to point out another issue. Even if what I suggested
>> prevents disconnects, that part of the solution would be only symptomatic
>> treatment and doesn't address the root cause of the problem. In most
>> ping-timer-expiry issues, the root cause is increased load on the bricks
>> and the inability of the bricks to be responsive under high load. So, the
>> actual solution would be doing either or both of the following:
>> * identify the source of the increased load and, if possible, throttle
>> it. Internal heal processes like self-heal, rebalance and quota heal are
>> known to pump traffic into bricks without much throttling (io-threads
>> _might_ do some throttling, but my understanding is that it's not
>> sufficient).
>> * identify the reason why bricks become unresponsive under load. These
>> may be fixable issues like not enough event-threads to read from the
>> network, difficult-to-fix issues like fsync on the backend fs freezing
>> the process, or semi-fixable issues (in code) like lock contention.
>>
>> So any genuine effort to fix ping-timer issues (to be honest, most of the
>> time they are not issues related to rpc/network) would involve performance
>> characterization of various subsystems on bricks and clients. The
>> subsystems include (but are not necessarily limited to) the underlying
>> OS/filesystem, glusterfs processes, CPU consumption, etc.
>>
>> regards,
>> Raghavendra
>>
>> On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici 
>> wrote:
>>
>>> Thank you, let’s try!
>>> I will inform you about the effects of the change.
>>>
>>> Regards,
>>> Mauro
>>>
>>> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa 
>>> wrote:
>>>
>>>
>>>
>>> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici 
>>> wrote:
>>>
>>>> Hi Raghavendra,
>>>>
>>>> thank you for your reply.
>>>> Yes, you are right. It is a problem that seems to happen randomly.
>>>> At this moment, the server.event-threads value is 4. I will try to
>>>> increase it to 8. Do you think that would be a valid value?
>>>>
>>>
>>> Yes, we can try that. You should at least see the frequency of
>>> ping-timer-related disconnects reduce with this value (even if it doesn't
>>> eliminate the problem completely).
>>>
>>>
>>>> Regards,
>>>> Mauro
>>>>
>>>>
>>>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa 
>>>> wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran 
>>>> wrote:
>>>>
>>>>> Hi Mauro,
>>>>>
>>>>> It looks like some problem on s06. Are all your other nodes ok? Can
>>>>> you send us the gluster logs from this node?
>>>>>
>>>>> @Raghavendra G  , do you have any idea as to
>>>>> how this can be debugged? Maybe running top ? Or debug brick logs?
>>>>>
>>>>
>>>> If we can reproduce the problem, collecting tcpdump on both ends of
>>>> 

Re: [Gluster-devel] [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-03-11 Thread Raghavendra Gowdappa
Was the suggestion to increase the server.event-threads value tried? If
yes, what were the results?

On Mon, Mar 11, 2019 at 2:40 PM Mauro Tridici  wrote:

> Dear All,
>
> do you have any suggestions about the right way to "debug" this issue?
> In attachment, the updated logs of “s06" gluster server.
>
> I noticed a lot of intermittent warning and error messages.
>
> Thank you in advance,
> Mauro
>
>
>
> On 4 Mar 2019, at 18:45, Raghavendra Gowdappa  wrote:
>
>
> +Gluster Devel , +Gluster-users
> 
>
> I would like to point out another issue. Even if what I suggested
> prevents disconnects, that part of the solution would be only symptomatic
> treatment and doesn't address the root cause of the problem. In most
> ping-timer-expiry issues, the root cause is increased load on the bricks
> and the inability of the bricks to be responsive under high load. So, the
> actual solution would be doing either or both of the following:
> * identify the source of the increased load and, if possible, throttle
> it. Internal heal processes like self-heal, rebalance and quota heal are
> known to pump traffic into bricks without much throttling (io-threads
> _might_ do some throttling, but my understanding is that it's not
> sufficient).
> * identify the reason why bricks become unresponsive under load. These
> may be fixable issues like not enough event-threads to read from the
> network, difficult-to-fix issues like fsync on the backend fs freezing
> the process, or semi-fixable issues (in code) like lock contention.
>
> So any genuine effort to fix ping-timer issues (to be honest, most of the
> time they are not issues related to rpc/network) would involve performance
> characterization of various subsystems on bricks and clients. The
> subsystems include (but are not necessarily limited to) the underlying
> OS/filesystem, glusterfs processes, CPU consumption, etc.
>
> regards,
> Raghavendra
>
> On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici 
> wrote:
>
>> Thank you, let’s try!
>> I will inform you about the effects of the change.
>>
>> Regards,
>> Mauro
>>
>> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa 
>> wrote:
>>
>>
>>
>> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici 
>> wrote:
>>
>>> Hi Raghavendra,
>>>
>>> thank you for your reply.
>>> Yes, you are right. It is a problem that seems to happen randomly.
>>> At this moment, the server.event-threads value is 4. I will try to
>>> increase it to 8. Do you think that would be a valid value?
>>>
>>
>> Yes, we can try that. You should at least see the frequency of
>> ping-timer-related disconnects reduce with this value (even if it doesn't
>> eliminate the problem completely).
>>
>>
>>> Regards,
>>> Mauro
>>>
>>>
>>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa 
>>> wrote:
>>>
>>>
>>>
>>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran 
>>> wrote:
>>>
>>>> Hi Mauro,
>>>>
>>>> It looks like some problem on s06. Are all your other nodes ok? Can you
>>>> send us the gluster logs from this node?
>>>>
>>>> @Raghavendra G  , do you have any idea as to
>>>> how this can be debugged? Maybe running top ? Or debug brick logs?
>>>>
>>>
>>> If we can reproduce the problem, collecting tcpdump on both ends of the
>>> connection will help. But one common problem is that these bugs are
>>> inconsistently reproducible, and hence we may not be able to capture
>>> tcpdump at the right intervals. Other than that, we can try to collect
>>> some evidence that poller threads were busy (waiting on locks). But I'm
>>> not sure what debug data provides that information.
>>>
>>> From what I know, it's difficult to collect evidence for this issue and
>>> we can only reason about it.
>>>
>>> We can try a workaround though - try increasing server.event-threads and
>>> see whether ping-timer expiry issues go away with an optimal value. If
>>> that's the case, it kind of provides proof for our hypothesis.
>>>
>>>
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici 
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> some minutes ago I received this message from NAGIOS server
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>

Re: [Gluster-devel] [Gluster-users] Experiences with FUSE in real world - Presentation at Vault 2019

2019-03-07 Thread Raghavendra Gowdappa
On Thu, Mar 7, 2019 at 4:51 PM Strahil  wrote:

> Thanks,
>
> I have nothing in mind - but I know from experience that live sessions are
> much more interesting and going in deep.
>

I'll schedule a Bluejeans session on this. Will update the thread with a
date and time.

Best Regards,
> Strahil Nikolov
> On Mar 7, 2019 08:54, Raghavendra Gowdappa  wrote:
>
> Unfortunately, there is no recording. However, we are willing to discuss
> our findings if you've specific questions. We can do that in this thread.
>
> On Thu, Mar 7, 2019 at 10:33 AM Strahil  wrote:
>
> Thanks a lot.
> Is there a recording of that ?
>
> Best Regards,
> Strahil Nikolov
> On Mar 5, 2019 11:13, Raghavendra Gowdappa  wrote:
>
> All,
>
> Recently me, Manoj and Csaba presented on positives and negatives of
> implementing File systems in userspace using FUSE [1]. We had based the
> talk on our experiences with Glusterfs having FUSE as the native interface.
> The slides can also be found at [1].
>
> [1] https://www.usenix.org/conference/vault19/presentation/pillai
>
> regards,
> Raghavendra
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Experiences with FUSE in real world - Presentation at Vault 2019

2019-03-06 Thread Raghavendra Gowdappa
Unfortunately, there is no recording. However, we are willing to discuss
our findings if you have specific questions. We can do that in this thread.

On Thu, Mar 7, 2019 at 10:33 AM Strahil  wrote:

> Thanks a lot.
> Is there a recording of that ?
>
> Best Regards,
> Strahil Nikolov
> On Mar 5, 2019 11:13, Raghavendra Gowdappa  wrote:
>
> All,
>
> Recently me, Manoj and Csaba presented on positives and negatives of
> implementing File systems in userspace using FUSE [1]. We had based the
> talk on our experiences with Glusterfs having FUSE as the native interface.
> The slides can also be found at [1].
>
> [1] https://www.usenix.org/conference/vault19/presentation/pillai
>
> regards,
> Raghavendra
>
>

Re: [Gluster-devel] [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-03-04 Thread Raghavendra Gowdappa
+Gluster Devel , +Gluster-users


I would like to point out another issue. Even if what I suggested prevents
disconnects, it would only be symptomatic treatment and wouldn't address
the root cause of the problem. In most ping-timer-expiry issues, the root
cause is increased load on bricks and the inability of bricks to remain
responsive under that load. So, the actual solution would be doing either
or both of the following:
* identify the source of the increased load and, if possible, throttle it.
Internal heal processes like self-heal, rebalance and quota heal are known
to pump traffic into bricks without much throttling (io-threads _might_ do
some throttling, but my understanding is that it's not sufficient).
* identify the reason bricks become unresponsive under load. These may be
fixable issues like not having enough event-threads to read from the
network, difficult-to-fix issues like fsync on the backend fs freezing the
process, or issues semi-fixable in code like lock contention.
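As a sketch of the first point, throttling internal heal traffic could be as simple as putting a token bucket in front of the brick's request path. This is only an illustration in Python (glusterfs itself is written in C); the class and numbers are hypothetical, not part of any glusterfs API:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (hypothetical sketch)."""
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Internal heal traffic would call allow() before issuing each request,
# while regular client I/O bypasses the bucket entirely.
bucket = TokenBucket(rate=100, capacity=10)
admitted = sum(1 for _ in range(50) if bucket.allow())
print(admitted)  # roughly the burst capacity; the other requests are deferred
```

A real implementation would defer (queue) rejected heal requests rather than drop them, but the admission logic is the same.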

So any genuine effort to fix ping-timer issues (to be honest, most of the
time they are not issues in rpc/network) would involve performance
characterization of the various subsystems on bricks and clients. These
subsystems include (but are not necessarily limited to) the underlying
OS/filesystem, glusterfs processes, CPU consumption etc.

regards,
Raghavendra

On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici  wrote:

> Thank you, let’s try!
> I will inform you about the effects of the change.
>
> Regards,
> Mauro
>
> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa  wrote:
>
>
>
> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici 
> wrote:
>
>> Hi Raghavendra,
>>
>> thank you for your reply.
>> Yes, you are right. It is a problem that seems to happen randomly.
>> At this moment, server.event-threads value is 4. I will try to increase
>> this value to 8. Do you think that it could be a valid value ?
>>
>
> Yes. We can try with that. You should see at least frequency of ping-timer
> related disconnects  reduce with this value (even if it doesn't eliminate
> the problem completely).
>
>
>> Regards,
>> Mauro
>>
>>
>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa 
>> wrote:
>>
>>
>>
>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran 
>> wrote:
>>
>>> Hi Mauro,
>>>
>>> It looks like some problem on s06. Are all your other nodes ok? Can you
>>> send us the gluster logs from this node?
>>>
>>> @Raghavendra G  , do you have any idea as to
>>> how this can be debugged? Maybe running top ? Or debug brick logs?
>>>
>>
>> If we can reproduce the problem, collecting tcpdump on both ends of
>> connection will help. But, one common problem is these bugs are
>> inconsistently reproducible and hence we may not be able to capture tcpdump
>> at correct intervals. Other than that, we can try to collect some evidence
>> that poller threads were busy (waiting on locks). But, not sure what debug
>> data provides that information.
>>
>> From what I know, its difficult to collect evidence for this issue and we
>> could only reason about it.
>>
>> We can try a workaround though - try increasing server.event-threads and
>> see whether ping-timer expiry issues go away with an optimal value. If
>> that's the case, it kind of provides proof for our hypothesis.
>>
>>
>>>
>>> Regards,
>>> Nithya
>>>
>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici 
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> some minutes ago I received this message from NAGIOS server
>>>>
>>>> ** Nagios **
>>>> Notification Type: PROBLEM
>>>> Service: Brick - /gluster/mnt2/brick
>>>> Host: s06
>>>> Address: s06-stg
>>>> State: CRITICAL
>>>> Date/Time: Mon Mar 4 10:25:33 CET 2019
>>>> Additional Info: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
>>>>
>>>> I checked the network, RAM and CPUs usage on s06 node and everything
>>>> seems to be ok.
>>>> No bricks are in error state. In /var/log/messages, I detected again a
>>>> crash of “check_vol_utili” that I think it is a module used by NRPE
>>>> executable (that is the NAGIOS client).
>>>>
>>>> Mar  4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general
>>>> protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in
>>>> libglusterfs.so.0.0.1[7

Re: [Gluster-devel] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
On Wed, Feb 13, 2019 at 11:16 AM Manoj Pillai  wrote:

>
>
> On Wed, Feb 13, 2019 at 10:51 AM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa 
>> wrote:
>>
>>> All,
>>>
>>> We've found perf xlators io-cache and read-ahead not adding any
>>> performance improvement. At best read-ahead is redundant due to kernel
>>> read-ahead
>>>
>>
>> One thing we are still figuring out is whether kernel read-ahead is
>> tunable. From what we've explored, it _looks_ like (may not be entirely
>> correct), ra is capped at 128KB. If that's the case, I am interested in few
>> things:
>> * Are there any realworld applications/usecases, which would benefit from
>> larger read-ahead (Manoj says block devices can do ra of 4MB)?
>>
>
> kernel read-ahead is adaptive but influenced by the read-ahead setting on
> the block device (/sys/block//queue/read_ahead_kb), which can be
> tuned. For RHEL specifically, the default is 128KB (last I checked) but the
> default RHEL tuned-profile, throughput-performance, bumps that up to 4MB.
> It should be fairly easy to rig up a test  where 4MB read-ahead on the
> block device gives better performance than 128KB read-ahead.
>

Thanks, Manoj. To add more context to what Manoj said: Glusterfs, being a
FUSE-based fs, is not exposed as a block device. So the first problem is
where/how to tune read-ahead at all, and I've listed other problems
earlier.
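To make the tuning discussion concrete: kernel readahead ramps its window up on detected sequential access, capped by the per-device setting (/sys/block/<dev>/queue/read_ahead_kb). A toy model of that ramp-up, with illustrative numbers rather than the exact kernel algorithm:

```python
def ra_window_sizes(initial_kb=16, cap_kb=128, steps=6):
    """Toy model: the readahead window doubles on sustained sequential
    reads until it hits the cap. The cap plays the role of
    /sys/block/<dev>/queue/read_ahead_kb; numbers are illustrative."""
    win, out = initial_kb, []
    for _ in range(steps):
        out.append(win)
        win = min(win * 2, cap_kb)
    return out

print(ra_window_sizes(cap_kb=128))   # window saturates at 128 KB
print(ra_window_sizes(cap_kb=4096))  # a 4 MB cap lets the window keep growing
```

The point of the model: with a 128 KB cap the window saturates almost immediately, while a 4 MB cap lets sequential workloads keep issuing ever larger reads, which is why the block-device tunable matters and why a FUSE fs has no obvious knob for it.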


> -- Manoj
>
> * Is the limit on kernel ra tunable a hard one? IOW, what does it take to
>> make it to do higher ra? If its difficult, can glusterfs read-ahead provide
>> the expected performance improvement for these applications that would
>> benefit from aggressive ra (as glusterfs can support larger ra sizes)?
>>
>> I am still inclined to prefer kernel ra as I think its more intelligent
>> and can identify more sequential patterns than Glusterfs read-ahead [1][2].
>> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-273-284.pdf
>> [2] https://lwn.net/Articles/155510/
>>
>> and at worst io-cache is degrading the performance for workloads that
>>> doesn't involve re-read. Given that VFS already have both these
>>> functionalities, I am proposing to have these two translators turned off by
>>> default for native fuse mounts.
>>>
>>> For non-native fuse mounts like gfapi (NFS-ganesha/samba) we can have
>>> these xlators on by having custom profiles. Comments?
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>>>
>>> regards,
>>> Raghavendra
>>>
>>

Re: [Gluster-devel] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa 
wrote:

> All,
>
> We've found perf xlators io-cache and read-ahead not adding any
> performance improvement. At best read-ahead is redundant due to kernel
> read-ahead
>

One thing we are still figuring out is whether kernel read-ahead is
tunable. From what we've explored, it _looks_ like (though this may not be
entirely correct) ra is capped at 128KB. If that's the case, I am
interested in a few things:
* Are there any real-world applications/use cases that would benefit from
larger read-ahead (Manoj says block devices can do ra of 4MB)?
* Is the limit on the kernel ra tunable a hard one? IOW, what does it take
to make it do higher ra? If it's difficult, can glusterfs read-ahead
provide the expected performance improvement for the applications that
would benefit from aggressive ra (as glusterfs can support larger ra
sizes)?

I am still inclined to prefer kernel ra, as I think it's more intelligent
and can identify more sequential patterns than Glusterfs read-ahead [1][2].
[1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-273-284.pdf
[2] https://lwn.net/Articles/155510/

> and at worst io-cache is degrading the performance for workloads that
> doesn't involve re-read. Given that VFS already have both these
> functionalities, I am proposing to have these two translators turned off by
> default for native fuse mounts.
>
> For non-native fuse mounts like gfapi (NFS-ganesha/samba) we can have
> these xlators on by having custom profiles. Comments?
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>
> regards,
> Raghavendra
>

Re: [Gluster-devel] [Gluster-users] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
On Tue, Feb 12, 2019 at 11:09 PM Darrell Budic 
wrote:

> Is there an example of a custom profile you can share for my ovirt use
> case (with gfapi enabled)?
>

I was speaking about a group setting like "group metadata-cache": just the
set of custom options one would turn on together for a class of
applications or problems.
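For illustration, a group profile is a plain option=value file under /var/lib/glusterd/groups/; a hypothetical one for a read-heavy gfapi workload might look like this (the name and values are examples, not recommendations):

```
# /var/lib/glusterd/groups/gfapi-cache   (hypothetical profile name)
# applied with: gluster volume set <volname> group gfapi-cache
performance.read-ahead=on
performance.io-cache=on
performance.cache-size=128MB
```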

Or are you just talking about the standard group settings for virt as a
> custom profile?
>
> On Feb 12, 2019, at 7:22 AM, Raghavendra Gowdappa 
> wrote:
>
> https://review.gluster.org/22203
>
> On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> We've found perf xlators io-cache and read-ahead not adding any
>> performance improvement. At best read-ahead is redundant due to kernel
>> read-ahead and at worst io-cache is degrading the performance for workloads
>> that doesn't involve re-read. Given that VFS already have both these
>> functionalities, I am proposing to have these two translators turned off by
>> default for native fuse mounts.
>>
>> For non-native fuse mounts like gfapi (NFS-ganesha/samba) we can have
>> these xlators on by having custom profiles. Comments?
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>>
>> regards,
>> Raghavendra
>>
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>

Re: [Gluster-devel] Failing test case ./tests/bugs/distribute/bug-1161311.t

2019-02-12 Thread Raghavendra Gowdappa
On Wed, Feb 13, 2019 at 9:54 AM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

>
>
> On Wed, Feb 13, 2019 at 9:51 AM Nithya Balachandran 
> wrote:
>
>> I'll take a look at this today. The logs indicate the test completed in
>> under 3 minutes but something seems to be holding up the cleanup.
>>
>>
> Just a look on some successful runs show output like below:
>
> --
>
> 17:44:49 ok 57, LINENUM:155
> 17:44:49 umount: /d/backends/patchy1: target is busy.
> 17:44:49 (In some cases useful info about processes that use
> 17:44:49  the device is found by lsof(8) or fuser(1))
> 17:44:49 umount: /d/backends/patchy2: target is busy.
> 17:44:49 (In some cases useful info about processes that use
> 17:44:49  the device is found by lsof(8) or fuser(1))
> 17:44:49 umount: /d/backends/patchy3: target is busy.
> 17:44:49 (In some cases useful info about processes that use
> 17:44:49  the device is found by lsof(8) or fuser(1))
> 17:44:49 N
> 17:44:49 ok
>
> --
>
> This is just before finish, so , the cleanup is being held for sure.
>

Yes, I saw these messages in my tests too. But I thought they were not
counted toward the waiting time.


> Regards,
> Amar
>
> On Tue, 12 Feb 2019 at 19:30, Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>>
>>> On Tue, Feb 12, 2019 at 7:16 PM Mohit Agrawal 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have observed the test case ./tests/bugs/distribute/bug-1161311.t is
>>>> getting timed
>>>>
>>>
>>> I've seen failure of this too in some of my patches.
>>>
>>>> out on build server at the time of running centos regression on one of
>>>> my patch https://review.gluster.org/22166
>>>>
>>>> I have executed test case for i in {1..30}; do time prove -vf
>>>> ./tests/bugs/distribute/bug-1161311.t; done 30 times on softserv vm that is
>>>> similar to build infra, the test case is not taking time more than 3
>>>> minutes but on build server test case is getting timed out.
>>>>
>>>> Kindly share your input if you are facing the same.
>>>>
>>>> Thanks,
>>>> Mohit Agrawal
>>>> ___
>>>> Gluster-devel mailing list
>>>> Gluster-devel@gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
>
> --
> Amar Tumballi (amarts)
>

Re: [Gluster-devel] Failing test case ./tests/bugs/distribute/bug-1161311.t

2019-02-12 Thread Raghavendra Gowdappa
On Tue, Feb 12, 2019 at 7:16 PM Mohit Agrawal  wrote:

> Hi,
>
> I have observed the test case ./tests/bugs/distribute/bug-1161311.t is
> getting timed
>

I've seen this failure too in some of my patches.

> out on build server at the time of running centos regression on one of my
> patch https://review.gluster.org/22166
>
> I have executed test case for i in {1..30}; do time prove -vf
> ./tests/bugs/distribute/bug-1161311.t; done 30 times on softserv vm that is
> similar to build infra, the test case is not taking time more than 3
> minutes but on build server test case is getting timed out.
>
> Kindly share your input if you are facing the same.
>
> Thanks,
> Mohit Agrawal
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
https://review.gluster.org/22203

On Tue, Feb 12, 2019 at 5:38 PM Raghavendra Gowdappa 
wrote:

> All,
>
> We've found perf xlators io-cache and read-ahead not adding any
> performance improvement. At best read-ahead is redundant due to kernel
> read-ahead and at worst io-cache is degrading the performance for workloads
> that doesn't involve re-read. Given that VFS already have both these
> functionalities, I am proposing to have these two translators turned off by
> default for native fuse mounts.
>
> For non-native fuse mounts like gfapi (NFS-ganesha/samba) we can have
> these xlators on by having custom profiles. Comments?
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029
>
> regards,
> Raghavendra
>

[Gluster-devel] Disabling read-ahead and io-cache for native fuse mounts

2019-02-12 Thread Raghavendra Gowdappa
All,

We've found that the perf xlators io-cache and read-ahead do not add any
performance improvement. At best, read-ahead is redundant due to kernel
read-ahead, and at worst, io-cache degrades performance for workloads that
don't involve re-reads. Given that the VFS already provides both these
functionalities, I am proposing to turn these two translators off by
default for native fuse mounts.

For non-native fuse mounts like gfapi (NFS-ganesha/samba) we can have these
xlators on by having custom profiles. Comments?
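Concretely, the proposal amounts to flipping two existing volume options, which would look like this on the CLI (volume name is a placeholder):

```
gluster volume set <volname> performance.read-ahead off
gluster volume set <volname> performance.io-cache off
```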

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1665029

regards,
Raghavendra

[Gluster-devel] Memory management, OOM kills and glusterfs

2019-02-04 Thread Raghavendra Gowdappa
All,

Csaba, Manoj and I are presenting our experiences with using FUSE as an
interface for Glusterfs at Vault '19 [1]. One of the areas where Glusterfs
has faced difficulties is memory management. One of the reasons for high
memory consumption has been the amount of memory the glusterfs mount
process consumes to maintain the inodes looked up by the kernel. Though we
have a solution [2] for this problem, things would have been much easier
and more effective if Glusterfs were in kernel space (for the purposes of
memory management). There, the memory consumed by inodes would be accounted
to the kernel's inode cache, and kernel memory management would handle the
inodes more effectively and intelligently. Being in userspace, however,
there is no way to account for the memory an inode consumes, and hence only
a very small part of that memory gets accounted (the inode maintained by
the fuse kernel module).
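The workaround [2] bounds how many inodes the mount process retains by evicting least-recently-used entries beyond a limit. A minimal Python sketch of that idea; the names are illustrative, not glusterfs internals:

```python
from collections import OrderedDict

class InodeTable:
    """LRU-bounded inode cache: a userspace stand-in for the eviction the
    kernel would do automatically if the inodes lived in kernel memory."""
    def __init__(self, lru_limit):
        self.lru_limit = lru_limit
        self.inodes = OrderedDict()   # gfid -> inode metadata

    def lookup(self, gfid):
        if gfid in self.inodes:
            self.inodes.move_to_end(gfid)        # mark as recently used
        else:
            self.inodes[gfid] = {"gfid": gfid}
            if len(self.inodes) > self.lru_limit:
                self.inodes.popitem(last=False)  # evict the LRU inode
        return self.inodes[gfid]

table = InodeTable(lru_limit=3)
for g in ["a", "b", "c", "a", "d"]:   # "b" becomes least recently used
    table.lookup(g)
print(list(table.inodes))  # bounded at 3 entries; "b" was evicted
```

The trade-off is that an evicted inode must be looked up again on next access; the benefit is that memory stays bounded regardless of how many inodes the kernel has looked up over time.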

The objective of this mail is to collect more cases/user issues/bugs such
as these so that we can present them as evidence. I am currently aware of a
tracker issue [3] which covers the issue I mentioned above.

Also, if you are aware of any other memory management issues, we are
interested in them.

[1] https://www.usenix.org/conference/vault19/presentation/pillai
[2] https://review.gluster.org/#/c/glusterfs/+/19778/
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1647277

Re: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench

2019-01-25 Thread Raghavendra Gowdappa
On Sat, Jan 26, 2019 at 8:03 AM Raghavendra Gowdappa 
wrote:

>
>
> On Fri, Jan 11, 2019 at 8:09 PM Raghavendra Gowdappa 
> wrote:
>
>> Here is the update of the progress till now:
>> * The client profile attached till now shows the tuple creation is
>> dominated by writes and fstats. Note that fstats are side-effects of writes
>> as writes invalidate attributes of the file from kernel attribute cache.
>> * The rest of the init phase (which is marked by msgs "setting primary
>> key" and "vaccuum") is dominated by reads. Next bigger set of operations
>> are writes followed by fstats.
>>
>> So, only writes, reads and fstats are the operations we need to optimize
>> to reduce the init time latency. As mentioned in my previous mail, I did
>> following tunings:
>> * Enabled only write-behind, md-cache and open-behind.
>> - write-behind was configured with a cache-size/window-size of 20MB
>> - open-behind was configured with read-after-open yes
>> - md-cache was loaded as a child of write-behind in xlator graph. As
>> a parent of write-behind, writes responses of writes cached in write-behind
>> would invalidate stats. But when loaded as a child of write-behind this
>> problem won't be there. Note that in both cases fstat would pass through
>> write-behind (In the former case due to no stats in md-cache). However in
>> the latter case fstats can be served by md-cache.
>> - md-cache used to aggressively invalidate inodes. For the purpose of
>> this test, I just commented out inode-invalidate code in md-cache. We need
>> to fine tune the invalidation invocation logic.
>> - set group-metadata-cache to on. But turned off upcall
>> notifications. Note that since this workload basically accesses all its
>> data through single mount point. So, there is no shared files across mounts
>> and hence its safe to turn off invalidations.
>> * Applied fix to https://bugzilla.redhat.com/show_bug.cgi?id=1648781
>>
>> With the above set of tunings I could reduce the init time of scale 8000
>> from 16.6 hrs to 11.4 hrs - an improvement in the range 25% to 30%
>>
>> Since the workload is dominated by reads, we think a good read-cache
>> where reads to regions just written are served from cache would greatly
>> improve the performance. Since kernel page-cache already provides that
>> functionality along with read-ahead (which is more intelligent and serves
>> more read patterns than supported by Glusterfs read-ahead), we wanted to
>> try that. But, Manoj found a bug where reads followed by writes are not
>> served from page cache [5]. I am currently waiting for the resolution of
>> this bug. As an alternative, I can modify io-cache to serve reads from the
>> data just written. But, the change involves its challenges and hence would
>> like to get a resolution on [5] (either positive or negative) before
>> proceeding with modifications to io-cache.
>>
>> As to the rpc latency, Krutika had long back identified that reading a
>> single rpc message involves atleast 4 reads to socket. These many number of
>> reads were done to identify the structure of the message on the go. The
>> reason we wanted to discover the rpc message was to identify the part of
>> the rpc message containing read or write payload and make sure that payload
>> is directly read into a buffer different than the one containing rest of
>> the rpc message. This strategy will make sure payloads are not copied again
>> when buffers are moved across caches (read-ahead, io-cache etc) and also
>> the rest of the rpc message can be freed even though the payload outlives
>> the rpc message (when payloads are cached). However, we can experiment an
>> approach where we can either do away with zero-copy requirement or let the
>> entire buffer containing rpc message and payload to live in the cache.
>>
>> From my observations and discussions with Manoj and Xavi, this workload
>> is very sensitive to latency (than to concurrency). So, I am hopeful the
>> above approaches will give positive results.
>>
>
> Me, Manoj and Csaba figured out that invalidations by md-cache and Fuse
> auto-invalidations  were dropping the kernel page-cache (more details on
> [5]). Changes to stats by writes from same client (local writes) were
> triggering both these codepaths dropping the cache. Since all the I/O done
> by this workload goes through the caches of single client, the
> invalidations are not necessary and I made code changes to fuse-bridge to
> disable auto-invalidations completely and commented out inode-invalidations
> in md-cache. Note that thi

Re: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench

2019-01-25 Thread Raghavendra Gowdappa
On Sat, Jan 26, 2019 at 8:03 AM Raghavendra Gowdappa 
wrote:

>
>
> On Fri, Jan 11, 2019 at 8:09 PM Raghavendra Gowdappa 
> wrote:
>
>> Here is the update of the progress till now:
>> * The client profile attached till now shows the tuple creation is
>> dominated by writes and fstats. Note that fstats are side-effects of writes
>> as writes invalidate attributes of the file from kernel attribute cache.
>> * The rest of the init phase (which is marked by msgs "setting primary
>> key" and "vaccuum") is dominated by reads. Next bigger set of operations
>> are writes followed by fstats.
>>
>> So, only writes, reads and fstats are the operations we need to optimize
>> to reduce the init time latency. As mentioned in my previous mail, I did
>> following tunings:
>> * Enabled only write-behind, md-cache and open-behind.
>> - write-behind was configured with a cache-size/window-size of 20MB
>> - open-behind was configured with read-after-open yes
>> - md-cache was loaded as a child of write-behind in xlator graph. As
>> a parent of write-behind, writes responses of writes cached in write-behind
>> would invalidate stats. But when loaded as a child of write-behind this
>> problem won't be there. Note that in both cases fstat would pass through
>> write-behind (In the former case due to no stats in md-cache). However in
>> the latter case fstats can be served by md-cache.
>> - md-cache used to aggressively invalidate inodes. For the purpose of
>> this test, I just commented out inode-invalidate code in md-cache. We need
>> to fine tune the invalidation invocation logic.
>> - set group-metadata-cache to on. But turned off upcall
>> notifications. Note that since this workload basically accesses all its
>> data through single mount point. So, there is no shared files across mounts
>> and hence its safe to turn off invalidations.
>> * Applied fix to https://bugzilla.redhat.com/show_bug.cgi?id=1648781
>>
>> With the above set of tunings I could reduce the init time of scale 8000
>> from 16.6 hrs to 11.4 hrs - an improvement in the range 25% to 30%
>>
>> Since the workload is dominated by reads, we think a good read-cache
>> where reads to regions just written are served from cache would greatly
>> improve the performance. Since kernel page-cache already provides that
>> functionality along with read-ahead (which is more intelligent and serves
>> more read patterns than supported by Glusterfs read-ahead), we wanted to
>> try that. But, Manoj found a bug where reads followed by writes are not
>> served from page cache [5]. I am currently waiting for the resolution of
>> this bug. As an alternative, I can modify io-cache to serve reads from the
>> data just written. But, the change involves its challenges and hence would
>> like to get a resolution on [5] (either positive or negative) before
>> proceeding with modifications to io-cache.
>>
>> As to the rpc latency, Krutika had long back identified that reading a
>> single rpc message involves atleast 4 reads to socket. These many number of
>> reads were done to identify the structure of the message on the go. The
>> reason we wanted to discover the rpc message was to identify the part of
>> the rpc message containing read or write payload and make sure that payload
>> is directly read into a buffer different than the one containing rest of
>> the rpc message. This strategy will make sure payloads are not copied again
>> when buffers are moved across caches (read-ahead, io-cache etc) and also
>> the rest of the rpc message can be freed even though the payload outlives
>> the rpc message (when payloads are cached). However, we can experiment an
>> approach where we can either do away with zero-copy requirement or let the
>> entire buffer containing rpc message and payload to live in the cache.
>>
>> From my observations and discussions with Manoj and Xavi, this workload
>> is very sensitive to latency (than to concurrency). So, I am hopeful the
>> above approaches will give positive results.
>>
>
> Me, Manoj and Csaba figured out that invalidations by md-cache and Fuse
> auto-invalidations  were dropping the kernel page-cache (more details on
> [5]).
>

Thanks to Miklos for the pointer on auto-invalidations.


> Changes to stats by writes from same client (local writes) were triggering
> both these codepaths dropping the cache. Since all the I/O done by this
> workload goes through the caches of single client, the invalidations are
> not necessary and I made code changes to fuse-bridge to disable
> auto-invalidations complete

Re: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench

2019-01-25 Thread Raghavendra Gowdappa
On Fri, Jan 11, 2019 at 8:09 PM Raghavendra Gowdappa 
wrote:

> Here is the update of the progress till now:
> * The client profile attached till now shows the tuple creation is
> dominated by writes and fstats. Note that fstats are side-effects of writes
> as writes invalidate attributes of the file from kernel attribute cache.
> * The rest of the init phase (which is marked by msgs "setting primary
> key" and "vaccuum") is dominated by reads. Next bigger set of operations
> are writes followed by fstats.
>
> So, only writes, reads and fstats are the operations we need to optimize
> to reduce the init time latency. As mentioned in my previous mail, I did
> following tunings:
> * Enabled only write-behind, md-cache and open-behind.
> - write-behind was configured with a cache-size/window-size of 20MB
> - open-behind was configured with read-after-open yes
> - md-cache was loaded as a child of write-behind in xlator graph. As a
> parent of write-behind, writes responses of writes cached in write-behind
> would invalidate stats. But when loaded as a child of write-behind this
> problem won't be there. Note that in both cases fstat would pass through
> write-behind (In the former case due to no stats in md-cache). However in
> the latter case fstats can be served by md-cache.
> - md-cache used to aggressively invalidate inodes. For the purpose of
> this test, I just commented out inode-invalidate code in md-cache. We need
> to fine tune the invalidation invocation logic.
> - set group-metadata-cache to on. But turned off upcall notifications.
> Note that since this workload basically accesses all its data through
> single mount point. So, there is no shared files across mounts and hence
> its safe to turn off invalidations.
> * Applied fix to https://bugzilla.redhat.com/show_bug.cgi?id=1648781
>
> With the above set of tunings I could reduce the init time of scale 8000
> from 16.6 hrs to 11.4 hrs - an improvement in the range 25% to 30%
>
> Since the workload is dominated by reads, we think a good read-cache where
> reads to regions just written are served from cache would greatly improve
> the performance. Since kernel page-cache already provides that
> functionality along with read-ahead (which is more intelligent and serves
> more read patterns than supported by Glusterfs read-ahead), we wanted to
> try that. But, Manoj found a bug where reads followed by writes are not
> served from page cache [5]. I am currently waiting for the resolution of
> this bug. As an alternative, I can modify io-cache to serve reads from the
> data just written. But, the change involves its own challenges and hence I
> would like to get a resolution on [5] (either positive or negative) before
> proceeding with modifications to io-cache.
>
> As to the rpc latency, Krutika had long back identified that reading a
> single rpc message involves at least 4 reads on the socket. These multiple
> reads were done to discover the structure of the message on the go. The
> reason we wanted to discover the rpc message structure was to identify the
> part of the rpc message containing the read or write payload and make sure
> that payload is read directly into a buffer separate from the one
> containing the rest of the rpc message. This strategy makes sure payloads
> are not copied again when buffers are moved across caches (read-ahead,
> io-cache etc.) and also that the rest of the rpc message can be freed even
> though the payload outlives the rpc message (when payloads are cached).
> However, we can experiment with an approach where we either do away with
> the zero-copy requirement or let the entire buffer containing the rpc
> message and payload live in the cache.
>
> From my observations and discussions with Manoj and Xavi, this workload
> is more sensitive to latency than to concurrency. So, I am hopeful the
> above approaches will give positive results.
>

Manoj, Csaba and I figured out that invalidations by md-cache and Fuse
auto-invalidations were dropping the kernel page-cache (more details on
[5]). Changes to stats by writes from the same client (local writes) were
triggering both these codepaths, dropping the cache. Since all the I/O done
by this workload goes through the caches of a single client, the
invalidations are not necessary, and I made code changes to fuse-bridge to
disable auto-invalidations completely and commented out inode-invalidations
in md-cache. Note that this doesn't regress the consistency/coherency of
data seen in the caches as it's a single-client use-case. With these two
changes coupled with earlier optimizations (client-io-threads=on,
server/client-event-threads=4, md-cache as a child of write-behind in the
xlator graph, performance.md-cache-timeout=600), pgbench init of scale 8000
on a volume with an NVMe backend completed in 54m.

Re: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench

2019-01-11 Thread Raghavendra Gowdappa
Here is the update of the progress till now:
* The client profile attached till now shows the tuple creation is
dominated by writes and fstats. Note that fstats are side-effects of writes,
as writes invalidate the file's attributes in the kernel attribute cache.
* The rest of the init phase (which is marked by msgs "setting primary key"
and "vacuum") is dominated by reads. The next biggest set of operations is
writes followed by fstats.

So, only writes, reads and fstats are the operations we need to optimize to
reduce the init time latency. As mentioned in my previous mail, I did
following tunings:
* Enabled only write-behind, md-cache and open-behind.
- write-behind was configured with a cache-size/window-size of 20MB
- open-behind was configured with read-after-open yes
- md-cache was loaded as a child of write-behind in the xlator graph. As a
parent of write-behind, responses of writes cached in write-behind would
invalidate stats, but when loaded as a child of write-behind this
problem won't be there. Note that in both cases fstat would pass through
write-behind (in the former case due to no stats in md-cache). However, in
the latter case fstats can be served by md-cache.
- md-cache used to aggressively invalidate inodes. For the purpose of
this test, I just commented out inode-invalidate code in md-cache. We need
to fine tune the invalidation invocation logic.
- set group-metadata-cache to on, but turned off upcall notifications.
Note that this workload accesses all its data through a single mount
point, so there are no shared files across mounts and hence it's safe to
turn off invalidations.
* Applied fix to https://bugzilla.redhat.com/show_bug.cgi?id=1648781

With the above set of tunings I could reduce the init time of scale 8000
from 16.6 hrs to 11.4 hrs - an improvement in the range 25% to 30%

Since the workload is dominated by reads, we think a good read-cache where
reads to regions just written are served from cache would greatly improve
the performance. Since kernel page-cache already provides that
functionality along with read-ahead (which is more intelligent and serves
more read patterns than supported by Glusterfs read-ahead), we wanted to
try that. But, Manoj found a bug where reads followed by writes are not
served from page cache [5]. I am currently waiting for the resolution of
this bug. As an alternative, I can modify io-cache to serve reads from the
data just written. But, the change involves its own challenges and hence I
would like to get a resolution on [5] (either positive or negative) before
proceeding with modifications to io-cache.

As to the rpc latency, Krutika had long back identified that reading a
single rpc message involves at least 4 reads on the socket. These multiple
reads were done to discover the structure of the message on the go. The
reason we wanted to discover the rpc message structure was to identify the
part of the rpc message containing the read or write payload and make sure
that payload is read directly into a buffer separate from the one
containing the rest of the rpc message. This strategy makes sure payloads
are not copied again when buffers are moved across caches (read-ahead,
io-cache etc.) and also that the rest of the rpc message can be freed even
though the payload outlives the rpc message (when payloads are cached).
However, we can experiment with an approach where we either do away with
the zero-copy requirement or let the entire buffer containing the rpc
message and payload live in the cache.
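The payload-separation idea above can be sketched outside GlusterFS as
follows. This is a toy Python model, not the actual rpc-transport code: the
fixed `header_len` and the simple ONC-RPC-style record marker are
simplifying assumptions. The point it illustrates is that the payload is
read straight into a caller-owned buffer, so it can be handed to a cache
without copying and outlive the header buffer:

```python
import io
import struct

def read_rpc_record(stream, header_len, payload_buf):
    """Read one RPC record in pieces. The first header_len bytes (rpc +
    program header) land in a small short-lived buffer; the remainder
    (the read/write payload) is read with readinto() straight into
    payload_buf, a caller-owned buffer."""
    # Read 1: the 4-byte record marker (high bit: last fragment,
    # low 31 bits: fragment length), as in ONC-RPC record marking.
    (marker,) = struct.unpack(">I", stream.read(4))
    frag_len = marker & 0x7FFFFFFF
    # Read 2: the rpc message header, into its own buffer.
    header = stream.read(header_len)
    # Read 3: the payload, directly into the caller's buffer (no copy).
    payload_len = frag_len - header_len
    n = stream.readinto(memoryview(payload_buf)[:payload_len])
    assert n == payload_len
    return header, payload_len

# Simulate a record: 64-byte header followed by an 8KB write payload.
payload = bytes(range(256)) * 32                 # 8192 bytes
record = struct.pack(">I", 0x80000000 | (64 + len(payload))) + b"H" * 64 + payload
cache_buf = bytearray(len(payload))              # buffer owned by a cache layer
header, n = read_rpc_record(io.BytesIO(record), 64, cache_buf)
# The header buffer can now be freed while cache_buf keeps the payload.
```

In the real transport the header length is not known up front, which is
why discovering the message structure on the go costs extra reads.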

From my observations and discussions with Manoj and Xavi, this workload is
more sensitive to latency than to concurrency. So, I am hopeful the above
approaches will give positive results.

[5] https://bugzilla.redhat.com/show_bug.cgi?id=1664934

regards,
Raghavendra

On Fri, Dec 28, 2018 at 12:44 PM Raghavendra Gowdappa 
wrote:

>
>
> On Mon, Dec 24, 2018 at 6:05 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay <
>> sankarshan.mukhopadh...@gmail.com> wrote:
>>
>>> [pulling the conclusions up to enable better in-line]
>>>
>>> > Conclusions:
>>> >
>>> > We should never have a volume with caching-related xlators disabled.
>>> The price we pay for it is too high. We need to make them work consistently
>>> and aggressively to avoid as many requests as we can.
>>>
>>> Are there current issues in terms of behavior which are known/observed
>>> when these are enabled?
>>>
>>
>> We did have issues with pgbench in the past. But they have been fixed.
>> Please refer to bz [1] for details. On 5.1, it runs successfully with all
>> caching related xlators enabled. Having said that the only performance
>> xlators which gave improved performance were open-behind and write-behind
>> [2] (write-behin

Re: [Gluster-devel] [Gluster-users] On making ctime generator enabled by default in stack

2019-01-02 Thread Raghavendra Gowdappa
On Mon, Nov 12, 2018 at 10:48 AM Amar Tumballi  wrote:

>
>
> On Mon, Nov 12, 2018 at 10:39 AM Vijay Bellur  wrote:
>
>>
>>
>> On Sun, Nov 11, 2018 at 8:25 PM Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>>
>>> On Sun, Nov 11, 2018 at 11:41 PM Vijay Bellur 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Nov 5, 2018 at 8:31 PM Raghavendra Gowdappa <
>>>> rgowd...@redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 6, 2018 at 9:58 AM Vijay Bellur 
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 5, 2018 at 7:56 PM Raghavendra Gowdappa <
>>>>>> rgowd...@redhat.com> wrote:
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> There is a patch [1] from Kotresh, which makes ctime generator as
>>>>>>> default in stack. Currently ctime generator is being recommended only 
>>>>>>> for
>>>>>>> usecases where ctime is important (like for Elasticsearch). However, a
>>>>>>> reliable (c)(m)time can fix many consistency issues within glusterfs 
>>>>>>> stack
>>>>>>> too. These are issues with caching layers having stale (meta)data
>>>>>>> [2][3][4]. Basically just like applications, components within glusterfs
>>>>>>> stack too need a time to find out which among racing ops (like write, 
>>>>>>> stat,
>>>>>>> etc) has latest (meta)data.
>>>>>>>
>>>>>>> Also note that a consistent (c)(m)time is not an optional feature,
>>>>>>> but instead forms the core of the infrastructure. So, I am proposing to
>>>>>>> merge this patch. If you've any objections, please voice out before Nov 
>>>>>>> 13,
>>>>>>> 2018 (a week from today).
>>>>>>>
>>>>>>> As to the existing known issues/limitations with ctime generator, my
>>>>>>> conversations with Kotresh, revealed following:
>>>>>>> * Potential performance degradation (we don't yet have data to
>>>>>>> conclusively prove it, preliminary basic tests from Kotresh didn't 
>>>>>>> indicate
>>>>>>> a significant perf drop).
>>>>>>>
>>>>>>
>>>>>> Do we have this data captured somewhere? If not, would it be possible
>>>>>> to share that data here?
>>>>>>
>>>>>
>>>>> I misquoted Kotresh. He had measured impact of gfid2path and said both
>>>>> features might've similar impact as major perf cost is related to storing
>>>>> xattrs on backend fs. I am in the process of getting a fresh set of
>>>>> numbers. Will post those numbers when available.
>>>>>
>>>>>
>>>>
>>>> I observe that the patch under discussion has been merged now [1]. A
>>>> quick search did not yield me any performance data. Do we have the
>>>> performance numbers posted somewhere?
>>>>
>>>
>>> No. Perf benchmarking is a task pending on me.
>>>
>>
>> When can we expect this task to be complete?
>>
>> In any case, I don't think it is ideal for us to merge a patch without
>> completing our due diligence on it. How do we want to handle this scenario
>> since the patch is already merged?
>>
>> We could:
>>
>> 1. Revert the patch now
>> 2. Review the performance data and revert the patch if performance
>> characterization indicates a significant dip. It would be preferable to
>> complete this activity before we branch off for the next release.
>>
>
> I am for option 2. Considering the branch out for next release is another
> 2 months, and no one is expected to use the 'release' off a master branch
> yet, it makes sense to give that buffer time to get this activity completed.
>

It's unlikely I'll have time to carry out the perf benchmark. Hence I've
posted a revert here: https://review.gluster.org/#/c/glusterfs/+/21975/


> Regards,
> Amar
>
> 3. Think of some other option?
>>
>> Thanks,
>> Vijay
>>
>>
>>> ___
>> Gluster-users mailing list
>> gluster-us...@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> --
> Amar Tumballi (amarts)
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] [DHT] serialized readdir(p) across subvols and effect on performance

2018-12-31 Thread Raghavendra Gowdappa
All,

As many of us are aware, readdir(p)s are serialized across DHT subvols. One
of the intuitive first reactions to this algorithm is that readdir(p) is
going to be slow.

However, this is only partly true, as reading the contents of a directory
is normally split into multiple readdir(p) calls, and most of the time
(when a directory is sufficiently large that its dentries and inode data
on each subvol are bigger than a typical readdir(p) buffer size - 128K
when readdir-ahead is enabled and 4KB on fuse when readdir-ahead is
disabled) a single readdir(p) request is served from a single subvolume
(or two subvolumes in the worst case), and hence a single readdir(p) is
not serialized across all subvolumes.
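As an illustration, the behaviour above can be sketched with a toy model.
This is not DHT's actual implementation: offsets and entry sizes are
simplified to "one dentry per unit", and the subvol-encoded directory
offset is modelled as a plain (subvol, offset) pair:

```python
def readdirp(subvols, start_subvol, offset, bufsize):
    """Toy model of DHT's serialized readdir(p): fill up to bufsize
    dentries, moving to the next subvol only once the current one is
    exhausted."""
    entries, sv, off = [], start_subvol, offset
    while sv < len(subvols) and len(entries) < bufsize:
        chunk = subvols[sv][off:off + bufsize - len(entries)]
        entries.extend(chunk)
        off += len(chunk)
        if off < len(subvols[sv]):      # buffer filled before the subvol ended:
            return entries, sv, off     # this call touched a single subvol
        sv, off = sv + 1, 0             # subvol exhausted: move to the next one
    return entries, sv, off

# Large directory spread over 4 subvols: one call is served by subvol 0 alone.
big = [[f"s{i}-f{j}" for j in range(1000)] for i in range(4)]
entries, sv, off = readdirp(big, 0, 0, 128)   # 128 entries, sv == 0

# Empty directory: the very same call winds serially through all 4 subvols.
entries, sv, off = readdirp([[], [], [], []], 0, 0, 128)  # [], sv == 4
```

The second call is exactly the case where a single readdir(p) is serialized
across every subvolume.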

Having said that, there are definitely cases where a single readdir(p)
request can be serialized across many subvolumes. The best example of this
is a readdir(p) request on an empty directory. Other relevant examples are
directories which don't have enough dentries to fit into a single
readdir(p) buffer size on each subvolume of DHT. This is where
performance.parallel-readdir helps. Also, note that this is the same reason
why having a cache-size for each readdir-ahead (loaded as a parent of each
DHT subvolume) way bigger than a single readdir(p) buffer size won't really
improve performance in proportion to cache-size when
performance.parallel-readdir is enabled.

Though this is not a new observation [1] (I stumbled upon [1] after
realizing the above myself independently while working on
performance.parallel-readdir), I felt this is a common misconception (I ran
into a similar argument while trying to explain the DHT architecture to
someone new to Glusterfs recently) and hence thought of writing out a mail
to clarify the same.

Wish you a happy new year 2019 :).

[1] https://lists.gnu.org/archive/html/gluster-devel/2013-09/msg00034.html

regards,
Raghavendra

Re: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench

2018-12-27 Thread Raghavendra Gowdappa
On Mon, Dec 24, 2018 at 6:05 PM Raghavendra Gowdappa 
wrote:

>
>
> On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay <
> sankarshan.mukhopadh...@gmail.com> wrote:
>
>> [pulling the conclusions up to enable better in-line]
>>
>> > Conclusions:
>> >
>> > We should never have a volume with caching-related xlators disabled.
>> The price we pay for it is too high. We need to make them work consistently
>> and aggressively to avoid as many requests as we can.
>>
>> Are there current issues in terms of behavior which are known/observed
>> when these are enabled?
>>
>
> We did have issues with pgbench in the past. But they have been fixed.
> Please refer to bz [1] for details. On 5.1, it runs successfully with all
> caching related xlators enabled. Having said that the only performance
> xlators which gave improved performance were open-behind and write-behind
> [2] (write-behind had some issues, which will be fixed by [3] and we'll
> have to measure performance again with fix to [3]).
>

One quick update. Enabling write-behind and md-cache with the fix for [3]
reduced the total time taken for the pgbench init phase by roughly 20%-25%
(from 12.5 min to 9.75 min for a scale of 100), though this is still a huge
time (around 12 hrs for a db of scale 8000). I'll follow up with a detailed
report once my experiments are complete. Currently I am trying to optimize
the read path.


> For some reason, read-side caching didn't improve transactions per second.
> I am working on this problem currently. Note that these bugs measure
> transaction phase of pgbench, but what xavi measured in his mail is init
> phase. Nevertheless, evaluation of read caching (metadata/data) will still
> be relevant for init phase too.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1629589#c4
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1648781
>
>
>> > We need to analyze client/server xlators deeper to see if we can avoid
>> some delays. However optimizing something that is already at the
>> microsecond level can be very hard.
>>
>> That is true - are there any significant gains which can be accrued by
>> putting efforts here or, should this be a lower priority?
>>
>
> The problem identified by xavi is also the one we (Manoj, Krutika, me and
> Milind) had encountered in the past [4]. The solution we used was to have
> multiple rpc connections between single brick and client. The solution
> indeed fixed the bottleneck. So, there is definitely work involved here -
> either to fix the single connection model or go with multiple connection
> model. It's preferred to improve the single connection and resort to multiple
> connections only if bottlenecks in single connection are not fixable.
> Personally I think this is high priority along with having appropriate
> client side caching.
>
> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c52
>
>
>> > We need to determine what causes the fluctuations in brick side and
>> avoid them.
>> > This scenario is very similar to a smallfile/metadata workload, so this
>> is probably one important cause of its bad performance.
>>
>> What kind of instrumentation is required to enable the determination?
>>
>> On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez 
>> wrote:
>> >
>> > Hi,
>> >
>> > I've done some tracing of the latency that network layer introduces in
>> gluster. I've made the analysis as part of the pgbench performance issue
>> (in particular the initialization and scaling phase), so I decided to look
>> at READV for this particular workload, but I think the results can be
>> extrapolated to other operations that also have small latency (cached data
>> from FS for example).
>> >
>> > Note that measuring latencies introduces some latency. It consists of a
>> call to clock_gettime() for each probe point, so the real latency will be
>> a bit lower, but still proportional to these numbers.
>> >
>>
>> [snip]
>>
>

Re: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench

2018-12-27 Thread Raghavendra Gowdappa
On Mon, Dec 24, 2018 at 6:05 PM Raghavendra Gowdappa 
wrote:

>
>
> On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay <
> sankarshan.mukhopadh...@gmail.com> wrote:
>
>> [pulling the conclusions up to enable better in-line]
>>
>> > Conclusions:
>> >
>> > We should never have a volume with caching-related xlators disabled.
>> The price we pay for it is too high. We need to make them work consistently
>> and aggressively to avoid as many requests as we can.
>>
>> Are there current issues in terms of behavior which are known/observed
>> when these are enabled?
>>
>
> We did have issues with pgbench in the past. But they have been fixed.
> Please refer to bz [1] for details. On 5.1, it runs successfully with all
> caching related xlators enabled. Having said that the only performance
> xlators which gave improved performance were open-behind and write-behind
> [2] (write-behind had some issues, which will be fixed by [3] and we'll
> have to measure performance again with fix to [3]). For some reason,
> read-side caching didn't improve transactions per second. I am working on
> this problem currently. Note that these bugs measure transaction phase of
> pgbench, but what xavi measured in his mail is init phase. Nevertheless,
> evaluation of read caching (metadata/data) will still be relevant for init
> phase too.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1629589#c4
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1648781
>
>
>> > We need to analyze client/server xlators deeper to see if we can avoid
>> some delays. However optimizing something that is already at the
>> microsecond level can be very hard.
>>
>> That is true - are there any significant gains which can be accrued by
>> putting efforts here or, should this be a lower priority?
>>
>
> The problem identified by xavi is also the one we (Manoj, Krutika, me and
> Milind) had encountered in the past [4]. The solution we used was to have
> multiple rpc connections between single brick and client. The solution
> indeed fixed the bottleneck. So, there is definitely work involved here -
> either to fix the single connection model or go with multiple connection
> model. It's preferred to improve the single connection and resort to multiple
> connections only if bottlenecks in single connection are not fixable.
> Personally I think this is high priority along with having appropriate
> client side caching.
>

Having multiple connections between a single brick and client didn't help
pgbench init-phase performance. In fact, with more connections,
performance actually regressed.

[5]  https://review.gluster.org/#/c/glusterfs/+/19133/


> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c52
>
>
>> > We need to determine what causes the fluctuations in brick side and
>> avoid them.
>> > This scenario is very similar to a smallfile/metadata workload, so this
>> is probably one important cause of its bad performance.
>>
>> What kind of instrumentation is required to enable the determination?
>>
>> On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez 
>> wrote:
>> >
>> > Hi,
>> >
>> > I've done some tracing of the latency that network layer introduces in
>> gluster. I've made the analysis as part of the pgbench performance issue
>> (in particular the initialization and scaling phase), so I decided to look
>> at READV for this particular workload, but I think the results can be
>> extrapolated to other operations that also have small latency (cached data
>> from FS for example).
>> >
>> > Note that measuring latencies introduces some latency. It consists of a
>> call to clock_gettime() for each probe point, so the real latency will be
>> a bit lower, but still proportional to these numbers.
>> >
>>
>> [snip]
>>
>

Re: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench

2018-12-25 Thread Raghavendra Gowdappa
On Mon, Dec 24, 2018 at 6:05 PM Raghavendra Gowdappa 
wrote:

>
>
> On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay <
> sankarshan.mukhopadh...@gmail.com> wrote:
>
>> [pulling the conclusions up to enable better in-line]
>>
>> > Conclusions:
>> >
>> > We should never have a volume with caching-related xlators disabled.
>> The price we pay for it is too high. We need to make them work consistently
>> and aggressively to avoid as many requests as we can.
>>
>> Are there current issues in terms of behavior which are known/observed
>> when these are enabled?
>>
>
> We did have issues with pgbench in the past. But they have been fixed.
> Please refer to bz [1] for details. On 5.1, it runs successfully with all
> caching related xlators enabled. Having said that the only performance
> xlators which gave improved performance were open-behind and write-behind
> [2] (write-behind had some issues, which will be fixed by [3] and we'll
> have to measure performance again with fix to [3]). For some reason,
> read-side caching didn't improve transactions per second.
>

One possible reason why read-caching in glusterfs didn't show increased
performance can be that VFS already supports read-ahead (of 128KB) and
a page-cache. It could be that whatever performance boost caching can
provide is already leveraged by the VFS page-cache itself, making
glusterfs caching redundant. I'll run some tests to gather evidence to
(dis)prove this hypothesis.
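One simple way to probe this hypothesis is to compare cold and warm read
times of a local file. This is a rough, Linux-only sketch (not one of the
tests mentioned above): POSIX_FADV_DONTNEED is only a hint to the kernel
and has no effect on tmpfs, so the timings are indicative rather than
proof.

```python
import os
import tempfile
import time

def read_all(path, chunk=1 << 20):
    """Sequentially read the whole file, returning elapsed seconds."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    return time.perf_counter() - t0

# Create a 64MB test file and make sure its pages are clean (fsync),
# since the kernel won't drop dirty pages on DONTNEED.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(64 << 20))
    f.flush()
    os.fsync(f.fileno())
    path = f.name

# Ask the kernel to evict this file's pages from the page cache.
fd = os.open(path, os.O_RDONLY)
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
os.close(fd)

cold = read_all(path)   # likely served from the backing device
warm = read_all(path)   # likely served from the VFS page cache
print(f"cold read: {cold:.4f}s, warm read: {warm:.4f}s")
os.unlink(path)
```

A clearly smaller warm time would support the claim that the kernel
page-cache is already absorbing repeated reads before glusterfs's own
caches get a chance to.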

I am working on this problem currently. Note that these bugs measure
> transaction phase of pgbench, but what xavi measured in his mail is init
> phase. Nevertheless, evaluation of read caching (metadata/data) will still
> be relevant for init phase too.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1629589#c4
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1648781
>
>
>> > We need to analyze client/server xlators deeper to see if we can avoid
>> some delays. However optimizing something that is already at the
>> microsecond level can be very hard.
>>
>> That is true - are there any significant gains which can be accrued by
>> putting efforts here or, should this be a lower priority?
>>
>
> The problem identified by xavi is also the one we (Manoj, Krutika, me and
> Milind) had encountered in the past [4]. The solution we used was to have
> multiple rpc connections between single brick and client. The solution
> indeed fixed the bottleneck. So, there is definitely work involved here -
> either to fix the single connection model or go with multiple connection
> model. It's preferred to improve the single connection and resort to multiple
> connections only if bottlenecks in single connection are not fixable.
> Personally I think this is high priority along with having appropriate
> client side caching.
>
> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c52
>
>
>> > We need to determine what causes the fluctuations in brick side and
>> avoid them.
>> > This scenario is very similar to a smallfile/metadata workload, so this
>> is probably one important cause of its bad performance.
>>
>> What kind of instrumentation is required to enable the determination?
>>
>> On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez 
>> wrote:
>> >
>> > Hi,
>> >
>> > I've done some tracing of the latency that network layer introduces in
>> gluster. I've made the analysis as part of the pgbench performance issue
>> (in particular the initialization and scaling phase), so I decided to look
>> at READV for this particular workload, but I think the results can be
>> extrapolated to other operations that also have small latency (cached data
>> from FS for example).
>> >
>> > Note that measuring latencies introduces some latency. It consists of a
>> call to clock_gettime() for each probe point, so the real latency will be
>> a bit lower, but still proportional to these numbers.
>> >
>>
>> [snip]
>>
>

Re: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench

2018-12-24 Thread Raghavendra Gowdappa
On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay <
sankarshan.mukhopadh...@gmail.com> wrote:

> [pulling the conclusions up to enable better in-line]
>
> > Conclusions:
> >
> > We should never have a volume with caching-related xlators disabled. The
> price we pay for it is too high. We need to make them work consistently and
> aggressively to avoid as many requests as we can.
>
> Are there current issues in terms of behavior which are known/observed
> when these are enabled?
>

We did have issues with pgbench in the past. But they have been fixed.
Please refer to bz [1] for details. On 5.1, it runs successfully with all
caching related xlators enabled. Having said that the only performance
xlators which gave improved performance were open-behind and write-behind
[2] (write-behind had some issues, which will be fixed by [3] and we'll
have to measure performance again with fix to [3]). For some reason,
read-side caching didn't improve transactions per second. I am working on
this problem currently. Note that these bugs measure transaction phase of
pgbench, but what xavi measured in his mail is init phase. Nevertheless,
evaluation of read caching (metadata/data) will still be relevant for init
phase too.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1512691
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1629589#c4
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1648781


> > We need to analyze client/server xlators deeper to see if we can avoid
> some delays. However optimizing something that is already at the
> microsecond level can be very hard.
>
> That is true - are there any significant gains which can be accrued by
> putting efforts here or, should this be a lower priority?
>

The problem identified by xavi is also the one we (Manoj, Krutika, me and
Milind) had encountered in the past [4]. The solution we used was to have
multiple rpc connections between single brick and client. The solution
indeed fixed the bottleneck. So, there is definitely work involved here -
either to fix the single connection model or go with multiple connection
model. It's preferred to improve the single connection and resort to multiple
connections only if bottlenecks in single connection are not fixable.
Personally I think this is high priority along with having appropriate
client side caching.

[4] https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c52


> > We need to determine what causes the fluctuations in brick side and
> avoid them.
> > This scenario is very similar to a smallfile/metadata workload, so this
> is probably one important cause of its bad performance.
>
> What kind of instrumentation is required to enable the determination?
>
> On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez 
> wrote:
> >
> > Hi,
> >
> > I've done some tracing of the latency that network layer introduces in
> gluster. I've made the analysis as part of the pgbench performance issue
> (in particular the initialization and scaling phase), so I decided to look
> at READV for this particular workload, but I think the results can be
> extrapolated to other operations that also have small latency (cached data
> from FS for example).
> >
> > Note that measuring latencies introduces some latency. It consists of a
> call to clock_gettime() for each probe point, so the real latency will be
> a bit lower, but still proportional to these numbers.
> >
>
> [snip]
>

Re: [Gluster-devel] [master][FAILED] brick-mux-regression

2018-12-02 Thread Raghavendra Gowdappa
On Mon, Dec 3, 2018 at 8:25 AM Raghavendra Gowdappa 
wrote:

>
>
> On Sat, Dec 1, 2018 at 11:02 AM Milind Changire 
> wrote:
>
>> failed brick-mux-regression job:
>> https://build.gluster.org/job/regression-on-demand-multiplex/411/console
>>
>> patch:
>> https://review.gluster.org/c/glusterfs/+/21719
>>
>
> Does this happen only with the above patch? Does brick-mux regression
> succeed on current master without this patch? I'm wondering whether the
> parallelism introduced by bumping up event-threads to 2 is opening up some
> races in a multiplexed environment (though there were always more than one
> event-thread when more than one brick is multiplexed).
>

Also, is this bug locally reproducible on your setup if you run the
following test with brick-mux enabled (with and without your patch)?

./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t



Re: [Gluster-devel] [master][FAILED] brick-mux-regression

2018-12-02 Thread Raghavendra Gowdappa
On Sat, Dec 1, 2018 at 11:02 AM Milind Changire  wrote:

> failed brick-mux-regression job:
> https://build.gluster.org/job/regression-on-demand-multiplex/411/console
>
> patch:
> https://review.gluster.org/c/glusterfs/+/21719
>

Does this happen only with the above patch? Does brick-mux regression
succeed on current master without this patch? I am wondering whether the
parallelism introduced by bumping up event-threads to 2 is opening up some
races in a multiplexed environment (though there was always more than one
event-thread when more than one brick was multiplexed).


>
> stack trace:
> $ gdb -ex 'set sysroot ./' -ex 'core-file
> ./build/install/cores/glfs_epoll000-964.core'
> ./build/install/sbin/glusterfsd
> GNU gdb (GDB) Fedora 8.2-4.fc29
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./build/install/sbin/glusterfsd...done.
> [New LWP 970]
> [New LWP 992]
> [New LWP 993]
> [New LWP 1005]
> [New LWP 1241]
> [New LWP 964]
> [New LWP 968]
> [New LWP 996]
> [New LWP 995]
> [New LWP 994]
> [New LWP 967]
> [New LWP 969]
> [New LWP 1003]
> [New LWP 1181]
> [New LWP 1242]
> [New LWP 966]
> [New LWP 965]
> [New LWP 999]
> [New LWP 1000]
> [New LWP 1002]
> [New LWP 989]
> [New LWP 990]
> [New LWP 991]
> [New LWP 971]
> warning: Ignoring non-absolute filename: <./lib64/libz.so.1>
> Missing separate debuginfo for ./lib64/libz.so.1
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/ea/8e45dc8e395cc5e26890470112d97a1f1e0b65.debug
> warning: Ignoring non-absolute filename: <./lib64/libuuid.so.1>
> Missing separate debuginfo for ./lib64/libuuid.so.1
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/71/de190dc0c93504abacc17b9747cd772a1e4b0d.debug
> warning: Ignoring non-absolute filename: <./lib64/libm.so.6>
> Missing separate debuginfo for ./lib64/libm.so.6
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/f4/cae74047f9aa2d5a71fdec67c4285d75753eba.debug
> warning: Ignoring non-absolute filename: <./lib64/librt.so.1>
> Missing separate debuginfo for ./lib64/librt.so.1
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/d3/3989ec31efe745eb0d3b68a92d19e77d7ddfda.debug
> warning: Ignoring non-absolute filename: <./lib64/libdl.so.2>
> Missing separate debuginfo for ./lib64/libdl.so.2
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/5c/db5a56336e7e2bd14ffa189411e44a834afcd8.debug
> warning: Ignoring non-absolute filename: <./lib64/libpthread.so.0>
> Missing separate debuginfo for ./lib64/libpthread.so.0
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/f4/c04bce85d2d269d0a2af4972fc69805b50345b.debug
> warning: Expected absolute pathname for libpthread in the inferior, but
> got ./lib64/libpthread.so.0.
> warning: Unable to find libthread_db matching inferior's thread library,
> thread debugging will not be available.
> warning: Ignoring non-absolute filename: <./lib64/libcrypto.so.10>
> Missing separate debuginfo for ./lib64/libcrypto.so.10
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/67/ceb4edd36bfe0eb31cd92da2694aca5377a599.debug
> warning: Ignoring non-absolute filename: <./lib64/libc.so.6>
> Missing separate debuginfo for ./lib64/libc.so.6
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/cb/4b7554d1adbef2f001142dd6f0a5139fc9aa69.debug
> warning: Ignoring non-absolute filename: <./lib64/ld-linux-x86-64.so.2>
> Missing separate debuginfo for ./lib64/ld-linux-x86-64.so.2
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/d2/66b1f6650927e18108323bcca8f7b68e68eb92.debug
> warning: Ignoring non-absolute filename: <./lib64/libssl.so.10>
> Missing separate debuginfo for ./lib64/libssl.so.10
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/64/68a4e28a19cdd885a3cbc30e009589ca4c2e92.debug
> warning: Ignoring non-absolute filename: <./lib64/libgssapi_krb5.so.2>
> Missing separate debuginfo for ./lib64/libgssapi_krb5.so.2
> Try: dnf --enablerepo='*debug*' install
> /usr/lib/debug/.build-id/16/fe0dc6cefc5f444bc876516d02efe9cc2d432f.debug
> warning: Ignoring non-absolute filename: <./lib64/libkrb5.so.3>
> Missing separate debuginfo for ./lib64/libkrb5.so.3
> Try: dnf --enablerepo='*debug*' install
> 

Re: [Gluster-devel] [Gluster-users] On making ctime generator enabled by default in stack

2018-11-05 Thread Raghavendra Gowdappa
On Tue, Nov 6, 2018 at 9:58 AM Vijay Bellur  wrote:

>
>
> On Mon, Nov 5, 2018 at 7:56 PM Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> There is a patch [1] from Kotresh which makes the ctime generator the
>> default in the stack. Currently the ctime generator is recommended only
>> for usecases where ctime is important (like for Elasticsearch). However,
>> a reliable (c)(m)time can fix many consistency issues within the
>> glusterfs stack too. These are issues with caching layers holding stale
>> (meta)data [2][3][4]. Basically, just like applications, components
>> within the glusterfs stack need a timestamp to find out which among
>> racing ops (like write, stat, etc.) has the latest (meta)data.
>>
>> Also note that a consistent (c)(m)time is not an optional feature, but
>> instead forms the core of the infrastructure. So, I am proposing to merge
>> this patch. If you have any objections, please voice them before Nov 13,
>> 2018 (a week from today).
>>
>> As to the existing known issues/limitations with the ctime generator, my
>> conversations with Kotresh revealed the following:
>> * Potential performance degradation (we don't yet have data to
>> conclusively prove it, preliminary basic tests from Kotresh didn't indicate
>> a significant perf drop).
>>
>
> Do we have this data captured somewhere? If not, would it be possible to
> share that data here?
>

I misquoted Kotresh. He had measured the impact of gfid2path and said both
features might have a similar impact, as the major perf cost is related to
storing xattrs on the backend fs. I am in the process of getting a fresh
set of numbers and will post them when available.


>
> Thanks,
> Vijay
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] On making ctime generator enabled by default in stack

2018-11-05 Thread Raghavendra Gowdappa
All,

There is a patch [1] from Kotresh which makes the ctime generator the
default in the stack. Currently the ctime generator is recommended only for
usecases where ctime is important (like for Elasticsearch). However, a
reliable (c)(m)time can fix many consistency issues within the glusterfs
stack too. These are issues with caching layers holding stale (meta)data
[2][3][4]. Basically, just like applications, components within the
glusterfs stack need a timestamp to find out which among racing ops (like
write, stat, etc.) has the latest (meta)data.

Also note that a consistent (c)(m)time is not an optional feature, but
instead forms the core of the infrastructure. So, I am proposing to merge
this patch. If you have any objections, please voice them before Nov 13,
2018 (a week from today).

As to the existing known issues/limitations with the ctime generator, my
conversations with Kotresh revealed the following:
* Potential performance degradation (we don't yet have data to conclusively
prove it, preliminary basic tests from Kotresh didn't indicate a
significant perf drop).
* atime consistency. The ctime generator offers atime consistency
equivalent to noatime mounts. But in my limited experience I've not seen
many usecases that require atime consistency. If you have one, please point
it out and we'll think about how we can meet that requirement.

[1] https://review.gluster.org/#/c/glusterfs/+/21060/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1600923
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1617972
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1393743
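For anyone who wants to try the feature on a test volume before the default
changes, a hedged sketch (the option name is the one introduced by the
patch under review [1]; treat it as an assumption until that patch is in
your build):

```shell
# Enable consistent (c|m)time generation on a volume named testvol, then
# confirm the setting.  ASSUMPTION: 'features.ctime' is the option name
# exposed by the ctime-generator patch in your glusterfs version.
gluster volume set testvol features.ctime on
gluster volume get testvol features.ctime
```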

regards,
Raghavendra
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Release 5: GA and what are we waiting on

2018-10-11 Thread Raghavendra Gowdappa
On Thu, Oct 11, 2018 at 9:16 PM Krutika Dhananjay 
wrote:

>
>
> On Thu, Oct 11, 2018 at 8:55 PM Shyam Ranganathan 
> wrote:
>
>> So we are through with a series of checks and tasks on release-5 (like
>> ensuring all backports to other branches are present in 5, upgrade
>> testing, basic performance testing, Package testing, etc.), but still
>> need the following resolved else we stand to delay the release GA
>> tagging, which I hope to get done over the weekend or by Monday 15th
>> morning (EDT).
>>
>> 1) Fix for libgfapi-python related blocker on Gluster:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1630804
>
>
I'll take a look and respond over the bz.


>
>>
>> @ppai, who needs to look into this?
>>
>> 2) Release notes for options added to the code (see:
>> https://lists.gluster.org/pipermail/gluster-devel/2018-October/055563.html
>> )
>>
>> @du, @krutika can we get some text for the options referred in the mail
>> above?
>>
>
I just responded in the other thread.


>
>>
> Replied here -
> https://lists.gluster.org/pipermail/gluster-devel/2018-October/055577.html
>
> -Krutika
>
> 3) Python3 testing
>> - Heard back from Kotresh on geo-rep passing and saw that we have
>> handled cliutils issues
>> - Anything more to cover? (@aravinda, @kotresh, @ppai?)
>> - We are attempting to get a regression run on a Python3 platform, but
>> that maybe a little ways away from the release (see:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1638030 )
>>
>> Request attention to the above, to ensure we are not breaking things
>> with the release.
>>
>> Thanks,
>> Shyam
>>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Glusterfs and Structured data

2018-10-07 Thread Raghavendra Gowdappa
+Gluster-users 

On Mon, Oct 8, 2018 at 9:34 AM Raghavendra Gowdappa 
wrote:

>
>
> On Fri, Feb 9, 2018 at 4:30 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> - Original Message -
>> > From: "Pranith Kumar Karampuri" 
>> > To: "Raghavendra G" 
>> > Cc: "Gluster Devel" 
>> > Sent: Friday, February 9, 2018 2:30:59 PM
>> > Subject: Re: [Gluster-devel] Glusterfs and Structured data
>> >
>> >
>> >
>> > On Thu, Feb 8, 2018 at 12:05 PM, Raghavendra G <
>> raghaven...@gluster.com >
>> > wrote:
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Feb 6, 2018 at 8:15 PM, Vijay Bellur < vbel...@redhat.com >
>> wrote:
>> >
>> >
>> >
>> >
>> >
>> > On Sun, Feb 4, 2018 at 3:39 AM, Raghavendra Gowdappa <
>> rgowd...@redhat.com >
>> > wrote:
>> >
>> >
>> > All,
>> >
>> > One of our users pointed out to the documentation that glusterfs is not
>> good
>> > for storing "Structured data" [1], while discussing an issue [2].
>> >
>> >
>> > As far as I remember, the content around structured data in the Install
>> Guide
>> > is from a FAQ that was being circulated in Gluster, Inc. indicating the
>> > startup's market positioning. Most of that was based on not wanting to
>> get
>> > into performance based comparisons of storage systems that are
>> frequently
>> > seen in the structured data space.
>> >
>> >
>> > Does any of you have more context on the feasibility of storing
>> "structured
>> > data" on Glusterfs? Is one of the reasons for such a suggestion
>> "staleness
>> > of metadata" as encountered in bugs like [3]?
>> >
>> >
>> > There are challenges that distributed storage systems face when exposed
>> to
>> > applications that were written for a local filesystem interface. We have
>> > encountered problems with applications like tar [4] that are not in the
>> > realm of "Structured data". If we look at the common theme across all
>> these
>> > problems, it is related to metadata & read after write consistency
>> issues
>> > with the default translator stack that gets exposed on the client side.
>> > While the default stack is optimal for other scenarios, it does seem
>> that a
>> > category of applications needing strict metadata consistency is not well
>> > served by that. We have observed that disabling a few performance
>> > translators and tuning cache timeouts for VFS/FUSE have helped to
>> overcome
>> > some of them. The WIP effort on timestamp consistency across the
>> translator
>> > stack, patches that have been merged as a result of the bugs that you
>> > mention & other fixes for outstanding issues should certainly help in
>> > catering to these workloads better with the file interface.
>> >
>> > There are deployments that I have come across where glusterfs is used
>> for
>> > storing structured data. gluster-block & qemu-libgfapi overcome the
>> metadata
>> > consistency problem by exposing a file as a block device & by disabling
>> most
>> > of the performance translators in the default stack. Workloads that have
>> > been deemed problematic with the file interface for the reasons alluded
>> > above, function well with the block interface.
>> >
>> > I agree that gluster-block, due to its usage of a subset of glusterfs
>> > fops (mostly reads/writes I guess), runs into fewer consistency issues.
>> > However, as you've mentioned, we seem to have disabled the perf xlator
>> > stack in our tests/use-cases till now. Note that the perf xlator stack
>> > is one of the worst offenders as far as metadata consistency is
>> > concerned (relatively fewer scenarios of data inconsistency). So, I
>> > wonder,
>> > * what would be the scenario if we enable perf xlator stack for
>> > gluster-block?
>> > * Is performance on gluster-block satisfactory so that we don't need
>> these
>> > xlators?
>> > - Or is it that these xlators are not useful for the workload usually
>> run on
>> > gluster-block (For random read/write workload, read/write caching
>> xlators
>> > offer less or no advantage)?
>> >
>> > Yes. They are not useful. Block/VM files are opened with

Re: [Gluster-devel] Glusterfs and Structured data

2018-10-07 Thread Raghavendra Gowdappa
On Fri, Feb 9, 2018 at 4:30 PM Raghavendra Gowdappa 
wrote:

>
>
> - Original Message -
> > From: "Pranith Kumar Karampuri" 
> > To: "Raghavendra G" 
> > Cc: "Gluster Devel" 
> > Sent: Friday, February 9, 2018 2:30:59 PM
> > Subject: Re: [Gluster-devel] Glusterfs and Structured data
> >
> >
> >
> > On Thu, Feb 8, 2018 at 12:05 PM, Raghavendra G < raghaven...@gluster.com
> >
> > wrote:
> >
> >
> >
> >
> >
> > On Tue, Feb 6, 2018 at 8:15 PM, Vijay Bellur < vbel...@redhat.com >
> wrote:
> >
> >
> >
> >
> >
> > On Sun, Feb 4, 2018 at 3:39 AM, Raghavendra Gowdappa <
> rgowd...@redhat.com >
> > wrote:
> >
> >
> > All,
> >
> > One of our users pointed out to the documentation that glusterfs is not
> good
> > for storing "Structured data" [1], while discussing an issue [2].
> >
> >
> > As far as I remember, the content around structured data in the Install
> Guide
> > is from a FAQ that was being circulated in Gluster, Inc. indicating the
> > startup's market positioning. Most of that was based on not wanting to
> get
> > into performance based comparisons of storage systems that are frequently
> > seen in the structured data space.
> >
> >
> > Does any of you have more context on the feasibility of storing
> "structured
> > data" on Glusterfs? Is one of the reasons for such a suggestion
> "staleness
> > of metadata" as encountered in bugs like [3]?
> >
> >
> > There are challenges that distributed storage systems face when exposed
> to
> > applications that were written for a local filesystem interface. We have
> > encountered problems with applications like tar [4] that are not in the
> > realm of "Structured data". If we look at the common theme across all
> these
> > problems, it is related to metadata & read after write consistency issues
> > with the default translator stack that gets exposed on the client side.
> > While the default stack is optimal for other scenarios, it does seem
> that a
> > category of applications needing strict metadata consistency is not well
> > served by that. We have observed that disabling a few performance
> > translators and tuning cache timeouts for VFS/FUSE have helped to
> overcome
> > some of them. The WIP effort on timestamp consistency across the
> translator
> > stack, patches that have been merged as a result of the bugs that you
> > mention & other fixes for outstanding issues should certainly help in
> > catering to these workloads better with the file interface.
> >
> > There are deployments that I have come across where glusterfs is used for
> > storing structured data. gluster-block & qemu-libgfapi overcome the
> metadata
> > consistency problem by exposing a file as a block device & by disabling
> most
> > of the performance translators in the default stack. Workloads that have
> > been deemed problematic with the file interface for the reasons alluded
> > above, function well with the block interface.
> >
> > I agree that gluster-block, due to its usage of a subset of glusterfs
> > fops (mostly reads/writes I guess), runs into fewer consistency issues.
> > However, as you've mentioned, we seem to have disabled the perf xlator
> > stack in our tests/use-cases till now. Note that the perf xlator stack
> > is one of the worst offenders as far as metadata consistency is
> > concerned (relatively fewer scenarios of data inconsistency). So, I
> > wonder,
> > * what would be the scenario if we enable perf xlator stack for
> > gluster-block?
> > * Is performance on gluster-block satisfactory so that we don't need
> these
> > xlators?
> > - Or is it that these xlators are not useful for the workload usually
> run on
> > gluster-block (For random read/write workload, read/write caching xlators
> > offer less or no advantage)?
> >
> > Yes. They are not useful. Block/VM files are opened with O_DIRECT, so we
> > don't enable caching at any layer in glusterfs. md-cache could be useful
> for
> > serving fstat from glusterfs. But apart from that I don't see any other
> > xlator contributing much.
> >
> >
> >
> > - Or theoretically the workload is ought to benefit from perf xlators,
> but we
> > don't see them in our results (there are open bugs to this effect)?
> >
> > I am asking these questions to ascertain priority on fixing perf xlators
> for
> 
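The "disable a few performance translators and tune cache timeouts for
VFS/FUSE" workaround mentioned in the quoted mail can be sketched with
standard volume options as below; whether this exact set suffices for a
given metadata-sensitive workload is an assumption to be validated:

```shell
# Disable the client-side caching/perf xlators most often implicated in
# stale-(meta)data issues, on a volume named testvol.
gluster volume set testvol performance.quick-read off
gluster volume set testvol performance.stat-prefetch off   # md-cache
gluster volume set testvol performance.read-ahead off
gluster volume set testvol performance.io-cache off
gluster volume set testvol performance.write-behind off

# Shorten FUSE/VFS attribute and entry caching so the kernel revalidates
# (meta)data sooner instead of serving possibly stale cached copies.
mount -t glusterfs -o attribute-timeout=0,entry-timeout=0 \
    server:/testvol /mnt/testvol
```

Note this trades consistency for performance; measure before adopting it
wholesale.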

Re: [Gluster-devel] ./rfc.sh not pushing patch to gerrit

2018-10-05 Thread Raghavendra Gowdappa
What is the agreed-upon clang version for the Glusterfs project? Is it clang-6?

On Fri, Oct 5, 2018 at 1:58 PM Raghavendra Gowdappa 
wrote:

> clang-4.0.1 pushes patch, but still doesn't understand some keys in
> clang-format.
>
> [rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
> [detached HEAD 401a7b6] cluster/dht: clang-format dht-common.c
>  1 file changed, 10674 insertions(+), 11166 deletions(-)
>  rewrite xlators/cluster/dht/src/dht-common.c (88%)
> [detached HEAD 2bf5031] cluster/dht: fixes to unlinking invalid linkto file
>  1 file changed, 236 insertions(+), 268 deletions(-)
> Successfully rebased and updated refs/heads/1635145.
> YAML:35:23: error: unknown key 'SplitEmptyFunction'
>   SplitEmptyFunction: true
>   ^~~~
>
>
> On Fri, Oct 5, 2018 at 10:47 AM Raghavendra Gowdappa 
> wrote:
>
>> We should document (better still add checks in rfc.sh and warn user to
>> upgrade) that we need clang version x or greater.
>>
>> On Fri, Oct 5, 2018 at 10:45 AM Sachidananda URS  wrote:
>>
>>>
>>>
>>> On Fri, Oct 5, 2018 at 10:41 AM, Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>
>>>> General options:
>>>>
>>>>   -help - Display available options (-help-hidden
>>>> for more)
>>>>   -help-list- Display list of available options
>>>> (-help-list-hidden for more)
>>>>   -version  - Display the version of this program
>>>> [rgowdapp@rgowdapp ~]$ clang-format -version ; echo $?
>>>> LLVM (http://llvm.org/):
>>>>   LLVM version 3.4.2
>>>>   Optimized build.
>>>>   Built Dec  7 2015 (09:37:36).
>>>>   Default target: x86_64-redhat-linux-gnu
>>>>   Host CPU: x86-64
>>>> 1
>>>>
>>>>
>>> It is a bug then, which they've fixed later on. Upgrade is the only
>>> choice.
>>>
>>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] ./rfc.sh not pushing patch to gerrit

2018-10-05 Thread Raghavendra Gowdappa
clang-4.0.1 pushes patch, but still doesn't understand some keys in
clang-format.

[rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
[detached HEAD 401a7b6] cluster/dht: clang-format dht-common.c
 1 file changed, 10674 insertions(+), 11166 deletions(-)
 rewrite xlators/cluster/dht/src/dht-common.c (88%)
[detached HEAD 2bf5031] cluster/dht: fixes to unlinking invalid linkto file
 1 file changed, 236 insertions(+), 268 deletions(-)
Successfully rebased and updated refs/heads/1635145.
YAML:35:23: error: unknown key 'SplitEmptyFunction'
  SplitEmptyFunction: true
  ^~~~


On Fri, Oct 5, 2018 at 10:47 AM Raghavendra Gowdappa 
wrote:

> We should document (better still add checks in rfc.sh and warn user to
> upgrade) that we need clang version x or greater.
>
> On Fri, Oct 5, 2018 at 10:45 AM Sachidananda URS  wrote:
>
>>
>>
>> On Fri, Oct 5, 2018 at 10:41 AM, Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>> General options:
>>>
>>>   -help - Display available options (-help-hidden
>>> for more)
>>>   -help-list- Display list of available options
>>> (-help-list-hidden for more)
>>>   -version  - Display the version of this program
>>> [rgowdapp@rgowdapp ~]$ clang-format -version ; echo $?
>>> LLVM (http://llvm.org/):
>>>   LLVM version 3.4.2
>>>   Optimized build.
>>>   Built Dec  7 2015 (09:37:36).
>>>   Default target: x86_64-redhat-linux-gnu
>>>   Host CPU: x86-64
>>> 1
>>>
>>>
>> It is a bug then, which they've fixed later on. Upgrade is the only
>> choice.
>>
>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] ./rfc.sh not pushing patch to gerrit

2018-10-04 Thread Raghavendra Gowdappa
We should document (better still, add checks in rfc.sh and warn the user to
upgrade) that we need clang version x or greater.

On Fri, Oct 5, 2018 at 10:45 AM Sachidananda URS  wrote:

>
>
> On Fri, Oct 5, 2018 at 10:41 AM, Raghavendra Gowdappa  > wrote:
>
>> General options:
>>
>>   -help - Display available options (-help-hidden for
>> more)
>>   -help-list- Display list of available options
>> (-help-list-hidden for more)
>>   -version  - Display the version of this program
>> [rgowdapp@rgowdapp ~]$ clang-format -version ; echo $?
>> LLVM (http://llvm.org/):
>>   LLVM version 3.4.2
>>   Optimized build.
>>   Built Dec  7 2015 (09:37:36).
>>   Default target: x86_64-redhat-linux-gnu
>>   Host CPU: x86-64
>> 1
>>
>>
> It is a bug then, which they've fixed later on. Upgrade is the only choice.
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel
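The check-and-warn suggested above could look something like the sketch
below. The minimum version and the parsing are assumptions (adjust to
whatever the project settles on); note it uses `-version`, the documented
single-dash form, since `--version` returns non-zero on old 3.x builds:

```shell
# Hypothetical guard for rfc.sh: warn when clang-format is missing or
# older than an assumed minimum major version.
required_major=4

if ! command -v clang-format >/dev/null 2>&1; then
    echo "clang-format not found; smoke may fail the coding-standard check"
else
    # Newer builds print "clang-format version 6.0.0 ..."; old 3.x builds
    # print an "LLVM version 3.4.2" banner.  Grab the first x.y.z triple.
    major=$(clang-format -version 2>/dev/null |
            grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+' | head -1 | cut -d. -f1)
    if [ -n "$major" ] && [ "$major" -lt "$required_major" ]; then
        echo "clang-format $major.x is too old; please upgrade to >= $required_major"
    fi
fi
```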

Re: [Gluster-devel] ./rfc.sh not pushing patch to gerrit

2018-10-04 Thread Raghavendra Gowdappa
General options:

  -help - Display available options (-help-hidden for
more)
  -help-list- Display list of available options
(-help-list-hidden for more)
  -version  - Display the version of this program
[rgowdapp@rgowdapp ~]$ clang-format -version ; echo $?
LLVM (http://llvm.org/):
  LLVM version 3.4.2
  Optimized build.
  Built Dec  7 2015 (09:37:36).
  Default target: x86_64-redhat-linux-gnu
  Host CPU: x86-64
1


On Fri, Oct 5, 2018 at 10:37 AM Sachidananda URS  wrote:

>
>
>
>  [rgowdapp@rgowdapp glusterfs]$ clang-format --version ; echo $?
>> LLVM (http://llvm.org/):
>>   LLVM version 3.4.2
>>   Optimized build.
>>   Built Dec  7 2015 (09:37:36).
>>   Default target: x86_64-redhat-linux-gnu
>>   Host CPU: x86-64
>> 1
>>
>> Wonder why clang-format --version has to return non-zero return code
>> though.
>>
>
> Maybe because the syntax is `clang-format -version' not --version.
> In the newer releases both -version and --version return 0.
> You can try -version; if it returns 0, we can fix `rfc.sh'.
>
> But, what they document is -version.
>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] ./rfc.sh not pushing patch to gerrit

2018-10-04 Thread Raghavendra Gowdappa
On Fri, Oct 5, 2018 at 9:58 AM Sachidananda URS  wrote:

>
>
> On Fri, Oct 5, 2018 at 9:45 AM, Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Fri, Oct 5, 2018 at 9:34 AM Raghavendra Gowdappa 
>> wrote:
>>
>>>
>>>
>>> On Fri, Oct 5, 2018 at 9:11 AM Kaushal M  wrote:
>>>
>>>> On Fri, Oct 5, 2018 at 9:05 AM Raghavendra Gowdappa <
>>>> rgowd...@redhat.com> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Oct 5, 2018 at 8:53 AM Amar Tumballi 
>>>> wrote:
>>>> >>
>>>> >> Can you try below diff in your rfc, and let me know if it works?
>>>> >
>>>> >
>>>> > No. it didn't. I see the same error.
>>>> >  [rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
>>>> > + rebase_changes
>>>> > + GIT_EDITOR=./rfc.sh
>>>> > + git rebase -i origin/master
>>>> > [detached HEAD e50667e] cluster/dht: clang-format dht-common.c
>>>> >  1 file changed, 10674 insertions(+), 11166 deletions(-)
>>>> >  rewrite xlators/cluster/dht/src/dht-common.c (88%)
>>>> > [detached HEAD 0734847] cluster/dht: fixes to unlinking invalid
>>>> linkto file
>>>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> > [detached HEAD 7aeba07] rfc.sh: test - DO NOT MERGE
>>>> >  1 file changed, 8 insertions(+), 3 deletions(-)
>>>> > Successfully rebased and updated refs/heads/1635145.
>>>> > + check_backport
>>>> > + moveon=N
>>>> > + '[' master = master ']'
>>>> > + return
>>>> > + assert_diverge
>>>> > + git diff origin/master..HEAD
>>>> > + grep -q .
>>>> > ++ git log -n1 --format=%b
>>>> > ++ grep -ow -E
>>>> '([fF][iI][xX][eE][sS]|[uU][pP][dD][aA][tT][eE][sS])(:)?[[:space:]]+(gluster\/glusterfs)?(bz)?#[[:digit:]]+'
>>>> > ++ awk -F '#' '{print $2}'
>>>> > + reference=1635145
>>>> > + '[' -z 1635145 ']'
>>>> > ++ clang-format --version
>>>> > + clang_format='LLVM (http://llvm.org/):
>>>> >   LLVM version 3.4.2
>>>> >   Optimized build.
>>>> >   Built Dec  7 2015 (09:37:36).
>>>> >   Default target: x86_64-redhat-linux-gnu
>>>> >   Host CPU: x86-64'
>>>>
>>>> This is a pretty old version of clang. Maybe this is the problem?
>>>>
>>>
>>> Yes. That's what I suspected too. Trying to get repos for the upgrade.
>>>
>>
>> But, what's surprising is that script exits.
>>
>
> What is the return code of clang-format? If it is non-zero then script
> will silently exit because that is what
> it is told to do.
>
> `#!/bin/sh -e' means exit on error.
>

You are right :).

 [rgowdapp@rgowdapp glusterfs]$ clang-format --version ; echo $?
LLVM (http://llvm.org/):
  LLVM version 3.4.2
  Optimized build.
  Built Dec  7 2015 (09:37:36).
  Default target: x86_64-redhat-linux-gnu
  Host CPU: x86-64
1

Wonder why clang-format --version has to return a non-zero return code, though.
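The silent exit is reproducible without clang-format at all: under `sh -e`,
an assignment whose command substitution exits non-zero aborts the whole
script, which is exactly what the old clang-format's exit status 1
triggered in rfc.sh.

```shell
# Minimal reproduction of the rfc.sh exit.  The inner script dies at the
# assignment because the substituted command returns 1, so the second
# echo is never reached and the shell exits with status 1.
sh -e -c '
    clang_format=$(echo "LLVM version 3.4.2"; exit 1)  # stand-in for the old clang-format --version
    echo "never reached: $clang_format"
' || echo "script aborted with status $?"
# prints: script aborted with status 1
```

This is why wrapping only part of rfc.sh in `set +e`/`set -e` matters: the
version probe itself has to sit inside the relaxed region.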


> -sac
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] ./rfc.sh not pushing patch to gerrit

2018-10-04 Thread Raghavendra Gowdappa
On Fri, Oct 5, 2018 at 9:34 AM Raghavendra Gowdappa 
wrote:

>
>
> On Fri, Oct 5, 2018 at 9:11 AM Kaushal M  wrote:
>
>> On Fri, Oct 5, 2018 at 9:05 AM Raghavendra Gowdappa 
>> wrote:
>> >
>> >
>> >
>> > On Fri, Oct 5, 2018 at 8:53 AM Amar Tumballi 
>> wrote:
>> >>
>> >> Can you try below diff in your rfc, and let me know if it works?
>> >
>> >
>> > No. it didn't. I see the same error.
>> >  [rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
>> > + rebase_changes
>> > + GIT_EDITOR=./rfc.sh
>> > + git rebase -i origin/master
>> > [detached HEAD e50667e] cluster/dht: clang-format dht-common.c
>> >  1 file changed, 10674 insertions(+), 11166 deletions(-)
>> >  rewrite xlators/cluster/dht/src/dht-common.c (88%)
>> > [detached HEAD 0734847] cluster/dht: fixes to unlinking invalid linkto
>> file
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > [detached HEAD 7aeba07] rfc.sh: test - DO NOT MERGE
>> >  1 file changed, 8 insertions(+), 3 deletions(-)
>> > Successfully rebased and updated refs/heads/1635145.
>> > + check_backport
>> > + moveon=N
>> > + '[' master = master ']'
>> > + return
>> > + assert_diverge
>> > + git diff origin/master..HEAD
>> > + grep -q .
>> > ++ git log -n1 --format=%b
>> > ++ grep -ow -E
>> '([fF][iI][xX][eE][sS]|[uU][pP][dD][aA][tT][eE][sS])(:)?[[:space:]]+(gluster\/glusterfs)?(bz)?#[[:digit:]]+'
>> > ++ awk -F '#' '{print $2}'
>> > + reference=1635145
>> > + '[' -z 1635145 ']'
>> > ++ clang-format --version
>> > + clang_format='LLVM (http://llvm.org/):
>> >   LLVM version 3.4.2
>> >   Optimized build.
>> >   Built Dec  7 2015 (09:37:36).
>> >   Default target: x86_64-redhat-linux-gnu
>> >   Host CPU: x86-64'
>>
>> This is a pretty old version of clang. Maybe this is the problem?
>>
>
> Yes. That's what I suspected too. Trying to get repos for the upgrade.
>

But, what's surprising is that script exits.


>
>
>>
>> >
>> >>
>> >> ```
>> >>>
>> >>> diff --git a/rfc.sh b/rfc.sh
>> >>> index 607fd7528f..4ffef26ca1 100755
>> >>> --- a/rfc.sh
>> >>> +++ b/rfc.sh
>> >>> @@ -321,21 +321,21 @@ main()
>> >>>  fi
>> >>>
>> >>>  # TODO: add clang-format command here. It will after the changes
>> are done everywhere else
>> >>> +set +e
>> >>>  clang_format=$(clang-format --version)
>> >>>  if [ ! -z "${clang_format}" ]; then
>> >>>  # Considering git show may not give any files as output
>> matching the
>> >>>  # criteria, good to tell script not to fail on error
>> >>> -set +e
>> >>>  list_of_files=$(git show --pretty="format:" --name-only |
>> >>>  grep -v "contrib/" | egrep --color=never
>> "*\.[ch]$");
>> >>>  if [ ! -z "${list_of_files}" ]; then
>> >>>  echo "${list_of_files}" | xargs clang-format -i
>> >>>  fi
>> >>> -set -e
>> >>>  else
>> >>>  echo "High probability of your patch not passing smoke due
>> to coding standard check"
>> >>>  echo "Please install 'clang-format' to format the patch
>> before submitting"
>> >>>  fi
>> >>> +set -e
>> >>>
>> >>>  if [ "$DRY_RUN" = 1 ]; then
>> >>>  drier='echo -e Please use the following command to send your
>> commits to review:\n\n'
>> >>
>> >> ```
>> >> -Amar
>> >>
>> >> On Fri, Oct 5, 2018 at 8:09 AM Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>> >>>
>> >>> All,
>> >>>
>> >>> [rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
>> >>> + rebase_changes
>> >>> + GIT_EDITOR=./rfc.sh
>> >>> + git rebase -i origin/master
>> >>> [detached HEAD 34fabdd] cluster/dht: clang-format dht-common.c
>> >>>  1 file changed, 10674 insertions(+), 11166 deletions(-)
>> >>>  rewrite xlators/cluster/dht/src/dht-common.c (88%)
>> >>> [detached HEAD 4bbcbf9] cluster/dht

Re: [Gluster-devel] ./rfc.sh not pushing patch to gerrit

2018-10-04 Thread Raghavendra Gowdappa
On Fri, Oct 5, 2018 at 9:11 AM Kaushal M  wrote:

> On Fri, Oct 5, 2018 at 9:05 AM Raghavendra Gowdappa 
> wrote:
> >
> >
> >
> > On Fri, Oct 5, 2018 at 8:53 AM Amar Tumballi 
> wrote:
> >>
> >> Can you try below diff in your rfc, and let me know if it works?
> >
> >
> > No. it didn't. I see the same error.
> >  [rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
> > + rebase_changes
> > + GIT_EDITOR=./rfc.sh
> > + git rebase -i origin/master
> > [detached HEAD e50667e] cluster/dht: clang-format dht-common.c
> >  1 file changed, 10674 insertions(+), 11166 deletions(-)
> >  rewrite xlators/cluster/dht/src/dht-common.c (88%)
> > [detached HEAD 0734847] cluster/dht: fixes to unlinking invalid linkto
> file
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > [detached HEAD 7aeba07] rfc.sh: test - DO NOT MERGE
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > Successfully rebased and updated refs/heads/1635145.
> > + check_backport
> > + moveon=N
> > + '[' master = master ']'
> > + return
> > + assert_diverge
> > + git diff origin/master..HEAD
> > + grep -q .
> > ++ git log -n1 --format=%b
> > ++ grep -ow -E
> '([fF][iI][xX][eE][sS]|[uU][pP][dD][aA][tT][eE][sS])(:)?[[:space:]]+(gluster\/glusterfs)?(bz)?#[[:digit:]]+'
> > ++ awk -F '#' '{print $2}'
> > + reference=1635145
> > + '[' -z 1635145 ']'
> > ++ clang-format --version
> > + clang_format='LLVM (http://llvm.org/):
> >   LLVM version 3.4.2
> >   Optimized build.
> >   Built Dec  7 2015 (09:37:36).
> >   Default target: x86_64-redhat-linux-gnu
> >   Host CPU: x86-64'
>
> This is a pretty old version of clang. Maybe this is the problem?
>

Yes. That's what I suspected too. Trying to get repos for the upgrade.


>
> >
> >>
> >> ```
> >>>
> >>> diff --git a/rfc.sh b/rfc.sh
> >>> index 607fd7528f..4ffef26ca1 100755
> >>> --- a/rfc.sh
> >>> +++ b/rfc.sh
> >>> @@ -321,21 +321,21 @@ main()
> >>>  fi
> >>>
> >>>  # TODO: add clang-format command here. It will after the changes
> are done everywhere else
> >>> +set +e
> >>>  clang_format=$(clang-format --version)
> >>>  if [ ! -z "${clang_format}" ]; then
> >>>  # Considering git show may not give any files as output
> matching the
> >>>  # criteria, good to tell script not to fail on error
> >>> -set +e
> >>>  list_of_files=$(git show --pretty="format:" --name-only |
> >>>  grep -v "contrib/" | egrep --color=never
> "*\.[ch]$");
> >>>  if [ ! -z "${list_of_files}" ]; then
> >>>  echo "${list_of_files}" | xargs clang-format -i
> >>>  fi
> >>> -set -e
> >>>  else
> >>>  echo "High probability of your patch not passing smoke due to
> coding standard check"
> >>>  echo "Please install 'clang-format' to format the patch
> before submitting"
> >>>  fi
> >>> +set -e
> >>>
> >>>  if [ "$DRY_RUN" = 1 ]; then
> >>>  drier='echo -e Please use the following command to send your
> commits to review:\n\n'
> >>
> >> ```
> >> -Amar
> >>
> >> On Fri, Oct 5, 2018 at 8:09 AM Raghavendra Gowdappa <
> rgowd...@redhat.com> wrote:
> >>>
> >>> All,
> >>>
> >>> [rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
> >>> + rebase_changes
> >>> + GIT_EDITOR=./rfc.sh
> >>> + git rebase -i origin/master
> >>> [detached HEAD 34fabdd] cluster/dht: clang-format dht-common.c
> >>>  1 file changed, 10674 insertions(+), 11166 deletions(-)
> >>>  rewrite xlators/cluster/dht/src/dht-common.c (88%)
> >>> [detached HEAD 4bbcbf9] cluster/dht: fixes to unlinking invalid linkto
> file
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>> [detached HEAD c5583ea] rfc.sh: test - DO NOT MERGE
> >>>  1 file changed, 8 insertions(+), 3 deletions(-)
> >>> Successfully rebased and updated refs/heads/1635145.
> >>> + check_backport
> >>> + moveon=N
> >>> + '[' master = master ']'
> >>> + return
> >>> + assert_diverge
> >>> + git diff origin/master..HEAD
> >>> 

Re: [Gluster-devel] ./rfc.sh not pushing patch to gerrit

2018-10-04 Thread Raghavendra Gowdappa
On Fri, Oct 5, 2018 at 8:53 AM Amar Tumballi  wrote:

> Can you try below diff in your rfc, and let me know if it works?
>

No, it didn't. I see the same error.
 [rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
+ rebase_changes
+ GIT_EDITOR=./rfc.sh
+ git rebase -i origin/master
[detached HEAD e50667e] cluster/dht: clang-format dht-common.c
 1 file changed, 10674 insertions(+), 11166 deletions(-)
 rewrite xlators/cluster/dht/src/dht-common.c (88%)
[detached HEAD 0734847] cluster/dht: fixes to unlinking invalid linkto file
 1 file changed, 1 insertion(+), 1 deletion(-)
[detached HEAD 7aeba07] rfc.sh: test - DO NOT MERGE
 1 file changed, 8 insertions(+), 3 deletions(-)
Successfully rebased and updated refs/heads/1635145.
+ check_backport
+ moveon=N
+ '[' master = master ']'
+ return
+ assert_diverge
+ git diff origin/master..HEAD
+ grep -q .
++ git log -n1 --format=%b
++ grep -ow -E
'([fF][iI][xX][eE][sS]|[uU][pP][dD][aA][tT][eE][sS])(:)?[[:space:]]+(gluster\/glusterfs)?(bz)?#[[:digit:]]+'
++ awk -F '#' '{print $2}'
+ reference=1635145
+ '[' -z 1635145 ']'
++ clang-format --version
+ clang_format='LLVM (http://llvm.org/):
  LLVM version 3.4.2
  Optimized build.
  Built Dec  7 2015 (09:37:36).
  Default target: x86_64-redhat-linux-gnu
  Host CPU: x86-64'


> ```
>
>> diff --git a/rfc.sh b/rfc.sh
>> index 607fd7528f..4ffef26ca1 100755
>> --- a/rfc.sh
>> +++ b/rfc.sh
>> @@ -321,21 +321,21 @@ main()
>>  fi
>>
>>  # TODO: add clang-format command here. It will after the changes are
>> done everywhere else
>> +set +e
>>  clang_format=$(clang-format --version)
>>  if [ ! -z "${clang_format}" ]; then
>>  # Considering git show may not give any files as output matching
>> the
>>  # criteria, good to tell script not to fail on error
>> -set +e
>>  list_of_files=$(git show --pretty="format:" --name-only |
>>  grep -v "contrib/" | egrep --color=never
>> "*\.[ch]$");
>>  if [ ! -z "${list_of_files}" ]; then
>>  echo "${list_of_files}" | xargs clang-format -i
>>  fi
>> -set -e
>>  else
>>  echo "High probability of your patch not passing smoke due to
>> coding standard check"
>>  echo "Please install 'clang-format' to format the patch before
>> submitting"
>>  fi
>> +set -e
>>
>>  if [ "$DRY_RUN" = 1 ]; then
>>  drier='echo -e Please use the following command to send your
>> commits to review:\n\n'
>
> ```
> -Amar
>
> On Fri, Oct 5, 2018 at 8:09 AM Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> [rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
>> + rebase_changes
>> + GIT_EDITOR=./rfc.sh
>> + git rebase -i origin/master
>> [detached HEAD 34fabdd] cluster/dht: clang-format dht-common.c
>>  1 file changed, 10674 insertions(+), 11166 deletions(-)
>>  rewrite xlators/cluster/dht/src/dht-common.c (88%)
>> [detached HEAD 4bbcbf9] cluster/dht: fixes to unlinking invalid linkto
>> file
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> [detached HEAD c5583ea] rfc.sh: test - DO NOT MERGE
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>> Successfully rebased and updated refs/heads/1635145.
>> + check_backport
>> + moveon=N
>> + '[' master = master ']'
>> + return
>> + assert_diverge
>> + git diff origin/master..HEAD
>> + grep -q .
>> ++ git log -n1 --format=%b
>> ++ grep -ow -E
>> '([fF][iI][xX][eE][sS]|[uU][pP][dD][aA][tT][eE][sS])(:)?[[:space:]]+(gluster\/glusterfs)?(bz)?#[[:digit:]]+'
>> ++ awk -F '#' '{print $2}'
>> + reference=1635145
>> + '[' -z 1635145 ']'
>> ++ clang-format --version
>> + clang_format='LLVM (http://llvm.org/):
>>   LLVM version 3.4.2
>>   Optimized build.
>>   Built Dec  7 2015 (09:37:36).
>>   Default target: x86_64-redhat-linux-gnu
>>   Host CPU: x86-64'
>>
>> Looks like the script is exiting right after it completes clang-format
>> --version. Nothing after that statement gets executed (did it crash? I
>> don't see any cores). Any help is appreciated
>>
>> regards,
>> Raghavendra
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
>
> --
> Amar Tumballi (amarts)
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] ./rfc.sh not pushing patch to gerrit

2018-10-04 Thread Raghavendra Gowdappa
All,

[rgowdapp@rgowdapp glusterfs]$ ./rfc.sh
+ rebase_changes
+ GIT_EDITOR=./rfc.sh
+ git rebase -i origin/master
[detached HEAD 34fabdd] cluster/dht: clang-format dht-common.c
 1 file changed, 10674 insertions(+), 11166 deletions(-)
 rewrite xlators/cluster/dht/src/dht-common.c (88%)
[detached HEAD 4bbcbf9] cluster/dht: fixes to unlinking invalid linkto file
 1 file changed, 1 insertion(+), 1 deletion(-)
[detached HEAD c5583ea] rfc.sh: test - DO NOT MERGE
 1 file changed, 8 insertions(+), 3 deletions(-)
Successfully rebased and updated refs/heads/1635145.
+ check_backport
+ moveon=N
+ '[' master = master ']'
+ return
+ assert_diverge
+ git diff origin/master..HEAD
+ grep -q .
++ git log -n1 --format=%b
++ grep -ow -E
'([fF][iI][xX][eE][sS]|[uU][pP][dD][aA][tT][eE][sS])(:)?[[:space:]]+(gluster\/glusterfs)?(bz)?#[[:digit:]]+'
++ awk -F '#' '{print $2}'
+ reference=1635145
+ '[' -z 1635145 ']'
++ clang-format --version
+ clang_format='LLVM (http://llvm.org/):
  LLVM version 3.4.2
  Optimized build.
  Built Dec  7 2015 (09:37:36).
  Default target: x86_64-redhat-linux-gnu
  Host CPU: x86-64'

Looks like the script is exiting right after it completes clang-format
--version. Nothing after that statement gets executed (did it crash? I
don't see any cores). Any help is appreciated.
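A minimal reproduction of one plausible mechanism (an assumption on my part, since the trace above doesn't show the exit status): under `set -e`, a plain assignment from a command substitution whose command exits non-zero aborts the whole script, which would explain rfc.sh dying silently right after the `clang_format=$(clang-format --version)` assignment. The file path below is arbitrary:

```shell
# Write a demo script: the assignment still receives the command's stdout,
# but because the command exits non-zero, `set -e` aborts the script
# before the next line runs.
cat > /tmp/errexit-demo.sh <<'EOF'
set -e
ver=$(sh -c 'echo "LLVM version 3.4.2"; exit 1')  # stand-in for a clang-format build whose --version exits non-zero
echo "never reached: $ver"
EOF
sh /tmp/errexit-demo.sh || echo "demo exited with status $?"
# prints: demo exited with status 1
```

This is consistent with Amar's diff, which moves `set +e` above the assignment so a failing `clang-format --version` no longer kills the script.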

regards,
Raghavendra

[Gluster-devel] Update of work on fixing POSIX compliance issues in Glusterfs

2018-10-01 Thread Raghavendra Gowdappa
All,

There have been issues related to POSIX compliance, especially while running
database workloads on Glusterfs. Recently we've worked on fixing some of
them. This mail is an update on that effort.

The issues themselves can be classified into the following categories:

   - rename atomicity. When rename (src, dst) is done with dst already
   present, at no point in time should access to dst (like open, stat, chmod
   etc.) fail. However, since the rename itself changes the association of
   dst-path from dst-inode to src-inode, inode-based operations like open,
   stat etc. that have already completed resolution of dst-path into
   dst-inode will end up not finding the dst-inode after rename, causing
   them to fail. However, VFS provides a workaround for this by doing the
   resolution of the path once again, provided operations fail with ESTALE.
   There were some issues associated with this:
      - Glusterfs in some codepaths returned ENOENT even when the operation
      is on an inode, and hence VFS didn't retry the resolution. Much of the
      discussion around this topic can be found at this mail thread. This
      issue has been fixed by various patches.
      - VFS retries exactly once. So, when the retry fails with ESTALE, VFS
      gives up and syscalls like open fail. We've hit this class of issues
      in bugs like these. The current understanding is that real-world
      workloads won't hit this race, and hence one retry is enough. NFS
      relies on the same VFS mechanism, and NFS developers say they've not
      hit bugs of this kind in real workloads.
      - DHT in rename codepaths acquires locks on src and dst inodes. If a
      parallel rename overwrote dst-inode, this locking failed and the
      rename operation used to fail. The issue is tracked and fixed as part
      of this bug.
   - Quorum imposition by afr in the open fop. afr imposes Quorum on
   fd-based operations, but not on open. This means operations can fail on a
   valid fd due to lack of Quorum. Not fixed yet; tracked in this bug.
   - Operations on a valid fd failing after the file was deleted by
   rename/unlink.
      - Fuse-bridge used to randomly pick fds in the fstat codepath, as
      earlier versions of the fuse API didn't provide a filehandle as an
      argument of the Getattr request. This resulted in fstat failures when
      the file was deleted through rename/unlink after it had been
      successfully opened. This is fixed in this patch and this patch.
      - performance/open-behind fakes an open call. Due to bugs in the
      rename/unlink codepath, it couldn't open the file before the file was
      deleted due to rename or unlink. Fixed by this patch.
   - Stale (meta)data cached by various performance xlators
      - md-cache used to cache a stale fstat. Fixed by this patch.
      - write-behind did not provide a correct stat in the rename cbk when
      writes on src were cached in write-behind. Fixed by this patch.
      - write-behind did not provide a correct stat in the readdirp
      response. Fixed by this patch.
      - Ordering of operations done on different fds by write-behind. It
      considered operations on different fds as independent. So an fstat
      done after a write completed didn't fetch a stat that reflected the
      write operation when the two were on different fds. This is fixed by
      this patch.
      - readdir-ahead used to provide stale stat. The issue is fixed by
      this patch.
      - Most of the caching xlators rely on the ctime/mtime of stat to find
      out whether the current (meta)data is newer or staler than the cached
      (meta)data. However, the ctime/mtime provided by replica/afr is not
      always consistent, as it can pick the stat from any of its
      subvolumes. This issue can be solved once the ctime generator
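Stepping back to the first category above: the rename-atomicity contract can be seen on any local POSIX filesystem. The following illustration is not glusterfs-specific (paths are throwaway) — it shows what clients expect to keep working when rename(src, dst) overwrites an existing dst:

```shell
# An fd opened on dst before the rename keeps referring to the old
# dst-inode, while path-based access atomically switches to the src-inode.
D=$(mktemp -d)
echo old > "$D/dst"
echo new > "$D/src"
exec 3<"$D/dst"        # resolve dst-path to dst-inode and keep it open
mv "$D/src" "$D/dst"   # rename(src, dst): dst-path now maps to src-inode
cat "$D/dst"           # path-based access sees the renamed file: prints "new"
cat <&3                # the already-open fd still reads the old inode: prints "old"
exec 3<&-
```

The bugs listed above are cases where glusterfs broke one side or the other of this contract, either by failing path-based access mid-rename or by invalidating already-open fds.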

Re: [Gluster-devel] On making performance.parallel-readdir as a default option

2018-09-21 Thread Raghavendra Gowdappa
On Fri, Sep 21, 2018 at 11:25 PM Raghavendra Gowdappa 
wrote:

> Hi all,
>
> We've a feature performance.parallel-readdir [1] that is known to improve
> performance of readdir operations [2][3][4]. The option is especially
> useful when distribute scale is relatively large (>10) and is known to
> improve performance of readdir operations even on smaller scale of
> distribute count 1 [4].
>
> However, this option is not enabled by default. I am here proposing to
> make this as a default feature.
>
> But, there are some important things to be addressed in readdir-ahead
> (which is core part of parallel-readdir), before we can do so:
>
> To summarize issues with readdir-ahead:
> * There seems to be one prominent problem of missing dentries with
> parallel-readdir. There was one problem discussed on tech-list just
> yesterday. I've heard about this recurrently earlier too. Not sure whether
> this is the problem of missing unlink/rmdir/create etc fops (see below) in
> readdir-ahead. ATM, no RCA.
> * fixes to maintain stat-consistency in dentries pre-fetched have not made
> into downstream yet (though merged upstream [5]).
> * readdir-ahead doesn't implement directory modification fops like
> rmdir/create/symlink/link/unlink/rename. This means cache won't be updated
> with newer content, even on single mount till it's consumed by application
> or purged.
> * dht linkto-files should store relative positions of subvolumes instead
> of absolute subvolume name, so that changes to immediate child won't render
> them stale.
> * Features parallel-readdir depends on to be working should be enabled
> automatically even though they were off earlier when parallel-readdir is
> enabled [6].
>
> I've listed important known issues above. But we can discuss which are the
> blockers for making this feature as a default.
>
> Thoughts?
>
> [1] http://review.gluster.org/#/c/16090/
> [2]
> https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf
> (sections on small directory)
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1628807#c35
> <https://bugzilla.redhat.com/show_bug.cgi?id=1628807#c34>
> [4] https://www.spinics.net/lists/gluster-users/msg34956.html
> [5] http://review.gluster.org/#/c/glusterfs/+/20639/
> [6] https://bugzilla.redhat.com/show_bug.cgi?id=1631406
>
> regards,
> Raghavendra
>
>

[Gluster-devel] On making performance.parallel-readdir as a default option

2018-09-21 Thread Raghavendra Gowdappa
Hi all,

We've a feature performance.parallel-readdir [1] that is known to improve
performance of readdir operations [2][3][4]. The option is especially
useful when distribute scale is relatively large (>10) and is known to
improve performance of readdir operations even on smaller scale of
distribute count 1 [4].

However, this option is not enabled by default. I am proposing here to make
it a default feature.
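For context, this is how the option is toggled today on an existing volume (a sketch: the volume name "testvol" is a placeholder, and `performance.parallel-readdir` is the option name from [1]):

```shell
# Enable and inspect the feature on a volume (name is hypothetical):
gluster volume set testvol performance.parallel-readdir on
gluster volume get testvol performance.parallel-readdir
# Making it a default would mean new volumes start with this already "on".
```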

But, there are some important things to be addressed in readdir-ahead
(which is core part of parallel-readdir), before we can do so:

To summarize issues with readdir-ahead:
* There seems to be one prominent problem of missing dentries with
parallel-readdir. There was one problem discussed on the tech-list just
yesterday. I've heard about this repeatedly earlier too. Not sure whether
this is the problem of the missing unlink/rmdir/create etc. fops (see below)
in readdir-ahead. ATM, no RCA.
* Fixes to maintain stat-consistency in pre-fetched dentries have not made
it into downstream yet (though merged upstream [5]).
* readdir-ahead doesn't implement directory-modification fops like
rmdir/create/symlink/link/unlink/rename. This means the cache won't be
updated with newer content, even on a single mount, till it's consumed by
the application or purged.
* dht linkto-files should store relative positions of subvolumes instead of
absolute subvolume names, so that changes to the immediate child won't
render them stale.
* Features that parallel-readdir depends on should be enabled automatically
when parallel-readdir is enabled, even if they were off earlier [6].

I've listed important known issues above. But we can discuss which are the
blockers for making this feature as a default.

Thoughts?

[1] http://review.gluster.org/#/c/16090/
[2]
https://events.static.linuxfound.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf
(sections on small directory)
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1628807#c35

[4] https://www.spinics.net/lists/gluster-users/msg34956.html
[5] http://review.gluster.org/#/c/glusterfs/+/20639/
[6] https://bugzilla.redhat.com/show_bug.cgi?id=1631406

regards,
Raghavendra

[Gluster-devel] [regression tests] seeing files from previous test run

2018-08-13 Thread Raghavendra Gowdappa
All,

I was consistently seeing failures for test
https://review.gluster.org/#/c/glusterfs/+/20639/12/tests/bugs/readdir-ahead/bug-1390050.t

TEST glusterfs --volfile-server=$H0 --volfile-id=$V0 $M0
rm -rf $M0/*
TEST mkdir -p $DIRECTORY
#rm -rf $DIRECTORY/*

TEST touch $DIRECTORY/file{0..10}
EXPECT "0" stat -c "%s" $DIRECTORY/file4
#rdd_tester="$(dirname $0)/rdd-tester"

TEST build_tester $(dirname $0)/bug-1390050.c -o $(dirname $0)/rdd-tester
TEST $(dirname $0)/rdd-tester $DIRECTORY $DIRECTORY/file4

However, if I uncomment the line "rm -rf $DIRECTORY/*", the test succeeds.

I've also added a sleep just after mkdir -p $DIRECTORY and manually checked
the directory. Turns out there are files left over from the previous run.

So, it looks like files left over from previous runs are causing the
failures. Are there any changes to the cleanup sequence which could've
caused this failure?

regards,
Raghavendra

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down: RCA for tests ( ./tests/bugs/quick-read/bug-846240.t)

2018-08-12 Thread Raghavendra Gowdappa
Failure is tracked by bz:
https://bugzilla.redhat.com/show_bug.cgi?id=1615096



Earlier this test did the following things on M0 and M1, mounted on the
same volume:
1 create file  M0/testfile
2 open an fd on M0/testfile
3 remove the file from M1, M1/testfile
4 echo "data" >> M0/testfile

The test expects appending data to M0/testfile to fail. However, the
">>" redirection creates a file if it doesn't exist. So, the only
reason the test succeeded was that lookup succeeded due to a stale stat
in md-cache. This hypothesis is verified by two experiments:
* Add a sleep of 10 seconds before the append operation. The md-cache
  entry expires and lookup fails, followed by creation of the file, and
  hence the append succeeds to a new file.
* Set the md-cache timeout to 600 seconds, and the test never fails even
  with a sleep of 10 before the append operation. The reason is that the
  stale stat in md-cache survives the sleep.

So, the spurious nature of the failure depended on whether lookup is
done while the stat is present in md-cache or not.

The actual test should've been to write to the fd opened in step 2
above. I've changed the test accordingly. Note that this patch also
remounts M0 after the initial file creation, as open-behind disables
opening-behind on witnessing a setattr on the inode, and touch involves
a setattr. On remount, the create operation is not done and hence the
file is opened-behind.
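The weakness of the original check can be demonstrated on any local filesystem (nothing glusterfs-specific; paths below are throwaway): the ">>" redirection re-resolves the path and silently creates a fresh file, while the fd from step 2 keeps writing to the old, now-unlinked inode:

```shell
D=$(mktemp -d)
echo content > "$D/testfile"
exec 3>>"$D/testfile"       # step 2: keep an fd open on the file
rm "$D/testfile"            # step 3: unlink the path
echo data >> "$D/testfile"  # step 4: ">>" simply creates a brand-new file
cat "$D/testfile"           # prints "data" -- the new file, not the old inode
echo more >&3               # the old fd still writes to the unlinked inode
exec 3>&-
```

On glusterfs the only thing standing between step 4 and a fresh file was the stale positive lookup in md-cache, which is why the pass/fail outcome tracked the cache timeout.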



Fix submitted at: https://review.gluster.org/#/c/glusterfs/+/20710/

regards,
Raghavendra

On Mon, Aug 13, 2018 at 6:12 AM, Shyam Ranganathan 
wrote:

> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
>
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
>
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
>
> ./tests/basic/distribute/rebal-all-nodes-migrate.t  TBD
> ./tests/basic/tier/tier-heald.t TBD
> ./tests/basic/afr/sparse-file-self-heal.t   TBD
> ./tests/bugs/shard/bug-1251824.tTBD
> ./tests/bugs/shard/configure-lru-limit.tTBD
> ./tests/bugs/replicate/bug-1408712.tRavi
> ./tests/basic/afr/replace-brick-self-heal.t TBD
> ./tests/00-geo-rep/00-georep-verify-setup.t Kotresh
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik
> ./tests/basic/stats-dump.t  TBD
> ./tests/bugs/bug-1110262.t  TBD
> ./tests/basic/ec/ec-data-heal.t Mohit
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>  Pranith
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-
> other-processes-accessing-mounted-path.t
> TBD
> ./tests/basic/ec/ec-5-2.t   Sunil
> ./tests/bugs/shard/bug-shard-discard.t  TBD
> ./tests/bugs/glusterd/remove-brick-testcases.t  TBD
> ./tests/bugs/protocol/bug-808400-repl.t TBD
> ./tests/bugs/quick-read/bug-846240.tDu
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t
>  Mohit
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh
> ./tests/bugs/ec/bug-1236065.t   Pranith
> ./tests/00-geo-rep/georep-basic-dr-rsync.t  Kotresh
> ./tests/basic/ec/ec-1468261.t   Ashish
> ./tests/basic/afr/add-brick-self-heal.t Ravi
> ./tests/basic/afr/granular-esh/replace-brick.t  Pranith
> ./tests/bugs/core/multiplex-limit-issue-151.t   Sanju
> ./tests/bugs/glusterd/validating-server-quorum.tAtin
> ./tests/bugs/replicate/bug-1363721.tRavi
> ./tests/bugs/index/bug-1559004-EMLINK-handling.tPranith
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>Karthik
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> Atin
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>  TBD
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t   TBD
> ./tests/bitrot/bug-1373520.tKotresh
> ./tests/bugs/distribute/bug-1117851.t   Shyam/Nigel
> ./tests/bugs/glusterd/quorum-validation.t   Atin
> ./tests/bugs/distribute/bug-1042725.t   Shyam
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-
> txn-on-quorum-failure.t
> Karthik
> ./tests/bugs/quota/bug-1293601.tTBD
> ./tests/bugs/bug-1368312.t  Du
> ./tests/bugs/distribute/bug-1122443.t   Du
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t   1608568
> Nithya/Shyam
>
> Thanks,
> Shyam
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down: RCA for tests (bug-1368312.t)

2018-08-12 Thread Raghavendra Gowdappa
Failure of this test is tracked by bz
https://bugzilla.redhat.com/show_bug.cgi?id=1608158.



I was trying to debug regression failures on [1] and observed that
split-brain-resolution.t was failing consistently.

=
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
---
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact - on most of the failures
stat was not served from md-cache, but instead was wound down to afr
which failed stat with EIO as the file was in split brain. So, I did
another test:
* disabled md-cache
* mount glusterfs with attribute-timeout 0 and entry-timeout 0
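Those two steps, expressed as commands (a sketch under assumptions: `$H0`/`$V0`/`$M0` are the usual test-harness variables, and I'm assuming md-cache is toggled via `performance.stat-prefetch`):

```shell
# experiment 1: disable md-cache on the volume
gluster volume set $V0 performance.stat-prefetch off
# experiment 2: mount with kernel attribute/entry caching disabled
glusterfs --volfile-server=$H0 --volfile-id=$V0 \
          --attribute-timeout=0 --entry-timeout=0 $M0
```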

Now the test always fails. So, I think the test relied on stat
requests being absorbed either by the kernel attribute cache or md-cache.
When that's not happening, stats reach afr and result in
failures of cmds like getfattr etc. Thoughts?

[1] https://review.gluster.org/#/c/20549/
tests/basic/afr/split-brain-resolution.t:
tests/bugs/bug-1368312.t:
tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t

Discussion on this topic can be found on gluster-devel with subj:
regression failures on afr/split-brain-resolution



regards,
Raghavendra



On Mon, Aug 13, 2018 at 6:12 AM, Shyam Ranganathan 
wrote:

> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
>
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
>
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
>
> ./tests/basic/distribute/rebal-all-nodes-migrate.t  TBD
> ./tests/basic/tier/tier-heald.t TBD
> ./tests/basic/afr/sparse-file-self-heal.t   TBD
> ./tests/bugs/shard/bug-1251824.tTBD
> ./tests/bugs/shard/configure-lru-limit.tTBD
> ./tests/bugs/replicate/bug-1408712.tRavi
> ./tests/basic/afr/replace-brick-self-heal.t TBD
> ./tests/00-geo-rep/00-georep-verify-setup.t Kotresh
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik
> ./tests/basic/stats-dump.t  TBD
> ./tests/bugs/bug-1110262.t  TBD
> ./tests/basic/ec/ec-data-heal.t Mohit
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>  Pranith
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-
> other-processes-accessing-mounted-path.t
> TBD
> ./tests/basic/ec/ec-5-2.t   Sunil
> ./tests/bugs/shard/bug-shard-discard.t  TBD
> ./tests/bugs/glusterd/remove-brick-testcases.t  TBD
> ./tests/bugs/protocol/bug-808400-repl.t TBD
> ./tests/bugs/quick-read/bug-846240.tDu
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t
>  Mohit
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh
> ./tests/bugs/ec/bug-1236065.t   Pranith
> ./tests/00-geo-rep/georep-basic-dr-rsync.t  Kotresh
> ./tests/basic/ec/ec-1468261.t   Ashish
> ./tests/basic/afr/add-brick-self-heal.t Ravi
> ./tests/basic/afr/granular-esh/replace-brick.t  Pranith
> ./tests/bugs/core/multiplex-limit-issue-151.t   Sanju
> ./tests/bugs/glusterd/validating-server-quorum.tAtin
> ./tests/bugs/replicate/bug-1363721.tRavi
> ./tests/bugs/index/bug-1559004-EMLINK-handling.tPranith
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>Karthik
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> Atin
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>  TBD
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t   TBD
> ./tests/bitrot/bug-1373520.tKotresh
> ./tests/bugs/distribute/bug-1117851.t   Shyam/Nigel
> ./tests/bugs/glusterd/quorum-validation.t   Atin
> ./tests/bugs/distribute/bug-1042725.t   Shyam
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-
> txn-on-quorum-failure.t
> Karthik
> ./tests/bugs/quota/bug-1293601.tTBD
> ./tests/bugs/bug-1368312.t  Du
> ./tests/bugs/distribute/bug-1122443.t   Du
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t   1608568
> Nithya/Shyam
>
> Thanks,
> Shyam
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>

Re: [Gluster-devel] [Gluster-Maintainers] bug-1368312.t

2018-08-12 Thread Raghavendra Gowdappa
Failure of this test is tracked by bz
https://bugzilla.redhat.com/show_bug.cgi?id=1608158.



I was trying to debug regression failures on [1] and observed that
split-brain-resolution.t was failing consistently.

=
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
---
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact - on most of the failures
stat was not served from md-cache, but instead was wound down to afr
which failed stat with EIO as the file was in split brain. So, I did
another test:
* disabled md-cache
* mount glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So, I think the test relied on stat
requests being absorbed either by the kernel attribute cache or md-cache.
When that's not happening, stats reach afr and result in
failures of cmds like getfattr etc. Thoughts?

[1] https://review.gluster.org/#/c/20549/
tests/basic/afr/split-brain-resolution.t:
tests/bugs/bug-1368312.t:
tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t

Discussion on this topic can be found on gluster-devel with subj:
regression failures on afr/split-brain-resolution



regards,
Raghavendra

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-12 Thread Raghavendra Gowdappa
On Sun, Aug 12, 2018 at 9:11 AM, Raghavendra Gowdappa 
wrote:

>
>
> On Sat, Aug 11, 2018 at 10:33 PM, Shyam Ranganathan 
> wrote:
>
>> On 08/09/2018 10:58 PM, Raghavendra Gowdappa wrote:
>> >
>> >
>> > On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan <srang...@redhat.com> wrote:
>> >
>> > On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
>> > > Today's patch set 7 [1], included fixes provided till last
>> evening IST,
>> > > and its runs can be seen here [2] (yay! we can link to comments in
>> > > gerrit now).
>> > >
>> > > New failures: (added to the spreadsheet)
>> > > ./tests/bugs/quick-read/bug-846240.t
>> >
>> > The above test fails always if there is a sleep of 10 added at line
>> 36.
>> >
>> > I tried to replicate this in my setup, and was able to do so 3/150
>> times
>> > and the failures were the same as the ones reported in the build
>> logs
>> > (as below).
>> >
>> > Not finding any clear reason for the failure, I delayed the test
>> (i.e
>> > added a sleep 10) after the open on M0 to see if the race is
>> uncovered,
>> > and it was.
>> >
>> > Du, request you to take a look at the same, as the test is around
>> > quick-read but involves open-behind as well.
>> >
>> >
>> > Thanks for that information. I'll be working on this today.
>>
>> Heads up Du, failed again with the same pattern in run
>> https://build.gluster.org/job/regression-on-demand-full-run/
>> 46/consoleFull
>
>
> Sorry Shyam.
>
> I found out the cause [1], but am still thinking about whether to fix the
> test or remove it, given the recent changes to open-behind from [1]. You'll
> have an answer by EOD today.
>

Fix submitted at  https://review.gluster.org/#/c/glusterfs/+/20710/


> [1] https://review.gluster.org/20428
>
>
>>
>> >
>> >
>> > Failure snippet:
>> > 
>> > 23:41:24 [23:41:28] Running tests in file
>> > ./tests/bugs/quick-read/bug-846240.t
>> > 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
>> > 23:41:28 1..17
>> > 23:41:28 ok 1, LINENUM:9
>> > 23:41:28 ok 2, LINENUM:10
>> > 
>> > 23:41:28 ok 13, LINENUM:40
>> > 23:41:28 not ok 14 , LINENUM:50
>> > 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
>> >
>> > Shyam
>> >
>> >
>>
>
>

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-11 Thread Raghavendra Gowdappa
On Sat, Aug 11, 2018 at 10:33 PM, Shyam Ranganathan 
wrote:

> On 08/09/2018 10:58 PM, Raghavendra Gowdappa wrote:
> >
> >
> > On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan <srang...@redhat.com> wrote:
> >
> > On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> > > Today's patch set 7 [1], included fixes provided till last evening
> IST,
> > > and its runs can be seen here [2] (yay! we can link to comments in
> > > gerrit now).
> > >
> > > New failures: (added to the spreadsheet)
> > > ./tests/bugs/quick-read/bug-846240.t
> >
> > The above test fails always if there is a sleep of 10 added at line
> 36.
> >
> > I tried to replicate this in my setup, and was able to do so 3/150
> times
> > and the failures were the same as the ones reported in the build logs
> > (as below).
> >
> > Not finding any clear reason for the failure, I delayed the test (i.e
> > added a sleep 10) after the open on M0 to see if the race is
> uncovered,
> > and it was.
> >
> > Du, request you to take a look at the same, as the test is around
> > quick-read but involves open-behind as well.
> >
> >
> > Thanks for that information. I'll be working on this today.
>
> Heads up Du, failed again with the same pattern in run
> https://build.gluster.org/job/regression-on-demand-full-run/46/consoleFull


Sorry Shyam.

I found out the cause [1], but am still thinking about whether to fix the
test or remove it, given the recent changes to open-behind from [1]. You'll
have an answer by EOD today.

[1] https://review.gluster.org/20428


>
> >
> >
> > Failure snippet:
> > 
> > 23:41:24 [23:41:28] Running tests in file
> > ./tests/bugs/quick-read/bug-846240.t
> > 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
> > 23:41:28 1..17
> > 23:41:28 ok 1, LINENUM:9
> > 23:41:28 ok 2, LINENUM:10
> > 
> > 23:41:28 ok 13, LINENUM:40
> > 23:41:28 not ok 14 , LINENUM:50
> > 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
> >
> > Shyam
> >
> >
>

Re: [Gluster-devel] ./tests/basic/afr/metadata-self-heal.t core dumped

2018-08-09 Thread Raghavendra Gowdappa
On Fri, Aug 10, 2018 at 11:21 AM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On Fri, Aug 10, 2018 at 8:54 AM Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> Details can be found at:
>> https://build.gluster.org/job/centos7-regression/2190/console
>>
>> Process that core dumped: glfs_shdheal
>>
>> Note that the patch on which this regression failed is against readdir-ahead,
>> which is not loaded in the xlator graph of the self-heal daemon.
>>
>> From bt,
>>
>> *23:53:24* __FUNCTION__ = "syncop_getxattr"*23:53:24* #8  
>> 0x7f5af8738aef in syncop_gfid_to_path_hard (itable=0x7f5ae401ce50, 
>> subvol=0x7f5ae40079e0, gfid=0x7f5adc00b4e8 "", inode=0x0, 
>> path_p=0x7f5acbffebe8, hard_resolve=false) at 
>> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop-utils.c:585*23:53:24*
>>  ret = 0*23:53:24* path = 0x0*23:53:24* loc = {path 
>> = 0x0, name = 0x0, inode = 0x7f5ac00028a8, parent = 0x0, gfid = '\000' 
>> , pargfid = '\000' }*23:53:24* 
>> xattr = 0x0*23:53:24* #9  0x7f5af8738c28 in syncop_gfid_to_path 
>> (itable=0x7f5ae401ce50, subvol=0x7f5ae40079e0, gfid=0x7f5adc00b4e8 "", 
>> path_p=0x7f5acbffebe8) at 
>> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop-utils.c:636*23:53:24*
>>  No locals.
>> *23:53:24* #10 0x7f5aeaad65e1 in afr_shd_selfheal 
>> (healer=0x7f5ae401d490, child=0, gfid=0x7f5adc00b4e8 "") at 
>> /home/jenkins/root/workspace/centos7-regression/xlators/cluster/afr/src/afr-self-heald.c:331*23:53:24*
>>  ret = 0*23:53:24* eh = 0x0*23:53:24* priv = 
>> 0x7f5ae401c780*23:53:24* shd = 0x7f5ae401c8e8*23:53:24* 
>> shd_event = 0x0*23:53:24* path = 0x0*23:53:24* subvol = 
>> 0x7f5ae40079e0*23:53:24* this = 0x7f5ae400d540*23:53:24* 
>> crawl_event = 0x7f5ae401d4a0*23:53:24* #11 0x7f5aeaad6de5 in 
>> afr_shd_full_heal (subvol=0x7f5ae40079e0, entry=0x7f5adc00b440, 
>> parent=0x7f5acbffee20, data=0x7f5ae401d490) at 
>> /home/jenkins/root/workspace/centos7-regression/xlators/cluster/afr/src/afr-self-heald.c:541*23:53:24*
>>  healer = 0x7f5ae401d490*23:53:24* this = 
>> 0x7f5ae400d540*23:53:24* priv = 0x7f5ae401c780*23:53:24* #12 
>> 0x7f5af8737b2f in syncop_ftw (subvol=0x7f5ae40079e0, loc=0x7f5acbffee20, 
>> pid=-6, data=0x7f5ae401d490, fn=0x7f5aeaad6d40 ) at 
>> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop-utils.c:123*23:53:24*
>>  child_loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, 
>> gfid = '\000' , pargfid = '\000' > times>}*23:53:24* fd = 0x7f5ac0001398
>>
>>
>> Assert for a non-null gfid failed in client_pre_getxattr_v2. From bt, it
>> looks like a NULL gfid was passed to afr_shd_full.
>>
>
> Most probably it is because of the change in gf_link_inode_from_dirent()
> in your patch. Why did you make that change? Wondering if we need to change
> afr/ec code accordingly.
>

Please hold on. I'll let you know whether changes in afr/ec are necessary.
I am still thinking about whether that change is really necessary.


>
>>
>> *23:53:24* __PRETTY_FUNCTION__ = "client_pre_getxattr_v2"*23:53:24* 
>> #5  0x7f5aeada8f2a in client4_0_getxattr (frame=0x7f5ac0008198, 
>> this=0x7f5ae40079e0, data=0x7f5acbffdcc0) at 
>> /home/jenkins/root/workspace/centos7-regression/xlators/protocol/client/src/client-rpc-fops_v2.c:4287*23:53:24*
>>  conf = 0x7f5ae40293e0*23:53:24* args = 
>> 0x7f5acbffdcc0*23:53:24* req = {gfid = '\000' , 
>> namelen = 0, name = 0x0, xdata = {xdr_size = 0, count = 0, pairs = 
>> {pairs_len = 0, pairs_val = 0x0}}}*23:53:24* dict = 0x0*23:53:24*
>>  ret = 0*23:53:24* op_ret = -1*23:53:24* op_errno = 
>> 116*23:53:24* local = 0x7f5ac00082a8*23:53:24* __FUNCTION__ 
>> = "client4_0_getxattr"*23:53:24* __PRETTY_FUNCTION__ = 
>> "client4_0_getxattr"
>>
>>
>> regards,
>> Raghavendra
>>
>
>
> --
> Pranith
>

Re: [Gluster-devel] ./tests/basic/afr/metadata-self-heal.t core dumped

2018-08-09 Thread Raghavendra Gowdappa
This looks to be from the code change
https://review.gluster.org/#/c/glusterfs/+/20639/4/libglusterfs/src/gf-dirent.c

I've reverted the changes and retriggered tests. Sorry about the confusion.

On Fri, Aug 10, 2018 at 8:54 AM, Raghavendra Gowdappa 
wrote:

> All,
>
> Details can be found at:
> https://build.gluster.org/job/centos7-regression/2190/console
>
> Process that core dumped: glfs_shdheal
>
> Note that the patch on which this regression failed is against readdir-ahead,
> which is not loaded in the xlator graph of the self-heal daemon.
>
> From bt,
>
> *23:53:24* __FUNCTION__ = "syncop_getxattr"*23:53:24* #8  
> 0x7f5af8738aef in syncop_gfid_to_path_hard (itable=0x7f5ae401ce50, 
> subvol=0x7f5ae40079e0, gfid=0x7f5adc00b4e8 "", inode=0x0, 
> path_p=0x7f5acbffebe8, hard_resolve=false) at 
> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop-utils.c:585*23:53:24*
>  ret = 0*23:53:24* path = 0x0*23:53:24* loc = {path = 
> 0x0, name = 0x0, inode = 0x7f5ac00028a8, parent = 0x0, gfid = '\000'  15 times>, pargfid = '\000' }*23:53:24* xattr = 
> 0x0*23:53:24* #9  0x7f5af8738c28 in syncop_gfid_to_path 
> (itable=0x7f5ae401ce50, subvol=0x7f5ae40079e0, gfid=0x7f5adc00b4e8 "", 
> path_p=0x7f5acbffebe8) at 
> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop-utils.c:636*23:53:24*
>  No locals.
> *23:53:24* #10 0x7f5aeaad65e1 in afr_shd_selfheal (healer=0x7f5ae401d490, 
> child=0, gfid=0x7f5adc00b4e8 "") at 
> /home/jenkins/root/workspace/centos7-regression/xlators/cluster/afr/src/afr-self-heald.c:331*23:53:24*
>  ret = 0*23:53:24* eh = 0x0*23:53:24* priv = 
> 0x7f5ae401c780*23:53:24* shd = 0x7f5ae401c8e8*23:53:24* 
> shd_event = 0x0*23:53:24* path = 0x0*23:53:24* subvol = 
> 0x7f5ae40079e0*23:53:24* this = 0x7f5ae400d540*23:53:24* 
> crawl_event = 0x7f5ae401d4a0*23:53:24* #11 0x7f5aeaad6de5 in 
> afr_shd_full_heal (subvol=0x7f5ae40079e0, entry=0x7f5adc00b440, 
> parent=0x7f5acbffee20, data=0x7f5ae401d490) at 
> /home/jenkins/root/workspace/centos7-regression/xlators/cluster/afr/src/afr-self-heald.c:541*23:53:24*
>  healer = 0x7f5ae401d490*23:53:24* this = 
> 0x7f5ae400d540*23:53:24* priv = 0x7f5ae401c780*23:53:24* #12 
> 0x7f5af8737b2f in syncop_ftw (subvol=0x7f5ae40079e0, loc=0x7f5acbffee20, 
> pid=-6, data=0x7f5ae401d490, fn=0x7f5aeaad6d40 ) at 
> /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop-utils.c:123*23:53:24*
>  child_loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid 
> = '\000' , pargfid = '\000' }*23:53:24*   
>   fd = 0x7f5ac0001398
>
>
> Assert for a non-null gfid failed in client_pre_getxattr_v2. From bt, it
> looks like a NULL gfid was passed to afr_shd_full.
>
> *23:53:24* __PRETTY_FUNCTION__ = "client_pre_getxattr_v2"*23:53:24* 
> #5  0x7f5aeada8f2a in client4_0_getxattr (frame=0x7f5ac0008198, 
> this=0x7f5ae40079e0, data=0x7f5acbffdcc0) at 
> /home/jenkins/root/workspace/centos7-regression/xlators/protocol/client/src/client-rpc-fops_v2.c:4287*23:53:24*
>  conf = 0x7f5ae40293e0*23:53:24* args = 
> 0x7f5acbffdcc0*23:53:24* req = {gfid = '\000' , 
> namelen = 0, name = 0x0, xdata = {xdr_size = 0, count = 0, pairs = {pairs_len 
> = 0, pairs_val = 0x0}}}*23:53:24* dict = 0x0*23:53:24* ret = 
> 0*23:53:24* op_ret = -1*23:53:24* op_errno = 116*23:53:24*
>  local = 0x7f5ac00082a8*23:53:24* __FUNCTION__ = 
> "client4_0_getxattr"*23:53:24* __PRETTY_FUNCTION__ = 
> "client4_0_getxattr"
>
>
> regards,
> Raghavendra
>

[Gluster-devel] ./tests/basic/afr/metadata-self-heal.t core dumped

2018-08-09 Thread Raghavendra Gowdappa
All,

Details can be found at:
https://build.gluster.org/job/centos7-regression/2190/console

Process that core dumped: glfs_shdheal

Note that the patch on which this regression failed is against readdir-ahead,
which is not loaded in the xlator graph of the self-heal daemon.

From bt,

        __FUNCTION__ = "syncop_getxattr"
#8  0x7f5af8738aef in syncop_gfid_to_path_hard (itable=0x7f5ae401ce50, subvol=0x7f5ae40079e0, gfid=0x7f5adc00b4e8 "", inode=0x0, path_p=0x7f5acbffebe8, hard_resolve=false)
    at /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop-utils.c:585
        ret = 0
        path = 0x0
        loc = {path = 0x0, name = 0x0, inode = 0x7f5ac00028a8, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
        xattr = 0x0
#9  0x7f5af8738c28 in syncop_gfid_to_path (itable=0x7f5ae401ce50, subvol=0x7f5ae40079e0, gfid=0x7f5adc00b4e8 "", path_p=0x7f5acbffebe8)
    at /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop-utils.c:636
        No locals.
#10 0x7f5aeaad65e1 in afr_shd_selfheal (healer=0x7f5ae401d490, child=0, gfid=0x7f5adc00b4e8 "")
    at /home/jenkins/root/workspace/centos7-regression/xlators/cluster/afr/src/afr-self-heald.c:331
        ret = 0
        eh = 0x0
        priv = 0x7f5ae401c780
        shd = 0x7f5ae401c8e8
        shd_event = 0x0
        path = 0x0
        subvol = 0x7f5ae40079e0
        this = 0x7f5ae400d540
        crawl_event = 0x7f5ae401d4a0
#11 0x7f5aeaad6de5 in afr_shd_full_heal (subvol=0x7f5ae40079e0, entry=0x7f5adc00b440, parent=0x7f5acbffee20, data=0x7f5ae401d490)
    at /home/jenkins/root/workspace/centos7-regression/xlators/cluster/afr/src/afr-self-heald.c:541
        healer = 0x7f5ae401d490
        this = 0x7f5ae400d540
        priv = 0x7f5ae401c780
#12 0x7f5af8737b2f in syncop_ftw (subvol=0x7f5ae40079e0, loc=0x7f5acbffee20, pid=-6, data=0x7f5ae401d490, fn=0x7f5aeaad6d40 <afr_shd_full_heal>)
    at /home/jenkins/root/workspace/centos7-regression/libglusterfs/src/syncop-utils.c:123
        child_loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
        fd = 0x7f5ac0001398


Assert for a non-null gfid failed in client_pre_getxattr_v2. From bt, it
looks like a NULL gfid was passed to afr_shd_full.

        __PRETTY_FUNCTION__ = "client_pre_getxattr_v2"
#5  0x7f5aeada8f2a in client4_0_getxattr (frame=0x7f5ac0008198, this=0x7f5ae40079e0, data=0x7f5acbffdcc0)
    at /home/jenkins/root/workspace/centos7-regression/xlators/protocol/client/src/client-rpc-fops_v2.c:4287
        conf = 0x7f5ae40293e0
        args = 0x7f5acbffdcc0
        req = {gfid = '\000' <repeats 15 times>, namelen = 0, name = 0x0, xdata = {xdr_size = 0, count = 0, pairs = {pairs_len = 0, pairs_val = 0x0}}}
        dict = 0x0
        ret = 0
        op_ret = -1
        op_errno = 116
        local = 0x7f5ac00082a8
        __FUNCTION__ = "client4_0_getxattr"
        __PRETTY_FUNCTION__ = "client4_0_getxattr"


regards,
Raghavendra

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-09 Thread Raghavendra Gowdappa
On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan 
wrote:

> On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> > Today's patch set 7 [1], included fixes provided till last evening IST,
> > and its runs can be seen here [2] (yay! we can link to comments in
> > gerrit now).
> >
> > New failures: (added to the spreadsheet)
> > ./tests/bugs/quick-read/bug-846240.t
>
> The above test fails always if there is a sleep of 10 added at line 36.
>
> I tried to replicate this in my setup, and was able to do so 3/150 times
> and the failures were the same as the ones reported in the build logs
> (as below).
>
> Not finding any clear reason for the failure, I delayed the test (i.e
> added a sleep 10) after the open on M0 to see if the race is uncovered,
> and it was.
>
> Du, request you to take a look at the same, as the test is around
> quick-read but involves open-behind as well.
>

Thanks for that information. I'll be working on this today.


> Failure snippet:
> 
> 23:41:24 [23:41:28] Running tests in file
> ./tests/bugs/quick-read/bug-846240.t
> 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
> 23:41:28 1..17
> 23:41:28 ok 1, LINENUM:9
> 23:41:28 ok 2, LINENUM:10
> 
> 23:41:28 ok 13, LINENUM:40
> 23:41:28 not ok 14 , LINENUM:50
> 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
>
> Shyam
>

Re: [Gluster-devel] gluster fuse comsumes huge memory

2018-08-08 Thread Raghavendra Gowdappa
On Thu, Aug 9, 2018 at 10:43 AM, Raghavendra Gowdappa 
wrote:

>
>
> On Thu, Aug 9, 2018 at 10:36 AM, huting3  wrote:
>
>> grep count will output nothing, so I grep size instead; the results are:
>>
>> $ grep itable glusterdump.109182.dump.1533730324 | grep lru | grep size
>> xlator.mount.fuse.itable.lru_size=191726
>>
>
> The kernel is holding too many inodes in its cache. What's the data set like?
> Do you have too many directories? How many files do you have?
>

Just to be sure, can you give the output of the following cmd too:

# grep itable <statedump-file> | grep lru | wc -l


>
>> $ grep itable glusterdump.109182.dump.1533730324 | grep active | grep
>> size
>> xlator.mount.fuse.itable.active_size=17
>>
>>
>> huting3
>> huti...@corp.netease.com
>>
>>
>> On 08/9/2018 12:36,Raghavendra Gowdappa
>>  wrote:
>>
>> Can you get the output of following cmds?
>>
>> # grep itable <statedump-file> | grep lru | grep count
>>
>> # grep itable <statedump-file> | grep active | grep count
>>
>> On Thu, Aug 9, 2018 at 9:25 AM, huting3  wrote:
>>
>>> Yes, I got the dump file and found there are many huge num_allocs, just
>>> like the following:
>>>
>>> I found the memusage of 4 variable types is extremely huge.
>>>
>>>  [protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
>>> size=47202352
>>> num_allocs=2030212
>>> max_size=47203074
>>> max_num_allocs=2030235
>>> total_allocs=26892201
>>>
>>> [protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
>>> size=24362448
>>> num_allocs=2030204
>>> max_size=24367560
>>> max_num_allocs=2030226
>>> total_allocs=17830860
>>>
>>> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
>>> size=2497947552
>>> num_allocs=4578229
>>> max_size=2459135680
>>> max_num_allocs=7123206
>>> total_allocs=41635232
>>>
>>> [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
>>> size=4038730976
>>> num_allocs=1
>>> max_size=4294962264
>>> max_num_allocs=37
>>> total_allocs=150049981
>>>
>>>
>>>
>>> huting3
>>> huti...@corp.netease.com
>>>
>>>
>>> On 08/9/2018 11:36,Raghavendra Gowdappa
>>>  wrote:
>>>
>>>
>>>
>>> On Thu, Aug 9, 2018 at 8:55 AM, huting3 
>>> wrote:
>>>
>>>> Hi expert:
>>>>
>>>> I meet a problem when I use glusterfs. The problem is that the fuse
>>>> client consumes huge memory when write a   lot of files(>million) to the
>>>> gluster, at last leading to killed by OS oom. The memory the fuse process
>>>> consumes can up to 100G! I wonder if there are memory leaks in the gluster
>>>> fuse process, or some other causes.
>>>>
>>>
>>> Can you get statedump of fuse process consuming huge memory?
>>>
>>>
>>>> My gluster version is 3.13.2, the gluster volume info is listed as
>>>> following:
>>>>
>>>> Volume Name: gv0
>>>> Type: Distributed-Replicate
>>>> Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 19 x 3 = 57
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
>>>> Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
>>>> Brick3: dl22.d

Re: [Gluster-devel] gluster fuse comsumes huge memory

2018-08-08 Thread Raghavendra Gowdappa
On Thu, Aug 9, 2018 at 10:36 AM, huting3  wrote:

> grep count will output nothing, so I grep size instead; the results are:
>
> $ grep itable glusterdump.109182.dump.1533730324 | grep lru | grep size
> xlator.mount.fuse.itable.lru_size=191726
>

The kernel is holding too many inodes in its cache. What's the data set like?
Do you have too many directories? How many files do you have?
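To make sense of dumps like the one above, it can help to tabulate the largest allocator types. A rough sketch follows (a hypothetical helper, not part of gluster; it only assumes the `[... usage-type ... memusage]` / `key=value` layout excerpted in this thread):

```python
import re

# Excerpt of a glusterfs statedump, using the numbers quoted above.
SAMPLE = """\
[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=2497947552
num_allocs=4578229
max_size=2459135680

[protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
size=47202352
num_allocs=2030212
"""

def top_memusage(dump_text, limit=5):
    """Rank the '[... usage-type ... memusage]' sections by current size."""
    sections = []
    current = None
    for line in dump_text.splitlines():
        m = re.match(r"\[(.+ usage-type .+ memusage)\]", line)
        if m:
            current = {"name": m.group(1)}
            sections.append(current)
        elif current is not None and "=" in line:
            key, _, val = line.partition("=")
            if val.strip().isdigit():
                current[key.strip()] = int(val)
    sections.sort(key=lambda s: s.get("size", 0), reverse=True)
    return sections[:limit]

for s in top_memusage(SAMPLE):
    allocs = s.get("num_allocs", 0)
    avg = s["size"] / allocs if allocs else 0.0
    print(f"{s['name']}: size={s['size']}, ~{avg:.0f} bytes/alloc")
```

Run against a full glusterdump.<pid>.dump.<timestamp> file, this points straight at the top consumers (here, the fuse inode contexts).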


> $ grep itable glusterdump.109182.dump.1533730324 | grep active | grep size
> xlator.mount.fuse.itable.active_size=17
>
>
> huting3
> huti...@corp.netease.com
>
>
> On 08/9/2018 12:36,Raghavendra Gowdappa
>  wrote:
>
> Can you get the output of following cmds?
>
> # grep itable <statedump-file> | grep lru | grep count
>
> # grep itable <statedump-file> | grep active | grep count
>
> On Thu, Aug 9, 2018 at 9:25 AM, huting3  wrote:
>
>> Yes, I got the dump file and found there are many huge num_allocs, just
>> like the following:
>>
>> I found the memusage of 4 variable types is extremely huge.
>>
>>  [protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
>> size=47202352
>> num_allocs=2030212
>> max_size=47203074
>> max_num_allocs=2030235
>> total_allocs=26892201
>>
>> [protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
>> size=24362448
>> num_allocs=2030204
>> max_size=24367560
>> max_num_allocs=2030226
>> total_allocs=17830860
>>
>> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
>> size=2497947552
>> num_allocs=4578229
>> max_size=2459135680
>> max_num_allocs=7123206
>> total_allocs=41635232
>>
>> [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
>> size=4038730976
>> num_allocs=1
>> max_size=4294962264
>> max_num_allocs=37
>> total_allocs=150049981
>>
>>
>>
>> huting3
>> huti...@corp.netease.com
>>
>>
>> On 08/9/2018 11:36,Raghavendra Gowdappa
>>  wrote:
>>
>>
>>
>> On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:
>>
>>> Hi expert:
>>>
>>> I meet a problem when I use glusterfs. The problem is that the fuse
>>> client consumes huge memory when write a   lot of files(>million) to the
>>> gluster, at last leading to killed by OS oom. The memory the fuse process
>>> consumes can up to 100G! I wonder if there are memory leaks in the gluster
>>> fuse process, or some other causes.
>>>
>>
>> Can you get statedump of fuse process consuming huge memory?
>>
>>
>>> My gluster version is 3.13.2, the gluster volume info is listed as
>>> following:
>>>
>>> Volume Name: gv0
>>> Type: Distributed-Replicate
>>> Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 19 x 3 = 57
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick3: dl22.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick4: dl20.dg.163.org:/glusterfs_brick/brick2/gv0
>>> Brick5: dl21.dg.163.org:/glusterfs_brick/brick2/gv0
>>> Brick6: dl22.dg.163.org:/glusterfs_brick/brick2/gv0
>>> Brick7: dl20.dg.163.org:/glusterfs_brick/brick3/gv0
>>> Brick8: dl21.dg.163.org:/glusterfs_brick/brick3/gv0
>>> Brick9: dl22.dg.163.org:/glusterfs_brick/brick3/gv0
>>> Brick10: dl23.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick11: dl24.dg.163.org:/glusterfs_brick/brick1/gv0
>>> Brick12: dl25.dg.163.org:/glusterfs_brick/brick1/gv0
>

Re: [Gluster-devel] gluster fuse comsumes huge memory

2018-08-08 Thread Raghavendra Gowdappa
Can you get the output of the following cmds?

# grep itable <statedump-file> | grep lru | grep count

# grep itable <statedump-file> | grep active | grep count

On Thu, Aug 9, 2018 at 9:25 AM, huting3  wrote:

> Yes, I got the dump file and found there are many huge num_allocs, just
> like the following:
>
> I found the memusage of 4 variable types is extremely huge.
>
>  [protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
> size=47202352
> num_allocs=2030212
> max_size=47203074
> max_num_allocs=2030235
> total_allocs=26892201
>
> [protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
> size=24362448
> num_allocs=2030204
> max_size=24367560
> max_num_allocs=2030226
> total_allocs=17830860
>
> [mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
> size=2497947552
> num_allocs=4578229
> max_size=2459135680
> max_num_allocs=7123206
> total_allocs=41635232
>
> [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
> size=4038730976
> num_allocs=1
> max_size=4294962264
> max_num_allocs=37
> total_allocs=150049981
>
>
>
> huting3
> huti...@corp.netease.com
>
>
> On 08/9/2018 11:36,Raghavendra Gowdappa
>  wrote:
>
>
>
> On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:
>
>> Hi expert:
>>
>> I meet a problem when I use glusterfs. The problem is that the fuse
>> client consumes huge memory when write a   lot of files(>million) to the
>> gluster, at last leading to killed by OS oom. The memory the fuse process
>> consumes can up to 100G! I wonder if there are memory leaks in the gluster
>> fuse process, or some other causes.
>>
>
> Can you get statedump of fuse process consuming huge memory?
>
>
>> My gluster version is 3.13.2, the gluster volume info is listed as
>> following:
>>
>> Volume Name: gv0
>> Type: Distributed-Replicate
>> Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 19 x 3 = 57
>> Transport-type: tcp
>> Bricks:
>> Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick3: dl22.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick4: dl20.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick5: dl21.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick6: dl22.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick7: dl20.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick8: dl21.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick9: dl22.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick10: dl23.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick11: dl24.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick12: dl25.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick13: dl26.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick14: dl27.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick15: dl28.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick16: dl29.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick17: dl30.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick18: dl31.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick19: dl32.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick20: dl33.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick21: dl34.dg.163.org:/glusterfs_brick/brick1/gv0
>> Brick22: dl23.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick23: dl24.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick24: dl25.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick25: dl26.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick26: dl27.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick27: dl28.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick28: dl29.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick29: dl30.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick30: dl31.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick31: dl32.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick32: dl33.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick33: dl34.dg.163.org:/glusterfs_brick/brick2/gv0
>> Brick34: dl23.dg.163.org:/glusterfs_brick/brick3/gv0
>> Brick35: dl24.dg

Re: [Gluster-devel] gluster fuse comsumes huge memory

2018-08-08 Thread Raghavendra Gowdappa
On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:

> Hi expert:
>
> I meet a problem when I use glusterfs. The problem is that the fuse client
> consumes huge memory when writing a lot of files (>1 million) to the gluster
> volume, eventually getting killed by the OS OOM killer. The memory the fuse
> process consumes can grow up to 100G! I wonder if there are memory leaks in
> the gluster fuse process, or some other causes.
>

Can you get statedump of fuse process consuming huge memory?
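(A quick sketch of how to grab one, assuming defaults - glusterfs processes dump state on SIGUSR1 into the statedump directory, by default /var/run/gluster; the process-match pattern below uses the volume name gv0 from this thread and is only illustrative:)

```shell
# Find the fuse client process for the mount and signal it to dump state.
pid=$(pgrep -f 'glusterfs.*gv0' | head -n 1)
kill -USR1 "$pid"

# The dump appears as glusterdump.<pid>.dump.<timestamp>.
ls -t /var/run/gluster/glusterdump."$pid".dump.* | head -n 1
```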


> My gluster version is 3.13.2, the gluster volume info is listed as
> following:
>
> Volume Name: gv0
> Type: Distributed-Replicate
> Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 19 x 3 = 57
> Transport-type: tcp
> Bricks:
> Brick1: dl20.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick2: dl21.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick3: dl22.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick4: dl20.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick5: dl21.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick6: dl22.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick7: dl20.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick8: dl21.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick9: dl22.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick10: dl23.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick11: dl24.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick12: dl25.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick13: dl26.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick14: dl27.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick15: dl28.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick16: dl29.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick17: dl30.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick18: dl31.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick19: dl32.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick20: dl33.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick21: dl34.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick22: dl23.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick23: dl24.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick24: dl25.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick25: dl26.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick26: dl27.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick27: dl28.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick28: dl29.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick29: dl30.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick30: dl31.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick31: dl32.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick32: dl33.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick33: dl34.dg.163.org:/glusterfs_brick/brick2/gv0
> Brick34: dl23.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick35: dl24.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick36: dl25.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick37: dl26.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick38: dl27.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick39: dl28.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick40: dl29.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick41: dl30.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick42: dl31.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick43: dl32.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick44: dl33.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick45: dl34.dg.163.org:/glusterfs_brick/brick3/gv0
> Brick46: dl0.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick47: dl1.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick48: dl2.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick49: dl3.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick50: dl5.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick51: dl6.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick52: dl9.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick53: dl10.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick54: dl11.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick55: dl12.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick56: dl13.dg.163.org:/glusterfs_brick/brick1/gv0
> Brick57: dl14.dg.163.org:/glusterfs_brick/brick1/gv0
> Options Reconfigured:
> performance.cache-size: 10GB
> performance.parallel-readdir: on
> performance.readdir-ahead: on
> network.inode-lru-limit: 20
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> features.inode-quota: off
> features.quota: off
> cluster.quorum-reads: on
> cluster.quorum-count: 2
> cluster.quorum-type: fixed
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
> cluster.server-quorum-ratio: 51%
>
>
> huting3
> huti...@corp.netease.com
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-04 Thread Raghavendra Gowdappa
On Fri, Aug 3, 2018 at 5:03 PM, Raghavendra Gowdappa 
wrote:

> Will take a look.
>

Patch https://review.gluster.org/16419 indeed had a bug. It used to zero
out stats, retaining just ia_gfid and ia_type. However, fuse_readdirp_cbk
would pass the attributes to the kernel as valid, causing the bug. I've
fixed this issue in https://review.gluster.org/20639.


> On Fri, Aug 3, 2018 at 3:08 PM, Krutika Dhananjay 
> wrote:
>
>> Adding Raghavendra G who actually restored and reworked on this after it
>> was abandoned.
>>
>> -Krutika
>>
>> On Fri, Aug 3, 2018 at 2:38 PM, Nithya Balachandran 
>> wrote:
>>
>>> Using git bisect, the patch that introduced this behaviour is :
>>>
>>> commit 7131de81f72dda0ef685ed60d0887c6e14289b8c
>>> Author: Krutika Dhananjay 
>>> Date:   Tue Jan 17 16:40:04 2017 +0530
>>>
>>> performance/readdir-ahead: Invalidate cached dentries if they're
>>> modified while in cache
>>>
>>> Krutika, can you take a look and fix this?
>>>
>>> To summarize, this is _not_ a spurious failure.
>>>
>>>
>>> regards,
>>> Nithya
>>>
>>>
>>> On 3 August 2018 at 14:13, Nithya Balachandran 
>>> wrote:
>>>
>>>> This is a new issue - the test uses ls -l to get some information. With
>>>> the latest master, ls -l returns strange results the first time it is
>>>> called on the mount point causing the test to fail:
>>>>
>>>>
>>>> With the latest master, I created a single brick volume and some files
>>>> inside it.
>>>>
>>>> [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
>>>> 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
>>>> again"; ls -l /mnt/fuse1
>>>> umount: /mnt/fuse1: not mounted
>>>> total 0
>>>> ----------. 0 root root 0 Jan  1  1970 file-1
>>>> ----------. 0 root root 0 Jan  1  1970 file-2
>>>> ----------. 0 root root 0 Jan  1  1970 file-3
>>>> ----------. 0 root root 0 Jan  1  1970 file-4
>>>> ----------. 0 root root 0 Jan  1  1970 file-5
>>>> d---------. 0 root root 0 Jan  1  1970 subdir
>>>> Trying again
>>>> total 3
>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
>>>> d---------. 0 root root  0 Jan  1  1970 subdir
>>>> [root@rhgs313-6 ~]#
>>>>
>>>>
>>>>
>>>> This is consistently reproducible. I am still debugging this to see
>>>> which patch caused this.
>>>>
>>>> regards,
>>>> Nithya
>>>>
>>>>
>>>> On 2 August 2018 at 07:13, Atin Mukherjee 
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, 2 Aug 2018 at 07:05, Susant Palai  wrote:
>>>>>
>>>>>> Will have a look at it and update.
>>>>>>
>>>>>
>>>>> There’s already a patch from Mohit for this.
>>>>>
>>>>>
>>>>>> Susant
>>>>>>
>>>>>> On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay, 
>>>>>> wrote:
>>>>>>
>>>>>>> Same here - https://build.gluster.org/job/centos7-regression/2024/console
>>>>>>>
>>>>>>> -Krutika
>>>>>>>
>>>>>>> On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee >>>>>> > wrote:
>>>>>>>
>>>>>>>> tests/bugs/distribute/bug-1122443.t fails on my setup (3 out of 5
>>>>>>>> times) running with the master branch. To my knowledge this test has
>>>>>>>> not failed earlier, so it looks like some recent change has caused it.
>>>>>>>> One such instance is
>>>>>>>> https://build.gluster.org/job/centos7-regression/1955/ .
>>>>>>>>
>>>>>>>> Request the component owners to take a look at it.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> --Atin
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-03 Thread Raghavendra Gowdappa
On Fri, Aug 3, 2018 at 5:58 PM, Yaniv Kaul  wrote:

> Why not revert, fix and resubmit (unless you can quickly fix it)?
> Y.
>

https://review.gluster.org/20634


>
> On Fri, Aug 3, 2018, 5:04 PM Raghavendra Gowdappa 
> wrote:
>
>> Will take a look.
>>
>> On Fri, Aug 3, 2018 at 3:08 PM, Krutika Dhananjay 
>> wrote:
>>
>>> Adding Raghavendra G who actually restored and reworked on this after it
>>> was abandoned.
>>>
>>> -Krutika
>>>
>>> On Fri, Aug 3, 2018 at 2:38 PM, Nithya Balachandran >> > wrote:
>>>
>>>> Using git bisect, the patch that introduced this behaviour is :
>>>>
>>>> commit 7131de81f72dda0ef685ed60d0887c6e14289b8c
>>>> Author: Krutika Dhananjay 
>>>> Date:   Tue Jan 17 16:40:04 2017 +0530
>>>>
>>>> performance/readdir-ahead: Invalidate cached dentries if they're
>>>> modified while in cache
>>>>
>>>> Krutika, can you take a look and fix this?
>>>>
>>>> To summarize, this is _not_ a spurious failure.
>>>>
>>>>
>>>> regards,
>>>> Nithya
>>>>
>>>>
>>>> On 3 August 2018 at 14:13, Nithya Balachandran 
>>>> wrote:
>>>>
>>>>> This is a new issue - the test uses ls -l to get some information.
>>>>> With the latest master, ls -l returns strange results the first time it is
>>>>> called on the mount point causing the test to fail:
>>>>>
>>>>>
>>>>> With the latest master, I created a single brick volume and some files
>>>>> inside it.
>>>>>
>>>>> [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
>>>>> 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
>>>>> again"; ls -l /mnt/fuse1
>>>>> umount: /mnt/fuse1: not mounted
>>>>> total 0
>>>>> ----------. 0 root root 0 Jan  1  1970 file-1
>>>>> ----------. 0 root root 0 Jan  1  1970 file-2
>>>>> ----------. 0 root root 0 Jan  1  1970 file-3
>>>>> ----------. 0 root root 0 Jan  1  1970 file-4
>>>>> ----------. 0 root root 0 Jan  1  1970 file-5
>>>>> d---------. 0 root root 0 Jan  1  1970 subdir
>>>>> Trying again
>>>>> total 3
>>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
>>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
>>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
>>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
>>>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
>>>>> d---------. 0 root root  0 Jan  1  1970 subdir
>>>>> [root@rhgs313-6 ~]#
>>>>>
>>>>>
>>>>>
>>>>> This is consistently reproducible. I am still debugging this to see
>>>>> which patch caused this.
>>>>>
>>>>> regards,
>>>>> Nithya
>>>>>
>>>>>
>>>>> On 2 August 2018 at 07:13, Atin Mukherjee 
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, 2 Aug 2018 at 07:05, Susant Palai  wrote:
>>>>>>
>>>>>>> Will have a look at it and update.
>>>>>>>
>>>>>>
>>>>>> There’s already a patch from Mohit for this.
>>>>>>
>>>>>>
>>>>>>> Susant
>>>>>>>
>>>>>>> On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay, 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Same here - https://build.gluster.org/job/centos7-regression/2024/console
>>>>>>>>
>>>>>>>> -Krutika
>>>>>>>>
>>>>>>>> On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee <
>>>>>>>> amukh...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> tests/bugs/distribute/bug-1122443.t fails on my setup (3 out of 5
>>>>>>>>> times) running with the master branch. To my knowledge this test has
>>>>>>>>> not failed earlier, so it looks like some recent change has caused it.
>>>>>>>>> One such instance is
>>>>>>>>> https://build.gluster.org/job/centos7-regression/1955/ .
>>>>>>>>>
>>>>>>>>> Request the component owners to take a look at it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> --Atin
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>
>

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-03 Thread Raghavendra Gowdappa
On Fri, Aug 3, 2018 at 4:01 PM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> Hi Du/Poornima,
>
> I was analysing bitrot and geo-rep failures and I suspect there is a bug
> in some perf xlator that was one of the causes. I was seeing the
> following behaviour in a few runs.
>
> 1. Geo-rep synced data to the slave. It creates an empty file and then
> rsync syncs the data. But the test does "stat --format "%F" " to
> confirm: if the file is empty, it returns "regular empty file", else
> "regular file". I believe it kept getting "regular empty file" instead
> of "regular file" until the timeout.
>

https://review.gluster.org/20549 might be relevant.
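The stat-based emptiness check described above can be reproduced with plain GNU coreutils (a quick sketch, independent of gluster):

```shell
# GNU stat distinguishes an empty regular file from a non-empty one,
# which is exactly what the geo-rep test keys on.
f=$(mktemp)
stat --format "%F" "$f"     # -> regular empty file
echo data > "$f"
stat --format "%F" "$f"     # -> regular file
rm -f "$f"
```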


> 2. The other behaviour is with bitrot, with brick-mux: if a file is
> deleted on the backend on one brick and a lookup is done, which
> performance xlators need to be disabled to get the lookup/revalidate on
> the brick where the file was deleted? Earlier, only md-cache was
> disabled and it used to work. Now it's failing intermittently.
>

You need to disable readdirplus in the entire stack. Refer to
https://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html
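For reference, the knobs usually involved are the readdirp-related volume options plus the fuse mount option; treat the exact option names as assumptions to verify against `gluster volume set help` on your version (the volume name "testvol" is a placeholder):

```shell
gluster volume set testvol performance.force-readdirp off   # md-cache
gluster volume set testvol dht.force-readdirp off           # dht
gluster volume set testvol performance.parallel-readdir off
gluster volume set testvol performance.readdir-ahead off
# On the client, mount with readdirplus disabled:
mount -t glusterfs -o use-readdirp=no host1:/testvol /mnt/testvol
```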


> Are there any pending patches around these areas that need to be merged?
> If there are, then they could be affecting other tests as well.
>
> Thanks,
> Kotresh HR
>
> On Fri, Aug 3, 2018 at 3:07 PM, Karthik Subrahmanya 
> wrote:
>
>>
>>
>> On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya 
>> wrote:
>>
>>>
>>>
>>> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya 
>>> wrote:
>>>


 On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, 
 wrote:

> I just went through the nightly regression report of brick mux runs
> and here's what I can summarize.
>
> =========================================================
> Fails only with brick-mux
> =========================================================
> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even
> after 400 secs. Refer
> https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
> specifically the latest report
> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
> Wasn't timing out as frequently as it was till 12 July, but since 27 July
> it has timed out twice. Beginning to believe commit
> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now 400
> secs isn't sufficient (Mohit?)
> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Ref - https://build.gluster.org/job/regression-test-with-multiplex/814/console)
> - Test fails only in brick-mux mode, AI on Atin to look at and get back.
>
> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> (https://build.gluster.org/job/regression-test-with-multiplex/813/console)
> - Seems to have failed just twice in the last 30 days as per
> https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
> Need help from AFR team.
>
> tests/bugs/quota/bug-1293601.t
> (https://build.gluster.org/job/regression-test-with-multiplex/812/console)
> - Hasn't failed after 26 July and earlier it was failing regularly. Did we
> fix this test through any patch (Mohit?)
>
> tests/bitrot/bug-1373520.t
> (https://build.gluster.org/job/regression-test-with-multiplex/811/console)
> - Hasn't failed after 27 July and earlier it was failing regularly. Did we
> fix this test through any patch (Mohit?)
>
> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a
> core, not sure if related to brick mux or not, so not sure if brick mux
> is the culprit here or not. Ref -
> https://build.gluster.org/job/regression-test-with-multiplex/806/console .
> Seems to be a glustershd crash. Need help from AFR folks.
>
> =========================================================
> Fails for non-brick mux case too
> =========================================================
> tests/bugs/distribute/bug-1122443.t - Seems to be failing on my setup
> very often, without brick mux as well. Refer
> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText
> . There's an email in gluster-devel and a BZ 1610240 for the same.
>
> tests/bugs/bug-1368312.t - Seems to be recent failures (
> 

Re: [Gluster-devel] tests/bugs/distribute/bug-1122443.t - spurious failure

2018-08-03 Thread Raghavendra Gowdappa
Will take a look.

On Fri, Aug 3, 2018 at 3:08 PM, Krutika Dhananjay 
wrote:

> Adding Raghavendra G who actually restored and reworked on this after it
> was abandoned.
>
> -Krutika
>
> On Fri, Aug 3, 2018 at 2:38 PM, Nithya Balachandran 
> wrote:
>
>> Using git bisect, the patch that introduced this behaviour is :
>>
>> commit 7131de81f72dda0ef685ed60d0887c6e14289b8c
>> Author: Krutika Dhananjay 
>> Date:   Tue Jan 17 16:40:04 2017 +0530
>>
>> performance/readdir-ahead: Invalidate cached dentries if they're
>> modified while in cache
>>
>> Krutika, can you take a look and fix this?
>>
>> To summarize, this is _not_ a spurious failure.
>>
>>
>> regards,
>> Nithya
>>
>>
>> On 3 August 2018 at 14:13, Nithya Balachandran 
>> wrote:
>>
>>> This is a new issue - the test uses ls -l to get some information. With
>>> the latest master, ls -l returns strange results the first time it is
>>> called on the mount point causing the test to fail:
>>>
>>>
>>> With the latest master, I created a single brick volume and some files
>>> inside it.
>>>
>>> [root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
>>> 192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
>>> again"; ls -l /mnt/fuse1
>>> umount: /mnt/fuse1: not mounted
>>> total 0
>>> ----------. 0 root root 0 Jan  1  1970 file-1
>>> ----------. 0 root root 0 Jan  1  1970 file-2
>>> ----------. 0 root root 0 Jan  1  1970 file-3
>>> ----------. 0 root root 0 Jan  1  1970 file-4
>>> ----------. 0 root root 0 Jan  1  1970 file-5
>>> d---------. 0 root root 0 Jan  1  1970 subdir
>>> Trying again
>>> total 3
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
>>> -rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
>>> d---------. 0 root root  0 Jan  1  1970 subdir
>>> [root@rhgs313-6 ~]#
>>>
>>>
>>>
>>> This is consistently reproducible. I am still debugging this to see
>>> which patch caused this.
>>>
>>> regards,
>>> Nithya
>>>
>>>
>>> On 2 August 2018 at 07:13, Atin Mukherjee 
>>> wrote:
>>>


 On Thu, 2 Aug 2018 at 07:05, Susant Palai  wrote:

> Will have a look at it and update.
>

 There’s already a patch from Mohit for this.


> Susant
>
> On Wed, 1 Aug 2018, 18:58 Krutika Dhananjay, 
> wrote:
>
>> Same here - https://build.gluster.org/job/centos7-regression/2024/console
>>
>> -Krutika
>>
>> On Sun, Jul 29, 2018 at 1:53 PM, Atin Mukherjee 
>> wrote:
>>
>>> tests/bugs/distribute/bug-1122443.t fails on my setup (3 out of 5
>>> times) running with the master branch. To my knowledge this test has not
>>> failed earlier, so it looks like some recent change has caused it. One
>>> such instance is https://build.gluster.org/job/centos7-regression/1955/ .
>>>
>>> Request the component owners to take a look at it.
>>>
>>>
>>
>

 --
 --Atin


>>>
>>>
>>
>

Re: [Gluster-devel] [Gluster-Maintainers] Release 5: Master branch health report (Week of 30th July)

2018-08-02 Thread Raghavendra Gowdappa
On Thu, Aug 2, 2018 at 5:48 PM, Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:

> I am facing a different issue on the softserve machines: the fuse mount
> itself is failing. I tried the day before yesterday to debug the geo-rep
> failures. I discussed with Raghu, but could not root-cause it.
>

Where can I find the complete client logs for this?

> So none of the tests were passing. It happened on
> both machine instances I tried.
>
> 
> [2018-07-31 10:41:49.288117] D [fuse-bridge.c:5407:notify] 0-fuse: got
> event 6 on graph 0
> [2018-07-31 10:41:49.289427] D [fuse-bridge.c:4990:fuse_get_mount_status]
> 0-fuse: mount status is 0
> [2018-07-31 10:41:49.289555] D [fuse-bridge.c:4256:fuse_init]
> 0-glusterfs-fuse: Detected support for FUSE_AUTO_INVAL_DATA. Enabling
> fopen_keep_cache automatically.
> [2018-07-31 10:41:49.289591] T [fuse-bridge.c:278:send_fuse_iov]
> 0-glusterfs-fuse: writev() result 40/40
> [2018-07-31 10:41:49.289610] I [fuse-bridge.c:4314:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
> 7.22
> [2018-07-31 10:41:49.289627] I [fuse-bridge.c:4948:fuse_graph_sync]
> 0-fuse: switched to graph 0
> [2018-07-31 10:41:49.289696] T [MSGID: 0] [syncop.c:1261:syncop_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from fuse to
> meta-autoload
> [2018-07-31 10:41:49.289743] T [MSGID: 0] [defaults.c:2716:default_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from meta-autoload to
> master
> [2018-07-31 10:41:49.289787] T [MSGID: 0] [io-stats.c:2788:io_stats_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from master to
> master-md-cache
> [2018-07-31 10:41:49.289833] T [MSGID: 0] [md-cache.c:513:mdc_inode_iatt_get]
> 0-md-cache: mdc_inode_ctx_get failed
> (00000000-0000-0000-0000-000000000001)
> [2018-07-31 10:41:49.289923] T [MSGID: 0] [md-cache.c:1200:mdc_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from master-md-cache
> to master-open-behind
> [2018-07-31 10:41:49.289946] T [MSGID: 0] [defaults.c:2716:default_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from
> master-open-behind to master-quick-read
> [2018-07-31 10:41:49.289973] T [MSGID: 0] [quick-read.c:556:qr_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from
> master-quick-read to master-io-cache
> [2018-07-31 10:41:49.290002] T [MSGID: 0] [io-cache.c:298:ioc_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from master-io-cache
> to master-readdir-ahead
> [2018-07-31 10:41:49.290034] T [MSGID: 0] [defaults.c:2716:default_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from
> master-readdir-ahead to master-read-ahead
> [2018-07-31 10:41:49.290052] T [MSGID: 0] [defaults.c:2716:default_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from
> master-read-ahead to master-write-behind
> [2018-07-31 10:41:49.290077] T [MSGID: 0] [write-behind.c:2439:wb_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from
> master-write-behind to master-dht
> [2018-07-31 10:41:49.290156] D [MSGID: 0] 
> [dht-common.c:3674:dht_do_fresh_lookup]
> 0-master-dht: /: no subvolume in layout for path, checking on all the
> subvols to see if it is a directory
> [2018-07-31 10:41:49.290180] D [MSGID: 0] 
> [dht-common.c:3688:dht_do_fresh_lookup]
> 0-master-dht: /: Found null hashed subvol. Calling lookup on all nodes.
> [2018-07-31 10:41:49.290199] T [MSGID: 0] 
> [dht-common.c:3695:dht_do_fresh_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from master-dht to
> master-replicate-0
> [2018-07-31 10:41:49.290245] I [MSGID: 108006]
> [afr-common.c:5582:afr_local_init] 0-master-replicate-0: no subvolumes up
> [2018-07-31 10:41:49.290291] D [MSGID: 0] [afr-common.c:3212:afr_discover]
> 0-stack-trace: stack-address: 0x7f36e4001058, master-replicate-0 returned
> -1 error: Transport endpoint is not connected [Transport endpoint is not
> connected]
> [2018-07-31 10:41:49.290323] D [MSGID: 0] 
> [dht-common.c:1391:dht_lookup_dir_cbk]
> 0-master-dht: lookup of / on master-replicate-0 returned error [Transport
> endpoint is not connected]
> [2018-07-31 10:41:49.290350] T [MSGID: 0] 
> [dht-common.c:3695:dht_do_fresh_lookup]
> 0-stack-trace: stack-address: 0x7f36e4001058, winding from master-dht to
> master-replicate-1
> [2018-07-31 10:41:49.290381] I [MSGID: 108006]
> [afr-common.c:5582:afr_local_init] 0-master-replicate-1: no subvolumes up
> [2018-07-31 10:41:49.290403] D [MSGID: 0] [afr-common.c:3212:afr_discover]
> 0-stack-trace: stack-address: 0x7f36e4001058, master-replicate-1 returned
> -1 error: Transport endpoint is not connected [Transport endpoint is not
> connected]
> [2018-07-31 10:41:49.290427] D [MSGID: 0] 
> [dht-common.c:1391:dht_lookup_dir_cbk]
> 0-master-dht: lookup of / on master-replicate-1 returned error [Transport
> endpoint is not connected]
> [2018-07-31 10:41:49.290452] D [MSGID: 0] 
> 

Re: [Gluster-devel] ./tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t fails if non-anonymous fds are used in read path

2018-08-02 Thread Raghavendra Gowdappa
On Thu, Aug 2, 2018 at 3:54 PM, Rafi Kavungal Chundattu Parambil <
rkavu...@redhat.com> wrote:

> Yes, I think we can mark the test as bad for now. We found two issues that
> cause the failures.
>
> One issue is with the usage of anonymous fds from a fuse mount. posix-acl,
> which sits in the brick graph, does the authentication check during open.
> But with anonymous fds we may not have an explicit open received before,
> say, a read fop. As a result, posix-acl is not getting honoured with
> anonymous fds.
>
> The second issue is with snapd and libgfapi, where snapd uses libgfapi to
> get the information from snapshot bricks. But the uid and gid received
> from a client are not passed through libgfapi.
>
> I will file two separate bugs to track these issues.
>
> Since both of these issues are not relevant to the fix which Raghavendra
> sent, I agree to mark the tests as bad.
>

Thanks Rafi.
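The first issue Rafi describes can be sketched as a toy model (plain Python, purely an analogy and not gluster code; all names are made up): when the permission check lives only in open(), a read served through an anonymous fd, which the server opens internally on the client's behalf, never triggers it.

```python
class Brick:
    """Toy brick whose ACL check, like posix-acl's, runs only at open time."""

    def __init__(self, acl_allows):
        self.acl_allows = acl_allows  # callable: client -> bool

    def open(self, client):
        # The open-time authentication check.
        if not self.acl_allows(client):
            raise PermissionError(client)
        return "fd"

    def anon_read(self, client):
        # Anonymous-fd path: no explicit open() from the client reaches the
        # brick, so the check in open() above is never exercised.
        return "data"


brick = Brick(acl_allows=lambda client: client == "root")

try:
    brick.open("user1")
except PermissionError:
    print("open denied by ACL")   # regular-fd path honours the ACL

print(brick.anon_read("user1"))   # anonymous-fd path slips past it
```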


>
> Regards
> Rafi KC
>
>
> - Original Message -
> From: "Raghavendra Gowdappa" 
> To: "Sunny Kumar" , "Rafi" 
> Cc: "Gluster Devel" 
> Sent: Thursday, August 2, 2018 3:23:00 PM
> Subject: Re: 
> ./tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t
> fails if non-anonymous fds are used in read path
>
> I've filed a bug to track this failure:
> https://bugzilla.redhat.com/show_bug.cgi?id=1611532
>
> As a stop gap measure I propose to mark the test as Bad to unblock patches
> [1][2]. Are maintainers of snapshot in agreement with this?
>
> regards,
> Raghavendra
>
> On Wed, Aug 1, 2018 at 10:28 AM, Raghavendra Gowdappa  >
> wrote:
>
> > Sunny/Rafi,
> >
> > I was trying to debug regression failures on [1]. Note that patch [1]
> only
> > disables usage of anonymous fds on readv. So, I tried the same test
> > disabling performance.open-behind
> >
> > [root@rhs-client27 glusterfs]# git diff
> > diff --git a/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-
> gid-during-nfs-access.t
> > b/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-
> > gid-during-nfs-access.t
> > index 3776451..cedf96b 100644
> > --- a/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-
> > gid-during-nfs-access.t
> > +++ b/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-
> > gid-during-nfs-access.t
> > @@ -79,6 +79,7 @@ TEST $CLI volume start $V0
> >  EXPECT_WITHIN $NFS_EXPORT_TIMEOUT "1" is_nfs_export_available
> >  TEST glusterfs -s $H0 --volfile-id $V0 $M0
> >  TEST mount_nfs $H0:/$V0 $N0 nolock
> > +TEST $CLI volume set $V0 performance.open-behind off
> >
> >  # Create 2 user
> >  user1=$(get_new_user)
> >
> >
> > With the above change, I can see consistent failures of the test just
> like
> > observed in [1].
> >
> > TEST 23 (line 154): Y check_if_permitted eeefadc
> > /mnt/glusterfs/0/.snaps/snap2/file3 cat
> > su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
> > such file or directory
> > cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
> > su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
> > such file or directory
> > cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
> > su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
> > such file or directory
> > cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
> > su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
> > such file or directory
> > cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
> >
> >
> > Test Summary Report
> > ---
> > ./tests/bugs/snapshot/bug-1167580-set-proper-uid-and-
> gid-during-nfs-access.t
> > (Wstat: 0 Tests: 46 Failed: 1)
> >   Failed test:  23
> >
> >
> > I had a feeling this test fails spuriously and the spurious nature is
> > tied to whether open-behind uses an anonymous fd or a regular fd for read.
> >
> > @Sunny,
> >
> > This test is blocking two of my patches - [1] and [2]. Can I mark this
> > test as bad and proceed with my work on [1] and [2]?
> >
> > [1] https://review.gluster.org/20511
> > [2] https://review.gluster.org/20428
> >
> > regards,
> > Raghavendra
> >
>

Re: [Gluster-devel] ./tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t fails if non-anonymous fds are used in read path

2018-08-02 Thread Raghavendra Gowdappa
I've filed a bug to track this failure:
https://bugzilla.redhat.com/show_bug.cgi?id=1611532

As a stop gap measure I propose to mark the test as Bad to unblock patches
[1][2]. Are maintainers of snapshot in agreement with this?

regards,
Raghavendra

On Wed, Aug 1, 2018 at 10:28 AM, Raghavendra Gowdappa 
wrote:

> Sunny/Rafi,
>
> I was trying to debug regression failures on [1]. Note that patch [1] only
> disables usage of anonymous fds on readv. So, I tried the same test
> disabling performance.open-behind
>
> [root@rhs-client27 glusterfs]# git diff
> diff --git 
> a/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t
> b/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-
> gid-during-nfs-access.t
> index 3776451..cedf96b 100644
> --- a/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-
> gid-during-nfs-access.t
> +++ b/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-
> gid-during-nfs-access.t
> @@ -79,6 +79,7 @@ TEST $CLI volume start $V0
>  EXPECT_WITHIN $NFS_EXPORT_TIMEOUT "1" is_nfs_export_available
>  TEST glusterfs -s $H0 --volfile-id $V0 $M0
>  TEST mount_nfs $H0:/$V0 $N0 nolock
> +TEST $CLI volume set $V0 performance.open-behind off
>
>  # Create 2 user
>  user1=$(get_new_user)
>
>
> With the above change, I can see consistent failures of the test just like
> observed in [1].
>
> TEST 23 (line 154): Y check_if_permitted eeefadc
> /mnt/glusterfs/0/.snaps/snap2/file3 cat
> su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
> such file or directory
> cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
> su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
> such file or directory
> cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
> su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
> such file or directory
> cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
> su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
> such file or directory
> cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
>
>
> Test Summary Report
> ---
> ./tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t
> (Wstat: 0 Tests: 46 Failed: 1)
>   Failed test:  23
>
>
> I had a feeling this test fails spuriously and the spurious nature is
> tied to whether open-behind uses an anonymous fd or a regular fd for read.
>
> @Sunny,
>
> This test is blocking two of my patches - [1] and [2]. Can I mark this
> test as bad and proceed with my work on [1] and [2]?
>
> [1] https://review.gluster.org/20511
> [2] https://review.gluster.org/20428
>
> regards,
> Raghavendra
>

[Gluster-devel] ./tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t fails if non-anonymous fds are used in read path

2018-07-31 Thread Raghavendra Gowdappa
Sunny/Rafi,

I was trying to debug regression failures on [1]. Note that patch [1] only
disables usage of anonymous fds on readv. So, I tried the same test
disabling performance.open-behind

[root@rhs-client27 glusterfs]# git diff
diff --git
a/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t
b/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t
index 3776451..cedf96b 100644
---
a/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t
+++
b/tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t
@@ -79,6 +79,7 @@ TEST $CLI volume start $V0
 EXPECT_WITHIN $NFS_EXPORT_TIMEOUT "1" is_nfs_export_available
 TEST glusterfs -s $H0 --volfile-id $V0 $M0
 TEST mount_nfs $H0:/$V0 $N0 nolock
+TEST $CLI volume set $V0 performance.open-behind off

 # Create 2 user
 user1=$(get_new_user)


With the above change, I can see consistent failures of the test, just like
those observed in [1].

TEST 23 (line 154): Y check_if_permitted eeefadc
/mnt/glusterfs/0/.snaps/snap2/file3 cat
su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
such file or directory
cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
such file or directory
cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
such file or directory
cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied
su: warning: cannot change directory to /tmp/tmp.eaKBKS0lfM/eeefadc: No
such file or directory
cat: /mnt/glusterfs/0/.snaps/snap2/file3: Permission denied


Test Summary Report
---
./tests/bugs/snapshot/bug-1167580-set-proper-uid-and-gid-during-nfs-access.t
(Wstat: 0 Tests: 46 Failed: 1)
  Failed test:  23


I had a feeling this test fails spuriously, and the spurious nature is tied
to whether open-behind uses an anonymous fd or a regular fd for reads.

@Sunny,

This test is blocking two of my patches - [1] and [2]. Can I mark this test
as bad and proceed with my work on [1] and [2]?

[1] https://review.gluster.org/20511
[2] https://review.gluster.org/20428

regards,
Raghavendra

Re: [Gluster-devel] ./tests/00-geo-rep/georep-basic-dr-rsync.t fails

2018-07-28 Thread Raghavendra Gowdappa
A few failures have been seen on https://review.gluster.org/#/c/20576/ too.

On Sat, Jul 28, 2018 at 2:47 PM, Raghavendra Gowdappa 
wrote:

> Kotresh,
>
> The test failed on master (without the patch) too. I've seen failures on
> this earlier too.
>

https://build.gluster.org/job/centos7-regression/1934/


> regards,
> Raghavendra
>
>
>

[Gluster-devel] ./tests/00-geo-rep/georep-basic-dr-rsync.t fails

2018-07-28 Thread Raghavendra Gowdappa
Kotresh,

The test failed on master (without the patch) too. I've seen failures on
this test earlier as well.

regards,
Raghavendra

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Wed, Jul 25, 2018 at 10:25 AM, Ravishankar N 
wrote:

>
>
> On 07/24/2018 08:45 PM, Raghavendra Gowdappa wrote:
>
>
> I tried higher values of attribute-timeout and its not helping. Are there
> any other similar split brain related tests? Can I mark these tests bad for
> time being as  the md-cache patch has a deadline?
>
>
>>
>>> `git grep split-brain-status ` on the tests folder returned the
> following:
> tests/basic/afr/split-brain-resolution.t:
> tests/bugs/bug-1368312.t:
> tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
> tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t
>
> I guess if it is blocking you , you can mark them as bad tests and assign
> the bug to me.
>

https://bugzilla.redhat.com/show_bug.cgi?id=1608158.

Will mark these tests as bad in the md-cache patch referred to in the first
mail.

-Ravi
>

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Tue, Jul 24, 2018 at 6:54 PM, Ravishankar N 
wrote:

>
>
> On 07/24/2018 06:30 PM, Ravishankar N wrote:
>
>
>
> On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:
>
> All,
>
> I was trying to debug regression failures on [1] and observed that
> split-brain-resolution.t was failing consistently.
>
> =
> TEST 45 (line 88): 0 get_pending_heal_count patchy
> ./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
> ./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests
>
> Test Summary Report
> ---
> ./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
>   Failed tests:  24-26, 28-36, 41-45
>
>
> On probing deeper, I observed a curious fact - on most of the failures
> stat was not served from md-cache, but instead was wound down to afr which
> failed stat with EIO as the file was in split brain. So, I did another test:
> * disabled md-cache
> * mount glusterfs with attribute-timeout 0 and entry-timeout 0
>
> Now the test fails always. So, I think the test relied on stat requests
> being absorbed either by kernel attribute cache or md-cache. When its not
> happening stats are reaching afr and resulting in failures of cmds like
> getfattr etc.
>
>
> This indeed seems to be the case.  Is there any way we can avoid the stat?
> When a getfattr is performed on the mount, aren't lookup + getfattr are the
> only fops that need to be hit in gluster?
>
>
> Or should afr allow (f)stat even for replica-2 split-brains because it is
> allowing lookup anyway (lookup cbk contains stat information from one of
> its children) ?
>

I think the question here should be what kind of access we have to provide
for files in split-brain. Once that broader question is answered, we should
think about which fops come under those kinds of access. If
setfattr/getfattr command access has to be provided, I guess lookup, stat,
setxattr and getxattr need to work on split-brain files.

-Ravi
>
> -Ravi
>
> Thoughts?
>
> [1] https://review.gluster.org/#/c/20549/
>
>
>
>
>
>
>
>
>

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Tue, Jul 24, 2018 at 8:36 PM, Raghavendra Gowdappa 
wrote:

>
>
> On Tue, Jul 24, 2018 at 8:35 PM, Raghavendra Gowdappa wrote:
>
>>
>>
>> On Tue, Jul 24, 2018 at 6:30 PM, Ravishankar N 
>> wrote:
>>
>>>
>>>
>>> On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:
>>>
>>> All,
>>>
>>> I was trying to debug regression failures on [1] and observed that
>>> split-brain-resolution.t was failing consistently.
>>>
>>> =
>>> TEST 45 (line 88): 0 get_pending_heal_count patchy
>>> ./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
>>> ./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests
>>>
>>> Test Summary Report
>>> ---
>>> ./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed:
>>> 17)
>>>   Failed tests:  24-26, 28-36, 41-45
>>>
>>>
>>> On probing deeper, I observed a curious fact - on most of the failures
>>> stat was not served from md-cache, but instead was wound down to afr which
>>> failed stat with EIO as the file was in split brain. So, I did another test:
>>> * disabled md-cache
>>> * mount glusterfs with attribute-timeout 0 and entry-timeout 0
>>>
>>> Now the test fails always. So, I think the test relied on stat requests
>>> being absorbed either by kernel attribute cache or md-cache. When its not
>>> happening stats are reaching afr and resulting in failures of cmds like
>>> getfattr etc.
>>>
>>>
>>> This indeed seems to be the case.  Is there any way we can avoid the
>>> stat? When a getfattr is performed on the mount, aren't lookup + getfattr
>>> are the only fops that need to be hit in gluster?
>>>
>>
>> Its a black box to me how kernel decides whether to do lookup or stat.
>> But I guess, if only stat is needed and its not available in cache it would
>> do a stat.
>>
>
> Another thing you can do is mounting with a higher value of
> attribute-timeout. Let us know whether it works.
>

I tried higher values of attribute-timeout and it's not helping. Are there
any other similar split-brain related tests? Can I mark these tests as bad
for the time being, as the md-cache patch has a deadline?


>
>> -Ravi
>>>
>>> Thoughts?
>>>
>>> [1] https://review.gluster.org/#/c/20549/
>>>
>>>
>>>
>>>
>>>
>>
>

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Tue, Jul 24, 2018 at 8:35 PM, Raghavendra Gowdappa 
wrote:

>
>
> On Tue, Jul 24, 2018 at 6:30 PM, Ravishankar N 
> wrote:
>
>>
>>
>> On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:
>>
>> All,
>>
>> I was trying to debug regression failures on [1] and observed that
>> split-brain-resolution.t was failing consistently.
>>
>> =
>> TEST 45 (line 88): 0 get_pending_heal_count patchy
>> ./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
>> ./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests
>>
>> Test Summary Report
>> ---
>> ./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed:
>> 17)
>>   Failed tests:  24-26, 28-36, 41-45
>>
>>
>> On probing deeper, I observed a curious fact - on most of the failures
>> stat was not served from md-cache, but instead was wound down to afr which
>> failed stat with EIO as the file was in split brain. So, I did another test:
>> * disabled md-cache
>> * mount glusterfs with attribute-timeout 0 and entry-timeout 0
>>
>> Now the test fails always. So, I think the test relied on stat requests
>> being absorbed either by kernel attribute cache or md-cache. When its not
>> happening stats are reaching afr and resulting in failures of cmds like
>> getfattr etc.
>>
>>
>> This indeed seems to be the case.  Is there any way we can avoid the
>> stat? When a getfattr is performed on the mount, aren't lookup + getfattr
>> are the only fops that need to be hit in gluster?
>>
>
> Its a black box to me how kernel decides whether to do lookup or stat.
> But I guess, if only stat is needed and its not available in cache it would
> do a stat.
>

Another thing you can do is to mount with a higher value of
attribute-timeout. Let us know whether it works.
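
For reference, the FUSE mount options discussed in this thread can be passed
like this (host, volume name and mountpoint below are placeholders):

```shell
# disable kernel metadata caching entirely (as in the failing test setup)
glusterfs -s server1 --volfile-id patchy \
    --attribute-timeout=0 --entry-timeout=0 /mnt/glusterfs/0

# or retry with a larger attribute-cache window, as suggested above
glusterfs -s server1 --volfile-id patchy --attribute-timeout=600 /mnt/glusterfs/0
```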


> -Ravi
>>
>> Thoughts?
>>
>> [1] https://review.gluster.org/#/c/20549/
>>
>>
>>
>>
>>
>

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Tue, Jul 24, 2018 at 6:30 PM, Ravishankar N 
wrote:

>
>
> On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:
>
> All,
>
> I was trying to debug regression failures on [1] and observed that
> split-brain-resolution.t was failing consistently.
>
> =
> TEST 45 (line 88): 0 get_pending_heal_count patchy
> ./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
> ./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests
>
> Test Summary Report
> ---
> ./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
>   Failed tests:  24-26, 28-36, 41-45
>
>
> On probing deeper, I observed a curious fact - on most of the failures
> stat was not served from md-cache, but instead was wound down to afr which
> failed stat with EIO as the file was in split brain. So, I did another test:
> * disabled md-cache
> * mount glusterfs with attribute-timeout 0 and entry-timeout 0
>
> Now the test fails always. So, I think the test relied on stat requests
> being absorbed either by kernel attribute cache or md-cache. When its not
> happening stats are reaching afr and resulting in failures of cmds like
> getfattr etc.
>
>
> This indeed seems to be the case.  Is there any way we can avoid the stat?
> When a getfattr is performed on the mount, aren't lookup + getfattr are the
> only fops that need to be hit in gluster?
>

It's a black box to me how the kernel decides whether to do a lookup or a
stat. But I guess, if only a stat is needed and it's not available in the
cache, it would do a stat.

-Ravi
>
> Thoughts?
>
> [1] https://review.gluster.org/#/c/20549/
>
>
>
>
>

[Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
All,

I was trying to debug regression failures on [1] and observed that
split-brain-resolution.t was failing consistently.

=
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
---
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact - on most of the failures the
stat was not served from md-cache, but instead was wound down to afr, which
failed the stat with EIO as the file was in split-brain. So, I did another test:
* disabled md-cache
* mounted glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So, I think the test relied on stat requests
being absorbed either by the kernel attribute cache or by md-cache. When
that's not happening, stats reach afr and result in failures of commands
like getfattr. Thoughts?

[1] https://review.gluster.org/#/c/20549/
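
The failure mode described above - a cache with a non-zero timeout absorbing
stats that would otherwise fail in afr - can be sketched with a toy TTL cache.
All names below are illustrative, not GlusterFS APIs:

```python
import time

class AttrCache:
    """Toy TTL metadata cache, loosely modeled on md-cache / the kernel
    attribute cache (class and helper names are illustrative only)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._store = {}          # path -> (cached stat, expiry)

    def stat(self, path, backend_stat):
        entry = self._store.get(path)
        now = time.monotonic()
        if entry is not None and now < entry[1]:
            return entry[0]        # absorbed by the cache; afr never sees it
        value = backend_stat(path)  # may raise EIO for a split-brain file
        self._store[path] = (value, now + self.timeout)
        return value

def split_brain_backend(path):
    # afr fails stat on a split-brain file with EIO
    raise OSError(5, "Input/output error")

cache = AttrCache(timeout=1.0)
# a prior successful lookup has warmed the cache
cache._store["/mnt/f"] = ("stat-from-lookup", time.monotonic() + 60)
print(cache.stat("/mnt/f", split_brain_backend))   # -> stat-from-lookup
```

With the timeout set to 0 (or the entry expired), every stat is wound down to
the backend and the EIO surfaces - which mirrors why the test fails once both
caches are disabled.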

Re: [Gluster-devel] The ctime of fstat is not correct which lead to "tar" utility error

2018-07-22 Thread Raghavendra Gowdappa
On Mon, Jul 23, 2018 at 8:40 AM, Lian, George (NSB - CN/Hangzhou) <
george.l...@nokia-sbell.com> wrote:

> Hi,
>
>
>
> I tested both patchset1 and patchset2 of  https://review.gluster.org/20549,
> the ctime issue seems both be there.
>
> And I use my test c program and “dd” program, the issue both be there.
>

Strange. Tests on my laptop succeeded.


>
> But when use the patch of https://review.gluster.org/#/c/20410/11,
>
> My test C program and “dd” to an exist file will pass,
>
> ONLY “dd” to new file will be failed.
>

Thanks. I'll think more about this.


>
> Best Regards,
>
> George
>
>
>
>
>
>
>
> *From:* gluster-devel-boun...@gluster.org [mailto:gluster-devel-bounces@
> gluster.org] *On Behalf Of *Raghavendra Gowdappa
> *Sent:* Monday, July 23, 2018 10:37 AM
>
> *To:* Lian, George (NSB - CN/Hangzhou) 
> *Cc:* Zhang, Bingxuan (NSB - CN/Hangzhou) ;
> Raghavendra G ; Gluster-devel@gluster.org
> *Subject:* Re: [Gluster-devel] The ctime of fstat is not correct which
> lead to "tar" utility error
>
>
>
>
>
>
>
> On Sun, Jul 22, 2018 at 1:41 PM, Raghavendra Gowdappa 
> wrote:
>
> George,
>
> Sorry. I sent you a version of the fix which was stale. Can you try with:
> https://review.gluster.org/20549
>
> This patch passes the test case you've given.
>
>
>
> Patchset 1 solves this problem. However, it ran into dbench failures as
> md-cache was slow to update its cache. Once I fixed it, I am seeing
> failures again. with performance.stat-prefetch off, the error goes away.
> But, I can see only ctime changes. Wondering whether this is related to
> ctime translator or an issue in  md-cache. Note that md-cache caches stats
> from codepaths which don't result in stat updation in kernel too. So, it
> could be either,
>
> * a bug in md-cache
>
> * or a bug where in those codepaths wrong/changed stat was sent.
>
>
>
> I'll probe the first hypothesis. @Pranith/@Ravi,
>
>
>
> What do you think about second hypothesis?
>
>
>
> regards,
>
> Raghavendra
>
>
>
> regards,
>
> Raghavendra
>
>
>
> On Fri, Jul 20, 2018 at 2:59 PM, Lian, George (NSB - CN/Hangzhou) <
> george.l...@nokia-sbell.com> wrote:
>
> Hi,
>
>
>
> Sorry, there seems still have issue.
>
>
>
> We use “dd” application of linux tools instead of my demo program, and if
> the file is not exist before dd, the issue still be there.
>
>
>
> The test command is
>
> rm -rf /mnt/test/file.txt ; dd if=/dev/zero of=/mnt/test/file.txt bs=512
> count=1 oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
>
> 1) If we set md-cache-timeout to 0, the issue will not happen
>
> 2) If we set md-cache-timeout to 1, the issue will 100% reproduced!
> (with new patch you mentioned in the mail)
>
>
>
> Please see detail test result as the below:
>
>
>
> bash-4.4# gluster v set export md-cache-timeout 0
>
> volume set: failed: Volume export does not exist
>
> bash-4.4# gluster v set test md-cache-timeout 0
>
> volume set: success
>
> bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1
> oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
> /mnt/test/file.txt;stat /mnt/test/file.txt^C
>
> bash-4.4# rm /mnt/test/file.txt
>
> bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1
> oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
> /mnt/test/file.txt;stat /mnt/test/file.txt
>
> 1+0 records in
>
> 1+0 records out
>
> 512 bytes copied, 0.00932571 s, 54.9 kB/s
>
>   File: /mnt/test/file.txt
>
>   Size: 512 Blocks: 1  IO Block: 131072 regular file
>
> Device: 33h/51d Inode: 9949244856126716752  Links: 1
>
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
>
> Access: 2018-07-13 17:55:02.75600 +
>
> Modify: 2018-07-13 17:55:02.76400 +
>
> Change: 2018-07-13 17:55:02.76800 +
>
> Birth: -
>
> tar: Removing leading `/' from member names
>
> /mnt/test/file.txt
>
>   File: /mnt/test/file.txt
>
>   Size: 512 Blocks: 1  IO Block: 131072 regular file
>
> Device: 33h/51d Inode: 9949244856126716752  Links: 1
>
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
>
> Access: 2018-07-13 17:55:02.77600 +
>
> Modify: 2018-07-13 17:55:02.76400 +
>
> Change: 2018-07-13 17:55:02.76800 +
>
> Birth: -
>
> bash-4.4# gluster v set test md-cache-timeout 1
>
> volume set: success
>
> bash-4.4# rm /mnt/test/file.txt
>
> bash-4.4# dd if=/dev/zero of=/mnt/test/file.t

Re: [Gluster-devel] The ctime of fstat is not correct which lead to "tar" utility error

2018-07-22 Thread Raghavendra Gowdappa
On Sun, Jul 22, 2018 at 1:41 PM, Raghavendra Gowdappa 
wrote:

> George,
>
> Sorry. I sent you a version of the fix which was stale. Can you try with:
> https://review.gluster.org/20549
>
> This patch passes the test case you've given.
>

Patchset 1 solves this problem. However, it ran into dbench failures as
md-cache was slow to update its cache. Once I fixed that, I am seeing
failures again. With performance.stat-prefetch off, the error goes away.
But I can see only ctime changes. I am wondering whether this is related to
the ctime translator or an issue in md-cache. Note that md-cache also caches
stats from codepaths which don't result in a stat update in the kernel. So,
it could be either:
* a bug in md-cache,
* or a bug where those codepaths sent a wrong/changed stat.

I'll probe the first hypothesis. @Pranith/@Ravi,

What do you think about second hypothesis?

regards,
Raghavendra

>
> regards,
> Raghavendra
>
> On Fri, Jul 20, 2018 at 2:59 PM, Lian, George (NSB - CN/Hangzhou) <
> george.l...@nokia-sbell.com> wrote:
>
>> Hi,
>>
>>
>>
>> Sorry, there seems still have issue.
>>
>>
>>
>> We use “dd” application of linux tools instead of my demo program, and
>> if the file is not exist before dd, the issue still be there.
>>
>>
>>
>> The test command is
>>
>> rm -rf /mnt/test/file.txt ; dd if=/dev/zero of=/mnt/test/file.txt bs=512
>> count=1 oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
>>
>> 1) If we set md-cache-timeout to 0, the issue will not happen
>>
>> 2) If we set md-cache-timeout to 1, the issue will 100% reproduced!
>> (with new patch you mentioned in the mail)
>>
>>
>>
>> Please see detail test result as the below:
>>
>>
>>
>> bash-4.4# gluster v set export md-cache-timeout 0
>>
>> volume set: failed: Volume export does not exist
>>
>> bash-4.4# gluster v set test md-cache-timeout 0
>>
>> volume set: success
>>
>> bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1
>> oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
>> /mnt/test/file.txt;stat /mnt/test/file.txt^C
>>
>> bash-4.4# rm /mnt/test/file.txt
>>
>> bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1
>> oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
>> /mnt/test/file.txt;stat /mnt/test/file.txt
>>
>> 1+0 records in
>>
>> 1+0 records out
>>
>> 512 bytes copied, 0.00932571 s, 54.9 kB/s
>>
>>   File: /mnt/test/file.txt
>>
>>   Size: 512 Blocks: 1  IO Block: 131072 regular file
>>
>> Device: 33h/51d Inode: 9949244856126716752  Links: 1
>>
>> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
>>
>> Access: 2018-07-13 17:55:02.75600 +
>>
>> Modify: 2018-07-13 17:55:02.76400 +
>>
>> Change: 2018-07-13 17:55:02.76800 +
>>
>> Birth: -
>>
>> tar: Removing leading `/' from member names
>>
>> /mnt/test/file.txt
>>
>>   File: /mnt/test/file.txt
>>
>>   Size: 512 Blocks: 1  IO Block: 131072 regular file
>>
>> Device: 33h/51d Inode: 9949244856126716752  Links: 1
>>
>> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
>>
>> Access: 2018-07-13 17:55:02.77600 +
>>
>> Modify: 2018-07-13 17:55:02.76400 +
>>
>> Change: 2018-07-13 17:55:02.76800 +
>>
>> Birth: -
>>
>> bash-4.4# gluster v set test md-cache-timeout 1
>>
>> volume set: success
>>
>> bash-4.4# rm /mnt/test/file.txt
>>
>> bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1
>> oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
>> /mnt/test/file.txt;stat /mnt/test/file.txt
>>
>> 1+0 records in
>>
>> 1+0 records out
>>
>> 512 bytes copied, 0.0107589 s, 47.6 kB/s
>>
>>   File: /mnt/test/file.txt
>>
>>   Size: 512 Blocks: 1  IO Block: 131072 regular file
>>
>> Device: 33h/51d Inode: 13569976446871695205  Links: 1
>>
>> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
>>
>> Access: 2018-07-13 17:55:11.54800 +
>>
>> Modify: 2018-07-13 17:55:11.56000 +
>>
>> Change: 2018-07-13 17:55:11.56000 +
>>
>> Birth: -
>>
>> tar: Removing leading `/' from member names
>>
>> /mnt/test/file.txt
>>
>> tar: /mnt/test/file.txt: file changed as we read it
>>

Re: [Gluster-devel] The ctime of fstat is not correct which lead to "tar" utility error

2018-07-22 Thread Raghavendra Gowdappa
George,

Sorry. I sent you a version of the fix which was stale. Can you try with:
https://review.gluster.org/20549

This patch passes the test case you've given.

regards,
Raghavendra

On Fri, Jul 20, 2018 at 2:59 PM, Lian, George (NSB - CN/Hangzhou) <
george.l...@nokia-sbell.com> wrote:

> Hi,
>
>
>
> Sorry, there seems still have issue.
>
>
>
> We use “dd” application of linux tools instead of my demo program, and if
> the file is not exist before dd, the issue still be there.
>
>
>
> The test command is
>
> rm -rf /mnt/test/file.txt ; dd if=/dev/zero of=/mnt/test/file.txt bs=512
> count=1 oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
>
> 1) If we set md-cache-timeout to 0, the issue will not happen
>
> 2) If we set md-cache-timeout to 1, the issue will 100% reproduced!
> (with new patch you mentioned in the mail)
>
>
>
> Please see detail test result as the below:
>
>
>
> bash-4.4# gluster v set export md-cache-timeout 0
>
> volume set: failed: Volume export does not exist
>
> bash-4.4# gluster v set test md-cache-timeout 0
>
> volume set: success
>
> bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1
> oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
> /mnt/test/file.txt;stat /mnt/test/file.txt^C
>
> bash-4.4# rm /mnt/test/file.txt
>
> bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1
> oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
> /mnt/test/file.txt;stat /mnt/test/file.txt
>
> 1+0 records in
>
> 1+0 records out
>
> 512 bytes copied, 0.00932571 s, 54.9 kB/s
>
>   File: /mnt/test/file.txt
>
>   Size: 512 Blocks: 1  IO Block: 131072 regular file
>
> Device: 33h/51d Inode: 9949244856126716752  Links: 1
>
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
>
> Access: 2018-07-13 17:55:02.75600 +
>
> Modify: 2018-07-13 17:55:02.76400 +
>
> Change: 2018-07-13 17:55:02.76800 +
>
> Birth: -
>
> tar: Removing leading `/' from member names
>
> /mnt/test/file.txt
>
>   File: /mnt/test/file.txt
>
>   Size: 512 Blocks: 1  IO Block: 131072 regular file
>
> Device: 33h/51d Inode: 9949244856126716752  Links: 1
>
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
>
> Access: 2018-07-13 17:55:02.77600 +
>
> Modify: 2018-07-13 17:55:02.76400 +
>
> Change: 2018-07-13 17:55:02.76800 +
>
> Birth: -
>
> bash-4.4# gluster v set test md-cache-timeout 1
>
> volume set: success
>
> bash-4.4# rm /mnt/test/file.txt
>
> bash-4.4# dd if=/dev/zero of=/mnt/test/file.txt bs=512 count=1
> oflag=sync;stat /mnt/test/file.txt;tar -czvf /tmp/abc.gz
> /mnt/test/file.txt;stat /mnt/test/file.txt
>
> 1+0 records in
>
> 1+0 records out
>
> 512 bytes copied, 0.0107589 s, 47.6 kB/s
>
>   File: /mnt/test/file.txt
>
>   Size: 512 Blocks: 1  IO Block: 131072 regular file
>
> Device: 33h/51d Inode: 13569976446871695205  Links: 1
>
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
>
> Access: 2018-07-13 17:55:11.54800 +
>
> Modify: 2018-07-13 17:55:11.56000 +
>
> Change: 2018-07-13 17:55:11.56000 +
>
> Birth: -
>
> tar: Removing leading `/' from member names
>
> /mnt/test/file.txt
>
> tar: /mnt/test/file.txt: file changed as we read it
>
>   File: /mnt/test/file.txt
>
>   Size: 512 Blocks: 1  IO Block: 131072 regular file
>
> Device: 33h/51d Inode: 13569976446871695205  Links: 1
>
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
>
> Access: 2018-07-13 17:55:11.58000 +
>
> Modify: 2018-07-13 17:55:11.56000 +
>
> Change: 2018-07-13 17:55:11.56400 +
>
> Birth: -
>
>
>
>
>
> Best Regards,
>
> George
>
> *From:* gluster-devel-boun...@gluster.org [mailto:gluster-devel-bounces@
> gluster.org] *On Behalf Of *Raghavendra Gowdappa
> *Sent:* Friday, July 20, 2018 4:01 PM
> *To:* Lian, George (NSB - CN/Hangzhou) 
> *Cc:* Zhang, Bingxuan (NSB - CN/Hangzhou) ;
> Raghavendra G ; Gluster-devel@gluster.org
>
> *Subject:* Re: [Gluster-devel] The ctime of fstat is not correct which
> lead to "tar" utility error
>
>
>
>
>
>
>
> On Fri, Jul 20, 2018 at 1:22 PM, Lian, George (NSB - CN/Hangzhou) <
> george.l...@nokia-sbell.com> wrote:
>
> >>>We recently identified an issue with stat-prefetch. Fix can be found at:
>
> >>>https://review.gluster.org/#/c/20410/11
>
>
>

Re: [Gluster-devel] The ctime of fstat is not correct which lead to "tar" utility error

2018-07-20 Thread Raghavendra Gowdappa
On Fri, Jul 20, 2018 at 1:22 PM, Lian, George (NSB - CN/Hangzhou) <
george.l...@nokia-sbell.com> wrote:

> >>>We recently identified an issue with stat-prefetch. Fix can be found at:
>
> >>>https://review.gluster.org/#/c/20410/11
>
>
>
> >>>Can you let us know whether this helps?
>
>
>
>
>
> The patch can resolve this issue, I have verified in Gluster 4.2(master
> trunk branch) and Gluster 3.12.3!
>

Thanks, we'll merge it.


>
> Thanks & Best Regards,
>
> George
>
>
>
> *From:* gluster-devel-boun...@gluster.org [mailto:gluster-devel-bounces@
> gluster.org] *On Behalf Of *Raghavendra Gowdappa
> *Sent:* Thursday, July 19, 2018 5:06 PM
> *To:* Lian, George (NSB - CN/Hangzhou) 
> *Cc:* Zhang, Bingxuan (NSB - CN/Hangzhou) ;
> Gluster-devel@gluster.org; Raghavendra G 
> *Subject:* Re: [Gluster-devel] The ctime of fstat is not correct which
> lead to "tar" utility error
>
>
>
>
>
>
>
> On Thu, Jul 19, 2018 at 2:29 PM, Lian, George (NSB - CN/Hangzhou) <
> george.l...@nokia-sbell.com> wrote:
>
> Hi, Gluster Experts,
>
>
>
> In glusterfs version 3.12.3, There seems a “fstat” issue for ctime after
> we use fsync,
>
> We have a demo execute binary which write some data and then do fsync for
> this file, it named as “tt”,
>
> Then run tar command right after “tt” command, it will always error with “tar:
> /mnt/test/file1.txt: file changed as we read it”
>
>
>
> The command output is list as the below, the source code and volume info
> configuration attached FYI,
>
> This issue will be 100% reproducible! (/mnt/test is mountpoint of
> glusterfs volume “test” , which the volume info is attached in mail)
>
> --
>
> ./tt;tar -czvf /tmp/abc.gz /mnt/test/file1.txt
>
> mtime:1531247107.27200
>
> ctime:1531247107.27200
>
> tar: Removing leading `/' from member names
>
> /mnt/test/file1.txt
>
> tar: /mnt/test/file1.txt: file changed as we read it
>
> --
>
>
>
> After my investigation, the xattrop for changelog is later than the fsync
> response , this is mean:
>
> In function  “afr_fsync_cbk” will call afr_delayed_changelog_wake_resume
> (this, local->fd, stub);
>
>
>
> In our case, it always a pending changelog , so glusterfs save the
> metadata information to stub, and handle pending changelog first,
>
> But the changelog will also change the ctime, from the packet captured by
> tcpdump, the response packet of xattrop will not include the metadata
> information,  and the wake_resume also not handle this metadata changed
> case.
>
>
>
> So in this case, the metadata in mdc_cache is not right, and when cache is
> valid, the application will get WRONG metadata!
>
>
>
> For verify my guess, if I change the configuration for this volume
>
> “gluster v set test md-cache-timeout 0” or
>
> “gluster v set export stat-prefetch off”
>
> This issue will be GONE!
>
>
>
> We recently identified an issue with stat-prefetch. Fix can be found at:
>
> https://review.gluster.org/#/c/20410/11
>
>
>
> Can you let us know whether this helps?
>
>
>
>
>
>
>
> And I restore the configuration to default, which mean stat-prefetch is on
> and md-cache-timeout is 1 second,
>
> I try invalidate the md-cache in source code as the below in function
> mdc_fync_cbk on md-cache.c
>
> The issue also will be GONE!
>
>
>
> So GLusterFS Experts,
>
> Could you please verify this issue, and share your comments on my
> investigation?
>
> And your finally solutions is highly appreciated!
>
>
>
> Does the following fix you've posted solves the problem?
>
>
>
>
>
> changes in function “mdc_fsync_cbk”
>
> int
> mdc_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
>                int32_t op_ret, int32_t op_errno,
>                struct iatt *prebuf, struct iatt *postbuf, dict_t *xdata)
> {
>         mdc_local_t  *local = NULL;
>
>         local = frame->local;
>
>         if (op_ret != 0)
>                 goto out;
>
>         if (!local)
>                 goto out;
>
>         mdc_inode_iatt_set_validate(this, local->fd->inode, prebuf, postbuf,
>                                     _gf_true);
>         /* new added for ctime issue */
>         mdc_inode_iatt_invalidate(this, local->fd->inode);
>         /* new added end */
>
> out:
>         MDC_STACK_UNWIND (fsync, frame, op_ret, op_errno, prebuf, postbuf,
>                           xdata);
>
>         return 0;
> }
>
> 
> -
>
> Best Regards,
>
> George
>
>
>

Re: [Gluster-devel] The ctime of fstat is not correct which lead to "tar" utility error

2018-07-19 Thread Raghavendra Gowdappa
On Thu, Jul 19, 2018 at 2:29 PM, Lian, George (NSB - CN/Hangzhou) <
george.l...@nokia-sbell.com> wrote:

> Hi, Gluster Experts,
>
>
>
> In glusterfs version 3.12.3, There seems a “fstat” issue for ctime after
> we use fsync,
>
> We have a demo execute binary which write some data and then do fsync for
> this file, it named as “tt”,
>
> Then run tar command right after “tt” command, it will always error with
> “tar: /mnt/test/file1.txt: file changed as we read it”
>
>
>
> The command output is listed below; the source code and volume info
> configuration are attached FYI.
>
> This issue is 100% reproducible! (/mnt/test is the mountpoint of the
> glusterfs volume “test”, whose volume info is attached in the mail)
>
> --
>
> ./tt;tar -czvf /tmp/abc.gz /mnt/test/file1.txt
>
> mtime:1531247107.27200
>
> ctime:1531247107.27200
>
> tar: Removing leading `/' from member names
>
> /mnt/test/file1.txt
>
> tar: /mnt/test/file1.txt: file changed as we read it
>
> --
>
>
>
> From my investigation, the xattrop for the changelog comes later than the
> fsync response, which means:
>
> the function “afr_fsync_cbk” will call afr_delayed_changelog_wake_resume
> (this, local->fd, stub);
>
>
>
> In our case, there is always a pending changelog, so glusterfs saves the
> metadata information to the stub and handles the pending changelog first.
>
> But the changelog will also change the ctime. From the packets captured by
> tcpdump, the response packet of the xattrop does not include the metadata
> information, and the wake_resume also does not handle this metadata-changed
> case.
>
>
>
> So in this case, the metadata in the md-cache is not right, and while the
> cache is valid, the application will get WRONG metadata!
>
>
>
> To verify my guess, if I change the configuration for this volume
>
> “gluster v set test md-cache-timeout 0” or
>
> “gluster v set export stat-prefetch off”
>
> This issue will be GONE!
>

We recently identified an issue with stat-prefetch. Fix can be found at:
https://review.gluster.org/#/c/20410/11

Can you let us know whether this helps?


>
>
>
> And I restored the configuration to default, which means stat-prefetch is
> on and md-cache-timeout is 1 second,
>
> then tried invalidating the md-cache in the source code as below, in
> function mdc_fsync_cbk in md-cache.c.
>
> The issue will also be GONE!
>
>
>
> So, GlusterFS experts,
>
> Could you please verify this issue and share your comments on my
> investigation?
>
> Your final solution is highly appreciated!
>
>
Does the fix you've posted below solve the problem?


>
> changes in function “mdc_fsync_cbk”
>
> int
> mdc_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
>                int32_t op_ret, int32_t op_errno,
>                struct iatt *prebuf, struct iatt *postbuf, dict_t *xdata)
> {
>         mdc_local_t *local = NULL;
>
>         local = frame->local;
>
>         if (op_ret != 0)
>                 goto out;
>
>         if (!local)
>                 goto out;
>
>         mdc_inode_iatt_set_validate (this, local->fd->inode, prebuf,
>                                      postbuf, _gf_true);
>         /* newly added for the ctime issue */
>         mdc_inode_iatt_invalidate (this, local->fd->inode);
>         /* end of addition */
>
> out:
>         MDC_STACK_UNWIND (fsync, frame, op_ret, op_errno, prebuf, postbuf,
>                           xdata);
>
>         return 0;
> }
>
> 
> -
>
> Best Regards,
>
> George
>

Re: [Gluster-devel] Storing list of dentries of children in parent inode

2018-07-02 Thread Raghavendra Gowdappa
On Fri, Jun 29, 2018 at 1:02 PM, Amar Tumballi  wrote:

>
>
> On Fri, Jun 29, 2018 at 12:25 PM, Vijay Bellur  wrote:
>
>>
>>
>> On Wed, Jun 27, 2018 at 10:15 PM Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>> All,
>>>
>>> There is a requirement in write-behind where during readdirp we may have
>>> to invalidate iatts/stats of some of the children of the directory [1]. For
>>> lack of better alternatives I added a dentry list to parent inode which
>>> contains all children that've been linked (through lookup or readdirp on
>>> directory). I myself am not too comfortable with this solution as it might
>>> eat up significant memory for large directories.
>>>
>>> Thoughts?
>>>
>>
>>
>> Reading [2] makes me wonder if write-behind is the appropriate place for
>> this change. Shouldn't md-cache be made aware of inode generations or
>> something similar?
>>
>>
> Thanks for the pointers Vijay. But, now what happens if user keeps
> write-behind and turns off md-cache? (like virt profile and block profile).
>
> Raghavendra, while this patch fixes the problem specific to the usecase,
> all these changes break the boundary of translators, which were supposed to
> deal with just 1 fop (or one feature).
>
> It makes sense for us to move towards a global 'caching' xlator which can
> give better performance, and has visibility into all the information about
> the file and all fops. That will reduce all this complexity of what should
> be done for certain specific cases and types of problems.
>

I guess my previous reply was not sufficiently verbose. I agree with the
future plan. As I've discussed offline with you, the ambiguity is in the
roadmap on how we arrive at that future. The current attempt is
maintenance. If you have an action plan/roadmap for
* fixing current consistency issues and known performance issues
* unified performance xlator/global caching

I would be happy to discuss. It'll be helpful if we can make the discussion
concrete in terms of
* what issues we are fixing as part of maintenance, and what issues we are
taking a call on not fixing but deferring to the future solution.
* plans to mitigate (if any) issues in the interim, till the new solution
becomes stable enough to be a replacement for the current perf xlator stack.

Note that there is a growing demand for supporting DB workloads. Are we
planning to say no to these workloads till the new solution is in a state to
replace the existing stack? Long ago I made a list of issues with the
current perf xlator stack at [1]. We can use that as a rough reference.

[1]
https://docs.google.com/document/d/1wOsXAfhXFN0drGDTInPZXpFl2fX3La5FgkrdRksUiiw/edit?usp=sharing


> Also, is there any performance numbers possible with this patch and
> without this patch in regular readdirp operations ?
>
> Regards,
> Amar
>
> Thanks,
>> Vijay
>>
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1512691#c18
>>
>>
>>
>>>
>>> [1] https://review.gluster.org/20413
>>>
>>> regards,
>>> Raghavendra
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Amar Tumballi (amarts)
>

Re: [Gluster-devel] Storing list of dentries of children in parent inode

2018-06-29 Thread Raghavendra Gowdappa
On Fri, Jun 29, 2018 at 12:25 PM, Vijay Bellur  wrote:

>
>
> On Wed, Jun 27, 2018 at 10:15 PM Raghavendra Gowdappa 
> wrote:
>
>> All,
>>
>> There is a requirement in write-behind where during readdirp we may have
>> to invalidate iatts/stats of some of the children of the directory [1]. For
>> lack of better alternatives I added a dentry list to parent inode which
>> contains all children that've been linked (through lookup or readdirp on
>> directory). I myself am not too comfortable with this solution as it might
>> eat up significant memory for large directories.
>>
>> Thoughts?
>>
>
>
> Reading [2] makes me wonder if write-behind is the appropriate place for
> this change. Shouldn't md-cache be made aware of inode generations or
> something similar?
>

Both are independent fixes and cater to different issues. The fix for [2]
is at https://review.gluster.org/20410

The patch which is subject of this mail thread tries to address stat
consistency when readdirp is done on a directory containing files having
cached writes. If readdirp happen to fetch dentries of these files, stats
contained in those dentries can be stale. Consider the following scenario
(All following operations happen in a single thread of the application):

* w1 is done on f1, write syscall completes with success. w1 is cached in
write-behind.
* w2 is done on f2, write syscall completes with success. w2 is cached in
write-behind.
* readdirp on parent directory is done. In response, f1 and f2 are fetched.
stats for f1 and f2 are cached in kernel attribute cache.
* kernel/application does fstat on f1 and f2. Stats returned wouldn't
account for w1 and w2.

What patch does is to not send stats for f1 and f2 in readdirp response, so
that kernel is forced to do a lookup/getattr which will be serialized with
w1 and w2 and fetch correct values.





> Thanks,
> Vijay
>
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1512691#c18
>
>
>
>>
>> [1] https://review.gluster.org/20413
>>
>> regards,
>> Raghavendra
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
>

Re: [Gluster-devel] Storing list of dentries of children in parent inode

2018-06-29 Thread Raghavendra Gowdappa
On Fri, Jun 29, 2018 at 1:02 PM, Amar Tumballi  wrote:

>
>
> On Fri, Jun 29, 2018 at 12:25 PM, Vijay Bellur  wrote:
>
>>
>>
>> On Wed, Jun 27, 2018 at 10:15 PM Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>> All,
>>>
>>> There is a requirement in write-behind where during readdirp we may have
>>> to invalidate iatts/stats of some of the children of the directory [1]. For
>>> lack of better alternatives I added a dentry list to parent inode which
>>> contains all children that've been linked (through lookup or readdirp on
>>> directory). I myself am not too comfortable with this solution as it might
>>> eat up significant memory for large directories.
>>>
>>> Thoughts?
>>>
>>
>>
>> Reading [2] makes me wonder if write-behind is the appropriate place for
>> this change. Shouldn't md-cache be made aware of inode generations or
>> something similar?
>>
>>
> Thanks for the pointers Vijay. But, now what happens if user keeps
> write-behind and turns off md-cache? (like virt profile and block profile).
>
> Raghavendra, while this patch fixes the problem specific to the usecase,
> all these changes break the boundary of translators, which were supposed to
> deal with just 1 fop (or one feature).
>

We have to fix bugs for existing users :).


> It makes sense for us to move towards a global 'caching' xlator which can
> give better performance, and has visibility into all the information about
> the file and all fops. That will reduce all this complexity of what should
> be done for certain specific cases and types of problems.
>
> Also, is there any performance numbers possible with this patch and
> without this patch in regular readdirp operations ?
>

Will try to get them when I have free cycles.


> Regards,
> Amar
>
> Thanks,
>> Vijay
>>
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1512691#c18
>>
>>
>>
>>>
>>> [1] https://review.gluster.org/20413
>>>
>>> regards,
>>> Raghavendra
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Amar Tumballi (amarts)
>

[Gluster-devel] Storing list of dentries of children in parent inode

2018-06-27 Thread Raghavendra Gowdappa
All,

There is a requirement in write-behind where during readdirp we may have to
invalidate iatts/stats of some of the children of the directory [1]. For
lack of better alternatives I added a dentry list to parent inode which
contains all children that've been linked (through lookup or readdirp on
directory). I myself am not too comfortable with this solution as it might
eat up significant memory for large directories.

Thoughts?

[1] https://review.gluster.org/20413

regards,
Raghavendra

Re: [Gluster-devel] [features/locks] Fetching lock info in lookup

2018-06-20 Thread Raghavendra Gowdappa
On Thu, Jun 21, 2018 at 6:55 AM, Raghavendra Gowdappa 
wrote:

>
>
> On Wed, Jun 20, 2018 at 9:09 PM, Xavi Hernandez 
> wrote:
>
>> On Wed, Jun 20, 2018 at 4:29 PM Raghavendra Gowdappa 
>> wrote:
>>
>>> Krutika,
>>>
>>> This patch doesn't seem to be getting counts per domain, like number of
>>> inodelks or entrylks acquired in a domain "xyz". Am I right? If per domain
>>> stats are not available, passing interested domains in xdata_req would be
>>> needed. Any suggestions on that?
>>>
>>
>> We have GLUSTERFS_INODELK_DOM_COUNT. Its data should be a domain name for
>> which we want to know the number of inodelks (the count is returned into
>> GLUSTERFS_INODELK_COUNT though).
>>
>> It only exists for inodelk. If you need it for entrylk, it would need to
>> be implemented.
>>
>
> Yes. Realised that after going through the patch a bit more deeply.
> Thanks. I'll implement a domain based entrylk count.
>

I think I need to have a dynamic key for responses. Otherwise it's difficult
to support requests on multiple domains in the same call. Embedding the
domain name in the key helps us keep per-domain results separate. Also
needed is a way to send multiple domains in requests. If EC/AFR is already
using it, there is a high chance of overwriting previously set requests for
different domains. Currently this is not consumed in the lookup path by
EC/AFR/Shard (DHT is interested in this information in the lookup path) and
hence it is not a pressing problem. But we cannot rely on that.

What do you think is a better interface among the following alternatives?

In request path,

1. Separate keys with domain name embedded - For eg.,
glusterfs.inodelk.xyz.count. Value is ignored.
2. A single key like GLUSTERFS_INODELK_DOM_COUNT. Value is a string of
interested domains separated by a delimiter (which character to use as
delimiter?)

In response path,
1. Separate keys with domain name embedded - For eg.,
glusterfs.inodelk.xyz.count. Value is the total number of locks (granted +
blocked).
2. A single key like GLUSTERFS_INODELK_DOM_COUNT. Value is a string of
interested domains and lock count separated by a delimiter (which character
to use as delimiter?)

I personally prefer the domain name embedded in key approach as it avoids
the string parsing by consumers. Any other approaches you can think of?

As of now, the response returned is the number of (granted + blocked) locks.
For consumers using write locks the granted count is always 1, and hence the
blocked count can be inferred. But for read-lock consumers this is not
possible, as there can be more than one read-lock consumer. For the
requirement in DHT we don't need the exact number; instead we need to know
whether there are any granted locks, which the existing implementation
already provides. So I am not changing that.



>
>> Xavi
>>
>>
>>> regards,
>>> Raghavendra
>>>
>>> On Wed, Jun 20, 2018 at 12:58 PM, Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 20, 2018 at 12:06 PM, Krutika Dhananjay <
>>>> kdhan...@redhat.com> wrote:
>>>>
>>>>> We do already have a way to get inodelk and entrylk count from a bunch
>>>>> of fops, introduced in http://review.gluster.org/10880.
>>>>> Can you check if you can make use of this feature?
>>>>>
>>>>
>>>> Thanks Krutika. Yes, this feature meets DHT's requirement. We might
>>>> need a GLUSTERFS_PARENT_INODELK, but that can be easily added along the
>>>> lines of other counts. If necessary I'll send a patch to implement
>>>> GLUSTERFS_PARENT_INODELK.
>>>>
>>>>
>>>>> -Krutika
>>>>>
>>>>>
>>>>> On Wed, Jun 20, 2018 at 9:17 AM, Amar Tumballi 
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 20, 2018 at 9:06 AM, Raghavendra Gowdappa <
>>>>>> rgowd...@redhat.com> wrote:
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> We've a requirement in DHT [1] to query the number of locks granted
>>>>>>> on an inode in lookup fop. I am planning to use xdata_req in lookup to 
>>>>>>> pass
>>>>>>> down the relevant arguments for this query. I am proposing following
>>>>>>> signature:
>>>>>>>
>>>>>>> In lookup request path following key value pairs will be passed in
>>>>>>> xdata_req:
>>>>>>> * "glu

Re: [Gluster-devel] [features/locks] Fetching lock info in lookup

2018-06-20 Thread Raghavendra Gowdappa
On Wed, Jun 20, 2018 at 9:09 PM, Xavi Hernandez 
wrote:

> On Wed, Jun 20, 2018 at 4:29 PM Raghavendra Gowdappa 
> wrote:
>
>> Krutika,
>>
>> This patch doesn't seem to be getting counts per domain, like number of
>> inodelks or entrylks acquired in a domain "xyz". Am I right? If per domain
>> stats are not available, passing interested domains in xdata_req would be
>> needed. Any suggestions on that?
>>
>
> We have GLUSTERFS_INODELK_DOM_COUNT. Its data should be a domain name for
> which we want to know the number of inodelks (the count is returned into
> GLUSTERFS_INODELK_COUNT though).
>
> It only exists for inodelk. If you need it for entrylk, it would need to
> be implemented.
>

Yes. Realised that after going through the patch a bit more deeply. Thanks.
I'll implement a domain based entrylk count.


> Xavi
>
>
>> regards,
>> Raghavendra
>>
>> On Wed, Jun 20, 2018 at 12:58 PM, Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>>
>>>
>>> On Wed, Jun 20, 2018 at 12:06 PM, Krutika Dhananjay wrote:
>>>
>>>> We do already have a way to get inodelk and entrylk count from a bunch
>>>> of fops, introduced in http://review.gluster.org/10880.
>>>> Can you check if you can make use of this feature?
>>>>
>>>
>>> Thanks Krutika. Yes, this feature meets DHT's requirement. We might need
>>> a GLUSTERFS_PARENT_INODELK, but that can be easily added along the lines of
>>> other counts. If necessary I'll send a patch to implement
>>> GLUSTERFS_PARENT_INODELK.
>>>
>>>
>>>> -Krutika
>>>>
>>>>
>>>> On Wed, Jun 20, 2018 at 9:17 AM, Amar Tumballi 
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 20, 2018 at 9:06 AM, Raghavendra Gowdappa <
>>>>> rgowd...@redhat.com> wrote:
>>>>>
>>>>>> All,
>>>>>>
>>>>>> We've a requirement in DHT [1] to query the number of locks granted
>>>>>> on an inode in lookup fop. I am planning to use xdata_req in lookup to 
>>>>>> pass
>>>>>> down the relevant arguments for this query. I am proposing following
>>>>>> signature:
>>>>>>
>>>>>> In lookup request path following key value pairs will be passed in
>>>>>> xdata_req:
>>>>>> * "glusterfs.lock.type"
>>>>>> - values can be "glusterfs.posix", "glusterfs.inodelk",
>>>>>> "glusterfs.entrylk"
>>>>>> * If the value of "glusterfs.lock.type" is "glusterfs.entrylk", then
>>>>>> basename is passed as a value in xdata_req for key
>>>>>> "glusterfs.entrylk.basename"
>>>>>> * key "glusterfs.lock-on?" will differentiate whether the lock
>>>>>> information is on current inode ("glusterfs.current-inode") or 
>>>>>> parent-inode
>>>>>> ("glusterfs.parent-inode"). For a nameless lookup 
>>>>>> "glusterfs.parent-inode"
>>>>>> is invalid.
>>>>>> * "glusterfs.blocked-locks" - Information should be limited to
>>>>>> blocked locks.
>>>>>> * "glusterfs.granted-locks" - Information should be limited to
>>>>>> granted locks.
>>>>>> * If necessary other information about granted locks, blocked locks
>>>>>> can be added. Since, there is no requirement for now, I am not adding 
>>>>>> these
>>>>>> keys.
>>>>>>
>>>>>> Response dictionary will have information in following format:
>>>>>> * "glusterfs.entrylk...granted-locks" - number of
>>>>>> granted entrylks on inode "gfid" with "basename" (usually this value will
>>>>>> be either 0 or 1 unless we introduce read/write lock semantics).
>>>>>> * "glusterfs.inodelk..granted-locks" - number of granted
>>>>>> inodelks on "basename"
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>>
>>>>> I personally feel it is good to get as much information as possible in
>>>>> lookup, so it helps to take some high-level decisions better, in all
>>>>> translators. So, the very broad answer would be to say go for it. The main
>>>>> reason xdata is provided in all fops is to do this extra information
>>>>> fetching/overloading anyway.
>>>>>
>>>>> As you have clearly documented the need, it makes it even better to
>>>>> review and document it with commit. So, all for it.
>>>>>
>>>>> Regards,
>>>>> Amar
>>>>>
>>>>>
>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1581306#c28
>>>>>>
>>>>>>
>>>>> ___
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel@gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>
>>>>
>>>>
>>>
>>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [features/locks] Fetching lock info in lookup

2018-06-20 Thread Raghavendra Gowdappa
Krutika,

This patch doesn't seem to be getting counts per domain, like number of
inodelks or entrylks acquired in a domain "xyz". Am I right? If per domain
stats are not available, passing interested domains in xdata_req would be
needed. Any suggestions on that?

regards,
Raghavendra

On Wed, Jun 20, 2018 at 12:58 PM, Raghavendra Gowdappa 
wrote:

>
>
> On Wed, Jun 20, 2018 at 12:06 PM, Krutika Dhananjay 
> wrote:
>
>> We do already have a way to get inodelk and entrylk count from a bunch of
>> fops, introduced in http://review.gluster.org/10880.
>> Can you check if you can make use of this feature?
>>
>
> Thanks Krutika. Yes, this feature meets DHT's requirement. We might need a
> GLUSTERFS_PARENT_INODELK, but that can be easily added along the lines of
> other counts. If necessary I'll send a patch to implement
> GLUSTERFS_PARENT_INODELK.
>
>
>> -Krutika
>>
>>
>> On Wed, Jun 20, 2018 at 9:17 AM, Amar Tumballi 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 20, 2018 at 9:06 AM, Raghavendra Gowdappa <
>>> rgowd...@redhat.com> wrote:
>>>
>>>> All,
>>>>
>>>> We've a requirement in DHT [1] to query the number of locks granted on
>>>> an inode in lookup fop. I am planning to use xdata_req in lookup to pass
>>>> down the relevant arguments for this query. I am proposing following
>>>> signature:
>>>>
>>>> In lookup request path following key value pairs will be passed in
>>>> xdata_req:
>>>> * "glusterfs.lock.type"
>>>> - values can be "glusterfs.posix", "glusterfs.inodelk",
>>>> "glusterfs.entrylk"
>>>> * If the value of "glusterfs.lock.type" is "glusterfs.entrylk", then
>>>> basename is passed as a value in xdata_req for key
>>>> "glusterfs.entrylk.basename"
>>>> * key "glusterfs.lock-on?" will differentiate whether the lock
>>>> information is on current inode ("glusterfs.current-inode") or parent-inode
>>>> ("glusterfs.parent-inode"). For a nameless lookup "glusterfs.parent-inode"
>>>> is invalid.
>>>> * "glusterfs.blocked-locks" - Information should be limited to blocked
>>>> locks.
>>>> * "glusterfs.granted-locks" - Information should be limited to granted
>>>> locks.
>>>> * If necessary other information about granted locks, blocked locks can
>>>> be added. Since, there is no requirement for now, I am not adding these
>>>> keys.
>>>>
>>>> Response dictionary will have information in following format:
>>>> * "glusterfs.entrylk...granted-locks" - number of
>>>> granted entrylks on inode "gfid" with "basename" (usually this value will
>>>> be either 0 or 1 unless we introduce read/write lock semantics).
>>>> * "glusterfs.inodelk..granted-locks" - number of granted
>>>> inodelks on "basename"
>>>>
>>>> Thoughts?
>>>>
>>>>
>>> I personally feel it is good to get as much information as possible in
>>> lookup, so it helps to take some high-level decisions better, in all
>>> translators. So, the very broad answer would be to say go for it. The main
>>> reason xdata is provided in all fops is to do this extra information
>>> fetching/overloading anyway.
>>>
>>> As you have clearly documented the need, it makes it even better to
>>> review and document it with commit. So, all for it.
>>>
>>> Regards,
>>> Amar
>>>
>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1581306#c28
>>>>
>>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>
>>
>

Re: [Gluster-devel] [features/locks] Fetching lock info in lookup

2018-06-20 Thread Raghavendra Gowdappa
On Wed, Jun 20, 2018 at 12:06 PM, Krutika Dhananjay 
wrote:

> We do already have a way to get inodelk and entrylk count from a bunch of
> fops, introduced in http://review.gluster.org/10880.
> Can you check if you can make use of this feature?
>

Thanks Krutika. Yes, this feature meets DHT's requirement. We might need a
GLUSTERFS_PARENT_INODELK, but that can be easily added along the lines of
other counts. If necessary I'll send a patch to implement
GLUSTERFS_PARENT_INODELK.


> -Krutika
>
>
> On Wed, Jun 20, 2018 at 9:17 AM, Amar Tumballi 
> wrote:
>
>>
>>
>> On Wed, Jun 20, 2018 at 9:06 AM, Raghavendra Gowdappa <
>> rgowd...@redhat.com> wrote:
>>
>>> All,
>>>
>>> We've a requirement in DHT [1] to query the number of locks granted on
>>> an inode in lookup fop. I am planning to use xdata_req in lookup to pass
>>> down the relevant arguments for this query. I am proposing following
>>> signature:
>>>
>>> In lookup request path following key value pairs will be passed in
>>> xdata_req:
>>> * "glusterfs.lock.type"
>>> - values can be "glusterfs.posix", "glusterfs.inodelk",
>>> "glusterfs.entrylk"
>>> * If the value of "glusterfs.lock.type" is "glusterfs.entrylk", then
>>> basename is passed as a value in xdata_req for key
>>> "glusterfs.entrylk.basename"
>>> * key "glusterfs.lock-on?" will differentiate whether the lock
>>> information is on current inode ("glusterfs.current-inode") or parent-inode
>>> ("glusterfs.parent-inode"). For a nameless lookup "glusterfs.parent-inode"
>>> is invalid.
>>> * "glusterfs.blocked-locks" - Information should be limited to blocked
>>> locks.
>>> * "glusterfs.granted-locks" - Information should be limited to granted
>>> locks.
>>> * If necessary other information about granted locks, blocked locks can
>>> be added. Since, there is no requirement for now, I am not adding these
>>> keys.
>>>
>>> Response dictionary will have information in following format:
>>> * "glusterfs.entrylk...granted-locks" - number of
>>> granted entrylks on inode "gfid" with "basename" (usually this value will
>>> be either 0 or 1 unless we introduce read/write lock semantics).
>>> * "glusterfs.inodelk..granted-locks" - number of granted inodelks
>>> on "basename"
>>>
>>> Thoughts?
>>>
>>>
>> I personally feel it is good to get as much information as possible in
>> lookup, so it helps to take some high-level decisions better, in all
>> translators. So, the very broad answer would be to say go for it. The main
>> reason xdata is provided in all fops is to do this extra information
>> fetching/overloading anyway.
>>
>> As you have clearly documented the need, it makes it even better to
>> review and document it with commit. So, all for it.
>>
>> Regards,
>> Amar
>>
>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1581306#c28
>>>
>>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>

[Gluster-devel] [features/locks] Fetching lock info in lookup

2018-06-19 Thread Raghavendra Gowdappa
All,

We've a requirement in DHT [1] to query the number of locks granted on an
inode in lookup fop. I am planning to use xdata_req in lookup to pass down
the relevant arguments for this query. I am proposing following signature:

In lookup request path following key value pairs will be passed in
xdata_req:
* "glusterfs.lock.type"
- values can be "glusterfs.posix", "glusterfs.inodelk",
"glusterfs.entrylk"
* If the value of "glusterfs.lock.type" is "glusterfs.entrylk", then
basename is passed as a value in xdata_req for key
"glusterfs.entrylk.basename"
* key "glusterfs.lock-on?" will differentiate whether the lock information
is on current inode ("glusterfs.current-inode") or parent-inode
("glusterfs.parent-inode"). For a nameless lookup "glusterfs.parent-inode"
is invalid.
* "glusterfs.blocked-locks" - Information should be limited to blocked
locks.
* "glusterfs.granted-locks" - Information should be limited to granted
locks.
* If necessary other information about granted locks, blocked locks can be
added. Since, there is no requirement for now, I am not adding these keys.

Response dictionary will have information in following format:
* "glusterfs.entrylk...granted-locks" - number of granted
entrylks on inode "gfid" with "basename" (usually this value will be either
0 or 1 unless we introduce read/write lock semantics).
* "glusterfs.inodelk..granted-locks" - number of granted inodelks on
"basename"

Thoughts?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1581306#c28

regards,
Raghavendra
