[Nfs-ganesha-devel] RPC queue enqueued/dequeued counter size

2017-05-15 Thread Sachin Punadikar
Hi,
Recently I came across the two counters below for the RPC queue.
static uint32_t enqueued_reqs;
static uint32_t dequeued_reqs;

Shouldn't these counters be uint64_t?
As uint32_t they can only count up to 4294967295 before wrapping around.
Increasing the size would help production environments.
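
For illustration, here is a minimal sketch of how quickly the two sizes wrap
(plain C, not the Ganesha counters themselves; the 50000 requests/second rate
is just an assumed workload):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint64_t reqs_per_sec = 50000;	/* assumed sustained rate */

	/* A uint32_t counter wraps in roughly a day at this rate... */
	printf("uint32_t wraps after ~%llu hours\n",
	       (unsigned long long)(UINT32_MAX / reqs_per_sec / 3600));

	/* ...while a uint64_t counter effectively never wraps. */
	printf("uint64_t wraps after ~%llu years\n",
	       (unsigned long long)(UINT64_MAX / reqs_per_sec / 3600 / 24 / 365));
	return 0;
}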

-- 
with regards,
Sachin Punadikar


[Nfs-ganesha-devel] Need clarification on GSH_CACHE_PAD

2017-05-18 Thread Sachin Punadikar
Hi,
I came across the macro GSH_CACHE_PAD.
From its name I can guess it is used for padding, but I am unable to
understand where exactly it is needed and what the number indicates.
To be specific, I am talking about the structure below:

struct req_q_pair {
	const char *s;
	GSH_CACHE_PAD(0);
	struct req_q producer;	/* from decoder */
	GSH_CACHE_PAD(1);
	struct req_q consumer;	/* to executor */
	GSH_CACHE_PAD(2);
};

In the above, if I need to add another field as below, do I need to add
another pad?
struct req_q_pair {
	const char *s;
	GSH_CACHE_PAD(0);
	struct req_q producer;	/* from decoder */
	GSH_CACHE_PAD(1);
	struct req_q consumer;	/* to executor */
	GSH_CACHE_PAD(2);
	uint64_t total;		/* cumulative */
};
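
For what it's worth, my guess (only a sketch; the actual GSH_CACHE_PAD
definition in Ganesha may differ, and 64 is just an assumed cache-line size)
is that the macro pads out to a cache-line boundary so that the producer and
consumer queues do not share a line, avoiding false sharing between the
decoder and executor threads; the number only makes each pad member's name
unique:

#include <stdint.h>

/* Sketch of a cache-line pad macro; 64 is an assumed line size. */
#define CACHE_LINE_GUESS 64
#define CACHE_PAD_SKETCH(_n) char pad_##_n[CACHE_LINE_GUESS]

/* Stand-in for struct req_q, just to keep the sketch self-contained. */
struct q_sketch { uint64_t head, tail; };

struct req_q_pair_sketch {
	const char *s;
	CACHE_PAD_SKETCH(0);
	struct q_sketch producer;	/* touched mostly by the decoder thread */
	CACHE_PAD_SKETCH(1);
	struct q_sketch consumer;	/* touched mostly by the executor thread */
	CACHE_PAD_SKETCH(2);
};

Under that reading, whether the new total field needs its own pad would
depend on which threads write it and how hot it is.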

Thanks in advance.

-- 
with regards,
Sachin Punadikar


[Nfs-ganesha-devel] Unable to find epoll_create

2017-08-29 Thread Sachin Punadikar
Hi All,
While preparing to build the latest version of Ganesha 2.6, I am facing the
errors below:
-- Looking for include file sys/epoll.h
-- Looking for include file sys/epoll.h - found
-- Looking for epoll_create
-- Looking for epoll_create - not found
CMake Error at cmake/modules/FindPackageHandleStandardArgs.cmake:109 (message):
  Could NOT find EPOLL (missing: EPOLL_FUNC)
Call Stack (most recent call first):
  cmake/modules/FindPackageHandleStandardArgs.cmake:317 (_FPHSA_FAILURE_MESSAGE)
  cmake/modules/FindEPOLL.cmake:21 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  libntirpc/CMakeLists.txt:118 (find_package)


-- Configuring incomplete, errors occurred!

The command I used is as below:
cmake ../src -DBUILD_CONFIG=rpmbuild -DCMAKE_BUILD_TYPE=Release
-DUSE_FSAL_GPFS=ON -DUSE_ADMIN_TOOLS=ON -DUSE_GUI_ADMIN_TOOLS=OFF
-DUSE_DBUS=ON -D_MSPAC_SUPPORT=ON
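
As a quick sanity check outside CMake (a hand-written snippet, not part of
the build system), one can verify that epoll_create is actually resolvable
with the same toolchain; if this compiles, links and runs, the CMake failure
is more likely a stale cache or compiler-flag issue than a genuinely missing
function:

/* epoll_check.c - build with: cc epoll_check.c -o epoll_check */
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
	int fd = epoll_create(1);	/* size hint is ignored but must be > 0 */

	if (fd < 0) {
		perror("epoll_create");
		return 1;
	}
	printf("epoll_create works, fd=%d\n", fd);
	close(fd);
	return 0;
}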

Let me know what configuration changes I need to make.
-- 
with regards,
Sachin Punadikar


[Nfs-ganesha-devel] Ganesha 2.3 - assert in dec_state_owner_ref

2017-10-23 Thread Sachin Punadikar
Hi All,
A customer is observing an assert with the Ganesha 2.3 code base, as shown
in the backtrace below:
(gdb) where
#0  0x7fbf55cb31d7 in raise () from /lib64/libc.so.6
#1  0x7fbf55cb48c8 in abort () from /lib64/libc.so.6
#2  0x7fbf55cac146 in __assert_fail_base () from /lib64/libc.so.6
#3  0x7fbf55cac1f2 in __assert_fail () from /lib64/libc.so.6
#4  0x004beb61 in dec_state_owner_ref (owner=0x7fbadc002c48)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm47-0.1.1-Source/SAL/state_misc.c:1007
#5  0x004beed6 in uncache_nfs4_owner (nfs4_owner=0x7fbadc002c98)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm47-0.1.1-Source/SAL/state_misc.c:1100
#6  0x004506d6 in reap_expired_open_owners ()
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm47-0.1.1-Source/MainNFSD/nfs_reaper_thread.c:185
#7  0x0045093c in reaper_run (ctx=0x3b10fc0)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm47-0.1.1-Source/MainNFSD/nfs_reaper_thread.c:249
#8  0x00521562 in fridgethr_start_routine (arg=0x3b10fc0)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm47-0.1.1-Source/support/fridgethr.c:561
#9  0x7fbf566b4dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x7fbf55d7576d in __lseek_nocancel () from /lib64/libc.so.6
#11 0x in ?? ()

In the function "reap_expired_open_owners", I observe that texpire (a local
variable) has the value 0:
texpire = atomic_fetch_time_t(&nfs4_owner->cache_expire);

From the core:
(gdb) frame 6
#6  0x004506d6 in reap_expired_open_owners ()
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm47-0.1.1-Source/MainNFSD/nfs_reaper_thread.c:185
185 uncache_nfs4_owner(nfs4_owner);
(gdb) p tnow
$1 = 1508596802
(gdb) p texpire
$2 = 0

In the code, nfs4_owner->cache_expire is set to 0 only in the function
"uncache_nfs4_owner" (which has not yet been called at the point of this
crash), so I am wondering how this is happening.

Going ahead, would it be good to safeguard the call to uncache_nfs4_owner
in the reaper code, as below, in the function reap_expired_open_owners (per
the 2.3 code)?

	} else {
		if (texpire != 0) {
			/* This cached owner has expired, uncache it. */
			uncache_nfs4_owner(nfs4_owner);
			count++;
		}

		/* Get the next owner to examine. */
		owner = glist_first_entry(&cached_open_owners,
					  state_owner_t,
					  so_owner.so_nfs4_owner.so_state_list);
	}
-- 
with regards,
Sachin Punadikar


[Nfs-ganesha-devel] Ganesha 2.5 - mdc_readdir_chunk_object :INODE :CRIT :Collision while adding dirent for .nfsFD8E

2017-11-07 Thread Sachin Punadikar
Hello,
During tests on Ganesha 2.5, we are getting the log messages below,
including a critical one:


2017-11-03 05:30:05 : epoch 000100d3 : c40abc1pn13.gpfs.net : ganesha.nfsd-36297[work-226] mdcache_avl_insert_ck :INODE :WARN :Already existent when inserting dirent 0x3ffbe8015a60 for .nfsFD8E on entry=0x3ffb08019ed0 FSAL cookie=7fff, duplicated directory cookies make READDIR unreliable.

2017-11-03 05:30:05 : epoch 000100d3 : c40abc1pn13.gpfs.net : ganesha.nfsd-36297[work-226] mdc_readdir_chunk_object :INODE :CRIT :Collision while adding dirent for .nfsFD8E

I would like to understand what exactly is meant by an FSAL cookie
collision. Does it mean the same operation has been done by the UPCALL
thread? Is the message really CRIT?
If I compare with the 2.3 code (I know there is a lot of change related to
caching), we do not throw any CRIT message there.
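
My rough understanding (a simplified sketch with hypothetical names, not the
mdcache code) of why a duplicate FSAL cookie is a problem: READDIR resumes
from a cookie, so each cookie must identify exactly one position in the
directory; once two dirents carry the same cookie, resuming from it is
ambiguous and entries can be repeated or skipped:

#include <stdint.h>
#include <stdio.h>

/* Simplified dirent index keyed by cookie (hypothetical). */
struct dent { uint64_t cookie; const char *name; };

static const struct dent dir[] = {
	{ 0x1000, "a.txt" },
	{ 0x7fff, ".nfsFD8E" },
	{ 0x7fff, "b.txt" },	/* duplicate cookie: the collision */
};

/* Resume a listing "after" the given cookie, as READDIR does. */
static const struct dent *resume_after(uint64_t cookie)
{
	size_t n = sizeof(dir) / sizeof(dir[0]);

	for (size_t i = 0; i < n; i++)
		if (dir[i].cookie == cookie)
			return i + 1 < n ? &dir[i + 1] : NULL;
	return NULL;
}

int main(void)
{
	/* Whether the client last saw ".nfsFD8E" or "b.txt", it resumes from
	 * the same cookie, so "b.txt" can be returned twice (or, with other
	 * orderings, entries can be skipped). */
	const struct dent *next = resume_after(0x7fff);

	printf("resume after 0x7fff -> %s\n", next ? next->name : "(end)");
	return 0;
}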

-- 
with regards,
Sachin Punadikar


Re: [Nfs-ganesha-devel] Ganesha 2.3 and 2.5 - crash in free_nfs_request

2017-10-31 Thread Sachin Punadikar
William,
You are right, gsh_calloc is getting invoked (even in the 2.3 code).
Interestingly, in the core we got in testing, almost all the fields are
filled with 0xFF, so I am wondering whether it is something to do with the
underlying glibc or with RHEL in general.
Here is the gdb output showing the same.

(gdb) p reqdata->r_u.req.svc
$7 = {rq_prog = 4294967295, rq_vers = 4294967295, rq_proc = 4294967295,
rq_cred = {oa_flavor = -1,
oa_base = 0x , oa_length = 4294967295},
  rq_clntcred = 0x7f183c0a83e0, rq_xprt = 0x7f1932423830,
  rq_clntname = 0x ,
  rq_svcname = 0x , rq_msg = 0x7f183c0a8020, rq_context = 0x0,
  rq_u1 = 0x, rq_u2 = 0x, rq_cksum =
18446744073709551615, rq_xid = 4294967295, rq_verf = {
oa_flavor = -1, oa_base = 0x , oa_length = 4294967295},
  rq_auth = 0x, rq_ap1 = 0x, rq_ap2 =
0x, rq_raddr = {ss_family = 65535,
__ss_align = 18446744073709551615, __ss_padding = '\377' }, rq_daddr = {ss_family = 65535,
__ss_align = 18446744073709551615, __ss_padding = '\377' }, rq_raddr_len = 0, rq_daddr_len = 0}
(gdb) p reqdata->r_u.req
$8 = {xprt = 0x7f1932423830, svc = {rq_prog = 4294967295, rq_vers =
4294967295, rq_proc = 4294967295, rq_cred = {
  oa_flavor = -1, oa_base = 0x , oa_length = 4294967295},
rq_clntcred = 0x7f183c0a83e0, rq_xprt = 0x7f1932423830,
rq_clntname = 0x ,
rq_svcname = 0x , rq_msg = 0x7f183c0a8020, rq_context = 0x0,
rq_u1 = 0x, rq_u2 = 0x, rq_cksum =
18446744073709551615, rq_xid = 4294967295,
rq_verf = {oa_flavor = -1, oa_base = 0x ,
  oa_length = 4294967295}, rq_auth = 0x, rq_ap1 =
0x, rq_ap2 = 0x,
rq_raddr = {ss_family = 65535, __ss_align = 18446744073709551615,
__ss_padding = '\377' },
rq_daddr = {ss_family = 65535, __ss_align = 18446744073709551615,
__ss_padding = '\377' },
rq_raddr_len = 0, rq_daddr_len = 0}, lookahead = {flags = 4294967295,
read = 65535, write = 65535}, arg_nfs = {
arg_getattr3 = {object = {data = {data_len = 4294967295,
  data_val = 0x }}}, arg_setattr3 = {object = {data = {
  data_len = 4294967295, data_val = 0x }},
  new_attributes = {mode = {set_it = -1, set_mode3_u = {mode =
4294967295}}, uid = {set_it = -1, set_uid3_u = {
uid = 4294967295}}, gid = {set_it = -1, set_gid3_u = {gid =
4294967295}}, size = {set_it = -1, set_size3_u = {
size = 18446744073709551615}}, atime = {
  set_it = (SET_TO_SERVER_TIME | SET_TO_CLIENT_TIME | unknown:
4294967292), set_atime_u = {atime = {
  tv_sec = 4294967295, tv_nsec = 4294967295}}}, mtime = {
  set_it = (SET_TO_SERVER_TIME | SET_TO_CLIENT_TIME | unknown:
4294967292), set_mtime_u = {mtime = {
  tv_sec = 4294967295, tv_nsec = 4294967295, guard = {check
= -1, sattrguard3_u = {obj_ctime = {
tv_sec = 4294967295, tv_nsec = 4294967295, arg_lookup3 =
{what = {dir = {data = {data_len = 4294967295,
data_val = 0x }},
name = 0x }}, arg_access3 = {object = {data = {
  data_len = 4294967295, data_val = 0x }},
  access = 4294967295}, arg_readlink3 = {symlink = {data = {data_len =
4294967295,
  data_val = 0x }}}, arg_read3 = {file = {data = {
  data_len = 4294967295, data_val = 0x }},
  offset = 18446744073709551615, count = 4294967295}, arg_write3 =
{file = {data = {data_len = 4294967295,
  data_val = 0x }}, offset = 18446744073709551615,
  count = 4294967295, stable = (DATA_SYNC | FILE_SYNC | unknown:
4294967292), data = {data_len = 4294967295,
data_val = 0x }}, arg_create3 = {where = {dir = {data = {
data_len = 4294967295, data_val = 0x }},
name = 0x }, how = {
mode = (GUARDED | EXCLUSIVE | unknown: 4294967292), createhow3_u =
{obj_attributes = {mode = {set_it = -1,
---Type <return> to continue, or q <return> to quit---

Let me know if anyone has observed this kind of behavior.
Thanks in advance.

On Mon, Oct 30, 2017 at 9:38 PM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 10/27/17 7:56 AM, Sachin Punadikar wrote:
>
>> Ganesha 2.3 got segfault with below :
>> [...]
>> After analyzing the core and related code found that - In
>> "thr_decode_rpc_request" function, if call to SVC_RECV fails, then
>> free_nfs_request is invoked to free the resources. But so far one of the
>> field "reqdata->r_u.req.svc.rq_auth" is not initialized nor allocated,
>> which is leading to segfault.
>>
>> The code in this area is same for Ganesha 2.3 and 2.5.
>> I

[Nfs-ganesha-devel] Ganesha 2.5, crash /segfault while executing nlm4_Unlock

2018-06-26 Thread Sachin Punadikar
Hi All,
Recently a crash was reported by a customer on Ganesha 2.5.
(gdb) where
#0  0x7f475872900b in pthread_rwlock_wrlock () from
/lib64/libpthread.so.0
#1  0x0041eac9 in fsal_obj_handle_fini (obj=0x7f4378028028) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/commonlib.c:192
#2  0x0053180f in mdcache_lru_clean (entry=0x7f4378027ff0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:589
#3  0x00536587 in _mdcache_lru_unref (entry=0x7f4378027ff0,
flags=0, func=0x5a9380 <__func__.23209> "cih_remove_checked", line=406)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
#4  0x00543e91 in cih_remove_checked (entry=0x7f4378027ff0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:406
#5  0x00544b26 in mdc_clean_entry (entry=0x7f4378027ff0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:235
#6  0x0053181e in mdcache_lru_clean (entry=0x7f4378027ff0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:592
#7  0x00536587 in _mdcache_lru_unref (entry=0x7f4378027ff0,
flags=0, func=0x5a70af <__func__.23112> "mdcache_put", line=190)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
#8  0x00539666 in mdcache_put (entry=0x7f4378027ff0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.h:190
#9  0x0053f062 in mdcache_put_ref (obj_hdl=0x7f4378028028) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1709
#10 0x0049bf0f in nlm4_Unlock (args=0x7f4294165830,
req=0x7f4294165028, res=0x7f43f001e0e0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/Protocols/NLM/nlm_Unlock.c:128
#11 0x0044c719 in nfs_rpc_execute (reqdata=0x7f4294165000) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1290
#12 0x0044cf23 in worker_run (ctx=0x3c200e0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1562
#13 0x0050a3e7 in fridgethr_start_routine (arg=0x3c200e0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/support/fridgethr.c:550
#14 0x7f4758725dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x7f4757de673d in clone () from /lib64/libc.so.6

A closer look at the backtrace indicates a cyclic flow of execution, as
below:
nlm4_Unlock -> mdcache_put_ref -> mdcache_put -> _mdcache_lru_unref ->
mdcache_lru_clean -> fsal_obj_handle_fini and then mdc_clean_entry ->
cih_remove_checked -> (the flow continues on the line below, intentionally split)

-> _mdcache_lru_unref -> mdcache_lru_clean -> fsal_obj_handle_fini
(currently crashing here)
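
Not a fix, just a generic sketch of the pattern that usually guards against
this kind of re-entrant cleanup: an "already being cleaned" flag taken once,
so a nested unref that reaches cleanup again becomes a no-op instead of
finalizing the same handle twice. The names here are hypothetical, not the
mdcache code:

#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical entry; only the field needed for the sketch. */
struct entry {
	atomic_bool cleaning;	/* set once cleanup has started */
	/* ... object handle, hash linkage, etc. ... */
};

static void clean_entry(struct entry *e)
{
	bool expected = false;

	/* Only the first caller proceeds; a nested unref that re-enters
	 * cleanup (as in the backtrace above) returns immediately. */
	if (!atomic_compare_exchange_strong(&e->cleaning, &expected, true))
		return;

	/* ... remove from hash, release the FSAL handle, etc. ... */
}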

Do we see any code issue here? Any hints on how to root-cause this issue?
Thanks in advance.

-- 
with regards,
Sachin Punadikar


[Nfs-ganesha-devel] Ganesha 2.3 and 2.5 - crash in free_nfs_request

2017-10-27 Thread Sachin Punadikar
Hello,
Ganesha 2.3 got a segfault as below:

Core was generated by `/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
/etc/ganesha/ganesha.conf -N N'.
Program terminated with signal 11, Segmentation fault.
#0  0x0044b4dd in free_nfs_request (reqdata=0x7f19c5e48010)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm51-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1490
1490            SVCAUTH_RELEASE(reqdata->r_u.req.svc.rq_auth,
Missing separate debuginfos, use: debuginfo-install dbus-libs-1.6.12-13.el7.x86_64
glibc-2.17-105.el7.x86_64 gssproxy-0.4.1-7.el7.x86_64
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64
libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-26.el7.x86_64
libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64
libselinux-2.2.2-6.el7.x86_64 libuuid-2.23.2-26.el7.x86_64
pcre-8.32-15.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64
(gdb) where
#0  0x0044b4dd in free_nfs_request (reqdata=0x7f19c5e48010)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm51-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1490
#1  0x0044c297 in thr_decode_rpc_request (context=0x0, xprt=0x7f1932423830)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm51-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1836
#2  0x0044c355 in thr_decode_rpc_requests (thr_ctx=0x7f17c00b6f10)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm51-0.1.1-Source/MainNFSD/nfs_rpc_dispatcher_thread.c:1858
#3  0x00520bc6 in fridgethr_start_routine (arg=0x7f17c00b6f10)
    at /usr/src/debug/nfs-ganesha-2.3.2-ibm51-0.1.1-Source/support/fridgethr.c:561
#4  0x7f19c462bdc5 in start_thread () from /lib64/libpthread.so.0
#5  0x7f19c3ceb1cd in clone () from /lib64/libc.so.6

After analyzing the core and the related code, I found that in the
"thr_decode_rpc_request" function, if the call to SVC_RECV fails,
free_nfs_request is invoked to free the resources. But at that point the
field "reqdata->r_u.req.svc.rq_auth" has been neither initialized nor
allocated, which is leading to the segfault.
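
Roughly what I have in mind is a guard along these lines (a sketch only; the
actual change is in the patch below and may differ in detail):

	/* In free_nfs_request(): skip the auth release when SVC_RECV failed
	 * before rq_auth was ever initialized. */
	if (reqdata->r_u.req.svc.rq_auth != NULL) {
		/* existing SVCAUTH_RELEASE(reqdata->r_u.req.svc.rq_auth, ...)
		 * call goes here, unchanged */
	}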

The code in this area is the same for Ganesha 2.3 and 2.5.
I have created the patch below to overcome this issue. Please review and, if
suitable, merge it into Ganesha 2.5 stable.
https://github.com/sachinpunadikar/nfs-ganesha/commit/91baffa8bd197c78eff106f42927a370155ae6b4

The Ganesha 2.6 code in this area has a lot of changes, so I was not able to
check whether 2.6 is affected.
-- 
with regards,
Sachin Punadikar


Re: [Nfs-ganesha-devel] READDIR doesn't return all entries.

2018-02-12 Thread Sachin Punadikar
Pradeep,
The patch is meant to catch FSAL cookie related issues; when one is
detected, it asks the NFS client to read again.
Could you please provide Ganesha logs and a tcpdump for both the working
(RC2) and non-working (RC6) cases?
 - Sachin.

On Tue, Feb 13, 2018 at 7:05 AM, Pradeep <pradeeptho...@gmail.com> wrote:

> Hello,
>
> I noticed that with large number of directory entries, READDIR does not
> return all entries. It happened with RC5; but works fine in RC2. I looked
> through the changes and the offending change seems to be this one:
>
> https://github.com/nfs-ganesha/nfs-ganesha/commit/985564cbd147b6acc5dd6de61a3ca8fbc6062eda
>
> (reverted the change and verified that all entries are returned without
> this change)
>
> Still looking into why it broke READDIR for me. Any insights on debugging
> this would be helpful.
>
> Thanks,
> Pradeep


-- 
with regards,
Sachin Punadikar


[Nfs-ganesha-devel] Ganesha crash in dec_nlm_state_ref

2018-08-29 Thread Sachin Punadikar
Hello,
Recently a customer reported the crash below:
 #0  0x3fff7dbd39ac in raise (sig=) at
../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
37return INLINE_SYSCALL (tgkill, 3, pid, THREAD_GETMEM
(THREAD_SELF, tid),
Missing separate debuginfos, use: debuginfo-install
dbus-libs-1.10.24-7.el7.ppc64le elfutils-libelf-0.170-4.el7.ppc64le
elfutils-libs-0.170-4.el7.ppc64le
(gdb) where
#0  0x3fff7dbd39ac in raise (sig=) at
../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x10070b38 in crash_handler (signo=11, info=0x3ffaacffcdc8,
ctx=0x3ffaacffc050) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/MainNFSD/nfs_init.c:225
#2  <signal handler called>
#3  0x101b4b70 in mdc_cur_export () at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:544
#4  0x101b72e0 in mdcache_close2 (obj_hdl=0x3ffbb0897e98,
state=0x3ffe8ceafb30)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:1047
#5  0x10135054 in dec_nlm_state_ref (state=0x3ffe8ceafb30) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/SAL/nlm_state.c:340
#6  0x100f7894 in dec_state_t_ref (state=0x3ffe8ceafb30) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/include/sal_functions.h:445
#7  0x100fa5d4 in remove_from_locklist (lock_entry=0x3ffe8c8916e0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/SAL/state_lock.c:769
#8  0x100fd2b0 in try_to_grant_lock (lock_entry=0x3ffe8c8916e0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/SAL/state_lock.c:1834
#9  0x100fd45c in process_blocked_lock_upcall
(block_data=0x3ffe8c1f4630) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/SAL/state_lock.c:1850
#10 0x100f62e4 in state_blocked_lock_caller (ctx=0x3ffaa167a1c0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/SAL/state_async.c:68
#11 0x10169144 in fridgethr_start_routine (arg=0x3ffaa167a1c0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/support/fridgethr.c:550
#12 0x3fff7dbc8728 in start_thread (arg=0x3ffaacffe810) at
pthread_create.c:310
#13 0x3fff7da07ae0 in clone () at
../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:109

To fix this, I have uploaded a patch:
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/423882

- Sachin

-- 
with regards,
Sachin Punadikar


[Nfs-ganesha-devel] Ganesha abort due to double free

2018-08-31 Thread Sachin Punadikar
Hello,
A customer reported a Ganesha crash/abort due to a double free.
The stack trace is below:
(gdb) where
#0 0x3fff889c39ac in raise (sig=) at
../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1 0x10070b38 in crash_handler (signo=6, info=0x3ffefc7fc728,
ctx=0x3ffefc7fb9b0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/MainNFSD/nfs_init.c:225
#2 <signal handler called>
#3 0x3fff8871e578 in __GI_raise (sig=) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#4 0x3fff887206fc in __GI_abort () at abort.c:90
#5 0x3fff88764844 in __libc_message (do_abort=,
fmt=0x3fff888656d0 "*** Error in `%s': %s: 0x%s ***\n")
at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#6 0x3fff8876f284 in malloc_printerr (ar_ptr=0x3ffa9020,
ptr=, str=0x3fff88865798 "double free or corruption
(fasttop)",
action=3) at malloc.c:5013
#7 _int_free (av=0x3ffa9020, p=, have_lock=) at malloc.c:3835
#8 0x100f6edc in gsh_free (p=0x3ffa9a00) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/include/abstract_mem.h:271
#9 0x1010460c in cancel_all_nlm_blocked () at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/SAL/state_lock.c:3799
#10 0x1012a154 in nfs_release_nlm_state (release_ip=0x10031a3c3d6
"10.200.10.107")
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/SAL/nfs4_recovery.c:1213
#11 0x10125588 in nfs4_start_grace (gsp=0x3ffefc7fd978) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/SAL/nfs4_recovery.c:106
#12 0x1007bc18 in admin_dbus_grace (args=0x3ffefc7fdaa0,
reply=0x1002ea01350, error=0x3ffefc7fda80)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/MainNFSD/nfs_admin_thread.c:166
#13 0x101ca3e4 in dbus_message_entrypoint (conn=0x1002ea00e10,
msg=0x1002ea011b0, user_data=0x102414c0 )
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/dbus/dbus_server.c:512
#14 0x3fff88d5164c in _dbus_object_tree_dispatch_and_unlock () from
/lib64/libdbus-1.so.3
#15 0x3fff88d3b950 in dbus_connection_dispatch () from
/lib64/libdbus-1.so.3
#16 0x3fff88d3bda8 in _dbus_connection_read_write_dispatch () from
/lib64/libdbus-1.so.3
#17 0x101cb360 in gsh_dbus_thread (arg=0x0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.04-0.1.1-Source/dbus/dbus_server.c:741
#18 0x3fff889b8728 in start_thread (arg=0x3ffefc7fe810) at
pthread_create.c:310
#19 0x3fff887f7ae0 in clone () at
../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:109

I have uploaded a patch which can potentially fix the double free:
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/424260
-- 
with regards,
Sachin Punadikar


Re: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: "mdc_lookup" do not dispatch to FSAL

2018-02-27 Thread Sachin Punadikar
Bill,
The issue is always reproducible for the customer.
In another email chain I have provided the Ganesha logs, which clearly
indicate that the lookup does not go to the FSAL layer to fetch more current
data if uncached is set to false.
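
For reference, a simplified sketch of the flow described in the quoted reply
below (argument lists and status names are approximate, not the actual
mdcache code):

	/* mdc_lookup(), roughly: */
	status = mdc_try_get_cached(mdc_parent, name, &entry);	/* under the read lock */
	if (status == ERR_FSAL_STALE) {
		/* retry under the write lock */
		status = mdc_try_get_cached(mdc_parent, name, &entry);
		if (status == ERR_FSAL_STALE)
			/* only now dispatch to the FSAL */
			status = mdc_lookup_uncached(mdc_parent, name, &entry, NULL);
	}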

On Fri, Feb 16, 2018 at 9:40 PM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 2/15/18 6:44 AM, GerritHub wrote:
>
>> Sachin Punadikar has uploaded this change for *review*.
>>
>> View Change <https://review.gerrithub.io/400037>
>>
>> "mdc_lookup" do not dispatch to FSAL
>>
>> Are you sure?  Do you have an actual reproducible error case?
>
>
> "mdc_lookup" function first attempts to get the entry from cache
>> via function "mdc_try_get_cached". On getting an ESTALE error, it
>> should dispatch to FSAL, but was again calling "mdc_try_get_cached".
>> Rectified code to make call to "mdc_lookup_uncached", so FSAL code
>> gets invoked.
>>
> I'm not the mdcache expert, but don't think this is correct.  The
> comments already explain.
>
> It tries under read lock (fastest).  If stale, it write locks and
> tries again.  If still fails, at the uncached label, then it does
> the mdc_lookup_uncached().
>
> mdc_try_get_cached() is likely faster than mdc_lookup_uncached().



-- 
with regards,
Sachin Punadikar


[Nfs-ganesha-devel] Ganesh 2.3 : NFSv4 client gets error NFS4ERR_OLD_STATEID

2018-04-09 Thread Sachin Punadikar
nfs4_Compound :NFS4 :DEBUG :Status of OP_READ in position 1 =
NFS4ERR_OLD_STATEID
-

-- 
with regards,
Sachin Punadikar


[Nfs-ganesha-devel] Ganesha crash in lock_avail

2018-12-06 Thread Sachin Punadikar
Hello,
A customer reported the crash below:
(gdb) where
#0  0x7fa70c161fcb in raise () from /lib64/libpthread.so.0
#1  0x00454884 in crash_handler (signo=11, info=0x7fa5a1ff9f30,
ctx=0x7fa5a1ff9e00)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/MainNFSD/nfs_init.c:225
#2  <signal handler called>
#3  0x in ?? ()
#4  0x00435084 in lock_avail (vec=0x18f07c8, file=0x7fa420157fd8,
owner=0x7fa4f8189fc0,
lock_param=0x7fa420157ff0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL_UP/fsal_up_top.c:179
#5  0x005386eb in mdc_up_lock_avail (vec=0x18f07c8,
file=0x7fa420157fd8, owner=0x7fa4f8189fc0,
lock_param=0x7fa420157ff0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c:380
#6  0x00439c72 in queue_lock_avail (ctx=0x7fa40c039c40)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL_UP/fsal_up_async.c:247
#7  0x0050a32c in fridgethr_start_routine (arg=0x7fa40c039c40)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/support/fridgethr.c:550
#8  0x7fa70c15adc5 in start_thread () from /lib64/libpthread.so.0
#9  0x7fa70b81a1cd in clone () from /lib64/libc.so.6

It was found that op_ctx was not set up properly:
(gdb) frame 4
#4  0x00435084 in lock_avail (vec=0x18f07c8, file=0x7fa420157fd8,
owner=0x7fa4f8189fc0,
lock_param=0x7fa420157ff0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL_UP/fsal_up_top.c:179
179obj->obj_ops.put_ref(obj);
(gdb) p *obj
$2 = {handles = {next = 0x0, prev = 0x0}, fs = 0x193e240, fsal = 0x0,
obj_ops = {get_ref = 0x0,
put_ref = 0x0, release = 0x0, merge = 0x0, lookup = 0x0, readdir = 0x0,
compute_readdir_cookie = 0x0, dirent_cmp = 0x0, create = 0x0, mkdir =
0x0, mknode = 0x0,
symlink = 0x0, readlink = 0x0, test_access = 0x0, getattrs = 0x0,
setattrs = 0x0, link = 0x0,
fs_locations = 0x0, rename = 0x0, unlink = 0x0, open = 0x0, reopen =
0x0, status = 0x0,
read = 0x0, read_plus = 0x0, write = 0x0, write_plus = 0x0, seek = 0x0,
io_advise = 0x0,
commit = 0x0, lock_op = 0x0, share_op = 0x0, close = 0x0,
list_ext_attrs = 0x0,
getextattr_id_by_name = 0x0, getextattr_value_by_name = 0x0,
getextattr_value_by_id = 0x0,
setextattr_value = 0x0, setextattr_value_by_id = 0x0,
remove_extattr_by_id = 0x0,
remove_extattr_by_name = 0x0, handle_is = 0x0, handle_to_wire = 0x0,
handle_to_key = 0x0,
handle_cmp = 0x0, layoutget = 0x0, layoutreturn = 0x0, layoutcommit =
0x0, getxattrs = 0x0,
setxattrs = 0x0, removexattrs = 0x0, listxattrs = 0x0, open2 = 0x0,
check_verifier = 0x0,
status2 = 0x0, reopen2 = 0x0, read2 = 0x0, write2 = 0x0, seek2 = 0x0,
io_advise2 = 0x0,
commit2 = 0x0, lock_op2 = 0x0, setattr2 = 0x0, close2 = 0x0}, obj_lock
= {__data = {
  __lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup =
0,
  __nr_readers_queued = 0, __nr_writers_queued = 0, __writer = 0,
__shared = 0, __pad1 = 0,
  __pad2 = 0, __flags = 0}, __size = '\000' , __align
= 0},
  type = REGULAR_FILE, fsid = {major = 11073324921844891658, minor = 1},
fileid = 229392385,
  state_hdl = 0x7fa51006aea0}

(gdb) frame 5
#5  0x005386eb in mdc_up_lock_avail (vec=0x18f07c8,
file=0x7fa420157fd8,
owner=0x7fa4f8189fc0, lock_param=0x7fa420157ff0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c:380
380rc = myself->super_up_ops.lock_avail(vec, file, owner,
(gdb) p op_ctx
$3 = (struct req_op_context *) 0x7fa5a1ffa430
(gdb) p *op_ctx
$4 = {creds = 0x0, original_creds = {caller_uid = 0, caller_gid = 0,
caller_glen = 0,
caller_garray = 0x0}, caller_gdata = 0x0, caller_garray_copy = 0x0,
managed_garray_copy = 0x0,
  cred_flags = 0, caller_addr = 0x0, clientid = 0x0, nfs_vers = 0,
nfs_minorvers = 0,
  req_type = 0, client = 0x0, ctx_export = 0x18efc78, fsal_export =
0x18f0680, export_perms = 0x0,
  start_time = 0, queue_wait = 0, fsal_private = 0x0, fsal_module = 0x0,
fsal_pnfs_ds = 0x0}
(gdb)

The dump above shows that op_ctx is not set properly; "fsal_module" is
NULL.
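
For comparison, a bare-bones sketch (illustrative only; Ganesha has its own
helpers for this, and the export/fsal field names marked below are
assumptions) of what an async up-call thread needs before calling into a
FSAL: a populated thread-local op_ctx, so fsal_export and fsal_module are
not NULL:

	/* Sketch: set up a request context on the up-call thread before
	 * invoking any FSAL operation. */
	struct req_op_context ctx;

	memset(&ctx, 0, sizeof(ctx));
	ctx.ctx_export = exp;			/* gsh_export for this up-call */
	ctx.fsal_export = exp->fsal_export;	/* assumed field name */
	ctx.fsal_module = ctx.fsal_export->fsal; /* assumed field name */
	op_ctx = &ctx;				/* thread-local pointer the FSAL code reads */

	/* ... run the up-call / FSAL operation ... */

	op_ctx = NULL;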

To fix this issue I have posted a patch:
https://review.gerrithub.io/#/c/436356/
-- 
with regards,
Sachin Punadikar