[Gluster-devel] Hello, I have a question about the erasure code translator, hope someone can give me some advice, thank you!

2019-04-07 Thread PSC
Hi, I am a storage software developer who is interested in Gluster, and I am
trying to improve its read/write performance.
 
I noticed that Gluster uses a Vandermonde matrix in the erasure code encoding
and decoding process. However, it is quite expensive to generate the inverse of
a Vandermonde matrix, which is necessary for decoding: the cost is O(n³).
 
Using a Cauchy matrix instead greatly cuts down the cost of finding an inverse
matrix, to O(n²).
 
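For illustration, here is a minimal self-contained sketch of my own (not gluster
or ISA-L source) of how such an encoding matrix can be laid out for a 4+2 volume,
assuming the GF(2^8) field with the common 0x11d polynomial: identity rows on top
so data passes through unchanged, and Cauchy rows a[i][j] = 1/(i XOR j) for the
coding fragments (similar to what ISA-L's gf_gen_cauchy1_matrix produces). Any K
of its rows should form an invertible matrix, and a Cauchy matrix also has a
closed-form inverse, which is where the O(n²) figure above comes from.

#include <stdint.h>
#include <stdio.h>

/* GF(2^8) multiply with the 0x11d reduction polynomial. */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return p;
}

/* Brute-force field inverse: fine for building a small matrix once. */
static uint8_t gf256_inv(uint8_t a)
{
    for (int x = 1; x < 256; x++)
        if (gf256_mul(a, (uint8_t)x) == 1)
            return (uint8_t)x;
    return 0;                          /* 0 has no inverse */
}

int main(void)
{
    enum { K = 4, M = 6 };             /* 4 data + 2 coding fragments */
    uint8_t a[M][K] = { { 0 } };

    for (int j = 0; j < K; j++)        /* top K rows: identity */
        a[j][j] = 1;
    for (int i = K; i < M; i++)        /* coding rows: Cauchy entries 1/(i ^ j) */
        for (int j = 0; j < K; j++)
            a[i][j] = gf256_inv((uint8_t)(i ^ j));

    for (int i = 0; i < M; i++) {      /* print the 6x4 encoding matrix */
        for (int j = 0; j < K; j++)
            printf("%3u ", a[i][j]);
        printf("\n");
    }
    return 0;
}
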
I used the Intel storage acceleration library (ISA-L) to replace the original
EC encode/decode part of Gluster, and it reduced the encode and decode time to
about 50% of the original.
 
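As a rough idea of the flow, here is a toy example of mine using the ISA-L API
(gf_gen_cauchy1_matrix, ec_init_tables, ec_encode_data, gf_invert_matrix). It is
not the actual change I applied to gluster, and the header path and link flags
(<isa-l/erasure_code.h> and -lisal here) may differ per install. It encodes 4+2
with a Cauchy matrix and then rebuilds one lost data fragment:

#include <isa-l/erasure_code.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define K    4                     /* data fragments   */
#define P    2                     /* coding fragments */
#define M    (K + P)
#define LEN  4096                  /* bytes per fragment */

int main(void)
{
    unsigned char *frag[M], *recovered = malloc(LEN);
    unsigned char encode_matrix[M * K], g_tbls[K * P * 32];

    for (int i = 0; i < M; i++)
        frag[i] = malloc(LEN);
    for (int i = 0; i < K; i++)
        memset(frag[i], 'A' + i, LEN);          /* fake data */

    /* Encode: identity rows on top, Cauchy rows below; only the P coding
     * rows are expanded into multiplication tables. */
    gf_gen_cauchy1_matrix(encode_matrix, M, K);
    ec_init_tables(K, P, &encode_matrix[K * K], g_tbls);
    ec_encode_data(LEN, K, P, g_tbls, frag, &frag[K]);

    /* Pretend data fragment 2 is lost; survivors are 0, 1, 3 and coding 4. */
    int survivors[K] = { 0, 1, 3, 4 };
    unsigned char sub[K * K], inv[K * K], dec_tbls[K * 32];
    unsigned char *src[K];

    for (int r = 0; r < K; r++) {
        memcpy(&sub[r * K], &encode_matrix[survivors[r] * K], K);
        src[r] = frag[survivors[r]];
    }
    if (gf_invert_matrix(sub, inv, K) != 0) {   /* any K survivor rows should invert */
        fprintf(stderr, "survivor matrix not invertible\n");
        return 1;
    }

    /* Row 2 of the inverse rebuilds data fragment 2 from the survivors. */
    ec_init_tables(K, 1, &inv[2 * K], dec_tbls);
    ec_encode_data(LEN, K, 1, dec_tbls, src, &recovered);

    printf("recovered fragment 2 %s the original\n",
           memcmp(recovered, frag[2], LEN) == 0 ? "matches" : "differs from");
    return 0;
}
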
However, when I tested the whole system, the read/write performance was almost
the same as the original Gluster.
 
I tested it on three machines acting as servers. Each one has two bricks, both
on SSD, so the total number of bricks is 6, with two of them used as coding
bricks. That is a 4+2 disperse volume configuration.
 
The network cards have enough bandwidth that, theoretically, they can support
read and write at speeds faster than 1000 MB/s.
 
The actual read performance is about 492 MB/s.
The actual write performance is about 336 MB/s.

The original version reads at about 461 MB/s and writes at about 322 MB/s.
 
Could someone give me some advice on how to improve its performance? If it is
not the EC translator, which part is the critical limit on performance?
 
I measured the time spent in the translators. It shows that the EC translator
takes only about 7% of the whole read/write process, although I know that some
translators run asynchronously, so the real percentage may be somewhat larger
than that.
 
Thank you sincerely for your patience in reading my question!
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] glusterd stuck for glusterfs with version 3.12.15

2019-04-07 Thread Zhou, Cynthia (NSB - CN/Hangzhou)
Hi glusterfs experts,
Good day!
In my test environment, a glusterd stuck issue sometimes happens: glusterd stops
responding to any gluster commands. When I checked this issue, I found that
glusterd thread 9 and thread 8 are handling the same socket. I thought the
following patch should solve this issue; however, after I merged the patch the
issue still exists. Looking into the code, it seems that socket_event_poll_in
calls event_handled before rpc_transport_pollin_destroy, which I think gives
another poll the chance to handle exactly the same socket and causes this
glusterd stuck issue. Also, I find there is no LOCK_DESTROY(&iobref->lock) in
iobref_destroy; I think it would be better to destroy the lock there (a small
stand-alone sketch of both suggestions follows the patch reference below).
Following is the gdb info from when this issue happened; I would like to know
your opinion on this issue, thanks!

SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537

* socket: fix issue on concurrent handle of a socket
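
To make the two suggestions above concrete, here is a small stand-alone sketch
with simplified stand-in types (hypothetical names such as pollin_like,
iobref_like and event_handled_stub; this is not the real socket.c or iobuf.c
code, only the proposed ordering and the lock-destroy idea):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* --- suggestion 1: ordering inside socket_event_poll_in ------------------ */

struct pollin_like { int ref; };                  /* stands in for rpc_transport_pollin_t */

static void pollin_destroy(struct pollin_like *p) /* ~ rpc_transport_pollin_destroy() */
{
    free(p);
}

static void event_handled_stub(int fd)            /* ~ event_handled(): re-arms the oneshot fd */
{
    printf("fd %d re-armed; another epoll worker may poll it now\n", fd);
}

static void poll_in_proposed_order(int fd, struct pollin_like *pollin)
{
    /* Observed order today: event_handled() first, destroy afterwards, which
     * lets a second worker pick up the same socket while this one is still in
     * teardown.  Proposed order: finish with the per-event state, then re-arm. */
    pollin_destroy(pollin);
    event_handled_stub(fd);
}

/* --- suggestion 2: LOCK_DESTROY in iobref_destroy ------------------------ */

struct iobref_like {                              /* stands in for struct iobref */
    pthread_mutex_t lock;                         /* LOCK() wraps a mutex or spinlock */
    int ref;
};

static void iobref_like_destroy(struct iobref_like *iobref)
{
    pthread_mutex_destroy(&iobref->lock);         /* proposed LOCK_DESTROY(&iobref->lock) */
    free(iobref);
}

int main(void)
{
    struct pollin_like *p = calloc(1, sizeof(*p));
    poll_in_proposed_order(36, p);

    struct iobref_like *b = calloc(1, sizeof(*b));
    pthread_mutex_init(&b->lock, NULL);
    iobref_like_destroy(b);
    return 0;
}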



GDB INFO:
Thread 8 is blocked in pthread_cond_wait, and thread 9 is blocked in
iobref_unref, I think:
Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f9ed700 (LWP 1932)):
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fd2b42 in __pthread_mutex_cond_lock () from 
/lib64/libpthread.so.0
#2  0x7f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#3  0x7f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, gen=4, 
idx=27) at socket.c:1201
#4  0x7f9ee4fbf99c in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#5  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edfffee84) at event-epoll.c:583
#6  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180cf20) at 
event-epoll.c:659
#7  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#8  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6

(gdb) thread 9
[Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3  0x7f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) 
at rpc-transport.c:123
#4  0x7f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, 
notify_handled=_gf_true) at socket.c:2322
#5  0x7f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, 
data=0x7f9ed0049cc0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6  0x7f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, 
event=0x7f9edf7fde84) at event-epoll.c:583
#7  0x7f9eeb2828ab in event_dispatch_epoll_worker (data=0x180d0c0) at 
event-epoll.c:659
#8  0x7f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9  0x7f9ee98a4eaf in clone () from /lib64/libc.so.6
(gdb) frame 2
#2  0x7f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
944 iobuf.c: No such file or directory.
(gdb) print *iobref
$1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222, 
__owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
__elision = 0, __list = {__prev = 0x4000, __next = 0x7f9ed00063b000}},
  __size = 
"\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@\000\000\000\000\000\000\000\260c\000О\177",
 __align = 953482739714}}, ref = -256, iobrefs = 0x, alloced = 
-1, used = -1}
(gdb) quit
A debugging session is active.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Weekly Untriaged Bugs

2019-04-07 Thread jenkins
[...truncated 6 lines...]
https://bugzilla.redhat.com/1688226 / core: Brick Still Died After Restart Glusterd & Glusterfsd Services
https://bugzilla.redhat.com/1695416 / core: client log flooding with intentional socket shutdown message when a brick is down
https://bugzilla.redhat.com/1691833 / core: Client sends 128KByte network packet for 0 length file copy
https://bugzilla.redhat.com/1695480 / core: Global Thread Pool
https://bugzilla.redhat.com/1694943 / core: parallel-readdir slows down directory listing
https://bugzilla.redhat.com/1696721 / geo-replication: geo-replication failing after upgrade from 5.5 to 6.0
https://bugzilla.redhat.com/1694637 / geo-replication: Geo-rep: Rename to an existing file name destroys its content on slave
https://bugzilla.redhat.com/1689981 / geo-replication: OSError: [Errno 1] Operation not permitted - failing with socket files?
https://bugzilla.redhat.com/1694139 / glusterd: Error waiting for job 'heketi-storage-copy-job' to complete on one-node k3s deployment.
https://bugzilla.redhat.com/1695099 / glusterd: The number of glusterfs processes keeps increasing, using all available resources
https://bugzilla.redhat.com/1690454 / posix-acl: mount-shared-storage.sh does not implement mount options
https://bugzilla.redhat.com/1696518 / project-infrastructure: builder203 does not have a valid hostname set
https://bugzilla.redhat.com/1691617 / project-infrastructure: clang-scan tests are failing nightly.
https://bugzilla.redhat.com/1691357 / project-infrastructure: core archive link from regression jobs throw not found error
https://bugzilla.redhat.com/1692349 / project-infrastructure: gluster-csi-containers job is failing
https://bugzilla.redhat.com/1693385 / project-infrastructure: request to change the version of fedora in fedora-smoke-job
https://bugzilla.redhat.com/1693295 / project-infrastructure: rpc.statd not started on builder204.aws.gluster.org
https://bugzilla.redhat.com/1691789 / project-infrastructure: rpc-statd service stops on AWS builders
https://bugzilla.redhat.com/1695484 / project-infrastructure: smoke fails with "Build root is locked by another process"
https://bugzilla.redhat.com/1693184 / replicate: A brick process(glusterfsd) died with 'memory violation'
https://bugzilla.redhat.com/1696075 / replicate: Client lookup is unable to heal missing directory GFID entry
https://bugzilla.redhat.com/1696633 / tests: GlusterFs v4.1.5 Tests from /tests/bugs/ module failing on Intel
https://bugzilla.redhat.com/1694976 / unclassified: On Fedora 29 GlusterFS 4.1 repo has bad/missing rpm signs
[...truncated 2 lines...]

build.log
Description: Binary data
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel