Re: [Gluster-users] Create gluster volume on machines with one hard disc

2016-05-23 Thread Lindsay Mathieson
On 20 May 2016 at 20:19, David Comeyne 
wrote:

> for example I have this: 1 physical machine with 7 nodes. Only 1 SSD per
> node.


I must confess I don't understand your terminology in this context - what
do you mean by a "node"? A VM?


-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] performance issue in gluster volume

2016-05-23 Thread Ramavtar

Hi Ravi,

I am using a gluster volume holding 2.7 TB of data (mp4 and jpeg
files), served by an nginx webserver.


I am facing a performance-related issue with this gluster volume. Please help me;
please find the gluster details below:



[root@webnode3 ~]# gluster --version
glusterfs 3.7.11 built on Apr 27 2016 14:09:22
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. 
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU
General Public License.


[root@webnode3 ~]# gluster volume info

Volume Name: DATA-STORE
Type: Distributed-Replicate
Volume ID: b64c1fea-1500-4014-b36a-0487818fa893
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: webhost75:/home/DATABRINK
Brick2: webhost90:/home/DATABRINK
Brick3: mysqlhost1:/home/DATABRINK
Brick4: mysqlhost2:/home/DATABRINK
Options Reconfigured:
performance.readdir-ahead: on

I mounted this volume on the same server with FUSE.
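
One way to gather per-brick latency figures for this volume before tuning anything
is gluster's built-in profiler; a minimal sketch (volume name taken from the info
above, commands untested here):

    gluster volume profile DATA-STORE start
    # run the nginx workload for a while, then dump the per-brick stats
    gluster volume profile DATA-STORE info
    gluster volume profile DATA-STORE stop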

please help me.

Thanks,
Ram


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Create gluster volume on machines with one hard disc

2016-05-23 Thread David Comeyne
Hi all, question...

Is it useful to create a gluster volume on machines with only one hard disk?

For example I have this: 1 physical machine with 7 nodes. Only 1 SSD per node.
Right now I have shared storage on the master node that is shared across all other
nodes using NFS.
Would it make sense to create a distributed gluster volume across all nodes, keeping in
mind there is only 1 SSD per node?

I was thinking about creating something like this:
Volume Name: sharedvolume
Type: Distribute
Number of Bricks: 8
Transport-type: tcp
Bricks:
Brick1: node0:/data/brick/shared
Brick2: node1:/data/brick/shared
Brick3: node2:/data/brick/shared
Brick4: node3:/data/brick/shared
Brick5: node4:/data/brick/shared
Brick6: node5:/data/brick/shared
Brick7: node6:/data/brick/shared
Brick8: node7:/data/brick/shared

And then mount it on each node as:
nodeX:/sharedvolume /storage/shared glusterfs defaults,_netdev 0 0
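
For completeness, a rough sketch of the commands that would create and start such a
volume, assuming the brick directories already exist on every node and the peers have
been probed from node0 ('force' is shown because gluster normally refuses bricks that
live on the root filesystem):

    gluster volume create sharedvolume transport tcp \
        node0:/data/brick/shared node1:/data/brick/shared \
        node2:/data/brick/shared node3:/data/brick/shared \
        node4:/data/brick/shared node5:/data/brick/shared \
        node6:/data/brick/shared node7:/data/brick/shared force
    gluster volume start sharedvolume
    mount -t glusterfs node0:/sharedvolume /storage/shared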

A little more information on the set-up:
MASTER NODE (1x):
/dev/sda: 480.1 GB
/dev/mapper/centos_node0-root: 445.8 GB
/dev/mapper/centos_node0-swap: 33.8 GB

WORKER NODE (6x):
/dev/sda: 240.1 GB
/dev/mapper/centos_nodeX-root: 215.5 GB
/dev/mapper/centos_nodeX-swap: 24.0 GB

The root / needs a lot of space for /tmp, but /storage/shared is also on
the root /.
Creating a separate logical volume for the shared storage sounds risky: if
/tmp fills up while /storage/shared is not using much space, the space set
aside for that extra LV is wasted.

David Comeyne
System Engineer

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Distro for Building Gluster

2016-05-23 Thread Lindsay Mathieson
Looking at testing the 3.8 branch, which of course means building from
source. In general I find the least hassle comes from building on the distro
the devs develop for :) Would that be RedHat/CentOS?
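
For what it's worth, the build itself is the usual autotools sequence on any distro;
a rough sketch (the branch name release-3.8 is an assumption):

    git clone https://github.com/gluster/glusterfs.git
    cd glusterfs
    git checkout release-3.8
    ./autogen.sh && ./configure && make
    sudo make install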

-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Brick not seen - urgent

2016-05-23 Thread Lindsay Mathieson
On 24 May 2016 at 09:09, Kevin Lemonnier  wrote:
> So I just did that, seems to have worked fine.

Excellent

>30 - 40 minutes of VM freeze. Looking forward to updating to
> 3.7.11 to avoid that!

It has been working very well for me - had a power outage yesterday
and two nodes didn't respond to the shutdown notification from the UPS
and kept writing when node 3 shut down. Everything came back up ok and
there were several thousand shards that needed healing, but all the
VMs started ok while it did that.

>
> Anyway, at least for now, seems to be resolved. I hope it won't do that again
> though

Crossed Paws.




-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Brick not seen - urgent

2016-05-23 Thread Kevin Lemonnier
So I just did that, seems to have worked fine.
I stopped the gluster daemon and killed what remained (the brick), and
the heal just finished. It took about an hour to complete, with something
like 30-40 minutes of VM freeze. Looking forward to updating to
3.7.11 to avoid that!

Anyway, at least for now, it seems to be resolved. I hope it won't do that
again though.

Thanks


On Mon, May 23, 2016 at 07:05:56PM +0200, Kevin Lemonnier wrote:
> On Mon, May 23, 2016 at 04:06:06PM +0100, Anant wrote:
> >Have you tried to stop all services of gluster ??  Like - glusterd ,
> >glusterfsd
> > 
> 
> /etc/init.d/gluster-server is the only thing I have. But anyway when I stop it
> I can see the brick process is still up, so that's clearly the problem. I'll
> try to kill it at night, for now the other two nodes are running fine and I just
> can't afford to freeze all the VMs for the hour it'll take to heal right now.
> 
> 
> -- 
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users


-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111


signature.asc
Description: Digital signature
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick not seen - urgent

2016-05-23 Thread Lindsay Mathieson
On 24 May 2016 at 00:31, Kevin Lemonnier  wrote:
> Status: Brick is not connected
>
> The network seems to be working fine, the 3 nodes are in cluster and can ping
> each other. gluster volume status lists the 3 bricks and I restarted the 
> daemon
> on the third node to be sure, still the same result.


I had that exact same problem recently when running tests with 3.7.11;
the only thing that resolved it was to kill *all* gluster processes
(glusterd, glusterfs, glusterfsd) on the node and restart glusterd.
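
Roughly, on the affected node (a sketch; the init script name varies by distro):

    pkill glusterd; pkill glusterfsd; pkill glusterfs   # management daemon, bricks, client/heal processes
    service glusterd start                              # or the distro's gluster init script
    gluster volume status                               # confirm the bricks are back online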

-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Brick not seen - urgent

2016-05-23 Thread Kevin Lemonnier
On Mon, May 23, 2016 at 04:06:06PM +0100, Anant wrote:
>Have you tried to stop all services of gluster ??  Like - glusterd ,
>glusterfsd
> 

/etc/init.d/gluster-server is the only thing I have. But anyway when I stop it
I can see the brick process is still up, so that's clearly the problem. I'll
try to kill it at night, for now the other two nodes are running fine and I just
can't afford to freeze all the VMs for the hour it'll take to heal right now.


-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111


signature.asc
Description: Digital signature
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick not seen - urgent

2016-05-23 Thread Anant
Have you tried stopping all of the gluster services, like glusterd and
glusterfsd?



On 23/05/16 15:51, Kevin Lemonnier wrote:

Looks like it crashed. If I do a /etc/init.d/gluster-server stop on the third
node and a ps aux | grep gluster after that, there are still a lot of processes listed.
Should I kill everything?

On Mon, May 23, 2016 at 04:31:28PM +0200, Kevin Lemonnier wrote:

Hi,

We have in production 3 nodes for a volume storing VM files on 3.7.6,
suddenly the third node isn't seen by the others. For example in the heal
info:

Status: Brick is not connected

The network seems to be working fine, the 3 nodes are in cluster and can ping
each other. gluster volume status lists the 3 bricks and I restarted the daemon
on the third node to be sure, still the same result.
The brick is stored on xfs on /mnt/vg1-storage and all the files seem to be there;
it's not read-only or anything.

Where can I check ?
Thanks


--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111




___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users




___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users



--
Anant Saraswat

System Administrator

Direct: +91 124 4548387
Tel: +91 124 4548383 Ext- 1018
UK: +44 845 0047 142 Ext- 5020

Techblue Software Pvt. Ltd
The Palms, Plot No 73, Sector 5, IMT Manesar,
Gurgaon- 122050 (Hr.)

www.techbluesoftware.co.in 



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Questions about healing

2016-05-23 Thread Alastair Neil
Yes, it's configurable with:

network.ping-timeout

and the default is 42 seconds, I believe.
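
For example (VOLNAME is a placeholder):

    gluster volume set VOLNAME network.ping-timeout 42   # 42 seconds is the default
    gluster volume info VOLNAME                          # the value then appears under "Options Reconfigured"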

On 22 May 2016 at 03:39, Kevin Lemonnier  wrote:

> > Let's assume 10,000 shards on a server being healed.
> > Gluster heals 1 shard at a time, so would the other 9,999 pieces be read
> > from the other servers
> > to keep the VM running? If yes, this is good. If not, in this case, the
> > whole VM needs to be healed
> > and thus the whole VM would hang
>
> Yes, that seems to be what's happening on 3.7.11.
> Couldn't notice any freeze during heals, except for a brief one when
> a node just went down: looks like gluster hangs for a few seconds
> while waiting for the node before deciding to mark it down and continue
> without it.
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick not seen - urgent

2016-05-23 Thread Ravishankar N



On 05/23/2016 08:21 PM, Kevin Lemonnier wrote:

Looks like it crashed. If I do a /etc/init.d/gluster-server stop on the third
node and a ps aux | grep gluster after that, there are still a lot of processes listed.
Should I kill everything?


When you create a file from the mount, does it get replicated to the 
third brick?
In any case you can kill all brick processes of a node with `pkill 
glusterfsd`. Then if you restart glusterd on the same node with `service 
glusterd restart`, it should bring back all brick processes. You might 
want to just try restarting glusterd without actually killing the bricks 
first.
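
Put together, the sequence described above looks roughly like this, run on the
affected node only:

    pkill glusterfsd            # kill only the brick processes on this node
    service glusterd restart    # glusterd respawns the brick processes
    gluster volume status       # check that the brick shows up as online again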




On Mon, May 23, 2016 at 04:31:28PM +0200, Kevin Lemonnier wrote:

Hi,

We have in production 3 nodes for a volume storing VM files on 3.7.6,
suddenly the third node isn't seen by the others. For example in the heal
info:

Status: Brick is not connected

The network seems to be working fine, the 3 nodes are in cluster and can ping
each other. gluster volume status lists the 3 bricks and I restarted the daemon
on the third node to be sure, still the same result.
The brick is stored on xfs on /mnt/vg1-storage and all the files seem to be there;
it's not read-only or anything.

Where can I check ?
Thanks


--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users




___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick not seen - urgent

2016-05-23 Thread Kevin Lemonnier
Looks like it crashed. If I do a /etc/init.d/gluster-server stop on the third
node and a ps aux | grep gluster after that, there are still a lot of processes listed.
Should I kill everything?

On Mon, May 23, 2016 at 04:31:28PM +0200, Kevin Lemonnier wrote:
> Hi,
> 
> We have in production 3 nodes for a volume storing VM files on 3.7.6,
> suddenly the third node isn't seen by the others. For example in the heal
> info:
> 
> Status: Brick is not connected
> 
> The network seems to be working fine, the 3 nodes are in cluster and can ping
> each other. gluster volume status lists the 3 bricks and I restarted the 
> daemon
> on the third node to be sure, still the same result.
> The brick is stored on xfs on /mnt/vg1-storage and all the files seem to be there;
> it's not read-only or anything.
> 
> Where can I check ?
> Thanks
> 
> 
> -- 
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users


-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111


signature.asc
Description: Digital signature
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Brick not seen - urgent

2016-05-23 Thread Kevin Lemonnier
Hi,

We have in production 3 nodes for a volume storing VM files on 3.7.6,
suddenly the third node isn't seen by the others. For example in the heal
info:

Status: Brick is not connected

The network seems to be working fine, the 3 nodes are in cluster and can ping
each other. gluster volume status lists the 3 bricks and I restarted the daemon
on the third node to be sure, still the same result.
The brick is stored on xfs on /mnt/vg1-storage and all the files seem to be there;
it's not read-only or anything.

Where can I check ?
Thanks
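
The usual places to start would be something like the following (log paths are
the 3.7 defaults and may differ on your install):

    gluster peer status                                     # are all peers connected?
    gluster volume status                                   # which bricks/ports are online?
    less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log  # glusterd log
    less /var/log/glusterfs/bricks/mnt-vg1-storage.log      # brick log, named after the brick path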


-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111


signature.asc
Description: Digital signature
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Possible error not being returned

2016-05-23 Thread Ankireddypalle Reddy
Xavier,
Please find below logs from ws-glus.log, where /ws/glus is the
gluster mount point in question.

[2016-05-18 21:13:00.477609] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-SDSStoragePool-disperse-2: Heal failed [Transport endpoint is not connected]
[2016-05-18 21:13:00.556553] E [MSGID: 114030] 
[client-rpc-fops.c:3022:client3_3_readv_cbk] 0-SDSStoragePool-client-1: XDR 
decoding failed [Invalid argument]
[2016-05-18 21:13:00.556588] W [MSGID: 114031] 
[client-rpc-fops.c:3050:client3_3_readv_cbk] 0-SDSStoragePool-client-1: remote 
operation failed [Invalid argument]
[2016-05-18 21:13:00.557754] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-0: Operation failed 
on some subvolumes (up=7, mask=7, remaining=0, good=5, bad=2)
[2016-05-18 21:13:00.566626] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-0: Operation failed 
on some subvolumes (up=7, mask=5, remaining=0, good=5, bad=2)
[2016-05-18 21:13:00.568694] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-0: Operation failed 
on some subvolumes (up=7, mask=5, remaining=0, good=5, bad=2)
[2016-05-18 21:13:00.620563] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-SDSStoragePool-disperse-0: Heal failed [Transport endpoint is not connected]
[2016-05-18 21:13:00.631860] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-SDSStoragePool-disperse-0: Heal failed [Transport endpoint is not connected]
[2016-05-18 21:13:01.576095] E [MSGID: 114030] 
[client-rpc-fops.c:3022:client3_3_readv_cbk] 0-SDSStoragePool-client-13: XDR 
decoding failed [Invalid argument]
[2016-05-18 21:13:01.576137] W [MSGID: 114031] 
[client-rpc-fops.c:3050:client3_3_readv_cbk] 0-SDSStoragePool-client-13: remote 
operation failed [Invalid argument]
[2016-05-18 21:13:01.576574] E [MSGID: 114030] 
[client-rpc-fops.c:3022:client3_3_readv_cbk] 0-SDSStoragePool-client-12: XDR 
decoding failed [Invalid argument]
[2016-05-18 21:13:01.576598] W [MSGID: 114031] 
[client-rpc-fops.c:3050:client3_3_readv_cbk] 0-SDSStoragePool-client-12: remote 
operation failed [Invalid argument]
[2016-05-18 21:13:01.576810] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-4: Operation failed 
on some subvolumes (up=7, mask=7, remaining=0, good=6, bad=1)
[2016-05-18 21:13:01.582926] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-4: Operation failed 
on some subvolumes (up=7, mask=7, remaining=0, good=5, bad=2)
[2016-05-18 21:13:01.590063] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-4: Operation failed 
on some subvolumes (up=7, mask=6, remaining=0, good=6, bad=1)
[2016-05-18 21:13:01.590798] E [MSGID: 122034] 
[ec-common.c:461:ec_child_select] 0-SDSStoragePool-disperse-4: Insufficient 
available childs for this request (have 1, need 2)
[2016-05-18 21:13:01.592769] W [MSGID: 122053] 
[ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-4: Operation failed 
on some subvolumes (up=7, mask=6, remaining=0, good=6, bad=1)
The message "W [MSGID: 122053] [ec-common.c:116:ec_check_status] 
0-SDSStoragePool-disperse-4: Operation failed on some subvolumes (up=7, mask=6, 
remaining=0, good=6, bad=1)" repeated 3 times between [2016-05-18 
21:13:01.592769] and [2016-05-18 21:13:01.598369]
[2016-05-18 21:13:01.613843] I [MSGID: 122058] [ec-heal.c:2338:ec_heal_do] 
0-SDSStoragePool-disperse-4: 
/Folder_05.11.2016_19.56/CV_MAGNETIC/V_22378/CHUNK_209062/SFILE_CONTAINER_020: 
name heal successful on 7
[2016-05-18 21:13:01.699580] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 
0-SDSStoragePool-disperse-2: Heal failed [Transport endpoint is not connected]
[2016-05-18 21:13:01.833863] I [MSGID: 122058] [ec-heal.c:2338:ec_heal_do] 
0-SDSStoragePool-disperse-4: 
/Folder_05.11.2016_19.56/CV_MAGNETIC/V_22378/CHUNK_209062/SFILE_CONTAINER_020: 
name heal successful on 7
The message "I [MSGID: 122058] [ec-heal.c:2338:ec_heal_do] 
0-SDSStoragePool-disperse-4: 
/Folder_05.11.2016_19.56/CV_MAGNETIC/V_22378/CHUNK_209062/SFILE_CONTAINER_020: 
name heal successful on 7" repeated 5 times between [2016-05-18 
21:13:01.833863] and [2016-05-18 21:13:02.098833]  

The read from file 
/ws/glus/Folder_05.11.2016_19.56/CV_MAGNETIC/V_22378/CHUNK_209062/SFILE_CONTAINER_020
 succeeded but the CRC verification failed for the data read.

Thanks and Regards,
Ram

-Original Message-
From: Xavier Hernandez [mailto:xhernan...@datalab.es] 
Sent: Monday, May 23, 2016 8:11 AM
To: Ankireddypalle Reddy; gluster-users@gluster.org
Subject: Re: [Gluster-users] Possible error not being returned

In this case you should have more warnings/errors in your log files besides the
ones related to EC. Can you post them?

EC tries to keep track of good and bad bricks; however, if multiple internal
operations are failing (especially if related to communications), maybe some
special case happens and it's unable to determine that some brick is bad
when it should.

Re: [Gluster-users] Possible error not being returned

2016-05-23 Thread Xavier Hernandez
In this case you should have more warnings/errors in your log files
besides the ones related to EC. Can you post them?
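
One way to pull just those entries, assuming the default log locations (the
client log name is derived from the mount point), would be:

    grep -E '\] [EW] \[' /var/log/glusterfs/ws-glus.log    # client/mount log
    grep -E '\] [EW] \[' /var/log/glusterfs/bricks/*.log   # brick logs, on each server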


EC tries to keep track of good and bad bricks; however, if multiple
internal operations are failing (especially if related to
communications), maybe some special case happens and it's unable to
determine that some brick is bad when it should.


If that's the case, we need to know how this happens to try to solve it.

Xavi

On 23/05/16 11:22, Ankireddypalle Reddy wrote:

Xavier,
We are using a disperse volume to store data being backed up by
Commvault Simpana software. We are performing stress testing and have noticed
that the issue happens consistently when the 10G link is completely saturated.
The read would succeed but would return incorrect data: CRC checks fail on the
returned data. The brick daemons are up and operational. To us it mostly
appears to be a communication issue. We noticed a lot of these issues when 1G
NICs were used. The error frequency went down drastically after moving the
gluster traffic to 10G.

Volume Name: SDSStoragePool
Type: Distributed-Disperse
Volume ID: c5ebb780-669f-4c31-9970-e12dae1f473c
Status: Started
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: cvltpbba1sds:/ws/disk1/ws_brick
Brick2: cvltpbba3sds:/ws/disk1/ws_brick
Brick3: cvltpbba4sds:/ws/disk1/ws_brick
Brick4: cvltpbba1sds:/ws/disk2/ws_brick
Brick5: cvltpbba3sds:/ws/disk2/ws_brick
Brick6: cvltpbba4sds:/ws/disk2/ws_brick
Brick7: cvltpbba1sds:/ws/disk3/ws_brick
Brick8: cvltpbba3sds:/ws/disk3/ws_brick
Brick9: cvltpbba4sds:/ws/disk3/ws_brick
Brick10: cvltpbba1sds:/ws/disk4/ws_brick
Brick11: cvltpbba3sds:/ws/disk4/ws_brick
Brick12: cvltpbba4sds:/ws/disk4/ws_brick
Brick13: cvltpbba1sds:/ws/disk5/ws_brick
Brick14: cvltpbba3sds:/ws/disk5/ws_brick
Brick15: cvltpbba4sds:/ws/disk5/ws_brick
Brick16: cvltpbba1sds:/ws/disk6/ws_brick
Brick17: cvltpbba3sds:/ws/disk6/ws_brick
Brick18: cvltpbba4sds:/ws/disk6/ws_brick
Brick19: cvltpbba1sds:/ws/disk7/ws_brick
Brick20: cvltpbba3sds:/ws/disk7/ws_brick
Brick21: cvltpbba4sds:/ws/disk7/ws_brick
Brick22: cvltpbba1sds:/ws/disk8/ws_brick
Brick23: cvltpbba3sds:/ws/disk8/ws_brick
Brick24: cvltpbba4sds:/ws/disk8/ws_brick
Options Reconfigured:
performance.readdir-ahead: on

Thanks and Regards,
Ram

-Original Message-
From: Xavier Hernandez [mailto:xhernan...@datalab.es]
Sent: Monday, May 23, 2016 3:22 AM
To: Ankireddypalle Reddy; gluster-users@gluster.org
Subject: Re: [Gluster-users] Possible error not being returned

It's possible that the operation that failed is an internal one made by 
disperse itself or any other translator, so this error is not reported to the 
application.

The read issued by the application will only fail if anything fails while 
processing the read itself. If everything goes well, the read will succeed and 
it should contain healthy data.

What configuration are you using? (gluster volume info) What are you doing
exactly? (workload) Why is one brick down/damaged? Are you doing tests? How
are you doing them?

Best regards,

Xavi

On 20/05/16 16:54, Ankireddypalle Reddy wrote:

Hi,

Did anyone get a chance to check this? We are intermittently
receiving corrupted data in read operations because of this.



Thanks and Regards,

Ram



*From:*gluster-users-boun...@gluster.org
[mailto:gluster-users-boun...@gluster.org] *On Behalf Of
*Ankireddypalle Reddy
*Sent:* Thursday, May 19, 2016 3:59 PM
*To:* gluster-users@gluster.org
*Subject:* [Gluster-users] Possible error not being returned



Hi,

   A disperse volume  was configured on  servers with limited
network bandwidth. Some of the read operations failed with error



[2016-05-16 18:38:36.035559] E [MSGID: 122034]
[ec-common.c:461:ec_child_select] 0-SDSStoragePool-disperse-2:
Insufficient available childs for this request (have 1, need 2)

[2016-05-16 18:38:36.035713] W [fuse-bridge.c:2213:fuse_readv_cbk]
0-glusterfs-fuse: 155121179: READ => -1 (Input/output error)



For some read operations just the following error was logged but the
I/O did not fail.

[2016-05-16 18:42:45.401570] E [MSGID: 122034]
[ec-common.c:461:ec_child_select] 0-SDSStoragePool-disperse-3:
Insufficient available childs for this request (have 1, need 2)

[2016-05-16 18:42:45.402054] W [MSGID: 122053]
[ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-3:
Operation failed on some subvolumes (up=7, mask=6, remaining=0,
good=6, bad=1)



We are receiving corrupted data in the read operation when the error
is logged but the read call did not return any error.



Thanks and Regards,

Ram









***Legal Disclaimer***

"This communication may contain confidential and privileged material
for the

sole use of the intended recipient. Any unauthorized review, use or
distribution

by others is strictly prohibited. If you have received the message by
mistake,

please advise the sender by reply email and delete the 

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-23 Thread Kevin Lemonnier
Hi,

I didn't specify it, but I use "localhost" to add the storage in Proxmox.
My thinking is that every Proxmox node is also a GlusterFS node, so that
should work fine.

I don't want to use the "normal" way of setting a regular address in there,
because you can't change it afterwards in Proxmox, but could that be the source of
the problem? Maybe during live migration there are writes coming from
two different servers at the same time?



On Wed, May 18, 2016 at 07:11:08PM +0530, Krutika Dhananjay wrote:
>Hi,
> 
>I will try to recreate this issue tomorrow on my machines with the steps
>that Lindsay provided in this thread. I will let you know the result soon
>after that.
> 
>-Krutika
> 
>On Wednesday, May 18, 2016, Kevin Lemonnier  wrote:
>> Hi,
>>
>> Some news on this.
>> Over the weekend the RAID card of the node ipvr2 died, and I thought
>> that maybe that was the problem all along. The RAID card was changed
>> and yesterday I reinstalled everything.
>> Same problem just now.
>>
>> My test is simple: while using the website hosted on the VMs the whole time,
>> I reboot ipvr50, wait for the heal to complete, migrate all the VMs off
>> ipvr2 then reboot it, wait for the heal to complete then migrate all
>> the VMs off ipvr3 then reboot it.
>> Everytime the first database VM (which is the only one really using the
>disk
>> durign the heal) starts showing I/O errors on it's disk.
>>
>> Am I really the only one with that problem ?
>> Maybe one of the drives is dying too, who knows, but SMART isn't saying
>> anything...
>>
>>
>> On Thu, May 12, 2016 at 04:03:02PM +0200, Kevin Lemonnier wrote:
>>> Hi,
>>>
>>> I had a problem some time ago with 3.7.6 and freezing during heals,
>>> and multiple people advised to use 3.7.11 instead. Indeed, with that
>>> version the freeze problem is fixed, it works like a dream! You can
>>> almost not tell that a node is down or healing, everything keeps working
>>> except for a little freeze when the node just went down and I assume
>>> hasn't timed out yet, but that's fine.
>>>
>>> Now I have a 3.7.11 volume on 3 nodes for testing, and the VMs are Proxmox
>>> VMs with qcow2 disks stored on the gluster volume.
>>> Here is the config :
>>>
>>> Volume Name: gluster
>>> Type: Replicate
>>> Volume ID: e4f01509-beaf-447d-821f-957cc5c20c0a
>>> Status: Started
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ipvr2.client:/mnt/storage/gluster
>>> Brick2: ipvr3.client:/mnt/storage/gluster
>>> Brick3: ipvr50.client:/mnt/storage/gluster
>>> Options Reconfigured:
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> features.shard: on
>>> features.shard-block-size: 64MB
>>> cluster.data-self-heal-algorithm: full
>>> performance.readdir-ahead: on
>>>
>>>
>>> As mentioned, I rebooted one of the nodes to test the freezing issue I had
>>> on previous versions and apart from the initial timeout, nothing: the website
>>> hosted on the VMs keeps working like a charm even during heal.
>>> Since it's testing, there isn't any load on it though, and I just tried to refresh
>>> the database by importing the production one on the two MySQL VMs, and both of them
>>> started getting I/O errors. I tried shutting them down and powering them on again,
>>> but same thing, even starting full heals by hand doesn't solve the problem, the disks are
>>> corrupted. They still work, but sometimes they remount their partitions read-only.
>>>
>>> I believe there are a few people already using 3.7.11; has no one noticed corruption problems?
>>> Anyone using Proxmox? As already mentioned in multiple other threads on this mailing list
>>> by other users, I also pretty much always have shards in heal info, but nothing "stuck" there,
>>> they always go away in a few seconds, getting replaced by other shards.
>>>
>>> Thanks
>>>
>>> --
>>> Kevin Lemonnier
>>> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>>
>>
>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>> --
>> Kevin Lemonnier
>> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>>

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111


signature.asc
Description: Digital signature
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Possible error not being returned

2016-05-23 Thread Ankireddypalle Reddy
Xavier,
We are using a disperse volume to store data being backed up by
Commvault Simpana software. We are performing stress testing and have noticed
that the issue happens consistently when the 10G link is completely saturated.
The read would succeed but would return incorrect data: CRC checks fail on the
returned data. The brick daemons are up and operational. To us it mostly
appears to be a communication issue. We noticed a lot of these issues when 1G
NICs were used. The error frequency went down drastically after moving the
gluster traffic to 10G.

Volume Name: SDSStoragePool
Type: Distributed-Disperse
Volume ID: c5ebb780-669f-4c31-9970-e12dae1f473c
Status: Started
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: cvltpbba1sds:/ws/disk1/ws_brick
Brick2: cvltpbba3sds:/ws/disk1/ws_brick
Brick3: cvltpbba4sds:/ws/disk1/ws_brick
Brick4: cvltpbba1sds:/ws/disk2/ws_brick
Brick5: cvltpbba3sds:/ws/disk2/ws_brick
Brick6: cvltpbba4sds:/ws/disk2/ws_brick
Brick7: cvltpbba1sds:/ws/disk3/ws_brick
Brick8: cvltpbba3sds:/ws/disk3/ws_brick
Brick9: cvltpbba4sds:/ws/disk3/ws_brick
Brick10: cvltpbba1sds:/ws/disk4/ws_brick
Brick11: cvltpbba3sds:/ws/disk4/ws_brick
Brick12: cvltpbba4sds:/ws/disk4/ws_brick
Brick13: cvltpbba1sds:/ws/disk5/ws_brick
Brick14: cvltpbba3sds:/ws/disk5/ws_brick
Brick15: cvltpbba4sds:/ws/disk5/ws_brick
Brick16: cvltpbba1sds:/ws/disk6/ws_brick
Brick17: cvltpbba3sds:/ws/disk6/ws_brick
Brick18: cvltpbba4sds:/ws/disk6/ws_brick
Brick19: cvltpbba1sds:/ws/disk7/ws_brick
Brick20: cvltpbba3sds:/ws/disk7/ws_brick
Brick21: cvltpbba4sds:/ws/disk7/ws_brick
Brick22: cvltpbba1sds:/ws/disk8/ws_brick
Brick23: cvltpbba3sds:/ws/disk8/ws_brick
Brick24: cvltpbba4sds:/ws/disk8/ws_brick
Options Reconfigured:
performance.readdir-ahead: on

Thanks and Regards,
Ram

-Original Message-
From: Xavier Hernandez [mailto:xhernan...@datalab.es] 
Sent: Monday, May 23, 2016 3:22 AM
To: Ankireddypalle Reddy; gluster-users@gluster.org
Subject: Re: [Gluster-users] Possible error not being returned

It's possible that the operation that failed is an internal one made by 
disperse itself or any other translator, so this error is not reported to the 
application.

The read issued by the application will only fail if anything fails while 
processing the read itself. If everything goes well, the read will succeed and 
it should contain healthy data.

What configuration are you using? (gluster volume info) What are you doing
exactly? (workload) Why is one brick down/damaged? Are you doing tests? How
are you doing them?

Best regards,

Xavi

On 20/05/16 16:54, Ankireddypalle Reddy wrote:
> Hi,
>
> Did anyone get a chance to check this? We are intermittently
> receiving corrupted data in read operations because of this.
>
>
>
> Thanks and Regards,
>
> Ram
>
>
>
> *From:*gluster-users-boun...@gluster.org
> [mailto:gluster-users-boun...@gluster.org] *On Behalf Of 
> *Ankireddypalle Reddy
> *Sent:* Thursday, May 19, 2016 3:59 PM
> *To:* gluster-users@gluster.org
> *Subject:* [Gluster-users] Possible error not being returned
>
>
>
> Hi,
>
>A disperse volume  was configured on  servers with limited 
> network bandwidth. Some of the read operations failed with error
>
>
>
> [2016-05-16 18:38:36.035559] E [MSGID: 122034] 
> [ec-common.c:461:ec_child_select] 0-SDSStoragePool-disperse-2:
> Insufficient available childs for this request (have 1, need 2)
>
> [2016-05-16 18:38:36.035713] W [fuse-bridge.c:2213:fuse_readv_cbk]
> 0-glusterfs-fuse: 155121179: READ => -1 (Input/output error)
>
>
>
> For some read operations just the following error was logged but the 
> I/O did not fail.
>
> [2016-05-16 18:42:45.401570] E [MSGID: 122034] 
> [ec-common.c:461:ec_child_select] 0-SDSStoragePool-disperse-3:
> Insufficient available childs for this request (have 1, need 2)
>
> [2016-05-16 18:42:45.402054] W [MSGID: 122053] 
> [ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-3: 
> Operation failed on some subvolumes (up=7, mask=6, remaining=0, 
> good=6, bad=1)
>
>
>
> We are receiving corrupted data in the read operation when the error 
> is logged but the read call did not return any error.
>
>
>
> Thanks and Regards,
>
> Ram
>
>
>
>
>
>
>
>
>
> ***Legal Disclaimer***
>
> "This communication may contain confidential and privileged material 
> for the
>
> sole use of the intended recipient. Any unauthorized review, use or 
> distribution
>
> by others is strictly prohibited. If you have received the message by 
> mistake,
>
> please advise the sender by reply email and delete the message. Thank you."
>
> **
>
>
> ***Legal Disclaimer***
> "This communication may contain confidential and privileged material 
> for the sole use of the intended recipient. Any unauthorized review, 
> use or distribution by others is strictly