Re: [Gluster-users] Geo-replication configuration issue

2016-07-18 Thread Aravinda

Hi,

Looks like the Master pem keys were not copied to the Slave nodes properly.
Please clean up /root/.ssh/authorized_keys on the Slave nodes and run the
Geo-rep create command again with force:


gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> create push-pem force


Do you observe any errors related to hook scripts in the glusterd log file?
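If the session still goes faulty after that, a quick manual check from the
master can confirm whether the pem key itself is accepted by the slave (a
sketch; host "ks4" and the key path are taken from the Popen error in the log
quoted below, adjust them for your setup):

# Run on the master node that logs the "Permission denied" error.
# A successful run prints "pem key accepted". The "key_load_public: invalid
# format" warning usually means the secret.pem.pub next to the key is
# malformed, while "Permission denied" means the slave rejected the key.
ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no \
    -i /var/lib/glusterd/geo-replication/secret.pem \
    root@ks4 'echo pem key accepted'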

regards
Aravinda

On 07/18/2016 10:11 PM, Alexandre Besnard wrote:

Hello

On a fresh Gluster 3.8 install, I am not able to configure a geo-replicated
volume. Everything works fine up to starting the volume; however, Gluster
reports a faulty status.

When looking at the logs (gluster_error):

[2016-07-18 16:30:04.371686] I [cli.c:730:main] 0-cli: Started running gluster 
with version 3.8.0
[2016-07-18 16:30:04.435854] I [MSGID: 101190] 
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 1
[2016-07-18 16:30:04.435921] I [socket.c:2468:socket_event_handler] 
0-transport: disconnecting now
[2016-07-18 16:30:04.997986] I [input.c:31:cli_batch] 0-: Exiting with: 0



From the geo-replication logs, it seems I have an SSH configuration issue:

[2016-07-18 16:35:28.293524] I [monitor(monitor):266:monitor] Monitor:

[2016-07-18 16:35:28.293740] I [monitor(monitor):267:monitor] Monitor: starting 
gsyncd worker
[2016-07-18 16:35:28.352266] I [gsyncd(/gluster/backupvol):710:main_i] : 
syncing: gluster://localhost:backupvol -> ssh://root@ks4:gluster://localhost:backupvol
[2016-07-18 16:35:28.352489] I [changelogagent(agent):73:__init__] 
ChangelogAgent: Agent listining...
[2016-07-18 16:35:28.492474] E 
[syncdutils(/gluster/backupvol):252:log_raise_exception] : connection to 
peer is broken
[2016-07-18 16:35:28.492706] E [resource(/gluster/backupvol):226:errlog] Popen: command 
"ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-Fs2XND/b63292d563144e7818235d683516731d.sock root@ks4 
/nonexistent/gsyncd --session-owner 3281242a-ab45-4a0d-99e5-2965b4ac5840 -N --listen 
--timeout 120 gluster://localhost:backupvol" returned with 255, saying:
[2016-07-18 16:35:28.492794] E [resource(/gluster/backupvol):230:logerr] Popen: 
ssh> key_load_public: invalid format
[2016-07-18 16:35:28.492863] E [resource(/gluster/backupvol):230:logerr] Popen: 
ssh> Permission denied (publickey,password).
[2016-07-18 16:35:28.493004] I [syncdutils(/gluster/backupvol):220:finalize] 
: exiting.
[2016-07-18 16:35:28.494045] I [repce(agent):92:service_loop] RepceServer: 
terminating on reaching EOF.
[2016-07-18 16:35:28.494204] I [syncdutils(agent):220:finalize] : exiting.
[2016-07-18 16:35:28.494143] I [monitor(monitor):333:monitor] Monitor: 
worker(/gluster/backupvol) died before establishing connection


I tried to fix the SSH setup to the best of my knowledge, but I am still missing something.

Can you help me to fix it?

Thanks,
Alex
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-Maintainers] Gluster Events API - Help required to identify the list of Events from each component

2016-07-18 Thread Atin Mukherjee
So the framework is now in the mainline branch [1]. As a next step, I'd request
all of you to start thinking about the important events that need to be
captured, and to send feedback.

[1] http://review.gluster.org/14248

~Atin

On Thu, Jul 14, 2016 at 1:45 PM, Aravinda  wrote:

> +gluster-users
>
> regards
> Aravinda
>
> On 07/13/2016 09:03 PM, Vijay Bellur wrote:
>
>> On 07/13/2016 10:23 AM, Aravinda wrote:
>>
>>> Hi,
>>>
>>> We are working on Eventing feature for Gluster, Sent feature patch for
>>> the same.
>>> Design: http://review.gluster.org/13115
>>> Patch:  http://review.gluster.org/14248
>>> Demo: http://aravindavk.in/blog/10-mins-intro-to-gluster-eventing
>>>
>>> Following document lists the events(mostly user driven events are
>>> covered in the doc). Please let us know the Events from your components
>>> to be supported by the Eventing Framework.
>>>
>>>
>>> https://docs.google.com/document/d/1oMOLxCbtryypdN8BRdBx30Ykquj4E31JsaJNeyGJCNo/edit?usp=sharing
>>>
>>>
>>>
>> Thanks for putting this together, Aravinda! It might be worth polling the
>> -users ML as well about events of interest.
>>
>> -Vijay
>>
>
> ___
> maintainers mailing list
> maintain...@gluster.org
> http://www.gluster.org/mailman/listinfo/maintainers
>



-- 

--Atin
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New cluster - first experience

2016-07-18 Thread Pranith Kumar Karampuri
On Mon, Jul 18, 2016 at 11:58 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> On 18/07/2016 20:13, Alastair Neil wrote:
>
>> It does not seem to me that this is a gluster issue.  I just quickly
>> reviewed the thread and you said that you saw 60 MB/s with plain nfs to the
>> bricks and with gluster and no sharding you got 59 MB/s
>>
>
> That's true, but I have to use sharding, which kills my transfer rate.
>

What is the shard size you are looking to set?
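For reference, here is roughly how sharding is enabled and sized per volume (a
sketch; the volume name and the 64MB value are placeholders, and 'gluster
volume get' needs a reasonably recent release):

# Enable sharding and pick a shard block size on a volume (placeholders):
gluster volume set <VOLNAME> features.shard on
gluster volume set <VOLNAME> features.shard-block-size 64MB
# Inspect the currently effective value:
gluster volume get <VOLNAME> features.shard-block-size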


> Additionally, I would like to optimize the current network as much as I can,
> and I'm looking for suggestions from gluster users, as this network is totally
> dedicated to a gluster cluster.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>



-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Block storage

2016-07-18 Thread Gandalf Corvotempesta
Is the block storage xlator stable and usable in production?
Are there any docs about this?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New cluster - first experience

2016-07-18 Thread Gandalf Corvotempesta

On 18/07/2016 20:13, Alastair Neil wrote:
It does not seem to me that this is a gluster issue.  I just quickly  
reviewed the thread and you said that you saw 60 MB/s with plain nfs 
to the bricks and with gluster and no sharding you got 59 MB/s


That's true, but I have to use sharding, which kills my transfer rate.
Additionally, I would like to optimize the current network as much as I can,
and I'm looking for suggestions from gluster users, as this network is totally
dedicated to a gluster cluster.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] gluster NFS/rpcbind conflict

2016-07-18 Thread Yannick Perret

Hello,
just a note to give feedback on a known problem:
I have 2 replica servers and, for various reasons, I use NFS mounts on one of
my clients (it is an old machine that has trouble with the glusterfs native
client).
I managed to perform NFS mounts from one of the servers but failed on
the other.


I was happy to find a thread about this problem: rpcbind is started by default
with the "-w" option, which leads rpcbind to re-use the NFS server ports even
if no NFS server is running any more (but one did run on this machine).
Removing the "-w" option and restarting rpcbind fixes it.


This mail is only to suggest adding this to the glusterfs documentation pages,
as it seems other people have hit this problem.
More generally, why not add a "troubleshooting" section to the documentation?
I went through the official documentation and only found the solution by
reading bug-report threads. It seems that this problem still exists on
(at least) recent Debians - which I'm using - so documenting it may save
other users some time.


Another suggestion: indicate in the docs that it may save disk space to switch
volumes to the WARNING log level (for clients). INFO is far too verbose for
production (at least on 3.6.x) and should only be used when first getting
started with glusterfs.
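For example (a sketch; the volume name is a placeholder):

# Reduce client-side log verbosity for a volume:
gluster volume set <VOLNAME> diagnostics.client-log-level WARNING
# Brick-side equivalent, if the brick logs are also too chatty:
gluster volume set <VOLNAME> diagnostics.brick-log-level WARNING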



Note: these are just improvement suggestions that may save other people some
time. glusterfs works very well for our needs and we are happy to use it :)


Best regards,
--
Y.




___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New cluster - first experience

2016-07-18 Thread Alastair Neil
It does not seem to me that this is a gluster issue.  I just quickly
reviewed the thread and you said that you saw 60 MB/s with plain nfs to the
bricks and with gluster and no sharding you got 59 MB/s

> With plain NFS (no gluster involved) I'm getting almost the same
> speed: about 60MB/s
>
> Without sharding:
> # echo 3 > /proc/sys/vm/drop_caches; dd if=/dev/zero of=test bs=1M
> count=1000 conv=fsync
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 17.759 s, 59.0 MB/s




On 18 July 2016 at 06:35, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2016-07-16 15:07 GMT+02:00 Gandalf Corvotempesta
> :
> > 2016-07-16 15:04 GMT+02:00 Gandalf Corvotempesta
> > :
> >> [ ID] Interval   Transfer Bandwidth
> >> [  3]  0.0-10.0 sec  2.31 GBytes  1.98 Gbits/sec
> >
> > Obviously i did the same test with all gluster server. Speed is always
> > near 2gbit, so, the network is not an issue here.
>
> Any help? I would like to start real-test with virtual machines and
> proxmox before the August holiday.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Determine or force healing occurring

2016-07-18 Thread Pranith Kumar Karampuri
On Sat, Jul 16, 2016 at 9:53 PM, Jesper Led Lauridsen TS Infra server <
j...@dr.dk> wrote:

> On 07/16/2016 04:10 AM, Pranith Kumar Karampuri wrote:
>
>
>
> On Fri, Jul 15, 2016 at 5:20 PM, Jesper Led Lauridsen TS Infra server <
> j...@dr.dk> wrote:
>
>> Hi,
>>
>> How do I determine (in which log, etc.) whether a healing is in progress or
>> has started, and if it has not started, how do I force it?
>>
>> Additional info: I have a problem with one volume. If I execute
>> 'gluster volume heal <VOLNAME> info' the command just hangs, but if I
>> execute 'gluster volume heal <VOLNAME> info split-brain' it returns that no
>> files are in split-brain. Yet there are, and I have successfully recovered
>> another one.
>>
>
> If the command hangs there is a chance that operations on the file may
> have led to stale locks. Could you give the output of a statedump?
> You can follow
> https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/ to
> generate the files.
>
>
> Thanks for your response. You are right, there was a stale lock. But I am
> sorry, I rebooted all my cluster nodes, so I guess (without knowing) that
> there is no longer any point in giving you the output of a statedump?
>
> What I can confirm and report is:
>   * All the servers failed to reboot, so I had to push the button. They all
> failed with the message
>  "Unmounting pipe file system: Cannot create link /etc/mtab~
>  Perhaps there is a stale lock file?"
>   * After 2 nodes had rebooted, the command executed without any problem
> and reported a couple of split-brain entries (both directories and files)
>   * Running strace on the command showed that it was just looping, so
> basically the command wasn't hanging - it just couldn't finish.
>   * I am using "glusterfs-3.6.2-1.el6.x86_64", but I am hoping to upgrade to
> 3.6.9 this weekend.
>   * The file I referred to here now has the same getfattr output on both
> replicas. The trusted.afr.glu_rhevtst_dr2_data_01-client-[0,1] and
> trusted.afr.dirty attributes are now all zero
>

If you are looking to upgrade anyway, why not upgrade to 3.7.13, which is
the latest stable version?
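For the archive, the statedump mentioned above can be generated like this (a
sketch; the volume name is a placeholder):

# Dump the state of all brick processes of a volume; the files land in
# /var/run/gluster by default (see the server.statedump-path option):
gluster volume statedump <VOLNAME>
# For a fuse client, send SIGUSR1 to its glusterfs process instead:
kill -USR1 <pid-of-glusterfs-client-process>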


>
>
>
>
>> I just have a problem with this one: I can't determine whether a healing
>> process is running or not.
>>
>> I have changed 'trusted.afr.glu_rhevtst_dr2_data_01-client-1' to
>> 0x on the file located on glustertst03 and executed
>> 'ls -lrt' on the file on the gluster mount.
>>
>> [root@glustertst04 ]# getfattr -d -m . -e hex
>> /bricks/brick1/glu_rhevtst_dr2_data_01/6bdc67d1-4ae5-47e3-86c3-ef0916996862/images/7669ca25-028e-40a5-9dc8-06c716101709/a1ae3612-bb89-45d8-8041-134c34592eab
>> getfattr: Removing leading '/' from absolute path names
>> # file:
>> bricks/brick1/glu_rhevtst_dr2_data_01/6bdc67d1-4ae5-47e3-86c3-ef0916996862/images/7669ca25-028e-40a5-9dc8-06c716101709/a1ae3612-bb89-45d8-8041-134c34592eab
>>
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.dirty=0x
>> trusted.afr.glu_rhevtst_dr2_data_01-client-0=0x4c70
>> trusted.afr.glu_rhevtst_dr2_data_01-client-1=0x
>> trusted.gfid=0x7575f870875b4c899fd81ef16be3b1a1
>>
>> trusted.glusterfs.quota.70145d52-bb80-42ce-b437-64be6ee4a7d4.contri=0x0001606dc000
>> trusted.pgfid.70145d52-bb80-42ce-b437-64be6ee4a7d4=0x0001
>>
>>  [root@glustertst03 ]# getfattr -d -m . -e hex
>> /bricks/brick1/glu_rhevtst_dr2_data_01/6bdc67d1-4ae5-47e3-86c3-ef0916996862/images/7669ca25-028e-40a5-9dc8-06c716101709/a1ae3612-bb89-45d8-8041-134c34592eab
>> getfattr: Removing leading '/' from absolute path names
>> # file:
>> bricks/brick1/glu_rhevtst_dr2_data_01/6bdc67d1-4ae5-47e3-86c3-ef0916996862/images/7669ca25-028e-40a5-9dc8-06c716101709/a1ae3612-bb89-45d8-8041-134c34592eab
>>
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
>> trusted.afr.dirty=0x0027
>> trusted.afr.glu_rhevtst_dr2_data_01-client-0=0x
>> trusted.afr.glu_rhevtst_dr2_data_01-client-1=0x
>> trusted.gfid=0x7575f870875b4c899fd81ef16be3b1a1
>>
>> trusted.glusterfs.quota.70145d52-bb80-42ce-b437-64be6ee4a7d4.contri=0x000160662000
>> trusted.pgfid.70145d52-bb80-42ce-b437-64be6ee4a7d4=0x0001
>>
>> [root@glustertst04 ]# stat
>> /var/run/gluster/glu_rhevtst_dr2_data_01/6bdc67d1-4ae5-47e3-86c3-ef0916996862/images/7669ca25-028e-40a5-9dc8-06c716101709/a1ae3612-bb89-45d8-8041-134c34592eab
>>   File:
>> `/var/run/gluster/glu_rhevtst_dr2_data_01/6bdc67d1-4ae5-47e3-86c3-ef0916996862/images/7669ca25-028e-40a5-9dc8-06c716101709/a1ae3612-bb89-45d8-8041-134c34592eab'
>>   Size: 21474836480 Blocks: 11548384   IO Block: 131072 regular file
>> Device: 31h/49d Inode: 11517990069246079393  Links: 1
>> Access: (0660/-rw-rw)  Uid: (   36/vdsm)   Gid: (   36/ kvm)
>> Access: 2016-07-15 13:33:47.860224289 +0200
>> Modify: 2016-07-15 13:34:44.396125458 +0200
>> Change: 2016-07-15 13:34:44.397125492 +0200

[Gluster-users] Geo-replication configuration issue

2016-07-18 Thread Alexandre Besnard
Hello

On a fresh Gluster 3.8 install, I am not able to configure a geo-replicated
volume. Everything works fine up to starting the volume; however, Gluster
reports a faulty status.

When looking at the logs (gluster_error):

[2016-07-18 16:30:04.371686] I [cli.c:730:main] 0-cli: Started running gluster 
with version 3.8.0
[2016-07-18 16:30:04.435854] I [MSGID: 101190] 
[event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 1
[2016-07-18 16:30:04.435921] I [socket.c:2468:socket_event_handler] 
0-transport: disconnecting now
[2016-07-18 16:30:04.997986] I [input.c:31:cli_batch] 0-: Exiting with: 0



From the geo-replication logs, it seems I have an SSH configuration issue:

[2016-07-18 16:35:28.293524] I [monitor(monitor):266:monitor] Monitor:

[2016-07-18 16:35:28.293740] I [monitor(monitor):267:monitor] Monitor: starting 
gsyncd worker
[2016-07-18 16:35:28.352266] I [gsyncd(/gluster/backupvol):710:main_i] : 
syncing: gluster://localhost:backupvol -> 
ssh://root@ks4:gluster://localhost:backupvol
[2016-07-18 16:35:28.352489] I [changelogagent(agent):73:__init__] 
ChangelogAgent: Agent listining...
[2016-07-18 16:35:28.492474] E 
[syncdutils(/gluster/backupvol):252:log_raise_exception] : connection to 
peer is broken
[2016-07-18 16:35:28.492706] E [resource(/gluster/backupvol):226:errlog] Popen: 
command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i 
/var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S 
/tmp/gsyncd-aux-ssh-Fs2XND/b63292d563144e7818235d683516731d.sock root@ks4 
/nonexistent/gsyncd --session-owner 3281242a-ab45-4a0d-99e5-2965b4ac5840 -N 
--listen --timeout 120 gluster://localhost:backupvol" returned with 255, saying:
[2016-07-18 16:35:28.492794] E [resource(/gluster/backupvol):230:logerr] Popen: 
ssh> key_load_public: invalid format
[2016-07-18 16:35:28.492863] E [resource(/gluster/backupvol):230:logerr] Popen: 
ssh> Permission denied (publickey,password).
[2016-07-18 16:35:28.493004] I [syncdutils(/gluster/backupvol):220:finalize] 
: exiting.
[2016-07-18 16:35:28.494045] I [repce(agent):92:service_loop] RepceServer: 
terminating on reaching EOF.
[2016-07-18 16:35:28.494204] I [syncdutils(agent):220:finalize] : exiting.
[2016-07-18 16:35:28.494143] I [monitor(monitor):333:monitor] Monitor: 
worker(/gluster/backupvol) died before establishing connection


I tried to fix the SSH setup to the best of my knowledge, but I am still missing something.

Can you help me to fix it?

Thanks,
Alex
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Shard storage suggestions

2016-07-18 Thread Krutika Dhananjay
On Mon, Jul 18, 2016 at 3:55 PM, Krutika Dhananjay 
wrote:

> Hi,
>
> The suggestion you gave was in fact considered at the time of writing
> shard translator.
> Here are some of the considerations for sticking with a single directory
> as opposed to a two-tier classification of shards based on the initial
> chars of the uuid string:
> i) Even for a 4TB disk with the smallest possible shard size of 4MB, there
> will only be a max of 1048576 entries
>  under /.shard in the worst case - a number far less than the max number
> of inodes that are supported by most backend file systems.
>
> ii) Entry self-heal for a single directory, even with the simplest case of
> 1 entry deleted/created while a replica is down, requires crawling the whole
> sub-directory tree, figuring out which entry is present/absent between src and
> sink and then healing it to the sink. With granular entry self-heal [1], we
> no longer have to live under this limitation.
>
> iii) Resolving shards from the original file name as given by the
> application to the corresponding shard within a single directory (/.shard
> in the existing case) would mean, looking up the parent dir /.shard first
> followed by lookup on the actual shard that is to be operated on. But
> having a two-tier sub-directory structure means that we not only have to
> resolve (or look-up) /.shard first, but also the directories '/.shard/d2',
> '/.shard/d2/18', and '/.shard/d2/18/d218cd1c-4bd9-40d7-9810-86b3f7932509'
> before finally looking up the shard, which is a lot of network operations.
> Yes, these are all one-time operations and the results can be cached in the
> inode table, but still on account of having to have dynamic gfids (as
> opposed to just /.shard, which has a fixed gfid -
> be318638-e8a0-4c6d-977d-7a937aa84806), it is trivial to resolve the name of
> the shard to gfid, or the parent name to parent gfid _even_ in memory.
>

s/trivial/non-trivial/ in the last sentence above.


Oh and [1] -
https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.8/granular-entry-self-healing.md

-Krutika


>
>
> Are you unhappy with the performance? What's your typical VM image size,
> shard block size and the capacity of individual bricks?
>
> -Krutika
>
> On Mon, Jul 18, 2016 at 2:43 PM, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>
>> 2016-07-18 9:53 GMT+02:00 Oleksandr Natalenko :
>> > I'd say, like this:
>> >
>> > /.shard/d2/18/D218CD1C-4BD9-40D7-9810-86B3F7932509.1
>>
>> Yes, something like this.
>> I was on mobile when I wrote. Your suggestion is better than mine.
>>
>> Probably, using a directory for the whole shard is also better and
>> keeps the directory structure clear:
>>
>>
>>  
>> /.shard/d2/18/D218CD1C-4BD9-40D7-9810-86B3F7932509/D218CD1C-4BD9-40D7-9810-86B3F7932509.1
>>
>> The current shard directory structure doesn't scale at all.
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New cluster - first experience

2016-07-18 Thread Gandalf Corvotempesta
2016-07-16 15:07 GMT+02:00 Gandalf Corvotempesta
:
> 2016-07-16 15:04 GMT+02:00 Gandalf Corvotempesta
> :
>> [ ID] Interval   Transfer Bandwidth
>> [  3]  0.0-10.0 sec  2.31 GBytes  1.98 Gbits/sec
>
> Obviously i did the same test with all gluster server. Speed is always
> near 2gbit, so, the network is not an issue here.

Any help? I would like to start real-test with virtual machines and
proxmox before the August holiday.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Shard storage suggestions

2016-07-18 Thread Gandalf Corvotempesta
2016-07-18 12:25 GMT+02:00 Krutika Dhananjay :
> Hi,
>
> The suggestion you gave was in fact considered at the time of writing shard
> translator.
> Here are some of the considerations for sticking with a single directory as
> opposed to a two-tier classification of shards based on the initial chars of
> the uuid string:
> i) Even for a 4TB disk with the smallest possible shard size of 4MB, there
> will only be a max of 1048576 entries
>  under /.shard in the worst case - a number far less than the max number of
> inodes that are supported by most backend file systems.

That is with just a single file. What about thousands of huge sharded files?
In a petabyte-scale cluster, having thousands of huge files should be
considered normal.

> iii) Resolving shards from the original file name as given by the
> application to the corresponding shard within a single directory (/.shard in
> the existing case) would mean, looking up the parent dir /.shard first
> followed by lookup on the actual shard that is to be operated on. But having
> a two-tier sub-directory structure means that we not only have to resolve
> (or look-up) /.shard first, but also the directories '/.shard/d2',
> '/.shard/d2/18', and '/.shard/d2/18/d218cd1c-4bd9-40d7-9810-86b3f7932509'
> before finally looking up the shard, which is a lot of network operations.
> Yes, these are all one-time operations and the results can be cached in the
> inode table, but still on account of having to have dynamic gfids (as
> opposed to just /.shard, which has a fixed gfid -
> be318638-e8a0-4c6d-977d-7a937aa84806), it is trivial to resolve the name of
> the shard to gfid, or the parent name to parent gfid _even_ in memory.

What about just a single level?
/.shard/d218cd1c-4bd9-40d7-9810-86b3f7932509/d218cd1c-4bd9-40d7-9810-86b3f7932509.1
?

You have the GFID, thus there is no need to crawl multiple levels,
just direct access to the proper path.

With this solution, you have 1,048,576 entries for a 4TB sharded file
with a 4MB shard size.
With the current implementation, you have 1,048,576 for each sharded
file. If I have 100 4TB files, I'll end up
with 1,048,576 * 100 = 104,857,600 files in a single directory.
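(Rough arithmetic behind those numbers, assuming 4 TiB files and a 4 MiB shard
block size:)

# shards per file = file size / shard block size
echo $(( (4 * 1024 * 1024 * 1024 * 1024) / (4 * 1024 * 1024) ))   # 1048576 shards per file
echo $(( 100 * 1048576 ))                                         # 104857600 entries for 100 files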

> Are you unhappy with the performance? What's your typical VM image size,
> shard block size and the capacity of individual bricks?

No, I'm just thinking about this optimization.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Shard storage suggestions

2016-07-18 Thread Krutika Dhananjay
Hi,

The suggestion you gave was in fact considered at the time of writing shard
translator.
Here are some of the considerations for sticking with a single directory as
opposed to a two-tier classification of shards based on the initial chars
of the uuid string:
i) Even for a 4TB disk with the smallest possible shard size of 4MB, there
will only be a max of 1048576 entries
 under /.shard in the worst case - a number far less than the max number of
inodes that are supported by most backend file systems.

ii) Entry self-heal for a single directory, even with the simplest case of 1
entry deleted/created while a replica is down, requires crawling the whole
sub-directory tree, figuring out which entry is present/absent between src and
sink and then healing it to the sink. With granular entry self-heal [1], we
no longer have to live under this limitation.

iii) Resolving shards from the original file name as given by the
application to the corresponding shard within a single directory (/.shard
in the existing case) would mean, looking up the parent dir /.shard first
followed by lookup on the actual shard that is to be operated on. But
having a two-tier sub-directory structure means that we not only have to
resolve (or look-up) /.shard first, but also the directories '/.shard/d2',
'/.shard/d2/18', and '/.shard/d2/18/d218cd1c-4bd9-40d7-9810-86b3f7932509'
before finally looking up the shard, which is a lot of network operations.
Yes, these are all one-time operations and the results can be cached in the
inode table, but still on account of having to have dynamic gfids (as
opposed to just /.shard, which has a fixed gfid -
be318638-e8a0-4c6d-977d-7a937aa84806), it is trivial to resolve the name of
the shard to gfid, or the parent name to parent gfid _even_ in memory.


Are you unhappy with the performance? What's your typical VM image size,
shard block size and the capacity of individual bricks?

-Krutika

On Mon, Jul 18, 2016 at 2:43 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2016-07-18 9:53 GMT+02:00 Oleksandr Natalenko :
> > I'd say, like this:
> >
> > /.shard/d2/18/D218CD1C-4BD9-40D7-9810-86B3F7932509.1
>
> Yes, something like this.
> I was on mobile when I wrote. Your suggestion is better than mine.
>
> Probably, using a directory for the whole shard is also better and
> keeps the directory structure clear:
>
>
>  
> /.shard/d2/18/D218CD1C-4BD9-40D7-9810-86B3F7932509/D218CD1C-4BD9-40D7-9810-86B3F7932509.1
>
> The current shard directory structure doesn't scale at all.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] lingering <gfid:*> entries in volume heal, gluster 3.6.3

2016-07-18 Thread Kingsley
On Fri, 2016-07-15 at 22:24 +0530, Ravishankar N wrote:
> On 07/15/2016 09:55 PM, Kingsley wrote:
> > This has revealed something. I'm now seeing lots of lines like this in
> > the shd log:
> >
> > [2016-07-15 16:20:51.098152] D [afr-self-heald.c:516:afr_shd_index_sweep] 
> > 0-callrec-replicate-0: got entry: eaa43674-b1a3-4833-a946-de7b7121bb88
> > [2016-07-15 16:20:51.099346] D 
> > [client-rpc-fops.c:1523:client3_3_inodelk_cbk] 0-callrec-client-2: remote 
> > operation failed: Stale file handle
> > [2016-07-15 16:20:51.100683] D 
> > [client-rpc-fops.c:2686:client3_3_opendir_cbk] 0-callrec-client-2: remote 
> > operation failed: Stale file handle. Path: 
> >  
> > (eaa43674-b1a3-4833-a946-de7b7121bb88)
> 
> Looks like the files are not present at all in client-2, which is why you
> see these messages.
> Find out the file/directory names corresponding to these gfids from one
> of the healthy bricks and see if they are present in client-2 as well.
> If not, try accessing them from the mount. That should create any missing
> entries in client-2. Then launch heal again.
> 
> Hope this helps.
> Ravi

Hi,

Thanks - I found the files that these entries corresponded to. Indeed
they weren't on client-2. From a client I did an 'ls -lR' of the directory
tree that they all came from, and then self-heal automatically fixed
everything, so now everything is back in order.
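In case it helps anyone finding this thread later, the gfid-to-name step can be
scripted (a sketch, assuming the standard .glusterfs layout on a healthy brick;
the brick path below is hypothetical, the gfid is the one from the shd log
above):

BRICK=/bricks/callrec/brick1                       # hypothetical brick path
GFID=eaa43674-b1a3-4833-a946-de7b7121bb88          # gfid from the shd log
# For regular files, .glusterfs/<aa>/<bb>/<gfid> is a hard link to the real file:
find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" \
     -not -path '*/.glusterfs/*'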

Thank you for your help!

Cheers,
Kingsley.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Shard storage suggestions

2016-07-18 Thread Gandalf Corvotempesta
2016-07-18 9:53 GMT+02:00 Oleksandr Natalenko :
> I'd say, like this:
>
> /.shard/d2/18/D218CD1C-4BD9-40D7-9810-86B3F7932509.1

Yes, something like this.
I was on mobile when I wrote. Your suggestion is better than mine.

Probably, using a directory for the whole shard is also better and
keeps the directory structure clear:

 
/.shard/d2/18/D218CD1C-4BD9-40D7-9810-86B3F7932509/D218CD1C-4BD9-40D7-9810-86B3F7932509.1

The current shard directory structure doesn't scale at all.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Shard storage suggestions

2016-07-18 Thread Oleksandr Natalenko

I'd say, like this:

/.shard/d2/18/D218CD1C-4BD9-40D7-9810-86B3F7932509.1

On 18.07.2016 10:31, Gandalf Corvotempesta wrote:

AFAIK gluster stores every shard in a single directory.
With huge files this could lead to millions of small shard files in the
same directory, which would certainly lead to a performance issue.

Why not move each shard into a dedicated directory, perhaps also
with a defined nested structure? For example, from this:

/.shard/D218CD1C-4BD9-40D7-9810-86B3F7932509.1

To something like this:

/.shard/d/d2/D218CD1C-4BD9-40D7-9810-86B3F7932509/D218CD1C-4BD9-40D7-9810-86B3F7932509.1
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Shard storage suggestions

2016-07-18 Thread Gandalf Corvotempesta
AFAIK gluster stores every shard in a single directory.
With huge files this could lead to millions of small shard files in the same
directory, which would certainly lead to a performance issue.

Why not move each shard into a dedicated directory, perhaps also with a
defined nested structure? For example, from this:

/.shard/d218cd1c-4bd9-40d7-9810-86b3f7932509.1

To something like this:

/.shard/d/d2/d218cd1c-4bd9-40d7-9810-86b3f7932509/d218cd1c-4bd9-40d7-9810-86b3f7932509.1
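For what it's worth, with the current flat layout the mapping from a byte
offset to a shard name is simple (a sketch, using the gfid above; the 4MB
shard size is an assumption):

GFID=d218cd1c-4bd9-40d7-9810-86b3f7932509     # gfid of the base file
SHARD=$((4 * 1024 * 1024))                    # assumed 4 MiB shard-block-size
OFFSET=$((10 * 1024 * 1024 * 1024))           # a write 10 GiB into the file
# block 0 stays in the base file itself; later blocks live under /.shard:
echo "/.shard/${GFID}.$(( OFFSET / SHARD ))"  # -> /.shard/<gfid>.2560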
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users