Re: [Gluster-devel] Readdir d_off encoding

2015-01-07 Thread J. Bruce Fields
On Mon, Dec 22, 2014 at 02:04:37PM -0500, J. Bruce Fields wrote: It'd also be nice to see any proposals for a completely correct solution, even if it's something that will take a while. All I can think of is protocol extensions, but that's just what I know. I tried to think a little about

Re: [Gluster-devel] Readdir d_off encoding

2014-12-23 Thread Xavier Hernandez
On 12/22/2014 06:41 PM, Jeff Darcy wrote: An alternative would be to convert directories into regular files from the brick point of view. The benefits of this would be: * d_off would be controlled by gluster, so all bricks would have the same d_off and order. No need to use any d_off mapping

Re: [Gluster-devel] Readdir d_off encoding

2014-12-23 Thread Anand Avati
Please review http://review.gluster.org/9332/, as it undoes the introduction of itransform on d_off in AFR. This does not solve DHT-over-DHT or other future use cases, but at least fixes the regression in 3.6.x. Thanks On Tue Dec 23 2014 at 10:34:41 AM Anand Avati av...@gluster.org wrote:

Re: [Gluster-devel] Readdir d_off encoding

2014-12-23 Thread Jeff Darcy
Using GFID does not work for d_off. The GFID represents an inode, and a d_off represents a directory entry. Therefore using GFID as an alternative to d_off breaks down when you have hardlinks for the same inode in a single directory. Good point. So what *can* we do locally on a brick to
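The hardlink objection can be illustrated on any local POSIX filesystem. The sketch below is not Gluster code: it uses the plain inode number as a stand-in for the GFID, and the helper names (entries_by_inode, hardlink_demo) are made up for this example. It creates two hardlinks to one file in a single directory and groups the directory entries by inode.

```python
import os
import tempfile


def entries_by_inode(directory):
    """Group the names in `directory` by the inode they refer to."""
    by_inode = {}
    for entry in os.scandir(directory):
        by_inode.setdefault(entry.inode(), []).append(entry.name)
    return by_inode


def hardlink_demo():
    """Create two names for one inode and return the inode -> names map."""
    with tempfile.TemporaryDirectory() as directory:
        first = os.path.join(directory, "a")
        second = os.path.join(directory, "b")
        with open(first, "w") as handle:
            handle.write("payload")
        # Second directory entry pointing at the same inode.
        os.link(first, second)
        return entries_by_inode(directory)


if __name__ == "__main__":
    for inode, names in hardlink_demo().items():
        print(inode, sorted(names))
```

Both names map to the same inode, so an inode-derived value cannot tell the two directory entries apart, which is exactly what a readdir position has to do.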

Re: [Gluster-devel] Readdir d_off encoding

2014-12-22 Thread Jeff Darcy
The birthday paradox says that with a 44-bit hash we're more likely than not to start seeing collisions somewhere around 2^22 directory entries. That 16-million-entry-directory would have a lot of collisions. This is really the key point. The risks of the bit-stealing approach have been
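As a rough check of the birthday-paradox figures quoted above, the usual approximation p ≈ 1 - exp(-n(n-1)/(2N)) can be evaluated directly. This is a back-of-the-envelope sketch that assumes an ideal, uniformly distributed 44-bit hash; the function names are illustrative only.

```python
import math


def collision_probability(entries, hash_bits):
    """Approximate chance that at least two of `entries` hashes collide."""
    space = 2 ** hash_bits
    return -math.expm1(-entries * (entries - 1) / (2 * space))


def expected_colliding_pairs(entries, hash_bits):
    """Expected number of colliding pairs among `entries` hashes."""
    space = 2 ** hash_bits
    return entries * (entries - 1) / (2 * space)


if __name__ == "__main__":
    for entries in (2 ** 22, 2 ** 24):
        print(entries,
              round(collision_probability(entries, 44), 4),
              round(expected_colliding_pairs(entries, 44), 2))
```

Under these assumptions the probability of at least one collision is a little under 40% at 2^22 entries and passes 50% at roughly 4.9 million entries, which fits "somewhere around 2^22". At 2^24 (about 16 million) entries a collision is nearly certain, with about 8 colliding pairs expected.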

Re: [Gluster-devel] Readdir d_off encoding

2014-12-22 Thread J. Bruce Fields
On Mon, Dec 22, 2014 at 09:30:29AM -0500, Jeff Darcy wrote: The birthday paradox says that with a 44-bit hash we're more likely than not to start seeing collisions somewhere around 2^22 directory entries. That 16-million-entry-directory would have a lot of collisions. This is really the

Re: [Gluster-devel] Readdir d_off encoding

2014-12-22 Thread Jeff Darcy
An alternative would be to convert directories into regular files from the brick point of view. The benefits of this would be: * d_off would be controlled by gluster, so all bricks would have the same d_off and order. No need to use any d_off mapping or transformation. I don't think a

Re: [Gluster-devel] Readdir d_off encoding

2014-12-22 Thread J. Bruce Fields
On Mon, Dec 22, 2014 at 09:30:29AM -0500, Jeff Darcy wrote: By contrast, the failure mode for the map-caching approach - a simple failure in readdir - is relatively benign. Such failures are also likely to be less common, even if we adopt the *unprecedented* requirement that the cache be

Re: [Gluster-devel] Readdir d_off encoding

2014-12-22 Thread Jeff Darcy
On Mon, Dec 22, 2014 at 09:30:29AM -0500, Jeff Darcy wrote: By contrast, the failure mode for the map-caching approach - a simple failure in readdir - is relatively benign. Such failures are also likely to be less common, even if we adopt the *unprecedented* requirement that the cache be

Re: [Gluster-devel] Readdir d_off encoding

2014-12-18 Thread Shyam
On 12/17/2014 05:04 AM, Xavier Hernandez wrote: Just to consider all possibilities... Current architecture needs to create all directory structure on all bricks, and has the big problem that each directory in each brick will store the files in different order and with different d_off values.

Re: [Gluster-devel] Readdir d_off encoding

2014-12-17 Thread Xavier Hernandez
Just to consider all possibilities... Current architecture needs to create all directory structure on all bricks, and has the big problem that each directory in each brick will store the files in different order and with different d_off values. This is a serious scalability issue and have

Re: [Gluster-devel] Readdir d_off encoding

2014-12-16 Thread Dan Lambright
On 12/15/2014 09:06 PM, Anand Avati wrote: Replies inline On Mon Dec 15 2014 at 12:46:41 PM Shyam srang...@redhat.com wrote: With the changes present in [1

Re: [Gluster-devel] Readdir d_off encoding

2014-12-16 Thread Shyam
On 12/16/2014 03:21 PM, J. Bruce Fields wrote: On Tue, Dec 16, 2014 at 11:46:46AM -0500, Shyam wrote: On 12/15/2014 09:06 PM, Anand Avati wrote: Replies inline On Mon Dec 15 2014 at 12:46:41 PM Shyam srang...@redhat.com wrote: With the changes present in [1]

[Gluster-devel] Readdir d_off encoding

2014-12-15 Thread Shyam
With the changes present in [1] and [2], a short explanation of the change would be: we encode the subvol ID in the d_off, losing 'n + 1' bits in case the high-order n+1 bits of the d_off returned by the underlying xlator are not free. (Best to read the commit message for [1] :) ) Although not
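To make the bit-stealing idea concrete, here is a simplified sketch of encoding a subvol ID into a 64-bit d_off. It is not the code from [1] or [2]. The layout is an assumption chosen for illustration: the top bit flags whether the offset was shifted, the next n bits hold the subvol ID, and the rest holds the offset. The names encode, decode and bits_for are hypothetical.

```python
MASK64 = (1 << 64) - 1


def bits_for(subvol_count):
    """Bits needed to hold subvol IDs 0..subvol_count-1 (at least one)."""
    return max(1, (subvol_count - 1).bit_length())


def encode(d_off, subvol_id, subvol_count):
    """Pack subvol_id into the high bits of a 64-bit d_off (assumed layout)."""
    if not 0 <= d_off <= MASK64:
        raise ValueError("d_off must fit in 64 bits")
    if not 0 <= subvol_id < subvol_count:
        raise ValueError("subvol_id out of range")
    n = bits_for(subvol_count)
    low_bits = 64 - (n + 1)
    if d_off >> low_bits == 0:
        # High n+1 bits are free: store the offset unchanged.
        flag, payload = 0, d_off
    else:
        # High bits in use: shift right, dropping the low n+1 bits.
        flag, payload = 1, d_off >> (n + 1)
    return (flag << 63) | (subvol_id << low_bits) | payload


def decode(encoded, subvol_count):
    """Return (d_off, subvol_id); d_off is approximate if bits were dropped."""
    n = bits_for(subvol_count)
    low_bits = 64 - (n + 1)
    flag = encoded >> 63
    subvol_id = (encoded >> low_bits) & ((1 << n) - 1)
    payload = encoded & ((1 << low_bits) - 1)
    d_off = (payload << (n + 1)) & MASK64 if flag else payload
    return d_off, subvol_id
```

With four subvols (n = 2), a small offset such as 12345 round-trips exactly, while an offset that already uses its high bits comes back with its low three bits cleared. That loss is the source of the collision risk discussed in the replies.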