Re: [U2] RESIZE - dynamic files

Rick Nuckolls Fri, 06 Jul 2012 10:53:16 -0700

Logically, if you graph the effect of varying the split.load value, with the 
modulus on the x-axis and the time to select and read the whole file on the 
y-axis, the curve is going to be roughly parabolic: very slow performance at 
modulus = 1 and at modulus = number of records, with a low point in between.
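A toy cost model shows why the curve dips in the middle. This is a hypothetical sketch, not a measurement: it assumes records spread evenly over the groups, that every primary buffer is read sequentially at cost `seq_cost`, and that every overflow buffer costs a random seek `random_cost` (both are made-up units). `scan_cost` and its parameters are illustrative names only.

```python
import math

def scan_cost(modulus: int, data_bytes: int, group_bytes: int,
              seq_cost: float = 1.0, random_cost: float = 10.0) -> float:
    """Hypothetical cost of reading every buffer of a hashed file once.

    Assumes data is spread evenly over `modulus` groups. Primary buffers are
    contiguous (seq_cost each); overflow buffers need a seek (random_cost each).
    """
    bytes_per_group = data_bytes / modulus
    buffers_per_group = max(1, math.ceil(bytes_per_group / group_bytes))
    overflow_buffers = modulus * (buffers_per_group - 1)
    return modulus * seq_cost + overflow_buffers * random_cost

DATA = 309_328_716   # record data + record IDs from the GENACCTRN_POSTED listing
GROUP = 4096         # bytes per group
# One group, groups that are "just full", and one group per record:
for m in (1, math.ceil(DATA / GROUP), 1_292_377):
    print(m, scan_cost(m, DATA, GROUP))
```

In this model the cost is lowest near the modulus at which groups are just full (data size / group size) and far higher at either extreme. Real files need slack below that point because hashing is never perfectly even, which is the job of the split load.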


If you actually want to find the precise low point, ignore all this bs, create 
a bunch of files with copies of the same data, but different moduli, restart 
your system (including all disk drives & raid devices) in order to purge all 
buffers, and then run the same program against each file.  I think that we 
would all be curious about the results.
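A minimal timing harness for that experiment might look like the sketch below. `time_scan` and `fastest` are hypothetical helper names; the `scan` callable stands in for "the same program" run against one test file. The harness does nothing to purge OS, controller or RAID caches, so the restart between runs is still needed.

```python
import time
from typing import Callable, Dict, Tuple

def time_scan(scan: Callable[[], int]) -> Tuple[float, int]:
    """Time one full-file scan. `scan` must read every record and return the count."""
    start = time.perf_counter()
    records = scan()
    return time.perf_counter() - start, records

def fastest(timings: Dict[str, float]) -> str:
    """Given {test-file name: elapsed seconds}, return the quickest file's name."""
    return min(timings, key=timings.__getitem__)
```

Record one `time_scan` result per file (one per restart) and compare them with `fastest`. Checking that every run returns the same record count guards against comparing scans that did different amounts of work.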

Easier yet, just ignore the bs and use the defaults. :)

-Rick

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Friday, July 06, 2012 9:56 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


So is there a performance increase in BASIC SELECTs from reducing overflow? Some 
people are saying to reduce disk space to speed up the BASIC SELECT, 
while others say to reduce overflow. I'm a bit confused. All of our programs 
that read that table use a BASIC SELECT WITH...

For a BASIC SELECT, do you gain anything by reducing overflow?

Chris


> To: u2-users@listserver.u2ug.org
> From: wjhon...@aol.com
> Date: Thu, 5 Jul 2012 20:12:21 -0400
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> A BASIC SELECT cannot use criteria at all.
> It is going to walk through every record in the file, in order.
> And that's the sticky wicket: that whole "in order" business.
> The disk drive controller has no clue about linked frames, but it *will* do 
> optimistic look-aheads for you.
> So for BASIC SELECTs you are much better off having nothing in overflow at 
> all. :)
> That way, when you go to ask for the *next* frame, it will always be 
> contiguous, and already sitting in memory.
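The look-ahead argument can be illustrated with a small simulation. It is a hypothetical model, not how any particular controller works: blocks are numbered by disk position, group g's primary buffer is block g, an overflowed group's extra buffer sits past the primary area, and on a miss the controller fetches `readahead` consecutive blocks, replacing whatever it held before. All names are illustrative.

```python
from typing import Dict, Iterable, Iterator

def scan_order(modulus: int, overflow_of: Dict[int, int]) -> Iterator[int]:
    """Yield block numbers in the order a full-file scan reads them.

    Group g's primary buffer is block g; if g has overflowed, its overflow
    buffer is the (non-contiguous) block overflow_of[g].
    """
    for g in range(modulus):
        yield g
        if g in overflow_of:
            yield overflow_of[g]

def count_misses(blocks: Iterable[int], readahead: int) -> int:
    """Count reads not already covered by the controller's read-ahead window.

    On a miss the (simplified) controller fetches `readahead` consecutive
    blocks starting at the requested one, discarding what it held before.
    """
    window = range(0)
    misses = 0
    for block in blocks:
        if block not in window:
            misses += 1
            window = range(block, block + readahead)
    return misses

M = 100
no_overflow = count_misses(scan_order(M, {}), readahead=8)   # 13: one miss per 8 blocks
# Hypothetical layout: every 10th group has one overflow buffer stored after the primary area.
overflow_of = {g: M + i for i, g in enumerate(range(0, M, 10))}
with_overflow = count_misses(scan_order(M, overflow_of), readahead=8)
print(no_overflow, with_overflow)
```

Here every overflow buffer costs a miss going out and another coming back. The model deliberately ignores the counterpoint made elsewhere in this thread: lowering the split load to remove overflow also adds blocks to the scan.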
> 
> -----Original Message-----
> From: Rick Nuckolls <r...@lynden.com>
> To: 'U2 Users List' <u2-users@listserver.u2ug.org>
> Sent: Thu, Jul 5, 2012 4:43 pm
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> Most disks and disk systems cache huge amounts of information these days, 
> and, depending on 20 factors or so, one solution will be better than another 
> for a given file.
> For the wholesale SELECT F WITH...., the fewest disk records will almost 
> always win. For files that have ~10 records/group and have ~10% of the groups 
> overflowed, perhaps 1% of record reads will do a second read for the 
> overflow buffer because the target key was not in the primary group. Writing 
> a new record would possibly hit the 10% mark for reading overflow buffers. 
> But lowering the split.load will increase the number of splits slightly, and 
> increase the total number of groups considerably. What you have shown is that 
> you need to increase the modulus (and select time) of a large file by more 
> than [?]0% in order to decrease the read and update times for your records 
> 0.5% of the time (assuming that you have only reduced the number of overflow 
> groups by 50%).
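Rick's overflow arithmetic, spelled out. The assumption that roughly 1 record in 10 of an overflowed group lives in its overflow buffer is an illustrative reading of "~10 records/group" for a group that has only just overflowed; the function name is hypothetical.

```python
def overflow_read_fraction(groups_overflowed: float, records_in_overflow: float) -> float:
    """Share of keyed reads that need a second read for an overflow buffer.

    groups_overflowed:   fraction of groups that have overflowed (~10% here).
    records_in_overflow: fraction of an overflowed group's records that sit in
                         the overflow buffer (assumed ~1 in 10).
    """
    return groups_overflowed * records_in_overflow

reads_before = overflow_read_fraction(0.10, 0.10)   # about 1% of reads
reads_after = overflow_read_fraction(0.05, 0.10)    # half the groups overflowed: about 0.5%
# A write must check the whole group for an existing key, so it reads the
# overflow buffer whenever its group has overflowed: about 10% of writes.
saving = reads_before - reads_after                 # about 0.5% of reads
```

That 0.5% saving is what has to be weighed against the larger modulus that every full-file SELECT must read.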
> As Charles suggests, this is an interesting exercise, but your actual results 
> will rapidly change if you actually add/remove records from your file, change 
> the load or number of files on your system, put in a new drive, CPU, memory 
> board, or install a new release of UniVerse, move to RAID, etc.
> -Rick
> -----Original Message-----
> rom: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] 
> n Behalf Of Wjhonson
> ent: Thursday, July 05, 2012 2:38 PM
> o: u2-users@listserver.u2ug.org
> ubject: Re: [U2] RESIZE - dynamic files
> 
> The hardware "look-ahead" of the disk drive reader will grab consecutive 
> "frames" into memory, since it assumes you'll want the "next" frame next.
> So the less overflow you have, the faster a full file scan will become.
> At least that's my theory ;)
> 
> 
> -----Original Message-----
> From: Rick Nuckolls <r...@lynden.com>
> To: 'U2 Users List' <u2-users@listserver.u2ug.org>
> Sent: Thu, Jul 5, 2012 2:29 pm
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Chris,
> For the type of use that you described earlier (BASIC selects and reads), 
> reducing overflow will have negligible performance benefit, especially 
> compared to changing the GROUP.SIZE back to 1 (2048 bytes). If you purge the 
> file in relatively small percentages, then it will never merge anyway 
> (because you would need to delete 20-30% of the file for that to happen with 
> the merge load at 50%), so your optimum minimum modulus solution will 
> probably be "however large it grows". The overhead for a group split is not 
> as bad as it sounds unless your updates/sec count is extremely high, such as 
> during a copy.
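The "20-30%" merge figure can be checked with a simplified calculation. It assumes the modulus stays fixed while records are deleted and that load is proportional to the data stored; `delete_fraction_before_merge` is a hypothetical name.

```python
def delete_fraction_before_merge(actual_load: float, merge_load: float) -> float:
    """Fraction of the data that must be deleted before load drops to merge_load."""
    if actual_load <= merge_load:
        return 0.0
    return 1.0 - merge_load / actual_load

# 63% actual load (from the GENACCTRN_POSTED listing) and a 50% merge load:
print(round(delete_fraction_before_merge(0.63, 0.50), 3))   # about 0.206
# A file sitting right at a 70% split load:
print(round(delete_fraction_before_merge(0.70, 0.50), 3))   # about 0.286
```

Loads between those two points give roughly 21% to 29%, in line with the range quoted.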
> If you do regular SELECTs and scans of the entire file, then your goal should 
> be to reduce the total disk size of the file, and not worry much about common 
> overflow. The important thing is that the file is dynamic, so you will never 
> encounter the issues that undersized statically hashed files develop.
> We have thousands of dynamically hashed files on our (Solaris) systems, with 
> an extremely low problem rate.
> Rick
> -----Original Message-----
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] 
> On Behalf Of Chris Austin
> Sent: Thursday, July 05, 2012 11:21 AM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> Rick,
> You are correct, I should be using the smaller size (I just haven't changed 
> it yet). Based on the reading I have done, you should only use the larger 
> group size when the average record size is greater than 1000 bytes.
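Applying that rule of thumb to the figures in the GENACCTRN_POSTED listing posted in this thread gives an average of roughly 239 bytes per record (data plus ID), far below 1000, which points to the smaller group size. The helper names and the 1000-byte default are illustrative; GROUP.SIZE 1 means 2048-byte groups and 2 means 4096-byte groups, as described in the thread.

```python
def average_record_bytes(record_data: int, record_ids: int, records: int) -> float:
    """Average bytes per record, counting both the data and the record ID."""
    return (record_data + record_ids) / records

def suggested_group_size(avg_bytes: float, threshold: float = 1000.0) -> int:
    """Rule of thumb from the thread: GROUP.SIZE 2 (4096 bytes) only for large records."""
    return 2 if avg_bytes > threshold else 1

avg = average_record_bytes(287_789_178, 21_539_538, 1_292_377)
print(f"{avg:.1f} bytes -> GROUP.SIZE {suggested_group_size(avg)}")
```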
> As far as being better off with the defaults, that's basically what I'm 
> trying to test (as well as learn how linear hashing works). I was able to 
> reduce my overflow by 18%, and I only increased my empty groups by a very 
> small amount and my file size by only 8%. This in theory should be better for 
> reads/writes than what I had before. To test the performance I need to write 
> a ton of records, then capture the output and compare it using timestamps.
> Chris
> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Thu, 5 Jul 2012 09:22:02 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Chris,
> 
> I am still wondering what is prompting you to continue using the larger 
> group size.
> 
> I think that Martin and the UV documentation are correct in this case; you 
> would be as well or better off with the defaults.
> 
> -Rick
> 
> On Jul 5, 2012, at 9:13 AM, "Martin Phillips" <martinphill...@ladybridge.com> 
> wrote:
> > Hi,
> > 
> > The various suggestions about setting the minimum modulus to reduce overflow 
> > are all very well, but effectively you are turning a dynamic file into a 
> > static one, complete with all the continual maintenance work needed to keep 
> > the parameters in step with the data.
> > 
> > In most cases, the only parameter that is worth tuning is the group size, 
> > to try to pack things nicely. Even this is often fine left alone, though 
> > getting it to match the underlying o/s page size is helpful.
> > 
> > I missed the start of this thread but, unless you have a performance problem 
> > or are seriously short of space, my recommendation would be to leave the 
> > dynamic files to look after themselves.
> > 
> > A file without overflow is not necessarily the best solution. Winding the 
> > split load down to 70% means that at least 30% of the file is dead space. 
> > The implication of this is that the file is larger and will take more disk 
> > reads to process sequentially from one end to the other.
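Martin's dead-space point as arithmetic: if groups are split before they exceed the split load, the file occupies at least 1/split_load times the space of its data, and a sequential pass reads correspondingly more buffers. A minimal sketch with an illustrative function name; it ignores uneven hashing and large records.

```python
def min_size_factor(split_load: float) -> float:
    """Smallest file size, as a multiple of the data size, at a given split load."""
    return 1.0 / split_load

for load in (0.80, 0.70):
    print(f"split load {load:.0%}: >= {1 - load:.0%} dead space, "
          f"file >= {min_size_factor(load):.4f} x data")

# Extra buffers a full scan reads, at minimum, after winding 80% down to 70%:
extra = min_size_factor(0.70) / min_size_factor(0.80) - 1
print(f"about {extra:.1%} more")
```

The 1/0.70 factor is the 1.42857 that appears in the MINIMUM.MODULUS calculation quoted in this thread.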
> > 
> > 
> > Martin Phillips
> > Ladybridge Systems Ltd
> > 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> > +44 (0)1604-709200
> > 
> > 
> > 
> > -----Original Message-----
> > From: u2-users-boun...@listserver.u2ug.org 
> > [mailto:u2-users-boun...@listserver.u2ug.org] 
> > On Behalf Of Chris Austin
> > Sent: 05 July 2012 15:19
> > To: u2-users@listserver.u2ug.org
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > I was able to drop from 30% overflow to 12% by making 2 changes:
> > 
> > 1) changed the split load from 80% to 70% (that alone reduced overflow by 10%)
> > 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> 
> >    [(record data + ID) * 1.1 * 1.42857 (70% split load)] / 4096)
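The MINIMUM.MODULUS formula above as code, using the data and ID totals from the listing in this message (1.42857 is 1/0.70; the 1.1 is 10% headroom). With those inputs it comes out at about 118,674, close to the 118,681 used. `minimum_modulus` is an illustrative name, not a UniVerse command.

```python
import math

def minimum_modulus(record_data: int, record_ids: int, group_bytes: int,
                    split_load: float, headroom: float = 1.1) -> int:
    """((data + IDs) * headroom / split_load) / group size, rounded up to whole groups."""
    return math.ceil((record_data + record_ids) * headroom / split_load / group_bytes)

print(minimum_modulus(287_789_178, 21_539_538, 4096, 0.70))
```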
> > 
> > My disk size only went up 8%..
> > 
> > My file looks like this now:
> > 
> > File name ..................   GENACCTRN_POSTED
> > Pathname ...................   GENACCTRN_POSTED
> > File type ..................   DYNAMIC
> > File style and revision ....   32BIT Revision 12
> > Hashing Algorithm ..........   GENERAL
> > No. of groups (modulus) ....   118681 current ( minimum 118681, 140 empty,
> >                                            14431 overflowed, 778 badly )
> > Number of records ..........   1292377
> > Large record size ..........   3267 bytes
> > Number of large records ....   180
> > Group size .................   4096 bytes
> > Load factors ...............   70% (split), 50% (merge) and 63% (actual)
> > Total size .................   546869248 bytes
> > Total size of record data ..   287789178 bytes
> > Total size of record IDs ...   21539538 bytes
> > Unused space ...............   237532340 bytes
> > Total space for records ....   546861056 bytes
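The listing's derived figures can be cross-checked from its raw numbers. This assumes actual load = (data + IDs) / (modulus × group size), which reproduces the 63% shown but is not necessarily UniVerse's exact internal formula.

```python
# Raw figures copied from the GENACCTRN_POSTED listing above.
modulus = 118_681
group_bytes = 4096
overflowed_groups = 14_431
record_data = 287_789_178
record_ids = 21_539_538
total_space_for_records = 546_861_056

actual_load = (record_data + record_ids) / (modulus * group_bytes)
overflow_pct = 100 * overflowed_groups / modulus
unused = total_space_for_records - record_data - record_ids

print(f"actual load {actual_load:.1%}")            # listing shows 63%
print(f"groups overflowed {overflow_pct:.1f}%")    # the "12%" overflow quoted above
print(f"unused space {unused} bytes")              # listing shows 237532340
```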
> > 
> > Chris
> > 
> > 
> > 
> >> From: keith.john...@datacom.co.nz
> >> To: u2-users@listserver.u2ug.org
> >> Date: Wed, 4 Jul 2012 14:05:02 +1200
> >> Subject: Re: [U2] RESIZE - dynamic files
> >> 
> >> Doug may have had a key bounce in his input
> >> 
> >>> Let's do the math:
> >>> 
> >>> 258687736 (Record Size)
> >>> 192283300 (Key Size)
> >>> ========
> >> 
> >> The key size is actually 19283300 in Chris' figures
> >> 
> >> Regarding 68,063 being less than the current modulus of 82,850: I think the 
> >> answer may lie in the splitting process.
> >> 
> >> As I understand it, the first time a split occurs, group 1 is split and its 
> >> contents are divided between new group 1 and new group 2. All the other 
> >> groups effectively get 1 added to their number. The next split divides 
> >> group 3 (which was 2) into 3 and 4, and so forth. A pointer is kept to say 
> >> where the next split will take place and also to help sort out how to 
> >> adjust the algorithm to identify which group matches a given key.
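The splitting process is close to textbook linear hashing (Litwin), sketched below. This is the generic algorithm with an arbitrary starting modulus, not UniVerse's actual GENERAL hashing: `crc32` stands in for the real hash function, capacity is counted in records rather than bytes, overflow is not modelled, and the class and method names are hypothetical. In the textbook version the split pointer walks through the original groups in order, each split moves roughly half of one group into a new group appended at the end, and keys whose group lies below the pointer are re-hashed with the doubled modulus.

```python
from zlib import crc32

class LinearHashFile:
    """Textbook linear hashing with an arbitrary starting modulus (hypothetical sketch)."""

    def __init__(self, initial_modulus: int, split_load: float, group_capacity: int):
        self.m0 = initial_modulus
        self.level = 0                    # completed doubling rounds
        self.split_ptr = 0                # next group to split in this round
        self.split_load = split_load
        self.capacity = group_capacity    # records per group (real files use bytes)
        self.groups = [[] for _ in range(initial_modulus)]
        self.count = 0

    @staticmethod
    def _hash(key: str) -> int:
        return crc32(key.encode())        # stand-in hash function

    @property
    def modulus(self) -> int:
        return len(self.groups)

    def load(self) -> float:
        return self.count / (self.modulus * self.capacity)

    def group_of(self, key: str) -> int:
        h = self._hash(key)
        g = h % (self.m0 * 2 ** self.level)
        if g < self.split_ptr:            # already split this round: use the doubled modulus
            g = h % (self.m0 * 2 ** (self.level + 1))
        return g

    def _split(self) -> None:
        old_keys = self.groups[self.split_ptr]
        self.groups[self.split_ptr] = []
        self.groups.append([])            # new group number = split_ptr + m0 * 2**level
        doubled = self.m0 * 2 ** (self.level + 1)
        for key in old_keys:              # each key stays put or moves to the new group
            self.groups[self._hash(key) % doubled].append(key)
        self.split_ptr += 1
        if self.split_ptr == self.m0 * 2 ** self.level:   # every original group split
            self.level += 1
            self.split_ptr = 0

    def insert(self, key: str) -> None:
        self.groups[self.group_of(key)].append(key)
        self.count += 1
        while self.load() > self.split_load:
            self._split()

    def contains(self, key: str) -> bool:
        return key in self.groups[self.group_of(key)]


f = LinearHashFile(initial_modulus=7, split_load=0.80, group_capacity=10)
keys = [f"K{i}" for i in range(5000)]
for k in keys:
    f.insert(k)
print(f.modulus, f.level, f.split_ptr)
```

Inserting 5,000 keys into this 7-group file ends with a modulus of 625, the smallest that keeps the load at or below 80%, and every key is still found in the group its hash points to.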
> >> 
> >> Based on this, if you started with 1000 groups, by the time you have split 
> >> the 500th time you will have 1500 groups. The first 1000 will be relatively 
> >> empty, the last 500 will probably be overflowed, but not terribly badly. By 
> >> the time you get to the 1000th split, you will have 2000 groups and they 
> >> will, one hopes, be quite reasonably spread with very little overflow.
> >> 
> >> So I expect the average access times would drift up and down in a cycle. 
> >> The cycle time would get longer as the file gets bigger, but the worst time 
> >> would be roughly the same each cycle.
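That expected cycle can be checked with an idealized model of textbook linear hashing, assuming uniform hashing and a file held exactly at its split load. Groups already split in the current round hold about half as many records as the unsplit ones, so the group a lookup lands in is, on average, fuller part-way through a round than at its start or end. `expected_group_fill` is an illustrative name; `split_fraction` is the number of splits done in the current round divided by the modulus at the start of that round.

```python
def expected_group_fill(split_fraction: float, split_load: float) -> float:
    """Average fill (relative to capacity) of the group a random stored key is in.

    With m0 groups at the start of a round and x*m0 splits done, the modulus is
    m0*(1 + x); unsplit groups (share 1 - x of keys) hold twice as much as split
    ones (share x of keys).
    """
    x = split_fraction
    return split_load * (1 + x) * (2 - x) / 2

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{x:.2f} of the round done: fill {expected_group_fill(x, 0.80):.3f}")
```

In this model the fill peaks halfway through each round at 12.5% above the split load and returns to the split load when the round completes. Each round has as many splits as the modulus had when it started, so the cycle lengthens as the file grows while the peak stays the same.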
> >> 
> >> Given the power of two introduced into the algorithm by the before/after 
> >> split thing, I wonder if there is such a need to start off with a prime?
> >> 
> >> Regards, Keith
> >> 
> >> PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.
> >> 
> >> _______________________________________________
> >> U2-Users mailing list
> >> U2-Users@listserver.u2ug.org
> >> http://listserver.u2ug.org/mailman/listinfo/u2-users