Re: [NF] INT vs MEDIUMINT vs SMALLINT vs TINYINT

Stephen Russell Tue, 31 Oct 2017 08:22:41 -0700

Here is a good video for B+ tree, and how they operate.

https://www.youtube.com/watch?v=tY9xu3vuZ10


About 5 min in it gets to the demonstration of the science. The creation of
leaves and how things can bounce around you insert.  Just think of Guid,
where you are inserting into branches that have 50 or 5,000 are all
different,   C7948B8E-9B98-48C0-B2EA-11DB5474A99D,
0639AB12-9B34-4A86-9C76-13A38B587C20, 0D14D7A2-DFBA-41C4-9102-14AAB2F77F6D
and
A59F2BA2-4CBF-4E70-B5E3-20F4E7C577DA.  Adding data to the end of the index
has a little performance, but adding all over the place you can get into
leaf/page tears and resetting of headers on leaves.  This is the
performance I was referring to.

In SQL you can define your fillfactor that is "free-space" in your index as
it rebuilds.  Some tables will do well for a lower fillfactor, or more free
space for future inserts when you have any sort of character data.  A
higher value for auto inc column of INT.

When you get an analysis of queries and it says Index Seek, Index Scan the
Seek used the these b+tree headers to find the location, and the index scan
scans the entire index to find the data you desire.  Redundant values in a
status column are a seek and not a scan because the variety of OPEN/Closed
is very limited.

You can seriously spend way too much time attempting to tune these suckers
and as they grow over time you have to come back and tune again.  :)
Tuning with 5,000 rows and in two years you have 1,500,000 rows you may
need to come back and verify a process is still in good shape.



On Tue, Oct 31, 2017 at 6:33 AM, Frank Cazabon <frank.caza...@gmail.com>
wrote:

> Morning Stephen,
>
> What do you mean by "pound script this"? Do you mean to run it over and
> over?
>
> I did that I get the SeqCol coming out in order increasing as per your
> example with the first couple characters in the SeqCol changing and the
> rest the same.
>
> I still don't understand your last 2 paragraphs. As I understood this when
> I researched it each new insert means it goes to the end so is always being
> added at the end so no paging complexity is involved. Could you clarify
> this for me, please?
>
> Frank.
>
> Frank Cazabon
>
>
> On 30/10/2017 03:26 PM, Stephen Russell wrote:
>
>> Frank, that puts the logic at the rdbms engine and no longer in your app,
>> in-front of your data.
>>
>> In reality, you get data all over the place but this bunch will be close.
>> Pound script this a few times and you will see the leading values are what
>> changes an not the last 4-5-6 characters.
>>
>> CREATE TABLE GUID_Example
>> (
>> SeqCol uniqueidentifier DEFAULT NewSequentialID()
>> ,IDCol uniqueidentifier DEFAULT NEWID(),)
>> ----Inserting five default values in table
>> INSERT INTO GUID_Example DEFAULT
>> VALUES
>> INSERT INTO GUID_Example DEFAULT
>> VALUES
>> INSERT INTO GUID_Example DEFAULT
>> VALUES
>> INSERT INTO GUID_Example DEFAULT
>> VALUES
>> ---------------------------------------------------------
>> SELECT *
>> FROM GUID_Example
>>
>> ----Clean up database
>>
>> DROP TABLE GUID_Example
>>
>> I got this as output :
>> SeqCol
>>   IDCol
>> 1E54CB01-A7BD-E711-9C54-D481D71992B4 120C2AD7-ECF1-487F-BE56-0FD36A78237F
>> 1F54CB01-A7BD-E711-9C54-D481D71992B4 E6F5D7B5-61F8-4FD3-988B-F0949A029E29
>> 2054CB01-A7BD-E711-9C54-D481D71992B4 B5C05851-FBA5-4DD7-864A-133EE1BC6C68
>> 2154CB01-A7BD-E711-9C54-D481D71992B4 638865DA-F2E4-4101-9534-E3DB83A0008E
>>
>> When the performance goes to insert all over the index pages where there
>> is
>> a lot of available room you may not have a performance hit at all.  On the
>> flip side using the newsequentialID it may make a compound insert into a
>> page that was starting to get tight and now is tight.
>>
>> Please remember folks that Fkey index is also a component in the insert
>> event as well.  The more indexes you maintain that you
>> really don't need, do get in your way on any platform.
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Oct 30, 2017 at 1:23 PM, Frank Cazabon <frank.caza...@gmail.com>
>> wrote:
>>
>> I believe NewSequentialID() in SQL Server (or the UuidCreateSequential API
>>> call) avoids the paging problem behind Guids.
>>>
>>> Frank.
>>>
>>> Frank Cazabon
>>>
>>>
>>> On 30/10/2017 02:17 PM, Stephen Russell wrote:
>>>
>>> Good URLs you presented.
>>>>
>>>> To me from a performance POV the INSERT of the GUID is the only downside
>>>> with respect to the index.  It has to identify the page and add itself
>>>> to
>>>> there.  If need be it will tear the page and generate two pages with
>>>> access
>>>> holes to accept new index-data going forward.  Next. you look at the
>>>> type
>>>> of data you are presenting via a velocity of inserts.  Are your inserts
>>>> per
>>>> min to a table > 10,000?  If so the GUID may be the wrong thing.  Think
>>>> of
>>>> eBay in the closing seconds of an auction, or your stock trader in
>>>> changes in the market generating A LOT of transactions.  These
>>>> situations
>>>> are where next int is best because it always going to the last page of
>>>> the
>>>> index.
>>>>
>>>> If you are not in that type of data environment you can do either with
>>>> no
>>>> problem.
>>>>
>>>> M$ loves using GUID in their internal systems like CRM or SharePoint.
>>>> It
>>>> is Massive GUID driven in all of the tables.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 30, 2017 at 12:42 PM, <
>>>> mbsoftwaresoluti...@mbsoftwaresolutions.com> wrote:
>>>>
>>>> On 2017-10-30 11:09, Stephen Russell wrote:
>>>>
>>>>> Less efficient indexes?  Are you talking about space in a db compared
>>>>> to
>>>>>
>>>>>> an
>>>>>> int for a pointer or are you saying that the time to isolate the data
>>>>>> on
>>>>>> that row because of the data type of the pointer?  The flip side is
>>>>>> data
>>>>>> insertion.
>>>>>>
>>>>>> Can you tell us why you use less efficient?
>>>>>>
>>>>>>
>>>>>> Not sure of your wording, if you meant exactly that or not, so let me
>>>>>>
>>>>> try
>>>>> to respond:
>>>>>
>>>>> I like the guid v(40) indexes because if ever I needed to combine data,
>>>>> I'm not running into duplicate keys.  Plus, I like defining the key
>>>>> ahead
>>>>> of time and having complete control so I can work with
>>>>> parent/child/grandchild datasets easier (than if I had to contend with
>>>>> auto-inc keys).  The negative of this approach as I understood it is
>>>>> that
>>>>> the since the index is 4x larger in size than a 4-byte integer key, it
>>>>> would not be as efficient in memory, and the index tree needs
>>>>> reindexing
>>>>> more often so as to be balanced.
>>>>>
>>>>> Plenty of good article on the interweb discussing both:
>>>>> http://www.ovaistariq.net/733/understanding-btree-indexes-an
>>>>> d-how-they-impact-performance/#.WfdQDHYpCJA
>>>>> https://blog.codinghorror.com/primary-keys-ids-versus-guids/
>>>>> http://web.archive.org/web/20150511162734/http://databases.
>>>>> aspfaq.com/database/what-should-i-choose-for-my-primary-key.html
>>>>>
>>>>> I think I'll stick with app-generated GUIDs though for the portability
>>>>> and
>>>>> no-collision benefit if I merged/move data.  I also want to do
>>>>> replication
>>>>> where their database is stored locally but then replicates to a master
>>>>> database outside their office.
>>>>>
[excessive quoting removed by server]

_______________________________________________
Post Messages to: ProFox@leafe.com
Subscription Maintenance: http://mail.leafe.com/mailman/listinfo/profox
OT-free version of this list: http://mail.leafe.com/mailman/listinfo/profoxtech
Searchable Archive: http://leafe.com/archives/search/profox
This message: 
http://leafe.com/archives/byMID/profox/cajidmyko+rdjxjvqabed7t96mbb45recnkizb4uw6av36dq...@mail.gmail.com
** All postings, unless explicitly stated otherwise, are the opinions of the 
author, and do not constitute legal or medical advice. This statement is added 
to the messages for those lawyers who are too stupid to see the obvious.

Re: [NF] INT vs MEDIUMINT vs SMALLINT vs TINYINT

Reply via email to