Re[2]: Update of the default inline size for variable types
Huge +1 to Ilya.

I checked your PR, and this looks like a stub:

    Pattern.compile("\\w+\\((\\d+)\\)");

* Do we have some normalization before it? VARCHAR with whitespace before N, e.g. VARCHAR( 100), does not seem to match.
* Can we obtain this information without a regexp?
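A small sketch of the whitespace concern raised above. It runs the pattern quoted from the PR against a few spellings of the same type, next to a whitespace-tolerant variant. The `LENIENT` pattern and the `declaredLength` helper are hypothetical and exist only for this illustration; the sketch uses `matches()`, but `find()` fails on the same inputs because no word character is immediately followed by `(`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VarcharLengthRegex {
    // Pattern as quoted from the PR: a name immediately followed by "(digits)".
    static final Pattern STRICT = Pattern.compile("\\w+\\((\\d+)\\)");

    // Hypothetical whitespace-tolerant variant, for illustration only.
    static final Pattern LENIENT = Pattern.compile("\\w+\\s*\\(\\s*(\\d+)\\s*\\)");

    /** Returns the declared length, or -1 if the pattern does not match the whole type string. */
    static int declaredLength(Pattern p, String type) {
        Matcher m = p.matcher(type.trim());
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String[] types = {"VARCHAR(100)", "VARCHAR (100)", "VARCHAR( 100 )"};
        for (String t : types) {
            System.out.println(t + " -> strict=" + declaredLength(STRICT, t)
                + ", lenient=" + declaredLength(LENIENT, t));
        }
    }
}
```

Only the first spelling yields 100 with the strict pattern. Loosening the regex papers over the problem; reading the declared precision from the column metadata, as the second question suggests, would avoid parsing type strings altogether.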
Re: Update of the default inline size for variable types
Hello!

I can see what you are getting at but, as far as my experience tells me, 64 is already too large for the average use case. It will also start to drag down performance, since you no longer fit as many entries in one page, and your tree starts to grow taller, not to mention more I/O.

I think we should benchmark it and see at which value we get a sharp decline. Maybe 64 is OK after all, if it's the maximum for a complex index. Just make sure that a single VARCHAR without length is still 10 and not 64.

Regards,
--
Ilya Kasnacheev

On Thu, 20 Aug 2020 at 11:15, Evgeniy Rudenko wrote:
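A back-of-the-envelope sketch of the page fan-out argument above: a larger inline size means fewer entries per page, and a lower fan-out means more tree levels (and more page reads) for the same number of rows. Only the 4 KB page size matches Ignite's default; the header size and per-entry overhead are hypothetical placeholders, not Ignite's real B+tree page layout, so the numbers are illustrative and no substitute for the benchmark suggested above.

```java
public class InlineFanout {
    // 4 KB is Ignite's default page size; the other two constants are
    // illustrative placeholders, NOT Ignite's actual B+tree page layout.
    static final int PAGE_SIZE = 4096;
    static final int PAGE_HEADER = 40;   // hypothetical page header size
    static final int ENTRY_OVERHEAD = 8; // hypothetical per-entry link size

    /** Approximate number of index entries that fit in one page. */
    static int entriesPerPage(int inlineSize) {
        return (PAGE_SIZE - PAGE_HEADER) / (inlineSize + ENTRY_OVERHEAD);
    }

    /** Approximate tree height (levels) needed to address n entries with the given fan-out. */
    static int treeHeight(long n, int fanout) {
        int levels = 1;
        long capacity = fanout;
        while (capacity < n) {
            capacity *= fanout;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        long rows = 100_000_000L;
        for (int inline : new int[] {10, 30, 64, 128}) {
            int fanout = entriesPerPage(inline);
            System.out.printf("inline=%3d  entries/page=%3d  height~%d%n",
                inline, fanout, treeHeight(rows, fanout));
        }
    }
}
```

The model deliberately ignores the other side of the trade-off: when a key does not fit in the inline prefix, comparisons have to fetch the full row from a data page, which is the speed gain Evgeniy is buying with the larger size.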
Re: Update of the default inline size for variable types
Hi guys,

Thank you for your feedback.

The current calculation of the default size is not entirely correct. As soon as it meets a variable-length field (such as a byte array or a string), it stops trying to pick a reasonable index size and just uses IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT as the size. That approach doesn't seem correct to me in any case. The first part of the update changes this logic so that the size is calculated from all indexed columns. This can even save some space for users who have VARCHAR columns and a high IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT value.

The second part of the update increases IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT. Please note that we are changing only the upper bound of the default size. Obviously this can lead to some increase in used space, but we are trading size for speed here. The current default value is too small for the average use case. Users who care about data size can still set the exact size of each index or limit all sizes with IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT. So after the update, users who want to keep the previous data size will just need to set IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT=10.

--
Best regards,
Evgeniy

On Wed, Aug 19, 2020 at 5:20 PM Vladislav Pyatkov wrote:
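A minimal sketch contrasting the two behaviours described above, written from the prose of this message rather than from the Ignite source: the old logic gives up at the first variable-length column and returns the maximum, the new one sums every column (10 for an unbounded VARCHAR) and caps the result. The column encoding (a list of sizes with `-1` for variable length) and all names are hypothetical.

```java
import java.util.List;

public class OldVsNewInline {
    static final int VARLEN = -1;         // marker for a variable-length column without a declared length
    static final int VARLEN_DEFAULT = 10; // per-column default for such a column, from the proposal

    /** Old behaviour as described in the thread: give up at the first variable-length column. */
    static int oldInlineSize(List<Integer> colSizes, int maxSize) {
        int sum = 0;
        for (int size : colSizes) {
            if (size == VARLEN)
                return maxSize;
            sum += size;
        }
        return Math.min(sum, maxSize);
    }

    /** Proposed behaviour: add up every column, then cap at the maximum. */
    static int newInlineSize(List<Integer> colSizes, int maxSize) {
        int sum = 0;
        for (int size : colSizes)
            sum += (size == VARLEN) ? VARLEN_DEFAULT : size;
        return Math.min(sum, maxSize);
    }

    public static void main(String[] args) {
        List<Integer> intVarchar = List.of(5, VARLEN); // (INT, VARCHAR)
        int userMax = 100;                             // a user who raised the max payload size
        System.out.println("old=" + oldInlineSize(intVarchar, userMax)
            + " new=" + newInlineSize(intVarchar, userMax)); // old=100 new=15
    }
}
```

With a raised maximum of 100, an (INT, VARCHAR) index drops from 100 to 15 bytes under the new rule, which is the "can even save some space" case; with the maximum left at 10, both rules yield 10.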
Re: Update of the default inline size for variable types
Hi,

In my opinion, an inline size of 64 could significantly increase storage size, and users may find that hard to understand.

I remember that earlier we planned to replace the inline value with its hash code when the value is larger than the inline size. That would help with "==" and "!=" comparisons without increasing storage size.

I think the hash-code optimization is preferable, and as a last resort anyone can increase the inline size through the API.

--
Vladislav Pyatkov

On Wed, Aug 19, 2020 at 9:22 AM Zhenya Stanilovsky wrote:
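A hypothetical sketch of the hash-code idea above, not an Ignite implementation: a value that fits is inlined as-is, a longer one is represented only by a fixed-size hash. Different hashes prove inequality, so many "!=" checks never touch the full row; equal hashes may be a collision and still require reading the full value. A hash carries no ordering information, so range predicates (<, >) would still need the full value, which matches the "==", "!=" scope of the suggestion.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HashInlineSketch {
    enum Result { EQUAL, NOT_EQUAL, NEED_FULL_VALUE }

    /** Hypothetical inline entry: the full bytes if they fit, otherwise only a hash. */
    static final class InlineEntry {
        final byte[] bytes; // null when the value was too long to inline
        final int hash;     // meaningful only when bytes == null

        InlineEntry(byte[] bytes, int hash) {
            this.bytes = bytes;
            this.hash = hash;
        }
    }

    static InlineEntry inline(String value, int inlineSize) {
        byte[] b = value.getBytes(StandardCharsets.UTF_8);
        // A long value costs a fixed 4-byte hash instead of an inlineSize-byte prefix.
        return b.length <= inlineSize ? new InlineEntry(b, 0) : new InlineEntry(null, Arrays.hashCode(b));
    }

    /** Equality check that needs the full row only when inline data cannot decide. */
    static Result equalsInline(InlineEntry a, InlineEntry b) {
        boolean aFull = a.bytes != null;
        boolean bFull = b.bytes != null;
        if (aFull && bFull)
            return Arrays.equals(a.bytes, b.bytes) ? Result.EQUAL : Result.NOT_EQUAL;
        if (aFull != bFull)
            return Result.NOT_EQUAL; // same inlineSize assumed: one fits and one does not, so lengths differ
        // Both values are long: different hashes prove inequality, equal hashes may be a collision.
        return a.hash != b.hash ? Result.NOT_EQUAL : Result.NEED_FULL_VALUE;
    }

    public static void main(String[] args) {
        int inlineSize = 10;
        System.out.println(equalsInline(inline("abc", inlineSize), inline("abd", inlineSize)));
        System.out.println(equalsInline(inline("customer-0000000001", inlineSize),
            inline("customer-0000000001", inlineSize)));
    }
}
```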
Re: Update of the default inline size for variable types
> Hi guys,

Evgeniy, hola!

> Currently if a varlength type (such as String or byte[]) is encountered in
> the composite index inline size just defaults to 10, which is almost always
> not enough. I am going to change this and implement following changes:
>
> 1) For a column of the variable length keep using 10 as the default size in
> case of the one-column index. But if the index is composite the default
> index size will be calculated as the sum of sizes of all indexed columns.
> For example, for the index like (INT, VARCHAR, VARCHAR, INT) default inline
> size will be 5 + 10 + 10 + 5 = 30 (5 for each int, 10 for each string).

Why exactly this approach? Why not just 5 + 10 and stop there? Do you have some logical basis, a statistical distribution or something similar? For now this looks like your own decision and nothing more, or am I wrong?

> 2) For sql varchar and binary columns with defined length (for example
> VARCHAR(XX)) use XX + 3 as default inline size for the column (need 3 extra
> bytes for the inner representation of the type).

The same question here: why do you want to cover the whole VARCHAR length? Did you compare this with other vendors' approaches?

> 3) Maximum default index size still will be limited by
> IGNITE_MAX_INDEX_PAYLOAD_SIZE, but its default value will be increased to
> 64. For example for the index (VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR,
> VARCHAR, VARCHAR) default index size will be only 64. Same for the columns
> with defined length: by default VARCHAR(100) column will create index only
> with size equal to 64.
>
> Please tell if you have any concerns. Update can be found at
> https://github.com/apache/ignite/pull/8161
>
> Best regards,
> Evgeniy
Update of the default inline size for variable types
Hi guys,

Currently, if a variable-length type (such as String or byte[]) is encountered in a composite index, the inline size just defaults to 10, which is almost always not enough. I am going to change this and implement the following changes:

1) For a variable-length column, keep using 10 as the default size in the case of a single-column index. But if the index is composite, the default index size will be calculated as the sum of the sizes of all indexed columns. For example, for an index like (INT, VARCHAR, VARCHAR, INT), the default inline size will be 5 + 10 + 10 + 5 = 30 (5 for each INT, 10 for each string).

2) For SQL VARCHAR and BINARY columns with a defined length (for example, VARCHAR(XX)), use XX + 3 as the default inline size for the column (3 extra bytes are needed for the internal representation of the type).

3) The maximum default index size will still be limited by IGNITE_MAX_INDEX_PAYLOAD_SIZE, but its default value will be increased to 64. For example, for the index (VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR), the default index size will be only 64 rather than 70. The same applies to columns with a defined length: by default, a VARCHAR(100) column will create an index of size 64 rather than 103.

Please tell me if you have any concerns. The update can be found at https://github.com/apache/ignite/pull/8161

Best regards,
Evgeniy
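The three rules above can be sketched as a per-column size summed over the index and capped at the maximum. This is a minimal illustration under the numbers stated in the proposal (5 per INT, 10 per unbounded VARCHAR/BINARY, XX + 3 for a declared length, cap 64); the `Kind`/`Col` model and method names are hypothetical and do not come from the PR.

```java
import java.util.List;

public class DefaultInlineSize {
    // Hypothetical column model for this sketch; these are not Ignite classes.
    enum Kind { INT, VARCHAR, BINARY }

    static final class Col {
        final Kind kind;
        final int declaredLength; // -1 when no length was declared

        Col(Kind kind, int declaredLength) {
            this.kind = kind;
            this.declaredLength = declaredLength;
        }
    }

    static final int INT_SIZE = 5;        // "5 for each int"
    static final int VARLEN_DEFAULT = 10; // VARCHAR/BINARY without a declared length
    static final int LENGTH_OVERHEAD = 3; // "XX + 3" for VARCHAR(XX)/BINARY(XX)
    static final int PROPOSED_MAX = 64;   // proposed new default cap

    static int columnSize(Col c) {
        switch (c.kind) {
            case INT:
                return INT_SIZE;
            case VARCHAR:
            case BINARY:
                return c.declaredLength >= 0 ? c.declaredLength + LENGTH_OVERHEAD : VARLEN_DEFAULT;
            default:
                throw new IllegalArgumentException("Unsupported kind: " + c.kind);
        }
    }

    /** Sum of per-column sizes, capped at the configured maximum. */
    static int defaultInlineSize(List<Col> cols, int maxSize) {
        int sum = 0;
        for (Col c : cols)
            sum += columnSize(c);
        return Math.min(sum, maxSize);
    }

    public static void main(String[] args) {
        Col i = new Col(Kind.INT, -1);
        Col v = new Col(Kind.VARCHAR, -1);
        System.out.println(defaultInlineSize(List.of(i, v, v, i), PROPOSED_MAX));                 // 30
        System.out.println(defaultInlineSize(List.of(v), PROPOSED_MAX));                          // 10
        System.out.println(defaultInlineSize(List.of(v, v, v, v, v, v, v), PROPOSED_MAX));        // 64
        System.out.println(defaultInlineSize(List.of(new Col(Kind.VARCHAR, 100)), PROPOSED_MAX)); // 64
    }
}
```

Because a single-column index is just a one-element sum, the "single VARCHAR without length stays 10" requirement falls out of rule 1 without a special case.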