Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
On 14.10.2016 at 10:08, Tony Whyman via Lazarus wrote:
> On 14/10/16 06:43, LacaK via Lazarus wrote:
> > I do not know IBX, but don't you use an overridden TDataSet.InternalInitFieldDefs? It lets you put extra info into the FieldDef, and an overridden TDataSet.CreateFields then lets you pass that extra info from TIBFieldDef into TIBStringField, for example ... (AFAICS Zeos does it this way too)
>
> That is basically what IBX does.

Not only IBX, I think ;-) I suppose all TDataSet descendants must follow this, because in Delphi TFieldDef.CreateField is also not virtual. There is probably a reason why it is designed this way (maybe CreateFields/BindFields are meant to take care of TFieldDef->TField).

> My point is that it would be better to put the passing of the extra info into a subclassed TFieldDef rather than have it in TIBCustomDataSet.

I understand, but I think you must also take care of persistent fields, where you have to hook into BindFields. (In any case there are only 3 lines of code in CreateFields, which iterate over FieldDefs and create the fields, so IMO it is no big problem to override it.)

-Laco.
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
http://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
On 14/10/16 06:43, LacaK via Lazarus wrote:
> I do not know IBX, but don't you use an overridden TDataSet.InternalInitFieldDefs? It lets you put extra info into the FieldDef, and an overridden TDataSet.CreateFields then lets you pass that extra info from TIBFieldDef into TIBStringField, for example ... (AFAICS Zeos does it this way too)

That is basically what IBX does. My point is that it would be better to put the passing of the extra info into a subclassed TFieldDef rather than have it in TIBCustomDataSet. After all, isn't the whole point of OO programming to group related functionality into the same class? If you recommend subclassing TFieldDef, then surely it makes sense to make CreateField a virtual method.
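The design argued for above can be sketched roughly as follows. This is only an illustration under stated assumptions: TSketchFieldDef, TSketchIBFieldDef and TSketchStringField are hypothetical stand-ins, not the FCL or IBX classes, and the FCL's own TFieldDef.CreateField takes different parameters. The point is merely that a virtual factory method lets the subclassed definition hand its character set information to the field at the moment the field is created.

```pascal
program VirtualCreateFieldSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for a string field; NOT the FCL TStringField.
  TSketchStringField = class
  public
    Size: Integer;
    CodePage: Word;
  end;

  // Hypothetical stand-in for a field definition; NOT the FCL TFieldDef.
  TSketchFieldDef = class
  protected
    FSize: Integer;
  public
    constructor Create(ASize: Integer);
    // Declared virtual, which is the change requested in the thread.
    function CreateField: TSketchStringField; virtual;
  end;

  // Hypothetical driver-specific definition carrying extra info.
  TSketchIBFieldDef = class(TSketchFieldDef)
  private
    FCodePage: Word;
  public
    constructor Create(ASize: Integer; ACodePage: Word);
    function CreateField: TSketchStringField; override;
  end;

constructor TSketchFieldDef.Create(ASize: Integer);
begin
  inherited Create;
  FSize := ASize;
end;

function TSketchFieldDef.CreateField: TSketchStringField;
begin
  Result := TSketchStringField.Create;
  Result.Size := FSize;
  Result.CodePage := 0;  // base class has no character set information
end;

constructor TSketchIBFieldDef.Create(ASize: Integer; ACodePage: Word);
begin
  inherited Create(ASize);
  FCodePage := ACodePage;
end;

function TSketchIBFieldDef.CreateField: TSketchStringField;
begin
  Result := inherited CreateField;
  Result.CodePage := FCodePage;  // extra info is set when the field is created
end;

var
  Def: TSketchFieldDef;
  Fld: TSketchStringField;
begin
  Def := TSketchIBFieldDef.Create(20, 65001);  // 65001 is the UTF-8 code page
  Fld := Def.CreateField;                      // dispatches to the subclass
  WriteLn('Size=', Fld.Size, ' CodePage=', Fld.CodePage);
  Fld.Free;
  Def.Free;
end.
```

With a non-virtual CreateField the same copying has to live in the dataset class instead, which is the situation described for IBX.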
Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
> > I am aware of it. I have not added all the other MBCS because I doubt which of them are really used nowadays. My guess is that UTF-8 is by far the most used / supported client character set. No problem to add them if there is real demand from users ...
>
> Perhaps the correct answer is to let the database driver work this one out rather than have a fixed decision in the FCL. I would suggest the following change:
>
> function TStringField.GetDataSize: Integer;

My intention was to make TStringField independent of TFieldDef, if only because we can have FieldDef=nil (for lookup and calculated fields). Of course I can introduce a class procedure (or a regular procedure) with an ACodePage parameter, which will be called from both TStringField and TFieldDef, so all the code will be in one place.

> In IBX, I have already done this using TIBFieldDef and TIBStringField as subclasses in order to pass character set information. However, because TFieldDef.CreateField is non-virtual, the implementation is not as elegant as it should be. That is, the extra info is added to the TIBStringField as the dataset is opened

I do not know IBX, but don't you use an overridden TDataSet.InternalInitFieldDefs? It lets you put extra info into the FieldDef, and an overridden TDataSet.CreateFields then lets you pass that extra info from TIBFieldDef into TIBStringField, for example ... (AFAICS Zeos does it this way too)

-Laco.

> rather than when the field is created. It is also less maintainable as the functionality should be in TIBFieldDef rather than in a different class altogether. Making those two methods virtual is the most important change. I can live with TStringField.GetDataSize as it is because that is already virtual and a future TIBStringField can readily override it.
>
> Tony Whyman MWA
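The "one place for all the code" idea mentioned above might look something like this. It is a sketch only: StringDataSize and UTF8CodePage are hypothetical names chosen for illustration, not FCL identifiers, and the rule inside mirrors the one quoted in the thread (4*Size+1 for UTF-8, Size+1 otherwise). Because the helper takes the code page as a parameter, it needs no FieldDef and so also works for lookup and calculated fields.

```pascal
program DataSizeHelperSketch;
{$mode objfpc}{$H+}

const
  UTF8CodePage = 65001;  // numeric value of the UTF-8 code page

// Hypothetical shared helper: bytes needed in the record buffer for a
// string field of ASize characters, including the terminating #0.
function StringDataSize(ASize: Integer; ACodePage: Word): Integer;
begin
  case ACodePage of
    UTF8CodePage: Result := 4 * ASize + 1;  // up to 4 bytes per character
  else
    Result := ASize + 1;                    // treated as single-byte
  end;
end;

begin
  WriteLn(StringDataSize(10, UTF8CodePage));  // prints 41
  WriteLn(StringDataSize(10, 1252));          // prints 11
end.
```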
Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
On 11/10/16 19:21, LacaK via Lazarus wrote:
> I am aware of it. I have not added all the other MBCS because I doubt which of them are really used nowadays. My guess is that UTF-8 is by far the most used / supported client character set. No problem to add them if there is real demand from users ...

Perhaps the correct answer is to let the database driver work this one out rather than have a fixed decision in the FCL. I would suggest the following change:

    function TStringField.GetDataSize: Integer;
    begin
      Result := FieldDef.CharSize * Size + 1;
      // case FCodePage of
      //   CP_UTF8: Result := 4*Size+1;
      //   else Result := Size+1;
      // end;
    end;

TFieldDef.GetCharSize uses the same algorithm, so this avoids code duplication anyway. But I also want to make TFieldDef.GetCharSize and TFieldDef.CreateField virtual methods. That way a database driver can readily expand upon the character sets supported to match what it supports rather than be limited by the FCL default.

In IBX, I have already done this using TIBFieldDef and TIBStringField as subclasses in order to pass character set information. However, because TFieldDef.CreateField is non-virtual, the implementation is not as elegant as it should be. That is, the extra info is added to the TIBStringField as the dataset is opened rather than when the field is created. It is also less maintainable as the functionality should be in TIBFieldDef rather than in a different class altogether.

Making those two methods virtual is the most important change. I can live with TStringField.GetDataSize as it is because that is already virtual and a future TIBStringField can readily override it.

Tony Whyman
MWA
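To illustrate why a virtual GetCharSize would help a driver, here is a small sketch. The classes are hypothetical stand-ins (TSketchFieldDef and TSketchDriverFieldDef are not FCL or IBX classes); the base class knows only UTF-8, as described in the thread, and the driver subclass adds GB18030 (code page 54936, up to 4 bytes per character) without touching the base code.

```pascal
program VirtualCharSizeSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for a field definition; NOT the FCL TFieldDef.
  TSketchFieldDef = class
  private
    FSize: Integer;
    FCodePage: Word;
  protected
    function GetCharSize: Integer; virtual;
  public
    constructor Create(ASize: Integer; ACodePage: Word);
    function DataSize: Integer;
    property CodePage: Word read FCodePage;
    property CharSize: Integer read GetCharSize;
  end;

  // Hypothetical driver subclass that knows one more multi-byte code page.
  TSketchDriverFieldDef = class(TSketchFieldDef)
  protected
    function GetCharSize: Integer; override;
  end;

constructor TSketchFieldDef.Create(ASize: Integer; ACodePage: Word);
begin
  inherited Create;
  FSize := ASize;
  FCodePage := ACodePage;
end;

function TSketchFieldDef.GetCharSize: Integer;
begin
  // Default rule from the thread: only UTF-8 (65001) counts as multi-byte.
  if FCodePage = 65001 then
    Result := 4
  else
    Result := 1;
end;

function TSketchFieldDef.DataSize: Integer;
begin
  // Same formula as the proposed GetDataSize: CharSize * Size + 1.
  Result := CharSize * FSize + 1;
end;

function TSketchDriverFieldDef.GetCharSize: Integer;
begin
  if CodePage = 54936 then       // GB18030
    Result := 4
  else
    Result := inherited GetCharSize;
end;

var
  Def: TSketchFieldDef;
begin
  Def := TSketchDriverFieldDef.Create(10, 54936);
  WriteLn(Def.DataSize);  // prints 41; the base class alone would give 11
  Def.Free;
end.
```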
Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
> Which FPC version is this likely to be released in?

3.0.2 - no
3.0.4 - ?
3.2.0 - yes

> On a quick review of the code, all seems good. Just one point:
> GetDataSize seems to acknowledge CP_UTF8 as the only multibyte
> character set. The Firebird character set GB18030 (Chinese
> characters) is multi-byte (see Wikipedia) and has code page 54936. I
> believe PostgreSQL also supports it.

I am aware of it. I have not added all the other MBCS because I doubt which of them are really used nowadays. My guess is that UTF-8 is by far the most used / supported client character set. No problem to add them if there is real demand from users ...

-Laco.
Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
On Tuesday 11 October 2016 16:19:18 Tony Whyman via Lazarus wrote:
> On 11/10/16 15:14, Martin Schreiber via Lazarus wrote:
> > case i2 of
> > 5,6,8,44,56,57,64: begin
>
> Agree with 5,6, 44, 56, 57 as two byte character sets.
>
> 8 doesn't seem to exist (at least in my version).
>
> 64 is KOI8U. According to Wikipedia "KOI8-U is an 8-bit character
> encoding, designed to cover Ukrainian, which uses a Cyrillic alphabet.
> It is based on KOI8-R, which covers Russian and Bulgarian, but replaces
> eight graphic characters with four Ukrainian letters Ґ, Є, І, and Ї in
> both upper case and lower case."
>
> It should be the same character width as character set 63 (KOI8R) i.e. 1.

I don't remember where I got the numbers from. Maybe from reading the FB sources or from reading RDB$BYTES_PER_CHARACTER. On my current FB 2.5 installation "64" actually is 1 byte per character; thanks for pointing it out, it has already been fixed in git master.

In MSEgui and the newer ZEOSlib versions the data size of a string in the TDataset record buffer is sizeof(pointer). In MSEgui it is a UnicodeString, in ZEOS IIRC a pointer to a special string record.

Martin
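With that correction, a width guess keyed on the Firebird character set ID could be sketched as below. This is not the MSEgui code (only a fragment of it is quoted in the thread); CharWidthOfCharsetID is a hypothetical name, the widths for IDs 3 and 4 are taken from the IBX table posted in this thread, and defaulting every other ID to 1 is an assumption. A driver could instead read the width from RDB$BYTES_PER_CHARACTER, as mentioned above.

```pascal
program FirebirdCharWidthSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: maximum bytes per character for a Firebird
// character set ID, using only values discussed in the thread.
function CharWidthOfCharsetID(ACharsetID: Integer): Integer;
begin
  case ACharsetID of
    3: Result := 3;                 // UNICODE_FSS
    4: Result := 4;                 // UTF8
    5, 6, 44, 56, 57: Result := 2;  // SJIS_0208, EUCJ_0208, KSC_5601, BIG_5, GB_2312
  else
    Result := 1;                    // assumed single-byte, e.g. KOI8R (63), KOI8U (64)
  end;
end;

begin
  WriteLn(CharWidthOfCharsetID(4));   // prints 4
  WriteLn(CharWidthOfCharsetID(57));  // prints 2
  WriteLn(CharWidthOfCharsetID(64));  // prints 1
end.
```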
Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
On 11/10/16 15:14, Martin Schreiber via Lazarus wrote:
> MSEgui uses below code in order to guess the Firebird character size:

FYI, this is the table IBX uses to look up character sets and code pages:

    CharSetMap: array [0..69] of TCharsetMap = (
      (CharsetID: 0; CharSetName: 'NONE'; CharSetWidth: 1; CodePage: CP_NONE),
      (CharsetID: 1; CharSetName: 'OCTETS'; CharSetWidth: 1; CodePage: CP_NONE),
      (CharsetID: 2; CharSetName: 'ASCII'; CharSetWidth: 1; CodePage: CP_ASCII),
      (CharsetID: 3; CharSetName: 'UNICODE_FSS'; CharSetWidth: 3; CodePage: CP_UTF8),
      (CharsetID: 4; CharSetName: 'UTF8'; CharSetWidth: 4; CodePage: CP_UTF8),
      (CharsetID: 5; CharSetName: 'SJIS_0208'; CharSetWidth: 2; CodePage: 20932),
      (CharsetID: 6; CharSetName: 'EUCJ_0208'; CharSetWidth: 2; CodePage: 20932),
      (CharsetID: 7; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 8; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 9; CharSetName: 'DOS737'; CharSetWidth: 1; CodePage: 737),
      (CharsetID: 10; CharSetName: 'DOS437'; CharSetWidth: 1; CodePage: 437),
      (CharsetID: 11; CharSetName: 'DOS850'; CharSetWidth: 1; CodePage: 850),
      (CharsetID: 12; CharSetName: 'DOS865'; CharSetWidth: 1; CodePage: 865),
      (CharsetID: 13; CharSetName: 'DOS860'; CharSetWidth: 1; CodePage: 860),
      (CharsetID: 14; CharSetName: 'DOS863'; CharSetWidth: 1; CodePage: 863),
      (CharsetID: 15; CharSetName: 'DOS775'; CharSetWidth: 1; CodePage: 775),
      (CharsetID: 16; CharSetName: 'DOS858'; CharSetWidth: 1; CodePage: 858),
      (CharsetID: 17; CharSetName: 'DOS862'; CharSetWidth: 1; CodePage: 862),
      (CharsetID: 18; CharSetName: 'DOS864'; CharSetWidth: 1; CodePage: 864),
      (CharsetID: 19; CharSetName: 'NEXT'; CharSetWidth: 1; CodePage: CP_NONE),
      (CharsetID: 20; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 21; CharSetName: 'ISO8859_1'; CharSetWidth: 1; CodePage: 28591),
      (CharsetID: 22; CharSetName: 'ISO8859_2'; CharSetWidth: 1; CodePage: 28592),
      (CharsetID: 23; CharSetName: 'ISO8859_3'; CharSetWidth: 1; CodePage: 28593),
      (CharsetID: 24; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 25; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 26; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 27; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 28; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 29; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 30; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 31; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 32; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 33; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 34; CharSetName: 'ISO8859_4'; CharSetWidth: 1; CodePage: 28594),
      (CharsetID: 35; CharSetName: 'ISO8859_5'; CharSetWidth: 1; CodePage: 28595),
      (CharsetID: 36; CharSetName: 'ISO8859_6'; CharSetWidth: 1; CodePage: 28596),
      (CharsetID: 37; CharSetName: 'ISO8859_7'; CharSetWidth: 1; CodePage: 28597),
      (CharsetID: 38; CharSetName: 'ISO8859_8'; CharSetWidth: 1; CodePage: 28598),
      (CharsetID: 39; CharSetName: 'ISO8859_9'; CharSetWidth: 1; CodePage: 28599),
      (CharsetID: 40; CharSetName: 'ISO8859_13'; CharSetWidth: 1; CodePage: 28603),
      (CharsetID: 41; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 42; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 43; CharSetName: 'Unknown'; CharSetWidth: 0; CodePage: CP_NONE),
      (CharsetID: 44; CharSetName: 'KSC_5601'; CharSetWidth: 2; CodePage: 949),
      (CharsetID: 45; CharSetName: 'DOS852'; CharSetWidth: 1; CodePage: 852),
      (CharsetID: 46; CharSetName: 'DOS857'; CharSetWidth: 1; CodePage: 857),
      (CharsetID: 47; CharSetName: 'DOS861'; CharSetWidth: 1; CodePage: 861),
      (CharsetID: 48; CharSetName: 'DOS866'; CharSetWidth: 1; CodePage: 866),
      (CharsetID: 49; CharSetName: 'DOS869'; CharSetWidth: 1; CodePage: 869),
      (CharsetID: 50; CharSetName: 'CYRL'; CharSetWidth: 1; CodePage: 1251),
      (CharsetID: 51; CharSetName: 'WIN1250'; CharSetWidth: 1; CodePage: 1250),
      (CharsetID: 52; CharSetName: 'WIN1251'; CharSetWidth: 1; CodePage: 1251),
      (CharsetID: 53; CharSetName: 'WIN1252'; CharSetWidth: 1; CodePage: 1252),
      (CharsetID: 54; CharSetName: 'WIN1253'; CharSetWidth: 1; CodePage: 1253),
      (CharsetID: 55; CharSetName: 'WIN1254'; CharSetWidth: 1; CodePage: 1254),
      (CharsetID: 56; CharSetName: 'BIG_5'; CharSetWidth: 2; CodePage: 950),
      (CharsetID: 57; CharSetName: 'GB_2312'; CharSetWidth: 2; CodePage: 936),
      (CharsetID: 58; CharSetName: 'WIN1255'; CharSetWidth: 1; CodePage: 1255),
      (CharsetID: 59; CharSetName: 'WIN1256'; CharSetWidth: 1; CodePage: 1256),
      (CharsetID: 60; CharSetName: 'WIN1257'; CharSetWidth: 1; CodePage: 1257),
      (CharsetID: 61;
Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
On 11/10/16 15:14, Martin Schreiber via Lazarus wrote:
> case i2 of
> 5,6,8,44,56,57,64: begin

Agree with 5,6, 44, 56, 57 as two byte character sets.

8 doesn't seem to exist (at least in my version).

64 is KOI8U. According to Wikipedia "KOI8-U is an 8-bit character encoding, designed to cover Ukrainian, which uses a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight graphic characters with four Ukrainian letters Ґ, Є, І, and Ї in both upper case and lower case."

It should be the same character width as character set 63 (KOI8R) i.e. 1.
Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
> Please look at the changes in TRUNK. Maybe not everything is perfect, but you will see the direction there ...
>
> -Laco.

Which FPC version is this likely to be released in?

On a quick review of the code, all seems good. Just one point: GetDataSize seems to acknowledge CP_UTF8 as the only multibyte character set. The Firebird character set GB18030 (Chinese characters) is multi-byte (see Wikipedia) and has code page 54936. I believe PostgreSQL also supports it.
Re: [Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
> An IBX user came to me with a problem and the problem seems to be a deep seated disconnect between multi-byte character sets, TStringField.Size and TDBEdit.MaxLength. Something needs to give - but I am not sure what should.
>
> Firstly documentation: If you go back to Delphi, TField.DataSize is the memory needed to hold the Field's value. The DisplayWidth is the number of characters to be displayed, and Size is, for datatype ftstring, "the maximum number of characters in the string".

Right. TStringField.Size is the size in characters, not bytes.

> How literally this last definition should be taken, I'm not sure, as it may well have been written assuming a single byte character set. On the other hand, the FPC documentation is consistent with Delphi for DisplayWidth and DataSize, but more opaque for TField.Size where it is the "logical size" - whatever that means, although TStringField is more definitive by saying it is the maximum size (in characters) - their brackets not mine.
>
> That seems to be consistent with TDBEdit.MaxLength which should be the maximum number of characters that can appear in the control and, if you look at the code, TDBEdit will source the default value from FDatalink.Size (And also seems to ignore DisplayWidth).

TDBEdit.MaxLength must correspond to TStringField.Size.

> The problem comes when you look at the code for TStringField.GetValue, where it starts off as:
>
>     function TStringField.GetValue(var AValue: string): Boolean;
>     var
>       Buf, TBuf : TStringFieldBuffer;
>       DynBuf, TDynBuf : Array of char;
>     begin
>       if DataSize <= dsMaxStringSize then
>       begin
>         Result:=GetData(@Buf);
>         Buf[Size]:=#0; //limit string to Size
>         If Result then
>         begin
>           ...

Look at TRUNK; the code there has already been changed to use DataSize ;-)

> If nothing else, this is a "bug in waiting". TStringField.GetDataSize always returns "Size+1", so "Buf[Size]:=#0;" should work - but only as long as the virtual method "GetDataSize" is not overridden (GetValue is non-virtual) and Size is the byte length of the string!

In TRUNK GetDataSize has changed as well: it takes the field's character set into account and returns 4*Size+1 for UTF8.

> There is a built-in assumption here that "Size" is the byte length of the string and not the character length.

This assumption comes from the old Delphi days, when it held for SBCS.

> If you have a multi-byte character set and set size to the number of characters and DataSize to e.g. for UTF8 4*(no of characters)+1, then you will get string corruption as a result of the above.
>
> IBX handles multi-byte character sets and does so by defining TIBStringField as a subclass of TStringField and setting size to the byte length and the Default DisplayWidth to the character width. This is compatible with TStringField as it works today. It also seems to be compatible with TDBGrid, which uses Field.DisplayWidth. However, it does result in TDBEdit accepting too many characters.
>
> What should be done? It's a problem.
>
> Ideally, the TStringField code should be aligned with the documentation. However, that could break existing code and would need to be handled carefully. TStringField also needs fixing i.e. to Buf[DataSize-1]:=#0 in order to make this a reality.

Size must be the character size (used by visual components if they handle characters).
DataSize must be the byte size (used for record buffers to store character data).

> Alternatively, the documentation could be amended to reflect the implementation. This means that TDBEdit (and maybe more) have to be updated - but why doesn't TDBEdit respect the DisplayWidth property anyway?
>
> Perhaps, it is also about time that TStringField got a characterWidth property to hold the maximum number of bytes for each character. That would at least allow the DataSize to be automatically computed from the character width.

There is a new TFieldDef.CharSize, which says how many bytes one character takes.

> If I had to write a bug report today, I would write it to avoid changes to IBX - but then is that the right answer?

Please look at the changes in TRUNK. Maybe not everything is perfect, but you will see the direction there ...

-Laco.
[Lazarus] TDBEdit, TStringField Size, DataSize, DisplayWidth and MaxLength
An IBX user came to me with a problem and the problem seems to be a deep seated disconnect between multi-byte character sets, TStringField.Size and TDBEdit.MaxLength. Something needs to give - but I am not sure what should.

Firstly documentation: If you go back to Delphi, TField.DataSize is the memory needed to hold the Field's value. The DisplayWidth is the number of characters to be displayed, and Size is, for datatype ftstring, "the maximum number of characters in the string". How literally this last definition should be taken, I'm not sure, as it may well have been written assuming a single byte character set.

On the other hand, the FPC documentation is consistent with Delphi for DisplayWidth and DataSize, but more opaque for TField.Size where it is the "logical size" - whatever that means, although TStringField is more definitive by saying it is the maximum size (in characters) - their brackets not mine.

That seems to be consistent with TDBEdit.MaxLength which should be the maximum number of characters that can appear in the control and, if you look at the code, TDBEdit will source the default value from FDatalink.Size (And also seems to ignore DisplayWidth).

The problem comes when you look at the code for TStringField.GetValue, where it starts off as:

    function TStringField.GetValue(var AValue: string): Boolean;
    var
      Buf, TBuf : TStringFieldBuffer;
      DynBuf, TDynBuf : Array of char;
    begin
      if DataSize <= dsMaxStringSize then
      begin
        Result:=GetData(@Buf);
        Buf[Size]:=#0; //limit string to Size
        If Result then
        begin
          ...

If nothing else, this is a "bug in waiting". TStringField.GetDataSize always returns "Size+1", so "Buf[Size]:=#0;" should work - but only as long as the virtual method "GetDataSize" is not overridden (GetValue is non-virtual) and Size is the byte length of the string!

There is a built-in assumption here that "Size" is the byte length of the string and not the character length. If you have a multi-byte character set and set size to the number of characters and DataSize to e.g. for UTF8 4*(no of characters)+1, then you will get string corruption as a result of the above.

IBX handles multi-byte character sets and does so by defining TIBStringField as a subclass of TStringField and setting size to the byte length and the Default DisplayWidth to the character width. This is compatible with TStringField as it works today. It also seems to be compatible with TDBGrid, which uses Field.DisplayWidth. However, it does result in TDBEdit accepting too many characters.

What should be done? It's a problem.

Ideally, the TStringField code should be aligned with the documentation. However, that could break existing code and would need to be handled carefully. TStringField also needs fixing i.e. to Buf[DataSize-1]:=#0 in order to make this a reality.

Alternatively, the documentation could be amended to reflect the implementation. This means that TDBEdit (and maybe more) have to be updated - but why doesn't TDBEdit respect the DisplayWidth property anyway?

Perhaps, it is also about time that TStringField got a characterWidth property to hold the maximum number of bytes for each character. That would at least allow the DataSize to be automatically computed from the character width.

If I had to write a bug report today, I would write it to avoid changes to IBX - but then is that the right answer?

Tony Whyman
MWA
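The corruption described above can be reproduced in isolation with a small stand-alone sketch. It does not use the FCL at all: the buffer, FieldSize and BufDataSize are illustrative names, and the 4-bytes-per-character figure for UTF-8 is the one used in this message. The test string is three characters long but occupies six bytes in UTF-8.

```pascal
program TerminatorSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  FieldSize    = 3;                             // Size: maximum number of characters
  BytesPerChar = 4;                             // worst case assumed for UTF-8
  BufDataSize  = BytesPerChar * FieldSize + 1;  // bytes, including the terminating #0

var
  Buf: array[0..BufDataSize - 1] of AnsiChar;
  Value: AnsiString;
begin
  // Raw UTF-8 bytes of three two-byte characters (U+017E, U+0161, U+010D).
  Value := #$C5#$BE#$C5#$A1#$C4#$8D;

  FillChar(Buf, SizeOf(Buf), 0);
  Move(Value[1], Buf[0], Length(Value));

  // Terminating at Size treats Size as a byte count and cuts the
  // second character in half.
  Buf[FieldSize] := #0;
  WriteLn('Terminated at Size:       ', StrLen(PAnsiChar(@Buf[0])), ' bytes');  // 3

  // Restore the data and terminate at the last byte of the buffer instead.
  Move(Value[1], Buf[0], Length(Value));
  Buf[BufDataSize - 1] := #0;
  WriteLn('Terminated at DataSize-1: ', StrLen(PAnsiChar(@Buf[0])), ' bytes');  // 6
end.
```

The first result leaves an incomplete UTF-8 sequence at the end of the string, which is the kind of corruption a multi-byte field would see if Size holds characters while the buffer is still terminated at Buf[Size].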