Re: [SQL] UTF8 encoding and non-text data types
Joe [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Oh? Interesting. But even if we wanted to teach Postgres about that, wouldn't there be a pretty strong risk of getting confused by Arabic's right-to-left writing direction? Wouldn't be real helpful if the entry came out as 4321 when the user wanted 1234. Definitely seems like something that had better be left to the application side, where there's more context about what the string means.
>
> The Arabic language is written right-to-left, except ... when it comes to numbers.

I don't think that matters anyway. Unicode strings are always in logical order, not display order. Displaying the string in the right order is up to the display engine in the Unicode world-view.

I'm not sure what to think about this, though. It may be that Arabic notation is close enough that it would be straightforward (IIRC decimal notation was invented in the Arabic world, after all). But other writing systems have some pretty baroque notations which would be far more difficult to convert. If anything, I would expect this kind of conversion to live in the same place as things like Roman numerals or other more flexible formatting.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
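The logical-order point can be checked with a few lines of code. This is only a sketch, written in Python rather than the Perl used in the thread (an assumption made for illustration); the variable and function names are hypothetical. It shows that a number typed with Extended Arabic-Indic digits is stored most significant digit first, so reading the string front to back yields 1, 2, 3, 4.

```python
import unicodedata

# "1234" written with Extended Arabic-Indic digits (U+06F1..U+06F4).
# The string is in logical (storage) order; how it is drawn on screen
# is a separate concern handled by the display engine.
persian_1234 = "\u06f1\u06f2\u06f3\u06f4"


def digit_values(text):
    """Return the numeric value of each decimal digit, in storage order."""
    return [unicodedata.digit(ch) for ch in text]


print(digit_values(persian_1234))
print(int(persian_1234))  # int() accepts Unicode decimal digits
```

Nothing here reverses the string, which is the behaviour the message above describes: direction only matters at display time.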
Re: [SQL] UTF8 encoding and non-text data types
Joe writes:
> The Arabic language is written right-to-left, except ... when it comes to numbers.

Perhaps they read their numbers right to left but use a little-endian notation.

--
John Hasler [EMAIL PROTECTED]
Elmwood, WI USA
Re: [SQL] UTF8 encoding and non-text data types
Thanks Steve,

Actually, I do not insert text data into my numeric field. As I mentioned, given

create table t1 ( name text, cost decimal )

I would like to insert numeric data into column cost, because then I can later benefit from numerical operators like SUM, AVG, etc.

More specifically, I am using HTML, Perl and PG. From the HTML point of view, a text field is just a string. So my user would enter 12345, but expressed in UTF8. Perl would get this and use DBI to insert it into PG.

What I am experiencing now is that the DB reports an error saying I am trying to insert incorrect data into column cost, which is numeric, and the data is coming in from HTML in UTF8.

Maybe I have to convert it to ASCII numbers in Perl before inserting them into PG.

Thanks
Medi

On Jan 13, 2008 8:51 PM, Steve Midgley [EMAIL PROTECTED] wrote:
> At 02:22 PM 1/13/2008, [EMAIL PROTECTED] wrote:
>> Date: Sat, 12 Jan 2008 14:21:00 -0800
>> From: Medi Montaseri [EMAIL PROTECTED]
>> To: pgsql-sql@postgresql.org
>> Subject: UTF8 encoding and non-text data types
>> Message-ID: [EMAIL PROTECTED]
>>
>> I understand PG supports UTF-8 encoding and I have successfully inserted Unicode text into columns. I was wondering about other data types such as numbers, decimal, dates.
>>
>> That is, say I have a table t1 with
>>
>> create table t1 ( name text, cost decimal )
>>
>> I can insert UTF8 text datatype into this table with no problem. But if my application attempts to insert numbers encoded in UTF8, then I get a wrong datatype error.
>>
>> Is the solution for the application layer (not database) to convert the non-text UTF8 numbers to ASCII and then insert them into the database?
>>
>> Thanks
>> Medi
>
> Hi Medi,
>
> I have only limited experience in this area, but it sounds like you are sending your numbers as strings? In your example:
>
> create table t1 ( name text, cost decimal );
> insert into t1 (name, cost) values ('name1', '1');
>
> I can't think of how else you're sending numeric values as UTF8? I know that Pg will accept numbers as strings and convert internally (that has worked for me in some object relational environments where I don't choose to cope with data types), but I think it would be better if you simply didn't send your numeric data in quotations, whether as UTF8 or ASCII. If you don't have control over this layer (that quotes your values), then I'd say converting to ASCII would solve the problem. But better to convert to numeric and not ship quoted strings at all.
>
> I may be totally off-base and missing something fundamental and I'm very open to correction (by anyone), but that's what I can see here.
>
> Best regards,
> Steve
Re: [SQL] UTF8 encoding and non-text data types
Medi Montaseri [EMAIL PROTECTED] writes:
> More specifically, I am using HTML, Perl and PG. So from the HTML point of view a textfield is just some strings. So my user would enter 12345 but expressed in UTF8. Perl would get this and use DBI to insert it into PG.
>
> What I am experiencing now is that DB errors that I am trying to insert an incorrect data into column cost which is numeric and the data is coming in from HTML in UTF8.
>
> Maybe I have to convert it to ASCII numbers in Perl before inserting them into PG.

Uh, there is *no* difference between the ASCII and UTF8 representations of decimal digits, nor of any other character that would be allowed in input for a decimal field. I can't tell what your problem really is, but you have certainly misunderstood or misexplained it.

regards, tom lane
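The claim about ASCII and UTF8 is easy to verify byte by byte. The sketch below is in Python (an assumption; the thread itself uses Perl), and the helper name is made up for illustration. It compares the two encodings of typical decimal input and contrasts them with a non-ASCII digit, which does encode differently.

```python
def same_bytes_in_ascii_and_utf8(text):
    """True when the text encodes to identical bytes in ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains characters that ASCII cannot represent at all.
        return False


print(same_bytes_in_ascii_and_utf8("12345"))
print(same_bytes_in_ascii_and_utf8("-12345.67e2"))

# An Extended Arabic-Indic digit is a different character, not a
# different encoding of "1": in UTF-8 it takes two bytes.
print("\u06f1".encode("utf-8"))
```

So a field that really contains the characters 0-9 is byte-identical either way; a datatype error points to different characters arriving, not to the encoding.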
Re: [SQL] UTF8 encoding and non-text data types
Hi Steve,

Have you tried converting to a decimal type, or using a cast, for the cost field? If you are gathering this data from a text field, placing it in a variable of type string, and then using that variable in the insert statement, it may be rejected because it is not of type decimal. This has been my experience with trying to get input data from users' text fields and placing it in the db.

dana.
Re: [SQL] UTF8 encoding and non-text data types
Sorry, this should have been addressed to Medi.

dana.
Re: [SQL] UTF8 encoding and non-text data types
At 11:01 AM 1/14/2008, Medi Montaseri wrote:
> Maybe I have to convert it to ASCII numbers in Perl before inserting them into PG.

Hi Medi,

I agree that you should convert your values in Perl before handing them to DBI. I'm not familiar with DBI, but presumably if you're sending it UTF8 values it's attempting to quote them, or do something else with them, that a numeric field in Pg can't handle.

Can you trap/monitor the exact SQL statement that is generated by DBI and sent to Pg? That would help a lot in knowing what it is doing, but I suspect that if you just convert your numbers from the HTML/UTF8 source values into actual Perl numeric values and then ship them to DBI, you'll be better off. And you'll get some input validation for free.

I hope this helps,
Steve
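The "input validation for free" idea can be sketched as follows. This is Python standing in for the Perl layer discussed in the thread (an assumption), and `parse_cost` is a hypothetical helper, not part of DBI or any library. Converting the form text to a real numeric value before it goes anywhere near the database rejects bad input at the application boundary.

```python
from decimal import Decimal, InvalidOperation


def parse_cost(raw):
    """Convert form text to a Decimal, rejecting anything non-numeric."""
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        raise ValueError(f"not a decimal number: {raw!r}")
    if not value.is_finite():
        # Decimal also parses "NaN" and "Infinity"; a cost should be neither.
        raise ValueError(f"not a finite number: {raw!r}")
    return value


print(parse_cost("12345"))
print(parse_cost(" 12.50 "))
```

A value that fails here never reaches the SQL layer, so the database only ever sees well-formed numbers.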
Re: [SQL] UTF8 encoding and non-text data types
Here are my traces from the Perl CGI code. I'll include two samples, one in ASCII and one in UTF, so we know what to expect.

Here is the actual SQL statement being executed in Perl and DBI. I do not quote the numerical value; it is just provided to DBI raw.

insert into t1 (c1, cost) values ('tewt', 1234)

This works fine.

insert into t1 (c1, cost) values ('&#1588;&#1583;', &#1777;&#1778;&#1779;&#1780;)

DBD::Pg::db do failed: ERROR: syntax error at or near ; at character 59

And the PG log itself is very similar and says

ERROR: syntax error at or near ; at character 59

Char 59, by the way, is the first occurrence of a semicolon, as in &#1;, which is being caught by the PG parser.

Medi
Re: [SQL] UTF8 encoding and non-text data types
At 12:43 PM 1/14/2008, Medi Montaseri wrote:
> insert into t1 (c1, cost) values ('&#1588;&#1583;', &#1777;&#1778;&#1779;&#1780;)
>
> DBD::Pg::db do failed: ERROR: syntax error at or near ; at character 59

Hi Medi,

That structure for numeric values is never going to work, as best as I understand Postgres (and other SQL pipes). You have to convert those UTF chars to straight numeric format. Hopefully that solves your problem? I hope it's not too hard for you to get at the code which is sending the numbers as UTF?

Steve
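One way the conversion might look end to end is sketched below, under several loudly stated assumptions: it is Python rather than Perl/DBI, it uses an in-memory SQLite database purely as a stand-in for PostgreSQL, and `field_to_int` is a hypothetical helper. The point is the shape of the fix: decode the HTML character references, turn the digits into a real number, and bind values as parameters instead of pasting them into the SQL text.

```python
import html
import sqlite3


def field_to_int(raw):
    """Decode HTML character references, then parse the digits as an int.

    int() accepts any Unicode decimal digits, so Extended Arabic-Indic
    digits become an ordinary integer. Raises ValueError on bad input.
    """
    return int(html.unescape(raw).strip())


# SQLite stands in for PostgreSQL here (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (c1 text, cost decimal)")

name = html.unescape("&#1588;&#1583;")
cost = field_to_int("&#1777;&#1778;&#1779;&#1780;")

# Values are bound as parameters, never interpolated into the statement.
conn.execute("insert into t1 (c1, cost) values (?, ?)", (name, cost))
conn.execute("insert into t1 (c1, cost) values (?, ?)", ("tewt", 1234))

total = conn.execute("select sum(cost) from t1").fetchone()[0]
print(cost, total)
```

With a PostgreSQL driver the placeholder syntax differs, but the division of labour is the same: the application produces a number, and the database receives a number.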
Re: [SQL] UTF8 encoding and non-text data types
Medi Montaseri [EMAIL PROTECTED] writes:
> insert into t1 (c1, cost) values ('tewt', 1234)
> this works fine
>
> insert into t1 (c1, cost) values ('&#1588;&#1583;', &#1777;&#1778;&#1779;&#1780;)
> DBD::Pg::db do failed: ERROR: syntax error at or near ; at character 59,

Well, you've got two problems there. The first and biggest is that &#NNN; is an HTML notation, not a SQL notation; no SQL database is going to think that that string in its input is a representation of a single Unicode character. The other problem is that even if this did happen, code points 1777 and nearby are not digits; they're something or other in Arabic, apparently. So I think you've got a problem in your Unicode conversions as well as a notational problem.

regards, tom lane
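To make the notational problem concrete, the sketch below decodes the HTML numeric character references from the failing statement into the characters they stand for. It is written in Python (an assumption; the thread's application is Perl), using the standard library's `html.unescape`; the `decode_references` wrapper is a hypothetical name.

```python
import html


def decode_references(raw):
    """Replace HTML character references such as &#1777; with the characters."""
    return html.unescape(raw)


name = decode_references("&#1588;&#1583;")
digits = decode_references("&#1777;&#1778;&#1779;&#1780;")

# Each reference becomes exactly one Unicode character.
print([f"U+{ord(ch):04X}" for ch in name])
print([f"U+{ord(ch):04X}" for ch in digits])
```

Until this decoding happens, the database sees only the literal characters `&`, `#`, digits and `;`, which is why the parser stops at the semicolon.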
Re: [SQL] UTF8 encoding and non-text data types
Joe [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Well, you've got two problems there. The first and biggest is that &#NNN; is an HTML notation, not a SQL notation; no SQL database is going to think that that string in its input is a representation of a single Unicode character. The other problem is that even if this did happen, code points 1777 and nearby are not digits; they're something or other in Arabic, apparently.
>
> Precisely. 1777 through 1780 decimal equate to code points U+06F1 through U+06F4, which correspond to the Arabic numerals 1 through 4.

Oh? Interesting. But even if we wanted to teach Postgres about that, wouldn't there be a pretty strong risk of getting confused by Arabic's right-to-left writing direction? Wouldn't be real helpful if the entry came out as 4321 when the user wanted 1234. Definitely seems like something that had better be left to the application side, where there's more context about what the string means.

regards, tom lane
Re: [SQL] UTF8 encoding and non-text data types
Tom Lane wrote:
> Medi Montaseri [EMAIL PROTECTED] writes:
>> insert into t1 (c1, cost) values ('tewt', 1234)
>> this works fine
>>
>> insert into t1 (c1, cost) values ('&#1588;&#1583;', &#1777;&#1778;&#1779;&#1780;)
>> DBD::Pg::db do failed: ERROR: syntax error at or near ; at character 59,
>
> Well, you've got two problems there. The first and biggest is that &#NNN; is an HTML notation, not a SQL notation; no SQL database is going to think that that string in its input is a representation of a single Unicode character. The other problem is that even if this did happen, code points 1777 and nearby are not digits; they're something or other in Arabic, apparently.

Precisely. 1777 through 1780 decimal equate to code points U+06F1 through U+06F4, which correspond to the Arabic numerals 1 through 4.

Joe
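The decimal-to-code-point arithmetic above can be checked against the Unicode character database. A small sketch in Python (chosen for illustration; not the language used in the thread), with a hypothetical `describe` helper:

```python
import unicodedata


def describe(code_point):
    """Return (U+XXXX label, Unicode name, digit value) for a code point."""
    ch = chr(code_point)
    return (f"U+{code_point:04X}", unicodedata.name(ch), unicodedata.digit(ch))


for cp in range(1777, 1781):
    print(cp, *describe(cp))
```

The names reported by the database are of the form EXTENDED ARABIC-INDIC DIGIT ONE, and the digit values run from 1 to 4, matching the message above.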
Re: [SQL] UTF8 encoding and non-text data types
Tom Lane wrote:
> Oh? Interesting. But even if we wanted to teach Postgres about that, wouldn't there be a pretty strong risk of getting confused by Arabic's right-to-left writing direction? Wouldn't be real helpful if the entry came out as 4321 when the user wanted 1234. Definitely seems like something that had better be left to the application side, where there's more context about what the string means.

The Arabic language is written right-to-left, except ... when it comes to numbers.

http://www2.ignatius.edu/faculty/turner/arabic/anumbers.htm

I agree that it's application specific. The HTML/Perl script ought to convert to Western numerals.

Joe
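A conversion to Western numerals of the kind suggested above might look like the following. It is a sketch in Python rather than Perl (an assumption for illustration), and `to_western_digits` is a hypothetical name. It maps every Unicode decimal digit to its ASCII counterpart and leaves all other characters alone; it does not handle script-specific decimal separators, which an application would need to treat separately.

```python
import unicodedata


def to_western_digits(text):
    """Replace each Unicode decimal digit with the matching ASCII digit."""
    out = []
    for ch in text:
        if ch.isdecimal():
            # unicodedata.decimal gives the digit's value, 0 through 9.
            out.append(str(unicodedata.decimal(ch)))
        else:
            out.append(ch)
    return "".join(out)


print(to_western_digits("\u06f1\u06f2\u06f3\u06f4"))  # Extended Arabic-Indic
print(to_western_digits("\u0661\u0662.\u0665"))       # Arabic-Indic with "."
```

Because the string is processed in storage order, the result keeps the most significant digit first, with no reversal step.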
Re: [SQL] UTF8 encoding and non-text data types
At 02:22 PM 1/13/2008, [EMAIL PROTECTED] wrote:
> Date: Sat, 12 Jan 2008 14:21:00 -0800
> From: Medi Montaseri [EMAIL PROTECTED]
> To: pgsql-sql@postgresql.org
> Subject: UTF8 encoding and non-text data types
> Message-ID: [EMAIL PROTECTED]
>
> I understand PG supports UTF-8 encoding and I have successfully inserted Unicode text into columns. I was wondering about other data types such as numbers, decimal, dates.
>
> That is, say I have a table t1 with
>
> create table t1 ( name text, cost decimal )
>
> I can insert UTF8 text datatype into this table with no problem. But if my application attempts to insert numbers encoded in UTF8, then I get a wrong datatype error.
>
> Is the solution for the application layer (not database) to convert the non-text UTF8 numbers to ASCII and then insert them into the database?
>
> Thanks
> Medi

Hi Medi,

I have only limited experience in this area, but it sounds like you are sending your numbers as strings? In your example:

create table t1 ( name text, cost decimal );
insert into t1 (name, cost) values ('name1', '1');

I can't think of how else you're sending numeric values as UTF8? I know that Pg will accept numbers as strings and convert internally (that has worked for me in some object relational environments where I don't choose to cope with data types), but I think it would be better if you simply didn't send your numeric data in quotations, whether as UTF8 or ASCII. If you don't have control over this layer (that quotes your values), then I'd say converting to ASCII would solve the problem. But better to convert to numeric and not ship quoted strings at all.

I may be totally off-base and missing something fundamental and I'm very open to correction (by anyone), but that's what I can see here.

Best regards,
Steve