Re: unicode, blobs, and the web

2019-09-01 Thread Janie Marlow via 4D_Tech
Thank you for that explanation, Miyako!



From: Keisuke Miyako via 4D_Tech <4d_tech@lists.4d.com>
Cc: Keisuke Miyako 

> On 2019/08/29 12:31, webmaster namethatplant.net wrote:
>
> I still have a question.
> The write-up for PROCESS HTML TAGS contains the warning, "It is now highly 
> inadvisable to store texts in BLOBs".
>
> I don't use the command "PROCESS HTML TAGS" but I have used BLOB variables in 
> several other places throughout my website. These don't contain any 
> troublemaker characters and they are working well.
>
> Do I need to convert them to text variables?

storing text in a BLOB is inadvisable for several reasons:

- BLOBs are not ref-counted and are memory-inefficient; they are copied when
  passed as a method argument, for example.
- BLOB to text conversion is costly, especially when the storage encoding is
  not UTF-8 (Unicode to Unicode conversion is quite efficient).
- there is no inherent size advantage compared to text.
- a BLOB assumes that a byte is your atomic data unit, which is not true for
  Unicode.
- you are denied access to essential text commands such as Position,
  Match regex, Replace string, etc.

but if things are working well,
of course you don't need to convert them.
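
The byte-versus-character point above can be sketched in Python (an analogy, not 4D code): `bytes` stands in for a BLOB and `str` for a text variable. The sample string and variable names are made up for illustration.

```python
# Illustrative only: Python's bytes stand in for a BLOB, str for a text variable.
text = "3 × 4"                      # contains U+00D7 MULTIPLICATION SIGN
blob = text.encode("utf-8")         # the same content as raw bytes

char_count = len(text)              # counts characters
byte_count = len(blob)              # counts bytes; "×" takes two bytes in UTF-8

# Cutting the byte buffer at a byte offset can split a character in half.
truncated = blob[:3]                # ends in the middle of "×"
try:
    truncated.decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False                # the fragment is not valid UTF-8

# Text operations work on characters, so searching needs no byte arithmetic.
position = text.find("×")           # character index of the sign

print(char_count, byte_count, split_ok, position)
```

The character count and the byte count differ as soon as one non-ASCII character appears, which is why byte-oriented storage is awkward for Unicode text.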
**
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**

Re: unicode, blobs, and the web

2019-08-28 Thread webmaster namethatplant.net via 4D_Tech
A belated thank you, Miyako, for the clarification. I didn't think that was 
enough text to have filled up 2 GB, but I didn't know how to calculate its size.

I still have a question.
The write-up for PROCESS HTML TAGS contains the warning, "It is now highly 
inadvisable to store texts in BLOBs".

I don't use the command "PROCESS HTML TAGS" but I have used BLOB variables in 
several other places throughout my website. These don't contain any 
troublemaker characters and they are working well.

Do I need to convert them to text variables?

Thank you!
Janie


Re: unicode, blobs, and the web

2019-08-14 Thread Keisuke Miyako via 4D_Tech


On 2019/08/10 2:35, webmaster namethatplant.net via
4D_Tech <4d_tech@lists.4d.com> wrote:

> I didn't see that important detail mentioned anywhere but in reference to that
> one command, but, since text variables can now hold up to 2 GB, I tried
> substituting a text variable.
>
> Doing that did clear up the multiplication sign issue, but now the data was
> truncated. Only about 3000 line items were displayed.
>
> Would 4000 line items be enough to overflow 2 GB? (Each line item contains only
> around 300 characters.)

no.

if each line contains 300 characters, and there are about 3,000 lines, you'd 
have 900,000 characters.
2 GB is 2*1024*1024*1024 bytes. a Unicode code point in UTF-8 takes 1 to 4
bytes (the original design allowed up to 6, but RFC 3629 restricts UTF-8 to 4),
so you have plenty of room.
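
The arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes the worst case of 4 bytes per character (the UTF-8 maximum under RFC 3629); the variable names are illustrative.

```python
# Worst-case size of the text described in the question.
lines = 3_000
chars_per_line = 300
chars = lines * chars_per_line                  # total characters

max_bytes_per_char = 4                          # UTF-8 upper bound per code point
worst_case_bytes = chars * max_bytes_per_char   # every character at maximum width

limit_bytes = 2 * 1024 * 1024 * 1024            # 2 GB expressed in bytes
fraction_used = worst_case_bytes / limit_bytes

print(f"{chars:,} characters, at most {worst_case_bytes:,} bytes")
print(f"{fraction_used:.4%} of the 2 GB limit")
```

Even in the worst case the text occupies a fraction of one percent of the limit, so the truncation has to come from somewhere else.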

that said, BLOB to text conversion (4D legacy encoding a.k.a. MacRoman to 
Unicode) is limited to 32,000 bytes, for backward compatibility.

https://doc.4d.com/4Dv16/4D/16.6/BLOB-to-text.301-4445240.en.html

in other words, C_TEXT can hold up to about two billion bytes (2 GB) but the
conversion from BLOB to text stops at 32k bytes.

you did not say how you converted from BLOB to text,
but I suspect you used BLOB to text with the Mac text without length constant,
or something similar, which would be inappropriate in this context.
Convert to text takes an explicit character set instead:

https://doc.4d.com/4Dv16/4D/16.6/Convert-to-text.301-795.en.html
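
As a rough analogy in Python (not 4D code), decoding the whole byte buffer in one call with an explicit character set needs no splitting. The sample line content and variable names are invented for illustration. `bytes.decode` stands in for a charset-aware conversion such as the Convert to text command linked above, whose exact behavior should be checked in the 4D documentation.

```python
# Hypothetical data: 4,000 line items of exactly 300 characters each,
# every one containing a multiplication sign.
lines = ["Acer × freemanii".ljust(300, ".") for _ in range(4000)]
blob = "\n".join(lines).encode("utf-8")   # stored as UTF-8 bytes

# One call, explicit charset, no per-call size cap in this analogy.
text = blob.decode("utf-8")
decoded_lines = text.split("\n")

print(len(blob), "bytes ->", len(text), "characters,", len(decoded_lines), "lines")
```

All the lines and every multiplication sign survive the round trip here because the decoding charset matches the one used for writing.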

> I can split the data into two text variables, and that appears to work, but it
> seems like it should be unnecessary.
> What are some recommended ways to handle this?




unicode, blobs, and the web

2019-08-09 Thread webmaster namethatplant.net via 4D_Tech
v12.6  OS 10.6.8

Hello!
I was recently made aware of the fact that, on some web pages, a multiplication 
sign was displaying as what looked like a square root symbol + a lowercase 
accented "o".

The database is running in Unicode mode; the web site is using UTF-8. The pages
in question are displaying a BLOB variable that contains about 4000 line items.
The BLOB was created using the constant UTF8 text without length.

The only clue I found was a brief note under PROCESS HTML TAGS: "Beginning with 
version 12 of 4D, when you use BLOB type parameters, the command automatically 
considers that the character set used for BLOBs is MacRoman."
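
The symptom described above is consistent with UTF-8 bytes being read as MacRoman. A short Python sketch (an analogy, not 4D code) reproduces it using Python's built-in `mac_roman` codec.

```python
original = "×"                              # U+00D7 MULTIPLICATION SIGN
utf8_bytes = original.encode("utf-8")       # two bytes: 0xC3 0x97

# Misreading the UTF-8 bytes as MacRoman yields two unrelated characters:
# 0xC3 is the square root sign and 0x97 is an accented lowercase "o".
misread = utf8_bytes.decode("mac_roman")

# Reading the same bytes with the charset they were written in restores the sign.
restored = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex(), misread, restored)
```

The bytes are the same in both cases; only the character set assumed by the reader changes what is displayed.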

I didn't see that important detail mentioned anywhere but in reference to that 
one command, but, since text variables can now hold up to 2 GB, I tried 
substituting a text variable.

Doing that did clear up the multiplication sign issue, but now the data was 
truncated. Only about 3000 line items were displayed.

Would 4000 line items be enough to overflow 2 GB? (Each line item contains only 
around 300 characters.)

I can split the data into two text variables, and that appears to work, but it 
seems like it should be unnecessary.

What are some recommended ways to handle this?

Thank you all in advance -
Janie


Janie Marlow
webmas...@namethatplant.net
Travelers Rest, SC