php-windows Digest 28 May 2012 04:32:52 -0000 Issue 4048
Topics (messages 30890 through 30895):
Re: Characters in an uploaded text file being corrupted
30890 by: Jacob Kruger
30891 by: Jacob Kruger
30892 by: Niel Archer
30893 by: Jacob Kruger
30894 by: Carl Roett
30895 by: Jacob Kruger
Administrivia:
To subscribe to the digest, e-mail:
php-windows-digest-subscr...@lists.php.net
To unsubscribe from the digest, e-mail:
php-windows-digest-unsubscr...@lists.php.net
To post to the list, e-mail:
php-wind...@lists.php.net
----------------------------------------------------------------------
--- Begin Message ---
OK, and FWIW, this test text file is a Word document that I saved as a
text file. When I re-saved it, specifically using Unicode encoding, that
seemed to eliminate the issue, but I would have thought there might be a
relatively simple way to handle something like encoding conversion in PHP
itself?
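Re-saving as "Unicode" on Windows typically produces UTF-16 little-endian text with a byte-order mark (BOM) in front, which is why it changed what PHP saw. A minimal sketch of handling that in PHP, assuming the mbstring extension is enabled and that uploads are either BOM-marked or already UTF-8; the function name `decodeUploadedText` is hypothetical:

```php
<?php
// Hypothetical helper: decode raw upload bytes to UTF-8 by inspecting the
// byte-order mark (BOM). Assumes the mbstring extension is loaded.
function decodeUploadedText(string $bytes): string
{
    // UTF-8 BOM (EF BB BF): drop the marker, the rest is already UTF-8
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return substr($bytes, 3);
    }
    // UTF-16 little-endian BOM (FF FE), what Windows tools call "Unicode"
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16LE');
    }
    // UTF-16 big-endian BOM (FE FF)
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: return unchanged; the caller still has to decide what it is
    return $bytes;
}
```

Usage would be along the lines of `$ltContents = decodeUploadedText(file_get_contents($_FILES["filContent"]["tmp_name"]));`. Files without a BOM (plain "ANSI" saves) pass through untouched and still need a separate decision about their encoding.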
Stay well
Jacob Kruger
Blind Biker
Skype: BlindZA
'...fate had broken his body, but not his spirit...'
----- Original Message -----
From: "Jacob Kruger" <jac...@mailzone.co.za>
To: <php-wind...@lists.php.net>
Sent: Sunday, May 27, 2012 11:22 AM
Subject: [PHP-WIN] Characters in an uploaded text file being corrupted
Using the following bit of code, I save the contents of an uploaded text
file into a MySQL database. The issue is that immediately after reading the
contents of the uploaded file from its temporary path, characters such as '
are rendered as ? characters in the assigned variable.
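//start of code segment
// "type" is the MIME type reported by the browser, so it is only a hint
if (isset($_FILES["filContent"])
        && $_FILES["filContent"]["error"] === UPLOAD_ERR_OK
        && $_FILES["filContent"]["type"] == "text/plain") {
    // file_get_contents() returns the raw bytes exactly as uploaded;
    // PHP does not convert the character encoding at this point
    $ltContents = file_get_contents($_FILES["filContent"]["tmp_name"]);
}
//end of code segment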
My guess is that it's either something to do with the upload part of the
process (the form tag uses the POST method and has enctype set to
multipart/form-data), or else something to do with how the server handles
character encoding.
FWIW, this is currently being run and tested under WAMP on a Windows 7
64-bit machine.
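Because file_get_contents() hands back the file's raw bytes, one way to separate an upload problem from an encoding problem is to look at those bytes directly. A minimal PHP sketch; `listNonAsciiBytes` is a hypothetical helper, and reading 0x92 as a Windows-1252 curly apostrophe (versus E2 80 99 for the same character in UTF-8) is the usual suspect for Word text, not a confirmed diagnosis:

```php
<?php
// Hypothetical diagnostic helper: report the offset and hex value of every
// byte outside 7-bit ASCII, up to $limit entries. A lone 0x92 usually means
// Windows-1252 text (Word's right single quote), while the sequence
// E2 80 99 is the same character encoded as UTF-8.
function listNonAsciiBytes(string $bytes, int $limit = 20): array
{
    $found = [];
    $length = strlen($bytes);
    for ($i = 0; $i < $length && count($found) < $limit; $i++) {
        $code = ord($bytes[$i]);
        if ($code > 0x7F) {
            $found[] = sprintf('offset %d: 0x%02X', $i, $code);
        }
    }
    return $found;
}
```

For example, `print_r(listNonAsciiBytes(file_get_contents($_FILES["filContent"]["tmp_name"])));`. If the bytes match what the editor saved, the upload itself is not corrupting anything, and the problem lies in how later steps (conversion, database, page output) interpret them.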
Stay well
Jacob Kruger
Blind Biker
Skype: BlindZA
'...fate had broken his body, but not his spirit...'
--- End Message ---
--- Begin Message ---
OK, again, it hasn't completely eliminated the issue: it seems to have
sorted out certain characters, but not all of them.
Stay well
Jacob Kruger
Blind Biker
Skype: BlindZA
'...fate had broken his body, but not his spirit...'
----- Original Message -----
From: "Jacob Kruger" <jac...@mailzone.co.za>
To: <php-wind...@lists.php.net>
Sent: Sunday, May 27, 2012 11:49 AM
Subject: Re: [PHP-WIN] Characters in an uploaded text file being corrupted
--
PHP Windows Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
--- End Message ---
--- Begin Message ---
Hi
> OK, and FWIW, this test text file is a Word document that I saved as a
> text file. When I re-saved it, specifically using Unicode encoding, that
> seemed to eliminate the issue,
Make sure the MySQL DB, table and field are correctly set for the encoding.
MySQL does not default to UTF-8, so you have to set it manually when
creating the DB/table. IIRC, newer versions can also have the encoding
set per field.
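Besides the table and column settings, the connection character set also matters, because the client has to tell MySQL which encoding it is sending. A sketch in PHP, assuming mysqli and MySQL 5.5.3 or later (needed for utf8mb4); the helper name and the table and column names are hypothetical, and only the string-building part is executable here so it runs without a server:

```php
<?php
// Hypothetical helper: build a CREATE TABLE statement that states the
// character set and collation explicitly instead of relying on the server
// default (often latin1 on MySQL 5.x). utf8mb4 needs MySQL 5.5.3 or later.
function buildUploadTableSql(string $table): string
{
    // Only allow simple identifiers, since the name is interpolated
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $table)) {
        throw new InvalidArgumentException("Invalid table name: $table");
    }
    return "CREATE TABLE `$table` (\n"
        . "  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
        . "  contents MEDIUMTEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci\n"
        . ") DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci";
}

// With a live connection (not run here), set the connection charset too:
// $db = new mysqli('localhost', 'user', 'password', 'database');
// $db->set_charset('utf8mb4');
// $db->query(buildUploadTableSql('uploads'));
```

The set_charset() call is the part that is easy to miss: a correctly declared column still receives mis-encoded data if the connection claims a different character set than the text actually uses.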
> but I would have thought there might be a
> relatively simple way to handle something like encoding conversion in PHP
> itself?
PHP cannot read minds, unfortunately. There are ways to handle encoding
conversions, but I don't think anyone would call them 'simple'. ;-)
See the Multibyte String extension for one way.
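A common pattern with the Multibyte String extension is to keep text that is already valid UTF-8 and otherwise convert it from a Windows "ANSI" code page. A minimal PHP sketch; treating non-UTF-8 input as Windows-1252 is an assumption that fits Word on a Western European Windows install, not something PHP can detect for certain, and `toUtf8` is a hypothetical name:

```php
<?php
// Hypothetical helper: normalise text to UTF-8 with mbstring.
// Assumption: anything that is not valid UTF-8 is Windows-1252,
// the usual "ANSI" code page for Western European Windows.
function toUtf8(string $text, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it alone
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}
```

This is a heuristic: a short Windows-1252 string can occasionally be valid UTF-8 by coincidence, and text from a different code page would need a different `$fallback`.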
--
Niel Archer
--- End Message ---
--- Begin Message ---
OK, I specifically set that storage field to use utf8_unicode_ci, and
re-uploaded a text file that I had saved from Word using UTF-8 encoding,
and the text content is still full of garbage characters.
I suppose I might try copying and pasting from the text file in Notepad
into a textarea field, but I don't think that would always be the ideal
process for this site.
I will try a couple of other options with regard to DB encoding and see if
that sorts it out, but let's see...
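If the stored text is valid UTF-8 but still looks like garbage in the browser, the remaining suspect is the page output: the response has to declare UTF-8 and the text has to be escaped as UTF-8. A minimal PHP sketch; `renderUtf8` is a hypothetical helper, and in a real page the header() call must come before any output:

```php
<?php
// Hypothetical helper: escape stored text for an HTML page served as UTF-8.
// ENT_SUBSTITUTE swaps any invalid byte sequence for U+FFFD instead of
// making htmlspecialchars() return an empty string.
function renderUtf8(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// In a real page, send this before any output:
// header('Content-Type: text/html; charset=utf-8');
// echo '<p>' . renderUtf8($ltContents) . '</p>';
```

A U+FFFD replacement character appearing in the output is itself a useful signal that the bytes reaching this point were not UTF-8.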
Jacob Kruger
Blind Biker
Skype: BlindZA
'...fate had broken his body, but not his spirit...'
----- Original Message -----
From: "Niel Archer" <spam-f...@blueyonder.co.uk>
To: <php-wind...@lists.php.net>
Sent: Sunday, May 27, 2012 4:22 PM
Subject: Re: [PHP-WIN] Characters in an uploaded text file being corrupted
--- End Message ---
--- Begin Message ---
Do not use Microsoft Word to save files that need to be read in by other
software. Ever. It *always* finds a way to screw it up. Even if you cut and
paste it into the other program, I've heard of cases where Word has put
unprintable control characters into the pasted output that showed up as
spaces in the form, then corrupted the output at runtime.
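Since the end users' tools can't be controlled, stray control characters can also be stripped on the server. A sketch in PHP, assuming the text has already been converted to UTF-8; which characters count as unwanted (here C0 controls other than tab, newline and carriage return, DEL and the C1 range, plus U+200B and U+FEFF) is a judgment call, and `stripControlChars` is a hypothetical name:

```php
<?php
// Hypothetical helper: remove invisible control/format characters from
// UTF-8 text while keeping tab, newline and carriage return.
function stripControlChars(string $text): string
{
    $clean = preg_replace(
        '/[\x{00}-\x{08}\x{0B}\x{0C}\x{0E}-\x{1F}\x{7F}-\x{9F}\x{200B}\x{FEFF}]/u',
        '',
        $text
    );
    // preg_replace() with the /u modifier returns null for invalid UTF-8
    if ($clean === null) {
        throw new InvalidArgumentException('Text is not valid UTF-8');
    }
    return $clean;
}
```

The exception on invalid input is deliberate: it forces the encoding conversion to happen first, rather than silently passing mis-encoded text through.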
I suggest you open the saved file in NetBeans. In the NetBeans
settings dialogue, enable the options to "show control characters" and
"process file as unicode" (iirc). That will cause NetBeans to display all
characters your font set contains, display a square box for the ones it
doesn't, and display the various icons for control characters.
If column alignment in your file is important (for example a set number of
spaces between tokens and operators), you should be using a monospace font.
The best one I've ever used is available here:
http://code.google.com/p/buddypress-media/downloads/detail?name=ttf-bitstream-vera-1.10.zip
It's also really easy to read.
For text streams on a production site, take a look at BP-Media's sanitizer
classes. Unicode to ASCII to HTML entity conversion is a tricky business,
and these functions will ensure you get the right kind of output no matter
what users throw at you. For example, people embedding unicode sequences in
an ASCII stream.
http://code.google.com/p/buddypress-media/source/browse/bp_media/trunk/core/database/class.database.sanitizers.php
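For the "Unicode to HTML entity" step, mbstring can turn every non-ASCII character into a numeric character reference, leaving a pure-ASCII string. This is a sketch using mb_encode_numericentity(), not BP-Media's sanitizer code, and it assumes UTF-8 input; `toAsciiEntities` is a hypothetical name:

```php
<?php
// Convert every code point above 0x7F to a decimal HTML entity such as
// &#8217;, leaving plain ASCII untouched. Assumes UTF-8 input.
function toAsciiEntities(string $utf8): string
{
    // map: [start, end, offset, mask] covering U+0080 upward
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}
```

Note that this does not escape `<`, `>` or `&`, so output destined for HTML still needs htmlspecialchars() as well.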
^C^
===================================================
On Sun, May 27, 2012 at 8:57 AM, Jacob Kruger <jac...@mailzone.co.za> wrote:
--- End Message ---
--- Begin Message ---
Thanks, I will check that out. The only issue is that this would need
end users to sort something like this out themselves, and they won't
necessarily be the most technical types; I also won't have much control
over what they originally used to generate the content.
I'm thinking I might be able to handle character set conversion a bit more
simply in Python, which is currently my other primary programming language,
so I might try creating a small cross-platform utility app that they can use
to convert the content before uploading it, but let's see...
Stay well
Jacob Kruger
Blind Biker
Skype: BlindZA
'...fate had broken his body, but not his spirit...'
----- Original Message -----
From: "Carl Roett" <carlro...@gmail.com>
To: <php-wind...@lists.php.net>
Sent: Sunday, May 27, 2012 6:47 PM
Subject: Re: [PHP-WIN] Characters in an uploaded text file being corrupted
--- End Message ---