Re: [PHP] Character encoding hell
> Have you tried using the utf8 meta tag rather than using the htmlentities() function? That should solve the first issue, as I reckon the problem lies with the way you're encoding the filename.

It seems that the filenames are ISO-encoded: if I set the meta tag to ISO and remove the htmlentities() wrapper, the character displays fine. I'm not sure how that helps, or whether this is even PHP-related anymore, but I thought I'd follow up.

-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
[PHP] Character encoding hell
Hi folks,

I've got a problem with character encoding that's threatening to kill my little brain. Here we go:

I have a directory with a bunch of PDFs in it that my webpage displays links to. All of the files have the French character Â in them. The operating system is Linux (I did not experience this problem on a Windows machine). I don't want to type the display name of these files twice and the website has no database capability, so it takes the filename, rips off the extension, and runs htmlentities() on it before displaying it to the user. So far so good.

Now to the anchor's href. The only encoding method I found which creates a proper link to the file is rawurlencode(), but the catch is that the resulting filename isn't user-friendly at all.

My question then is: what is the best solution to this problem? Ideally I would like the link to function and the filename to be readable. Any hope/help is appreciated.

Marc
Re: [PHP] Character encoding hell
On Oct 26, 2010, at 10:56 AM, Marc Guay wrote:
> I've got a problem with character encoding that's threatening to kill my little brain. [...]

Are you using UTF-8?
Re: [PHP] Character encoding hell
On Tue, 26 Oct 2010 11:56:17 -0400 Marc Guay marc.g...@gmail.com wrote:
> I have a directory with a bunch of PDFs in it that my webpage displays links to. All of the files have the French character Â in them. The operating system is Linux (I did not experience this problem on a Windows machine). I don't want to type the display name of these files twice and the website has no database capability

If you are not constantly adding/changing the files there, you can use a CSV file in place of a database.

--
Simcha Younger sim...@syounger.com
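A minimal sketch of the CSV idea, assuming a hypothetical two-column file (stored filename, display name). The function name load_display_names() and the column layout are illustrative only and do not come from the thread.

```php
<?php
// Hypothetical helper: read "stored filename,display name" rows from a CSV
// file into an array keyed by the stored filename.
function load_display_names(string $csv_path): array
{
    $names = [];
    $handle = fopen($csv_path, 'r');
    if ($handle === false) {
        return $names; // unreadable file: behave as if there were no entries
    }
    // Delimiter, enclosure and escape characters are passed explicitly.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($row) >= 2) { // skips blank or incomplete lines
            $names[$row[0]] = $row[1];
        }
    }
    fclose($handle);
    return $names;
}
```

The page script could then look up each name returned by scandir() in this array and fall back to the raw filename when no entry exists.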
Re: [PHP] Character encoding hell
> Are you using UTF-8?

Could you be more specific? Do you mean in the browser/PHP header or in the filesystem? I created the files on a Windows machine, transferred them to a Linux machine, and the encoding of the page is UTF-8.

I just noticed a strange thing which might shed some light. If I just run htmlentities() on the href, it shows this in the browser URL:

LE CHÂT.pdf

But the browser returns a not found error:

LE%20CH%C3%82T.pdf

It seems like the  character is being misunderstood as
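The %C3%82 in that URL is what a two-byte UTF-8 encoding of the letter looks like once percent-encoded. A small sketch of the difference, assuming (this is a guess at this point in the thread) that the name on disk is ISO-8859-1 and stores the letter as the single byte 0xC2:

```php
<?php
// Assumption: the name on disk is ISO-8859-1, so the accented letter is
// the single byte 0xC2.
$latin1_name = "LE CH\xC2T.pdf";
// The same name in UTF-8 stores that letter as the two bytes 0xC3 0x82.
$utf8_name = "LE CH\xC3\x82T.pdf";

echo rawurlencode($latin1_name), "\n"; // LE%20CH%C2T.pdf
echo rawurlencode($utf8_name), "\n";   // LE%20CH%C3%82T.pdf
```

If the server compares the requested bytes with the bytes stored in the directory entry, only the first form would match a Latin-1 name on disk.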
Re: [PHP] Character encoding hell
On Oct 26, 2010, at 11:38 AM, Marc Guay wrote:
> Could you be more specific? Do you mean in the browser/php header or in the filesystem? [...]

I apologize for the vagueness. I was referring to the browser/PHP header or a meta tag. Something to the effect of this quick copy-paste from a site that uses accent marks and umlauts:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

If I am understanding correctly, you are referring to an HTML-specific issue where the HTML and browser configuration is displaying your characters improperly?
Re: [PHP] Character encoding hell
> If I am understanding correctly, you are referring to a HTML specific issue where the HTML and browser configuration is displaying your characters improperly?

No, the browser is displaying the characters of the filename fine (using htmlentities() converts the ? unknown character into an Â). The problem is with the link/href to the file with the special character in its name. I get a 404 Not Found unless I rawurlencode() the href, turning it into the rather unreadable LE%20CH%C2T.pdf.
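The percent-encoded href and the readable name do not have to be the same string: the href can carry the exact bytes on disk while the link text carries the display name. A minimal sketch, assuming the names on disk are ISO-8859-1, the page is served as UTF-8, and the iconv extension is available; pdf_link() is a hypothetical helper, not something from the thread.

```php
<?php
// Hypothetical helper: build an anchor whose href uses the exact bytes on
// disk (percent-encoded) while the visible text is readable UTF-8.
// Assumes filenames are ISO-8859-1 and the page is served as UTF-8.
function pdf_link(string $filename): string
{
    // Strip the extension byte-wise, without relying on the locale.
    $dot = strrpos($filename, '.');
    $stem = $dot === false ? $filename : substr($filename, 0, $dot);

    $href = rawurlencode($filename);
    $label = iconv('ISO-8859-1', 'UTF-8', $stem);

    return '<a href="' . $href . '">'
        . htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
        . '</a>';
}

echo pdf_link("LE CH\xC2T.pdf"), "\n";
```

The user only sees the link text, so the encoded href stays out of sight except in the status bar or address bar.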
Re: [PHP] Character encoding hell
Again, if it helps, a link formatted in the same way to the same file links correctly on a Windows machine.
Re: [PHP] Character encoding hell
On Oct 26, 2010, at 12:00 PM, Marc Guay wrote:
> Again, if it helps, a link formatted in the same way to the same file links correctly on a windows machine.

A Windows server, or a Windows client to the same Linux server? I believe that this issue is starting to get a bit over my head, with the different operating systems involved and such.
Re: [PHP] Character encoding hell
> A windows server, or windows client to the same Linux server? I believe that this issue is starting to get a bit over my head, with the different operating systems involved and such.

Windows server. This is over my head, too. I'm guessing that Windows and Linux encode filenames differently and when I transferred the file from one to the other, some kind of adjustment was made.

Marc
Re: [PHP] Character encoding hell
Have you tried using the utf8 meta tag rather than using the htmlentities() function? That should solve the first issue, as I reckon the problem lies with the way you're encoding the filename.

Linux filesystems have far fewer limitations on filenames, so it could be that Windows is doing something odd which coincides with what PHP is doing. I'm not at my machine right now to test, but you should be able to pass the filename in the URL with urlencode() and on the server convert it back with urldecode() to get the original filename back.

Lastly, have you made sure your PHP scripts are saved as UTF-8? That can sometimes solve some odd problems with character encoding.

Thanks,
Ash
http://www.ashleysheridan.co.uk

- Reply message -
From: Marc Guay marc.g...@gmail.com
Date: Tue, Oct 26, 2010 18:00
Subject: [PHP] Character encoding hell
To: php-general php-general@lists.php.net

Again, if it helps, a link formatted in the same way to the same file links correctly on a windows machine.
Re: [PHP] Character encoding hell
> Have you tried using the utf8 meta tag rather than using the htmlentities() function? That should solve the first issue, as I reckon the problem lies with the way your encoding the filename.

The page is being encoded in UTF-8. Without htmlentities() the special character is displayed as a black triangle with a question mark in it. Does that indicate that the filename isn't being stored as UTF-8?

> Lastly, have you made sure your php scripts are saved as utf8, as that can sometimes solve some odd problems with character encoding.

This didn't seem to change anything.

Marc
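A replacement glyph like that usually means the browser received bytes that are not valid UTF-8. One way to check a name directly, sketched with a hypothetical helper is_valid_utf8() and assuming the byte 0xC2 that rawurlencode() reported for this file: with the /u modifier, preg_match() returns false when the subject is not well-formed UTF-8.

```php
<?php
// Returns true when $bytes is well-formed UTF-8. With the /u modifier,
// preg_match() fails (returns false) on a subject that is not valid UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(is_valid_utf8("LE CH\xC2T.pdf"));     // bool(false)
var_dump(is_valid_utf8("LE CH\xC3\x82T.pdf")); // bool(true)
```

A lone 0xC2 followed by an ASCII letter is an incomplete UTF-8 sequence, which is consistent with the name being stored in a single-byte encoding such as ISO-8859-1.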
Re: [PHP] Character encoding hell
On Oct 26, 2010, at 10:10 AM, Marc Guay wrote:
> Windows server. This is over my head, too. I'm guessing that Windows and Linux encode filenames differently and when I transferred the file from one to the other, some kind of adjustment was made.

I think one way to do this is something like this (untested):

1. Put all of your files in some directory on the server.

2. Change your
   <a href="http://example.com/encoded-file-name.pdf">my file</a>
   to
   <a href="http://example.com/download-file.php?fileID=xxx">my file</a>
   where xxx is the urlencoded version of encoded-file-name.pdf. (xxx could also be a fileID number if stored in a database.)

3. In download-file.php do something like this:

    <?php
    $parent_directory = "/path/to/parent/directory/"; // can be in or out of web root
    if (file_exists($parent_directory . "encoded-file-name.pdf")) {
        $data = file_get_contents($parent_directory . "encoded-file-name.pdf");
        $file_name_with_french_chars = rawurldecode("encoded-file-name.pdf");
        header("Content-type: application/octet-stream");
        // this line assigns the nice looking name as the file name
        header("Content-disposition: Attachment; filename=\"$file_name_with_french_chars\"");
        echo $data;
    }
    ?>
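The script above hard-codes the file name as a placeholder. A possible variant that takes it from the fileID query parameter is sketched below; resolve_download() is a hypothetical helper, and rejecting IDs that contain a slash or backslash is an added precaution that the original suggestion does not mention.

```php
<?php
// Hypothetical helper: map a requested file ID (the stored filename, which
// PHP has already URL-decoded in $_GET) to a path inside $parent_directory.
// Returns null when the ID is missing, contains a path separator, or does
// not name an existing regular file.
function resolve_download(string $parent_directory, ?string $file_id): ?string
{
    if ($file_id === null || $file_id === '' || strpbrk($file_id, '/\\') !== false) {
        return null;
    }
    $path = rtrim($parent_directory, '/') . '/' . $file_id;
    return is_file($path) ? $path : null;
}

// Possible use inside download-file.php (assumed layout, untested on a server):
// $path = resolve_download('/path/to/parent/directory', $_GET['fileID'] ?? null);
// if ($path === null) { http_response_code(404); exit; }
// header('Content-Type: application/octet-stream');
// header('Content-Disposition: attachment; filename="'
//     . str_replace('"', '', rawurldecode($_GET['fileID'])) . '"');
// readfile($path);
```

Using readfile() instead of file_get_contents() plus echo avoids loading the whole PDF into memory before sending it.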
Re: [PHP] Character encoding hell
Where is the filename coming from? Is it hard-coded in the script or is your script reading it from a directory listing?

Have you checked to see if that filename is what you think it is on the Linux server?

Was Apache the web server both times, or was IIS used on Windows? If it was, look for any errant .htaccess files causing problems.

Lastly, what happens if you directly request that file from within the browser itself, without PHP scripts in the equation?

Thanks,
Ash
http://www.ashleysheridan.co.uk

- Reply message -
From: Marc Guay marc.g...@gmail.com
Date: Tue, Oct 26, 2010 18:22
Subject: [PHP] Character encoding hell
To: a...@ashleysheridan.co.uk a...@ashleysheridan.co.uk
Cc: php-general php-general@lists.php.net

> Have you tried using the utf8 meta tag rather than using the htmlentities() function? That should solve the first issue, as I reckon the problem lies with the way your encoding the filename.

The page is being encoded in UTF-8. Without htmlentities() the special character is displayed as a black triangle with a question mark in it. Does that indicate that the filename isn't being stored as UTF-8?

> Lastly, have you made sure your php scripts are saved as utf8, as that can sometimes solve some odd problems with character encoding.

This didn't seem to change anything.

Marc
Re: [PHP] Character encoding hell
> I think one way to do this is something like this (untested):

This is a good idea, but I'm stubborn and believe it can be solved without adding more code. Thanks, though, I'll probably end up using it once I've ruined every other possibility.

Marc
Re: [PHP] Character encoding hell
> Where is the filename coming from? Is it hard-coded in the script or is your script reading it from a directory listing?

The filename is being read from the directory via scandir(). File created on Windows, transferred to *nix.

> Have you checked to see if that filename is what you think it is on the Linux server?

The character is shown as a question mark in PuTTY. I've tried forcing a UTF-8 font to make sure it's not a rendering issue, but it didn't seem to make a difference. I'm not convinced the encoding changed, though.

> Was Apache the web server both times, or was iis used on windows? If it was, look for any errant .htaccess files causing problems.

Both are Apache.

> Lastly, what happens if you directly request that file from with the browser itself, without php scripts in the equation.

If I request the file directly from the Windows server, it opens Adobe Acrobat. If I request the same file directly from the Linux server, I get the 404 File Not Found.

Marc
Re: [PHP] Character encoding hell
On Tue, Oct 26, 2010 at 1:32 PM, Mari Masuda mbmas...@stanford.edu wrote:
> I think one way to do this is something like this (untested): [...]

This approach is what I wanted to suggest as well. You can simulate a DB with an XML file if you want to, and even make the IDs numeric to make life really easy.

--
Bastien

Cat, the other other white meat
Re: [PHP] Character encoding hell
> > Have you checked to see if that filename is what you think it is on the Linux server?
>
> The character is shown as a question mark in putty. I've tried forcing a UTF-8 font to make sure it's not a rendering issue but it didn't seem to make a difference. I'm not convinced the encoding changed, though.

You say that in putty it is converted to a '?'? So, on Linux, the file name is no longer what you intended it to be, so wouldn't you then need to call the file EXACTLY as it is on the Linux server? Maybe storing a non-UTF-8 filename is not the way to go?

It looks to me that if the filename was fileÂ.pdf on windose, and is now file?.pdf on Linux, then no matter how much encoding you do, you will never be able to reference the file on Linux as fileÂ.pdf, because it is now file?.pdf.

Maybe I am just talking out my ass here... I have a tendency to do that once in a while :)

Side note: I had a script that was ported from windose to Linux, and the guy who created it used capitals in his file names but referred to them in all lower case. windose and Apache didn't care, it would just serve the page... ThisPage.php was the same as thispage.php... When we moved it to Linux, none of the damned links worked, so we had to do a bunch of changes...

Steve
Re: [PHP] Character encoding hell
> You say that in putty it is converted to a '?'? so, on linux, the file name is no longer what you intended it to be, so wouldn't you then need to call the file EXACTLY as it is on the linux server?

I thought this too at first, but if I run htmlentities() on the filename it displays the Â character, so it must not have been lost completely, just encoded in a different way? I'm quite sure that the problem with PuTTY displaying it as a question mark is related to its display settings.
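If the name really is intact but stored in a single-byte encoding, one option is a one-off rename of the files to UTF-8 so that the bytes a UTF-8 page sends (%C3%82) match the bytes on disk. A sketch under the assumption, which the thread does not confirm, that every non-UTF-8 name is ISO-8859-1; both function names are hypothetical and the iconv extension is assumed.

```php
<?php
// Convert a single name to UTF-8. Names that are already valid UTF-8
// (plain ASCII included) are returned unchanged; anything else is assumed
// to be ISO-8859-1.
function latin1_name_to_utf8(string $name): string
{
    if (preg_match('//u', $name) === 1) {
        return $name;
    }
    return iconv('ISO-8859-1', 'UTF-8', $name);
}

// One-off migration: rename every entry in $dir whose UTF-8 form differs
// from its current name. Returns a map of old name => new name.
function rename_latin1_to_utf8(string $dir): array
{
    $renamed = [];
    foreach (scandir($dir) ?: [] as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $utf8 = latin1_name_to_utf8($name);
        if ($utf8 !== $name && rename("$dir/$name", "$dir/$utf8")) {
            $renamed[$name] = $utf8;
        }
    }
    return $renamed;
}
```

It would be prudent to run this on a copy of the directory first, since a wrong guess about the source encoding produces names that are valid UTF-8 but show the wrong letters.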