Re: [PHP] Character encoding hell

2010-10-27 Thread Marc Guay
> Have you tried using the UTF-8 meta tag rather than using the htmlentities()
> function? That should solve the first issue, as I reckon the problem lies
> with the way you're encoding the filename.

It seems that the filenames are ISO-8859-1 encoded: if I set the meta tag
to ISO-8859-1 and remove the htmlentities() wrapper, the character displays
fine.  Not sure how that helps, or if this is even PHP-related anymore,
but I thought I'd follow up.
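
If the names on disk really are ISO-8859-1, one possible way out is to convert them to UTF-8 for the visible label while percent-encoding the untouched bytes for the href. This is only a sketch: it assumes the mbstring extension is available, and the helper name pdf_link() and the sample filename are made up for illustration.

```php
<?php
// Hypothetical helper: build a link for a filename whose bytes are ISO-8859-1.
function pdf_link(string $rawName): string
{
    // The href has to carry the bytes exactly as they exist on disk.
    $href = rawurlencode($rawName);

    // The label is converted to UTF-8 so it matches a UTF-8 page.
    $utf8 = mb_convert_encoding($rawName, 'UTF-8', 'ISO-8859-1');

    // Rip off the extension ('.' is ASCII, so a byte search is safe in UTF-8).
    $dot  = strrpos($utf8, '.');
    $stem = $dot === false ? $utf8 : substr($utf8, 0, $dot);

    return '<a href="' . $href . '">'
        . htmlspecialchars($stem, ENT_QUOTES, 'UTF-8')
        . '</a>';
}

// "\xC2" is the single ISO-8859-1 byte for the Â character.
echo pdf_link("LE CH\xC2T.pdf"), "\n";
```

The point of the split is that the server only ever sees the original bytes, while the reader only ever sees readable text.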

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP] Character encoding hell

2010-10-26 Thread Marc Guay
Hi folks,

I've got a problem with character encoding that's threatening to kill
my little brain.  Here we go:

I have a directory with a bunch of PDFs in it that my webpage displays
links to.  All of the files have the French character Â in them. The
operating system is Linux (I did not experience this problem on a
Windows machine). I don't want to type the display name of these files
twice and the website has no database capability so it takes the
filename, rips off the extension, and runs htmlentities() on it before
displaying to the user.  So far so good.  Now to the anchor's href.
The only encoding method I found which creates a proper link to the
file is rawurlencode(), but the catch is that the filename isn't user
friendly at all.  My question then is what is the best solution to
this problem?  Ideally I would like the link to function and for the
filename to be readable.
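
One way to read the question above: the percent-encoded form only needs to appear inside the href, while the link text can stay readable. A rough sketch, assuming the filenames are already valid UTF-8 or plain ASCII (which later turns out not to hold in this thread); list_pdf_links() is a hypothetical name.

```php
<?php
// Hypothetical helper: turn every PDF in $dir into an HTML link.
// Assumes the filenames are already valid UTF-8 (or plain ASCII).
function list_pdf_links(string $dir): array
{
    $links = [];
    foreach (scandir($dir) ?: [] as $name) {
        if (strtolower(substr($name, -4)) !== '.pdf') {
            continue; // skips ".", ".." and anything that is not a PDF
        }
        $label = substr($name, 0, -4); // rip off the extension
        $links[] = sprintf(
            '<a href="%s">%s</a>',
            rawurlencode($name),                           // for the server
            htmlspecialchars($label, ENT_QUOTES, 'UTF-8')  // for the reader
        );
    }
    return $links;
}
```

The user never has to look at the encoded name unless they inspect the link itself.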

Any hope/help is appreciated.

Marc




Re: [PHP] Character encoding hell

2010-10-26 Thread Nicholas Kell

On Oct 26, 2010, at 10:56 AM, Marc Guay wrote:

> Hi folks,
>
> I've got a problem with character encoding that's threatening to kill
> my little brain.  Here we go:
>
> I have a directory with a bunch of PDFs in it that my webpage displays
> links to.  All of the files have the French character Â in them. The
> operating system is Linux (I did not experience this problem on a
> Windows machine). I don't want to type the display name of these files
> twice and the website has no database capability so it takes the
> filename, rips off the extension, and runs htmlentities() on it before
> displaying to the user.  So far so good.  Now to the anchor's href.
> The only encoding method I found which creates a proper link to the
> file is rawurlencode(), but the catch is that the filename isn't user
> friendly at all.  My question then is what is the best solution to
> this problem?  Ideally I would like the link to function and for the
> filename to be readable.
>
> Any hope/help is appreciated.
>
> Marc


Are you using UTF-8?



Re: [PHP] Character encoding hell

2010-10-26 Thread Simcha Younger
On Tue, 26 Oct 2010 11:56:17 -0400
Marc Guay marc.g...@gmail.com wrote:

 
> I have a directory with a bunch of PDFs in it that my webpage displays
> links to.  All of the files have the French character Â in them. The
> operating system is Linux (I did not experience this problem on a
> Windows machine). I don't want to type the display name of these files
> twice and the website has no database capability

If you are not constantly adding/changing the files there, you can use a CSV 
file in place of a database.
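
As a rough illustration of the CSV idea, the sketch below reads rows of the form "filename,display name" into an array. The column layout and the function name load_display_names() are assumptions made for this example, not something specified in the thread.

```php
<?php
// Hypothetical helper: read "filename,display name" rows into an array
// keyed by filename.
function load_display_names(string $csvPath): array
{
    $names  = [];
    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        return $names; // unreadable file: behave as if it were empty
    }
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($row) >= 2) { // ignore blank or malformed lines
            $names[$row[0]] = $row[1];
        }
    }
    fclose($handle);
    return $names;
}
```

With a table like this the files on disk can keep plain ASCII names, and the accented name lives only in the CSV file.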


-- 
Simcha Younger sim...@syounger.com




Re: [PHP] Character encoding hell

2010-10-26 Thread Marc Guay
> Are you using UTF-8?

Could you be more specific?  Do you mean in the browser/PHP header or
in the filesystem?  I created the files on a Windows machine,
transferred them to a Linux machine, and the encoding of the page is
UTF-8.

I just noticed a strange thing which might shed some light.  If I just
run htmlentities() on the href, it shows this in the browser URL:

LE CHÂT.pdf

But the browser returns a not found error:

LE%20CH%C3%82T.pdf

It seems like the  character is being misunderstood as Â
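
The two URLs seen in this thread differ because the same character has different byte sequences in ISO-8859-1 and UTF-8. A small sketch of that difference, using the Â character as an example (the variable names are arbitrary):

```php
<?php
$latin1 = "\xC2";      // Â as one ISO-8859-1 byte
$utf8   = "\xC3\x82";  // Â as two UTF-8 bytes

echo rawurlencode("LE CH{$latin1}T.pdf"), "\n"; // LE%20CH%C2T.pdf
echo rawurlencode("LE CH{$utf8}T.pdf"), "\n";   // LE%20CH%C3%82T.pdf
```

A server looks filenames up byte for byte, so a request for the %C3%82 form cannot match a file whose name contains the single byte %C2.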




Re: [PHP] Character encoding hell

2010-10-26 Thread Nicholas Kell

On Oct 26, 2010, at 11:38 AM, Marc Guay wrote:

>> Are you using UTF-8?
>
> Could you be more specific?  Do you mean in the browser/PHP header or
> in the filesystem?  I created the files on a Windows machine,
> transferred them to a Linux machine, and the encoding of the page is
> UTF-8.
>
> I just noticed a strange thing which might shed some light.  If I just
> run htmlentities() on the href, it shows this in the browser URL:
>
> LE CHÂT.pdf
>
> But the browser returns a not found error:
>
> LE%20CH%C3%82T.pdf
>
> It seems like the  character is being misunderstood as Â


I apologize for the vagueness. I was referring to the browser/php header or a 
meta tag. 

Something to the effect of this quick copy-paste from a site that uses accent 
marks and umlauts: 

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

If I am understanding correctly, you are referring to an HTML-specific issue 
where the HTML and browser configuration are displaying your characters 
improperly?

Re: [PHP] Character encoding hell

2010-10-26 Thread Marc Guay
> If I am understanding correctly, you are referring to an HTML-specific issue 
> where the HTML and browser configuration are displaying your characters 
> improperly?

No, the browser is displaying the characters of the filename fine
(htmlentities() converts the unknown '?' character into an Â).  The
problem is with the link/href to the file with the special character
in its name.  I get a 404 Not Found unless I rawurlencode() the href,
turning it into the rather unreadable LE%20CH%C2T.pdf.




Re: [PHP] Character encoding hell

2010-10-26 Thread Marc Guay
Again, if it helps, a link formatted in the same way to the same file
links correctly on a Windows machine.




Re: [PHP] Character encoding hell

2010-10-26 Thread Nicholas Kell

On Oct 26, 2010, at 12:00 PM, Marc Guay wrote:

> Again, if it helps, a link formatted in the same way to the same file
> links correctly on a Windows machine.

A Windows server, or a Windows client to the same Linux server? I believe that 
this issue is starting to get a bit over my head, with the different operating 
systems involved and such.



Re: [PHP] Character encoding hell

2010-10-26 Thread Marc Guay
> A Windows server, or a Windows client to the same Linux server? I believe that 
> this issue is starting to get a bit over my head, with the different 
> operating systems involved and such.

Windows server.  This is over my head, too.  I'm guessing that Windows
and Linux encode filenames differently and when I transferred the file
from one to the other, some kind of adjustment was made.

Marc




Re: [PHP] Character encoding hell

2010-10-26 Thread a...@ashleysheridan.co.uk
Have you tried using the UTF-8 meta tag rather than using the htmlentities() 
function? That should solve the first issue, as I reckon the problem lies with 
the way you're encoding the filename.

Linux filesystems have far fewer limitations on filenames, so it could be that 
Windows is doing something odd which coincides with what PHP is doing. I'm not 
at my machine right now to test, but you should be able to pass the filename in 
the URL with urlencode() and on the server convert it back with urldecode() to 
give you the original filename back.

Lastly, have you made sure your PHP scripts are saved as UTF-8? That can 
sometimes solve some odd problems with character encoding.
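
The round trip described above can be sketched as follows. The sample filename is an assumption (an ISO-8859-1 byte for Â); note also that PHP has already URL-decoded the values it puts in $_GET, so an explicit urldecode() is only needed for strings that are still encoded.

```php
<?php
$original = "LE CH\xC2T.pdf";      // raw bytes as assumed to be stored on disk
$encoded  = urlencode($original);  // safe to place in a query string
$decoded  = urldecode($encoded);   // what the server recovers

var_dump($decoded === $original);  // bool(true)
```

The encoding is lossless for arbitrary bytes, so it works regardless of which character set the filename happens to be in.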

Thanks,
Ash
http://www.ashleysheridan.co.uk

- Reply message -
From: Marc Guay marc.g...@gmail.com
Date: Tue, Oct 26, 2010 18:00
Subject: [PHP] Character encoding hell
To: php-general php-general@lists.php.net

Again, if it helps, a link formatted in the same way to the same file
links correctly on a windows machine.




Re: [PHP] Character encoding hell

2010-10-26 Thread Marc Guay
> Have you tried using the UTF-8 meta tag rather than using the htmlentities()
> function? That should solve the first issue, as I reckon the problem lies
> with the way you're encoding the filename.

The page is being encoded in UTF-8.  Without htmlentities() the
special character is displayed as a black triangle with a question
mark in it.  Does that indicate that the filename isn't being stored
as UTF-8?
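
A replacement glyph on a UTF-8 page usually does mean the bytes are not valid UTF-8. One way to check a name directly, sketched under the assumption that the mbstring extension is available (looks_like_utf8() is a hypothetical name):

```php
<?php
// Hypothetical helper: report whether a filename is a valid UTF-8 byte sequence.
function looks_like_utf8(string $name): bool
{
    return mb_check_encoding($name, 'UTF-8');
}

var_dump(looks_like_utf8("LE CH\xC3\x82T.pdf")); // two-byte UTF-8 sequence for Â
var_dump(looks_like_utf8("LE CH\xC2T.pdf"));     // lone 0xC2 byte, as in ISO-8859-1
```

A check like this cannot say which legacy encoding a name uses, only that it is not UTF-8.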

> Lastly, have you made sure your PHP scripts are saved as UTF-8? That can
> sometimes solve some odd problems with character encoding.

This didn't seem to change anything.

Marc




Re: [PHP] Character encoding hell

2010-10-26 Thread Mari Masuda

On Oct 26, 2010, at 10:10 AM, Marc Guay wrote:

>> A Windows server, or a Windows client to the same Linux server? I believe that 
>> this issue is starting to get a bit over my head, with the different 
>> operating systems involved and such.
>
> Windows server.  This is over my head, too.  I'm guessing that Windows
> and Linux encode filenames differently and when I transferred the file
> from one to the other, some kind of adjustment was made.
>
> Marc

I think one way to do this is something like this (untested):

1.  Put all of your files in some directory on the server.

2.  Change your <a href="http://example.com/encoded-file-name.pdf">my file</a> 
to <a href="http://example.com/download-file.php?fileID=xxx">my file</a> where 
xxx is the urlencoded version of encoded-file-name.pdf.  (xxx could also be a 
fileID number if stored in a database.)

3.  In download-file.php do something like this:

<?php
  $parent_directory = "/path/to/parent/directory/"; // can be in or out of web root
  if (file_exists($parent_directory . "encoded-file-name.pdf")) {
    $data = file_get_contents($parent_directory . "encoded-file-name.pdf");
    $file_name_with_french_chars = rawurldecode("encoded-file-name.pdf");

    header("Content-type: application/octet-stream");
    // this line assigns the nice looking name as the file name
    header("Content-disposition: Attachment; filename=\"$file_name_with_french_chars\"");
    echo $data;
  }
?>
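
A variant of the script above that takes the name from the fileID parameter rather than a hard-coded placeholder might look like the sketch below. The helper name resolve_download() is hypothetical, and the basename() call is an added precaution (not part of the original suggestion) so that a request cannot point outside the directory.

```php
<?php
// Hypothetical helper: map the fileID request value to a path inside $dir,
// or return null when no such file exists.
function resolve_download(string $dir, string $fileId): ?string
{
    // basename() drops any directory part, so "../" cannot leave $dir.
    $path = rtrim($dir, '/') . '/' . basename($fileId);
    return is_file($path) ? $path : null;
}

// Possible use in download-file.php (PHP has already URL-decoded $_GET):
// $path = resolve_download('/path/to/parent/directory', $_GET['fileID'] ?? '');
// if ($path !== null) {
//     header('Content-Type: application/octet-stream');
//     header('Content-Disposition: attachment; filename="' . basename($path) . '"');
//     readfile($path);
// }
```

Keeping the lookup in a function of its own makes it possible to test the path handling without sending any headers.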



Re: [PHP] Character encoding hell

2010-10-26 Thread a...@ashleysheridan.co.uk
Where is the filename coming from? Is it hard-coded in the script or is your 
script reading it from a directory listing?

Have you checked to see if that filename is what you think it is on the Linux 
server?

Was Apache the web server both times, or was IIS used on Windows? If it was 
Apache, look for any errant .htaccess files causing problems.

Lastly, what happens if you directly request that file from within the browser 
itself, without PHP scripts in the equation?

Thanks,
Ash
http://www.ashleysheridan.co.uk

- Reply message -
From: Marc Guay marc.g...@gmail.com
Date: Tue, Oct 26, 2010 18:22
Subject: [PHP] Character encoding hell
To: a...@ashleysheridan.co.uk a...@ashleysheridan.co.uk
Cc: php-general php-general@lists.php.net


> Have you tried using the UTF-8 meta tag rather than using the htmlentities()
> function? That should solve the first issue, as I reckon the problem lies
> with the way you're encoding the filename.

The page is being encoded in UTF-8.  Without htmlentities() the
special character is displayed as a black triangle with a question
mark in it.  Does that indicate that the filename isn't being stored
as UTF-8?

> Lastly, have you made sure your PHP scripts are saved as UTF-8? That can
> sometimes solve some odd problems with character encoding.

This didn't seem to change anything.

Marc




Re: [PHP] Character encoding hell

2010-10-26 Thread Marc Guay
> I think one way to do this is something like this (untested):

This is a good idea, but I'm stubborn and believe it can be solved
without adding more code.  Thanks, though, I'll probably end up using
it once I've ruined every other possibility.

Marc




Re: [PHP] Character encoding hell

2010-10-26 Thread Marc Guay
> Where is the filename coming from? Is it hard-coded in the script or is your
> script reading it from a directory listing?

The filename is being read from the directory via scandir().  The file was
created on Windows, then transferred to *nix.

> Have you checked to see if that filename is what you think it is on the
> Linux server?

The character is shown as a question mark in putty.  I've tried
forcing a UTF-8 font to make sure it's not a rendering issue but it
didn't seem to make a difference.  I'm not convinced the encoding
changed, though.
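
Since a terminal may or may not render the name faithfully, another option is to look at the raw bytes instead. A minimal sketch (hex_names() is a hypothetical name): a lone "c2" where the accented letter should be would suggest ISO-8859-1, while "c382" would suggest UTF-8.

```php
<?php
// Hypothetical helper: map each name in $dir to the hex dump of its raw bytes.
function hex_names(string $dir): array
{
    $out = [];
    foreach (scandir($dir) ?: [] as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $out[$name] = bin2hex($name);
    }
    return $out;
}
```

This takes the terminal, the font and the browser out of the picture entirely.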

> Was Apache the web server both times, or was IIS used on Windows? If it was
> Apache, look for any errant .htaccess files causing problems.

Both are apache.

> Lastly, what happens if you directly request that file from within the browser
> itself, without PHP scripts in the equation?

If I request the file directly from the Windows server, it opens Adobe
Acrobat.  If I request the same file directly from the Linux server, I
get the 404 File Not Found.

Marc




Re: [PHP] Character encoding hell

2010-10-26 Thread Bastien Koert
On Tue, Oct 26, 2010 at 1:32 PM, Mari Masuda mbmas...@stanford.edu wrote:

> On Oct 26, 2010, at 10:10 AM, Marc Guay wrote:
>
>>> A Windows server, or a Windows client to the same Linux server? I believe 
>>> that this issue is starting to get a bit over my head, with the different 
>>> operating systems involved and such.
>>
>> Windows server.  This is over my head, too.  I'm guessing that Windows
>> and Linux encode filenames differently and when I transferred the file
>> from one to the other, some kind of adjustment was made.
>>
>> Marc
>
> I think one way to do this is something like this (untested):
>
> 1.  Put all of your files in some directory on the server.
>
> 2.  Change your <a href="http://example.com/encoded-file-name.pdf">my 
> file</a> to <a href="http://example.com/download-file.php?fileID=xxx">my 
> file</a> where xxx is the urlencoded version of encoded-file-name.pdf.  
> (xxx could also be a fileID number if stored in a database.)
>
> 3.  In download-file.php do something like this:
>
> <?php
>   $parent_directory = "/path/to/parent/directory/"; // can be in or out of web root
>   if (file_exists($parent_directory . "encoded-file-name.pdf")) {
>     $data = file_get_contents($parent_directory . "encoded-file-name.pdf");
>     $file_name_with_french_chars = rawurldecode("encoded-file-name.pdf");
>
>     header("Content-type: application/octet-stream");
>     // this line assigns the nice looking name as the file name
>     header("Content-disposition: Attachment; filename=\"$file_name_with_french_chars\"");
>     echo $data;
>   }
> ?>



This approach is what I wanted to suggest as well. You can simulate a
db with an XML file if you want to, and even assign numerical IDs to
make life really easy.
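
A rough sketch of the XML idea, assuming the SimpleXML extension is available. The element and attribute names, the sample entries and the function name find_file() are all invented for this example.

```php
<?php
// Hypothetical lookup table: numeric ID, on-disk name, and display label.
$xml = <<<XML
<files>
  <file id="1" name="le-chat.pdf">LE CHÂT</file>
  <file id="2" name="le-chien.pdf">LE CHIEN</file>
</files>
XML;

// Return the on-disk name and label for $id, or null if the ID is unknown.
function find_file(string $xml, int $id): ?array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return null; // not well-formed XML
    }
    foreach ($doc->file as $file) {
        if ((int) $file['id'] === $id) {
            return ['name' => (string) $file['name'], 'label' => (string) $file];
        }
    }
    return null;
}
```

With numeric IDs the link becomes something like download-file.php?fileID=1, so no accented character ever has to travel in the URL.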

-- 

Bastien

Cat, the other other white meat




Re: [PHP] Character encoding hell

2010-10-26 Thread Steve Staples
>> Have you checked to see if that filename is what you think it is on the
>> Linux server?
>
> The character is shown as a question mark in putty.  I've tried
> forcing a UTF-8 font to make sure it's not a rendering issue but it
> didn't seem to make a difference.  I'm not convinced the encoding
> changed, though.

You say that in putty it is converted to a '?'?  so, on linux, the file
name is no longer what you intended it to be, so wouldn't you then need
to call the file EXACTLY as it is on the linux server?

maybe storing a non-utf8 filename is not the way to go?   it looks to
me, that if the filename was fileÂ.pdf on windose, and is now file?.pdf
on linux, no matter how much encoding you're going to do, you will never
be able to reference the file on linux with fileÂ.pdf as it is now
file?.pdf

maybe i am just talking out my ass here... i have a tendency to do that
once in a while :)

side note:  I had a script that was ported from windose to linux, and
the guy who created it, used capitals in his file names, but referred to
them in all lower case.   windose and apache didn't care, it would just
serve the page... ThisPage.php was the same as thispage.php... when we
moved it to linux, none of the damned links worked...so we had to do a
bunch of changes... 
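
For the case-sensitivity problem in the side note, one stopgap is to look up the real spelling of a name before linking to it. A minimal sketch, with the hypothetical name find_case_insensitive(); it only handles ASCII case differences.

```php
<?php
// Hypothetical helper: find the on-disk spelling of $wanted inside $dir,
// ignoring ASCII case differences. Returns null when nothing matches.
function find_case_insensitive(string $dir, string $wanted): ?string
{
    foreach (scandir($dir) ?: [] as $name) {
        if (strcasecmp($name, $wanted) === 0) {
            return $name;
        }
    }
    return null;
}
```

Fixing the links themselves is still the cleaner long-term answer, since this adds a directory scan to every lookup.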

Steve





Re: [PHP] Character encoding hell

2010-10-26 Thread Marc Guay
> You say that in putty it is converted to a '?'?  so, on linux, the file
> name is no longer what you intended it to be, so wouldn't you then need
> to call the file EXACTLY as it is on the linux server?

I thought this too at first, but if I run htmlentities() on the
filename it displays the Â character, so it must not have been lost
completely, just encoded in a different way?  I'm quite sure that the
problem with PuTTY displaying it as a question mark is related to its
display settings.
