[PHP] RSS Feed Accented Characters

2011-09-30 Thread Ron Piggott

I am trying to set up an RSS feed in the Spanish language using a PHP cron job.
I am unsure of how to deal with accented letters.

An example: 

This syntax:

<?php

$rss_content .= "<description>" . htmlentities("El Versículo del Día") . "</description>\r\n";

?>

Outputs:


<description>El Vers&iacute;culo del D&iacute;a</description>


When I use an RSS feed validator, I receive this error message:

This feed does not validate.

  - line 24, column 20: XML parsing error: unknown:24:20: undefined entity

I suspect the “;” is the issue, although it is needed for the accented letters.
If I don’t use htmlentities(), the accented characters can’t be viewed; they
become a “?”. How should I proceed?

Ron




www.TheVerseOfTheDay.info 


RE: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Jen Rasmussen
Would this work?

$content = "El Vers&iacute;culo del D&iacute;a";
$rss_content .= "<description>" . $content . "</description>\r\n";

Cheers!
Jen




--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Richard Quadling

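The entities are HTML entities. They are not XML entities. XML only
predefines five (&amp; &lt; &gt; &quot; &apos;), so a feed parser
rejects &iacute; as an undefined entity.

If the characters are displayed as ? then it is an encoding issue.

What encoding are you using?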
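A minimal sketch of the difference between HTML and XML escaping, assuming PHP 5.4 or later (for the ENT_XML1 flag) and a source file saved as UTF-8. htmlentities() turns í into the HTML named entity &iacute;, which XML does not define, while htmlspecialchars() with ENT_XML1 escapes only the characters XML itself requires and leaves í as a literal UTF-8 character. The sample text with "& más" is made up to show the escaping.

```php
<?php
// Assumes this file is saved as UTF-8, so "í" is the two bytes 0xC3 0xAD.
$text = 'El Versículo del Día & más';

// htmlentities() emits HTML named entities such as &iacute;, undefined in XML.
$html = htmlentities($text, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() with ENT_XML1 escapes only &, <, >, " and ', which XML defines.
$xml = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $html, "\n"; // El Vers&iacute;culo del D&iacute;a &amp; m&aacute;s
echo $xml, "\n";  // El Versículo del Día &amp; más
```

With the second form the feed stays valid XML, provided the bytes really are UTF-8 and the document declares that encoding.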

-- 
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea




RE: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Jen Rasmussen
Whoops! Forgive my try at it :)




Re: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Ron Piggott


-Original Message- 
From: Richard Quadling

Sent: Friday, September 30, 2011 12:31 PM
To: Ron Piggott
Cc: php-general@lists.php.net
Subject: Re: [PHP] RSS Feed Accented Characters



Make sure you have ...

<?xml version="1.0" encoding="UTF-8"?>

as the first line of the output. That tells the reader that the file
is UTF-8 encoded. Also, if you are emitting HTTP headers, make sure
that they say the encoding is UTF-8 and not a codepage.

Go UTF-8 everywhere.
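A minimal sketch of that advice, assuming the feed is assembled as a string in PHP and every input string is already UTF-8. The build_rss() and xml_escape() helpers and the item text are hypothetical. header() is only called outside the CLI; if the cron job writes a static .xml file instead, the charset in the HTTP header comes from the web server configuration.

```php
<?php
// Escape XML's special characters; leave UTF-8 letters such as í untouched.
function xml_escape(string $s): string
{
    return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: wrap item descriptions in a minimal RSS 2.0 document.
function build_rss(string $title, string $link, array $descriptions): string
{
    // The declaration must be the very first bytes: no whitespace or BOM before it.
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= "<rss version=\"2.0\">\n<channel>\n";
    $xml .= '<title>' . xml_escape($title) . "</title>\n";
    $xml .= '<link>' . xml_escape($link) . "</link>\n";
    $xml .= '<description>' . xml_escape($title) . "</description>\n";
    foreach ($descriptions as $d) {
        $xml .= '<item><description>' . xml_escape($d) . "</description></item>\n";
    }
    $xml .= "</channel>\n</rss>\n";
    return $xml;
}

// When served dynamically over HTTP, declare UTF-8 in the header as well.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/rss+xml; charset=UTF-8');
}
echo build_rss('El Versículo del Día', 'http://www.elversiculodeldia.info/', ['Peticiones de rezo']);
```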






Hi Richard:

Having <?xml version="1.0" encoding="UTF-8"?> as the starting
line didn't correct the problem.


The RSS feed is at
http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml

A feed validator reports a variety of errors related to accented
characters:

http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.elversiculodeldia.info%2Fpeticiones-de-rezo-rss.xml

- Also, when viewing the feed in Firefox, nothing after the first accented
character is visible; the rest can only be seen by right-clicking and
choosing View Source.


The RSS feed content will be populated by a database query. The database
columns are set to utf8_unicode_ci.


How should I proceed?
Ron 






Re: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Richard Quadling


The byte being received for í is just 0xED. The raw feed can be saved
for inspection with:

php -r "file_put_contents('a.rss', file_get_contents('http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml'));"

This is NOT UTF-8 encoded data, but (most likely) ISO-8859-1 (Latin-1).
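A minimal sketch of the same byte-level check in PHP, assuming the raw bytes are already in a string (for example from file_get_contents('a.rss') or a database column). It relies on preg_match() with the /u modifier returning false for malformed UTF-8; is_utf8() is a hypothetical helper name.

```php
<?php
// True when $bytes is well-formed UTF-8: preg_match() with the /u
// modifier returns false (not 0) when the subject is malformed UTF-8.
function is_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$latin1 = "D\xEDa";     // "Día" in ISO-8859-1: í is the single byte 0xED
$utf8   = "D\xC3\xADa"; // "Día" in UTF-8: í is the two bytes 0xC3 0xAD

printf("%s %s\n", bin2hex($latin1), is_utf8($latin1) ? 'UTF-8' : 'not UTF-8'); // 44ed61 not UTF-8
printf("%s %s\n", bin2hex($utf8), is_utf8($utf8) ? 'UTF-8' : 'not UTF-8');     // 44c3ad61 UTF-8
```

A lone 0xED followed by an ASCII letter can never be valid UTF-8, which is why the validator and Firefox both give up at the first accented character.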

So as I see it, you have a choice.

Either use <?xml version="1.0" encoding="ISO-8859-1"?> as the XML declaration,
or convert the data to UTF-8.

It also means that the data in the SQL server is NOT UTF-8 and will
need to be converted as well.

I would recommend doing that first.

That will mean reading the data as ISO-8859-1, converting it to
UTF-8, and then saving it again.
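A minimal sketch of that convert-and-save step, assuming the iconv extension is available. The $rows array stands in for rows fetched from a hypothetical table, and writing them back (an UPDATE per row, for example) is left out. Values that already pass a UTF-8 validity check are skipped so the fix can be rerun safely; the trade-off is that a Latin-1 string which happens to form valid UTF-8 (rare in Spanish text) would be left alone.

```php
<?php
// Convert one stored value from ISO-8859-1 to UTF-8, unless it is already UTF-8.
function to_utf8(string $value): string
{
    // preg_match('//u', ...) returns 1 only for well-formed UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    return iconv('ISO-8859-1', 'UTF-8', $value);
}

// Hypothetical rows as they might come back from the database.
$rows = [
    ['id' => 1, 'title' => "El Vers\xEDculo del D\xEDa"], // stored as Latin-1
    ['id' => 2, 'title' => 'Peticiones de rezo'],         // plain ASCII
];

foreach ($rows as &$row) {
    $row['title'] = to_utf8($row['title']);
    // Here each converted row would be written back to the table.
}
unset($row); // break the reference left behind by foreach
```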

I'd also be looking at the app that inputs the data into the DB initially.

To convert the text, here are 2 examples. I'm sure there are more ways.

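<?php
// This file must be saved as ISO-8859-1 so that $iso_text really holds Latin-1 bytes.
$iso_text = 'El Versículo del Día: Pray For Others: Incoming Prayer Requests';

// Option 1: utf8_encode() converts ISO-8859-1 to UTF-8 (deprecated as of PHP 8.2).
$utf_8_text = utf8_encode($iso_text);
var_dump($iso_text, $utf_8_text);

// Option 2: iconv() with explicit source and target encodings.
$utf_8_text = iconv('ISO-8859-1', 'UTF-8', $iso_text);
var_dump($iso_text, $utf_8_text);
?>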

outputs ...

string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"
string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"

Notice that the correct strings are 2 bytes longer: each í takes one
byte in ISO-8859-1 but two bytes in UTF-8.

The í (U+00ED) is encoded in UTF-8 as the two bytes 0xC3 0xAD.
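A minimal hand-rolled sketch of the same ISO-8859-1 to UTF-8 conversion, shown only to make the byte arithmetic concrete; latin1_to_utf8() is a hypothetical name, and in real code iconv() or mb_convert_encoding() would be the usual choice, since utf8_encode() is deprecated as of PHP 8.2. Because ISO-8859-1 byte values equal Unicode code points U+0000 to U+00FF, every byte from 0x80 up becomes a two-byte UTF-8 sequence of the form 110xxxxx 10xxxxxx, which is why each accented letter adds one byte.

```php
<?php
// Convert an ISO-8859-1 string to UTF-8 one byte at a time.
function latin1_to_utf8(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($latin1[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                 // ASCII: unchanged, 1 byte
        } else {
            $out .= chr(0xC0 | ($b >> 6));   // lead byte carries the top 2 bits
            $out .= chr(0x80 | ($b & 0x3F)); // continuation byte carries the low 6 bits
        }
    }
    return $out;
}

// í is U+00ED: 0xC0 | (0xED >> 6) = 0xC3, and 0x80 | (0xED & 0x3F) = 0xAD
echo bin2hex(latin1_to_utf8("\xED")), "\n"; // c3ad
```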





Re: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Ron Piggott











Richard, I was unaware of the utf8_encode() function. Thank you very much;
this now works. Now I can continue with the translation into Spanish.


Ron 


