Re: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Ron Piggott

www.TheVerseOfTheDay.info

-Original Message- 
From: Richard Quadling

Sent: Friday, September 30, 2011 2:53 PM
To: Ron Piggott
Cc: php-general@lists.php.net
Subject: Re: [PHP] RSS Feed Accented Characters

On 30 September 2011 18:22, Ron Piggott  wrote:


-Original Message- From: Richard Quadling
Sent: Friday, September 30, 2011 12:31 PM
To: Ron Piggott
Cc: php-general@lists.php.net
Subject: Re: [PHP] RSS Feed Accented Characters

On 30 September 2011 17:26, Ron Piggott wrote:


I am trying to set up an RSS Feed in the Spanish language using a PHP cron
job.  I am unsure of how to deal with accented letters.

An example:

This syntax:

<?php
$rss_content .= "" . htmlentities("El Versículo del Día") . "\r\n";

?>

Outputs:


El Vers&iacute;culo del D&iacute;a


When I use an RSS Feed validator I receive the error message

This feed does not validate.

 line 24, column 20: XML parsing error: :24:20: undefined entity

I suspect the “;” is the issue, although it is needed for the accented
letters.  If I don’t use htmlentities() the accented characters can’t be
viewed, they become a “?”  How should I proceed?

Ron


Make sure you have ...

<?xml version="1.0" encoding="UTF-8"?>

as the first line of the output. That tells the reader that the file
is a UTF-8 encoded file. Also, if you are sending HTTP headers, make sure
that they say the encoding is UTF-8 and not a codepage.

Go UTF-8 everywhere.
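
A minimal sketch of the advice above, assuming the feed is assembled as a string in PHP. The function name build_feed() and the channel layout are illustrative and not taken from Ron's script; the point is only that the XML declaration, the HTTP header, and the bytes themselves should all agree on UTF-8.

```php
<?php
// Hypothetical helper: wraps an already UTF-8 encoded title in a
// minimal RSS 2.0 document.
function build_feed(string $title): string
{
    // The XML declaration has to be the very first bytes of the output.
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\r\n";
    $xml .= '<rss version="2.0"><channel>' . "\r\n";
    // Escape only the XML markup characters; accented letters stay as UTF-8.
    $xml .= '<title>' . htmlspecialchars($title, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</title>' . "\r\n";
    $xml .= '</channel></rss>' . "\r\n";
    return $xml;
}

$feed = build_feed("El Vers\u{00ED}culo del D\u{00ED}a");

// When the feed is served over HTTP instead of written to a file by cron,
// the header should say the same thing as the declaration:
// header('Content-Type: application/rss+xml; charset=UTF-8');
```

Writing the accented letters as \u{00ED} escapes keeps the example independent of the encoding the script file happens to be saved in.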


--
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea




Hi Richard:

Having <?xml version="1.0" encoding="UTF-8"?> as the starting
line didn't correct the problem.

The RSS Feed is @
http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml

There are a variety of errors related to accented characters while using a
feed validator:
http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.elversiculodeldia.info%2Fpeticiones-de-rezo-rss.xml

Also, while viewing the feed in Firefox, once the first accented character
is displayed none of the rest of the feed is visible, except by right
clicking and choosing "view source".

The RSS Feed content will be populated by a database query.  The database
columns are set to utf8_unicode_ci

How should I proceed?
Ron



The byte sequence that is being received is just 0xED.

php -r "file_put_contents('a.rss',
file_get_contents('http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml'));"

This is NOT UTF-8 encoded data, but is ISO-8859-1 Latin-1 (most likely).
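
One way to confirm this kind of diagnosis locally, sketched with a hard-coded sample rather than the live feed: a lone 0xED byte is not a well-formed UTF-8 sequence, so a UTF-8 validity check fails on the Latin-1 bytes and passes on the two-byte form. is_valid_utf8() is a hypothetical helper name; it relies on PCRE's /u modifier rejecting malformed input.

```php
<?php
// Hypothetical helper: true when $bytes is well-formed UTF-8.
// preg_match() with the /u modifier returns false for malformed UTF-8.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$latin1 = "D\xEDa";      // "Día" as ISO-8859-1: í is the single byte 0xED
$utf8   = "D\xC3\xADa";  // "Día" as UTF-8: í is the two bytes 0xC3 0xAD

var_dump(is_valid_utf8($latin1)); // bool(false)
var_dump(is_valid_utf8($utf8));   // bool(true)

echo bin2hex($latin1), "\n";      // 44ed61
echo bin2hex($utf8), "\n";        // 44c3ad61
```

Where the mbstring extension is installed, mb_check_encoding($bytes, 'UTF-8') is an alternative to the preg_match() check.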

So as I see it you have two options.

Either use <?xml version="1.0" encoding="ISO-8859-1"?> as the XML declaration
or convert the encoded data to UTF-8.

It also means that the data in the sql server is NOT UTF-8 and will
need to be converted also.

I would recommend doing that first.

That will mean reading the data as ISO-8859-1 and converting it to
UTF-8 and then saving it again.

I'd also be looking at the app that inputs the data into the DB initially.
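
A conversion pass over stored data is safest when it cannot be applied twice, because converting text that is already UTF-8 from ISO-8859-1 again turns í into the two characters "Ã" plus a soft hyphen. A rough sketch, assuming the rows have already been fetched into an array; the $rows data and the to_utf8_once() name are made up for illustration, and the actual database read and write are left out.

```php
<?php
// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8, but leave
// strings that are already well-formed UTF-8 untouched.
function to_utf8_once(string $text): string
{
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8 (plain ASCII also lands here)
    }
    return iconv('ISO-8859-1', 'UTF-8', $text);
}

// Illustrative rows: the first is stored as Latin-1, the second is UTF-8.
$rows  = ["Vers\xEDculo", "D\xC3\xADa"];
$fixed = array_map('to_utf8_once', $rows);

echo bin2hex($fixed[0]), "\n"; // 56657273c3ad63756c6f
echo bin2hex($fixed[1]), "\n"; // 44c3ad61
```

The validity test is only a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 would be left unconverted, so it is worth spot-checking the results.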

To convert the text, here are 2 examples. I'm sure there are more ways.

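<?php
// Assumes this script file is itself saved as ISO-8859-1.
$iso_text = 'El Versículo del Día: Pray For Others: Incoming Prayer Requests';

// Example 1: utf8_encode() converts ISO-8859-1 to UTF-8.
$utf_8_text = utf8_encode($iso_text);
var_dump($iso_text, $utf_8_text);

// Example 2: iconv() with explicit source and target encodings.
$utf_8_text = iconv('ISO-8859-1', 'UTF-8', $iso_text);
var_dump($iso_text, $utf_8_text);
?>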

outputs ...

string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"
string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"

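Notice that the correct strings are 2 bytes longer? Each í takes one byte
(0xED) in ISO-8859-1 but two bytes in UTF-8.

The í is code point U+00ED, which UTF-8 encodes as the bytes 0xC3 0xAD.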
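
The length difference can be reproduced without depending on how the script file is saved, by writing the Latin-1 bytes as escapes. This is only a sketch and uses iconv() alone; utf8_encode() did the same job in 2011 but has been deprecated since PHP 8.2.

```php
<?php
// The sample title as ISO-8859-1 bytes: each í is the single byte 0xED.
$iso_text = "El Vers\xEDculo del D\xEDa: Pray For Others: Incoming Prayer Requests";

$utf_8_text = iconv('ISO-8859-1', 'UTF-8', $iso_text);

// strlen() counts bytes, not characters.
echo strlen($iso_text), "\n";   // 63
echo strlen($utf_8_text), "\n"; // 65

// Each 0xED became the two-byte sequence 0xC3 0xAD.
echo substr_count($utf_8_text, "\xC3\xAD"), "\n"; // 2
```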

--
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea


Richard, I was unaware of the utf8_encode() function.  Thank you very much;
this now works.  Now I may continue with the translation into Spanish.


Ron 



--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Richard Quadling
On 30 September 2011 18:22, Ron Piggott wrote:
>
> -Original Message- From: Richard Quadling
> Sent: Friday, September 30, 2011 12:31 PM
> To: Ron Piggott
> Cc: php-general@lists.php.net
> Subject: Re: [PHP] RSS Feed Accented Characters
>
> On 30 September 2011 17:26, Ron Piggott wrote:
>>
>> I am trying to set up an RSS Feed in the Spanish language using a PHP cron
>> job.  I am unsure of how to deal with accented letters.
>>
>> An example:
>>
>> This syntax:
>>
>> <?php
>> $rss_content .= "" . htmlentities("El Versículo del Día") . "\r\n";
>>
>> ?>
>>
>> Outputs:
>>
>>
>> El Vers&iacute;culo del D&iacute;a
>>
>>
>> When I use an RSS Feed validator I receive the error message
>>
>> This feed does not validate.
>>
>>  line 24, column 20: XML parsing error: :24:20: undefined entity
>>
>> I suspect the “;” is the issue, although it is needed for the accented
>> letters.  If I don’t use htmlentities() the accented characters can’t be
>> viewed, they become a “?”  How should I proceed?
>>
>> Ron
>
> Make sure you have ...
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> as the first line of the output. That tells the reader that the file
> is a UTF-8 encoded file. Also, if you are sending HTTP headers, make sure
> that they say the encoding is UTF-8 and not a codepage.
>
> Go UTF-8 everywhere.
>
>
> --
> Richard Quadling
> Twitter : EE : Zend : PHPDoc
> @RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea
>
>
>
>
> Hi Richard:
>
> Having <?xml version="1.0" encoding="UTF-8"?> as the starting
> line didn't correct the problem.
>
> The RSS Feed is @
> http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml
>
> There are a variety of errors related to accented characters while using a
> feed validator:
> http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.elversiculodeldia.info%2Fpeticiones-de-rezo-rss.xml
>
> Also, while viewing the feed in Firefox, once the first accented character
> is displayed none of the rest of the feed is visible, except by right
> clicking and choosing "view source".
>
> The RSS Feed content will be populated by a database query.  The database
> columns are set to utf8_unicode_ci
>
> How should I proceed?
> Ron
>

The byte sequence that is being received is just 0xED.

php -r "file_put_contents('a.rss',
file_get_contents('http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml'));"

This is NOT UTF-8 encoded data, but is ISO-8859-1 Latin-1 (most likely).

So as I see it you have two options.

Either use <?xml version="1.0" encoding="ISO-8859-1"?> as the XML declaration
or convert the encoded data to UTF-8.

It also means that the data in the sql server is NOT UTF-8 and will
need to be converted also.

I would recommend doing that first.

That will mean reading the data as ISO-8859-1 and converting it to
UTF-8 and then saving it again.

I'd also be looking at the app that inputs the data into the DB initially.

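To convert the text, here are 2 examples. I'm sure there are more ways.

<?php
$iso_text = 'El Versículo del Día: Pray For Others: Incoming Prayer Requests';

$utf_8_text = utf8_encode($iso_text);
var_dump($iso_text, $utf_8_text);

$utf_8_text = iconv('ISO-8859-1', 'UTF-8', $iso_text);
var_dump($iso_text, $utf_8_text);
?>

outputs ...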

string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"
string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"

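Notice that the correct strings are 2 bytes longer? Each í takes one byte
(0xED) in ISO-8859-1 but two bytes in UTF-8.

The í is code point U+00ED, which UTF-8 encodes as the bytes 0xC3 0xAD.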

-- 
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea




Re: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Ron Piggott


-Original Message- 
From: Richard Quadling

Sent: Friday, September 30, 2011 12:31 PM
To: Ron Piggott
Cc: php-general@lists.php.net
Subject: Re: [PHP] RSS Feed Accented Characters

On 30 September 2011 17:26, Ron Piggott  wrote:


I am trying to set up an RSS Feed in the Spanish language using a PHP cron 
job.  I am unsure of how to deal with accented letters.


An example:

This syntax:

<?php
$rss_content .= "" . htmlentities("El Versículo del Día") . "\r\n";


?>

Outputs:


El Vers&iacute;culo del D&iacute;a


When I use an RSS Feed validator I receive the error message

This feed does not validate.

 line 24, column 20: XML parsing error: :24:20: undefined entity


I suspect the “;” is the issue, although it is needed for the accented 
letters.  If I don’t use htmlentities() the accented characters can’t be 
viewed, they become a “?”  How should I proceed?


Ron


Make sure you have ...

<?xml version="1.0" encoding="UTF-8"?>

as the first line of the output. That tells the reader that the file
is a UTF-8 encoded file. Also, if you are sending HTTP headers, make sure
that they say the encoding is UTF-8 and not a codepage.

Go UTF-8 everywhere.


--
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea




Hi Richard:

Having <?xml version="1.0" encoding="UTF-8"?> as the starting
line didn't correct the problem.


The RSS Feed is @
http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml

There are a variety of errors related to accented characters while using a
feed validator:

http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.elversiculodeldia.info%2Fpeticiones-de-rezo-rss.xml

Also, while viewing the feed in Firefox, once the first accented character
is displayed none of the rest of the feed is visible, except by right
clicking and choosing "view source".


The RSS Feed content will be populated by a database query.  The database 
columns are set to utf8_unicode_ci


How should I proceed?
Ron 






RE: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Jen Rasmussen
Whoops! Forgive my try at it :)

-Original Message-
From: Richard Quadling [mailto:rquadl...@gmail.com] 
Sent: Friday, September 30, 2011 11:47 AM
To: j...@cetaceasound.com
Cc: Ron Piggott; php-general@lists.php.net
Subject: Re: [PHP] RSS Feed Accented Characters

On 30 September 2011 17:41, Jen Rasmussen  wrote:
> Would this work?
>
> $content = "El Versículo del Día";
> $rss_content .= "" . $content . "\r\n";
>
> Cheers!
> Jen

The entities are HTML entities. They are not XML entities.

If they are displayed as ? then it is an encoding issue.

What encoding are you using?

-- 
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea






Re: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Richard Quadling
On 30 September 2011 17:41, Jen Rasmussen  wrote:
> Would this work?
>
> $content = "El Versículo del Día";
> $rss_content .= "" . $content . "\r\n";
>
> Cheers!
> Jen

The entities are HTML entities. They are not XML entities.

If they are displayed as ? then it is an encoding issue.

What encoding are you using?
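
To illustrate the distinction: XML predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so &iacute; is undefined to an XML parser unless a DTD declares it. A small sketch, assuming the text is already UTF-8, comparing htmlentities() with htmlspecialchars() and the ENT_XML1 flag:

```php
<?php
$text = "D\u{00ED}a & noche"; // "Día & noche" as UTF-8

// HTML named entities: fine in HTML, undefined in plain XML.
$html = htmlentities($text, ENT_QUOTES, 'UTF-8');

// XML-safe: only the markup characters are escaped, í stays as UTF-8 bytes.
$xml = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');

echo $html, "\n"; // D&iacute;a &amp; noche
echo $xml, "\n";  // Día &amp; noche
```

The second form only produces a readable feed if the document really is served and declared as UTF-8, which is why the encoding question matters.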

-- 
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea




RE: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Jen Rasmussen
Would this work?

$content = "El Versículo del Día";
$rss_content .= "" . $content . "\r\n";

Cheers!
Jen



-Original Message-
From: Ron Piggott [mailto:ron@actsministries.org] 
Sent: Friday, September 30, 2011 11:26 AM
To: php-general@lists.php.net
Subject: [PHP] RSS Feed Accented Characters


I am trying to set up an RSS Feed in the Spanish language using a PHP cron job. 
 I am unsure of how to deal with accented letters.

An example: 

This syntax:

<?php
$rss_content .= "" . htmlentities("El Versículo del Día") . "\r\n";

?>

Outputs:


El Vers&iacute;culo del D&iacute;a


When I use an RSS Feed validator I receive the error message

This feed does not validate.

  line 24, column 20: XML parsing error: :24:20: undefined entity

I suspect the “;” is the issue, although it is needed for the accented letters. 
 If I don’t use htmlentities() the accented characters can’t be viewed, they 
become a “?”  How should I proceed?

Ron




www.TheVerseOfTheDay.info 





Re: [PHP] RSS Feed Accented Characters

2011-09-30 Thread Richard Quadling
On 30 September 2011 17:26, Ron Piggott wrote:
>
> I am trying to set up an RSS Feed in the Spanish language using a PHP cron
> job.  I am unsure of how to deal with accented letters.
>
> An example:
>
> This syntax:
>
> <?php
> $rss_content .= "" . htmlentities("El Versículo del Día") . "\r\n";
>
> ?>
>
> Outputs:
>
>
> El Vers&iacute;culo del D&iacute;a
>
>
> When I use an RSS Feed validator I receive the error message
>
> This feed does not validate.
>
>  line 24, column 20: XML parsing error: :24:20: undefined entity
>
> I suspect the “;” is the issue, although it is needed for the accented
> letters.  If I don’t use htmlentities() the accented characters can’t be
> viewed, they become a “?”  How should I proceed?
>
> Ron

Make sure you have ...

<?xml version="1.0" encoding="UTF-8"?>

as the first line of the output. That tells the reader that the file
is a UTF-8 encoded file. Also, if you are sending HTTP headers, make sure
that they say the encoding is UTF-8 and not a codepage.

Go UTF-8 everywhere.


-- 
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea
