On 4 November 2010 15:11, robert mena <robert.m...@gmail.com> wrote:
> Hi,
>
> The core of the code is simply
>
> $fp = fopen('file.tab', 'rb');
> while(!feof($fp))
> {
>     $line = fgets($fp);
>     $data = explode("\t", $line);
>     ...
> }
>
> So I try to manipulate the $data[X]. For example $data[0] is supposed to be
> numeric so I do $n = (int) $data[0]
>
> One other thing: the second column should contain a string. If I check
> the string visually it is correct, but an if( $data[1] == 'stringX') is false
> even if in the file I can see this (and print those two).
>
> I even did an md5 of both and they are different.
>
> It seems to be an encoding issue. Is it safe to use explode with utf8
> strings?
>
> I even tried this code but no match was found (just to replace the explode)
>
> $str = "abc 文字化け efg";
> $results = array();
> preg_match_all("/\t/u", $str, $results);
> var_dump($results[0]);
>
> On Thu, Nov 4, 2010 at 6:33 AM, Richard Quadling <rquadl...@gmail.com>
> wrote:
>>
>> On 3 November 2010 21:42, Alexander Holodny <alexander.holo...@gmail.com>
>> wrote:
>> > To exclude unexpected behavior in case of wrongly formatted input data,
>> > it would be much better to use this type-casting method:
>> > intval(ltrim(trim($inStr), '0'))
>> >
>> > 2010/11/3, Nicholas Kell <n...@monkeyknight.com>:
>> >>
>> >> On Nov 3, 2010, at 4:22 PM, robert mena wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I have a text file (utf-8 encoded) which contains lines with numbers
>> >>> and text separated by \t. I need to convert the numbers that contain
>> >>> 0 (at left) to integers.
>> >>>
>> >>> For some reason one line that contains 00000002 is cast to 0 instead
>> >>> of 2.
>> >>> Below is the output of the cast (int) $field[0], where I get this from
>> >>> exploding each line.
>> >>>
>> >>> 0 00000002
>> >>> 4 00000004
>> >>
>> >> My first guess is wondering how you are grabbing the strings from the
>> >> file.
>> >> Seems to me like it would just drop the zeros on the left by default.
>> >> Are you including the \t in the string by accident? If so, that may be
>> >> hosing it. Otherwise, have you tried ltrim on it?
>> >>
>> >> Ex:
>> >>
>> >> $_castableString = ltrim($_yourString, '0');
>> >>
>> >> // Now cast
>>
>> <?php
>> // Create test file.
>> $s_TabbedFilename = './test.tab';
>> file_put_contents($s_TabbedFilename, "0\t00000002" . PHP_EOL .
>>     "4\t00000004" . PHP_EOL);
>>
>> // Open test file.
>> $fp_TabbedFile = fopen($s_TabbedFilename, 'rt')
>>     or die("Could not open {$s_TabbedFilename}\n");
>>
>> // Iterate file.
>> while(True)
>> {
>>     if (False !== ($a_Line = fgetcsv($fp_TabbedFile, 0, "\t")))
>>     {
>>         var_dump($a_Line);
>>         foreach($a_Line as $i_Index => $m_Value)
>>         {
>>             $a_Line[$i_Index] = intval($m_Value);
>>         }
>>         var_dump($a_Line);
>>     }
>>     else
>>     {
>>         break;
>>     }
>> }
>>
>> // Close the file.
>> fclose($fp_TabbedFile);
>>
>> // Delete the file.
>> unlink($s_TabbedFilename);
>>
>>
>> outputs ...
>>
>> array(2) {
>>   [0]=>
>>   string(1) "0"
>>   [1]=>
>>   string(8) "00000002"
>> }
>> array(2) {
>>   [0]=>
>>   int(0)
>>   [1]=>
>>   int(2)
>> }
>> array(2) {
>>   [0]=>
>>   string(1) "4"
>>   [1]=>
>>   string(8) "00000004"
>> }
>> array(2) {
>>   [0]=>
>>   int(4)
>>   [1]=>
>>   int(4)
>> }
>>
>> intval() operates on base 10 by default, so there is no need to worry
>> about leading zeros being treated as base 8/octal.
>>
>> What is your code? Can you reduce it to something as small as the
>> above to see if you can repeat the issue?

Please don't top post.

With regards to utf-8 data, no, PHP is not Unicode aware: its string functions work on bytes. If a multi-byte character contains a 0x09 byte, then explode() will break it.

Can you supply the file you are working on? Base64-encode it and drop it into a pastebin.

--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
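
A minimal sketch of the byte-level checks discussed above. The sample line and the variable names are hypothetical, not taken from the poster's file. In well-formed UTF-8 every byte of a multi-byte sequence is 0x80 or higher, so the 0x09 case cannot arise there and explode() on "\t" splits such data cleanly. The byte order mark (EF BB BF) is used only as one possible example of invisible leading bytes that would make a cast return 0; the thread does not establish what the poster's file actually contains.

```php
<?php
// Hypothetical sample line: a zero-padded number, a tab, then UTF-8 text.
$line = "00000002\t文字化け\n";

// explode() splits on bytes; 0x09 never occurs inside a well-formed
// UTF-8 multi-byte sequence, so the text column stays intact.
$data = explode("\t", rtrim($line, "\r\n"));

$n        = (int) $data[0];               // leading zeros are ignored in base 10
$sameText = ($data[1] === '文字化け');     // byte-for-byte comparison

// Assumption for illustration only: the field starts with a UTF-8 byte order mark.
// The cast stops at the first non-numeric byte, so the result is 0.
$withBom = "\xEF\xBB\xBF" . "00000002";
$bomCast = (int) $withBom;

// bin2hex() makes bytes visible that a printed string would hide.
$bomHex = bin2hex($withBom);

var_dump($n, $sameText, $bomCast, $bomHex);
```

Running bin2hex() on the first field of the problem line, and on the string it is compared against, should show whether the two really consist of the same bytes.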