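function readyForDOM_report($originalReportAsText) {
  // Mark every <th> with a class; the lookahead keeps <thead> untouched.
  return preg_replace('/<th(?=[\s>])/i', '<th class="transportTH"', $originalReportAsText);
}

$dom = new DOMDocument();
$dom->loadHTML(readyForDOM_report($str));
$tables = $dom->getElementsByTagName("table");
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach($rows as $row){
   foreach($row->childNodes as $node){
        // childNodes also holds whitespace text nodes; only elements have a class
        if ($node instanceof DOMElement
            && $node->getAttribute('class') == 'transportTH') {
            // $node was a <th> in the original report
        }
   }
}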
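To check the marker idea end to end, here is a minimal self-contained sketch. It captures a made-up sample report with output buffering (the setup Andy describes) and lists every cell of the first table as 'th' or 'td'. It uses a regex variant of readyForDOM_report so that <thead> is not rewritten; the sample table and the cellKinds() helper are hypothetical.

```php
<?php
// Regex variant of the marker rewrite: tag each <th> but not <thead>.
function readyForDOM_report($originalReportAsText) {
    return preg_replace('/<th(?=[\s>])/i', '<th class="transportTH"', $originalReportAsText);
}

// Hypothetical helper: return 'th' or 'td' for each cell of the first table.
function cellKinds($html) {
    $dom = new DOMDocument();
    $dom->loadHTML(readyForDOM_report($html));
    $kinds = array();
    $rows = $dom->getElementsByTagName('table')->item(0)->getElementsByTagName('tr');
    foreach ($rows as $row) {
        foreach ($row->childNodes as $node) {
            if ($node instanceof DOMElement) { // skip whitespace text nodes
                $kinds[] = ($node->getAttribute('class') === 'transportTH') ? 'th' : 'td';
            }
        }
    }
    return $kinds;
}

// Capture an existing report via output buffering instead of editing it.
ob_start();
echo "<table><thead><tr><th>Name</th><th>Qty</th></tr></thead>"
   . "<tr><td>Apples</td><td>3</td></tr></table>";
$str = ob_get_clean();

print_r(cellKinds($str));
```

Because the marker is read from the class attribute rather than from tagName, this does not depend on whether the parser reports the cell as th or td.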

The only problem I foresee is <th>s in your reports that already have a
class="something" set, which could mess this up; you'd need to check
for that. But in that case you can always also load the original,
unmodified $str into a second DOM, and use the row and cell indices
(the $k's from foreach ($arr as $k=>$v)) to get to the corresponding
node there, which still has the original class name.
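The two-DOM fallback above can be sketched like this: parse the marked copy to find which cells were <th>, parse the untouched string a second time, and pair cells by row and cell index to read the original class. tableCells() and headerClasses() are hypothetical names. The sketch assumes the parser keeps the first of two duplicate class attributes, so the inserted marker wins over an existing class; check that against your libxml2 version.

```php
<?php
// Buffer libxml warnings (e.g. "Attribute class redefined") instead of printing them.
libxml_use_internal_errors(true);

// Hypothetical helper: cell elements of the first table as $cells[$row][$cell].
function tableCells($html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $cells = array();
    $rows = $dom->getElementsByTagName('table')->item(0)->getElementsByTagName('tr');
    for ($r = 0; $r < $rows->length; $r++) {
        foreach ($rows->item($r)->childNodes as $node) {
            if ($node instanceof DOMElement) { // skip whitespace text nodes
                $cells[$r][] = $node;
            }
        }
    }
    return $cells;
}

// Hypothetical helper: for each <th>, the class it had in the original report.
function headerClasses($original) {
    // Same marker rewrite as readyForDOM_report, skipping <thead>.
    $marked = preg_replace('/<th(?=[\s>])/i', '<th class="transportTH"', $original);
    $markedCells = tableCells($marked);
    $originalCells = tableCells($original); // same shape: only attributes differ
    $result = array();
    foreach ($markedCells as $r => $row) {
        foreach ($row as $c => $cell) {
            // Relies on the first (marker) class attribute being the one kept.
            if ($cell->getAttribute('class') === 'transportTH') {
                $result[] = $originalCells[$r][$c]->getAttribute('class'); // '' if none
            }
        }
    }
    return $result;
}

$str = '<table><tr><th class="total">Sum</th><th>Qty</th></tr>'
     . '<tr><td class="total">7</td><td>3</td></tr></table>';
print_r(headerClasses($str));
```

The <td class="total"> cell carries no marker, so it is not mistaken for a header even though it shares the class.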

On Thu, Mar 11, 2010 at 9:52 PM, Andy Theuninck <> wrote:
> I could, but that would kind of defeat the point of the project
> (I'm trying to capture a bunch of existing HTML reports via output
> buffering and transform the tables into proper XLS. Tweaking every
> single report is exactly what I'm trying to avoid).
> On Thu, Mar 11, 2010 at 2:45 PM, Rene Veerman <> wrote:
>> Hmm, lame bug... but you could add a class name to the <th>s and check
>> for that?
>> On Thu, Mar 11, 2010 at 9:34 PM, Andy Theuninck <> wrote:
>>> I'm trying to parse a string containing an HTML table using the
>>> builtin DOM classes and running into an odd problem.
>>> Here's what I'm doing:
>>> $dom = new DOMDocument();
>>> $dom->loadHTML($str);
>>> $tables = $dom->getElementsByTagName("table");
>>> $rows = $tables->item(0)->getElementsByTagName('tr');
>>> foreach($rows as $row){
>>>    foreach($row->childNodes as $node){
>>>         // stuff
>>>    }
>>> }
>>> This gives me the row elements in order and access to their contents.
>>> The weird part is $node always appears to be a td tag - even when it's
>>> a th tag in the original string (DOMElement::tagName is always "td"
>>> (as well as DOMNode::nodeName and DOMNode::localName)). The th tags
>>> definitely aren't being omitted; I still get nodes with their
>>> contents, just with the wrong tag name.
>>> Is there any way to override this behavior so that I can distinguish
>>> between td tags and th tags?
