Re: [PHP] Sequential access of XML nodes.
Richard Quadling wrote:
> It seems that the SimpleXMLIterator is perfect for me.
> [...]

Interesting, I forget that's there... I must have a play with it
sometime. Thanks for resurfacing it :)

--
Ross McKay, Toronto, NSW Australia
"Let the laddie play wi the knife - he'll learn"
- The Wee Book of Calvin

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
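For readers following the thread, a minimal sketch of the SimpleXMLIterator usage under discussion (the feed structure and element names here are invented for illustration; note that SimpleXML loads the whole document into memory, so the streaming XMLReader approach later in the thread scales better for very large feeds):

```php
<?php
// Iterate the child elements of a small feed with SimpleXMLIterator.
$xmlString = <<<XML
<catalogue>
  <Product><ID>1</ID><Name>Widget</Name></Product>
  <Product><ID>2</ID><Name>Sprocket</Name></Product>
</catalogue>
XML;

$it = new SimpleXMLIterator($xmlString);

$names = array();
foreach ($it as $product) {           // iterates the <Product> children
    $names[] = (string) $product->Name;
}
print implode(',', $names);           // Widget,Sprocket
```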
Re: [PHP] Sequential access of XML nodes.
On 27 September 2011 03:38, Ross McKay wrote:
> On Mon, 26 Sep 2011 14:17:43 -0400, Adam Richardson wrote:
>
>> I believe the XMLReader allows you to pull node by node, and it's really
>> easy to work with:
>> http://www.php.net/manual/en/intro.xmlreader.php
>>
>> In terms of dealing with various forms of compression, I believe you can
>> use the compression streams to handle this:
>> http://stackoverflow.com/questions/1190906/php-open-gzipped-xml
>> http://us3.php.net/manual/en/wrappers.compression.php
>
> +1 here. XMLReader is easy and fast, and will do the job you want, albeit
> without the nice foreach(...) loop Richard specs. You just loop over the
> XML, checking the node type and watching the state of your stream to see
> how to handle each iteration.
>
> e.g. (assuming $xml is an open XMLReader and $db is a PDO connection)
>
> $text = '';
> $haveRecord = FALSE;
> $records = 0;
>
> // prepare insert statement
> $sql = '
>     insert into Product (ID, Product, ...)
>     values (:ID, :Product, ...)
> ';
> $cmd = $db->prepare($sql);
>
> // set list of allowable fields and their parameter types
> $fields = array(
>     'ID'      => PDO::PARAM_INT,
>     'Product' => PDO::PARAM_STR,
>     ...
> );
>
> while ($xml->read()) {
>     switch ($xml->nodeType) {
>         case XMLReader::ELEMENT:
>             if ($xml->name === 'Product') {
>                 // start of Product element,
>                 // reset command parameters to empty
>                 foreach ($fields as $name => $type) {
>                     $cmd->bindValue(":$name", NULL, PDO::PARAM_NULL);
>                 }
>                 $haveRecord = TRUE;
>             }
>             $text = '';
>             break;
>
>         case XMLReader::END_ELEMENT:
>             if ($xml->name === 'Product') {
>                 // end of Product element, save record
>                 if ($haveRecord) {
>                     $result = $cmd->execute();
>                     $records++;
>                 }
>                 $haveRecord = FALSE;
>             }
>             elseif ($haveRecord) {
>                 // still inside a Product element,
>                 // record field value and move on
>                 $name = $xml->name;
>                 if (array_key_exists($name, $fields)) {
>                     $cmd->bindValue(":$name", $text, $fields[$name]);
>                 }
>             }
>             $text = '';
>             break;
>
>         case XMLReader::TEXT:
>         case XMLReader::CDATA:
>             // record value (or part value) of text or cdata node
>             $text .= $xml->value;
>             break;
>
>         default:
>             break;
>     }
> }
>
> return $records;

Thanks for all of that.

It seems that the SimpleXMLIterator is perfect for me.

I need to see if the documents I'm needing to process have multiple
namespaces. If they do, then I'm not exactly sure what to do at this
stage.

Richard.

--
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea
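On the open namespace question: SimpleXML can address namespaced children explicitly via getNamespaces() and children(). A minimal sketch (the feed structure and the http://example.com/product namespace are invented for illustration):

```php
<?php
// Read namespaced children with SimpleXML.
$xmlString = <<<XML
<feed xmlns:p="http://example.com/product">
  <p:Product><p:ID>1</p:ID></p:Product>
</feed>
XML;

$sxe = new SimpleXMLElement($xmlString);

// Map of prefixes to namespace URIs actually used in the document,
// e.g. array('p' => 'http://example.com/product')
$ns = $sxe->getNamespaces(true);

$ids = array();
foreach ($sxe->children($ns['p']) as $product) {
    // children() must be asked for the namespace again at each level
    $ids[] = (string) $product->children($ns['p'])->ID;
}
print implode(',', $ids);   // 1
```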
Re: [PHP] Sequential access of XML nodes.
On Mon, 26 Sep 2011 14:17:43 -0400, Adam Richardson wrote:

> I believe the XMLReader allows you to pull node by node, and it's really
> easy to work with:
> http://www.php.net/manual/en/intro.xmlreader.php
>
> In terms of dealing with various forms of compression, I believe you can
> use the compression streams to handle this:
> http://stackoverflow.com/questions/1190906/php-open-gzipped-xml
> http://us3.php.net/manual/en/wrappers.compression.php

+1 here. XMLReader is easy and fast, and will do the job you want, albeit
without the nice foreach(...) loop Richard specs. You just loop over the
XML, checking the node type and watching the state of your stream to see
how to handle each iteration.

e.g. (assuming $xml is an open XMLReader and $db is a PDO connection)

$text = '';
$haveRecord = FALSE;
$records = 0;

// prepare insert statement
$sql = '
    insert into Product (ID, Product, ...)
    values (:ID, :Product, ...)
';
$cmd = $db->prepare($sql);

// set list of allowable fields and their parameter types
$fields = array(
    'ID'      => PDO::PARAM_INT,
    'Product' => PDO::PARAM_STR,
    ...
);

while ($xml->read()) {
    switch ($xml->nodeType) {
        case XMLReader::ELEMENT:
            if ($xml->name === 'Product') {
                // start of Product element,
                // reset command parameters to empty
                foreach ($fields as $name => $type) {
                    $cmd->bindValue(":$name", NULL, PDO::PARAM_NULL);
                }
                $haveRecord = TRUE;
            }
            $text = '';
            break;

        case XMLReader::END_ELEMENT:
            if ($xml->name === 'Product') {
                // end of Product element, save record
                if ($haveRecord) {
                    $result = $cmd->execute();
                    $records++;
                }
                $haveRecord = FALSE;
            }
            elseif ($haveRecord) {
                // still inside a Product element,
                // record field value and move on
                $name = $xml->name;
                if (array_key_exists($name, $fields)) {
                    $cmd->bindValue(":$name", $text, $fields[$name]);
                }
            }
            $text = '';
            break;

        case XMLReader::TEXT:
        case XMLReader::CDATA:
            // record value (or part value) of text or cdata node
            $text .= $xml->value;
            break;

        default:
            break;
    }
}

return $records;

--
Ross McKay, Toronto, NSW Australia
"Tuesday is Soylent Green day"
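The compression streams Adam mentions combine directly with XMLReader: the compress.zlib:// wrapper lets XMLReader::open() read a gzipped feed without extracting it first. A minimal sketch (the temp file stands in for a downloaded .gz feed; element names are illustrative):

```php
<?php
// Stream a gzipped XML file through XMLReader via compress.zlib://.
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, gzencode('<r><Product/><Product/></r>'));

$xml = new XMLReader();
$xml->open('compress.zlib://' . $path);   // decompresses on the fly

$count = 0;
while ($xml->read()) {
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'Product') {
        $count++;
    }
}
$xml->close();
unlink($path);
print $count;   // 2
```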
Re: [PHP] Sequential access of XML nodes.
On Mon, Sep 26, 2011 at 12:24 PM, Richard Quadling wrote:

> Hi.
>
> I've got a project which will need to iterate over some very large
> XML files (around 250 files ranging in size from around 50MB to
> several hundred MB - 2 of them are in excess of 500MB).
>
> The XML files have a root node and then a collection of products. In
> total, across all the files, there are going to be several million
> product details. Each XML feed will have a different structure, as it
> relates to a different source of data.
>
> I plan to have an abstract reader class, with the concrete classes
> being extensions of this, each covering the specifics of the format
> being received and able to return a standardised view of the data for
> importing into MySQL and eventually MongoDB.
>
> I want to use an XML iterator so that I can say something along the
> lines of ...
>
> 1 - Instantiate the XML iterator with the XML's URL.
> 2 - Iterate the XML, getting back one node at a time without keeping
>     all the nodes in memory.
>
> e.g.
>
> $o_XML = new SomeExtendedXMLReader('http://www.site.com/data.xml');
> foreach ($o_XML as $o_Product) {
>     // Process product.
> }
>
> Add to this that some of the XML feeds come gzipped (.gz); I want to
> be able to stream the XML out of the .gz file without having to
> extract the entire file first.
>
> I've not got access to the XML feeds yet (they are coming from the
> various affiliate networks around, and I'm a remote user so need to
> get credentials and the like).
>
> If you have any pointers on the capabilities of the various XML reader
> classes, based upon this scenario, then I'd be very grateful.
>
> In this instance, the memory limitation is important. The current code
> is string based, and whilst it works, you can imagine the complexity
> of it.
>
> The structure of each product internally will be different, but I will
> be happy to get back a nested array or an XML fragment, as long as the
> iterator is only holding onto 1 array/fragment at a time and not
> caching the massive number of products per file.
>
> Thanks.
>
> Richard.

I believe the XMLReader allows you to pull node by node, and it's really
easy to work with:
http://www.php.net/manual/en/intro.xmlreader.php

In terms of dealing with various forms of compression, I believe you can
use the compression streams to handle this:
http://stackoverflow.com/questions/1190906/php-open-gzipped-xml
http://us3.php.net/manual/en/wrappers.compression.php

Adam

--
Nephtali: A simple, flexible, fast, and security-focused PHP framework
http://nephtaliproject.com
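Richard's "one fragment at a time" requirement maps well onto XMLReader::expand(), which converts only the current node into a DOM fragment while the rest of the file stays on the stream. A minimal sketch (feed structure invented; only one product's fragment exists in memory per iteration):

```php
<?php
// Stream with XMLReader, but hand each <Product> over as a
// SimpleXMLElement fragment for convenient access.
$xml = new XMLReader();
$xml->XML('<r><Product><ID>7</ID></Product><Product><ID>8</ID></Product></r>');

$ids = array();
while ($xml->read()) {
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'Product') {
        // expand() copies just this node's subtree into DOM
        $doc  = new DOMDocument();
        $node = $doc->importNode($xml->expand(), true);
        $doc->appendChild($node);
        $sx = simplexml_import_dom($node);
        $ids[] = (string) $sx->ID;
    }
}
print implode(',', $ids);   // 7,8
```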
Re: [PHP] Sequential access of XML nodes.
On 26 Sep 2011, at 17:24, Richard Quadling wrote:

> I've got a project which will need to iterate over some very large
> XML files (around 250 files ranging in size from around 50MB to
> several hundred MB - 2 of them are in excess of 500MB).
>
> The XML files have a root node and then a collection of products. In
> total, across all the files, there are going to be several million
> product details. Each XML feed will have a different structure, as it
> relates to a different source of data.
>
> I plan to have an abstract reader class, with the concrete classes
> being extensions of this, each covering the specifics of the format
> being received and able to return a standardised view of the data for
> importing into MySQL and eventually MongoDB.
>
> I want to use an XML iterator so that I can say something along the
> lines of ...
>
> 1 - Instantiate the XML iterator with the XML's URL.
> 2 - Iterate the XML, getting back one node at a time without keeping
>     all the nodes in memory.
>
> e.g.
>
> $o_XML = new SomeExtendedXMLReader('http://www.site.com/data.xml');
> foreach ($o_XML as $o_Product) {
>     // Process product.
> }
>
> Add to this that some of the XML feeds come gzipped (.gz); I want to
> be able to stream the XML out of the .gz file without having to
> extract the entire file first.
>
> I've not got access to the XML feeds yet (they are coming from the
> various affiliate networks around, and I'm a remote user so need to
> get credentials and the like).
>
> If you have any pointers on the capabilities of the various XML reader
> classes, based upon this scenario, then I'd be very grateful.
>
> In this instance, the memory limitation is important. The current code
> is string based, and whilst it works, you can imagine the complexity
> of it.
>
> The structure of each product internally will be different, but I will
> be happy to get back a nested array or an XML fragment, as long as the
> iterator is only holding onto 1 array/fragment at a time and not
> caching the massive number of products per file.

As far as I'm aware, XML Parser can handle all of this.

http://php.net/xml

It's a SAX parser, so you can feed it the data chunk by chunk. You can
use gzopen to open gzipped files and manually feed the data into
xml_parse. Be sure to read the docs carefully, because there's a lot to
be aware of when parsing an XML document in pieces.

-Stuart

--
Stuart Dallas
3ft9 Ltd
http://3ft9.com/
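A minimal sketch of Stuart's suggestion: gzopen a compressed feed and feed it to XML Parser chunk by chunk, with the final-chunk flag set on the last xml_parse() call (the temp file stands in for a real .gz feed; handlers and element names are illustrative):

```php
<?php
// SAX-style chunked parsing of a gzipped feed with XML Parser.
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, gzencode('<r><Product/><Product/><Product/></r>'));

$count  = 0;
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$count) {
        // XML Parser upper-cases element names by default (case folding)
        if ($name === 'PRODUCT') {
            $count++;
        }
    },
    function ($parser, $name) {}   // end-element handler (unused here)
);

$fp = gzopen($path, 'rb');
while (!gzeof($fp)) {
    $chunk = gzread($fp, 8192);
    // third argument marks the final chunk so the parser can finish
    xml_parse($parser, $chunk, gzeof($fp));
}
gzclose($fp);
xml_parser_free($parser);
unlink($path);
print $count;   // 3
```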