php-general Digest 24 Feb 2009 19:02:37 -0000 Issue 5977

php-general-digest-help Tue, 24 Feb 2009 11:05:19 -0800

Topics (messages 288726 through 288744):


Re: Why PHP won
        288726 by: Michael A. Peters
        288728 by: Per Jessen
        288741 by: Michael A. Peters
        288742 by: Per Jessen
        288743 by: Michael A. Peters

Re: ms-word reading from PHP on linux O.S
        288727 by: Per Jessen

Re: optimizing space for array of booleans
        288729 by: leledumbo
        288735 by: Andrew Ballard

Re: Unexpected results using ksort on arrays in which the keys are mixed alphanumeric strings.
        288730 by: Jochem Maas

Re: PDO buffered query problem
        288731 by: Thodoris

Re: help installing phpDocumentor
        288732 by: Bob McConnell
        288734 by: 9el

Re: how to deal with multiple authors for one book
        288733 by: PJ

XML and :
        288736 by: Merlin Morgenstern
        288738 by: Andrew Ballard

Re: multiple choice dropdown box puzzle
        288737 by: PJ

Re: RecursiveDirectoryIterator and foreach
        288739 by: Nathan Rixham

Re: File Write Operation Slows to a Crawl....
        288740 by: Jochem Maas

SimpleXML
        288744 by: Alex Chamberlain

Administrivia:

To subscribe to the digest, e-mail:
        [email protected]

To unsubscribe from the digest, e-mail:
        [email protected]

To post to the list, e-mail:
        [email protected]


----------------------------------------------------------------------

--- Begin Message ---
Paul M Foster wrote:
On Mon, Feb 23, 2009 at 01:39:51PM -0800, Daevid Vincent wrote:
http://startuplessonslearned.blogspot.com/2009/01/why-php-won.html
I *like* the way this guy thinks.

Paul
It was a decent page.
Point #2 though - you can use mod_rewrite to do wonders with respect to URL presentation, and you can use DOMDocument to completely construct the page before sending it to the browser - allowing you to translate xhtml to html for browsers that don't properly support xhtml+xml.
Avoiding mixing html with php really is better, and php does let you do that fairly easily.
--- End Message ---

--- Begin Message ---

Michael A. Peters wrote:

[snip]
> and you can use DOMDocument to completely
> construct the page before sending it to the browser - allowing you to
> translate xhtml to html for browsers that don't properly support
> xhtml+xml.

I suspect you meant "translate xml to html"?  I publish everything in
xhtml, no browser has had a problem with that yet.

-- 
Per Jessen, Zürich (2.1°C)

--- End Message ---

--- Begin Message ---
Per Jessen wrote:
Michael A. Peters wrote:

[snip]
and you can use DOMDocument to completely
construct the page before sending it to the browser - allowing you to
translate xhtml to html for browsers that don't properly support
xhtml+xml.
I suspect you meant "translate xml to html"?  I publish everything in
xhtml, no browser has had a problem with that yet.
IE 6 does not accept the application/xhtml+xml MIME type. I don't believe IE7 does either; I think IE 8 beta does, but I'm not sure.
If you are sending xhtml with an html MIME type (text/html) you are breaking a standard, even if it seems to work. xhtml is supposed to be sent with the application/xhtml+xml MIME type.
By translating from xhtml to html 4.01 you can send the text/html MIME type to browsers that don't report that they accept application/xhtml+xml, and send them standards-compliant html.
By using DOMDocument it is really easy to do.

Standards compliance is important, and it isn't hard to achieve.

The following is the filter I use for browsers that don't accept application/xhtml+xml:

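function HTMLify($buffer) {
/* based on http://www.kilroyjames.co.uk/2008/09/xhtml-to-html-wordpress-plugin */
   // Elements that must not be self-closed in HTML get an explicit end tag.
   // \b keeps '<a' from also matching '<abbr', '<area', and so on.
   $xhtml[] = '/<script\b([^>]*?)\s*\/>/';
   $html[]  = '<script\\1></script>';

   $xhtml[] = '/<div\b([^>]*?)\s*\/>/';
   $html[]  = '<div\\1></div>';

   $xhtml[] = '/<a\b([^>]*?)\s*\/>/';
   $html[]  = '<a\\1></a>';

   // Any remaining self-closing (void) element: '<br />' becomes '<br>'.
   $xhtml[] = '/\s*\/>/';
   $html[]  = '>';

   return preg_replace($xhtml, $html, $buffer);
   }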

That's only part of the work, but it is the most important part.
Note that this doesn't translate all valid xhtml; for example, it doesn't take into account legal whitespace that DOMDocument doesn't produce. If you want a more robust translator, use an xslt filter, but this function is simple and doesn't require the various xslt parameters and libraries at php compile time.
When I create a new DOMDocument class instance -

$myxhtml = new DOMDocument("1.0","UTF-8");
$myxhtml->preserveWhiteSpace = false;
$myxhtml->formatOutput = true;
$xmlHtml = $myxhtml->createElement("html");
if ($usexml == 1) {
   $xmlHtml->setAttribute("xmlns","http://www.w3.org/1999/xhtml");
   $xmlHtml->setAttribute("xml:lang","en");
   }
When the document is created, instead of printing the output I save it as a variable and then add the proper DTD:
function sendpage($page,$usexml) {
   $xhtmldtd = "\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n";
   $htmldtd  = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">";
   if ($usexml == 0) {
      // replace the XML declaration written by saveXML() with the HTML 4.01 DTD
      $bar = preg_replace('/<\?xml version="1.0" encoding="UTF-8"\?>/',$htmldtd,$page,1);
      $bar = HTMLify($bar);
      } else {
      // keep the XML declaration and insert the XHTML 1.1 DTD at its line break
      $bar = preg_replace('/\n/',$xhtmldtd,$page,1);
      }
   sendxhtmlheader($usexml);
   print($bar);
   }
I add the DTD there with preg_replace because I haven't yet found a DOMDocument way to add it, which is how it should be done. Someone told me the DTD is currently read-only in DOMDocument and that in the future I'll be able to add it when a new DOMDocument class is created, so for now the preg_replace hack works, and I might keep it that way for some time.
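For what it's worth, the DOM API does appear to offer a creation-time route: DOMImplementation::createDocumentType() builds the DTD node and DOMImplementation::createDocument() accepts it as its third argument. A minimal sketch, assuming the same XHTML 1.1 DTD as sendpage() above; $impl, $dtd and $doc are illustrative names, and this only covers the application/xhtml+xml branch (the text/html branch would still need the HTML 4.01 DTD and HTMLify):

```php
<?php
// Create the DTD node first, then hand it to createDocument() so the document
// starts out with its <!DOCTYPE ...> instead of having it patched in afterwards.
$impl = new DOMImplementation();
$dtd = $impl->createDocumentType(
    'html',                                         // root element name
    '-//W3C//DTD XHTML 1.1//EN',                    // public identifier
    'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd'  // system identifier
);

// Root element: <html xmlns="http://www.w3.org/1999/xhtml">
$doc = $impl->createDocument('http://www.w3.org/1999/xhtml', 'html', $dtd);
$doc->encoding = 'UTF-8';
$doc->formatOutput = true;

$xml = $doc->saveXML();   // XML declaration, then the DOCTYPE, then <html>
```

With this, the '/\n/' replacement in sendpage() would no longer be needed for the XHTML branch.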
To determine whether or not to send xhtml, the following code is run when the page is requested (before any of that other code is executed):
if (! isset($usexml)) {
   $usexml=1;
   }
if ($usexml == 1) {
   if (isset( $_SERVER['HTTP_ACCEPT'] )) {
      // strpos() returns 0 (falsy) for a match at the very start, so compare against false
      if (strpos( $_SERVER['HTTP_ACCEPT'], "application/xhtml+xml" ) === false) {
         $usexml=0;
         }
      } else {
      $usexml=0;
      }
   }
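The check above treats any mention of application/xhtml+xml as acceptance. HTTP also allows a client to list a type with q=0, which means "not acceptable". A sketch of a slightly stricter test; acceptsXhtml() is a hypothetical helper, not part of xhtml.inc, and it deliberately ignores wildcards such as */*:

```php
<?php
// Return true if an HTTP Accept header lists application/xhtml+xml
// with a quality value above zero (a missing q means q=1).
function acceptsXhtml($accept)
{
    foreach (explode(',', $accept) as $range) {
        $params = array_map('trim', explode(';', $range));
        $type = strtolower(array_shift($params));   // media type comes first
        if ($type !== 'application/xhtml+xml') {
            continue;
        }
        $q = 1.0;
        foreach ($params as $param) {
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        return $q > 0;
    }
    return false;
}
```

It would slot in as $usexml = (isset($_SERVER['HTTP_ACCEPT']) && acceptsXhtml($_SERVER['HTTP_ACCEPT'])) ? 1 : 0;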

The sendxhtmlheader function:

function sendxhtmlheader($usexml) {
   if ($usexml == 1) {
      header("Content-Type: application/xhtml+xml; charset=utf-8");
      } else {
      header("Content-type: text/html; charset=utf-8");
      }
   }
Anyway, all of that is in a file I call "xhtml.inc" that I include in all my pages; to build the document I use the $myxhtml object that the include creates. To send the document to the browser:
$foo=$myxhtml->saveXML();
sendpage($foo,$usexml);
--- End Message ---

--- Begin Message ---

Michael A. Peters wrote:

> Per Jessen wrote:
>> Michael A. Peters wrote:
>> 
>> [anip]
>>> and you can use DOMDocument to completely
>>> construct the page before sending it to the browser - allowing you
>>> to translate xhtml to html for browsers that don't properly support
>>> xhtml+xml.
>> 
>> I suspect you meant "translate xml to html"?  I publish everything in
>> xhtml, no browser has had a problem with that yet.
>> 
> 
> IE 6 does not accept the application/xhtml+xml mime type. I don't believe IE7 does
> either, I think IE 8 beta does but I'm not sure.

I don't use any of them, but I thought even IE6 was able to deal with
xml. 

> If you are sending xhtml with an html mime type you are breaking a
> standard, even if it seems to work. 
> xhtml is supposed to be sent with the application/xhtml+xml mime type.

From http://www.w3.org/TR/xhtml-media-types/

"In general, 'application/xhtml+xml' should be used for XHTML Family
documents, and the use of 'text/html' should be limited to
HTML-compatible XHTML Family documents intended for delivery to user
agents that do not explicitly state in their HTTP Accept header that
they accept 'application/xhtml+xml' [HTTP]. "

/Per

-- 
Per Jessen, Zürich (2.7°C)

--- End Message ---

--- Begin Message ---
Per Jessen wrote:
I don't use any of them, but I thought even IE6 was able to deal with
xml.
What happens is that IE6 (and, I believe, IE7) asks the user what application they want to open the file with if it receives an application/xhtml+xml Content-Type header.
IE does parse xhtml, but only if it is sent with an incorrect text/html header.
IE8 is supposed to fix that, but it will probably take years before IE6/7 are, for all practical purposes, out of circulation.
--- End Message ---

--- Begin Message ---

Srinivasa Rao D wrote:

> Hi all,
>       * How can I best read MS Word doc files from PHP on a Linux
>       OS?*
[snip]
> 
>   *Are there any other programs that can fetch text from an MS Word
>   file?*

OpenOffice.



-- 
Per Jessen, Zürich (1.9°C)

--- End Message ---

--- Begin Message ---

Just tried serializing an array of 256 booleans and printing the length; it
really shocked me: 2458. This project will be used by about 500 students, so
in the worst case (all students enroll in all courses) it will eat 500 * 2458
(assuming one character takes one byte) = 1229000 bytes ~= 1.2 MB. Not a big
deal, eh?
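For comparison, a sketch of the bitmask idea in the thread title: the same 256 flags packed 8 per byte into a 32-byte binary string. packFlags() and unpackFlag() are hypothetical helper names, and the bit order (flag $i in byte $i >> 3, bit $i & 7) is an arbitrary choice:

```php
<?php
// Pack a list of booleans into a binary string, 8 flags per byte.
function packFlags(array $flags)
{
    $flags = array_values($flags);
    $bytes = str_repeat("\0", (int) ceil(count($flags) / 8));
    foreach ($flags as $i => $flag) {
        if ($flag) {
            $byte = $i >> 3;                       // which byte holds flag $i
            $bytes[$byte] = chr(ord($bytes[$byte]) | (1 << ($i & 7)));
        }
    }
    return $bytes;
}

// Read flag $i back out of a string produced by packFlags().
function unpackFlag($bytes, $i)
{
    return ((ord($bytes[$i >> 3]) >> ($i & 7)) & 1) === 1;
}
```

If the storage column cannot hold binary data, bin2hex() turns the 32 bytes into a 64-character string.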


--- End Message ---

--- Begin Message ---

On Tue, Feb 24, 2009 at 3:24 AM, leledumbo <[email protected]> wrote:
>
> Just tried serializing array of 256 booleans and printing the length, it
> really shocked me: 2458. This project will be used by about 500 students, so
> in the worst case (all students enroll all courses) it will eat 500 * 2458
> (assuming one character eats one byte) = 1229000 Bytes ~= 1.2 MB. Not a big
> deal, eh?
>

That is a very un-normalized way of storing associations between
entities in a database. You describe a "worst case scenario" where
every student is registered for every course. If you serialize an
array, every bit takes the same amount of space so it doesn't matter
how many courses each student has. If you convert this to a bitmask as
you are describing, your worst case would happen very frequently - at
least as often as students are assigned to the course whose place is
the most significant bit value. What's more, if you later need to add
a 257th course, you could have lots of work to do to adjust your code
to deal with the extra value.

Generally relationships like the one you describe are stored in three
separate and related tables: Students, Courses, and Enrollment. The
latter is a n:m association between the first two. The advantage this
approach has with regard to storage is that it is a sparse matrix.
Most students will only enroll in a handful of courses. With this
approach, you only store a row in the Enrollment table if a student is
enrolled in a course. Otherwise, you don't need to store anything.
Granted, if you use 4-byte integers for the primary keys in both the
Students and Courses tables, your data storage would be about 2048
bytes per student in the "worst case scenario" that you described
(256 rows of 8 bytes each), and another 2048 bytes per student for a
unique index that would ensure that each student could be enrolled in
any course no more than one time.
However, since I highly doubt most students are going to be enrolled
in every course, it is not likely that you'll approach the 1.2MB
storage space that you're worried about. Each row would take 16 bytes
to store both the data and the index; if each student is enrolled in
an average of 4 courses, it should only take an average of 64 bytes
per student. For 500 students, that works out to around 2000 rows at
about 32KB overall.
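Andrew's storage estimate, written out. The 16-byte row assumes two 4-byte integer keys stored once as data and once in the unique index, as he describes; real MySQL tables add per-row and per-page overhead on top of this:

```latex
\begin{aligned}
\text{bytes per enrollment row} &= \underbrace{2 \times 4}_{\text{data}} + \underbrace{2 \times 4}_{\text{unique index}} = 16 \\
\text{typical: } 500 \times 4 &= 2000 \text{ rows}, \qquad 2000 \times 16 = 32\,000 \text{ bytes} \approx 32\text{ KB} \\
\text{worst case, per student: } 256 \times 16 &= 4096 \text{ bytes} \\
\text{serialized arrays: } 500 \times 2458 &= 1\,229\,000 \text{ bytes} \approx 1.2\text{ MB}
\end{aligned}
```

So the normalized layout wins because most students take only a handful of courses; if every student really took all 256, it would need about 500 × 4096 = 2 048 000 bytes.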

For comparison, I have a fairly small table in a system we recently
put into production that maintains similar associations. It stores
more information in each row than just a two-column association (about
17 columns) and currently has over 7400 rows. The data is 0.734MB and
the total size of three indexes is 0.820MB.

And no, 1.2MB is not that big of a deal. A single high-density floppy
disk could hold that much data 18 years ago!

Andrew

--- End Message ---

--- Begin Message ---

Clancy schreef:
> I have been experimenting using four-character alphanumeric keys on an array, and when I
> generated a random set of keys, and then used ksort to sort the array, I was very
> surprised to find that if the key contained any non-numeric character, or if it started
> with zero, the key was sorted as a base 36 number (0-9, A-Z), as I expected. However, if
> the key only contained numbers, and did not start with zero, it was sorted to the end of
> the list.

did your experiment include reading the manual? or did you expect ksort() to know what
kind of sort you wanted? ... try specifying a sort flag:

<?php

$r = array(
    'ASDF' => true,
    '000A' => true,
    '0009' => true,
    '0999' => true,
    '0000' => true,
    '09A0' => true,
    '9999' => true,
    '1000' => true,
    'ZZZZ' => true,
);

echo "UNSORTED:\n";
print_r($r);

echo "SORT_REGULAR:\n";
ksort($r, SORT_REGULAR);
print_r($r);

echo "SORT_NUMERIC:\n";
ksort($r, SORT_NUMERIC);
print_r($r);

echo "SORT_STRING:\n";
ksort($r, SORT_STRING);
print_r($r);


> 
> Thus:
>       0000
>       0009
>       000A
> 
>       0999
>       09A0
>       ASDF
> 
>       ZZZZ
>       1000
>       9999
> 
> I presume this is related to last week's discussions about casting variables, but I cannot
> understand why 0999 should go to the start of the list, while 1000 goes to the end. Can
> anyone explain this logically?
>
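What is going on: when a string key looks like a canonical decimal integer (no leading zero, no "+", no spaces), PHP stores it as an integer key, so '1000' and '9999' become int keys while '0999' and '000A' stay strings. ksort()'s default SORT_REGULAR then compares ints with strings using PHP's loose comparison rules, which is where the odd order comes from; the exact result differs between PHP versions, since PHP 8 changed how numbers compare to non-numeric strings. SORT_STRING compares every key as a string. A minimal sketch using the keys from the question:

```php
<?php
$keys = array('ASDF', '000A', '0009', '0999', '0000', '09A0', '9999', '1000', 'ZZZZ');
$r = array_fill_keys($keys, true);

// Record how PHP stored each key: '1000' and '9999' became integers.
$types = array();
foreach (array_keys($r) as $k) {
    $types[(string) $k] = gettype($k);
}

// Compare all keys as strings, whatever type PHP stored them as.
ksort($r, SORT_STRING);
$sorted = array_map('strval', array_keys($r));
```

After this, print_r($r) lists 0000, 0009, 000A, 0999, 09A0, 1000, 9999, ASDF, ZZZZ.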

--- End Message ---

--- Begin Message ---
Stewart Duncan wrote:
Hi there,
I'm having some serious problems with the PHP Data Object functions. I'm trying to loop through a sizeable result set (~60k rows, ~1gig) using a buffered query to avoid fetching the whole set.
No matter what I do, the script just hangs on the PDO::query() - it seems the query is running unbuffered (why else would the change in result set size 'fix' the issue?). Here is my code to reproduce the problem:
<?php
$Database = new PDO(
    'mysql:host=localhost;port=3306;dbname=mydatabase',
    'root',
    '',
    array(
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true
    )
);
Don't you want it the other way around? You want it unbuffered, so that the whole result set doesn't have to be retrieved from MySQL and sent to PHP before you can use it.
You want to start using it immediately - so make it unbuffered.
Either way, if the result set is going to be large, MySQL's memory or PHP's memory may be exceeded. So whichever you use, you may need to fine-tune PHP by increasing the per-process memory (memory_limit in php.ini), or fine-tune MySQL.
In case you use unbuffered queries you cannot use transactions, as far as I can recall.
--
Thodoris
--- End Message ---

--- Begin Message ---

From: Jim Lucas
> 
> I may be wrong, but I heard short tags were going the
>  way of the Dodo bird as of PHP6.

This is not surprising. With the advent of XHTML, the short tag option
collides with another valid tag, "<?xml". So that option has to be
turned off as soon as you need any XML in your pages. I'm in the process
of correcting that in more than 150 files in one project alone. There
are two other bigger projects that require the same treatment. All three
also make extensive use of magic quotes and register_globals, which are
likewise becoming extinct.

Which reminds me, where can I get a definitive list of all deprecated
features? In addition to identifying each feature, it should indicate
which release marked them deprecated, and which release will no longer
support them, if known.

Bob McConnell

--- End Message ---

--- Begin Message ---


On Tue, Feb 24, 2009 at 7:51 PM, Bob McConnell <[email protected]> wrote:

> From: Jim Lucas
> >
> > I may be wrong, but I heard short tags were going the
> >  way of the Dodo bird as of PHP6.
>
> This is not surprising. With the advent of XHTML, the short tag option
> collides with another valid tag, "<?xml". So that option has to be
> turned off as soon as you need any XML in your pages. I'm in the process
> of correcting that in more than 150 files in one project alone. There
> are two other bigger projects that require the same treatment. All three
> also make extensive use of magic quotes and register_globals, which are
> likewise becoming extinct.


Well, if you want to keep short tags, then <?php echo "<?xml "
would be a better idea.
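A minimal sketch of that approach: with short_open_tag enabled, a literal "<?xml" in a page is read as the start of PHP code, so the declaration is printed from inside a full <?php block instead. xmlDeclaration() is a hypothetical helper:

```php
<?php
// Build the XML declaration as an ordinary string. Inside a PHP string
// literal the "<?" and "?>" sequences are plain characters, so
// short_open_tag cannot misread them.
function xmlDeclaration($encoding = 'UTF-8')
{
    return '<?xml version="1.0" encoding="' . $encoding . '"?>' . "\n";
}
```

In the page itself, <?php echo xmlDeclaration(); ?> then takes the place of the literal declaration.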

>
>
> Which reminds me, where can I get a definitive list of all deprecated
> features? In addition to identifying each feature, it should indicate
> which release marked them deprecated, and which release will no longer
> support them, if known.
>
> Bob McConnell
>

--- End Message ---

--- Begin Message ---

Reinhardt Christiansen wrote:
>
>
>
>> From: PJ <[email protected]>
>> To: MySql <[email protected]>
>> Subject: how to deal with multiple authors for one book
>> Date: Mon, 16 Feb 2009 17:20:54 -0500
>>
>> In my db there are a number of books with several authors; so, I am
>> wondering how to set up a table on books and authors to be able to
>> insert (via php-mysql pages) data and retrieve and display these books
>> with several authors
>> I suspect that to insert data for a multiple author book I will have to
>> enter all data other than the author names into the book table and enter
>> the authors in the author tables with foreign keys to reference the
>> authors and their book.
>> Then to retrieve and display the book,I would have to use some kind of
>> join instruction with a where clause(regarding the position - 1st, 2nd,
>> 3rd...) to retrieve the authors and their order. The order would
>> probably be done by a third field (e.g. f_name, l_name, position) in the
>> book_author table (tables in db -  book, author, and book_author)
>> Am I on the right track, here?
>>
> Sort of, but not completely.
>
> I think you would really benefit from a tutorial or course on data
> normalization. I haven't looked for one in several years so I can't
> suggest a specific tutorial but if you google it, you may well find
> something that you like.
>
> In a nutshell, you are trying to implement a many-to-many relationship
> (a book can have several authors and an author can have several
> books). These are not normally implemented directly in relational
> databases. Instead, you typically have intermediate tables that are
> usually called "association tables" (or "intersection tables") that
> sit between the other tables. In your case, you might see something
> like this:
>
> Book Table
> ==========
> Book_code   Title
> ---------   ---------------------
> Z1          The Mote In God's Eye
> Z2          Ringworld
> Z3          Janissaries
> Z4          War and Peace
>
>
> Author Table
> ============
> Author_code   Author_name
> -----------   ---------------
> 1             Larry Niven
> 2             Jerry Pournelle
> 87            Leo Tolstoy
>
> Books Table (intersection table)
> ================================
> Book_code   Author_code
> ---------   -----------
> Z1          1
> Z1          2
> Z2          1
> Z3          2
> Z4          87
>
> In other words, the Books table identifies that The Mote in God's Eye
> is written by Niven _and_ Pournelle; Ringworld is written by Niven
> alone and Janissaries is written by Pournelle alone. And, of course,
> War and Peace is written by Tolstoy.
>
> You're going to want to do something very much like this.
>
> A good tutorial will explain this well. I'm out of time; I have to go
> now.
>
> -- 
> Rhino
>
>
>
Thank you for your clear explanation.
I have things set up rather well and have been able to generate a web
page to insert most of the data in the db and retrieve it to display in
another web page.
I say most because I have several "small" problems which I have posted
on mysql and php lists. Perhaps you can suggest something either where
and how to post or what to do.

Problem 1. How to SELECT and display multiple authors. Presently, my
book_author (intersection) table contains the fields authID, bookID and
ordinal. ordinal refers to the order of the author's name (1 if only 1
or first in line; 2 if 2nd in line). To retrieve the author's name I use
CONCAT_WS(' ', first_name, last_name) AS Author. So far, in my testing I
only have 10 books in the db, all with single authors. Undoubtedly this
is not the way to go to retrieve 2 authors and display them as
(first_name1 last_name1 and first_name2 last_name2, or "Joe Firstauthor and
Bob Secondauthor").

The present query (works fine for 1 author):
"SELECT b.title, b.sub_title, b.descr, b.comment, b.bk_cover,
b.copyright, b.ISBN, b.sellers, c.publisher,
CONCAT_WS(' ', first_name, last_name) AS Author
FROM book AS b
LEFT JOIN book_author AS ab ON b.id = ab.bookID
LEFT JOIN author AS a ON ab.authID=a.id
LEFT JOIN book_publisher as abc ON b.id = abc.bookID
LEFT JOIN publishers AS c ON abc.publishers_id = c.id
ORDER BY title ASC ";

But to show 2 authors I think I need something of the order of:
CONCAT_WS (' ', (CONCAT_WS (' ', [(first_name, last_name)WHERE
book_author.ordinal = 1], [(first_name, last_name)WHERE
book_author.ordinal = 2]) AS Author

I suspect that one cannot nest the CONCAT_WS statement and I suspect the
WHERE is not in the right place either, but this seems to be fairly
logical... am I on the right track?
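One way to approach Problem 1, sketched under assumptions: keep one row per (book, author), add b.id AS bookID and ab.ordinal to the SELECT list, and merge the names in PHP. groupAuthors() and the row keys are hypothetical. MySQL can also do the merge in SQL with GROUP_CONCAT(CONCAT_WS(' ', first_name, last_name) ORDER BY ab.ordinal SEPARATOR ' and ') plus GROUP BY b.id, bearing in mind that GROUP_CONCAT output is truncated at group_concat_max_len.

```php
<?php
// Collapse one-row-per-(book, author) join results into one entry per book,
// with the author names joined in book_author.ordinal order.
// Each row is assumed to carry 'bookID', 'title', 'ordinal' and 'Author'.
function groupAuthors(array $rows)
{
    $books = array();
    foreach ($rows as $row) {
        $id = $row['bookID'];
        if (!isset($books[$id])) {
            $books[$id] = array('title' => $row['title'], 'authors' => array());
        }
        // With LEFT JOIN, a book without authors yields an empty CONCAT_WS result.
        if ($row['Author'] !== null && $row['Author'] !== '') {
            $books[$id]['authors'][(int) $row['ordinal']] = $row['Author'];
        }
    }
    foreach ($books as $id => $book) {
        ksort($book['authors']);   // 1st author, 2nd author, ...
        $books[$id]['authors'] = implode(' and ', $book['authors']);
    }
    return $books;
}
```

With three or more authors this yields "A and B and C"; a different separator is a one-line change.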

Problem 2... is similar to Problem 1 but deals with multiple categories
(62) and I'll deal with that when I get this one solved.
TIA

-- 

Phil Jourdan --- [email protected]
   http://www.ptahhotep.com
   http://www.chiccantine.com

--- End Message ---

--- Begin Message ---
Hi there,
I am trying to parse an XML file with php. This works if the xml tag looks like this: <anbieternr>88</anbieternr>
In that case I retrieve the info: $xml->anbieternr
But now the tag looks different, like this: <imo:anbieternr>88</imo:anbieternr>
The command $xml->imo:anbieternr does not work in that case.

Has somebody an idea how to address this?

Thank you for any help!

Merlin
--- End Message ---

--- Begin Message ---

On Tue, Feb 24, 2009 at 9:51 AM, Merlin Morgenstern
<[email protected]> wrote:
> Hi there,
>
> I am trying to parse an XML file with php. This works if the xml tag looks
> like this: <anbieternr>88</anbieternr>
> In that case I retrieve the info: $xml->anbieternr
>
> But now the tag looks different like this:
> <imo:anbieternr>88</imo:anbieternr>
>
> The command $xml->imo:anbieternr does not work in that case.
>
> Has somebody an idea how to address this?
>
> Thank you for any help!
>
> Merlin
>

Short answer? You probably need to look at using XPath queries in
SimpleXML to get to those elements. Shorter answer? If you have to
deal with namespaces in documents, it's probably time for the project
to graduate from SimpleXML to something like DOM.

Look at the user note on this page:
http://us3.php.net/manual/en/intro.simplexml.php
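A sketch of both routes with SimpleXML, assuming a made-up namespace URI (http://example.com/imo), since the real one bound to the imo prefix isn't shown: children() with the prefix and true as its second argument, or xpath() after registerXPathNamespace():

```php
<?php
// Hypothetical sample document; the real feed's namespace URI will differ.
$data = '<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:imo="http://example.com/imo">
  <imo:anbieternr>88</imo:anbieternr>
</root>';

$xml = simplexml_load_string($data);

// Route 1: switch to the children in the "imo" namespace, selected by prefix.
$viaChildren = (string) $xml->children('imo', true)->anbieternr;

// Route 2: XPath, after binding a prefix to the namespace URI.
$xml->registerXPathNamespace('imo', 'http://example.com/imo');
$nodes = $xml->xpath('//imo:anbieternr');
$viaXpath = (string) $nodes[0];
```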

Andrew

--- End Message ---

--- Begin Message ---

Bob McConnell wrote:
> From: PJ
>   
>> Here's my test page and, so far, nothing works...
>>     
>
> Please expound on "nothing works...". What do you see in the browser?
> What do you see in the server logs?
>   
It's not in the browser that I look, but in the db: nothing is
INSERTed. I have confirmed that some configurations do work
partly (can't recall which - I know, I should have noted this, but that was
only to confirm that something was being written to the db, and that is
now a part of the current script, which does not yet fully work).
>   
>> <?
>>     
>
> I strongly recommend changing all of these to <?php per the XHTML specs.
> That will reduce the ambiguity and possibly prevent errors like the next
> one.
>
The mix of php and html is rather muddled... (for me, at least, when I
edit the page as XHTML). I am using HomeSite+ to edit, and the validation
doesn't work very well since there is HTML code within the php code...
Any suggestions for validating (other than online, as my work is only
local for now)?
>   
>> <select name="<?echo $categoriesIN?>.'[]'" multiple="multiple"
>>     
> size="5">
>
> You at least need a space between <? and echo, otherwise the server is
> trying to parse "<?echo" as a single token, and that is probably
> undefined.
>   
Right, this was a typo I didn't catch. Thank you for pointing it out.
The rest of these statements are quite correct and the input
web-page works fine. It's just the input of the categories that does not.
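For reference, a hedged sketch of the usual pattern for a multiple-choice select: the [] belongs inside the name attribute itself, so PHP delivers the selections as an array. renderCategorySelect(), selectedCategories() and the 'categories' field name are hypothetical:

```php
<?php
// Render a multi-select whose name ends in "[]" so that PHP collects
// every selected option into an array, e.g. $_POST['categories'].
function renderCategorySelect($name, array $options, array $selected = array())
{
    $html = '<select name="' . htmlspecialchars($name) . '[]" multiple="multiple" size="5">' . "\n";
    $selected = array_map('strval', $selected);
    foreach ($options as $value => $label) {
        $isSelected = in_array((string) $value, $selected, true);
        $html .= '  <option value="' . htmlspecialchars((string) $value) . '"'
               . ($isSelected ? ' selected="selected"' : '')
               . '>' . htmlspecialchars($label) . "</option>\n";
    }
    return $html . "</select>\n";
}

// On submit the key is missing entirely when nothing was chosen.
function selectedCategories(array $post, $name)
{
    return (isset($post[$name]) && is_array($post[$name])) ? $post[$name] : array();
}
```

Each element of selectedCategories($_POST, 'categories') is then one category id to INSERT.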
> Bob McConnell
>
>   


-- 

Phil Jourdan --- [email protected]
   http://www.ptahhotep.com
   http://www.chiccantine.com

--- End Message ---

--- Begin Message ---
Ryan Panning wrote:
I have discovered that when I foreach over a RecursiveDirectoryIterator (see example below) the $item actually turns into a SplFileInfo object. I would expect it to be a RecursiveDirectoryIterator. How do I do a hasChildren() on SplFileInfo?
seems like expected functionality to me, you're looping over the contents of the directory and that can only be a file or a directory - thus SplFileInfo seems correct?
in short you don't call hasChildren() on SplFileInfo, you call it on RecursiveDirectoryIterator.
From the docs:
RecursiveDirectoryIterator::hasChildren - Returns whether current entry is a directory and not '.' or '..'
RecursiveDirectoryIterator::getChildren - Returns an iterator for the current entry if it is a directory
Thus this is how you call getChildren properly:

$dir = new RecursiveDirectoryIterator( dirname(dirname(__FILE__)) );
foreach($dir as $splFileInfo) {
  if( $dir->hasChildren() ) {
    $childDir = $dir->getChildren();
    echo get_class($childDir) . ' ' . $childDir->getPath() .  PHP_EOL;
  }
}
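A related sketch for walking the whole tree rather than one level: wrap the RecursiveDirectoryIterator in a RecursiveIteratorIterator, which calls hasChildren()/getChildren() itself, and ask each SplFileInfo isDir() instead. listTree() is a hypothetical helper, and FilesystemIterator::SKIP_DOTS needs PHP 5.3 or later:

```php
<?php
// List every file and directory below $root as paths relative to $root;
// directories get a trailing slash.
function listTree($root)
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST   // yield a directory before its contents
    );
    $paths = array();
    foreach ($iterator as $path => $info) {     // key: pathname, value: SplFileInfo
        $relative = substr($path, strlen($root) + 1);
        $paths[] = $info->isDir() ? $relative . '/' : $relative;
    }
    sort($paths);   // the filesystem does not guarantee any particular order
    return $paths;
}
```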

many regards,

Nathan
--- End Message ---

--- Begin Message ---

I read that you already got your script performance up,
but I'd still like to suggest that you shouldn't be reading in
a complete 18Mb file (especially given that you don't know
in advance whether some day(s) this size might be much larger).

instead you should be opening a handle to the file and then
read in, parse, & write out the parse results one line at a time.

to get an idea of what I mean take a good look at this page of the manual:

http://php.net/manual/en/function.fgets.php
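A sketch of that approach. It also sidesteps what is probably the main cost in the script quoted below: inside the while loop the output file is reopened with "w" and every record collected so far in $new_format is written out again, so the work grows with the square of the number of lines. convertLog() and its regular expression are assumptions (Apache "combined" format followed by one optional extra field); Domino's exact layout isn't shown, so the pattern may need adjusting:

```php
<?php
// Stream the log one line at a time and write tab-separated fields,
// opening the output file once instead of once per record.
function convertLog($inPath, $outPath)
{
    // host ident user [time] "request" status bytes "referer" "agent" rest
    $pattern = '/^(\S+) \S+ (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"\s*(.*)$/';
    $in  = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    $written = 0;
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '' || !preg_match($pattern, $line, $m)) {
            continue;                       // skip blank or unrecognised lines
        }
        array_shift($m);                    // drop the full match, keep the fields
        fwrite($out, implode("\t", $m) . "\n");
        $written++;
    }
    fclose($in);
    fclose($out);
    return $written;
}
```

Memory use then stays flat regardless of the size of the log file.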

[email protected] schreef:
> Hi:
> 
> Newbie here. This is my first attempt at PHP scripting. I'm trying to find
> an alternative to Lotus Domino's domlog.nsf for logging web transactions.
> Domino does create an Apache compatible text file of the web transactions,
> and this is what I’m trying to parse. I started off using a code snippet I
> found on the web. I modified it a little bit to suit my needs. It was
> working fine with the small 600k test log file I was using, but since I’ve
> moved to the larger 18Mb production log file here’s what happens:
> 
> I’ve modified the code and added an echo statement to echo each loop that
> gets processed. Initially it starts off very fast but then performance
> becomes very slow, to a point where I can count each loop as it’s being
> processed. It’s taking a little over 3 hours to parse the entire file. I
> figured it was a disk cache thing, so I created a ram drive. This has
> improved the performance, but is still taking an hour to parse.
> 
> Here is the PHP script I’m using:
> 
> 
> <?php
> 
> $ac_arr = file('access_log');
> $astring = join("", $ac_arr);
> $astring = preg_replace("/(\r|\t)/", "", $astring);
> $records = preg_split("/(\n)/", $astring, -1, PREG_SPLIT_NO_EMPTY);
> 
> $sizerecs = sizeof($records);
> 
> // now split into records
> $i = 1;
> $each_rec = 0;
> 
> while($i<$sizerecs) {
> $all = $records[$i];
> 
> // IP Address ($IP):
> $IP = substr($all, 0, strpos($all, " "));
> $all = str_replace($IP, "", $all);
> 
> //Remote User ($RU):
> $string = substr($all, 0, strpos($all, " [")); // www.vpcl.on.ca T123
> $sstring = substr($string, strpos($string, " ")+1);
> $AUstring = substr($sstring, strpos($sstring, " "));
> $RU = preg_replace("/\"/", "", $AUstring);
> $RU = trim($RU);
> $all = str_replace($string, "", $all);
> 
> //Request Time Stamp ($RTS):
> preg_match("/\[(.+)\]/", $all, $match);
> $RTS = $match[1];
> $all = str_replace(" [$RTS] \"", "", $all);
> 
> //Http Request Line ($HRL):
> $string = substr($all, 0, strpos($all, "\"")+2);
> $HRL = str_replace("\"", "", $string);
> $all = str_replace($string, "", $all);
> 
> //Http Response Status Code (HRSC):
> $HRSC = trim(substr($all, 0, strpos($all, " ")+1));
> $all = str_replace($HRSC, "", $all);
> 
> //Request Content Length (RCL):
> $string = substr($all, 0, strpos($all, "\"")+1);
> $RCL = trim(str_replace("\"", "", $string));
> $all = str_replace($string, "", $all);
> 
> //Referring URL (RefU):
> $string = substr($all, 0, strpos($all, "\"")+3);
> $RefU = substr($all, 0, strpos($all, "\""));
> $all = str_replace($string, "", $all);
> 
> //User Agent (UA):
> $string = substr($all, 0, strpos($all, "\"")+2);
> $UA = substr($all, 0, strpos($all, "\""));
> $all = str_replace($string, "", $all);
> 
> //Time to Process Request:
> 
> #$new_format[$each_rec] = "$UA\n";
> $new_format[$each_rec] =
> "$IP\t$RU\t$RTS\t$HRL\t$HRSC\t$RCL\t$RefU\t$UA\t$all\n";
> 
> $fhandle = fopen("/ramdrive/import_file.txt", "w");
>   foreach($new_format as $data) {
>     fputs($fhandle, "$data");
>     }
>   fclose($fhandle);
> 
> // advance to next record
> echo "$i\n";
> $i = $i + 1;
> 
> $each_rec++;
> }
> ?>
> 
> 
> This is running on a Toshiba Tecra A4 Laptop with FreeBSD 7.0 Release.
> Plenty of RAM and HDD space. The PHP Version is:
> 
> PHP 5.2.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Feb 11 2009 09:28:47)
> Copyright (c) 1997-2007 The PHP Group
> Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
> 
> What should I do to get this script to run faster?
> 
> Any help is appreciated….
> 
> Regards,
> 
> 
> 
> Fred Schnittke
> 
> 

--- End Message ---

--- Begin Message ---

Hi,

I am trying to write a PHP interface to ISBNdb.com. When I make a certain
request, the following is returned

<ISBNdb server_time="2009-02-24T18:57:31Z">
 <BookList total_results="1" page_size="10" page_number="1" shown_results="1">
  <BookData book_id="language_proof_and_logic" isbn="157586374X" isbn13="9781575863740">
   <Title>Language, Proof and Logic</Title>
   <TitleLong></TitleLong>
   <AuthorsText>Jon Barwise, John Etchemendy, </AuthorsText>
   <PublisherText publisher_id="center_for_the_study_of_la_a03">Center for the Study of Language and Inf</PublisherText>
   <Subjects>
    <Subject subject_id="amazon_com_nonfiction_philosophy_logic_language">Amazon.com -- Nonfiction -- Philosophy -- Logic &amp; Language</Subject>
    <Subject subject_id="amazon_com_science_general">Amazon.com -- Science -- General</Subject>
   </Subjects>
  </BookData>
 </BookList>
</ISBNdb>

And when loaded using simplexml_load_string, it gives the following object

object(SimpleXMLElement)#10 (2) {
    ["@attributes"]=>
    array(1) {
      ["server_time"]=>
      string(20) "2009-02-24T18:57:31Z"
    }
    ["BookList"]=>
    object(SimpleXMLElement)#14 (2) {
      ["@attributes"]=>
      array(4) {
        ["total_results"]=>
        string(1) "1"
        ["page_size"]=>
        string(2) "10"
        ["page_number"]=>
        string(1) "1"
        ["shown_results"]=>
        string(1) "1"
      }
      ["BookData"]=>
      object(SimpleXMLElement)#11 (6) {
        ["@attributes"]=>
        array(3) {
          ["book_id"]=>
          string(24) "language_proof_and_logic"
          ["isbn"]=>
          string(10) "157586374X"
          ["isbn13"]=>
          string(13) "9781575863740"
        }
        ["Title"]=>
        string(25) "Language, Proof and Logic"
        ["TitleLong"]=>
        object(SimpleXMLElement)#15 (0) {
        }
        ["AuthorsText"]=>
        string(30) "Jon Barwise, John Etchemendy, "
        ["PublisherText"]=>
        string(40) "Center for the Study of Language and Inf"
        ["Subjects"]=>
        object(SimpleXMLElement)#16 (1) {
          ["Subject"]=>
          array(2) {
            [0]=>
            string(58) "Amazon.com -- Nonfiction -- Philosophy -- Logic & Language"
            [1]=>
            string(32) "Amazon.com -- Science -- General"
          }
        }
      }
    }
}

Notice that I lose the attribute subject_id from the Subject tag. Why is
this? Is there any way of preventing it from happening?
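As far as I know the attribute is not actually lost: var_dump() of a SimpleXMLElement is only a partial view, and for elements that contain just text it prints the string without the attributes. They can still be read with array syntax (or ->attributes()). A sketch against a trimmed copy of the response above; $subjects is an illustrative name:

```php
<?php
// Trimmed copy of the ISBNdb response quoted above.
$response = <<<XML
<ISBNdb server_time="2009-02-24T18:57:31Z">
 <BookList total_results="1" page_size="10" page_number="1" shown_results="1">
  <BookData book_id="language_proof_and_logic" isbn="157586374X" isbn13="9781575863740">
   <Subjects>
    <Subject subject_id="amazon_com_nonfiction_philosophy_logic_language">Amazon.com -- Nonfiction -- Philosophy -- Logic &amp; Language</Subject>
    <Subject subject_id="amazon_com_science_general">Amazon.com -- Science -- General</Subject>
   </Subjects>
  </BookData>
 </BookList>
</ISBNdb>
XML;

$xml = simplexml_load_string($response);
$subjects = array();
foreach ($xml->BookList->BookData->Subjects->Subject as $subject) {
    // Array-style access reads an attribute; a string cast reads the text.
    $subjects[(string) $subject['subject_id']] = (string) $subject;
}
```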

Thanks in advance,

Alex Chamberlain

--- End Message ---
