Hi people. I am having some trouble with the PREG functions in PHP.
Here's what I am trying to do.
First of all, I read in a file which is 1.5 MB in size (it could be larger,
up to 8 MB), and its contents are loaded into a string.
The format of the file is as follows:
# # # "quoted text" "quoted text" # #
Each # represents a number. The first three numbers are only ever 1 or
2 digits long. The final two numbers can get rather big, into the thousands and
millions. The elements are separated by tabs, and a carriage return (\r)
terminates each record.
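To make the format concrete, here is a small made-up sample built in PHP. The values and the names $sample, $records and $fields are invented for illustration; they are not from my real file.

```php
<?php
// Hypothetical sample records; the real file has the same layout but far more rows.
$sample = "1\t1\t7\t\"first title\"\t\"first note\"\t1500\t2000000\r"
        . "1\t2\t12\t\"second title\"\t\"second note\"\t42\t31000\r"
        . "2\t1\t3\t\"third title\"\t\"third note\"\t9\t120\r";

// Each record ends with \r; each field is separated by a tab.
$records = explode("\r", rtrim($sample, "\r"));
$fields  = explode("\t", $records[0]);

echo count($records), " records, ", count($fields), " fields in the first one\n";
```

So every record should split into exactly seven tab-separated fields, with the two text fields still wrapped in their double quotes.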
I use preg_match_all to find all the lines that have 1 and 1 as their first two
numbers; typically there will be 25 entries of 1 1. So I am looking for all lines in
this format:
1 1 # "quoted text" "quoted text" # #
I have the search pattern figured out; it is as follows:
preg_match_all("/($first)\t($second)\t([0-9]{1,2})\t\"([^\"]*)\"\t\"([^\"]*)\"\t([0-9]*)\t([0-9]*)\r/",
$input, $output, PREG_SET_ORDER );
When this pattern finds a line beginning with $first and $second, it
puts all the elements of the record into the array $output: $output[0] is the array
of elements from the first line matched, $output[1] is the second line matched, and so
on.
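Here is a cut-down sketch of how I expect it to behave on a tiny input. The three records are made up, and $count is just a name I picked for the return value; the pattern is the same one as above.

```php
<?php
// Hypothetical input: two records start with 1 and 1, one does not.
$input = "1\t1\t7\t\"alpha\"\t\"beta\"\t1500\t2000000\r"
       . "1\t2\t12\t\"gamma\"\t\"delta\"\t42\t31000\r"
       . "1\t1\t8\t\"epsilon\"\t\"zeta\"\t9\t120\r";

$first  = 1;
$second = 1;

// Inside double quotes PHP turns \t and \r into real tab and
// carriage-return characters before PCRE sees the pattern.
$pattern = "/($first)\t($second)\t([0-9]{1,2})\t\"([^\"]*)\"\t\"([^\"]*)\"\t([0-9]*)\t([0-9]*)\r/";

$count = preg_match_all($pattern, $input, $output, PREG_SET_ORDER);

// With PREG_SET_ORDER, $output[n][0] is the whole matched record and
// $output[n][1] .. $output[n][7] are the seven captured fields.
foreach ($output as $n => $match) {
    echo "match $n: third number = {$match[3]}, first text = {$match[4]}\n";
}
```

On this small input only the first and third records come back, which is the behaviour I want on the full file as well.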
This pattern does actually work to some extent. When the file size is low (100 KB) it
works fine, but once I get over that size it becomes greedy and the
$second value doesn't seem to be taken into account when it searches. It seems to
return everything that matches the following:
1 # # "quoted text" "quoted text" # #
Obviously not what I want. Could this be some sort of overflow problem? I am at a
loose end here, so if anyone could offer some insight as to why it is not functioning
correctly I would most welcome it. Otherwise the only solution I can think of is
chopping up the input. I don't really want to go down that path, as it seems like a
rather cheap workaround.
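For reference, the chopping-up workaround I have in mind would look roughly like the sketch below: split on \r and test each record on its own. The input is made up, $wanted is a name I invented, and it assumes the quoted text never contains a tab.

```php
<?php
// Hypothetical input standing in for the real file contents.
$input = "1\t1\t7\t\"alpha\"\t\"beta\"\t1500\t2000000\r"
       . "1\t2\t12\t\"gamma\"\t\"delta\"\t42\t31000\r"
       . "11\t1\t8\t\"epsilon\"\t\"zeta\"\t9\t120\r";

$first  = 1;
$second = 1;
$wanted = array();

// Split into records, then into fields, and compare the first two numbers.
foreach (explode("\r", $input) as $record) {
    if ($record === '') {
        continue; // trailing empty piece after the last \r
    }
    $fields = explode("\t", $record);
    if (count($fields) === 7 && (int)$fields[0] === $first && (int)$fields[1] === $second) {
        $wanted[] = $fields;
    }
}

echo count($wanted), " matching record(s)\n";
```

It does the job on small inputs, but I would still prefer to understand why the single preg_match_all call misbehaves.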
Thanks.
Matt