On 2015-01-28 22:10, ABClf wrote:
> The main issue encountered is how .pageindex handles its indexing
> task. It apparently stops working when the amount of _new_ data is
> too big. The process seems to evaluate the amount of new data first,
> rather than starting to index, so when there is too much new data you
> get a memory error message and the game is over. I wish the page
> indexing would just keep working, no matter how much new data there
> is to index, until it is done.
> If the amount of new data is acceptable, it will start building the
> index. Not in one pass: you have to trigger it several times, but in
> the end (after about 10 searches, more or less) it is done, and you
> have not encountered a memory issue.
This is done by the function PageIndexUpdate() in scripts/pagelist.php.
There is a default limit of 10 seconds for the indexing work; if there
are more pages that haven't been indexed yet, they are dropped and will
be indexed on the next search.
While pages are indexed, there shouldn't be a huge need for memory.
After the terms of a page are compacted, they are written into the
".pageindex,new" file and dropped from memory (actually the values are
replaced). The same goes for the next pages, up to 10 seconds. Then the
contents of the old ".pageindex" file are copied to ".pageindex,new",
and ".pageindex,new" is renamed to ".pageindex", replacing the old file.
None of these operations should require a lot of memory.
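The scheme above can be sketched roughly as follows (in Python for
illustration only; `update_index` and its parameters are hypothetical
names, not PmWiki's actual PHP code):

```python
import os
import time

def update_index(index_path, pages, index_terms, time_limit=10):
    """Incrementally rebuild an index file within a time budget.

    `pages` maps page names to raw text; `index_terms` extracts the
    search terms for one page. New pages are indexed until `time_limit`
    seconds have elapsed; leftovers wait for the next run. Entries for
    pages not re-indexed this run are carried over from the old file.
    """
    new_path = index_path + ",new"
    deadline = time.time() + time_limit
    processed = set()  # names indexed in this run (grows with name bytes)

    with open(new_path, "w") as new:
        # Index as many new pages as the time budget allows; each page's
        # terms are written out immediately, then dropped from memory.
        for name, text in pages.items():
            if time.time() > deadline:
                break  # remaining pages are indexed on the next search
            terms = index_terms(text)
            new.write("%s:%s\n" % (name, " ".join(sorted(terms))))
            processed.add(name)

        # Copy still-valid entries from the old index file.
        if os.path.exists(index_path):
            with open(index_path) as old:
                for line in old:
                    name = line.split(":", 1)[0]
                    if name not in processed and name in pages:
                        new.write(line)

    # Atomic swap, like renaming ".pageindex,new" to ".pageindex".
    os.replace(new_path, index_path)
    return processed
```

The point of the ",new" file plus rename is that a search interrupted
mid-indexing never sees a half-written index: readers always get either
the complete old file or the complete new one.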
The only place I see where the memory usage can grow is on line 773 of
pagelist.php. This line adds the processed page name to an array, so
that PmWiki knows the page was already processed. If you have a huge
number of pages, the characters composing the page names alone may push
you over the memory limit. If your error messages mention this line 773,
the problem is there.
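To get a feel for the order of magnitude, here is a back-of-the-envelope
check (the page count and average name length are hypothetical figures,
not taken from your wiki):

```python
# Rough estimate of the memory held just by the processed-page-name
# array. PHP arrays add substantial per-element overhead on top of the
# raw string bytes, so the real usage is several times larger.
num_pages = 100_000
avg_name_len = 25  # e.g. "Main.SomeLongPageName"
raw_bytes = num_pages * avg_name_len
print("raw name data: %.1f MB" % (raw_bytes / 1e6))  # 2.5 MB before overhead
```

So with enough pages, that one array can plausibly exhaust a tight PHP
memory_limit even though each individual page is tiny.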
You can reduce the number of pages indexed (actually the number of
seconds of continued indexing) by adding this in config.php:
$PageIndexTime = 5; # 5 seconds instead of 10
I'll review the functions next weekend in case we are missing something.
See also the SystemLimits recipe; you may be able to increase the memory
limits.
> A related question: as I'm using SQLite to store a large number of
> short and very short pages, why use the PmWiki .pageindex process
> rather than performing a fulltext search?
The SQLite PageStore() class only handles the "storage" of the pages in
a single SQLite database file. The reasons, and the pros and cons, are
explained in the recipe page.
Other than "a fulltext search from the SQLite database is not yet
written", I think the built-in search using .pageindex will perform much
faster than a fulltext database search.
Petko
_______________________________________________
pmwiki-users mailing list
[email protected]
http://www.pmichaud.com/mailman/listinfo/pmwiki-users