Hi there,

Is it possible to get the doc_id during the indexing process, or can I simply assume that doc_id starts from 0 and increments with each record added?

Basically, I need something like this SQL:
INSERT INTO tbl (name) VALUES ('John') RETURNING id
so that after each INSERT I can extend the list of document IDs in which the name John appears.

For example, I want to build a hash that maps people's names to lists of internal doc_ids:

my %keyword_to_doc_id;
while (...) {
  my $content = ...;   # get a document
  my $keyword = ...;   # get a person's name

  $indexer->add_doc( { doc_content => $content, ... } );

  # <doc_id> is the internal ID of the document just added
  push @{ $keyword_to_doc_id{$keyword} }, <doc_id>
    if index( $content, $keyword ) >= 0;   # $keyword occurs in $content
}
$indexer->commit;
...
# Then build another index of the keywords that appear in the indexed documents,
# without a time-consuming search of the previously created index for millions
# of predefined keywords.

For text mining purposes, I can later analyze only the index of predefined keywords (metadata), and extend the search to the much bigger document index only when needed.
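
To make the intended data structure concrete, here is a minimal, self-contained sketch of the keyword-to-ID hash. It does not call the indexer at all: the sample documents and the $next_id counter are hypothetical stand-ins, and the counter simply encodes the assumption I am asking about (IDs assigned sequentially from 0), which may not match the real internal doc_id.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real document source.
my @docs = (
    { content => 'John met Mary in Paris.', keyword => 'John' },
    { content => 'Mary wrote a report.',    keyword => 'John' },
    { content => 'A letter from John.',     keyword => 'John' },
);

my %keyword_to_doc_id;
my $next_id = 0;    # ASSUMPTION: IDs are assigned sequentially, starting at 0

for my $doc (@docs) {
    my $id = $next_id++;

    # The real loop would call $indexer->add_doc(...) at this point.

    # Record the ID only when the keyword actually occurs in the content.
    push @{ $keyword_to_doc_id{ $doc->{keyword} } }, $id
        if index( $doc->{content}, $doc->{keyword} ) >= 0;
}

print join( ',', @{ $keyword_to_doc_id{John} } ), "\n";    # prints "0,2"
```

If the indexer can return the real doc_id, it would replace the counter and the rest of the loop would stay the same.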

Alex
