It depends on what you are planning to do. Nutch 0.8 supports metadata that
is very flexible (key-value tuples) and fast.
You can also store information in parseData.getMetaData; it
remains available through to indexing as well.
On 12.04.2006 at 04:31, sudhendra seshachala wrote:
Sorry for just jumping in.
We have a doc ID associated with each document when we index. We could
store the doc ID in a MySQL table and use it to query the Nutch database.
When parsing, capture whatever is needed as part of the "metadata",
then index the metadata. The associated doc ID is stored in MySQL.
Does that give you any ideas?
Please do share your concerns. I am working on something similar,
where we will eventually have to adopt a database.
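
The flow described above (capture metadata while parsing, then store it under the doc ID at indexing time) can be sketched roughly as follows. This is only an illustration in plain Java: the class MetadataByDocId, its methods, and the in-memory map standing in for the MySQL table are all hypothetical, and none of it is Nutch or JDBC API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the workflow in the message; not Nutch or JDBC API. */
public class MetadataByDocId {

    /** Stand-in for the MySQL table: doc ID -> metadata captured at parse time. */
    private final Map<Integer, Map<String, String>> table = new LinkedHashMap<>();

    /** Parse step: capture the values of interest as key/value metadata. */
    public static Map<String, String> parse(String url, String title) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("url", url);
        meta.put("title", title);
        return meta;
    }

    /** Index step: the doc ID is known here, so store a copy of the metadata under it. */
    public void index(int docId, Map<String, String> meta) {
        table.put(docId, new LinkedHashMap<>(meta));
    }

    /** Later lookup by doc ID, as one would query the external table. */
    public Map<String, String> lookup(int docId) {
        return table.get(docId);
    }

    public static void main(String[] args) {
        MetadataByDocId store = new MetadataByDocId();
        store.index(42, parse("http://example.com/", "Example"));
        System.out.println(store.lookup(42).get("title"));
    }
}
```

In a real setup the map would be replaced by inserts into the database, with the doc ID as the key column.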
Thanks
John Reidy <[EMAIL PROTECTED]> wrote:
I am looking at something similar.
I would guess the place to put it is the indexer. As I understand it, the
parser runs for just about everything fetched, whereas the indexer is
only run for pages you want to index.
I am also looking at having static objects (e.g. a connection) that are
initialised when the plugin is loaded, ideally through the startup
method.
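
One way to picture the "initialise once, reuse everywhere" idea is a small holder that creates the shared object on first use. The sketch below is plain Java under stated assumptions: SharedResource is a hypothetical class, the Supplier stands in for whatever code would open the real connection, and how this would be wired into a plugin's startup method is left open.

```java
import java.util.function.Supplier;

/** Hypothetical holder: creates one shared resource on first use and reuses it. */
public final class SharedResource<T> {

    private final Supplier<T> factory; // e.g. code that opens a DB connection
    private T instance;                // cached after the first successful creation

    public SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /**
     * Returns the shared instance, calling the factory only while no instance
     * exists yet. Assumes the factory never returns null. Synchronized so that
     * concurrent callers cannot create two instances.
     */
    public synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        int[] opened = {0}; // counts how often the factory actually ran
        SharedResource<String> conn = new SharedResource<>(() -> {
            opened[0]++;
            return "connection-" + opened[0];
        });
        conn.get();
        conn.get();
        System.out.println(opened[0]);
    }
}
```

The cost of connecting is then paid once per JVM rather than once per parsed or indexed page.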
Regards
John
Hey all,
I have written a custom HTML parser and indexer. I would like to save some
information that I gathered during the parse in a MySQL DB. I imagine
there could be some performance hit here (e.g. connecting to the DB). What's
the best place to add code to save this information: the parser or the
indexer?
-Mike
--
View this message in context: http://www.nabble.com/Saving-Metadata-to-Mysql-t1389216.html#a3732992
Sudhi Seshachala
http://sudhilogs.blogspot.com/
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net