It depends on what you are planning to do. Nutch 0.8 supports metadata that is very flexible (key-value tuples) and fast. You can also store information in parseData.getMetaData; it stays available up to the indexing step as well.


On 12.04.2006 at 04:31, sudhendra seshachala wrote:

Sorry for just jumping in.
We have a doc id associated with each document when we index. We could store the doc id in a MySQL table and use it to query the Nutch database.
When parsing, capture the things you need as part of the "metadata".
Index the metadata; the associated doc id is stored in MySQL.
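
A minimal sketch of the idea above, assuming the metadata captured at parse time is a plain string-to-string map and that it is stored in a hypothetical table page_metadata(doc_id, meta_key, meta_value). The class name MetadataRows, the table layout, and the sample values are illustrations only; none of them are part of Nutch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetadataRows {
    /** One row of the hypothetical table page_metadata(doc_id, meta_key, meta_value). */
    static final class Row {
        final String docId;
        final String key;
        final String value;

        Row(String docId, String key, String value) {
            this.docId = docId;
            this.key = key;
            this.value = value;
        }
    }

    /** Flattens the key/value metadata captured for one document into rows keyed by its doc id. */
    static List<Row> toRows(String docId, Map<String, String> metadata) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            rows.add(new Row(docId, entry.getKey(), entry.getValue()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample metadata; a LinkedHashMap keeps the insertion order of the keys.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("author", "mike");
        metadata.put("category", "news");

        for (Row row : toRows("doc-42", metadata)) {
            // With JDBC, each row would be bound to a PreparedStatement and
            // queued with addBatch(), then sent in one executeBatch() call.
            System.out.println(row.docId + " " + row.key + "=" + row.value);
        }
    }
}
```

Keeping one row per key means new metadata keys need no schema change, and the doc id is the only thing both the index and the database have to agree on.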

Does that give you any ideas?
Please do share your concerns. I am working on something similar, where we will eventually have to adopt a database.

Thanks



John Reidy <[EMAIL PROTECTED]> wrote: I am looking at something similar.

I would guess the place to put it is the indexer. As I understand it, the
parser runs for just about everything fetched, whereas the indexer is
only run for pages you want to index.
I am also looking at having static objects (e.g. a connection) that are
initialised when the plugin is loaded, ideally through the startup method.
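
A rough sketch of the shared-connection idea, assuming the expensive object is created once on first use and then reused for every page. SharedResource is a hypothetical helper, not a Nutch class, and the string produced by the factory only stands in for a real database connection (for example one opened with DriverManager.getConnection).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Holds one lazily created, shared resource such as a database connection. */
final class SharedResource<T> {
    private final Supplier<T> factory;
    private T instance; // guarded by "this"

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Creates the resource on the first call and returns the same instance afterwards. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}

final class SharedResourceDemo {
    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();

        // Stand-in factory: a real plugin would open the MySQL connection here.
        SharedResource<String> connection =
                new SharedResource<>(() -> "connection-" + opened.incrementAndGet());

        // Simulate three pages being processed; all of them share one resource.
        for (int page = 0; page < 3; page++) {
            connection.get();
        }
        System.out.println("connections opened: " + opened.get());
    }
}
```

The synchronized accessor keeps the sketch safe if several threads call it at once. A long-lived connection would still need a validity check and a reconnect path, which are left out here.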

Regards

John

Hey all,
I have written a custom HTML parser and indexer. I would like to save some information that I gathered during the parse in a MySQL DB. I imagine there could be some performance hit here (e.g. connecting to the DB). What's the best place to add code to save this information: the parser or the indexer?

-Mike
--
View this message in context: http://www.nabble.com/Saving-Metadata-to-Mysql-t1389216.html#a3732992
Sent from the Nutch - User forum at Nabble.com.







  Sudhi Seshachala
  http://sudhilogs.blogspot.com/



                

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net

