Problem: I want to develop a web service that extracts certain data from one site and different data from another, and stores the results in an index.
Case study: Say I am developing a real estate search site for which I know all the seed URLs, and I only need to extract data from those URLs in a predefined way. For each seed site I will extract known fields such as location, price, zip, bedrooms, and description, and add them to the index. The extraction rules differ from site to site, so each site needs its own XPath, XQuery, or regex expressions. In other words, I want something like Web-Harvest (http://web-harvest.sourceforge.net) integrated into Nutch. Can anyone suggest a plugin that does this, or another way to do it?

Thanks,
Anarus

--
View this message in context: http://www.nabble.com/Is-there-any-plugin-for-data-extraction-using-Xpath%2C-XQuery-or-regex-for-nutch-tf4742306.html#a13561159
Sent from the Nutch - User mailing list archive at Nabble.com.
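To make the per-site extraction idea concrete, here is a minimal sketch in plain Python. It is not Nutch or Web-Harvest code. The host names, the sample pages, the rule table SITE_RULES, and the function extract_fields are all hypothetical, invented for illustration. It also assumes the page is well-formed markup, because the standard library's ElementTree only handles a subset of XPath and cannot parse messy real-world HTML; a tolerant HTML parser would be needed in practice.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical per-site rules: host -> field name -> (rule kind, expression).
# Each seed site gets its own table, since its page layout is different.
SITE_RULES = {
    "listings.example.com": {
        "price": ("xpath", ".//span[@class='price']"),
        "bedrooms": ("xpath", ".//td[@id='beds']"),
        "zip": ("regex", r"\b(\d{5})\b"),
    },
    "homes.example.org": {
        "price": ("regex", r"Price:\s*\$([\d,]+)"),
        "bedrooms": ("regex", r"(\d+)\s+bed"),
        "zip": ("regex", r"ZIP\s+(\d{5})"),
    },
}

# Made-up sample pages, one per hypothetical site.
PAGE_A = (
    "<html><body><span class='price'>$250,000</span>"
    "<table><tr><td id='beds'>3</td></tr></table>"
    "<p>Austin, TX 78701</p></body></html>"
)
PAGE_B = "Price: $199,500 | 2 bed, 1 bath | ZIP 60614"


def extract_fields(url, page):
    """Return the fields found in `page` using the rules for the URL's host."""
    rules = SITE_RULES.get(urlparse(url).hostname)
    if rules is None:
        return {}  # not one of the known seed sites
    tree = None  # parsed lazily, only if some rule is an XPath rule
    fields = {}
    for name, (kind, expr) in rules.items():
        if kind == "xpath":
            if tree is None:
                tree = ET.fromstring(page)  # assumes well-formed markup
            node = tree.find(expr)
            value = node.text.strip() if node is not None and node.text else None
        else:
            match = re.search(expr, page)
            value = match.group(1) if match else None
        if value is not None:
            fields[name] = value
    return fields


if __name__ == "__main__":
    print(extract_fields("http://listings.example.com/item/1", PAGE_A))
    print(extract_fields("http://homes.example.org/item/7", PAGE_B))
```

The resulting dictionary is the kind of record that would then be added to the index, one field per key. Keeping the rules in a data table rather than in code means a new seed site only needs a new entry, which is roughly what a Web-Harvest style configuration provides.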
