Hi all,

I'm wondering how Nutch handles cookies that are set while fetching a page.

1) Are those cookies sent when Nutch crawls the URLs extracted from that page?
2) Is there a way to configure Nutch so that the values of some of those cookies are treated as part of the identity of the page, along with the URL? (I'm ready to do some development if necessary.)

For the second point, I'm trying to crawl an e-commerce web site that serves several shops selling the same products. You enter a shop via a specific URL (shop-home), which sets a cookie for that shop. From then on, the product URLs are exactly the same whatever the shop, but the information on the page (price, availability and so on) differs depending on the cookie that identifies the shop.

Thus, with the usual Nutch config, if I start the crawl with all the "shop-home" URLs as seeds, Nutch will fetch only one page per product (the URL being the identity) and not one page per product / shop.

Is my analysis correct? Is there a way around that?

Regards,
RemyA
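
P.S. To make question 2 more concrete, here is a rough sketch of what I mean by "identity = URL + shop cookie". It is plain Python and not Nutch code: the function name page_identity, the cookie name shop_id and the query parameter shop are all hypothetical, chosen only for illustration. The idea is simply to fold the cookie value into the key so that a URL-keyed store sees one entry per product / shop.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical name of the query parameter that carries the shop in the key.
SHOP_PARAM = "shop"


def page_identity(url, cookies, cookie_name="shop_id"):
    """Build a URL-like key that differs per shop.

    `cookies` is a plain dict of cookie name -> value; `cookie_name` is an
    assumed name for the cookie that identifies the shop.
    """
    shop = cookies.get(cookie_name)
    if shop is None:
        # No shop cookie: the URL alone is the identity, as today.
        return url
    parts = urlsplit(url)
    # Keep any existing query parameters and append the shop value.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((SHOP_PARAM, shop))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Same product URL, two shops: two distinct keys.
key_a = page_identity("http://example.com/product/42", {"shop_id": "paris"})
key_b = page_identity("http://example.com/product/42", {"shop_id": "lyon"})
```

With something like this, the same product URL fetched under two different shop cookies would give two distinct keys. Whether Nutch offers a hook where such a key could be plugged in is exactly what I'm asking.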

