Hi, I'm wondering what would be a good environment for a scalable Scrapy project, similar to Scrapinghub. I'd like to start with a single vserver/root server for crawling and data storage, with the possibility of adding more servers when I need more scraping power or database space. According to a blog entry (http://blog.scrapinghub.com/2013/07/26/introducing-dash/), Scrapinghub uses Cloudera CDH (running on which OS?) and stores its data in HBase. Is this a good choice?
Is there any information on how to set up Scrapy in a CDH environment and save data to HBase?

Thank you,
Christoph