Hello all. I'm new here and this is my first post to this group. I was studying the Scrapy docs (it's a slow day at work...;-) when I came across this:
DjangoItem caveats

DjangoItem is a rather convenient way to integrate Scrapy projects with Django models, but bear in mind that Django ORM may not scale well if you scrape a lot of items (ie. millions) with Scrapy. This is because a relational backend is often not a good choice for a write intensive application (such as a web crawler), specially if the database is highly normalized and with many indices.

http://doc.scrapy.org/en/latest/topics/djangoitem.html

Say what? Explain, please!

Now, I did keep looking, and found https://groups.google.com/forum/#!searchin/scrapy-users/DjangoItem/scrapy-users/HsDJ-jM7LvM/ESRlGF6QXcIJ "Using DjangoItem step-by-step guide" and the SO post from which it comes. Is the max_locks issue that was brought up there the reason for the caveat? (Note: I can't access GitHub from work - it's blocked - go figure.)

My project is text heavy (government documents) and I need a solid database to store my results in. And yes, of course I want to scale. A good database is *always* normalized, so what are we talking about here?

If you are saying don't use an RDBMS for big projects, are you in effect also saying don't use the Django ORM for big projects? I ask because the caveat, as worded, talks about Django per se, not DjangoItem. (And no, I am not inviting a debate about nonrel.)

Or should I just not use DjangoItem and follow Chris' advice on SO: "I ended up not using DjangoItem at all which solved all my problems"?

As much clarity, detail, and yes, caveats as you can enlighten me with would be GREATLY appreciated.
