Hi Jakob :) Thanks for your help and your attention.
Unfortunately, I can hardly grasp the official docs. I think I told you from the beginning that the official docs are killing me :D If the docs could be understood easily, I wouldn't be here, haha... Oh, and I found a good alternative for learning the Scrapy pipeline. I think this blog post and this forum thread are good for a newbie like me:

http://www.smallsurething.com/web-scraping-article-extraction-and-sentiment-analysis-with-scrapy-goose-and-textblob/
https://stackoverflow.com/questions/29946989/renaming-downloaded-images-in-scrapy-0-24-with-content-from-an-item-field-while

Thanks, guys :)

*NB:* SCRAPY SHOULD HIRE SOMEBODY TO REWRITE ITS DOCS. IT'S VERY FRUSTRATING TO READ.

On Monday, August 17, 2015 at 7:14:41 PM UTC+7, Jakob de Maeyer wrote:
>
> Hey Ivanov,
>
> now I'm unsure whether you received my private mail from the 11th, so
> here it is again:
>
> Hey Ivanov,
>
> I can point you in the right direction, but really, it's all there in
> the docs.
>
> Pipelines are a really easy concept: every Item that is scraped (i.e.
> yielded or returned) by the Spider is given to the process_item() method
> of all pipelines. This method can then inspect and modify the item and
> must do one of two things:
>
> - If it returns the Item, it will be processed by the next pipeline, or,
>   if there is no further pipeline, go to the feed exports (see
>   http://doc.scrapy.org/en/latest/intro/tutorial.html#storing-the-scraped-data).
>
> - If it raises scrapy.exceptions.DropItem, this particular item will
>   stop being processed, end of story. You can use this if you want to
>   filter your items for certain characteristics.
>
> There are a couple of extra methods you *can* implement if you want,
> e.g. to open/close files or database connections, but literally all that
> a pipeline *must* do is have a process_item() method.
> All methods, their signatures, and their use cases are explained here:
> http://doc.scrapy.org/en/latest/topics/item-pipeline.html#writing-your-own-item-pipeline
>
> The most common use case for pipelines is to write scraped data to a
> database. The docs have an example for MongoDB:
> http://doc.scrapy.org/en/latest/topics/item-pipeline.html#write-items-to-mongodb
>
> You can have multiple pipelines, and the items will be processed in the
> order you set in your ITEM_PIPELINES setting (which you set in your
> settings.py file), as explained here:
> http://doc.scrapy.org/en/latest/topics/item-pipeline.html#activating-an-item-pipeline-component
>
> Whether you need item pipelines at all really depends on what you want
> to do.
>
> Cheers,
> -Jakob
>
> On 08/17/2015 01:35 PM, ivanov wrote:
> > Can anyone teach me to use pipelines properly? Or maybe you can tell me
> > about a tutorial blog on pipelines.
> >
> > Please don't recommend the official docs.
> >
> > --
> > You received this message because you are subscribed to a topic in the
> > Google Groups "scrapy-users" group.
> > To unsubscribe from this topic, visit
> > https://groups.google.com/d/topic/scrapy-users/ttaAatl0LCg/unsubscribe.
> > To unsubscribe from this group and all its topics, send an email to
> > scrapy-users...@googlegroups.com.
> > To post to this group, send email to scrapy...@googlegroups.com.
> > Visit this group at http://groups.google.com/group/scrapy-users.
> > For more options, visit https://groups.google.com/d/optout.
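For anyone else landing on this thread: the two outcomes Jakob describes can be sketched as a tiny pipeline. The item field `"price"` is made up for illustration, and `DropItem` is stubbed here so the snippet runs standalone — in a real Scrapy project you would import it from `scrapy.exceptions` instead.

```python
# Stand-in so this sketch runs on its own; in a real project, use:
#   from scrapy.exceptions import DropItem
class DropItem(Exception):
    pass


class PriceFilterPipeline:
    """Drop items that lack a price; pass everything else on unchanged."""

    def process_item(self, item, spider):
        if not item.get("price"):
            # Raising DropItem stops this item for good: it never reaches
            # later pipelines or the feed exports.
            raise DropItem("Missing price in %r" % item)
        # Returning the item hands it to the next pipeline (or, if this is
        # the last one, to the feed exports).
        return item
```

The optional hooks Jakob mentions (e.g. `open_spider(self, spider)` and `close_spider(self, spider)` for opening/closing files or database connections) can be added to the same class, but `process_item()` is the only method a pipeline must have.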
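And the ordering Jakob describes is just a dict in settings.py. The values are priorities: lower numbers run first, conventionally in the 0–1000 range. The module path and class names below are hypothetical — substitute your own.

```python
# settings.py (sketch) -- items flow through FilterPipeline first (300),
# then MongoPipeline (800). The names are placeholders for illustration.
ITEM_PIPELINES = {
    "myproject.pipelines.FilterPipeline": 300,
    "myproject.pipelines.MongoPipeline": 800,
}
```

Whether you need any of this still depends on what you want to do with your scraped items, as Jakob says.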