Hi Jakob :)

Thanks for your help & for your attention.

Unfortunately, I can hardly grasp the official docs. As I think I've told 
you from the beginning, the official docs are killing me :D If the docs 
could be understood easily, I wouldn't be here. haha...

By the way, I found a good alternative for learning about the Scrapy 
pipeline. I think this blog post and forum thread are good for a newbie 
like me:

http://www.smallsurething.com/web-scraping-article-extraction-and-sentiment-analysis-with-scrapy-goose-and-textblob/

https://stackoverflow.com/questions/29946989/renaming-downloaded-images-in-scrapy-0-24-with-content-from-an-item-field-while

Thanks Guys :)


*NB:* SCRAPY SHOULD HIRE SOMEBODY TO RE-WRITE ITS DOCS. IT'S VERY 
FRUSTRATING TO READ.



On Monday, August 17, 2015 at 7:14:41 PM UTC+7, Jakob de Maeyer wrote:
>
> Hey Ivanov, 
>
> now I'm unsure whether you received my private mail from the 11th, so 
> here it is again: 
>
> Hey Ivanov, 
>
> I can point you in the right direction, but really, it's all there in 
> the docs. 
>
> Pipelines are a really easy concept: Every Item that is scraped (i.e. 
> yielded or returned) by the Spider is given to the process_item() method 
> of all pipelines. This method can then inspect and modify the item and 
> must do one of two things: 
> - if it returns the Item, it will be processed by the next pipeline, or, 
>   if there is no further pipeline, go to the feed exports (see 
>   http://doc.scrapy.org/en/latest/intro/tutorial.html#storing-the-scraped-data) 
>
> - if it raises scrapy.exceptions.DropItem, this particular item will 
>   stop being processed, end of story. You can use this if you want to 
>   filter your items for certain characteristics. 
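To make that concrete, here is a minimal sketch of both outcomes. The item fields and the pipeline name are made up for illustration; `DropItem` is redefined locally only so the sketch runs without Scrapy installed (in a real project you would use `from scrapy.exceptions import DropItem`):

```python
# Stand-in for scrapy.exceptions.DropItem, so this sketch runs without Scrapy.
class DropItem(Exception):
    pass


class PriceFilterPipeline:
    """Drop items that lack a 'price' field; pass everything else along."""

    def process_item(self, item, spider):
        if not item.get("price"):
            # End of story for this item: no later pipeline will see it.
            raise DropItem("missing price in %r" % item)
        # Returning the item hands it on to the next pipeline
        # (or to the feed exports, if this is the last pipeline).
        return item
```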
>
> There are a couple of extra methods you *can* implement if you want, 
> e.g. to open/close files or database connections, but literally all that 
> a pipeline *must* do is have a process_item() method. All methods, their 
> signatures, and their use cases are explained here: 
>
> http://doc.scrapy.org/en/latest/topics/item-pipeline.html#writing-your-own-item-pipeline
>  
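As a sketch of those optional methods (the file name and class name are arbitrary; this is in the spirit of the docs' JSON-writer example, not taken from any real project):

```python
import json


class JsonWriterPipeline:
    """Write each scraped item as one JSON line to a file."""

    def open_spider(self, spider):
        # Called once when the spider starts: a good place to
        # open files or database connections.
        self.file = open("items.jl", "w")

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item)) + "\n")
        return item
```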
>
> The most common use case for pipelines is to write scraped data to a 
> database. The docs have an example for MongoDB: 
>
> http://doc.scrapy.org/en/latest/topics/item-pipeline.html#write-items-to-mongodb
>  
>
> You can have multiple pipelines, and the items will be processed in the 
> order given by your ITEM_PIPELINES setting (in your settings.py file), 
> as explained here: 
>
> http://doc.scrapy.org/en/latest/topics/item-pipeline.html#activating-an-item-pipeline-component
>  
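For example, in settings.py (the project and class names here are hypothetical; the integer values set the order, lower numbers running first, and are conventionally chosen in the 0-1000 range):

```python
# settings.py -- lower values run earlier in the pipeline chain.
ITEM_PIPELINES = {
    "myproject.pipelines.PriceFilterPipeline": 300,
    "myproject.pipelines.JsonWriterPipeline": 800,
}
```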
>
> Whether you need item pipelines at all really depends on what you want 
> to do. 
>
>
> Cheers, 
> -Jakob 
>
>
> On 08/17/2015 01:35 PM, ivanov wrote: 
> > Can  anyone teach me to use pipeline properly? Or maybe you can tell me 
> > a tutorial blog about pipeline. 
> > 
> > Please don't recommend the official docs. 
> > 
>

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to scrapy-users+unsubscr...@googlegroups.com.
To post to this group, send email to scrapy-users@googlegroups.com.
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
