So for now I took a website for testing purposes, to help me learn the basics of Scrapy.
The website is "http://www.allosociete.ch/telephone-horaires-metier/Pressing" and I would like to get the following output in CSV:

counter: a simple incrementing number
page_id: the page number
url: the URL that displays the company details
company name: once on that URL, collect the company name

For http://www.allosociete.ch/telephone-horaires-metier/Pressing, which is page 1, I was able to collect data as follows:

import scrapy

class AlloSociete(scrapy.Spider):
    name = 'allosocietepressing'
    start_urls = ['http://www.allosociete.ch/telephone-horaires-metier/Pressing']
    counter = 1
    pagenum = 1

    def parse(self, response):
        for href in response.css('div.lien-ville ul li a::attr("href")'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, self.parse_lien)

    def parse_lien(self, response):
        yield {
            'count': self.counter,
            'page': self.pagenum,
            'lien': response.url,
        }
        self.counter = self.counter + 1

For now I do not have a clear understanding of how to handle the pagination and replace self.pagenum with the actual page number. This section has only 3 pages. Thanks for helping me understand how Scrapy works; it seems very promising for collecting real-time data.

On Monday, August 15, 2016 at 1:14:45 AM UTC+2, WANG Ruoxi wrote:
>
> Hi Raf,
>
> Not sure that I understand your question well; you can always use a regex
> in the LinkExtractor to retrieve all the pagination links that you need.
> Something like
>
> "telephone-horaires-metier\/Restaurant\?p=[0-9]+$" can match the links, if
> your last number is always a positive integer.
>
> Regards,
>
>
> On Sunday, August 14, 2016 at 11:11:40 PM UTC+8, Raf Roger wrote:
>>
>> Hi,
>>
>> I'm new to Scrapy and I'm looking for a way to retrieve all links (with
>> class: ul li a).
>> On each page there is pagination. The first page URL is:
>> telephone-horaires-metier/Restaurant
>>
>> Page 2's URL is:
>> telephone-horaires-metier/Restaurant?p=2
>>
>> Page 3's URL is:
>> telephone-horaires-metier/Restaurant?p=3
>>
>> etc.
>>
>> The "next" URL is always the current page + 1, so if I'm on page 2 the
>> "next" URL is telephone-horaires-metier/Restaurant?p=3.
>>
>> How can I collect all links on each page?
>>
>> thx
>>
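One way to handle the pagination described in this thread is to derive the page number from the URL itself instead of keeping it in self.pagenum. Below is a minimal sketch, assuming the "?p=N" pattern and the 3-page limit from the question; the helper names current_page and next_page_url are hypothetical, not Scrapy API, and use only the standard library so the logic can be tested on its own.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed base listing URL, taken from the question above.
BASE = 'http://www.allosociete.ch/telephone-horaires-metier/Pressing'

def current_page(url):
    """Listing page number taken from the ?p= query parameter (default 1)."""
    qs = parse_qs(urlsplit(url).query)
    return int(qs['p'][0]) if 'p' in qs else 1

def next_page_url(url, last_page=3):
    """URL of the next listing page, or None once last_page is reached."""
    page = current_page(url)
    return '%s?p=%d' % (BASE, page + 1) if page < last_page else None

# Inside the spider's parse() you could then write (sketch, selector taken
# from the spider shown earlier in the thread):
#
#     page = current_page(response.url)
#     for href in response.css('div.lien-ville ul li a::attr(href)').extract():
#         yield scrapy.Request(response.urljoin(href),
#                              callback=self.parse_lien,
#                              meta={'page': page})
#     nxt = next_page_url(response.url)
#     if nxt:
#         yield scrapy.Request(nxt, callback=self.parse)
#
# and read response.meta['page'] in parse_lien() instead of self.pagenum.
```

Carrying the page number through request meta avoids relying on a shared instance attribute: Scrapy schedules callbacks concurrently, so self.pagenum would not reliably reflect the page a given detail request came from.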