Re: xpath and specific sign

Rolando Espinoza La Fuente Tue, 28 Jan 2014 13:58:41 -0800

You can use the euro symbol directly in your regex. Under the hood, Scrapy uses
the re.UNICODE flag, which allows you to do that. See:


In [33]: text = u"<span>12,76 €</span>"

In [34]: sel = Selector(text=text)

In [35]: sel.xpath('//span/text()').re(u'(\d+,\d+) €')
Out[35]: [u'12,76']
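Here is a minimal sketch of the same idea using only the Python 3 standard library `re` module. Every `str` is Unicode in Python 3, so the € sign can be written directly in the pattern. In Python 2, as in this thread, the pattern should be a unicode literal like the `u'...'` above. The pattern is an assumption: it also accepts a dot as decimal separator, and it allows optional whitespace (including a non-breaking space) before the sign.

```python
import re

# Assumed pattern: digits, an optional "," or "." decimal part,
# optional whitespace (including a non-breaking space), then the euro sign.
PRICE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*€")


def find_euro_prices(text):
    """Return every number that is followed by a euro sign."""
    return PRICE_RE.findall(text)


print(find_euro_prices("<span>12,76 €</span> <b>15.23€</b> 9,99\u00a0€"))
# ['12,76', '15.23', '9,99']
```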



On Tue, Jan 28, 2014 at 5:44 PM, d4v1d <[email protected]> wrote:

> Hello,
> Yes, you are right, my explanations were not clear.
> My objective is to find the price on a web page. I assume the price
> is written like this: 12,76 €
> I have the different URLs in a database, so I test each URL and search for
> the price with a specific regex, but it didn't accept the € symbol.
> Maybe I have to specify that item['price'] is in UTF-8, but I don't know
> how?
>
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         item = DmozItem()
>         # numbers such as 12,76 or 15.23 followed by whitespace
>         item['price'] = hxs.select('//span/text()').re(
>             r'([0-9]+(?:[,.][0-9]+)?)\s')
>
>         cursor = self.db.cursor()
>         for j, price in enumerate(item['price']):
>             # only the integer index goes into the SQL text; the values are
>             # bound as parameters (assumes a %s-style driver such as MySQLdb)
>             sql = "update urls set price_{} = %s where url = %s".format(j)
>             cursor.execute(sql, (price, response.url))
>         self.db.commit()
>         return item
>
>
> I hope this is clearer.
> Thanks in advance,
> Regards
>
>
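Here is a hedged sketch of the database step on its own. It uses the standard-library `sqlite3` module because the thread does not say which database holds the URLs. The `urls` table with `url`, `price_0` and `price_1` columns is a hypothetical schema modeled on the SQL in the snippet, and `save_prices` is a made-up helper. Values are bound as query parameters: sqlite3 uses `?` placeholders, while drivers such as MySQLdb use `%s`.

```python
import sqlite3


def save_prices(db, url, prices):
    """Store each scraped price in its own price_<n> column for the given url.

    Assumes a hypothetical table urls(url, price_0, price_1, ...) with enough columns.
    """
    cursor = db.cursor()
    for index, price in enumerate(prices):
        # Column names cannot be bound as parameters, so the column is built
        # from the integer index only; the values themselves are bound with "?".
        sql = "UPDATE urls SET price_{} = ? WHERE url = ?".format(int(index))
        cursor.execute(sql, (price, url))
    db.commit()  # one commit after all updates instead of one per row


# Usage with an in-memory database (hypothetical schema and URL).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE urls (url TEXT, price_0 TEXT, price_1 TEXT)")
db.execute("INSERT INTO urls (url) VALUES (?)", ("http://example.com/product",))
save_prices(db, "http://example.com/product", ["12,76", "15,23"])
print(db.execute("SELECT price_0, price_1 FROM urls").fetchone())
# ('12,76', '15,23')
```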
> On Tuesday, January 28, 2014 12:15:36 UTC+1, Mikołaj Roszkowski wrote:
>>
>> It's hard to say without seeing the page's source code. The usual approach
>> to this task is to crawl the necessary nodes with XPath and then process
>> the scraped items in the item pipeline to extract the values.
>> http://doc.scrapy.org/en/latest/topics/item-pipeline.html
>>
>>
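Here is a minimal sketch of that pipeline approach. Scrapy calls `process_item(item, spider)` on each enabled pipeline for every scraped item. The class name `PriceCleanupPipeline` and the comma-to-dot conversion are assumptions for illustration. A price with thousands separators, such as `1.234,56`, would need more careful parsing.

```python
from decimal import Decimal


class PriceCleanupPipeline(object):
    """Hypothetical Scrapy item pipeline that normalizes scraped prices.

    The item only needs to support item['price'] and .get(), as a dict or a
    scrapy Item does.
    """

    def process_item(self, item, spider):
        # '12,76' -> Decimal('12.76'): treat a comma as the decimal separator.
        item['price'] = [Decimal(p.replace(',', '.')) for p in item.get('price', [])]
        return item


pipeline = PriceCleanupPipeline()
print(pipeline.process_item({'price': ['12,76', '15.23']}, spider=None))
# {'price': [Decimal('12.76'), Decimal('15.23')]}
```

To enable it, list its import path in the `ITEM_PIPELINES` setting, for example under a hypothetical `myproject.pipelines` module.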
>> 2014-01-28 David LANGLADE <[email protected]>
>>
>>> Hello,
>>> Thanks for your feedback.
>>> Not really: I want to crawl the whole page to find specific symbols plus a
>>> numeric sequence (for example 15.23€) and return this value.
>>> Regards
>>>
>>>
>>>
>>>
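To search the whole page without naming a tag, one option in Scrapy is to select every text node (for example with the XPath `//text()`) and apply the regex to those. The sketch below mimics that with the standard-library `html.parser` so it runs without Scrapy. `TextCollector` and `prices_in_page` are hypothetical names, and the euro pattern is an assumption.

```python
import re
from html.parser import HTMLParser

PRICE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*€")


class TextCollector(HTMLParser):
    """Collect every text node of a page, whatever tag it sits in."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &euro; -> €
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def prices_in_page(html):
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    # Search each text node separately, so a match cannot span two tags.
    return [m for chunk in collector.chunks for m in PRICE_RE.findall(chunk)]


print(prices_in_page("<div><p>Total: 15.23&euro;</p><td>12,76 €</td></div>"))
# ['15.23', '12,76']
```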
>>> 2014-01-27 Mikołaj Roszkowski <[email protected]>
>>>
>>>> You want to check the whole page's html content and then grab values
>>>> with numbers?
>>>>
>>>>
>>>> 2014-01-27 d4v1d <[email protected]>
>>>>
>>>>> Is something like this in the right direction?
>>>>>
>>>>> item['price'] = hxs.select('/html').re('[0-9]&#128;')
>>>>>
>>>>>
>>>>>
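Here is a hedged sketch of why `&#128;` does not work in the pattern. A selector's `.re()` runs against text that the HTML parser has already decoded, so entity spellings such as `&#128;` or `&euro;` should not appear in it. The pattern needs the character itself (U+20AC, `\u20ac`). In HTML5, the numeric reference `&#128;` is mapped to € for Windows-1252 compatibility, which is why it can still display correctly in a browser. The stdlib `html.unescape` stands in for the parser's decoding step here. Note also that `[0-9]` alone matches a single digit, so `+` is needed for longer numbers.

```python
import html
import re

raw_html = "<span>15.23&#128;</span>"
# Stand-in for the decoding the HTML parser does before .re() sees the text.
text = html.unescape(raw_html)  # '<span>15.23€</span>'

# The entity spelling is matched literally, so it finds nothing in decoded text.
print(re.findall(r"[0-9]&#128;", text))  # []
# Matching the euro character itself (U+20AC) works; "+" allows several digits.
print(re.findall(r"([0-9]+(?:[.,][0-9]+)?)\u20ac", text))  # ['15.23']
```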
>>>>> On Sunday, January 26, 2014 22:35:16 UTC+1, d4v1d wrote:
>>>>>
>>>>>> Hello,
>>>>>> Is it possible to search a page at a given URL for specific text without
>>>>>> having to specify a tag?
>>>>>> For example, I would like to find all the text made of the digits 0 to 9
>>>>>> and "." just before or after the $ sign.
>>>>>> It is probably possible with a regex, but I don't know how to use this
>>>>>> kind of tool in Scrapy.
>>>>>> Thanks for your help.
>>>>>> Regards
>>>>>>
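Here is a minimal sketch for the original question. It assumes an amount is digits with an optional `.` decimal part, written right before or right after a `$` sign, optionally separated by whitespace. It uses plain `re` so it can be tried outside Scrapy. `dollar_amounts` is a hypothetical helper. Because `$` is a regex metacharacter (end of string), it has to be escaped as `\$`.

```python
import re

# Assumed pattern: an amount made of digits with an optional "." decimal part,
# written either after the dollar sign ($15.23) or before it (15.23 $).
DOLLAR_RE = re.compile(r"\$\s*(\d+(?:\.\d+)?)|(\d+(?:\.\d+)?)\s*\$")


def dollar_amounts(text):
    """Return the amounts next to a $ sign, in the order they appear."""
    # Only one of the two groups participates in each match.
    return [m.group(1) or m.group(2) for m in DOLLAR_RE.finditer(text)]


print(dollar_amounts("Was $20.00, now 15.23 $ (save $4.77)"))
# ['20.00', '15.23', '4.77']
```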
>>>>>>
>>>>>>  --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "scrapy-users" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>>
>>>>> To post to this group, send email to [email protected].
>>>>>
>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> David LANGLADE
>>> 5 rue du patuel
>>> 42800 Saint martin la plaine
>>> Tel : 06.49.42.38.85
>>>
>>>
>>
>
