You could declare the encoding of the Python script containing this "€"
character, for example with
#!/usr/bin/env python
# -*- coding: utf-8 -*-
at the top (adapt it to the encoding used by your code editor).
Safer, but less readable, is to use the Python Unicode escape
for the "€" character:
>>> text = u"<span>12,76 €</span>"
>>> [text]
[u'<span>12,76 \u20ac</span>']
so the regex becomes
sel.xpath('//span/text()').re(u'(\d+,\d+) \u20ac')
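The same idea can be tried outside Scrapy. This is only a minimal sketch using the standard `re` module, assuming the price text looks like the sample above; the name `PRICE_RE` is made up for the illustration.

```python
# -*- coding: utf-8 -*-
import re

text = u"<span>12,76 \u20ac</span>"

# The pattern is pure ASCII: \u20ac is the escape for the euro sign, so the
# result does not depend on the encoding the editor saved the file in.
# PRICE_RE is a hypothetical name used only for this sketch.
PRICE_RE = re.compile(u"(\\d+,\\d+) \u20ac", re.UNICODE)

prices = PRICE_RE.findall(text)
print(prices)
```

The backslashes are doubled so that the same literal is valid on both Python 2 and Python 3.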
/Paul.
On Wednesday, January 29, 2014 12:26:21 PM UTC+1, d4v1d wrote:
>
> Thanks for your help.
> I just have a problem with the encoding:
>
> SyntaxError: Non-ASCII character '\x80' in file...
> but no encoding declared; see http://www.python.org/peps/pep-0263.html
>
> How can I implement this encoding in Scrapy?
> Regards
>
>
>
> 2014-01-28 Rolando Espinoza La Fuente <[email protected]>
>
>> You can use the euro symbol in your regex. Under the hood, Scrapy uses the
>> re.UNICODE flag, which allows you to do that. See:
>>
>> In [33]: text = u"<span>12,76 €</span>"
>>
>> In [34]: sel = Selector(text=text)
>>
>> In [35]: sel.xpath('//span/text()').re(u'(\d+,\d+) €')
>> Out[35]: [u'12,76']
>>
>>
>>
>> On Tue, Jan 28, 2014 at 5:44 PM, d4v1d <[email protected]> wrote:
>>
>>> Hello,
>>> Yes, you are right, my explanation was not clear.
>>> My goal is to find the price on a web page; I assume that the
>>> price is formatted like this: 12,76 €
>>> I have the different URLs in a database, so I test each URL and search for
>>> the price with a specific regex, but it doesn't accept the € symbol.
>>> Maybe I have to specify that item['price'] is in UTF-8, but I don't
>>> know how.
>>>
>>> def parse(self, response):
>>>     hxs = HtmlXPathSelector(response)
>>>     item = DmozItem()
>>>     # digits, optionally followed by a decimal part, then whitespace
>>>     item['price'] = hxs.select('//span/text()').re(
>>>         r'([0-9]+(?:[,.][0-9]+)?)\s')
>>>
>>>     # one cursor is enough for all the updates
>>>     cursor = self.db.cursor()
>>>     for j in range(len(item['price'])):
>>>         sql = "update urls set price_%s = '%s' where url = '%s'" % (
>>>             j, item['price'][j], response.url)
>>>         cursor.execute(sql)
>>>     self.db.commit()
>>>     return item
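The update in the spider builds its SQL by string formatting, which breaks as soon as a value contains a quote. Here is a rough sketch of the same update with bound parameters. It assumes an in-memory `sqlite3` database and a made-up table layout; `save_prices` is a hypothetical helper, and the placeholder style (`?` here) depends on the database driver actually in use.

```python
import sqlite3

# In-memory stand-in for the real database; the table layout is an assumption.
db = sqlite3.connect(":memory:")
db.execute("create table urls (url text, price_0 text, price_1 text)")
db.execute("insert into urls (url) values (?)", ("http://example.com/p",))


def save_prices(db, url, prices):
    """Hypothetical helper: store each price in its price_<n> column."""
    cursor = db.cursor()
    for j, price in enumerate(prices):
        # Column names cannot be bound as parameters, so only the integer
        # index is formatted in; the values go through placeholders.
        sql = "update urls set price_%d = ? where url = ?" % j
        cursor.execute(sql, (price, url))
    db.commit()


save_prices(db, "http://example.com/p", [u"12,76", u"15,23"])
row = db.execute("select price_0, price_1 from urls").fetchone()
print(row)
```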
>>>
>>>
>>> I hope this is clearer.
>>> Thanks in advance.
>>> Regards
>>>
>>>
>>> On Tuesday, January 28, 2014 12:15:36 UTC+1, Mikołaj Roszkowski wrote:
>>>>
>>>> It's hard to say without seeing the page's source code. The usual
>>>> approach to this task is to crawl the necessary nodes with XPath and then
>>>> process the scraped items in the item pipeline to extract the values.
>>>> http://doc.scrapy.org/en/latest/topics/item-pipeline.html
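To make the pipeline suggestion concrete, here is a minimal sketch of what such a step might look like. The class name `PricePipeline`, the field names `raw_text` and `price`, and the pattern itself are assumptions made for the illustration; only the `process_item(self, item, spider)` method signature comes from the Scrapy item pipeline documentation linked above. A plain dict stands in for the item so the sketch runs without Scrapy installed.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical pattern: digits, optional decimal part, optional space, euro sign.
PRICE_RE = re.compile(u"(\\d+(?:[.,]\\d+)?)\\s*\u20ac", re.UNICODE)


class PricePipeline(object):
    """Hypothetical pipeline; 'raw_text' and 'price' are assumed field names."""

    def process_item(self, item, spider):
        # Join the scraped text nodes and keep every amount followed by a euro sign.
        text = u" ".join(item.get("raw_text", []))
        item["price"] = PRICE_RE.findall(text)
        return item


# Plain dict standing in for a Scrapy item, so the sketch runs without Scrapy.
item = PricePipeline().process_item(
    {"raw_text": [u"Prix : 12,76 \u20ac", u"Promo 15.23\u20ac"]}, spider=None)
print(item["price"])
```

In a real project the class would also have to be enabled through the `ITEM_PIPELINES` setting.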
>>>>
>>>>
>>>> 2014-01-28 David LANGLADE <[email protected]>
>>>>
>>>>> Hello,
>>>>> Thanks for your feedback.
>>>>> Not really: I want to crawl the whole page to find specific symbols plus a
>>>>> numeric sequence (for example 15.23€) and return this value.
>>>>> Regards
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-01-27 Mikołaj Roszkowski <[email protected]>
>>>>>
>>>>>> Do you want to check the whole page's HTML content and then grab the
>>>>>> values that contain numbers?
>>>>>>
>>>>>>
>>>>>> 2014-01-27 d4v1d <[email protected]>
>>>>>>
>>>>>>> Is something like this heading in the right direction?
>>>>>>>
>>>>>>> item['price'] = hxs.select('/html').re('[0-9]€')
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sunday, January 26, 2014 22:35:16 UTC+1, d4v1d wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> Is it possible to search a URL for specific text without having
>>>>>>>> to specify a tag?
>>>>>>>> For example, I would like to find all text made of the digits 0 to 9
>>>>>>>> and ".", before and after the $ sign.
>>>>>>>> It is probably possible with a regex, but I don't know how to use this
>>>>>>>> type of tool in Scrapy.
>>>>>>>> Thanks for your help.
>>>>>>>> Regards
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "scrapy-users" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>>
>>>>>>> To post to this group, send email to [email protected].
>>>>>>>
>>>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to a topic in
>>>>>> the Google Groups "scrapy-users" group.
>>>>>> To unsubscribe from this topic, visit
>>>>>> https://groups.google.com/d/topic/scrapy-users/Q5YJPx3vEiQ/unsubscribe.
>>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>>> [email protected].
>>>>>>
>>>>>> To post to this group, send email to [email protected].
>>>>>>
>>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> David LANGLADE
>>>>> 5 rue du patuel
>>>>> 42800 Saint martin la plaine
>>>>> Tel : 06.49.42.38.85
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>
>
>