Thanks again Jan,
 
Your comments are very useful. I think what may be happening is that when 
the SELECT 1 query is run, more than just the player name is being passed, 
i.e. the value in its raw stored format of 
[u'Avins, K']

That is why I put the [0] after item:
conn.execute("""SELECT EXISTS(
                SELECT 1 FROM tblHandicaps WHERE Player = %s)""", 
(item['name'][0]))


Perhaps this is not the right option to use here?
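For what it's worth, here is a minimal sketch (plain Python, no database needed, sample value assumed from the raw stored format quoted above) of the difference between the list-wrapped field, the indexed string, and the 1-tuple of parameters that DB-API drivers such as MySQLdb conventionally take:

```python
# Sketch of the indexing under discussion; the sample value is an assumption
# matching the raw stored format [u'Avins, K'] quoted above.
name_field = [u'Avins, K']    # field as stored: a one-element list

player = name_field[0]        # [0] extracts the plain string
print(player)                 # Avins, K

# Python syntax trap: parentheses alone do not make a tuple, so
# (item['name'][0]) is just a bare string, not a 1-tuple of parameters.
bare = (player)               # still a str
params = (player,)            # the trailing comma makes it a tuple

print(type(bare).__name__)    # str
print(type(params).__name__)  # tuple

# So the conventional DB-API call would be, e.g.:
#   conn.execute("SELECT 1 FROM tblHandicaps WHERE Player = %s", (player,))
```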

On Wednesday, December 18, 2013 10:18:10 AM UTC, Jan Wrobel wrote:
>
> I tested the EXISTS query and it also works correctly for me. I was 
> suspecting that maybe conn.fetchone()[0] is returning a string "0" 
> instead of 0, which would break the if, but it isn't the case. 
>
> There is a potential race condition if your spider reuses the Item object, 
> or if the item is modified by subsequent pipelines that are executed 
> after MySQLStorePipeline. Such a race condition could cause item['name'] 
> to change between the execution of EXISTS and INSERT (because 
> _do_upsert runs in a separate thread), but it would likely manifest 
> itself only rarely. Since updates are never executed, I suspect something 
> else is causing it. 
>
> I'm still concerned that the DB lib can match the datetime.now() result to 
> the Updated column without an appropriate '=%s' format directive, but such 
> a problem would manifest itself differently. 
>
> I think a way to debug this would be to add more debug logs to see 
> which 'name' is used in the EXISTS query, what the result of the query is 
> (to confirm that it is indeed an int, not a string), and which 'name' 
> is later inserted by the INSERT query. 
>
> Cheers, 
> Jan 
>
> On Wed, Dec 18, 2013 at 10:00 AM, Shaun Marchant 
> <[email protected]> wrote: 
> > Hi Jan, 
> > 
> > Thanks, there's not supposed to be a %s there as I'm using 
> > datetime.now() for that field. The problem lies before the UPDATE 
> > statement though, as it's never getting triggered. 
> > 
> > The pipeline: 
> > 1. Starts. 
> > 2. Connects to the database and checks whether a record matching Player 
> >    already exists. 
> > 3. If a matching Player record exists, the UPDATE command should be 
> >    executed; if not, a new record is INSERTed. 
> > 
> > It is at point 3 where either there is a slight variance in the 
> > Player/item['name'] match, which returns no records and causes the 
> > INSERT to happen, or my IF statement logic is not correct. 
> > I have manually tested the SELECT EXISTS statement and it works, so 
> > unless Scrapy is altering it somehow I think the problem is with the 
> > IF statement. 
> > 
> >     def process_item(self, item, spider): 
> >         d = self.dbpool.runInteraction(self._do_upsert, item, spider) 
> >         d.addErrback(self._handle_error, item, spider) 
> >         d.addBoth(lambda _: item) 
> > 
> >         return d 
> > 
> >     def _do_upsert(self, conn, item, spider): 
> > 
> >         conn.execute("""SELECT EXISTS( 
> >                 SELECT 1 FROM tblHandicaps WHERE Player = %s)""", 
> > (item['name'][0])) 
> >         database.commit() 
> >         ret = conn.fetchone()[0] 
> > 
> >         if ret: 
> >             conn.execute(""" 
> >                 UPDATE tblHandicaps....... 
> > 
> > 
> > 
> > 
> > 
> > On Wednesday, December 18, 2013 8:45:48 AM UTC, Jan Wrobel wrote: 
> >> 
> >> Isn't '=%s' missing after 'Updated' in the UPDATE query? 
> >> 
> >> Jan 
> >> 
> >> On Wed, Dec 18, 2013 at 9:29 AM, Shaun Marchant 
> >> <[email protected]> wrote: 
> >> > Hi Arvin, 
> >> > 
> >> > Thanks for your suggestion, I have added in the database commit 
> >> > statements but it makes no noticeable difference. The pipeline does 
> >> > write to the database OK; if I delete all records it will add them 
> >> > in again no problem. It's purely the UPDATE routine that is not 
> >> > triggering, and I cannot see why. 
> >> > 
> >> > 
> >> > Cheers. 
> >> > 
> >> > Shaun 
> >> > 
> >> > 
> >> > On Wednesday, December 18, 2013 12:55:54 AM UTC, Arvin Cao wrote: 
> >> >> 
> >> >> Hi, 
> >> >>    maybe you did not commit your database. After an "insert" or 
> >> >> "update" to your MySQL database with Python, you should use 
> >> >> database.commit() to commit the changes. 
> >> >> 
> >> >> 
> >> >> 2013/12/18 Shaun Marchant <[email protected]> 
> >> >>> 
> >> >>> Hi Group, 
> >> >>> 
> >> >>> I've managed to get a pipeline working that is INSERTing my 
> >> >>> scraped data into a MySQL database, but I am unable to update 
> >> >>> records as it appears the update logic is not working correctly... 
> >> >>> any suggestions as to why it's not triggering? I have tested the 
> >> >>> SQL statement etc. directly and it works fine. 
> >> >>> 
> >> >>> Pipeline.py: 
> >> >>> import sys 
> >> >>> import MySQLdb 
> >> >>> import hashlib 
> >> >>> from datetime import datetime 
> >> >>> from twisted.enterprise import adbapi 
> >> >>> from scrapy import log 
> >> >>> from scrapy.exceptions import DropItem 
> >> >>> from scrapy.http import Request 
> >> >>> 
> >> >>> class EmptyItemPipeline(object): 
> >> >>> 
> >> >>>     def process_item(self, item, spider): 
> >> >>>         if item['handicap']: 
> >> >>>             return item 
> >> >>>         else: 
> >> >>>             item['handicap'] = '99' 
> >> >>>             return item 
> >> >>> 
> >> >>> class MySQLStorePipeline(object): 
> >> >>> 
> >> >>>     def __init__(self, dbpool): 
> >> >>>         self.dbpool = dbpool 
> >> >>> 
> >> >>>     @classmethod 
> >> >>>     def from_settings(cls, settings): 
> >> >>>         dbargs = dict( 
> >> >>>             host=settings['localhost'], 
> >> >>>             db=settings['database'], 
> >> >>>             user=settings['username'], 
> >> >>>             passwd=settings['password'], 
> >> >>>             charset='utf8', 
> >> >>>             use_unicode=True, 
> >> >>>         ) 
> >> >>>         dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs) 
> >> >>>         return cls(dbpool) 
> >> >>> 
> >> >>>     def process_item(self, item, spider): 
> >> >>>         # run db query in the thread pool 
> >> >>>         d = self.dbpool.runInteraction(self._do_upsert, item, spider) 
> >> >>>         d.addErrback(self._handle_error, item, spider) 
> >> >>>         # at the end return the item in case of success or failure 
> >> >>>         d.addBoth(lambda _: item) 
> >> >>>         # return the deferred instead of the item. This makes the 
> >> >>>         # engine process the next item (according to the 
> >> >>>         # CONCURRENT_ITEMS setting) after this operation (deferred) 
> >> >>>         # has finished. 
> >> >>>         return d 
> >> >>> 
> >> >>>     def _do_upsert(self, conn, item, spider): 
> >> >>> 
> >> >>>         conn.execute("""SELECT EXISTS( 
> >> >>>                 SELECT 1 FROM tblHandicaps WHERE Player = %s)""", 
> >> >>>             (item['name'][0])) 
> >> >>>         ret = conn.fetchone()[0] 
> >> >>> 
> >> >>>         if ret: 
> >> >>>             conn.execute(""" 
> >> >>>                 UPDATE tblHandicaps 
> >> >>>                 SET Player=%s, Handicap=%s, Exact=%s, Category=%s, Updated 
> >> >>>                 WHERE Player=%s 
> >> >>>             """, (item['name'][0], item['handicap'][0], item['exact'][0], 
> >> >>>                   item['category'][0], datetime.now(), item['name'][0])) 
> >> >>>             spider.log("Item updated in db: %s %r" % (item['name'][0], item)) 
> >> >>>         else: 
> >> >>>             conn.execute("""INSERT INTO tblHandicaps (Player, Handicap, 
> >> >>>                 Exact, Category, Sex, Updated) 
> >> >>>                 VALUES (%s, %s, %s, %s, 'M', %s)""", 
> >> >>>                 (item['name'][0], 
> >> >>>                  item['handicap'][0], 
> >> >>>                  item['exact'][0], 
> >> >>>                  item['category'][0], 
> >> >>>                  datetime.now())) 
> >> >>> 
> >> >>>             spider.log("Item stored in db: %s %r" % (item['name'][0], item)) 
> >> >>> 
> >> >>>     def _handle_error(self, failure, item, spider): 
> >> >>>         """Handle an error that occurred on db interaction.""" 
> >> >>>         # do nothing, just log 
> >> >>>         log.err(failure) 
> >> >>> 
> >> >>> 
> >> >>> 
> >> >>> I don't see a specific error in the console log other than an SQL 
> >> >>> primary key error, as it's trying to INSERT rather than UPDATE. 
> >> >>> 
> >> >>> Cheers 
> >> >>> 
> >> >>> 
> >> >>> Shaun 
> >> >>> 
> >> >> 
> >> >> 
> > 
>

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/groups/opt_out.
