Re: [HACKERS] AutoVacuum starvation from sinval messages
On 8 November 2012 23:58, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Thu, Nov 8, 2012 at 2:50 PM, Hannu Krosing <ha...@2ndquadrant.com> wrote:
>> On 11/08/2012 11:40 PM, Simon Riggs wrote:
>>> On 8 November 2012 20:36, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>>> It does not seem outrageous to me that there would be real-world
>>>> conditions in which invalidations would be sent more than once a
>>>> minute over prolonged periods, so this total starvation seems like
>>>> a bug.
>>>
>>> Yes, it's a bug, but do you really believe the above? In what cases?
>
> We see lots of traffic on the mailing list about people trying to dump
> several hundred thousand tables, or who can only create one database
> every two minutes, or for whom truncating hundreds of tables at a time
> over and over again gets slow, etc. I know little about the internals
> of the invalidation code, but I would think doing that kind of thing
> must generate a lot of them.

OK, so the problem is *any* sinval. I thought you meant one sinval per
object per minute, which seemed much less likely. I agree one sinval per
minute for long periods is actually quite likely.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] AutoVacuum starvation from sinval messages
On Thu, Nov 8, 2012 at 6:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Simon Riggs <si...@2ndquadrant.com> writes:
>> On 8 November 2012 20:36, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>> It does not seem outrageous to me that there would be real-world
>>> conditions in which invalidations would be sent more than once a
>>> minute over prolonged periods, so this total starvation seems like
>>> a bug.
>>
>> Yes, it's a bug, but do you really believe the above? In what cases?
>
> It doesn't take a whole lot of DDL to provoke an sinval overrun, if the
> recipient process is just sitting idle and not servicing the messages.
> I think Jeff's concern is entirely valid.

So, do we need a sinval overrun or just a sinval message to provoke
starvation? The former would be bad, but the latter would be really,
really bad. IIRC the queue has 4K entries, and IIRC a single DDL
operation might provoke a couple of sinvals, but I'm thinking that
somebody would probably have to be creating 1024 temp tables a minute to
overrun the queue, which is very possible but not necessarily common.
OTOH, creating 1 temp table a minute would hit a much broader swath of
users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] AutoVacuum starvation from sinval messages
On 9 November 2012 13:42, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Nov 8, 2012 at 6:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Simon Riggs <si...@2ndquadrant.com> writes:
>>> On 8 November 2012 20:36, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>>> It does not seem outrageous to me that there would be real-world
>>>> conditions in which invalidations would be sent more than once a
>>>> minute over prolonged periods, so this total starvation seems like
>>>> a bug.
>>>
>>> Yes, it's a bug, but do you really believe the above? In what cases?
>>
>> It doesn't take a whole lot of DDL to provoke an sinval overrun, if
>> the recipient process is just sitting idle and not servicing the
>> messages. I think Jeff's concern is entirely valid.
>
> So, do we need a sinval overrun or just a sinval message to provoke
> starvation? The former would be bad, but the latter would be really,
> really bad. IIRC the queue has 4K entries, and IIRC a single DDL
> operation might provoke a couple of sinvals, but I'm thinking that
> somebody would probably have to be creating 1024 temp tables a minute
> to overrun the queue, which is very possible but not necessarily
> common. OTOH, creating 1 temp table a minute would hit a much broader
> swath of users.

The point is moot because latches don't work that way anymore.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] AutoVacuum starvation from sinval messages
On Fri, Nov 9, 2012 at 9:02 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> So, do we need a sinval overrun or just a sinval message to provoke
>> starvation? The former would be bad, but the latter would be really,
>> really bad. IIRC the queue has 4K entries, and IIRC a single DDL
>> operation might provoke a couple of sinvals, but I'm thinking that
>> somebody would probably have to be creating 1024 temp tables a minute
>> to overrun the queue, which is very possible but not necessarily
>> common. OTOH, creating 1 temp table a minute would hit a much broader
>> swath of users.
>
> The point is moot because latches don't work that way anymore.

One of us is confused, because IIUC Tom just fixed this this morning,
and I'm trying to figure out how many users will be affected by it, and
how seriously. Like, do we need an immediate minor release?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] AutoVacuum starvation from sinval messages
On 9 November 2012 14:16, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Nov 9, 2012 at 9:02 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> So, do we need a sinval overrun or just a sinval message to provoke
>>> starvation? The former would be bad, but the latter would be really,
>>> really bad. IIRC the queue has 4K entries, and IIRC a single DDL
>>> operation might provoke a couple of sinvals, but I'm thinking that
>>> somebody would probably have to be creating 1024 temp tables a minute
>>> to overrun the queue, which is very possible but not necessarily
>>> common. OTOH, creating 1 temp table a minute would hit a much broader
>>> swath of users.
>>
>> The point is moot because latches don't work that way anymore.
>
> One of us is confused, because IIUC Tom just fixed this this morning,
> and I'm trying to figure out how many users will be affected by it,
> and how seriously. Like, do we need an immediate minor release?

You asked what provokes starvation, and the answer is nothing anymore,
since Tom's commit. No confusion here...

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] AutoVacuum starvation from sinval messages
Simon Riggs wrote:
> Robert Haas <robertmh...@gmail.com> wrote:
>> One of us is confused, because IIUC Tom just fixed this this morning,
>> and I'm trying to figure out how many users will be affected by it,
>> and how seriously. Like, do we need an immediate minor release?
>
> You asked what provokes starvation, and the answer is nothing anymore,
> since Tom's commit. No confusion here...

So for all those end users running production with a build compiled from
HEAD after Tom's commit this morning, there is no issue. I'm wondering
about those using something else in production. What does it take for
their autovacuum to be stalled?

-Kevin
Re: [HACKERS] AutoVacuum starvation from sinval messages
Robert Haas <robertmh...@gmail.com> writes:
> So, do we need a sinval overrun or just a sinval message to provoke
> starvation? The former would be bad, but the latter would be really,
> really bad.

The former. Actually, a signal will be sent as soon as a backend is
determined to be unreasonably far behind, which I think is when the
message queue is half full. In the AV launcher case, the queue probably
never actually overflows; but the launcher has to wake up to process
messages every so often, and that wakeup cycle is breaking the timeout
management in WaitLatch.

> IIRC the queue has 4K entries, and IIRC a single DDL operation might
> provoke a couple of sinvals, but I'm thinking that somebody would
> probably have to be creating 1024 temp tables a minute to overrun the
> queue, which is very possible but not necessarily common.

Well, one DDL typically generates multiple messages --- one for each
catalog row added/modified/removed, roughly speaking. When I run the
constant create/drop example Jeff posted, I see the AV launcher getting
a catchup signal every few seconds. I didn't try to determine exactly
how many create/drop cycles that was, but I'm pretty sure it's a lot
less than 1000.

			regards, tom lane
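[To illustrate the mechanism Tom describes, a catchup interrupt sent to
a backend that has fallen roughly half a queue behind, here is a loose
sketch. It is not the actual sinval code; entries_behind() and the
reader struct are hypothetical stand-ins for the real bookkeeping, and
only the constant and signal names are taken from the source.]

    /*
     * Illustration only, not the real sinvaladt.c logic: once a reader lags
     * the writers by about half of the 4K-entry queue, signal it so it
     * starts absorbing messages.  entries_behind() is a hypothetical helper.
     */
    if (entries_behind(reader) >= MAXNUMMESSAGES / 2)
        SendProcSignal(reader->pid, PROCSIG_CATCHUP_INTERRUPT,
                       reader->backendId);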
Re: [HACKERS] AutoVacuum starvation from sinval messages
Tom Lane escribió:
> Robert Haas <robertmh...@gmail.com> writes:
>> IIRC the queue has 4K entries, and IIRC a single DDL operation might
>> provoke a couple of sinvals, but I'm thinking that somebody would
>> probably have to be creating 1024 temp tables a minute to overrun the
>> queue, which is very possible but not necessarily common.
>
> Well, one DDL typically generates multiple messages --- one for each
> catalog row added/modified/removed, roughly speaking. When I run the
> constant create/drop example Jeff posted, I see the AV launcher getting
> a catchup signal every few seconds. I didn't try to determine exactly
> how many create/drop cycles that was, but I'm pretty sure it's a lot
> less than 1000.

Just creating the sequence for the serial column means 16 pg_attribute
tuples. There are also two pg_class entries, one more pg_attribute, two
pg_type entries, a bunch of pg_depend entries ... I doubt it's less than
30 catalog tuples, all things considered. Double that for the drop. So
for a 4K-entry queue that needs to get 50% full, that's only ~35 temp
table creations like that.

I hadn't realized sequences used so many pg_attribute entries.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
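[Spelling out the arithmetic behind Alvaro's figure; the per-object
tuple counts are his rough estimates, not exact numbers:]

    create + drop      =  ~30 + ~30          =  ~60 sinval messages per cycle
    catchup threshold  =  4096 entries * 50%  =  2048 messages
    2048 / 60                                 =  ~34 create/drop cycles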
Re: [HACKERS] AutoVacuum starvation from sinval messages
On Fri, Nov 9, 2012 at 10:08 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Tom Lane escribió:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> IIRC the queue has 4K entries, and IIRC a single DDL operation might
>>> provoke a couple of sinvals, but I'm thinking that somebody would
>>> probably have to be creating 1024 temp tables a minute to overrun the
>>> queue, which is very possible but not necessarily common.
>>
>> Well, one DDL typically generates multiple messages --- one for each
>> catalog row added/modified/removed, roughly speaking. When I run the
>> constant create/drop example Jeff posted, I see the AV launcher
>> getting a catchup signal every few seconds. I didn't try to determine
>> exactly how many create/drop cycles that was, but I'm pretty sure it's
>> a lot less than 1000.
>
> Just creating the sequence for the serial column means 16 pg_attribute
> tuples. There are also two pg_class entries, one more pg_attribute, two
> pg_type entries, a bunch of pg_depend entries ... I doubt it's less
> than 30 catalog tuples, all things considered. Double that for the
> drop. So for a 4K-entry queue that needs to get 50% full, that's only
> ~35 temp table creations like that.
>
> I hadn't realized sequences used so many pg_attribute entries.

Hmm. So, are we going to force a minor release for this, or do we think
it's not serious enough to warrant that? I'm not expressing an opinion
either way, just asking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] AutoVacuum starvation from sinval messages
While doing a rather contrived test of something else, using:

  perl -le 'print "create table foo (x serial); drop table foo;" foreach (1..1e6)' | psql

I noticed starvation of the autovacuum launcher process and bloating of
the catalogs. Basically, the WaitLatch in autovacuum.c line 602 (in
head) never returns.

This was introduced by a180776f7a1c4554f214b, "Teach unix_latch.c to use
poll() where available".

When the poll() gets EINTR from the SIGUSR1 sent for the invalidation,
it restarts with the full timeout (60 seconds in this case), with no
accounting for the time already waited. It does not seem outrageous to
me that there would be real-world conditions in which invalidations
would be sent more than once a minute over prolonged periods, so this
total starvation seems like a bug.

Should gettimeofday be called before and after the poll() and then the
difference deducted from the timeout?

Cheers,

Jeff
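[To make the failure mode concrete, here is a stripped-down sketch of
the problematic wait-loop pattern. It is illustrative only, not the
actual unix_latch.c code; the function name is made up.]

    #include <errno.h>
    #include <poll.h>

    /*
     * Buggy pattern: after EINTR the loop retries poll() with the original,
     * full timeout, so the time already spent waiting is forgotten.  Any
     * signal arriving at least once per timeout period therefore keeps the
     * caller from ever seeing a timeout.
     */
    static int
    buggy_wait(struct pollfd *fds, nfds_t nfds, int timeout_ms)
    {
        for (;;)
        {
            int rc = poll(fds, nfds, timeout_ms);  /* always the full timeout */

            if (rc < 0 && errno == EINTR)
                continue;       /* restart the wait from scratch */
            return rc;          /* rc == 0 (timeout) is never reached while
                                 * signals keep arriving */
        }
    }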
Re: [HACKERS] AutoVacuum starvation from sinval messages
Jeff Janes <jeff.ja...@gmail.com> writes:
> Basically, the WaitLatch in autovacuum.c line 602 (in head) never
> returns. This was introduced by a180776f7a1c4554f214b, "Teach
> unix_latch.c to use poll() where available". When the poll() gets EINTR
> from the SIGUSR1 sent for the invalidation, it restarts with the full
> timeout (60 seconds in this case), with no accounting for the time
> already waited.

Hmm. That seems like a problem, all right, but I don't believe the claim
that it applies only to the poll() code path. On many platforms the
select() path would have the same issue.

It's a bit annoying to have to incur an extra gettimeofday call per
WaitLatch operation in case this happens, but there doesn't seem to be a
lot of choice. I guess we can arrange to incur the extra call only when
a timeout is requested --- but IIRC that's going to be the majority of
usages anyway ...

			regards, tom lane
Re: [HACKERS] AutoVacuum starvation from sinval messages
Jeff Janes <jeff.ja...@gmail.com> writes:
> On Thu, Nov 8, 2012 at 12:36 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> Should gettimeofday be called before and after the poll() and then the
>> difference deducted from the timeout?

> Something like this?

Meh. Not like that, because

(1) we shouldn't add overhead when no timeout is requested.

(2) it'd be better to do only one gettimeofday at the start and then one
in the EINTR path, not one per loop; this saves syscalls and also avoids
slippage of the timeout time.

(3) the select() path needs a similar fix.

But I think you have a good idea to use the INSTR_TIME macros instead of
custom code, even though this is Unix-only so there's not more than one
underlying implementation. Will fix this up and commit.

			regards, tom lane
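[A rough sketch of the approach Tom outlines: one clock reading up front
(only needed because a timeout was requested) and one more only in the
EINTR path, with the remaining timeout recomputed from the start time so
it cannot slip. Plain gettimeofday() stands in for PostgreSQL's
INSTR_TIME macros to keep the sketch self-contained; this illustrates
the idea and is not the committed unix_latch.c patch.]

    #include <errno.h>
    #include <poll.h>
    #include <sys/time.h>

    /* Returns poll()'s result: >0 event, 0 timeout, <0 error other than EINTR. */
    static int
    wait_with_timeout(struct pollfd *fds, nfds_t nfds, int timeout_ms)
    {
        struct timeval start_time, cur_time;
        int            remaining_ms = timeout_ms;

        gettimeofday(&start_time, NULL);    /* only because a timeout was requested */

        for (;;)
        {
            int rc = poll(fds, nfds, remaining_ms);

            if (rc < 0 && errno == EINTR)
            {
                long elapsed_ms;

                /* the extra clock reading happens only when interrupted */
                gettimeofday(&cur_time, NULL);
                elapsed_ms = (cur_time.tv_sec - start_time.tv_sec) * 1000L +
                             (cur_time.tv_usec - start_time.tv_usec) / 1000L;

                /* recompute from the start time, so repeated EINTRs cannot slip */
                remaining_ms = timeout_ms - (int) elapsed_ms;
                if (remaining_ms <= 0)
                    return 0;   /* the full timeout has elapsed */
                continue;
            }
            return rc;
        }
    }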
Re: [HACKERS] AutoVacuum starvation from sinval messages
I wrote:
> Jeff Janes <jeff.ja...@gmail.com> writes:
>> When the poll() gets EINTR from the SIGUSR1 sent for the invalidation,
>> it restarts with the full timeout (60 seconds in this case), with no
>> accounting for the time already waited.
>
> Hmm. That seems like a problem, all right, but I don't believe the
> claim that it applies only to the poll() code path. On many platforms
> the select() path would have the same issue.

BTW, doesn't win32_latch.c have the identical problem? I don't see it
updating the timeout after a signal, either.

			regards, tom lane
Re: [HACKERS] AutoVacuum starvation from sinval messages
On 8 November 2012 20:36, Jeff Janes <jeff.ja...@gmail.com> wrote:
> It does not seem outrageous to me that there would be real-world
> conditions in which invalidations would be sent more than once a minute
> over prolonged periods, so this total starvation seems like a bug.

Yes, it's a bug, but do you really believe the above? In what cases?

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] AutoVacuum starvation from sinval messages
On 2012-11-08 22:40:43 +0000, Simon Riggs wrote:
> On 8 November 2012 20:36, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> It does not seem outrageous to me that there would be real-world
>> conditions in which invalidations would be sent more than once a
>> minute over prolonged periods, so this total starvation seems like a
>> bug.
>
> Yes, it's a bug, but do you really believe the above? In what cases?

Temporary tables might be a good contender.

Greetings,

Andres Freund
Re: [HACKERS] AutoVacuum starvation from sinval messages
On 11/08/2012 11:40 PM, Simon Riggs wrote:
> On 8 November 2012 20:36, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> It does not seem outrageous to me that there would be real-world
>> conditions in which invalidations would be sent more than once a
>> minute over prolonged periods, so this total starvation seems like a
>> bug.
>
> Yes, it's a bug, but do you really believe the above? In what cases?

Could heavy use of temporary tables cause this?

Hannu
Re: [HACKERS] AutoVacuum starvation from sinval messages
Simon Riggs <si...@2ndquadrant.com> writes:
> On 8 November 2012 20:36, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> It does not seem outrageous to me that there would be real-world
>> conditions in which invalidations would be sent more than once a
>> minute over prolonged periods, so this total starvation seems like a
>> bug.
>
> Yes, it's a bug, but do you really believe the above? In what cases?

It doesn't take a whole lot of DDL to provoke an sinval overrun, if the
recipient process is just sitting idle and not servicing the messages.
I think Jeff's concern is entirely valid.

			regards, tom lane
Re: [HACKERS] AutoVacuum starvation from sinval messages
On Thu, Nov 8, 2012 at 2:50 PM, Hannu Krosing <ha...@2ndquadrant.com> wrote:
> On 11/08/2012 11:40 PM, Simon Riggs wrote:
>> On 8 November 2012 20:36, Jeff Janes <jeff.ja...@gmail.com> wrote:
>>> It does not seem outrageous to me that there would be real-world
>>> conditions in which invalidations would be sent more than once a
>>> minute over prolonged periods, so this total starvation seems like a
>>> bug.
>>
>> Yes, it's a bug, but do you really believe the above? In what cases?

We see lots of traffic on the mailing list about people trying to dump
several hundred thousand tables, or who can only create one database
every two minutes, or for whom truncating hundreds of tables at a time
over and over again gets slow, etc. I know little about the internals of
the invalidation code, but I would think doing that kind of thing must
generate a lot of them.

> Could heavy use of temporary tables cause this?

It looks like they do. I'm not sure what is being invalidated in those
cases, but something seems to be.

Cheers,

Jeff
Re: [HACKERS] AutoVacuum starvation from sinval messages
I wrote:
> Hmm. That seems like a problem, all right, but I don't believe the
> claim that it applies only to the poll() code path. On many platforms
> the select() path would have the same issue.

> BTW, doesn't win32_latch.c have the identical problem? I don't see it
> updating the timeout after a signal, either.

I've committed a patch for this, but the win32_latch version is entirely
innocent of any testing. The buildfarm will probably show any mistake
there, but if someone would like to try Jeff's test case on Windows just
to make sure the autovac launcher wakes up on time, it wouldn't hurt.

			regards, tom lane
Re: [HACKERS] autovacuum starvation
Jim Nasby wrote:
> On May 2, 2007, at 5:39 PM, Alvaro Herrera wrote:
>> The recently discovered autovacuum bug made me notice something that
>> is possibly critical. The current autovacuum code makes an effort not
>> to leave workers in a starting state for too long, lest there be a
>> failure to timely tend all databases needing vacuum.
>
> Isn't there some way to get the postmaster to signal the launcher?
> Perhaps stick an error code in shared memory and send it a signal?

We could have the postmaster signal the launcher, but the signal cannot
contain much useful info, because the postmaster generally does not want
to write in shared memory. Perhaps we could have the postmaster send a
SIGUSR2 signal, which would mean "couldn't start the worker", without
any other info. Anything else the launcher needs can be deduced from
shmem state anyway.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] autovacuum starvation
Alvaro Herrera [EMAIL PROTECTED] writes:
> We could have the postmaster signal the launcher, but the signal cannot
> contain much useful info, because the postmaster generally does not
> want to write in shared memory.

The postmaster already touches/writes in shared memory in pmsignal.c.
The trick here is that whatever it does should be sufficiently
constrained that arbitrarily-bad corruption of shared memory can't crash
or freeze the postmaster. If you can meet that restriction, feel free to
introduce some more signaling knowledge.

			regards, tom lane
Re: [HACKERS] autovacuum starvation
On May 2, 2007, at 5:39 PM, Alvaro Herrera wrote:
> The recently discovered autovacuum bug made me notice something that is
> possibly critical. The current autovacuum code makes an effort not to
> leave workers in a starting state for too long, lest there be a failure
> to timely tend all databases needing vacuum.
>
> This is how the launching of workers works:
>
> 1) the launcher puts a pointer to a WorkerInfo entry in shared memory,
>    called the "starting worker" pointer
> 2) the launcher sends a signal to the postmaster
> 3) the postmaster forks a worker
> 4) the new worker checks the starting worker pointer
> 5) the new worker resets the starting worker pointer
> 6) the new worker connects to the given database and vacuums it
>
> The problem is this: I originally added some code in the autovacuum
> launcher to check that a worker does not take too long to start; "too
> long" being autovacuum_naptime seconds. If this happens, the launcher
> resets the starting worker pointer, which means that the newly starting
> worker will not see anything that needs to be done and will exit
> quickly.
>
> The problem with this is that on a high-load machine, for example
> lionfish during buildfarm runs, this would cause autovacuum starvation
> for the period in which the high load is sustained. This could prove
> dangerous.
>
> The problem is that things like fork() failure cannot be communicated
> back to the launcher. So when the postmaster tries to start a process
> and it fails for some reason (failure to fork, or out of memory), we
> need a way to re-initiate the worker that failed. The current code
> resets the starting worker pointer and leaves the slot free for another
> worker, maybe in another database, to start.
>
> I recently added code to resend the postmaster signal when the launcher
> sees the starting worker pointer still valid -- step 2 above. I think
> this is fine, but
>
> 1) we should remove the logic that resets the starting worker pointer.
>    It is not needed, because database-local failures will be handled by
>    subsequent checks
>
> 2) we should keep the logic to re-signal the postmaster, but we should
>    make an effort to avoid sending the signal too frequently
>
> Opinions? If I haven't stated the problem clearly, please let me know
> and I'll try to rephrase.

Isn't there some way to get the postmaster to signal the launcher?
Perhaps stick an error code in shared memory and send it a signal?

--
Jim Nasby                   [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
[HACKERS] autovacuum starvation
Hi,

The recently discovered autovacuum bug made me notice something that is
possibly critical. The current autovacuum code makes an effort not to
leave workers in a starting state for too long, lest there be a failure
to timely tend all databases needing vacuum.

This is how the launching of workers works:

1) the launcher puts a pointer to a WorkerInfo entry in shared memory,
   called the "starting worker" pointer
2) the launcher sends a signal to the postmaster
3) the postmaster forks a worker
4) the new worker checks the starting worker pointer
5) the new worker resets the starting worker pointer
6) the new worker connects to the given database and vacuums it

The problem is this: I originally added some code in the autovacuum
launcher to check that a worker does not take too long to start; "too
long" being autovacuum_naptime seconds. If this happens, the launcher
resets the starting worker pointer, which means that the newly starting
worker will not see anything that needs to be done and will exit
quickly.

The problem with this is that on a high-load machine, for example
lionfish during buildfarm runs, this would cause autovacuum starvation
for the period in which the high load is sustained. This could prove
dangerous.

The problem is that things like fork() failure cannot be communicated
back to the launcher. So when the postmaster tries to start a process
and it fails for some reason (failure to fork, or out of memory), we
need a way to re-initiate the worker that failed. The current code
resets the starting worker pointer and leaves the slot free for another
worker, maybe in another database, to start.

I recently added code to resend the postmaster signal when the launcher
sees the starting worker pointer still valid -- step 2 above. I think
this is fine, but

1) we should remove the logic that resets the starting worker pointer.
   It is not needed, because database-local failures will be handled by
   subsequent checks

2) we should keep the logic to re-signal the postmaster, but we should
   make an effort to avoid sending the signal too frequently

Opinions? If I haven't stated the problem clearly, please let me know
and I'll try to rephrase.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
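[For readers unfamiliar with the code, a loose sketch of the handshake
in steps 1-6 above. The identifiers are modelled on the autovacuum code,
but this is illustrative only; all locking, error handling, and detail
is omitted.]

    /* Launcher side */
    AutoVacuumShmem->av_startingWorker = worker;           /* 1) publish the WorkerInfo */
    SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);   /* 2) ask the postmaster to fork */

    /* The postmaster forks a new worker process in response (step 3). */

    /* Worker side, early in its startup */
    worker = AutoVacuumShmem->av_startingWorker;           /* 4) pick up the request */
    AutoVacuumShmem->av_startingWorker = NULL;             /* 5) reset the starting worker pointer */
    /* 6) connect to the database named in the WorkerInfo and vacuum it */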