Re: Increasing replication factor and repair doesn't seem to work
After thinking about it more, I have no idea how that worked at all. I must not have cleared out the working directory or something. Regardless, I did something weird with my initial joining of the cluster and then wasn't using repair -full. Thank y'all very much for the info.
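For reference, the decommission-and-re-add that Luke describes is usually done along these lines. This is a hedged sketch, not a prescribed procedure: the paths assume a package install, and the service commands vary by distribution.

```shell
# On the node being removed (10.128.0.20): stream its data away and leave the ring.
nodetool decommission

# Before re-adding it, stop Cassandra and clear ALL working directories so
# the node bootstraps from scratch rather than reusing stale state:
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data \
            /var/lib/cassandra/commitlog \
            /var/lib/cassandra/saved_caches

# Make sure cassandra.yaml's seeds point at an existing node (not this node
# itself), then start it and watch the bootstrap streams:
sudo service cassandra start
nodetool netstats
```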
Re: Increasing replication factor and repair doesn't seem to work
So I figured out the main cause of the problem: the seed node was set to itself. That's what got it into a weird state. The second part was that I didn't know the default repair is incremental; I was accidentally looking at the documentation for the wrong version. After running a repair -full, the 3 other nodes appear to be synced correctly, as they have identical loads. Strangely, the problem node, 10.128.0.20, now has 10 GB of load (the others have 6 GB). Since I now know I started it off in a very weird state, I'm just going to decommission it and add it back in from scratch. When I added it, all working folders were cleared.

I feel Cassandra should throw an error and fail to bootstrap / join if the seed node is set to itself?
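The misconfiguration Luke found can be sketched as the relevant cassandra.yaml settings on a new node joining an existing cluster. The addresses are the ones from this thread; treat the exact values as illustrative:

```yaml
# Sketch of cassandra.yaml on a NEW node joining an existing cluster.
# The seeds list must name an already-established node: if a bootstrapping
# node lists only itself, it comes up as its own one-node cluster instead
# of streaming data from the ring.
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.128.0.3"    # an existing node, NOT this node's own address
listen_address: 10.128.0.20    # the new node being added
auto_bootstrap: true           # the default; leave it on for a fresh node
```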
Re: Increasing replication factor and repair doesn't seem to work
Hi Luke, I've encountered a similar problem before. Could you please advise on the following?

1) When you added 10.128.0.20, what were the seeds defined in its cassandra.yaml?

2) When you added 10.128.0.20, were its data and cache directories empty?
   - /var/lib/cassandra/data
   - /var/lib/cassandra/saved_caches

3) If you run a compaction on 10.128.0.3, what size is shown in the "Load" column of "nodetool status"?

4) When you did the full repair, did you use "nodetool repair" or "nodetool repair -full"? I'm asking because incremental repair is the default for Cassandra 2.2 and later.

Regards,
Mike Yeap
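The four checks Mike lists translate roughly into the commands below. A hedged sketch: the config path assumes a package install, and "my_keyspace" is a placeholder, not a name from this thread.

```shell
# 1) What seeds did the new node use? (path assumes a package install)
grep -A 4 'seed_provider' /etc/cassandra/cassandra.yaml

# 2) Were the data and cache directories empty before the node joined?
ls /var/lib/cassandra/data /var/lib/cassandra/saved_caches

# 3) Compact on 10.128.0.3, then compare the Load column across nodes.
#    "my_keyspace" is a placeholder for your real keyspace.
nodetool compact my_keyspace
nodetool status my_keyspace

# 4) On Cassandra 2.2+, plain "nodetool repair" is incremental;
#    force a full repair explicitly:
nodetool repair -full
```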
Re: Increasing replication factor and repair doesn't seem to work
Hi Luke,

I've never found nodetool status' load to be useful beyond a general indicator.

You should expect some small skew, as this will depend on your current compaction status, tombstones, etc. IIRC repair will not provide consistency of intermediate states, nor will it remove tombstones; it only guarantees consistency in the final state. This means that, in the case of dropped hints or mutations, you will see differences in intermediate states, and therefore in storage footprint, even on fully repaired nodes. This includes intermediate UPDATE operations as well.

Your one node with sub-1GB load sticks out like a sore thumb, though. Where did you originate the nodetool repair from? Remember that repair will only ensure consistency for ranges held by the node you're running it on. While I am not sure whether missing ranges are included in this, if you ran nodetool repair only on a machine with partial ownership, you will need to complete repairs across the ring before data returns to full consistency.

I would query some older data using consistency = ONE on the affected machine to determine whether you are actually missing data. There are a few outstanding bugs in the 2.1.x and older release families that may result in tombstone creation even without deletes, for example CASSANDRA-10547, which impacts updates on collections in pre-2.1.13 Cassandra.

You can also try examining the output of nodetool ring, which will give you a breakdown of tokens and their associations within your cluster.

--Bryan
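Bryan's suggestion to spot-check older data at consistency ONE can be run from cqlsh. A hedged sketch: the keyspace, table, and key names are placeholders, and note that even at CL=ONE the coordinator is not strictly guaranteed to serve the read from its own local replica.

```shell
# Connect directly to the suspect node and read at consistency ONE, so the
# read can be answered without repair from the other replicas.
# my_keyspace / my_table / some_old_key are placeholders for real names.
cqlsh 10.128.0.20 -e "
  CONSISTENCY ONE;
  SELECT * FROM my_keyspace.my_table WHERE id = 'some_old_key';"
```

If old rows that exist on the heavier nodes come back empty here, the node really is missing data rather than just reporting a smaller load.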
Re: Increasing replication factor and repair doesn't seem to work
Not necessarily, considering RF is 2, so both nodes should have all partitions. Luke, are you sure the repair is succeeding? You don't have other keyspaces / duplicate data / extra data in your cassandra data directory? Also, you could try querying on the node with less data to confirm whether it has the same dataset.
Re: Increasing replication factor and repair doesn't seem to work
For the other DC, it can be acceptable because partitions reside on one node, so if you have a large partition, it may skew things a bit.
Re: Increasing replication factor and repair doesn't seem to work
So I guess the problem may have been with the initial addition of the 10.128.0.20 node, because when I added it, it never synced data? It was at around 50 MB when it first came up and transitioned to "UN". After it was in, I did the 1->2 replication change and tried repair, but that didn't fix it. From what I can tell, all the data on it is stuff that has been written since it came up. We never delete data, ever, so we should have zero tombstones.

If I am not mistaken, only two of my nodes actually have all the data, 10.128.0.3 and 10.142.0.14, since they agree on the data amount. 10.142.0.13 is almost a GB lower, and then of course there's 10.128.0.20, which is missing over 5 GB of data. I tried running nodetool repair -local on both DCs and it didn't fix either one.

Am I running into a bug of some kind?
Re: Increasing replication factor and repair doesn't seem to work
Hi Luke,

You mentioned that the replication factor was increased from 1 to 2. In that case, was the node with IP 10.128.0.20 carrying around 3 GB of data earlier? You can run nodetool repair with the -local option to repair only the local datacenter, gce-us-central1.

Also, if a lot of data was deleted while the node was down, it may be holding a lot of tombstones, which do not need to be replicated to the other node. To verify this, you can issue a select count(*) query on the column families (with the amount of data you have it should not be an issue) with tracing on and with consistency local_all, connecting to either 10.128.0.3 or 10.128.0.20, and store the output in a file. That will give you a fair idea of how many deleted cells the nodes have.

I tried searching for a reference on whether tombstones are moved around during repair, but I didn't find evidence of it. However, I see no reason why they would be, because if the node didn't have the data, then streaming tombstones does not make a lot of sense.
Regards,
Bhuvan

On Tue, May 24, 2016 at 11:06 PM, Luke Jolly wrote:

> Here's my setup:
>
> Datacenter: gce-us-central1
> ===========================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.128.0.3   6.4 GB     256     100.0%            3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
> UN  10.128.0.20  943.08 MB  256     100.0%            958348cb-8205-4630-8b96-0951bf33f3d3  default
>
> Datacenter: gce-us-east1
> ========================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.142.0.14  6.4 GB     256     100.0%            c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
> UN  10.142.0.13  5.55 GB    256     100.0%            d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>
> And my replication settings are:
>
> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2', 'gce-us-central1': '2', 'gce-us-east1': '2'}
>
> As you can see, 10.128.0.20 in the gce-us-central1 DC only has a load of 943 MB even though it's supposed to own 100% and should have 6.4 GB. Also, 10.142.0.13 seems not to have everything either, as it only has a load of 5.55 GB.
>
> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves wrote:
>
>> Do you have 1 node in each DC or 2? If you're saying you have 1 node in each DC then an RF of 2 doesn't make sense. Can you clarify what your setup is?
>>
>> On 23 May 2016 at 19:31, Luke Jolly wrote:
>>
>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and gce-us-east1. I increased the replication factor of gce-us-central1 from 1 to 2. Then I ran 'nodetool repair -dc gce-us-central1'. The "Owns" for the node switched to 100% as it should, but the Load showed that it didn't actually sync the data. I then ran a full 'nodetool repair' and it still didn't fix it. This scares me, as I thought 'nodetool repair' was a way to assure consistency and that all the nodes were synced, but it doesn't seem to be. Outside of that command, I have no idea how I would assure all the data was synced or how to get the data correctly synced without decommissioning the node and re-adding it.
>>
>> --
>> Kurt Greaves
>> k...@instaclustr.com
>> www.instaclustr.com
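Bhuvan's two suggestions above might look roughly like the following as concrete commands (a sketch only: the keyspace and table names are placeholders, and note that cqlsh's CONSISTENCY command takes a level such as ALL rather than local_all):

```shell
# Repair only the local datacenter (run from a node in gce-us-central1):
nodetool repair -local my_keyspace

# Count rows with tracing on and capture the trace output, which reports
# how many tombstone cells were scanned during the read:
cqlsh 10.128.0.3 -e "CONSISTENCY ALL; TRACING ON; SELECT count(*) FROM my_keyspace.my_table;" > count_trace.txt
```

Grepping the saved trace for "tombstone" then gives a per-query view of how many deleted cells each replica still carries.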
Re: Increasing replication factor and repair doesn't seem to work
Here's my setup:

Datacenter: gce-us-central1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.128.0.3   6.4 GB     256     100.0%            3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
UN  10.128.0.20  943.08 MB  256     100.0%            958348cb-8205-4630-8b96-0951bf33f3d3  default

Datacenter: gce-us-east1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.142.0.14  6.4 GB     256     100.0%            c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
UN  10.142.0.13  5.55 GB    256     100.0%            d0d9c30e-1506-4b95-be64-3dd4d78f0583  default

And my replication settings are:

{'class': 'NetworkTopologyStrategy', 'aws-us-west': '2', 'gce-us-central1': '2', 'gce-us-east1': '2'}

As you can see, 10.128.0.20 in the gce-us-central1 DC only has a load of 943 MB even though it's supposed to own 100% and should have 6.4 GB. Also, 10.142.0.13 seems not to have everything either, as it only has a load of 5.55 GB.

On Mon, May 23, 2016 at 7:28 PM, kurt Greaves wrote:

> Do you have 1 node in each DC or 2? If you're saying you have 1 node in each DC then an RF of 2 doesn't make sense. Can you clarify what your setup is?
>
> On 23 May 2016 at 19:31, Luke Jolly wrote:
>
>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and gce-us-east1. I increased the replication factor of gce-us-central1 from 1 to 2. Then I ran 'nodetool repair -dc gce-us-central1'. The "Owns" for the node switched to 100% as it should, but the Load showed that it didn't actually sync the data. I then ran a full 'nodetool repair' and it still didn't fix it. This scares me, as I thought 'nodetool repair' was a way to assure consistency and that all the nodes were synced, but it doesn't seem to be. Outside of that command, I have no idea how I would assure all the data was synced or how to get the data correctly synced without decommissioning the node and re-adding it.
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
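The replication settings Luke quotes would have been produced by an ALTER KEYSPACE along these lines (a sketch; the keyspace name is a placeholder). Changing the replication map only updates metadata, which is why "Owns" changes immediately while "Load" does not:

```shell
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'aws-us-west': '2',
  'gce-us-central1': '2',
  'gce-us-east1': '2'};"
```

The existing data for the newly assigned replicas still has to be streamed afterwards, via repair.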
Re: Increasing replication factor and repair doesn't seem to work
Do you have 1 node in each DC or 2? If you're saying you have 1 node in each DC then an RF of 2 doesn't make sense. Can you clarify what your setup is?

On 23 May 2016 at 19:31, Luke Jolly wrote:

> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and gce-us-east1. I increased the replication factor of gce-us-central1 from 1 to 2. Then I ran 'nodetool repair -dc gce-us-central1'. The "Owns" for the node switched to 100% as it should, but the Load showed that it didn't actually sync the data. I then ran a full 'nodetool repair' and it still didn't fix it. This scares me, as I thought 'nodetool repair' was a way to assure consistency and that all the nodes were synced, but it doesn't seem to be. Outside of that command, I have no idea how I would assure all the data was synced or how to get the data correctly synced without decommissioning the node and re-adding it.

--
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com
Increasing replication factor and repair doesn't seem to work
I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and gce-us-east1. I increased the replication factor of gce-us-central1 from 1 to 2. Then I ran 'nodetool repair -dc gce-us-central1'. The "Owns" for the node switched to 100% as it should, but the Load showed that it didn't actually sync the data. I then ran a full 'nodetool repair' and it still didn't fix it. This scares me, as I thought 'nodetool repair' was a way to assure consistency and that all the nodes were synced, but it doesn't seem to be. Outside of that command, I have no idea how I would assure all the data was synced or how to get the data correctly synced without decommissioning the node and re-adding it.
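For what it's worth, since Cassandra 2.2 plain 'nodetool repair' runs an incremental repair by default, so after an RF increase the existing data has to be streamed with an explicitly full repair. A sketch of what that might look like here (the keyspace name is a placeholder; run it on each node in the affected DC):

```shell
# Full (non-incremental) repair restricted to the datacenter whose RF changed:
nodetool repair -full -dc gce-us-central1 my_keyspace

# Then confirm the Load column has grown to match the other replicas:
nodetool status my_keyspace
```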