Thanks. The list datatype has been in use for this table for a few years now and we never had issues. We ran into this problem only recently, when we did the keyspace migration.
Thanks,
Murali

On Thu, Oct 24, 2019 at 11:36 AM ZAIDI, ASAD <az1...@att.com> wrote:

> Have you chosen the correct datatype to begin with, if you don't want
> duplicates?
>
> Generally speaking:
>
> A set and a list both represent multiple values but do so differently.
>
> A set does not preserve insertion order; its values are sorted in
> ascending order. No duplicates are allowed.
>
> A list preserves ordering: you append or prepend values to the list.
> A list allows duplicates.
>
> *From:* Muralikrishna Gutha [mailto:muralikgu...@gmail.com]
> *Sent:* Thursday, October 24, 2019 10:27 AM
> *To:* user@cassandra.apache.org
> *Cc:* Muralikrishna Gutha <muralikgu...@gmail.com>
> *Subject:* Duplicates columns which are backed by LIST collection types
>
> Hello Guys,
>
> We started noticing strange behavior after we migrated one keyspace from
> an existing cluster to a new cluster.
>
> We expanded our source cluster from 18 nodes to 36 nodes and didn't run
> "nodetool cleanup".
>
> We took sstable backups on the source cluster, which contain duplicate
> data, and restored them (sstableloader) onto the new cluster. Apparently
> applications started seeing duplicate data, mostly on list-backed columns.
> Below is the sstable2json output for one of the list-backed columns.
>
> Clustering Column1:Clustering Column2:mods (List collection type)
>
> ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233
>
> ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b050eac811e9ab2729ea208ce219","eb25d0b13a6611e980b22102e728a233",1570648383445000],
> ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b051eac811e9ab2729ea208ce219","eb26bb113a6611e980b22102e728a233",1570648383445000],
> ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b052eac811e9ab2729ea208ce219","a4fcf1f1eac811e99664732b9302ab46",1570648383445000],
> ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973560ead811e98bf68711844fec13","eb25d0b13a6611e980b22102e728a233",1570654999478000],
> ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973561ead811e98bf68711844fec13","eb26bb113a6611e980b22102e728a233",1570654999478000],
> ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973562ead811e98bf68711844fec13","a4fcf1f1eac811e99664732b9302ab46",1570654999478000],
>
> Below is the select statement. I would expect Cassandra to return the
> data with the latest timestamp; instead it returns duplicate values.
>
> select mods from keyspace.table where partition_key ='1117302' and
> type='ModifierList' and id=eb26e221-3a66-11e9-80b2-2102e728a233;
>
> [image: image.png]
>
> Any help or guidance is greatly appreciated.
>
> --
> Thanks & Regards
> Murali K Gutha

--
Thanks & Regards
Murali K Gutha
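
A minimal offline sketch, not part of the original thread, for reading the six sstable2json rows quoted in the message. It assumes the last segment of each cell name is the list element's timeuuid in hex and that the third field is the write timestamp in microseconds. The helper names (`element_time`, `values_in_list_order`, `values_by_write`) are hypothetical. It only inspects the dump; it does not reproduce how Cassandra resolves reads.

```python
import uuid
from collections import defaultdict

# Cell-name prefix and rows copied from the sstable2json output in the message.
# Each row is [cell name, element value, write timestamp in microseconds].
PREFIX = "ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:"
ROWS = [
    [PREFIX + "d120b050eac811e9ab2729ea208ce219", "eb25d0b13a6611e980b22102e728a233", 1570648383445000],
    [PREFIX + "d120b051eac811e9ab2729ea208ce219", "eb26bb113a6611e980b22102e728a233", 1570648383445000],
    [PREFIX + "d120b052eac811e9ab2729ea208ce219", "a4fcf1f1eac811e99664732b9302ab46", 1570648383445000],
    [PREFIX + "38973560ead811e98bf68711844fec13", "eb25d0b13a6611e980b22102e728a233", 1570654999478000],
    [PREFIX + "38973561ead811e98bf68711844fec13", "eb26bb113a6611e980b22102e728a233", 1570654999478000],
    [PREFIX + "38973562ead811e98bf68711844fec13", "a4fcf1f1eac811e99664732b9302ab46", 1570654999478000],
]


def element_time(cell_name):
    """Return the clock value of the timeuuid in the last segment of a cell name (assumed layout)."""
    return uuid.UUID(hex=cell_name.rsplit(":", 1)[1]).time


def values_in_list_order(rows):
    """Order element values by their timeuuid, treating all six cells as one list."""
    ordered = sorted(rows, key=lambda r: element_time(r[0]))
    return [value for _, value, _ in ordered]


def values_by_write(rows):
    """Group element values by the write timestamp recorded on each cell."""
    groups = defaultdict(list)
    for _, value, ts in sorted(rows, key=lambda r: element_time(r[0])):
        groups[ts].append(value)
    return dict(groups)


all_values = values_in_list_order(ROWS)
by_write = values_by_write(ROWS)
latest_only = by_write[max(by_write)]

print(len(all_values), "elements when every cell is treated as live")
print(len(by_write), "distinct write timestamps")
print(len(latest_only), "elements if only the latest write is kept")
```

On this dump the cells fall into two groups with different write timestamps, and each group carries the same three values in the same order, which matches the doubled list the select returns. Why the older group is still visible on the new cluster is not something the dump alone can establish.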