No - it immediately removes the sstables on all nodes. On Mon, Apr 27, 2015 at 7:53 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
> Wouldn't truncating the table create tombstones? > > On Mon, Apr 27, 2015 at 11:55 AM, Peer, Oded <oded.p...@rsa.com> wrote: > >> I recommend truncating the table instead of dropping it since you don’t >> need to re-issue DDL commands and put load on the system keyspace. >> >> Both DROP and TRUNCATE automatically create snapshots, there no >> “snapshotting” advantage for using DROP . See >> http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__auto_snapshot >> >> >> >> >> >> *From:* Ali Akhtar [mailto:ali.rac...@gmail.com] >> *Sent:* Sunday, April 26, 2015 10:31 PM >> >> *To:* user@cassandra.apache.org >> *Subject:* Re: Data model suggestions >> >> >> >> Thanks Peer. I like the approach you're suggesting. >> >> >> >> Why do you recommend truncating the last active table rather than just >> dropping it? Since all the data would be inserted into a new table, seems >> like it would make sense to drop the last table, and that way truncate >> snapshotting also won't have to be dealt with (unless I'm missing anything). >> >> >> >> Thanks. >> >> >> >> >> >> On Sun, Apr 26, 2015 at 1:29 PM, Peer, Oded <oded.p...@rsa.com> wrote: >> >> I would maintain two tables. >> >> An “archive” table that holds all the active and inactive records, and is >> updated hourly (re-inserting the same record has some compaction overhead >> but on the other side deleting records has tombstones overhead). >> >> An “active” table which holds all the records in the last external API >> invocation. >> >> To avoid tombstones and read-before-delete issues “active” should >> actually a synonym, an alias, to the most recent active table. >> >> I suggest you create two identical tables, “active1” and “active2”, and >> an “active_alias” table that informs which of the two is the most recent. >> >> Thus when you query the external API you insert the data to “archive” and >> to the unaliased “activeN” table, switch the alias value in “active_alias” >> and truncate the new unaliased “activeM” table. >> >> No need to query the data before inserting it. Make sure truncating >> doesn’t create automatic snapshots. >> >> >> >> >> >> *From:* Narendra Sharma [mailto:narendra.sha...@gmail.com] >> *Sent:* Friday, April 24, 2015 6:53 AM >> *To:* user@cassandra.apache.org >> *Subject:* Re: Data model suggestions >> >> >> >> I think one table say record should be good. The primary key is record >> id. This will ensure good distribution. >> Just update the active attribute to true or false. >> For range query on active vs archive records maintain 2 indexes or try >> secondary index. >> >> On Apr 23, 2015 1:32 PM, "Ali Akhtar" <ali.rac...@gmail.com> wrote: >> >> Good point about the range selects. I think they can be made to work with >> limits, though. Or, since the active records will never usually be > 500k, >> the ids may just be cached in memory. >> >> >> >> Most of the time, during reads, the queries will just consist of select * >> where primaryKey = someValue . One row at a time. >> >> >> >> The question is just, whether to keep all records in one table (including >> archived records which wont be queried 99% of the time), or to keep active >> records in their own table, and delete them when they're no longer active. >> Will that produce tombstone issues? >> >> >> >> On Fri, Apr 24, 2015 at 12:56 AM, Manoj Khangaonkar < >> khangaon...@gmail.com> wrote: >> >> Hi, >> >> If your external API returns active records, that means I am guessing you >> need to do a select * on the active table to figure out which records in >> the table are no longer active. >> >> You might be aware that range selects based on partition key will timeout >> in cassandra. They can however be made to work using the column cluster >> key. >> >> To comment more, We would need to see your proposed cassandra tables and >> queries that you might need to run. >> >> regards >> >> >> >> >> >> >> >> On Thu, Apr 23, 2015 at 9:45 AM, Ali Akhtar <ali.rac...@gmail.com> wrote: >> >> That's returned by the external API we're querying. We query them for >> active records, if a previous active record isn't included in the results, >> that means its time to archive that record. >> >> >> >> On Thu, Apr 23, 2015 at 9:20 PM, Manoj Khangaonkar <khangaon...@gmail.com> >> wrote: >> >> Hi, >> >> How do you determine if the record is no longer active ? Is it a >> perioidic process that goes through every record and checks when the last >> update happened ? >> >> regards >> >> >> >> On Thu, Apr 23, 2015 at 8:09 AM, Ali Akhtar <ali.rac...@gmail.com> wrote: >> >> Hey all, >> >> >> >> We are working on moving a mysql based application to Cassandra. >> >> >> >> The workflow in mysql is this: We have two tables: active and archive . >> Every hour, we pull in data from an external API. The records which are >> active, are kept in 'active' table. Once a record is no longer active, its >> deleted from 'active' and re-inserted into 'archive' >> >> >> >> The purpose for that, is because most of the time, queries are only done >> against the active records rather than archived. Therefore keeping the >> active table small may help with faster queries, if it only has to search >> 200k records vs 3 million or more. >> >> >> >> Is it advisable to keep the same data model in Cassandra? I'm concerned >> about tombstone issues when records are deleted from active. >> >> >> >> Thanks. >> >> >> >> -- >> >> http://khangaonkar.blogspot.com/ >> >> >> >> >> >> -- >> >> http://khangaonkar.blogspot.com/ >> >> >> >> >> > >