Re: SELECT some_column vs SELECT *

2015-11-24 Thread Jon Haddad
If it's sparsely populated you'll get the same benefit from the schema definition. You don't pay for fields you don't use. > On Nov 24, 2015, at 12:17 PM, Jack Krupansky wrote: > > Are all or most of the 1000+ columns populated for a given row? If they are > sparse

Re: Many keyspaces pattern

2015-11-24 Thread Jack Krupansky
And DateTieredCompactionStrategy can be used to efficiently remove whole sstables when the TTL expires, but this implies knowing what TTL to set in advance. I don't know if there are any tools to bulk delete older than a specific age when DateTieredCompactionStrategy is used, but it might be a

Re: SELECT some_column vs SELECT *

2015-11-24 Thread Jack Krupansky
Are all or most of the 1000+ columns populated for a given row? If they are sparse you can replace them with a single map collection column which would only occupy the entries that are populated. -- Jack Krupansky On Tue, Nov 24, 2015 at 11:04 AM, Jack Krupansky wrote:

Re: Many keyspaces pattern

2015-11-24 Thread Saladi Naidu
I can think of the following features to solve this: 1. If you know after how long the data should be removed, use the TTL feature. 2. Use a time series to model the data and use an inverted index to query the data by time period? Naidu Saladi On Tuesday, November 24, 2015 6:49 AM, Jack

[ANNOUNCE] YCSB 0.5.0 Release

2015-11-24 Thread Connor McCoy
On behalf of the development community, I am pleased to announce the release of YCSB 0.5.0. Highlights: Added support for Kudu (getkudu.io). Added CQL support for Cassandra 2.1+, via the cassandra2-cql binding. Improved semantics for scans under JDBC. Replaced numeric return codes with

RE: No query results while expecting results

2015-11-24 Thread Peer, Oded
Ramon, Have you tried another driver to determine if the problem is in the Python driver? You can deserialize your composite key using the following code: ByteBuffer t = ByteBufferUtil.hexToBytes("0008000e70451f6404000500"); short periodlen =

Re: Strategy tools for taking snapshots to load in another cluster instance

2015-11-24 Thread Anishek Agarwal
Peer, that talks about having a similar-sized cluster; I was wondering if there is a way to move from a larger to a smaller cluster. I will try a few things as soon as I get time and update here. On Thu, Nov 19, 2015 at 5:48 PM, Peer, Oded wrote: > Have you read the DataStax

Many keyspaces pattern

2015-11-24 Thread Jonathan Ballet
Hi, we are running an application which produces a batch of several hundred gigabytes of data every night. Once a batch has been computed, it is never modified (neither updates nor deletes); we just keep producing new batches every day. Now, we are *sometimes* interested in removing a

Re: No query results while expecting results

2015-11-24 Thread Ramon Rockx
Hello Carlos and Oded, Thanks to you all for your input! @Carlos, I have not tried the thrift client yet. @Oded, thank you for deserializing the key. It looks exactly like what one would expect, once it's deserialized... I think we're onto something. I reproduced and simplified the case like this. First I

Re: Many keyspaces pattern

2015-11-24 Thread Jack Krupansky
How often is sometimes - closer to 20% of the batches or 2%? How are you querying batches, both current and older ones? As always, your queries should drive your data models. If deleting a batch is very infrequent, maybe best to not do it and simply have logic in the app to ignore deleted

SELECT some_column vs SELECT *

2015-11-24 Thread Kai Wang
Hi all, If I have the following table: CREATE TABLE t ( pk int, ck int, c1 int, c2 int, ... PRIMARY KEY (pk, ck) ) There are lots of non-clustering columns (1000+). From time to time I need to do a query like this: SELECT c1 FROM t WHERE pk = abc AND ck > xyz; How efficient is this

Re: SELECT some_column vs SELECT *

2015-11-24 Thread Jack Krupansky
As always, your queries should drive your data model. Unless you really need 1000+ columns for most queries, you should consider separate tables for the subsets of the columns that need to be returned for a given query. The new 3.0 Materialized View feature can be used to easily create subsets of