Re: Forming a cluster of embedded Cassandra instances

2016-02-14 Thread Jan Kesten
Hi, using embedded Cassandra to speed up getting started with the project may well work for developers; we used it for JUnit. But with a simple clone and Maven build, I guess it will end in a single-node Cassandra cluster. Remember Cassandra is a distributed database; one will need more than one node to get perfo

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Gianluca Borello
Considering the (simplified) table that I wrote before: create table data ( id bigint, ts bigint, column1 blob, column2 blob, column3 blob, ... column29 blob, column30 blob, primary key (id, ts) ); A user request (varies every time) translates into a set of queries asking a subset of the columns (< 1

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Jack Krupansky
What does your query actually look like today? Is your non-EQ on timestamp selecting a single row, a few rows, or many rows (dozens, hundreds, thousands)? -- Jack Krupansky On Sun, Feb 14, 2016 at 7:40 PM, Gianluca Borello wrote: > Thanks again. > > One clarification about "reading in a single

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Gianluca Borello
Thanks again. One clarification about "reading in a single SELECT": in my point 2, I mentioned the need to read a variable subset of columns every time, usually in the range of ~5 out of 30. I can't find a way to do that in a single SELECT unless I use the IN operator (which I can't, as explained)

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Jack Krupansky
You can definitely read all of the columns in a single SELECT. And the N INSERTs can be batched and will insert fewer cells in the storage engine than the previous approach. -- Jack Krupansky On Sun, Feb 14, 2016 at 7:31 PM, Gianluca Borello wrote: > Thank you for your reply. > > Your advice is def

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Gianluca Borello
Thank you for your reply. Your advice is definitely sound, although it still seems suboptimal to me because: 1) It requires N INSERT queries from the application code (where N is the number of columns) 2) It requires N SELECT queries from my application code (where N is the number of columns I n

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Jack Krupansky
You could add the column number as an additional clustering key. And then you can actually use COMPACT STORAGE for even more efficient storage and access (assuming there is only a single non-PK data column, the blob value.) You can then access (read or write) an individual column/blob or a slice o
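
A minimal CQL sketch of the layout suggested above, assuming a Cassandra version that still supports COMPACT STORAGE. The table name data_by_column, the column names column_number and value, and all literal values are hypothetical and are not taken from the thread:

```cql
-- Hypothetical schema: one CQL row per (id, ts, column_number),
-- with the blob as the only non-primary-key column.
CREATE TABLE data_by_column (
    id bigint,
    ts bigint,
    column_number int,   -- hypothetical name for the former column index (1..30)
    value blob,          -- hypothetical name for the single data column
    PRIMARY KEY (id, ts, column_number)
) WITH COMPACT STORAGE;

-- Write one blob; only the columns that are present need to be inserted.
INSERT INTO data_by_column (id, ts, column_number, value)
VALUES (1, 1455400000, 5, 0xCAFE);

-- Read a single column/blob.
SELECT value FROM data_by_column
WHERE id = 1 AND ts = 1455400000 AND column_number = 5;

-- Read a contiguous slice of columns for one (id, ts).
SELECT column_number, value FROM data_by_column
WHERE id = 1 AND ts = 1455400000
  AND column_number >= 3 AND column_number <= 7;
```

The slice query only covers a contiguous range of column numbers for a fixed ts; whether that fits the access pattern discussed in this thread depends on details not shown in the snippets.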

Re: Forming a cluster of embedded Cassandra instances

2016-02-14 Thread John Sanda
The motivation was to make it easy for someone to get up and running quickly with the project. Clone the git repo, run the maven build, and then you are all set. It definitely does lower the learning curve for someone just getting started with a project and who is not really thinking about Cassandr

Performance issues with "many" CQL columns

2016-02-14 Thread Gianluca Borello
Hi, I've just painfully discovered a "little" detail in Cassandra: Cassandra touches all columns on a CQL select (related issues https://issues.apache.org/jira/browse/CASSANDRA-6586, https://issues.apache.org/jira/browse/CASSANDRA-6588, https://issues.apache.org/jira/browse/CASSANDRA-7085). My dat

Re: Forming a cluster of embedded Cassandra instances

2016-02-14 Thread Jack Krupansky
What motivated the use of an embedded instance for development - as opposed to simply spawning a process for Cassandra? -- Jack Krupansky On Sun, Feb 14, 2016 at 2:05 PM, John Sanda wrote: > The project I work on day to day uses an embedded instance of Cassandra, > but it is intended for prim

Re: Forming a cluster of embedded Cassandra instances

2016-02-14 Thread John Sanda
The project I work on day to day uses an embedded instance of Cassandra, but it is intended primarily for development. We embed Cassandra in a WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I personally do not do this. I use and recommend ccm