mysterious 'column1' in cql describe

2013-08-30 Thread Alexander Shutyaev
Hi all! We have encountered the following problem. We create our column families via Hector like this: ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(mykeyspace, mycf); cfdef.setColumnType(ColumnType.STANDARD); cfdef.setComparatorType(ComparatorType.UTF8TYPE);

Re: how can i get the column value? Need help!.. cassandra 1.28 and pig 0.11.1

2013-08-30 Thread Miguel Angel Martin junquera
I tried this: rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage(); dump rows; ILLUSTRATE rows; describe rows; values2 = FOREACH rows GENERATE TOTUPLE (id) as (mycolumn:tuple(name,value)); dump values2; describe values2;
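The query string in the LOAD location above can be sketched as follows. This is a minimal Python illustration, assuming the parameters are joined with `&` and that `%3D` is the percent-encoded `=` of the clause `age=30`. The helper name `build_cql_url` is hypothetical and not part of Pig or Cassandra.

```python
from urllib.parse import urlencode


def build_cql_url(keyspace, table, **params):
    """Build a cql:// location string with a percent-encoded query string.

    Hypothetical helper for illustration only.
    """
    # urlencode joins key=value pairs with '&' and escapes reserved
    # characters, so the '=' inside the where clause becomes %3D.
    return f"cql://{keyspace}/{table}?{urlencode(params)}"


url = build_cql_url(
    "keyspace1", "test", page_size=1, split_size=4, where_clause="age=30"
)
print(url)
```

Encoding the where clause separately keeps its `=` from being read as a parameter separator.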

Re: CqlStorage creates wrong schema for Pig

2013-08-30 Thread Miguel Angel Martin junquera
I tried this: rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage(); dump rows; ILLUSTRATE rows; describe rows; values2 = FOREACH rows GENERATE TOTUPLE (id) as (mycolumn:tuple(name,value)); dump values2; describe values2;

Re: successful use of shuffle?

2013-08-30 Thread Alain RODRIGUEZ
+1. I am still afraid of this step. Yet you can avoid it by introducing new nodes with vnodes enabled and then removing the old ones. This should work. My problem is that I am not really confident in vnodes either... Any shared experience with this transition, and then with the use of vnodes, would be great

Re: mysterious 'column1' in cql describe

2013-08-30 Thread Sylvain Lebresne
The short story is that you're probably not up to date on how CQL and Thrift table definitions relate to one another, and that may not be exactly how you think they do. If you haven't done so, I'd suggest reading

Re: mysterious 'column1' in cql describe

2013-08-30 Thread Alexander Shutyaev
Thanks, Sylvain! I'll read it most thoroughly, but after a quick glance I wish to repeat my other (implied) question, which I believe will not be answered in these articles. Why does the explicit definition of columns in a column family significantly improve performance and key cache hit ratio

[RELEASE] Apache Cassandra 1.2.9 released

2013-08-30 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra version 1.2.9. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here:

Re: mysterious 'column1' in cql describe

2013-08-30 Thread Sylvain Lebresne
Why does the explicit definition of columns in a column family significantly improve performance and key cache hit ratio (the last one being almost zero when there are no explicit column definitions)? It doesn't, not in itself at least. So something else has changed or something is wrong in

RE: Cassandra-shuffle fails

2013-08-30 Thread Romain HARDOUIN
Hi, "Failed to enable shuffling" is thrown when an IOException occurs in the constructor JMXConnection(endpoint, port). See Shuffle.enableRelocations() in org.apache.cassandra.tools. Have you set up credentials for JMX? Regards, Romain From: Tamar Rosen ta...@correlor.com To:

map/reduce performance time and sstable reader...

2013-08-30 Thread Hiller, Dean
Has anyone done performance tests on sstable reading vs. M/R? I did a quick test reading all SSTables in an LCS column family on 23 tables and averaged the time sstable2json took (writing to /dev/null to make it faster), which was 7 seconds per table. (reading to stdout took 16 seconds per

is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?

2013-08-30 Thread Hiller, Dean
is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses thrift)? We are not worried about repeated reads since we are idempotent but would rather have the direct speed (even if we had to read from a snapshot, it would be fine). (We would most likely run our M/R on 4 nodes

RE: Truncate question

2013-08-30 Thread S C
Thank you all for your responses. Yes, I have cleared the snapshots after the truncate operation. Thanks, SC Date: Thu, 29 Aug 2013 21:41:25 -0400 Subject: Re: Truncate question From: dmcne...@gmail.com To: user@cassandra.apache.org You would, however, want to clear the snapshot folder afterward,

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Does your previous snapshot include the system keyspace? I haven't tried upgrading from 1.0.x then rolling back, but it's possible there are some backwards-incompatible changes. Other than that, did you make sure you also rolled back your config files? On Aug 30, 2013, at 8:57 AM, Mike Neir

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Sorry, I didn't see the test procedure, it's still early. On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote: Greetings folks, I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Robert Coli
On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com wrote: I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling

Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mike Neir
Greetings folks, I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mohit Anchlia
If you have multiple DCs you at least want to upgrade to 1.0.11. There is an issue where you might get errors during cross DC replication. On Fri, Aug 30, 2013 at 9:41 AM, Mike Neir m...@liquidweb.com wrote: In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is no need

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mike Neir
In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is no need to do streaming operations (move/repair/bootstrap/etc). The reading I've done confirms that 1.2.x should be network-compatible with 1.0.x, sans streaming operations. Datastax seems to indicate here that doing a

Update-Replace

2013-08-30 Thread Jan Algermissen
Hi, I have a use case where I periodically need to apply updates to a wide row that should replace the whole row. The straightforward insert/update only replaces values that are present in the executed statement, keeping the remaining data around. Is there a smooth way to do a replace with C* or
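The difference between the two behaviours described in this message can be sketched with a toy in-memory model. This is plain Python over dictionaries, not Cassandra's API. The function names `upsert` and `replace_row` are hypothetical, and the sketch only illustrates the semantics the poster describes, without claiming how a replace should be done in C*.

```python
def upsert(row, changes):
    """Mimic insert/update: only the columns named in the statement change."""
    merged = dict(row)       # columns not mentioned in the statement survive
    merged.update(changes)   # mentioned columns are overwritten or added
    return merged


def replace_row(row, new_columns):
    """The behaviour the poster asks for: nothing from the old row remains."""
    return dict(new_columns)


stored = {"a": 1, "b": 2, "c": 3}
update = {"a": 10, "d": 40}

after_upsert = upsert(stored, update)
after_replace = replace_row(stored, update)
print(after_upsert)
print(after_replace)
```

In the upsert case the stale columns `b` and `c` are still present, which is exactly the leftover data the message wants to avoid.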

[ANNOUNCE] Polidoro - A Cassandra client in Scala

2013-08-30 Thread Lanny Ripple
Hi all, We've open sourced Polidoro. It's a Cassandra client in Scala on top of Astyanax and in the style of Cascal. Find it at https://github.com/SpotRight/Polidoro -Lanny Ripple SpotRight, Inc - http://spotright.com

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jeremiah D Jordan
You probably want to go to 1.0.11/12 first no matter what. If you want the least chance of issues you should then go to 1.1.12. While there is a high probability that going from 1.0.X to 1.2 will work, you have the best chance of no failures if you go through 1.1.12. There are some edge cases

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: Hi, If I create a table with CQL3 as create table user(user_id text PRIMARY KEY, first_name text,

Re: is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?

2013-08-30 Thread Jeremiah D Jordan
FYI: http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html -Jeremiah On Aug 30, 2013, at 9:21 AM, Hiller, Dean dean.hil...@nrel.gov wrote: is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses thrift)? We are not worried about repeated reads

Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
And surprisingly, if I alter the table as: alter table user add first_name text; alter table user add last_name text; it gives me back the columns with values, but still no indexes. Thrift and CQL3 depend on the same storage engine. Do they really maintain different metadata for the same column family? -Vivek

Re: CQL Thrift

2013-08-30 Thread Peter Lin
In my case, I built a temporal database on top of Cassandra, so it's absolutely key. Dynamic columns are super powerful, and relational databases have no equivalent. For me, that is one of the top 3 reasons for using Cassandra. On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra

Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
Hi, I understand that, but I want to understand the reason behind such behavior. Is it because of maintaining different metadata objects for CQL3 and Thrift? Any suggestions? -Vivek On Fri, Aug 30, 2013 at 11:15 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
Could you please give a more concrete example? On Aug 30, 2013, at 11:10 AM, Peter Lin wool...@gmail.com wrote: In my case, I built a temporal database on top of Cassandra, so it's absolutely key. Dynamic columns are super powerful, and relational databases have no equivalent. For

Re: CQL Thrift

2013-08-30 Thread Jonathan Ellis
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: From my biased perspective, I find the sweet spot is Thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
Just curious - what do you need to do that requires Thrift? We've built our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: From my biased perspective, I find the sweet spot is Thrift for insert/update and CQL for

Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. I agree, but only partly. You can always create a column family with key, column and value, and store any number of arbitrary columns by putting the column name in column and its corresponding value in value. I find it

Re: CQL Thrift

2013-08-30 Thread Peter Lin
I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific types of data for column name and value. It forces you to use all bytes or all strings, which would require converting them to other types. Thrift is much more powerful in

Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
True for newly built platform(s), but what about existing apps built using Thrift? As per http://www.datastax.com/dev/blog/thrift-to-cql3 it should be easy. I am just curious to understand the real reason behind such behavior. -Vivek On Fri, Aug

Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
If you are talking about the comparator, yes, that's a valid point and not possible with CQL3. -Vivek On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote: I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific

Re: CQL Thrift

2013-08-30 Thread Peter Lin
In the interest of education and discussion. I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows default type defined in the create statement. create column family data with key_validation_class=Int32Type and comparator=DateType and

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
It sounds like you want this: create table data ( pk int, colname blob, value blob, primary key (pk, colname)); that gives you arbitrary columns (cleverly labeled colname) in a single row, where the value is value. If you don't want the overhead of storing colname in every row, try with
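The table definition in the message above can be pictured with a toy model. This is a sketch in plain Python, not a Cassandra client. It assumes `pk` acts as the partition key and `colname` as the clustering column, so that entries inside one partition are kept ordered by `colname`. The class name `WideRowTable` is hypothetical.

```python
from collections import defaultdict


class WideRowTable:
    """Toy stand-in for a table with primary key (pk, colname)."""

    def __init__(self):
        # pk -> {colname: value}; each pk plays the role of one wide row
        self._partitions = defaultdict(dict)

    def insert(self, pk, colname, value):
        # Any column name can be written; nothing is declared up front.
        self._partitions[pk][colname] = value

    def select(self, pk):
        # Return (colname, value) pairs ordered by colname (bytewise for blobs).
        return sorted(self._partitions.get(pk, {}).items())


table = WideRowTable()
table.insert(1, b"temp", b"21")
table.insert(1, b"city", b"Oslo")
table.insert(2, b"anything", b"else")
rows = table.select(1)
print(rows)
```

Each `insert` adds one (colname, value) pair under the same `pk`, which is how a fixed three-column schema can still hold an open-ended set of column names.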

Re: CQL Thrift

2013-08-30 Thread Les Hazlewood
On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires Thrift? We've built our entire platform using CQL3 and we haven't hit any issues. Here's one thing: If you're using wide rows and you want to do anything other than just

Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
Did you try to explore CQL3 collection support for the same? You can definitely save on the number of rows with that. The point I am trying to make is that you can achieve it via CQL3 (Jonathan's blog: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows). I agree with you

Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
@lhazlewood https://issues.apache.org/jira/browse/CASSANDRA-5959 Begin batch, multiple insert statements, apply batch. It doesn't work for you? -Vivek On Sat, Aug 31, 2013 at 12:21 AM, Les Hazlewood lhazlew...@apache.org wrote: On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad

CQL Thrift

2013-08-30 Thread Vivek Mishra
Hi, If I create a table with CQL3 as create table user(user_id text PRIMARY KEY, first_name text, last_name text, emailid text); and create an index as: create index on user(first_name); then insert some data as: insert into user(user_id,first_name,last_name,emailId)

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mike Neir
Is there anything that you can link that describes the pitfalls you mention? I'd like a bit more information. Just for clarity's sake, are you recommending 1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x? Or would 1.0.9 -> 1.1.12 -> 1.2.x suffice? Regarding the placement strategy mentioned in a different post,

Re: CQL Thrift

2013-08-30 Thread Peter Lin
CQL3 collections are meant to store things that are lists, sets, or maps. Plus, collections currently do not support secondary indexes. The point is that often you don't know what columns are needed at design time. If you know what's needed, use static columns. Using a list, set or map to store data you

Re: successful use of shuffle?

2013-08-30 Thread Jeremiah D Jordan
You need to introduce the new vnode enabled nodes in a new DC. Or you will have similar issues to https://issues.apache.org/jira/browse/CASSANDRA-5525 Add vnode DC: http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html Point

Re: CQL Thrift

2013-08-30 Thread Alex Popescu
On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra mishra.v...@gmail.com wrote: @lhazlewood https://issues.apache.org/jira/browse/CASSANDRA-5959 Begin batch, multiple insert statements, apply batch. It doesn't work for you? -Vivek According to the OP, batching inserts is slow. The SO

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
It seems really strange to me that you'd create a table with specific types and then try to deviate from them. Why not just use the blob type, so you can store whatever you want in there? The whole point of adding strong typing is to adhere to it. I wouldn't consider it a fault of the database

Re: CQL3 wide row and slow inserts - is there a single insert alternative?

2013-08-30 Thread Les Hazlewood
Well, it appears that this just isn't possible. I created CASSANDRA-5959 as a result. (Backstory + performance testing results are described in the issue): https://issues.apache.org/jira/browse/CASSANDRA-5959 -- Les Hazlewood | @lhazlewood CTO, Stormpath | http://stormpath.com | @goStormpath |

Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
Create a column family as: create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob); insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (key, double(102.211), textAsBlob('valueInBytes')); Do you think it will work when the column names are doubles? -Vivek On Sat,

Re: CQL Thrift

2013-08-30 Thread Les Hazlewood
Yes, that's correct - and that's a scaled number. In practice: On the local dev machine, CQL3 inserting 10,000 columns (for 1 row) in a BATCH took 1.5 minutes. 50,000 columns (the desired amount) in a BATCH took 7.5 minutes. The same Thrift functionality took _235 milliseconds_. That's almost
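The figures quoted in this message can be put on a common scale with a little arithmetic. This is a sketch only, in Python. It assumes the 235 ms Thrift figure refers to the same 50,000-column insert, which the message implies but does not state outright, and the variable names are illustrative.

```python
MS_PER_MINUTE = 60_000

batch_10k_ms = 1.5 * MS_PER_MINUTE  # CQL3 BATCH, 10,000 columns
batch_50k_ms = 7.5 * MS_PER_MINUTE  # CQL3 BATCH, 50,000 columns
thrift_ms = 235                     # assumption: same 50,000-column insert

# Cost per column for each batch size.
per_column_10k = batch_10k_ms / 10_000
per_column_50k = batch_50k_ms / 50_000

# How many times longer the 50,000-column batch took than the Thrift call.
slowdown = batch_50k_ms / thrift_ms

print(per_column_10k, per_column_50k, round(slowdown))
```

Both batch sizes work out to the same per-column cost, so the reported CQL3 time grows roughly linearly with the number of columns.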

Re: CQL Thrift

2013-08-30 Thread Peter Lin
This has nothing to do with compact storage. Cassandra supports arbitrary dynamic columns of different name/value type today. If people are happy with SQL metaphor, then CQL is fine. Then again, if SQL metaphor was good for temporal databases, there wouldn't be so many failed temporal databases

Re: CQL Thrift

2013-08-30 Thread Peter Lin
From my biased perspective, I find the sweet spot is Thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with

Re: CqlStorage creates wrong schema for Pig

2013-08-30 Thread Chad Johnston
I threw together a quick UDF to work around this issue. It just extracts the value portion of the tuple while taking advantage of the CqlStorage generated schema to keep the type correct. You can get it here: https://github.com/iamthechad/cqlstorage-udf I'll see if I can find more useful

Selecting multiple rows with composite partition keys using CQL3

2013-08-30 Thread Carl Lerche
Hello, I've been trying to figure out how to port my application to CQL3 based on http://cassandra.apache.org/doc/cql3/CQL.html. I have a table with a primary key: ( (app, name), timestamp ). So, the partition key would be composite (on app and name). I'm trying to figure out if there is a way

Data Modeling help for representing a survey form.

2013-08-30 Thread John Anderson
I have an existing system in postgres that I would like to move to cassandra. The system is for building registration forms for conferences. For example, you might want to build a registration form (or survey) that has a bunch of questions on it. An overview of this system I whiteboarded here:

Is it possible to synchronous run Cassandra Triggers?

2013-08-30 Thread yun peng
Hi all, I am interested in using the new Cassandra Trigger feature to implement a synchronous (or asynchronous but with a deadline) index on Cassandra. The Trigger API allows one to define a mutation job to do (in the future), but is there any way to control when the (asynchronously executed) job is