Hi all!
We have encountered the following problem. We create our column families
via Hector like this:

ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(mykeyspace, mycf);
cfdef.setColumnType(ColumnType.STANDARD);
cfdef.setComparatorType(ComparatorType.UTF8TYPE);
I try this:

rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
dump rows;
ILLUSTRATE rows;
describe rows;

values2 = FOREACH rows GENERATE TOTUPLE(id) AS (mycolumn:tuple(name,value));
dump values2;
describe values2;
+1.
I am still afraid of this step. Yet you can avoid it by introducing new
nodes with vnodes enabled and then removing the old ones. This should work.
My problem is that I am not really confident in vnodes either...
Any experience shared on this transition, and then on the use of vnodes, would be great.
The short story is that you're probably not up to date on how CQL and
Thrift table definitions relate to one another, and it may not be exactly
how you think it is. If you haven't done so, I'd suggest reading
Thanks, Sylvain! I'll read it most thoroughly, but after a quick glance I
wish to repeat another (implied) question that I believe will not be
answered in those articles.
Why does the explicit definition of columns in a column family
significantly improve performance and key cache hit ratio
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.9.
Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:
Why does the explicit definition of columns in a column family
significantly improve performance and key cache hit ratio (the last one
being almost zero when there are no explicit column definitions)?
It doesn't, not in itself at least. So something else has changed or
something is wrong in
Hi,
"Failed to enable shuffling" is thrown when an IOException occurs in the
JMXConnection(endpoint, port) constructor;
see Shuffle.enableRelocations() in org.apache.cassandra.tools.
Have you set up credentials for JMX?
Regards,
Romain
From: Tamar Rosen ta...@correlor.com
To:
Has anyone done performance tests on sstable reading vs. M/R? I did a quick
test reading all SSTables in an LCS column family with 23 tables, taking the
average time of sstable2json (to /dev/null to make it faster), which was 7
seconds per table (reading to stdout took 16 seconds per
is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses
thrift)?
We are not worried about repeated reads since we are idempotent but would
rather have the direct speed (even if we had to read from a snapshot, it would
be fine).
(We would most likely run our M/R on 4 nodes
Thank you all for your responses. Yes, I have cleared the snapshots
post-truncate.
Thanks,
SC
Date: Thu, 29 Aug 2013 21:41:25 -0400
Subject: Re: Truncate question
From: dmcne...@gmail.com
To: user@cassandra.apache.org
You would, however, want to clear the snapshot folder afterward,
Does your previous snapshot include the system keyspace? I haven't tried
upgrading from 1.0.x and then rolling back, but it's possible there are some
backwards-incompatible changes. Other than that, make sure you also roll
back your config files.
On Aug 30, 2013, at 8:57 AM, Mike Neir
Sorry, I didn't see the test procedure, it's still early.
On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote:
Greetings folks,
I'm faced with the need to update a 36 node cluster with roughly 25T of data
on disk to a version of cassandra in the 1.2.x series. While it seems
On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com wrote:
I'm faced with the need to update a 36 node cluster with roughly 25T of
data on disk to a version of cassandra in the 1.2.x series. While it seems
that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a
rolling
Greetings folks,
I'm faced with the need to update a 36 node cluster with roughly 25T of data on
disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8
will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd
still like to have a roll-back plan in
If you have multiple DCs you at least want to upgrade to 1.0.11. There is
an issue where you might get errors during cross DC replication.
On Fri, Aug 30, 2013 at 9:41 AM, Mike Neir m...@liquidweb.com wrote:
In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there
is no need
In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is no
need to do streaming operations (move/repair/bootstrap/etc). The reading I've
done confirms that 1.2.x should be network-compatible with 1.0.x, sans streaming
operations. Datastax seems to indicate here that doing a
Hi,
I have a use case where I periodically need to apply updates to a wide row
that should replace the whole row.
A straightforward insert/update only replaces values that are present in the
executed statement, keeping the remaining data around.
Is there a smooth way to do a replace with C* or
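For what it's worth, there is no single "replace the whole row" statement in CQL3. One common workaround (a sketch only, with a hypothetical table and column names) pairs a partition delete with the new inserts in one batch:

```cql
-- Hypothetical wide-row table: one partition per row_key,
-- one clustered row per (name, value) pair.
BEGIN BATCH
  DELETE FROM wide_table WHERE row_key = 'k1';
  INSERT INTO wide_table (row_key, name, value) VALUES ('k1', 'a', '1');
  INSERT INTO wide_table (row_key, name, value) VALUES ('k1', 'b', '2');
APPLY BATCH;
```

One caveat: a delete wins a timestamp tie in Cassandra, so if every statement in the batch ends up with the same timestamp the inserts can be shadowed by the tombstone; assigning explicit USING TIMESTAMP values (delete lower, inserts higher) avoids that.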
Hi all,
We've open sourced Polidoro. It's a Cassandra client in Scala on top of
Astyanax and in the style of Cascal.
Find it at https://github.com/SpotRight/Polidoro
-Lanny Ripple
SpotRight, Inc - http://spotright.com
You probably want to go to 1.0.11/12 first no matter what. If you want the
least chance of issues, you should then go to 1.1.12. While there is a high
probability that going straight from 1.0.x to 1.2 will work, you have the best
chance of no failures if you go through 1.1.12. There are some edge cases
If you're going to work with CQL, work with CQL. If you're going to work with
Thrift, work with Thrift. Don't mix.
On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:
Hi,
If I create a table with CQL3 as
create table user(user_id text PRIMARY KEY, first_name text,
FYI:
http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
-Jeremiah
On Aug 30, 2013, at 9:21 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses
thrift)?
We are not worried about repeated reads
And surprisingly, if I alter the table as:
alter table user add first_name text;
alter table user add last_name text;
it gives me back the columns with values, but still no indexes.
Thrift and CQL3 depend on the same storage engine. Do they really maintain
different metadata for the same column family?
-Vivek
in my case, I built a temporal database on top of Cassandra, so it's
absolutely key.
Dynamic columns are super powerful, something relational databases have no
equivalent for. For me, that is one of the top 3 reasons for using Cassandra.
On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra
Hi,
I understand that, but I want to understand the reason behind
such behavior. Is it because of maintaining different metadata objects for
CQL3 and Thrift?
Any suggestion?
-Vivek
On Fri, Aug 30, 2013 at 11:15 PM, Jon Haddad j...@jonhaddad.com wrote:
If you're going to work with CQL, work
Could you please give a more concrete example?
On Aug 30, 2013, at 11:10 AM, Peter Lin wool...@gmail.com wrote:
in my case, I built a temporal database on top of Cassandra, so it's
absolutely key.
Dynamic columns are super powerful, which relational database have no
equivalent. For
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:
my biased perspective: I find the sweet spot is Thrift for insert/update and
CQL for select queries.
CQL is too limiting and negates the power of
Just curious - what do you need to do that requires Thrift? We've built our
entire platform using CQL3 and we haven't hit any issues.
On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote:
my biased perspective: I find the sweet spot is Thrift for insert/update and
CQL for
CQL is too limiting and negates the power of storing arbitrary data types
in dynamic columns.
I partly agree. You can always create a column family with key, column
and value, and store any number of arbitrary columns, putting the column
name in column and its corresponding value in value. I find it
I use dynamic columns all the time and they vary in type.
With CQL you can define a default type, but you can't insert specific types
of data for the column name and value. It forces you to use all bytes or all
strings, which would require converting them to other types.
thrift is much more powerful in
True for newly built platform(s), but what about existing apps built using
Thrift? As per http://www.datastax.com/dev/blog/thrift-to-cql3 it
should be easy.
I am just curious to understand the real reason behind such behavior.
-Vivek
On Fri, Aug
If you're talking about the comparator: yes, that's a valid point and not
possible with CQL3.
-Vivek
On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote:
I use dynamic columns all the time and they vary in type.
With CQL you can define a default type, but you can't insert specific
In the interest of education and discussion.
I didn't mean to say CQL3 doesn't support dynamic columns. The example from
that page shows the default type defined in the create statement:
create column family data
with key_validation_class=Int32Type
and comparator=DateType
and
It sounds like you want this:
create table data ( pk int, colname blob, value blob, primary key (pk,
colname));
that gives you arbitrary columns (cleverly labeled colname) in a single row,
where the value is value.
If you don't want the overhead of storing colname in every row, try with
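To make the suggestion concrete, here is a hypothetical usage sketch of that table, leaning on the blob-conversion functions already mentioned in this thread:

```cql
CREATE TABLE data (pk int, colname blob, value blob, PRIMARY KEY (pk, colname));

-- Each "dynamic column" becomes one clustered row; names of any type
-- are converted to blobs, so they can vary from insert to insert.
INSERT INTO data (pk, colname, value)
  VALUES (1, textAsBlob('first_name'), textAsBlob('Vivek'));
INSERT INTO data (pk, colname, value)
  VALUES (1, doubleAsBlob(102.211), textAsBlob('someValue'));

-- All dynamic columns of one partition, in comparator order:
SELECT colname, value FROM data WHERE pk = 1;
```

The trade-off discussed above still applies: the reader has to know (or encode elsewhere) which type each blob name/value should be decoded as.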
On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad j...@jonhaddad.com wrote:
Just curious - what do you need to do that requires Thrift? We've built
our entire platform using CQL3 and we haven't hit any issues.
Here's one thing: If you're using wide rows and you want to do anything
other than just
Did you try exploring CQL3 collection support for the same? You can
definitely save on the number of rows with that.
The point I am trying to make is that you can achieve it via CQL3
(Jonathan's blog:
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows)
I agree with you
@lhazlewood
https://issues.apache.org/jira/browse/CASSANDRA-5959
BEGIN BATCH
<multiple INSERT statements>
APPLY BATCH
It doesn't work for you?
-Vivek
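Spelled out in full, the batch form suggested above would look roughly like this (a sketch against the user table from earlier in the thread, with made-up values):

```cql
BEGIN BATCH
  INSERT INTO user (user_id, first_name, last_name) VALUES ('u1', 'Alice', 'A');
  INSERT INTO user (user_id, first_name, last_name) VALUES ('u2', 'Bob', 'B');
APPLY BATCH;
```

Note that a logged batch adds batchlog overhead for atomicity; that overhead is part of why batched CQL inserts can compare poorly with raw Thrift mutations in the benchmarks quoted later in this thread.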
On Sat, Aug 31, 2013 at 12:21 AM, Les Hazlewood lhazlew...@apache.orgwrote:
On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad
Hi,
If I create a table with CQL3 as
create table user(user_id text PRIMARY KEY, first_name text, last_name
text, emailid text);
and create index as:
create index on user(first_name);
then inserted some data as:
insert into user(user_id,first_name,last_name,emailId)
Is there anything you can link that describes the pitfalls you mention? I'd
like a bit more information. Just for clarity's sake, are you recommending
1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x? Or would 1.0.9 -> 1.1.12 -> 1.2.x suffice?
Regarding the placement strategy mentioned in a different post,
CQL3 collections are meant to store lists, sets, and maps. Plus,
collections currently do not support secondary indexes.
The point is often you don't know what columns are needed at design time.
If you know what's needed, use static columns.
Using a list, set or map to store data you
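For reference, the collection-based alternative under discussion looks roughly like this (table and column names hypothetical):

```cql
CREATE TABLE user_attrs (
  user_id text PRIMARY KEY,
  attrs map<text, text>   -- arbitrary name/value pairs in one row
);

UPDATE user_attrs SET attrs['city'] = 'Pune' WHERE user_id = 'u1';

-- Limitations noted above: every map key and value shares the declared
-- type, and collections cannot carry secondary indexes (as of 1.2).
```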
You need to introduce the new vnode-enabled nodes in a new DC, or you will
have issues similar to https://issues.apache.org/jira/browse/CASSANDRA-5525
Add vnode DC:
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html
Point
On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra mishra.v...@gmail.comwrote:
@lhazlewood
https://issues.apache.org/jira/browse/CASSANDRA-5959
Begin batch
multiple insert statements.
apply batch
It doesn't work for you?
-Vivek
According to the OP, batching inserts is slow. The SO
It seems really strange to me that you're creating a table with specific types
and then trying to deviate from them. Why not just use the blob type? Then
you can store whatever you want in there.
The whole point of adding strong typing is to adhere to it. I wouldn't
consider it a fault of the database
Well, it appears that this just isn't possible. I created CASSANDRA-5959
as a result. (Backstory + performance testing results are described in the
issue):
https://issues.apache.org/jira/browse/CASSANDRA-5959
--
Les Hazlewood | @lhazlewood
CTO, Stormpath | http://stormpath.com | @goStormpath |
create a column family as:

create table dynamicTable(key text PRIMARY KEY, nameAsDouble double, valueAsBlob blob);

insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ('key', 102.211, textAsBlob('valueInBytes'));

Do you think it will work in case the column name is a double?
-Vivek
On Sat,
Yes, that's correct - and that's a scaled number. In practice:
On the local dev machine, CQL3 inserting 10,000 columns (for 1 row) in a
BATCH took 1.5 minutes. 50,000 columns (the desired amount) in a BATCH
took 7.5 minutes. The same Thrift functionality took _235 milliseconds_.
That's almost
This has nothing to do with compact storage.
Cassandra supports arbitrary dynamic columns of different name/value types
today. If people are happy with the SQL metaphor, then CQL is fine.
Then again, if the SQL metaphor were good for temporal databases, there
wouldn't be so many failed temporal databases
my biased perspective: I find the sweet spot is Thrift for insert/update and
CQL for select queries.
CQL is too limiting and negates the power of storing arbitrary data types
in dynamic columns.
On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:
If you're going to work with
I threw together a quick UDF to work around this issue. It just extracts
the value portion of the tuple while taking advantage of the CqlStorage
generated schema to keep the type correct.
You can get it here: https://github.com/iamthechad/cqlstorage-udf
I'll see if I can find more useful
Hello,
I've been trying to figure out how to port my application to CQL3, based on
http://cassandra.apache.org/doc/cql3/CQL.html.
I have a table with primary key ((app, name), timestamp), so the
partition key is composite (on app and name). I'm trying to figure
out if there is a way
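The schema being described would look roughly like this in CQL3 (the non-key column is hypothetical):

```cql
CREATE TABLE events (
  app text,
  name text,
  ts timestamp,    -- "timestamp" in the original post; shortened here
  value blob,
  PRIMARY KEY ((app, name), ts)
);

-- One partition per (app, name) pair; slice it by time:
SELECT ts, value FROM events
WHERE app = 'myapp' AND name = 'signup' AND ts >= '2013-08-01';
```

Range restrictions are allowed on the clustering column (ts) only after the full composite partition key (app, name) is fixed with equality.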
I have an existing system in postgres that I would like to move to
cassandra. The system is for building registration forms for conferences.
For example, you might want to build a registration form (or survey) that
has a bunch of questions on it. An overview of this system I whiteboarded
here:
Hi, All
I am interested in using the new Cassandra feature Trigger to implement a
synchronous (or asynchronous but with deadline) index on Cassandra.
The Trigger API allows one to define a mutation job to do (in the future)
but is there any way to control when the (asynchronously executed) job is