mixed linux/windows cluster in Cassandra-1.2

2013-10-21 Thread Илья Шипицин
Hello!

Is a mixed Linux/Windows cluster configuration supported in 1.2?


Cheers,
Ilya Shipitsin


Re: Sorting keys for batch reads to minimize seeks

2013-10-21 Thread Edward Capriolo
I am not sure that what you are working on will have an effect. You cannot
actually control the way the operating system seeks data on disk; the I/O
scheduling is done outside Cassandra. You can try to write the code in an
optimistic way, taking the physical hardware into account, but then you have to
consider that there are n concurrent requests on the I/O system.

On Friday, October 18, 2013, Viktor Jevdokimov viktor.jevdoki...@adform.com
wrote:
 Read latency depends on many factors, don't forget physics.
 If it meets your requirements, it is good.


 -----Original Message-----
 From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com]
 Sent: Friday, October 18, 2013 1:03 PM
 To: user@cassandra.apache.org
 Subject: Re: Sorting keys for batch reads to minimize seeks

 Hi,

 Thanks for your reply. Our latency is currently 23.618 ms. However, I
simply read that off one node just now while it wasn't under a load test; I
will be able to get a better number after the next test run.

 What is a good value for read latency?


 On 18/10/13 08:31, Viktor Jevdokimov wrote:
 The only thing you may win is avoiding unnecessary network hops, if:
 - you request sorted keys (by token) from the appropriate replica with
ConsistencyLevel.ONE and dynamic_snitch: false;
 - nodes have the same load;
 - the replica is not doing GC (GC pauses are much longer than internode
communication).

 For a multi-key request, C* will do multiple single-key reads, except
for range scan requests, where only the starting key and batch size are used
in the request.

 Consider a multi-key request as a slow request by design; try to model
your data for low-latency single-key requests.

 So, what latencies do you want to achieve?
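
As an illustration of the single-key, ConsistencyLevel.ONE read shape suggested
above, here is a minimal Astyanax sketch (Astyanax is the client Artur mentions
below); the column family name, types, and class are placeholders, not from
this thread:

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.model.ConsistencyLevel;
import com.netflix.astyanax.serializers.StringSerializer;

public class SingleKeyRead {
    // placeholder column family; name and key/column types are illustrative only
    private static final ColumnFamily<String, String> CF_DATA =
            new ColumnFamily<String, String>("my_cf",
                    StringSerializer.get(), StringSerializer.get());

    // a single-key read at CL.ONE -- the low-latency shape suggested above
    public static ColumnList<String> readOne(Keyspace keyspace, String rowKey)
            throws ConnectionException {
        return keyspace.prepareQuery(CF_DATA)
                .setConsistencyLevel(ConsistencyLevel.CL_ONE)
                .getKey(rowKey)
                .execute()
                .getResult();
    }
}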



 Best regards / Pagarbiai

 Viktor Jevdokimov
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063
 Fax: +370 5 261 0453

 J. Jasinskio 16C,
 LT-03163 Vilnius,
 Lithuania



 -----Original Message-----
 From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com]
 Sent: Thursday, October 17, 2013 7:40 PM
 To: user@cassandra.apache.org
 Subject: Sorting keys for batch reads to minimize seeks

 Hi,

 I am looking for ways to increase read performance on Cassandra. We are
still playing with configurations, but I was wondering whether there are
things we can do in software to help speed up our reads.

 E.g. one idea, not sure how sane it is, was to sort read batches by
row key before submitting them to Cassandra. The idea is that the row keys
should then be closer together on the physical disk, and therefore this may
minimize the number of random seeks we have to do when querying, say, 1000
entries from Cassandra. Does that make any sense? (A sketch of the idea
follows below.)
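
As a sketch (not from the original thread), the client-side sorting could look
like the following, assuming a hypothetical fetchChunk() multi-get helper. One
big caveat: with RandomPartitioner or Murmur3Partitioner, on-disk order follows
the key's token, not the raw key, so raw-key sorting only matches disk order
under an order-preserving partitioner:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedBatchRead {
    private static final int CHUNK_SIZE = 100; // arbitrary example value

    public static void readSorted(List<String> rowKeys) {
        // sort the batch so keys adjacent in sort order are queried together
        List<String> sorted = new ArrayList<String>(rowKeys);
        Collections.sort(sorted);
        for (int i = 0; i < sorted.size(); i += CHUNK_SIZE) {
            fetchChunk(sorted.subList(i, Math.min(i + CHUNK_SIZE, sorted.size())));
        }
    }

    private static void fetchChunk(List<String> chunk) {
        // hypothetical: issue one multi-get for 'chunk' via Astyanax or similar
    }
}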

 Is there anything else that we can do in software to improve
performance, like specific batch sizes for reads? We are using the Astyanax
library to access Cassandra.

 Thanks!






Re: Is read performance improved by moving more volatile data to different CF?

2013-10-21 Thread Edward Capriolo
I would say no. If you design around the row cache and your data access
patterns change, your assertions will be invalidated and your performance
may get worse over time.

I would use KISS here: keep it simple, using one column family. Experiment
with size-tiered vs. leveled compaction.
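
For reference, a hedged sketch of that experiment via the DataStax Java driver
(assuming a 1.x-era driver API; the contact point, keyspace/table names, and
the sstable_size_in_mb value are placeholders, not values from this thread):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CompactionExperiment {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks");
        // switch the single column family from the default size-tiered strategy
        // to leveled compaction, then compare read latencies under load
        session.execute("ALTER TABLE docs WITH compaction = "
                + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}");
        cluster.shutdown();
    }
}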

On Thursday, October 17, 2013, Jan Algermissen jan.algermis...@nordsc.com
wrote:
 Hi,

 my rows consist of ~70 columns each, some containing small values, some
containing larger amounts of content (think small documents).

 My data is occasionally updated and read several times per day as a
complete paging pass through all rows.

 The updates usually affect only about 10% of the small value columns.

 Speed of the full paging is of most interest to users.

 Given the very different volatility of the per-row data, do you think my
read speed would dramatically improve by splitting the less frequently
changed and the very frequently changed columns into two CFs, so I can
enable the row cache for the seldom-changing, larger-sized portion of the
data?

 Or would the effect likely be rather marginal?

 Jan


AUTO : Samuel CARRIERE is out of the office (retour 28/10/2013)

2013-10-21 Thread Samuel CARRIERE


I am out of the office until 28/10/2013.




Note: this is an automatic reply to your message "Re: Is read
performance improved by moving more volatile data to different CF?" sent
on 21/10/2013 18:03:29.

This is the only notification you will receive while this person is away.

Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted

2013-10-21 Thread Edward Capriolo
An easy way to test this would be to run stress or some other tool at a
slow rate of inserts and watch the tables flush and compact naturally.

On Tuesday, October 8, 2013, Sameer Farooqui sam...@blueplastic.com wrote:
 Hmm, good point. I'll test this out again and check whether the compaction
behavior is as expected given the relative sizes of the SSTables.




 On Tue, Oct 8, 2013 at 3:06 PM, Tyler Hobbs ty...@datastax.com wrote:

 Well, 6 was created by the other sstables being compacted, correct?  If
so, they were probably quite a bit smaller (~25% of the size).  Once you
have two more sstables of roughly that size, they should be compacted
automatically.


 On Tue, Oct 8, 2013 at 2:01 PM, Sameer Farooqui sam...@blueplastic.com
wrote:

 Thanks for the reply, Tyler. I thought that too.. that maybe the
SSTables are mismatched in size... but upon closer inspection, that doesn't
appear to be the case:
 -rw-r--r-- 1 cassandra cassandra  227 Oct  7 23:26
demodb-users-jb-1-Data.db
 -rw-r--r-- 1 cassandra cassandra  242 Oct  8 00:38
demodb-users-jb-6-Data.db

 The two files look to be nearly the same size. There just appears to be
something special about that first SSTable and it not getting compacted.

 On Tue, Oct 8, 2013 at 2:49 PM, Tyler Hobbs ty...@datastax.com wrote:

 SizeTieredCompactionStrategy only compacts sstables that are a similar
size (by default, they basically need to be within 50% of each other).
Perhaps your first SSTable was very large or small compared to the others?
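
To make the tiering concrete, here is a toy model of that bucketing logic (my
simplification for intuition only; the real SizeTieredCompactionStrategy
differs in details):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StcsBuckets {
    private static final double BUCKET_LOW = 0.5, BUCKET_HIGH = 1.5;
    private static final int MIN_THRESHOLD = 4; // default min sstables per compaction

    // group sstable sizes into buckets whose members are within ~50% of the
    // bucket's running average
    static List<List<Long>> bucket(List<Long> sizes) {
        List<Long> sorted = new ArrayList<Long>(sizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<List<Long>>();
        for (long size : sorted) {
            boolean placed = false;
            for (List<Long> b : buckets) {
                double avg = 0;
                for (long s : b) avg += s;
                avg /= b.size();
                if (size >= BUCKET_LOW * avg && size <= BUCKET_HIGH * avg) {
                    b.add(size);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                List<Long> fresh = new ArrayList<Long>();
                fresh.add(size);
                buckets.add(fresh);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // e.g. the two ~230-byte files from this thread land in one bucket, but
        // nothing is compacted until that bucket reaches MIN_THRESHOLD members
        for (List<Long> b : bucket(Arrays.asList(227L, 242L, 150000L))) {
            System.out.println(b + (b.size() >= MIN_THRESHOLD ? " -> compact" : " -> wait"));
        }
    }
}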


 On Mon, Oct 7, 2013 at 8:06 PM, Sameer Farooqui sam...@blueplastic.com
wrote:

 Hi,
 I have a fresh 1-node C* 2.0 install with a demo keyspace created
with the SizeTiered compaction strategy.
 I've noticed that in the beginning this keyspace has just one SSTable:
 demodb-users-jb-1-Data.db
 But as I add more data to the table and do some flushes, the # of
SSTables builds up. After I have a handful of SSTables, I trigger a flush
using 'nodetool flush demodb users', but then not ALL of the SSTables get
compacted.
 I've noticed that the 1st SSTable remains the same and doesn't
disappear after the compaction, but the latter SSTables do get compacted
into one new Data file.
 Is there a reason why the first SSTable is special and it is not
disappearing after compaction?
 Also, I think I noticed that if I wait a few days and run another
compaction, then that 1st SSTable does get compacted (and it disappears).
 Can someone help explain why the 1st SSTable behaves this way?


 --
 Tyler Hobbs
 DataStax




 --
 Tyler Hobbs
 DataStax




Re: Is read performance improved by moving more volatile data to different CF?

2013-10-21 Thread Edward Capriolo
Stupid cell phone.

I would say no. If you design around the row cache and your data access
patterns change, the original assertions may be invalidated and the
performance might be worse than with the simple design.


On Mon, Oct 21, 2013 at 12:03 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I would say no. If you design around the row cache and your data access
 patterns change, your assertions will be invalidated and your performance
 may get worse over time.

 I would use KISS here: keep it simple, using one column family.
 Experiment with size-tiered vs. leveled compaction.

 On Thursday, October 17, 2013, Jan Algermissen jan.algermis...@nordsc.com
 wrote:
  Hi,
 
  my rows consist of ~70 columns each, some containing small values, some
 containing larger amounts of content (think small documents).
 
  My data is occasionally updated and read several times per day as a
 complete paging pass through all rows.
 
  The updates usually affect only about 10% of the small value columns.
 
  Speed of the full paging is of most interest to users.
 
  Given the very different volatility of the per-row data, do you think my
 read speed would dramatically improve by splitting the less frequently
 changed and the very frequently changed columns into two CFs, so I can
 enable the row cache for the seldom-changing, larger-sized portion of the
 data?
 
  Or would the effect likely be rather marginal?
 
  Jan


Re: MemtablePostFlusher pending

2013-10-21 Thread Robert Coli
On Mon, Oct 21, 2013 at 2:17 AM, Kais Ahmed k...@neteck-fr.com wrote:

 We have recently put into production a new C* 2.0.0 cluster with 3 nodes
 and RF 3.


https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

What Version of Cassandra Should I Run in Production?

If I were you, I would probably try to read my data into a 1.2.x cluster.
Downgrading versions on the same cluster is unlikely to work. You could try
going to 2.0.1, but there is no compelling reason to believe this will fix
your problem.

=Rob


decommission of one EC2 node in cluster causes other nodes to go DOWN/UP and results in May not be enough replicas...

2013-10-21 Thread John Pyeatt
We have a 6 node cassandra 1.2.10 cluster running on aws with
NetworkTopologyStrategy, a replication factor of 3 and the EC2Snitch. Each
AWS availability zone has 2 nodes in it.

When we are reading or writing data with consistency level Quorum while
decommissioning a node, we are getting 'May not be enough replicas present
to handle consistency level'.

This doesn't make sense: we are only taking one node down, and with an RF of
three, even with one node down a quorum read/write should still find enough
nodes holding the data (2).

Looking at the Cassandra log on a server that we are not decommissioning,
we see this during the decommission of the other node:

 INFO [GossipTasks:1] 2013-10-21 15:18:10,695 Gossiper.java (line 803)
InetAddress /10.0.22.142 *is now DOWN*
 INFO [GossipTasks:1] 2013-10-21 15:18:10,696 Gossiper.java (line 803)
InetAddress /10.0.32.159 *is now DOWN*
 INFO [HANDSHAKE-/10.0.22.142] 2013-10-21 15:18:10,862
OutboundTcpConnection.java (line 399) Handshaking version with /10.0.22.142
 INFO [GossipTasks:1] 2013-10-21 15:18:11,696 Gossiper.java (line 803)
InetAddress /10.0.12.178 *is now DOWN*
 INFO [GossipTasks:1] 2013-10-21 15:18:11,697 Gossiper.java (line 803)
InetAddress /10.0.22.106 *is now DOWN*
 INFO [GossipTasks:1] 2013-10-21 15:18:11,698 Gossiper.java (line 803)
InetAddress /10.0.32.248 *is now DOWN*

Eventually we are seeing a message that looks like this.
 INFO [GossipStage:3] 2013-10-21 15:18:19,429 Gossiper.java (line 789)
InetAddress /10.0.32.248 is now UP

for each of the nodes. So eventually the remaining nodes in the cluster
come back to life.

While these nodes are down I can see why we get the 'May not be enough
replicas...' message: everything is down.

My question is: *why does gossip shut down for these nodes that we aren't
decommissioning in the first place*?

-- 
John Pyeatt
Singlewire Software, LLC
www.singlewire.com
--
608.661.1184
john.pye...@singlewire.com


Re: mixed linux/windows cluster in Cassandra-1.2

2013-10-21 Thread Jon Haddad
I can't imagine any situation where this would be practical.  What would be the 
reason to even consider this?

On Oct 21, 2013, at 11:06 AM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Oct 21, 2013 at 12:55 AM, Илья Шипицин chipits...@gmail.com wrote:
 Is a mixed Linux/Windows cluster configuration supported in 1.2?
 
 I don't think it's officially supported in any version; you would be among a 
 very small number of people operating in this way. However, there is no 
 technical reason it shouldn't work.
 
 =Rob
 



Re: upgrading Cassandra server hardware best practice?

2013-10-21 Thread Robert Coli
On Fri, Oct 18, 2013 at 3:27 PM, Arindam Barua aba...@247-inc.com wrote:

  Is step 1 just to reduce downtime for the node?


Yes.


 Also, I’m assuming the initial_token of the new node should be set to be
 the same as the token of the old node, or close to that. Eg. [1] in
 “Replacing a Dead Node” talks about setting the new node’s intial_token to
 the value of the dead token – 1. (I’m not sure why the offset by 1 helps)


Using a new initial_token is for the case where you cannot manually copy
the data and replace the token owner in place by using auto_bootstrap:false.


 If the number of hosts with the new hardware (TBD) is different than the
 old, after doing what you suggested, I guess I can follow the regular steps
 for adding a new node/deleting a new node then.


Yes, or use nodetool move followed by nodetool cleanup.

=Rob


Fwd: {kundera-discuss} Kundera 2.8 released

2013-10-21 Thread Vivek Mishra
fyi.

-- Forwarded message --
From: Vivek Mishra vivek.mis...@impetus.co.in
Date: Tue, Oct 22, 2013 at 1:33 AM
Subject: {kundera-discuss} Kundera 2.8 released
To: kundera-disc...@googlegroups.com


Hi All,

We are happy to announce the release of Kundera 2.8.

Kundera is a JPA 2.0 compliant object-datastore mapping library for NoSQL
datastores. The idea behind Kundera is to make working with NoSQL databases
drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB,
Redis, OracleNoSQL, Neo4j, ElasticSearch, CouchDB, and relational databases.

Major Changes:
==
1) Support for CouchDB as datastore.
2) Support for MappedSuperclass and JPA Inheritance strategy.
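
As a hedged illustration of item 2, a minimal Kundera-style entity pair using
JPA annotations; the entity, column, keyspace ("my_ks") and persistence unit
("cassandra_pu") names are invented for illustration, not from these release
notes:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;
import javax.persistence.Table;

// shared fields pulled up into a mapped superclass
@MappedSuperclass
abstract class BaseEntity {
    @Id
    protected String id;

    @Column(name = "created_at")
    protected long createdAt;
}

// concrete entity inheriting the id/created_at mapping; Kundera's convention
// of schema = "keyspace@persistence-unit" is assumed here
@Entity
@Table(name = "users", schema = "my_ks@cassandra_pu")
class User extends BaseEntity {
    @Column(name = "name")
    private String name;
}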

Github Bug Fixes:
===

https://github.com/impetus-opensource/Kundera/pull/409
https://github.com/impetus-opensource/Kundera/issues/396
https://github.com/impetus-opensource/Kundera/issues/379
https://github.com/impetus-opensource/Kundera/issues/340
https://github.com/impetus-opensource/Kundera/issues/327
https://github.com/impetus-opensource/Kundera/issues/320
https://github.com/impetus-opensource/Kundera/issues/261
https://github.com/impetus-opensource/Kundera/pull/142
https://github.com/impetus-opensource/Kundera/issues/55
https://github.com/impetus-opensource/Kundera/issues/420
https://github.com/impetus-opensource/Kundera/issues/414
https://github.com/impetus-opensource/Kundera/issues/411
https://github.com/impetus-opensource/Kundera/issues/401
https://github.com/impetus-opensource/Kundera/issues/378
https://github.com/impetus-opensource/Kundera/issues/354
https://github.com/impetus-opensource/Kundera/issues/315
https://github.com/impetus-opensource/Kundera/issues/298
https://github.com/impetus-opensource/Kundera/issues/204
https://github.com/impetus-opensource/Kundera/issues/179
https://github.com/impetus-opensource/Kundera/issues/128
https://github.com/impetus-opensource/Kundera/issues/432
https://github.com/impetus-opensource/Kundera/issues/422


How to Download:
To download, use or contribute to Kundera, visit:
http://github.com/impetus-opensource/Kundera

The latest released tag version is 2.8. Kundera Maven libraries are now
available at:
https://oss.sonatype.org/content/repositories/releases/com/impetus

Sample codes and examples for using Kundera can be found here:
https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests

Survey/Feedback:
http://www.surveymonkey.com/s/BMB9PWG

Thank you all for your contributions and using Kundera!


Sincerely,
Kundera Team










Re: mixed linux/windows cluster in Cassandra-1.2

2013-10-21 Thread Илья Шипицин
A technical reason is the path separator, which is different on Linux and
Windows. If you search through the mailing list, you will find evidence that
it does not work and is not supported.

But the most recent notice I have found was about 0.7, and there was no
JIRA bug number. Just unsupported.

On Tuesday, October 22, 2013, Robert Coli wrote:

 On Mon, Oct 21, 2013 at 12:55 AM, Илья Шипицин chipits...@gmail.com wrote:

 Is a mixed Linux/Windows cluster configuration supported in 1.2?


 I don't think it's officially supported in any version; you would be among
  a very small number of people operating in this way. However, there is no
 technical reason it shouldn't work.

 =Rob




Re: mixed linux/windows cluster in Cassandra-1.2

2013-10-21 Thread Илья Шипицин
We want to migrate a cluster of hundreds of gigabytes from Windows to Linux
without interrupting operation, i.e. node by node.

On Tuesday, October 22, 2013, Jon Haddad wrote:

 I can't imagine any situation where this would be practical.  What would
 be the reason to even consider this?

 On Oct 21, 2013, at 11:06 AM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Oct 21, 2013 at 12:55 AM, Илья Шипицин chipits...@gmail.com wrote:

 Is a mixed Linux/Windows cluster configuration supported in 1.2?


 I don't think it's officially supported in any version; you would be among
 a very small number of people operating in this way. However, there is no
 technical reason it shouldn't work.

 =Rob





Re: mixed linux/windows cluster in Cassandra-1.2

2013-10-21 Thread Edward Capriolo
We ran a Cassandra LAN party once with a mixed environment.
http://www.datastax.com/dev/blog/cassandra-nyc-lan-party

This was obviously a trivial setup. I think the areas of concern would be
column families located on different devices, and streaming-related issues.
It might work just fine, but I would test the migration first before just
trying to run in mixed mode.



On Mon, Oct 21, 2013 at 4:15 PM, Илья Шипицин chipits...@gmail.com wrote:

 We want to migrate a cluster of hundreds of gigabytes from Windows to
 Linux without interrupting operation, i.e. node by node.

 On Tuesday, October 22, 2013, Jon Haddad wrote:

 I can't imagine any situation where this would be practical.  What would
 be the reason to even consider this?

 On Oct 21, 2013, at 11:06 AM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Oct 21, 2013 at 12:55 AM, Илья Шипицин chipits...@gmail.com wrote:

 Is a mixed Linux/Windows cluster configuration supported in 1.2?


 I don't think it's officially supported in any version; you would be
 among a very small number of people operating in this way. However, there is
 no technical reason it shouldn't work.

 =Rob





Re: Huge multi-data center latencies

2013-10-21 Thread Hobin Yoon
So it turned out the DataStax Java client round-robins servers by default,
which caused the periodic huge latencies. Switching to
DCAwareRoundRobinPolicy solved the problem.

Another question is: how do you get the local DC name? The application can
parse conf/cassandra-topology.properties manually, but since the server
already knows which DC it belongs to, it would be nice if the client could
just ask for the local DC without being given the actual name (see the
sketch below).
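
A hedged sketch of both points (1.x-era DataStax Java driver; the contact
point is a placeholder): instead of parsing the topology file, the client can
ask its contact node for its DC via the system.local table (available since
C* 1.2), then pin the load balancing policy to that DC:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class DcAwareClient {
    public static void main(String[] args) {
        // step 1: discover the local DC from the nearest node
        Cluster probe = Cluster.builder().addContactPoint("10.0.0.1").build();
        Session session = probe.connect();
        String localDc = session.execute("SELECT data_center FROM system.local")
                                .one().getString("data_center");
        probe.shutdown();

        // step 2: rebuild the cluster pinned to that DC, so requests stop
        // round-robining across WAN links
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")
                .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy(localDc))
                .build();
        // ... use cluster ...
        cluster.shutdown();
    }
}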

Hobin


On Sat, Oct 19, 2013 at 5:18 PM, Hobin Yoon hobiny...@gmail.com wrote:

 I am experiencing huge latencies with a multi-data-center Cassandra
 cluster. With consistency level ONE, I expected almost the same latency as
 with the single-data-center setup. What could possibly affect the latency
 in a multi-data-center setup?

 multi-DC setup
 min max avg (ms)
 1 969 164.554264

 single-DC setup
 min max avg (ms)
 1 51 2.371786

 I am using the Datastax Java client library (
 https://github.com/datastax/java-driver).

 This is the keyspace description.

 ~/work/cassandra/bin$ ./cqlsh `hostname`
 Connected to cdbp at mdc-s70:9160.
 [cqlsh 3.1.7 | Cassandra 1.2.9-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol
 19.36.0]
 Use HELP for help.
 cqlsh> desc keyspace pbdp;

 CREATE KEYSPACE pbdp WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'DC2': '1',
   'DC3': '1',
   'DC0': '1',
   'DC1': '1'
 };

 USE pbdp;

 CREATE TABLE tweet (
   tid bigint PRIMARY KEY,
   created_at_rt bigint,
   created_at_st text,
   lati float,
   longi float,
   real_coord boolean,
   sn text,
   text_ text
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 Thanks,
 Hobin



Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Les Hartzman
So looking at Patrick McFadin's data modeling videos I now know about using
compound keys as a way of partitioning data on a by-day basis.

My other questions probably go more to the storage engine itself. How do
you refer to the columns in the wide row? What kind of names are assigned
to the columns?

Les
On Oct 20, 2013 9:34 PM, Les Hartzman lhartz...@gmail.com wrote:

 Please correct me if I'm not describing this correctly. But if I am
 collecting sensor data and have a table defined as follows:

  create table sensor_data (
sensor_id int,
time_stamp int,  // time to the hour granularity
voltage float,
amp float,
PRIMARY KEY (sensor_id, time_stamp));

 The partitioning value is the sensor_id and the rest of the PK components
 become part of the column name for the additional fields, in this case
 voltage and amp.

 What goes into determining what additional data is inserted into this row?
 The first time an insert takes place there will be one entry for all of the
 fields. Is there anything besides the sensor_id that is used to determine
 that the subsequent insertions for that sensor will go into the same row as
 opposed to starting a new row?

 Based on something I read (but can't currently find again), I thought that
 as long as all of the elements of the PK remain the same (same sensor_id
 and still within the same hour as the first reading), that the next
 insertion would be tacked onto the end of the first row. Is this correct?

 For subsequent entries into the same row for additional voltage/amp
 readings, what are the names of the columns for these readings? My
 understanding is that the column name becomes a concatenation of the
 non-row key field names plus the data field names.So if the first go-around
 you have time_stamp:voltage and time_stamp:amp, what do the
 subsequent column names become?

 Thanks.

 Les




Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Jon Haddad
If you're working with CQL, you don't need to worry about the column names, 
it's handled for you.

If you specify multiple keys as part of the primary key, the keys after the
partition key become clustering keys and are mapped onto the internal column
names. So if you have a sensor_id / time_stamp primary key, all your sensor
readings will be in the same row in the traditional Cassandra sense, sorted
by time_stamp.
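
A hedged illustration of that mapping (the values, keyspace name, and contact
point here are invented, and a 1.x-era DataStax Java driver API is assumed):
both inserts share sensor_id 42, so they land in one internal wide row, and
each CQL row contributes cells whose internal names combine the clustering
value with the CQL column name:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SensorInsertDemo {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks");
        // two CQL rows for the same sensor, one hour (of timestamp) apart
        session.execute("INSERT INTO sensor_data (sensor_id, time_stamp, voltage, amp)"
                + " VALUES (42, 1000, 11.9, 2.1)");
        session.execute("INSERT INTO sensor_data (sensor_id, time_stamp, voltage, amp)"
                + " VALUES (42, 2000, 12.1, 2.2)");
        // internal layout, roughly:
        //   RowKey 42:
        //     (1000, 'voltage') -> 11.9   (1000, 'amp') -> 2.1
        //     (2000, 'voltage') -> 12.1   (2000, 'amp') -> 2.2
        cluster.shutdown();
    }
}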

On Oct 21, 2013, at 4:27 PM, Les Hartzman lhartz...@gmail.com wrote:

 So looking at Patrick McFadin's data modeling videos I now know about using 
 compound keys as a way of partitioning data on a by-day basis.
 
 My other questions probably go more to the storage engine itself. How do you 
 refer to the columns in the wide row? What kind of names are assigned to the 
 columns?
 
 Les
 
 On Oct 20, 2013 9:34 PM, Les Hartzman lhartz...@gmail.com wrote:
 Please correct me if I'm not describing this correctly. But if I am 
 collecting sensor data and have a table defined as follows:
 
  create table sensor_data (
sensor_id int,
time_stamp int,  // time to the hour granularity
voltage float,
amp float,
   PRIMARY KEY (sensor_id, time_stamp));
 
 The partitioning value is the sensor_id and the rest of the PK components 
 become part of the column name for the additional fields, in this case 
 voltage and amp.
 
 What goes into determining what additional data is inserted into this row? 
 The first time an insert takes place there will be one entry for all of the 
 fields. Is there anything besides the sensor_id that is used to determine 
 that the subsequent insertions for that sensor will go into the same row as 
 opposed to starting a new row?
 
 Based on something I read (but can't currently find again), I thought that as
 long as all of the elements of the PK remain the same (same sensor_id and 
 still within the same hour as the first reading), that the next insertion 
 would be tacked onto the end of the first row. Is this correct?
 
 For subsequent entries into the same row for additional voltage/amp readings, 
 what are the names of the columns for these readings? My understanding is 
 that the column name becomes a concatenation of the non-row key field names 
 plus the data field names. So if the first go-around you have
 time_stamp:voltage and time_stamp:amp, what do the subsequent column 
 names become? 
 
 Thanks.
 
 Les
 



Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Les Hartzman
What if you plan on using Kundera and JPQL and not CQL?

Les
On Oct 21, 2013 4:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're working with CQL, you don't need to worry about the column
 names, it's handled for you.

 If you specify multiple keys as part of the primary key, the keys after
 the partition key become clustering keys and are mapped onto the internal
 column names. So if you have a sensor_id / time_stamp primary key, all your
 sensor readings will be in the same row in the traditional Cassandra sense,
 sorted by time_stamp.

 On Oct 21, 2013, at 4:27 PM, Les Hartzman lhartz...@gmail.com wrote:

 So looking at Patrick McFadin's data modeling videos I now know about
 using compound keys as a way of partitioning data on a by-day basis.

 My other questions probably go more to the storage engine itself. How do
 you refer to the columns in the wide row? What kind of names are assigned
 to the columns?

 Les
 On Oct 20, 2013 9:34 PM, Les Hartzman lhartz...@gmail.com wrote:

 Please correct me if I'm not describing this correctly. But if I am
 collecting sensor data and have a table defined as follows:

  create table sensor_data (
sensor_id int,
time_stamp int,  // time to the hour granularity
voltage float,
amp float,
   PRIMARY KEY (sensor_id, time_stamp));

 The partitioning value is the sensor_id and the rest of the PK components
 become part of the column name for the additional fields, in this case
 voltage and amp.

 What goes into determining what additional data is inserted into this
 row? The first time an insert takes place there will be one entry for all
 of the fields. Is there anything besides the sensor_id that is used to
 determine that the subsequent insertions for that sensor will go into the
 same row as opposed to starting a new row?

 Based on something I read (but can't currently find again), I thought that
 as long as all of the elements of the PK remain the same (same sensor_id
 and still within the same hour as the first reading), that the next
 insertion would be tacked onto the end of the first row. Is this correct?

 For subsequent entries into the same row for additional voltage/amp
 readings, what are the names of the columns for these readings? My
 understanding is that the column name becomes a concatenation of the
 non-row key field names plus the data field names. So if the first go-around
 you have time_stamp:voltage and time_stamp:amp, what do the
 subsequent column names become?

 Thanks.

 Les





Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Les Hartzman
So I just saw a post about how Kundera translates all JPQL to CQL.


On Mon, Oct 21, 2013 at 4:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're working with CQL, you don't need to worry about the column
 names, it's handled for you.

 If you specify multiple keys as part of the primary key, the keys after
 the partition key become clustering keys and are mapped onto the internal
 column names. So if you have a sensor_id / time_stamp primary key, all your
 sensor readings will be in the same row in the traditional Cassandra sense,
 sorted by time_stamp.

 On Oct 21, 2013, at 4:27 PM, Les Hartzman lhartz...@gmail.com wrote:

 So looking at Patrick McFadin's data modeling videos I now know about
 using compound keys as a way of partitioning data on a by-day basis.

 My other questions probably go more to the storage engine itself. How do
 you refer to the columns in the wide row? What kind of names are assigned
 to the columns?

 Les
 On Oct 20, 2013 9:34 PM, Les Hartzman lhartz...@gmail.com wrote:

 Please correct me if I'm not describing this correctly. But if I am
 collecting sensor data and have a table defined as follows:

  create table sensor_data (
sensor_id int,
time_stamp int,  // time to the hour granularity
voltage float,
amp float,
   PRIMARY KEY (sensor_id, time_stamp));

 The partitioning value is the sensor_id and the rest of the PK components
 become part of the column name for the additional fields, in this case
 voltage and amp.

 What goes into determining what additional data is inserted into this
 row? The first time an insert takes place there will be one entry for all
 of the fields. Is there anything besides the sensor_id that is used to
 determine that the subsequent insertions for that sensor will go into the
 same row as opposed to starting a new row?

 Based on something I read (but can't currently find again), I thought that
 as long as all of the elements of the PK remain the same (same sensor_id
 and still within the same hour as the first reading), that the next
 insertion would be tacked onto the end of the first row. Is this correct?

 For subsequent entries into the same row for additional voltage/amp
 readings, what are the names of the columns for these readings? My
 understanding is that the column name becomes a concatenation of the
 non-row key field names plus the data field names. So if the first go-around
 you have time_stamp:voltage and time_stamp:amp, what do the
 subsequent column names become?

 Thanks.

 Les





Opening multiple contexts...

2013-10-21 Thread Krishna Chaitanya
Hello,
  I am new to the Cassandra world and would like to know if it is
possible to open multiple namespaces from a single program. I am using the
libQtCassandra library. Is it possible to open multiple namespaces and
multiple tables and store different data into different contexts and tables
within the same program? I tried creating two QCassandra objects and tried
opening two contexts, but it results in a core dump. Please help. Thanks
in advance!

-- 
Regards,
BNSK.


Unable to start dse 3.1.4 server on Mac as a process

2013-10-21 Thread Rich Reffner
New to Cassandra and struggling to get DSE server started. Any help is 
appreciated! Thanks so much!

...
 INFO 21:04:08,568 Initializing system.Schema
 INFO 21:04:08,576 Initializing system.schema_keyspaces
 INFO 21:04:08,582 Initializing system.range_xfers
 INFO 21:04:08,587 Initializing system.HintsColumnFamily
 INFO 21:04:08,591 Initializing system.schema_columnfamilies
 INFO 21:04:08,597 Initializing system.NodeIdInfo
 INFO 21:04:08,602 Initializing system.schema_columns
 INFO 21:04:08,607 Initializing system.IndexInfo
 INFO 21:04:08,612 Initializing system.Migrations
 INFO 21:04:08,616 Initializing system.peers
ERROR 21:04:08,619 Exception encountered during startup
java.lang.RuntimeException: Can't open incompatible SSTable! Current version ic, found file: /var/lib/cassandra/data/system/local/system-local-jb-5
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:376)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:353)
    at org.apache.cassandra.db.Table.initCf(Table.java:329)
    at org.apache.cassandra.db.Table.init(Table.java:272)
    at org.apache.cassandra.db.Table.open(Table.java:109)
    at org.apache.cassandra.db.Table.open(Table.java:87)
    at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:478)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:242)
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:137)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:446)
    at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:334)
Exception encountered during startup: Can't open incompatible SSTable! Current version ic, found file: /var/lib/cassandra/data/system/local/system-local-jb-5

 
- Rich