Re: Using CCM with Opscenter and manual agent installation

2016-01-07 Thread Giampaolo Trapasso
Thanks Michael for the reply. I'm quite new to Cassandra, so it makes sense
to explain the use case. I just want to try different data modelling choices
and compare the number of reads and writes. At the moment I'm not
interested in a real stress test; I just want to understand the implications of
my choices and, of course, see OpsCenter in action. I thought that the
CCM+OpsCenter combo was a good choice. Do you think there's something
else I could try? Thank you in advance.

giampaolo





2016-01-07 19:24 GMT+01:00 Michael Shuler :

> On 01/07/2016 12:22 PM, Michael Shuler wrote:
> >> [, ,  > 127.0.0.2='-4611686018427387904'>,  127.0.0.1='-9223372036854775808'>]
> >
> > A couple of those, .4 and .2 are identical.
>
> Sorry, they are signed, so they are unique. (bad me.) Keep digging, I
> guess.
>
> --
> Michael
>


Re: Using CCM with Opscenter and manual agent installation

2016-01-07 Thread Nick Bailey
Cassandra switched JMX to bind only to localhost, so I believe you just
need to change jmx_host to localhost in all the conf files.
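
For example, with that change, agent3's address.yaml posted earlier in the
thread would read (same values as posted, only jmx_host altered):

```
stomp_interface: "127.0.0.1"
agent_rpc_interface: 127.0.0.3
jmx_host: localhost
jmx_port: 7300
```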

On Thu, Jan 7, 2016 at 4:48 PM, Giampaolo Trapasso <
giampaolo.trapa...@radicalbit.io> wrote:

> Thanks Michael for the reply. I'm quite new to Cassandra, so it makes sense
> to explain the use case. I just want to try different data modelling choices
> and compare the number of reads and writes. At the moment I'm not
> interested in a real stress test; I just want to understand the implications of
> my choices and, of course, see OpsCenter in action. I thought that the
> CCM+OpsCenter combo was a good choice. Do you think there's something
> else I could try? Thank you in advance.
>
> giampaolo
>
>
>
>
>
> 2016-01-07 19:24 GMT+01:00 Michael Shuler :
>
>> On 01/07/2016 12:22 PM, Michael Shuler wrote:
>> >> [, , > > 127.0.0.2='-4611686018427387904'>, > 127.0.0.1='-9223372036854775808'>]
>> >
>> > A couple of those, .4 and .2 are identical.
>>
>> Sorry, they are signed, so they are unique. (bad me.) Keep digging, I
>> guess.
>>
>> --
>> Michael
>>
>
>


Re: Best way to get Cassandra status in Bash

2016-01-07 Thread Giovanni Usai

Hello,
thanks a lot!
The solution proposed by Gerard works perfectly.

This is the snippet of what I have done:

echo "Checking if Cassandra is up and running ..."
# Try to connect on Cassandra's JMX port 7199
nc -z localhost 7199
nc_return=$?

# Try to connect on Cassandra's CQLSH port 9042
nc -z localhost 9042
let "cassandra_status = nc_return + $?"

retries=1
while (( retries < 6 && cassandra_status != 0 )); do
    echo "Cassandra doesn't reply to requests on ports 7199 and/or 9042. Sleeping for a while and trying again... retry ${retries}"

    # Sleep for a while
    sleep 2s

    # Try again to connect to Cassandra
    echo "Checking if Cassandra is up and running ..."
    nc -z localhost 7199
    nc_return=$?

    nc -z localhost 9042
    let "cassandra_status = nc_return + $?"

    let "retries++"
done

if [ $cassandra_status -ne 0 ]; then
    echo "/!\ ERROR: Cassandra startup has ended with errors; please check log file ${DATAFARI_LOGS}/cassandra-startup.log"
else
    echo "Cassandra startup completed successfully --- OK"
    $CASSANDRA_HOME/bin/cqlsh -f $DATAFARI_HOME/bin/common/config/cassandra/tables
fi

Best regards,
*Giovanni Usai
* giovanni.u...@francelabs.com 


www.francelabs.com 

CEEI Nice Premium
1 Bd. Maître Maurice Slama
06200 Nice FRANCE

Ph: +33 (0)9 72 43 72 85

On 01/05/2016 05:17 PM, Giovanni Usai wrote:

Hello,

thanks to everyone for the fast replies!

Unfortunately, since yesterday afternoon I have been assigned to a
more urgent task, so I will implement the solutions you proposed in
my spare time and let you know the outcomes asap (hopefully in a
few weeks).


Thanks a lot again!

Best regards,
*Giovanni Usai
* giovanni.u...@francelabs.com


www.francelabs.com 

CEEI Nice Premium
1 Bd. Maître Maurice Slama
06200 Nice FRANCE

Ph: +33 (0)9 72 43 72 85

On 01/04/2016 05:52 PM, Gerard Maas wrote:

Hi Giovanni,

You could use netcat (nc) to test that the Cassandra port is up, and
use a timeout to decide when to take action:

nc -z localhost 9160

Check the exit code to decide what action to take.
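
The same probe-and-retry idea can be sketched in a few lines of Python for
scripts where nc is not available (illustrative only; the function names and
defaults are my own):

```python
import socket
import time

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (like `nc -z`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_cassandra(host="localhost", ports=(7199, 9042), retries=5, delay=2.0):
    """Poll the JMX and CQL ports until both accept connections, with retries."""
    for _ in range(retries):
        if all(port_open(host, p) for p in ports):
            return True
        time.sleep(delay)
    return False
```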

-kr, Gerard.


On Mon, Jan 4, 2016 at 4:56 PM, Giovanni Usai 
> 
wrote:


Hello Gerard,
thanks for your reply.

It seems nodetool works only when the cluster is up and running.
In case of a bad startup of Cassandra, if I run "nodetool status"
I get one of these 2 errors:

1) error: No nodes present in the cluster. Has this node finished
starting up?
-- StackTrace --
java.lang.RuntimeException: No nodes present in the cluster. Has
this node finished starting up?
at

org.apache.cassandra.dht.Murmur3Partitioner.describeOwnership(Murmur3Partitioner.java:129)
at

org.apache.cassandra.service.StorageService.effectiveOwnership(StorageService.java:3960)
at

org.apache.cassandra.service.StorageService.effectiveOwnership(StorageService.java:176)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)

2) nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException:
'Connection refused'.

In both cases, the nodetool status command kills my Bash script.

Since I want to do some updates on Cassandra right after startup,
I must wait until the cluster is ready to process requests.
Depending on the hardware, the Cassandra startup may take some
time, so I need to be able to detect when Cassandra is up and
running.

What I am trying to do is something like follows:

execute cassandra start [=> $CASSANDRA_HOME/bin/cassandra -p
$CASSANDRA_PID_FILE]
while (cassandra_status != OK && retries < N){
cassandra_status = some command that returns the status of
cassandra startup
retries++
}
if (cassandra_status != OK){
echo the user and do some countermeasures
} else {
make updates on Cassandra [=> $CASSANDRA_HOME/bin/cqlsh -f
$DATAFARI_HOME/bin/common/config/cassandra/tables]
}

Do you have any idea about the command to use here?
cassandra_status = some command that returns the status of
cassandra startup

Thanks

Best regards,
*Giovanni Usai
* giovanni.u...@francelabs.com 


www.francelabs.com 

CEEI Nice Premium
1 Bd. Maître Maurice Slama
06200 Nice FRANCE

Ph: +33 (0)9 72 43 72 85

On 01/04/2016 03:51 PM, Gerard Maas wrote:

(Hit enter too 

Re: Requesting some details for my use case

2016-01-07 Thread Bhuvan Rawal
Hi Jack,

We are valuing reliability and consistency over performance right now. In
the e-commerce industry we can expect unexpected spikes at odd times.

I'll be grateful if you could tell me about reliability and failover scenarios.

On Wed, Jan 6, 2016 at 2:59 AM, Jack Krupansky 
wrote:

> DataStax has documented quite a few customers/case studies:
> http://www.datastax.com/resources/casestudies
>
> Materialized Views should be considered if you can go straight to 3.0, but
> you can always do the same synthesized views yourself in your app, which is
> current standard best practice anyways. MV is just a way to automate that
> best practice.
>
> The key to performance is to characterize your load requirements and then
> make sure to provision your cluster with enough nodes to support that load.
> You'll have to do a proof of concept implementation to verify your own
> requirements. Like start with a 6 or 8 node cluster for a subset of the
> data and add nodes as needed to accommodate load. The trick is to limit the
> amount of data on each node so that incoming requests can be processed as
> rapidly as possible to meet latency requirements, and then to scale up load
> capacity by adding nodes.
>
> -- Jack Krupansky
>
> On Tue, Jan 5, 2016 at 4:02 PM, Bhuvan Rawal  wrote:
>
>> *Thanks Jack* *for the detailed advice*.
>>
>> Yes it is a Java Application.
>>
>> We already have a denormalized view of our data in place, which we use for
>> storing it in MongoDB as a cache; however, we will get our hands dirty before
>> implementation. We would like to have a single DB view and replace MongoDB
>> & MySQL with a single data store. If we talk numbers, we can expect 10
>> million create/update requests a day and ~500 million read requests.
>>
>> The question here not "should I or should I not", but "which one".
>>
>> A lot of the features you have mentioned are supported but not advised
>> *(the automated Materialized View feature, triggers, secondary indexes)*.
>> By when do you believe these will be stable enough to use for an enterprise
>> implementation?
>>
>> We have made up our minds as far as the shift to NoSQL is concerned, as
>> MySQL is not able to serve our purpose and is currently a bottleneck in the
>> design.
>>
>>  From all the benchmarks we have analyzed for our use case, Cassandra
>> seems to be doing better as far as performance is concerned.  Our only
>> concern is to know as a Primary Database how Cassandra compares with HBase.
>> By Primary database I mean the attributes: Data Consistency, Transaction
>> Management and Rollback, brisk Failure Recovery, cross datacenter
>> replication and partition aware sharding.
>>
>> The general opinion of Cassandra is that it's more of a cache, and as we
>> are going to be replacing our primary data store we need something fast, but
>> not at the expense of reliability. Can you guide me towards a case study
>> where someone has tuned it to perform reliably for most use
>> cases?
>>
>> Also, I'll be grateful if someone directs me to a repository where I can
>> find major customers of these DBs and their case studies.
>>
>> Thanks & Regards,
>> Bhuvan
>>
>> On Tue, Jan 5, 2016 at 9:56 PM, Jack Krupansky 
>> wrote:
>>
>>> Bear in mind that you won't be able to merely "tune" your schema - you
>>> will need to completely redesign your data model. Step one is to look at
>>> all of the queries you need to perform and get a handle on what flat,
>>> denormalized data model they will need to execute performantly in a NoSQL
>>> database. No JOINs. No ad hoc queries. Secondary indexes are supported, but
>>> not advised. The general model is that you have a "query table" for each
>>> form of query, with the primary key adapted to the needs of the query. That
>>> means a lot of denormalization and repetition of data. The new, automated
>>> Materialized View feature of Cassandra 3.0 can help with that a lot, but is
>>> a new feature and not quite stable enough for production (no DataStax
>>> Enterprise (DSE) release with 3.0 yet.) Triggers are supported, but not
>>> advised - better to do that processing at the application level. DSE also
>>> supports Hadoop and Spark for batch/analytics and Solr for search and ad
>>> hoc queries (or use Stratio or Stargate for Lucene queries.)
>>>
>>> Best to start with a basic proof of concept implementation to get your
>>> feet wet and learn the ins and outs before making a full commitment.
>>>
>>> Is this a Java app? The Java Driver is where you need to get started in
>>> terms of ingesting and querying data. It's a bit more sophisticated than
>>> just a simple JDBC interface. Most of your queries will need to be
>>> rewritten anyway even though the CQL syntax does indeed look a lot like
>>> SQL, but much of that will be because your data model will need to be made
>>> NoSQL-compatible.
>>>
>>> That 

Important notice for upgrades from 2.2.X to 3.Y

2016-01-07 Thread Sylvain Lebresne
The native protocol is the name we give to the protocol used between CQL
drivers and the server. That protocol is versioned and a new version,
version
4, was introduced in Cassandra 2.2.0. We recently uncovered a compatibility
bug
in that 4th version (https://issues.apache.org/jira/browse/CASSANDRA-10880)
that made said version of the protocol not fully compatible between 2.2.X
and
3.Y. As a consequence, you _must_ ensure that your clients use the protocol
version 3 if you plan an upgrade from any 2.2.X version to any 3.Y version.
Ensuring that might require forcing the protocol version in the client driver
used. For instance, in the DataStax Java driver, you can do so by calling
`.withProtocolVersion(ProtocolVersion.V3)` on your `Cluster.Builder` object.

The bug in question affects the automatic paging of result sets that the
protocol provides: the first page of results is always sent correctly, but
requesting the next pages might result in a failure. This means that in theory
you can disregard this problem if you know that you are not using said paging,
but we still strongly encourage sticking to protocol v3 for the upgrade, as the
downsides are very minor (see below) and not worth the risk.

The changes in protocol v4 are relatively minor, and so forcing the use of v3
has only relatively minor downsides, namely:
- the use of the recently added CQL types `date`, `time`, `tinyint` and
  `smallint` involves sending slightly bigger metadata in v3 than in v4. The
  resulting performance difference is unlikely to be noticeable.
- schema changes related to User Defined Functions are notified to clients as a
  "keyspace change" in v3 which, being imprecise, might require a client driver
  to request more schema metadata to update its own copy of said metadata. This
  is again a very minor inefficiency.
- protocol v4 has a feature that allows the server to send warnings to the
  clients. This is as yet little used by the server, and in the cases where it
  is, the warning is also logged server-side.

Note that using the protocol v4 is fine once you have finished with your
upgrade and all your nodes are on 3.Y. The problem is only with cluster
mixing
2.2.X and 3.Y nodes.

--
The Cassandra dev team


Sorting & pagination in apache cassandra 2.1

2016-01-07 Thread anuja jain
Hi all,
Suppose I have a Cassandra table with the structure
CREATE TABLE test.t1 (
col1 text,
col2 text,
col3 text,
col4 text,
PRIMARY KEY (col1, col2, col3, col4)
) WITH CLUSTERING ORDER BY (col2 ASC, col3 ASC, col4 ASC);

and it has the following data:

 col1 | col2 | col3 | col4
--+--+--+--
  abc |  abc |  abc |  abc

and I query the table saying
select * from t1 where col1='abc' order by col3;

it gives me the following error:
InvalidRequest: code=2200 [Invalid query] message="Order by currently only
support the ordering of columns following their declared order in the
PRIMARY KEY"

While reading the docs, I came to know that only the first clustering column
can be ordered by independently, and for the other columns we need to follow
the sequence of the clustering columns.
My question is: what is the alternative if we need to order by col3 or col4
in my example above, without including col2 in the order by clause?
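
To illustrate the rule with the schema above (a sketch; the exact error text
may vary by version):

```
-- Allowed: ORDER BY follows the declared clustering order, without gaps.
SELECT * FROM test.t1 WHERE col1 = 'abc' ORDER BY col2 DESC;
SELECT * FROM test.t1 WHERE col1 = 'abc' ORDER BY col2 ASC, col3 ASC;

-- Rejected: skips col2, so the rows cannot be read back in on-disk order.
SELECT * FROM test.t1 WHERE col1 = 'abc' ORDER BY col3 ASC;
```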


Thanks,
Anuja


How do I upgrade from 2.0.16 to 2.0.17 in my case????

2016-01-07 Thread Vasileios Vlachos
Hello,

My problem is described in CASSANDRA-10872. I upgraded a
second node on the same cluster in case there was something special with
the first node, but I experienced identical behaviour. Both cassandra-env.sh
and cassandra-rackdc.properties were replaced,
causing the node to come up in the default data centre DC1.

What is the best way to upgrade to 2.0.17 in a safe manner in this case?
How do we work around this?

Thanks,
Vasilis


Re: How to make the result of select dateof(now) from system.local be equal to date command ?

2016-01-07 Thread Adam Holmberg
The database timestamp is UTC, which should be the same as date -u on your
system. cqlsh does not convert query results to local timezone, but it
should be easy to do in your application code.
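
For example, in Python the conversion from the UTC timestamp the database
returns to local time is a few lines with the standard library (a sketch;
it assumes the string carries an explicit +0000 offset, which cqlsh prints):

```python
from datetime import datetime, timezone

def utc_to_local(ts: str) -> datetime:
    """Parse a UTC timestamp like '2016-01-07 07:47:32+0000' and shift it
    to the machine's local timezone."""
    utc = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S%z")
    return utc.astimezone()  # no argument = the local timezone
```

On a machine set to UTC+08:00, utc_to_local('2016-01-07 07:47:32+0000') would
render as 15:47:32, matching the local date command.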

Independent of the above, I'm a little confused by your example because I
would expect CST to be UTC - 06:00, while the example is showing UTC +
08:00. You might want to check the date settings on your local machine and
the database.

Regards,
Adam Holmberg

On Thu, Jan 7, 2016 at 1:58 AM, 土卜皿  wrote:

> Hi, all
>
> When I run the command date:
>
> [root@localhost ~]# date
> Thu Jan  7 15:47:32 CST 2016
>
> But under cqlsh, I got a different time:
>
> [root@localhost ~]# cqlsh -u admin -p admin4587 172.21.0.131
> Connected to Cassandra Center at 172.21.0.131:9042.
> [cqlsh 5.0.1 | Cassandra 2.1.11 | CQL spec 3.2.1 | Native protocol v3]
> Use HELP for help.
> admin@cqlsh> select dateof(now()) from system.local;
>
> dateof(now())
> --
> 2016-01-07 07:47:32+0000
>
> (1 rows)
>
> What should I do to get the same time, 15:47:32, instead of 07:47:32?
>
>
> Dillon Peng
>


Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-07 Thread Hiroyuki Yamada
Thanks Tyler.

I've read the Python document and it's a bit clearer than before,
but I'm still confused about which combinations make lightweight transaction
operations work correctly.

So, let me clarify the conditions under which lightweight transactions work.

QUORUM conditional write -> QUORUM read => OK (meets linearizability)
ANY conditional write -> SERIAL read =>  OK (meets linearizability)
ONE conditional write -> SERIAL read => OK ?
SERIAL conditional write -> ??? read => ERROR for some reason (why?)

One question is whether my understanding of the top two conditions is
correct.
The other question is whether "ONE conditional write - SERIAL read" is OK.
Also, why does a SERIAL conditional write fail,
even though a SERIAL conditional write with (for example) an ANY read afterwards
seems logically OK?

The following document says that it seems we can specify SERIAL in writes,
so when should I use SERIAL in writes other than conditional writes (which
fail)?
<
https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>
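
(To make the question concrete, the kind of statement I mean, with a
hypothetical table:)

```
-- Conditional write: the IF part is decided at the *serial* consistency
-- level (SERIAL/LOCAL_SERIAL), while the commit of the accepted write uses
-- the statement's regular consistency level (ONE, QUORUM, ANY, ...).
INSERT INTO users (name, email) VALUES ('hiro', 'hiro@example.com') IF NOT EXISTS;

UPDATE users SET email = 'new@example.com'
  WHERE name = 'hiro' IF email = 'hiro@example.com';
```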


Thanks,
Hiro



On Fri, Jan 8, 2016 at 2:44 AM, Tyler Hobbs  wrote:

> The python driver docs explain this pretty well, I think:
> http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level
>
> On Thu, Jan 7, 2016 at 3:44 AM, Hiroyuki Yamada 
> wrote:
>
>> Hi,
>>
>> I've been doing some POCs of lightweight transactions and
>> I came up with some questions, so please let me ask them here.
>>
>> So the question is:
>> what consistency level should I set when using IF NOT EXIST or UPDATE IF
>> statements ?
>>
>> I used the statements with ONE and QUORUM first, then it seems fine.
>> But, when I set SERIAL, it gave me the following error.
>>
>> === error message ===
>> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
>> SERIAL is not supported as conditional update commit consistency. Use ANY
>> if you mean "make sure it is accepted but I don't care how many replicas
>> commit it for non-SERIAL reads"
>> === error message ===
>>
>>
>> So, I'm wondering what SERIAL is for when writing (and reading) and
>> what the differences are in setting ONE, QUORUM and ANY when using IF NOT
>> EXIST or UPDATE IF statements.
>>
>> Could you give me some advice?
>>
>> Thanks,
>> Hiro
>>
>>
>>
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Using CCM with Opscenter and manual agent installation

2016-01-07 Thread Michael Shuler
On 01/07/2016 08:46 PM, Michael Shuler wrote:
> I'm not sure exactly what that service is, but if all 4 nodes (which are
> all really localhost aliases) are attempting to bind to the same IP:port
> for that stomp connection, they could be stepping on one another. Should
> those be 127.0.0.1 for node1, 127.0.0.12 for node2, etc.?

Since accurate typing is eluding me..

Should the stomp connection be 127.0.0.1 for node1, 127.0.0.2 for node2,
127.0.0.3 for node3, 127.0.0.4 for node4?

-- 
:)
Michael


Too many compactions, maybe keyspace system?

2016-01-07 Thread Shuo Chen
Hi,
I am using Cassandra 2.0.16 with 4 nodes and found too many compactions in
this cluster. This caused too much full GC and choked the system. I have
discussed the high GC in previous mails but did not get satisfactory
answers.

To clarify the source of the compactions, I shut down all the clients, so
there are no read or write requests from outside. Besides,
hinted_handoff_enabled: false is set. The cluster was restarted yesterday
and has run for 1.5 days.

The heap is 8G and all the compaction settings are default settings. I
printed parts of gc histogram as follows:

 num #instances #bytes  class name
--
   1:  67758530 2168272960  java.util.concurrent.FutureTask
   2:  67759745 1626233880
 java.util.concurrent.Executors$RunnableAdapter
   3:  67758576 1626205824
 java.util.concurrent.LinkedBlockingQueue$Node
   4:  67758529 1626204696
 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask
   5: 16935   72995576  [B
   6:240534   11545632  java.nio.HeapByteBuffer
   7: 374165969800  [C
   8: 414475624856  
   9: 414475315504  
  10:1048505032800
 edu.stanford.ppl.concurrent.SnapTreeMap$Node
  11:  41104564144  
  12:1047813352992  org.apache.cassandra.db.Column
  13:  41102824016  
  14:  48502689568  [J
  15:  35642634240  
  16:  30161252544  
  17: 37192 892608  java.lang.String
  18: 17641 564512
 java.util.concurrent.ConcurrentHashMap$HashEntry

As we can see in this list, objects of compaction tasks remain in the heap.

The nodetool compactionstats as follows:
pending tasks: 64637410
Active compaction remaining time :n/a

Parts of nodetool compactionhistory:

Compaction History:
id   keyspace_name
 columnfamily_namecompacted_at  bytes_in
bytes_out  rows_merged
8e4f8830-b04f-11e5-a211-45b7aa88107c system
sstable_activity 1451629144115 3342   915
 {4:23}
96a6fcb0-b04b-11e5-a211-45b7aa88107c system hints
 1451627440123 18970740   18970740   {1:1}
aefc3f10-b1b2-11e5-a211-45b7aa88107c system
sstable_activity 1451781670273 3201   906
 {4:23}
1e76f1b0-b180-11e5-a211-45b7aa88107c system
sstable_activity 1451759952971 3303   700
 {4:23}
e4d7d220-b518-11e5-ac1a-45b7aa88107c system hints
 1452155422786 78374007746243{1:1}
f305ddd0-b52a-11e5-bce5-45b7aa88107c system hints
 1452163177517 55094720  {3:1}
5c5b15b0-b581-11e5-bce5-45b7aa88107c system
sstable_activity 1452200290955 3387   866
 {4:23}
9cc8af70-b24a-11e5-a211-45b7aa88107c system
sstable_activity 1451846923239 3134   773
 {4:23}
cf9e9400-b439-11e5-ac1a-45b7aa88107c system hints
 1452059609408 10524457   10490132   {3:1}
ea1b5140-b2e2-11e5-a211-45b7aa88107c system
sstable_activity 1451912336468 3243   804
 {4:23}
a4b41910-b2b1-11e5-a211-45b7aa88107c system hints
 1451891174689 15426047   15236142   {1:3}
f1bb37a0-b204-11e5-a211-45b7aa88107c system hints
 1451817000986 17713746   17566349   {1:1}
fcb5bd60-b4d9-11e5-ac1a-45b7aa88107c system
sstable_activity 1452128404534 3146   775
 {4:22}
dcd0c720-b0b4-11e5-a211-45b7aa88107c system
sstable_activity 1451672654993 3189   797
 {4:23}
ba0f2d60-b50c-11e5-ac1a-45b7aa88107c system
sstable_activity 1452150197046 3055   739
 {4:22}
a0681d00-b3ad-11e5-a211-45b7aa88107c system
sstable_activity 1451999400656 3082   770
 {4:23}
ce6b4d90-b35c-11e5-a211-45b7aa88107c system hints
 1451964688617 13546056   13363986   {1:2}
a17d8f80-b24b-11e5-a211-45b7aa88107c system hints
 1451847360632 16746317   16600978   {1:1}
1549d4a0-b51c-11e5-bce5-45b7aa88107c
webtrn_study_log_formallySCORM_STU_COURSE 1452156792554
46077372   41485236   {1:9, 2:4, 3:2, 4:5}
68e064a0-b439-11e5-ac1a-45b7aa88107c system local
 1452059437034 1255   680{4:1}
38591c10-b22b-11e5-a211-45b7aa88107c system hints
 

Re: Using CCM with Opscenter and manual agent installation

2016-01-07 Thread Michael Shuler
On 01/07/2016 02:09 AM, Giampaolo Trapasso wrote:
> I've configured all four agents. For example, agent3's configuration is:
>
> [Giampaolo]: ~/opscenter/> cat agent3/conf/address.yaml
> stomp_interface: "127.0.0.1"
> agent_rpc_interface: 127.0.0.3
> jmx_host: 127.0.0.3
> jmx_port: 7300

This looks suspect. Each agent is configured for
stomp_interface:"127.0.0.1"?

I'm not sure exactly what that service is, but if all 4 nodes (which are
all really localhost aliases) are attempting to bind to the same IP:port
for that stomp connection, they could be stepping on one another. Should
those be 127.0.0.1 for node1, 127.0.0.12 for node2, etc.?

-- 
Michael



Re: Using CCM with Opscenter and manual agent installation

2016-01-07 Thread Nick Bailey
stomp_interface is the address to connect back to the central OpsCenter
daemon with, so 127.0.0.1 should be correct. I believe the issue is just
jmx_host needing to be set to 'localhost'

On Thu, Jan 7, 2016 at 8:50 PM, Michael Shuler 
wrote:

> On 01/07/2016 08:46 PM, Michael Shuler wrote:
> > I'm not sure exactly what that service is, but if all 4 nodes (which are
> > all really localhost aliases) are attempting to bind to the same IP:port
> > for that stomp connection, they could be stepping on one another. Should
> > those be 127.0.0.1 for node1, 127.0.0.12 for node2, etc.?
>
> Since accurate typing is eluding me..
>
> Should the stomp connection be 127.0.0.1 for node1, 127.0.0.2 for node2,
> 127.0.0.3 for node3, 127.0.0.4 for node4?
>
> --
> :)
> Michael
>


Re: Using CCM with Opscenter and manual agent installation

2016-01-07 Thread Michael Shuler
On 01/07/2016 10:17 PM, Nick Bailey wrote:
> stomp_interface is the address to connect back to the central OpsCenter
> daemon with, so 127.0.0.1 should be correct. I believe the issue is just
> jmx_host needing to be set to 'localhost'

This indeed looks promising, thanks Nick!

mshuler@hana:~$ ccm status
Cluster: 'test'
---
node1: UP
node3: UP
node2: UP
mshuler@hana:~$ netstat -ltunp|egrep '7.00'|grep java
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp0  0 127.0.0.1:7100  0.0.0.0:*
LISTEN  19006/java
tcp0  0 127.0.0.1:7200  0.0.0.0:*
LISTEN  18994/java
tcp0  0 127.0.0.1:7300  0.0.0.0:*
LISTEN  19021/java
tcp0  0 127.0.0.3:7000  0.0.0.0:*
LISTEN  19021/java
tcp0  0 127.0.0.2:7000  0.0.0.0:*
LISTEN  18994/java
tcp0  0 127.0.0.1:7000  0.0.0.0:*
LISTEN  19006/java

-- 
Kind regards,
Michael


Is it good for performance to put rows that are of different types but are always queried together in the same table partition?

2016-01-07 Thread Bamoqi
My consideration is whether doing so will result in better
memory/disk cache locality.

Suppose I need to query 2 different types of rows for a frequent
user request. I can use 2 tables or 1 table:


2 tables:

  create table t1(
partitionkey int primary key,
col1 int, col2 int, ...
  )
  create table t2(
partitionkey int primary key,
col3 int, col4 int, ...
  )

query-2table:
  select col1,col2 from t1 where partitionkey = ?
  select col3,col4 from t2 where partitionkey = ?

1 table:

  create table t(
partitionkey int,
rowtype tinyint,
col1 int, col2 int, ...
col3 int, col4 int, ...
primary key( partitionkey, rowtype )
  )

query-1table-a:
  select col1,col2 from t where partitionkey = ? and rowtype = 1
  select col3,col4 from t where partitionkey = ? and rowtype = 2

or alternatively, query-1table-b:
  select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
  // switch on `rowtype` in the app code

Is there a significant performance difference between query-2table,
query-1table-a, and query-1table-b?
Is the Cassandra client/coordinator smart enough to direct subsequent
queries of the same (table, partitionkey) to the same node so they can
reuse a cached page?


Regards & Thanks


RE: Data rebalancing algorithm

2016-01-07 Thread Alec Collier
Have a look at this:
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

The vnodes mechanism is there to provide better scalability as new nodes are 
added/removed, by allowing a single node to own several small chunks of the 
token range.

Aside from that, the process is exactly the same as in the single-node case:
the coordinator calculates the token based on the partition key and locates the
responsible node in the same way. SSTables are laid out on the node's disk per
Cassandra table, with no reference to vnodes at all. The term virtual nodes is a
bit misleading in that sense.
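
The lookup described above — hash the partition key to a token, then find the
owning (v)node on the sorted ring of tokens — can be sketched as follows (a
deliberate simplification: the real partitioner is Murmur3 and replication is
ignored; the hash here is an illustrative stand-in):

```python
import bisect
import hashlib

def token(key: bytes) -> int:
    """Illustrative stand-in for Murmur3: map a key to a signed 64-bit token."""
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big", signed=True)

def build_ring(node_tokens: dict) -> list:
    """node_tokens: {node: [token, ...]} with num_tokens entries per node."""
    return sorted((t, node) for node, ts in node_tokens.items() for t in ts)

def owner_for_token(ring: list, t: int) -> str:
    """Owner = node holding the first ring token >= t, wrapping around."""
    tokens = [tok for tok, _ in ring]
    i = bisect.bisect_left(tokens, t)
    return ring[i % len(ring)][1]

def owner(ring: list, key: bytes) -> str:
    return owner_for_token(ring, token(key))
```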

Actually, Cassandra does have a total number of vnodes per cluster. It's set
with the num_tokens parameter in cassandra.yaml.

Alec

From: Sergi Vladykin [mailto:sergi.vlady...@gmail.com]
Sent: Friday, 25 December 2015 8:31 AM
To: user@cassandra.apache.org
Subject: Re: Data rebalancing algorithm

Thanks a lot for your answers!
Paulo, I'll take a look at classes you've suggested.
Jack, the link you've provided lacks a description of how virtual nodes are
mapped to physical sstables/indexes on disk.
To be more exact, I have the following more detailed questions:

1. How are vnodes mapped to sstables and indexes? Is one vnode a separate part
of the sstable, or is all the data from all vnodes just mixed in the SSTable, or
maybe something else?

2. As far as I can see, Cassandra does not have a predefined constant total
number of vnodes for the whole cluster, right? Does it mean that on rebalancing
some parts of data already mapped to some vnodes will be remapped to new vnodes
on the new node?
3. How long can the rebalancing take if we have, let's say, 1TB of data on a
single node and we are adding one more node to the cluster?

Sergi


2015-12-24 19:26 GMT+03:00 Jack Krupansky 
>:
Read details here:
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html


-- Jack Krupansky

On Thu, Dec 24, 2015 at 11:09 AM, Paulo Motta 
> wrote:
The new node will own some parts (ranges) of the ring according to the ring 
tokens the node is responsible for. These tokens are defined from the yaml 
property initial_token (manual assignment) or num_tokens (random assignment).

During the bootstrap process, raw data from sstable sections containing the
ranges the node is responsible for is transferred from the nodes that previously
owned those ranges to the new node, so the source sstables are rebuilt in the
joining node. After each sstable is transferred, the new node rebuilds
primary and secondary indexes, bloom filters, etc., and at the end of the
bootstrap process the new sstables are added to the live data set.
See org.apache.cassandra.dht.BootStrapper.java and 
org.apache.cassandra.streaming.StreamReceiveTask of the trunk branch for more 
information.
ps: I don't particularly recall any document with specific details, so if
anyone knows, please feel welcome to share. If you want more theoretical
information, see the ring membership sections of the Cassandra and/or Dynamo
papers.


2015-12-24 13:14 GMT-02:00 Sergi Vladykin 
>:
Guys,
I was not able to find in docs or in google detailed description of data 
rebalancing algorithm.
I mean how Cassandra moves SSTables when new node connects to the cluster, how
primary and secondary indexes are getting transfered to this new node, etc..

Can anyone provide relevant links please or just reply here?
I can read source code of course, but it would be nice if someone could answer 
right away :)

Sergi






Re: Data rebalancing algorithm

2016-01-07 Thread Jonathan Haddad
num_tokens is the number of tokens per node, not per cluster.

On Thu, Jan 7, 2016 at 10:09 PM Alec Collier 
wrote:

> Have a look at this:
>
> http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
>
>
>
> The vnodes mechanism is there to provide better scalability as new nodes
> are added/removed, by allowing a single node to own several small chunks of
> the token range.
>
>
>
> Aside from that, the process is exactly the same as in the single node
> case, the coordinator calculates the token based on partition key and
> locates the responsible node in the same way. SSTables are located on the
> node’s disk per Cassandra table, no reference to vnodes at all. The term
> virtual nodes is a bit misleading in that sense.
>
>
>
> Actually, Cassandra does have a total number of vnodes per cluster. It's
> set with the num_tokens parameter in the Cassandra.yaml.
>
>
>
> Alec
>
>
>
> *From:* Sergi Vladykin [mailto:sergi.vlady...@gmail.com]
> *Sent:* Friday, 25 December 2015 8:31 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Data rebalancing algorithm
>
>
>
> Thanks a lot for your answers!
>
> Paulo, I'll take a look at classes you've suggested.
>
> Jack, the link you've provided lacks a description of how virtual nodes are
> mapped to physical sstables/indexes on disk.
>
> To be more exact, I have the following better detailed questions:
>
>
>
> 1. How are vnodes mapped to sstables and indexes? Is one vnode a separate
> part of the sstable, is the data from all vnodes just mixed together in the
> SSTable, or is it something else?
>
>
>
> 2. As far as I can see, Cassandra does not have a predefined, constant total
> number of vnodes for the whole cluster, right? Does that mean that on
> rebalancing, some parts of the data already mapped to some vnodes will be
> remapped to new vnodes on the new node?
>
> 3. How long can the rebalancing take if we have, let's say, 1TB of data on a
> single node and we are adding one more node to the cluster?
>
>
>
> Sergi
>
>
>
>
>
> 2015-12-24 19:26 GMT+03:00 Jack Krupansky :
>
> Read details here:
>
>
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
>
>
>
>
> -- Jack Krupansky
>
>
>
> On Thu, Dec 24, 2015 at 11:09 AM, Paulo Motta 
> wrote:
>
> The new node will own some parts (ranges) of the ring according to the
> ring tokens the node is responsible for. These tokens are defined from the
> yaml property initial_token (manual assignment) or num_tokens (random
> assignment).
>
> During the bootstrap process raw data from sstables sections containing
> the ranges the node is responsible for are transferred from nodes that
> previously owned the range to the new node so the source sstables are
> rebuilt in the joining node. After each sstable is transferred, the new node
> rebuilds primary and secondary indexes, bloom filters, etc., and at the
> end of the bootstrap process the new sstables are added to the live data
> set.
>
> See org.apache.cassandra.dht.BootStrapper.java and
> org.apache.cassandra.streaming.StreamReceiveTask of the trunk branch for
> more information.
>
> ps: I don't particularly recall any document with the specific details, so if
> anyone knows of one, please feel welcome to share it. If you want more
> theoretical background, see the ring membership sections of the Cassandra
> and/or Dynamo papers.
>
>
>
>
>
> 2015-12-24 13:14 GMT-02:00 Sergi Vladykin :
>
> Guys,
>
> I was not able to find, in the docs or on Google, a detailed description of
> the data rebalancing algorithm.
>
> I mean how Cassandra moves SSTables when a new node connects to the cluster,
> how primary and secondary indexes get transferred to this new node, etc.
>
> Can anyone provide relevant links please or just reply here?
>
> I can read source code of course, but it would be nice if someone could
> answer right away :)
>
>
>
> Sergi
>
>
>
>
>
>
>
>


Re: Revisit Cassandra EOL Policy

2016-01-07 Thread Janne Jalkanen

If you wish to have a specific EOL policy, you basically need to buy one. It is 
unusual for open source projects to give any sort of EOL policy; that is 
something that people with very specific requirements are willing to cough up a 
lot of money for. And earning money by supporting older versions, with 
contracts and EOL dates and all the other things that corporations love, is 
exactly what enables companies to actually make money on open source projects.

Have you considered contacting Datastax and checking their Cassandra EOL policy? 
They seem to be very well aligned with what you are looking for.

http://www.datastax.com/support-policy#9 


/Janne

> On 07 Jan 2016, at 03:26, Anuj Wadehra  wrote:
> 
> I would appreciate it if you shared your thoughts on the concerns I 
> expressed regarding the Cassandra end-of-life policy. I think these concerns are 
> quite genuine and should be openly discussed so that EOL is more predictable 
> and generates less overhead for users.
> 
> I would like to understand how various users are dealing with the situation. 
> Are you upgrading Cassandra every 3-6 months? How do you cut short your 
> planning, test, and release cycles for Cassandra upgrades in your 
> applications/products?
> 
> 
> 
> 
> Thanks
> Anuj
> 
> 
> 
> On Tue, 5 Jan, 2016 at 8:04 pm, Anuj Wadehra
>  wrote:
> Hi,
> 
> As per my understanding, a Cassandra version n is implicitly declared EOL 
> when two major versions have been released after version n, i.e. when version 
> n + 2 is released.
> 
> I think the EOL policy must be revisited in the interest of the expanding 
> Cassandra user base. 
> 
> Concerns with current EOL Policy:
> 
> In March 2015, the Apache web site said that 2.0.14 was the most stable 
> version of Cassandra, recommended for production. So one would push one's 
> clients to upgrade to 2.0.14 in March 2015. It takes months to roll out a 
> Cassandra upgrade to all your clients, and by the time all your clients get 
> the upgrade, the version is declared EOL with the release of 2.2 in August 
> 2015 (within 6 months of being declared production ready). I completely 
> understand that supporting multiple versions is tougher, but at the same 
> time it is very painful and somewhat unrealistic for users to push Cassandra 
> upgrades to all their clients every few months.
> 
> One proposed solution could be to declare a version n EOL one year after 
> n+1 was declared production ready. E.g. if 2.1.7 is the first production-ready 
> release of 2.1, released in June 2015, I would declare 2.0 EOL 
> in June 2016. This gives users a reasonable amount of time to plan upgrades.
> 
> Moreover, I think the EOL policy and declarations must be documented 
> explicitly on the Apache web site.
> 
> Please share your feedback on revisiting the EOL policy.
> 
> Thanks
> Anuj
> 



Re: Revisit Cassandra EOL Policy

2016-01-07 Thread Robert Coli
On Wed, Jan 6, 2016 at 5:26 PM, Anuj Wadehra  wrote:

> I would like to understand how various users are dealing with the
> situation. Are you upgrading Cassandra every 3-6 months? How do you cut short
> your planning, test, and release cycles for Cassandra upgrades in your
> applications/products?
>

I upgrade Cassandra an average of once a year.

I don't run X.Y.Z versions where Z is under 6, so in general this does not
result in me not-running-a-version-I-otherwise-would-have for longer than a
few months each year.

There is really not that much penalty to being behind the curve, in fact
there is often a significant penalty to being on the cutting edge.

=Rob


Re: Sorting & pagination in apache cassandra 2.1

2016-01-07 Thread Tyler Hobbs
On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:

> My question is, what is the alternative if we need to order by col3 or
> col4 in my above example without including col2 in order by clause.
>

The server-side alternative is to create a second table (or a materialized
view, if you're using 3.0+) that uses a different clustering order.
Cassandra purposefully only supports simple and efficient queries that can
be handled quickly (with a few exceptions), and arbitrary ordering is not
part of that, especially if you consider complications like paging.
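To make the second-table approach concrete, here is a toy sketch of the denormalization pattern in plain Python (the Table class and the col3/col4 names are made up for illustration; in Cassandra each would be a real table, or a materialized view, with its own CLUSTERING ORDER BY):

```python
from bisect import insort

class Table:
    """Rows kept pre-sorted by one clustering column, as a Cassandra table is."""
    def __init__(self, clustering_column):
        self.col = clustering_column
        self.rows = []                        # kept sorted on insert

    def insert(self, row):
        # Sort key first; assumes unique clustering values for this sketch.
        insort(self.rows, (row[self.col], sorted(row.items())))

    def select(self):
        return [dict(items) for _, items in self.rows]

# One table per query shape: "ORDER BY col3" and "ORDER BY col4" each get a
# table, and the application writes every row to both.
by_col3, by_col4 = Table("col3"), Table("col4")
for row in [{"col3": 2, "col4": "b"}, {"col3": 1, "col4": "c"}, {"col3": 3, "col4": "a"}]:
    by_col3.insert(row)
    by_col4.insert(row)

print([r["col3"] for r in by_col3.select()])   # ascending by col3
print([r["col4"] for r in by_col4.select()])   # ascending by col4
```

The trade-off is the usual Cassandra one: each ordering you need costs an extra write (or a materialized view maintaining it for you), in exchange for every read being a cheap in-order scan.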


-- 
Tyler Hobbs
DataStax 


Re: Revisit Cassandra EOL Policy

2016-01-07 Thread Maciek Sakrejda
Anuj, do you have a link to the versioning policy? The tick-tock versioning
blog post [1] says that EOL happens after two major versions come out, but
I can't find this stated more formally anywhere. I'm interested in how long
a given version will receive patches for security issues or critical data
loss bugs (i.e., the policy of the Apache project itself, distinct from any
support that may be available through Datastax). The Postgres project has a
great write-up of their policy [2].

And for what it's worth, we are starting to use Cassandra and do have
automation around it. I don't have strong feelings about what the
versioning policy should look like, but having clear expectations about
what happens if there's a critical bug (i.e., can we expect a patch or do
we need to upgrade major versions?) is very useful.

[1]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
[2]: http://www.postgresql.org/support/versioning/