Multiple Network Interfaces in non-EC2

2016-09-11 Thread Amir Dafny-Man
Hi,

I followed the docs 
(http://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/configMultiNetworks.html)
 and I am familiar with https://issues.apache.org/jira/browse/CASSANDRA-9748.
Trying to establish a 2-node, single-DC setup. I tried Cassandra 3.7 and
also 2.2.7.
My setup:
RHEL6 distribution
Node1 external: 10.240.33.241
Node1 internal: 192.168.33.241
Node2 external: 10.240.33.244
Node2 internal: 192.168.33.244

cassandra-rackdc.properties (for both nodes; also tried with prefer_local=false):
dc=vdra015-xs-15
rack=rack1
prefer_local=true

cassandra.yaml (changes over default):
seeds: "10.240.33.241"
listen_address: 192.168.33.241 or 192.168.33.244
broadcast_address: 10.240.33.241 or 10.240.33.244
listen_on_broadcast_address: true
rpc_address: 192.168.33.241 or 192.168.33.244
endpoint_snitch: GossipingPropertyFileSnitch

Routing table:
# ip r
192.168.33.0/24 dev eth1  proto kernel  scope link  src 192.168.33.241
10.1.21.0/24 dev eth2  proto kernel  scope link  src 10.1.21.241
10.1.22.0/24 dev eth3  proto kernel  scope link  src 10.1.22.241
10.1.23.0/24 dev eth4  proto kernel  scope link  src 10.1.23.241
10.240.32.0/21 dev eth0  proto kernel  scope link  src 10.240.33.241
default via 10.240.32.1 dev eth0

Experienced behavior:

1.   Node1 starts up normally
# netstat -anlp|grep java
tcp        0      0 127.0.0.1:55452         0.0.0.0:*       LISTEN  10036/java
tcp        0      0 127.0.0.1:7199          0.0.0.0:*       LISTEN  10036/java
tcp        0      0 10.240.33.241:7000      0.0.0.0:*       LISTEN  10036/java
tcp        0      0 192.168.33.241:7000     0.0.0.0:*       LISTEN  10036/java
tcp        0      0 :::192.168.33.241:9042  :::*            LISTEN  10036/java

2.   When I try to start node2, it is unable to connect to the node1 IP set
in seeds:
Exception (java.lang.RuntimeException) encountered during startup: Unable to 
gossip with any seeds
java.lang.RuntimeException: Unable to gossip with any seeds

3.   Running tcpdump on node2, I can see that node2 is trying to connect to
node1's external IP, but with its internal IP as the source:
# tcpdump -nn -i eth0 port 7000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
09:29:05.239026 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
77957108, win 14600, options [mss 1460,sackOK,TS val 65015480 ecr 0,nop,wscale 
9], length 0
09:29:06.238188 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
77957108, win 14600, options [mss 1460,sackOK,TS val 65016480 ecr 0,nop,wscale 
9], length 0
09:29:08.238159 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
77957108, win 14600, options [mss 1460,sackOK,TS val 65018480 ecr 0,nop,wscale 
9], length 0
09:29:12.238129 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
77957108, win 14600, options [mss 1460,sackOK,TS val 65022480 ecr 0,nop,wscale 
9], length 0
09:29:20.238129 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
77957108, win 14600, options [mss 1460,sackOK,TS val 65030480 ecr 0,nop,wscale 
9], length 0
09:29:36.238161 IP 192.168.33.244.52900 > 10.240.33.241.7000: Flags [S], seq 
77957108, win 14600, options [mss 1460,sackOK,TS val 65046480 ecr 0,nop,wscale 
9], length 0

4.   Running tcpdump on node1 shows the packets are not arriving.
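The capture is consistent with node2 binding its outbound internode socket to its listen_address rather than letting the kernel pick a source from the routing table. A minimal Java sketch of that underlying socket behavior (a hypothetical illustration, not Cassandra's actual code):

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class SourceBindSketch {
    public static void main(String[] args) throws Exception {
        // Binding a socket before connect() pins its source address,
        // regardless of which interface the route to the peer would choose.
        try (Socket s = new Socket()) {
            s.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            System.out.println(s.getLocalAddress().getHostAddress()); // prints 127.0.0.1
        }
    }
}
```

With a socket bound this way, replies to node2's SYNs would have to be routed back to the 192.168.x source address, which the external network here apparently cannot do; that would match the one-way traffic in the capture.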

Is something configured wrong or is this a bug?

Thanks,
Amir



Re: [ANNOUNCEMENT] Website update

2016-09-11 Thread Ashish Disawal
Website looks great.
Good job guys.

--
Ashish Disawal

On Mon, Sep 12, 2016 at 3:00 AM, Jens Rantil  wrote:

> Nice! The website also feels snappier!
>
>
> On Friday, July 29, 2016, Sylvain Lebresne  wrote:
>
>> Wanted to let everyone know that if you go to the Cassandra website
>> (cassandra.apache.org), you'll notice that there has been some change.
>> Outside
>> of a face lift, the main change is a much improved documentation section
>> (http://cassandra.apache.org/doc/). As indicated, that documentation is a
>> work-in-progress and still has a few missing sections. That documentation is
>> maintained in-tree, and contributions (through JIRA, as any other
>> contribution) are more than welcome.
>>
>> Best,
>> On behalf of the Apache Cassandra developers.
>>
>
>
> --
> Jens Rantil
> Backend engineer
> Tink AB
>
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
>
> Facebook  Linkedin
> 
>  Twitter 
>
>


Re: How to define blob column in Java?

2016-09-11 Thread Alexandr Porunov
Hello Andy,

Thank you very much!

Sincerely,
Alexandr

On Sun, Sep 11, 2016 at 9:53 PM, Andrew Tolbert  wrote:

> Hi Alexandr,
>
> I am assuming you are referring to the @Table annotation in the mapping
> module in the Datastax Java Driver for Apache Cassandra (please correct me
> if I am wrong).
>
> You can achieve this with any of these three types using a custom codec,
> but it will work as-is using ByteBuffer.  Here's a quick example:
>
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.Session;
> import com.datastax.driver.core.utils.Bytes;
> import com.datastax.driver.mapping.Mapper;
> import com.datastax.driver.mapping.MappingManager;
> import com.datastax.driver.mapping.annotations.Column;
> import com.datastax.driver.mapping.annotations.PartitionKey;
> import com.datastax.driver.mapping.annotations.Table;
>
> import java.nio.ByteBuffer;
>
> public class MapperBlobExample {
>
> @Table(keyspace="ex", name="blob_ex")
> static class BlobEx {
>
> @PartitionKey
> int k;
>
> @Column
> ByteBuffer b;
>
> int getK() {
> return k;
> }
>
> void setK(int k) {
> this.k = k;
> }
>
> ByteBuffer getB() {
> return b;
> }
>
> void setB(ByteBuffer b) {
> this.b = b;
> }
> }
>
> public static void main(String args[]) {
> Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").
> build();
> try {
> Session session = cluster.connect();
> session.execute("CREATE KEYSPACE IF NOT EXISTS ex WITH
> replication = {'class': 'SimpleStrategy', 'replication_factor': 1};");
> session.execute("CREATE TABLE IF NOT EXISTS ex.blob_ex (k int
> PRIMARY KEY, b blob);");
>
> MappingManager manager = new MappingManager(session);
> Mapper<BlobEx> mapper = manager.mapper(BlobEx.class);
>
> // insert row
> BlobEx ex = new BlobEx();
> ex.setK(0);
> ex.setB(Bytes.fromHexString("0xffee"));
> mapper.save(ex);
>
> // retrieve row
> BlobEx ex0 = mapper.get(0);
> System.out.println(Bytes.toHexString(ex0.getB()));
> } finally {
> cluster.close();
> }
> }
> }
>
> There are a few pitfalls around using ByteBuffer with the driver that you
> should be aware of (this example covers them).  The java-driver-user
> mailing list can also help.
>
> Thanks!
> Andy
>
> On Sun, Sep 11, 2016 at 1:50 AM Alexandr Porunov <
> alexandr.poru...@gmail.com> wrote:
>
>> Hello,
>>
>> I am using the @Table annotation to define tables in Cassandra. How should
>> I properly define the blob type in Java? With ByteBuffer, byte[], or String?
>>
>> Sincerely,
>> Alexandr
>>
>
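A side note on the pitfalls Andy alludes to: the usual trap is reading a ByteBuffer's backing array while ignoring its position, limit, and arrayOffset. A small self-contained sketch (not taken from the driver docs):

```java
import java.nio.ByteBuffer;

public class ByteBufferPitfall {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(0xCAFEBABE);  // position = 4, limit = 8
        buf.flip();              // position = 0, limit = 4 (ready for reading)

        // Pitfall: array() returns the whole backing array (8 bytes here),
        // ignoring position/limit/arrayOffset.
        System.out.println(buf.array().length);  // 8

        // Safe: copy exactly the readable bytes, via duplicate() so the
        // original buffer's position is left untouched for later readers.
        byte[] safe = new byte[buf.remaining()];
        buf.duplicate().get(safe);
        System.out.println(safe.length);         // 4
    }
}
```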


Re: Overhead of data types in cassandra

2016-09-11 Thread Eric Stevens
It's important to note that this answer differs quite significantly
depending on whether you're talking about Cassandra < 3.0 or >= 3.0.

DataStax has a good article on < 3.0:
http://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architecturePlanningUserData_t.html
The Last Pickle has a good article on >= 3.0 (it's a lot more nuanced):
http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

On Thu, Sep 8, 2016 at 12:12 PM Oleksandr Petrov 
wrote:

> You can find the information about that in Cassandra source code, for
> example. Search for serializers, like BytesSerializer:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/serializers/BytesSerializer.java
> to get an idea of how the data is serialized.
>
> But I'd also check out classes like Cell, and the SSTable structure, to get
> an overview of the data layout.
>
> On Thu, Sep 8, 2016 at 4:23 AM Alexandr Porunov <
> alexandr.poru...@gmail.com> wrote:
>
>> Hello,
>>
>> Where can I find information about overhead of data types in cassandra?
>> I am interested in the blob, text, uuid, and timeuuid data types. Does a
>> blob type store the length of the blob data alongside the value? If so,
>> which type is used for the length (int, bigint)?
>> If I want to store 80 bits how much of disk space will be used for it? If
>> I want to store 64 bits is it better to use bigint?
>>
>> Sincerely,
>> Alexandr
>>
> --
> Alex Petrov
>


Re: Finding records that exist on Cassandra but not externally

2016-09-11 Thread Eric Stevens
I might be inclined to include a generation ID in the partition keys.  Keep
a separate table where you update the generation ID when your processing is
complete.  You can even use CAS operations so that you'd know which
generation failed, in case you goofed up and generated two generations at
the same time (or your processing time exceeds your processing period).
Make the generation a TimeUUID (don't use an integer: even though the
generation update would fail, you'd still have had two processes writing
data to the same generation ID).  Also, this way, if processing fails part
way through you don't end up with either corrupted or incomplete state.

Then, in your primary table, write data with a TTL of some whole multiple of
your expected processing period (I wouldn't recommend making it close to
your processing period unless you are concerned about the storage costs:
things go wrong in processing, and you don't want to end up with the most
recent active generation having fully expired).

It is an anti-pattern to repeatedly overwrite the same data in Cassandra,
even if the prior data is TTL'd out.  You'll find that you still end up
having to seek far more SSTables than you anticipate, due to the
counterintuitive way that tombstones and expired TTLs are expunged, though
in this exact pattern there are a few optimizations (I'm thinking of
tombstone-only compaction) which would make it less painful than it would
be for even very minor deviations from that pattern.
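The CAS-guarded generation pointer described above can be sketched in memory, with an AtomicReference standing in for the Cassandra LWT (in CQL the publish step would be roughly `UPDATE generations SET current = ? WHERE name = ? IF current = ?`; the table and column names here are hypothetical):

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

public class GenerationPointer {
    // Stand-in for the single "current generation" row.
    static final AtomicReference<UUID> current = new AtomicReference<>(null);

    // Compare-and-set publish: only succeeds if nobody else published first,
    // mirroring an LWT conditional update.
    static boolean publish(UUID expected, UUID next) {
        return current.compareAndSet(expected, next);
    }

    public static void main(String[] args) {
        UUID gen1 = UUID.randomUUID();
        UUID gen2 = UUID.randomUUID();
        System.out.println(publish(null, gen1)); // true: first refresh wins
        System.out.println(publish(null, gen2)); // false: stale expectation is rejected
        System.out.println(publish(gen1, gen2)); // true: expectation matches, pointer advances
    }
}
```

A failed publish tells the losing job that another generation won, so it can abandon or retry; readers always query with the generation currently in the pointer row.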

On Thu, Sep 8, 2016 at 5:32 AM Jens Rantil  wrote:

> Hi again Chris,
>
> Another option would be to have a look at using a Merkle Tree to quickly
> drill down to the differences. This is actually what Cassandra uses
> internally when running a repair between different nodes.
>
> Cheers,
> Jens
>
> On Wed, Sep 7, 2016 at 9:47 AM  wrote:
>
>> First off, I hope this is appropriate here; I couldn't decide whether this
>> was a question for Cassandra users or Spark users, so if you think it's in
>> the wrong place feel free to redirect me.
>>
>> I have a system that does a load of data manipulation using Spark.  The
>> output of this program is effectively the new state that I want my
>> Cassandra table to be in, and the final step is to update Cassandra so that
>> it matches this state.
>>
>> At present I'm currently inserting all rows in my generated state into
>> Cassandra. This works for new rows and also for updating existing rows but
>> doesn't of course delete any rows that were already in Cassandra but not in
>> my new state.
>>
>> The problem I have now is how best to delete these missing rows. Options
>> I have considered are:
>>
>> 1. Setting a ttl on inserts which is roughly the same as my data refresh
>> period. This would probably be pretty performant but I really don't want to
>> do this because it would mean that all data in my database would disappear
>> if I had issues running my refresh task!
>>
>> 2. Every time I refresh the data I would first have to fetch all primary
>> keys from Cassandra and compare them to the primary keys locally to create
>> a list of pks to delete before the insert. This seems the most logically
>> correct option but is going to result in reading vast amounts of data from
>> Cassandra.
>>
>> 3. Truncating the entire table before refreshing Cassandra. This has the
>> benefit of being pretty simple in code but I'm not sure of the performance
>> implications of this and what will happen if I truncate while a node is
>> offline.
>>
>> For reference the table is on the order of 10s of millions of rows and
>> for any data refresh only a very small fraction (<.1%) will actually need
>> deleting. 99% of the time I'll just be overwriting existing keys.
>>
>> I'd be grateful if anyone could shed some advice on the best solution
>> here or whether there's some better way I haven't thought of.
>>
>> Thanks,
>>
>> Chris
>>
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.
>
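The Merkle-tree idea mentioned in this thread can be sketched in a few lines: hash each row into a leaf, hash pairs of children upward, and compare roots. Equal roots mean the two datasets match over that range; unequal roots let you recurse into only the differing half rather than comparing every row. A toy illustration (not Cassandra's actual repair code):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class MerkleSketch {
    static byte[] sha(byte[]... parts) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (byte[] p : parts) md.update(p);
        return md.digest();
    }

    // Root hash over an ordered range of rows:
    // leaf = hash of one row, parent = hash of its two children.
    static byte[] root(List<String> rows, int lo, int hi) throws Exception {
        if (hi - lo == 1) return sha(rows.get(lo).getBytes(StandardCharsets.UTF_8));
        int mid = (lo + hi) / 2;
        return sha(root(rows, lo, mid), root(rows, mid, hi));
    }

    public static void main(String[] args) throws Exception {
        List<String> a = List.of("k1=v1", "k2=v2", "k3=v3", "k4=v4");
        List<String> b = List.of("k1=v1", "k2=v2", "k3=CHANGED", "k4=v4");
        // Equal roots: the whole range matches, nothing to transfer.
        System.out.println(Arrays.equals(root(a, 0, 4), root(a, 0, 4))); // true
        // Unequal roots: recurse into the differing subtree only.
        System.out.println(Arrays.equals(root(a, 0, 4), root(b, 0, 4))); // false
    }
}
```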


Re: Cassandra and Kubernetes and scaling

2016-09-11 Thread David Aronchick
Please let me know if I can help at all!

On Sun, Sep 11, 2016 at 2:55 PM, Jens Rantil  wrote:

> Hi Aiman,
>
> I noticed you never got any reply. This might be of interest:
> http://blog.kubernetes.io/2016/07/thousand-instances-of-cassandra-using-kubernetes-pet-set.html
>
> Cheers,
> Jens
>
> On Tuesday, May 24, 2016, Aiman Parvaiz  wrote:
>
>> Looking forward to hearing from the community about this.
>>
>> Sent from my iPhone
>>
>> > On May 24, 2016, at 10:19 AM, Mike Wojcikiewicz 
>> wrote:
>> >
>> > I saw a thread from April 2016 talking about Cassandra and Kubernetes,
>> and have a few follow up questions.  It seems that especially after v1.2 of
>> Kubernetes, and the upcoming 1.3 features, this would be a very viable
>> option for running Cassandra.
>> >
>> > My questions pertain to HostIds and Scaling Up/Down, and are related:
>> >
>> > 1.  If a container's host dies and is then brought up on another host,
>> can you start up with the same PersistentVolume as the original container
>> had?  Which begs the question would the new container get a new HostId,
>> implying it would need to bootstrap into the environment?   If it's a
>> bootstrap, does the old one get deco'd/assassinated?
>> >
>> > 2. Scaling up/down.  Scaling up would be relatively easy, as it should
>> just kick off Bootstrapping the node into the cluster, but what if you need
>> to scale down?  Would the Container get deco'd by the scaling down process?
>> or just terminated, leaving you with potential missing replicas
>> >
>> > 3. Scaling up and increasing the RF of a particular keyspace, would
>> there be a clean way to do this with the kubernetes tooling?
>> >
>> > In the end I'm wondering how much of the Kubernetes + Cassandra
>> involves nodetool, and how much is just a Docker image where you need to
>> manage that all yourself (painfully)
>> >
>> > --
>> > --mike
>>
>
>
> --
> Jens Rantil
> Backend engineer
> Tink AB
>
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
>
>
>


Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-11 Thread Ryan Svihla
1. A batch with updates to a single partition turns into a single mutation, so
partial writes aren't possible (so you may as well use unlogged batches).
2. Yes, so use LOCAL_SERIAL or SERIAL reads, and all updates that you want to
honor LWT need to be LWT as well; this way everything is buying into the same
protocol and behaving accordingly.
3. LWT works with batch (it has to be the same partition):
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html. If the
condition doesn't fire, none of the batch will (same partition means it'll
be the same mutation anyway, so there really isn't any magic going on).

Your biggest issue with such a design will be contention (as it would be with
an RDBMS with, say, row locking), since by intent you're making all reads and
writes block until any pending ones are complete. I'm sure there are a couple
of things I forgot, but this is the standard wisdom.
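The read/compute/conditional-write cycle being discussed is classic optimistic concurrency, and the contention cost shows up as retries. An in-memory sketch, with an AtomicLong standing in for the LWT-guarded write_time/state (hypothetical names, not driver code):

```java
import java.util.concurrent.atomic.AtomicLong;

public class OptimisticUpdate {
    // Stand-in for the partition's LWT-guarded state (the write_time column).
    static final AtomicLong state = new AtomicLong(0);

    static void update() {
        while (true) {
            long seen = state.get();  // LOCAL_QUORUM read of the partition
            long next = seen + 1;     // recompute from what was read
            // Batch with "IF write_time = seen": applies only if nobody wrote in between.
            if (state.compareAndSet(seen, next)) return;
            // Condition failed: another writer got in first; re-read and retry.
        }
    }

    public static void main(String[] args) throws Exception {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) update();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // All 4 * 1000 updates applied exactly once despite contention.
        System.out.println(state.get()); // 4000
    }
}
```

Under heavy contention each writer may loop several times before its condition holds; in Cassandra terms every retry is another quorum read plus a Paxos round, which is why funnelling all writers through one partition gets expensive.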

Regards,

Ryan Svihla

> On Sep 11, 2016, at 3:49 PM, Jens Rantil  wrote:
> 
> Hi,
> 
> This might be off-topic, but you could always use Zookeeper locking and/or 
> Apache Kafka topic keys for doing things like this.
> 
> Cheers,
> Jens
> 
>> On Tuesday, September 6, 2016, Bhuvan Rawal  wrote:
>> Hi,
>> 
>> We are working on a multi-threaded distributed design in which a thread
>> reads the current state from Cassandra (a single partition, ~20 rows), does
>> some computation, and saves it back. But it needs to be ensured that, in
>> between that thread's read and write, no other thread has saved any
>> operation on that partition.
>> 
>> We have thought of a solution for this: having a write_time column in
>> the schema and making it static. Every time the thread picks up a job, a
>> read will be performed with LOCAL_QUORUM. When writing into Cassandra, the
>> batch will contain an LWT (IF write_time = read time); otherwise the read
>> will be performed and the computation done again, and so on. This will
>> ensure that, while saving, the partition is in the state it was read from.
>> 
>> In order to avoid race conditions we need to ensure a couple of things:
>> 
>> 1. While saving data in a batch with a single partition (rows may be
>> updates, deletes, inserts), are they isolated per replica node? (Not
>> necessarily on the cluster as a whole.) Is there a possibility of a client
>> reading partial rows?
>> 
>> 2. If we do a LOCAL_QUORUM read and LOCAL_QUORUM writes, could there
>> be a chance of inconsistency in this case (when LWT is being used in
>> batches)?
>> 
>> 3. Is it possible to use multiple LWTs in a single batch? In general, how
>> does LWT perform with batch, and is Paxos acted on before batch execution?
>> 
>> Can someone help us with this?
>> 
>> Thanks & Regards,
>> Bhuvan
> 
> 
> -- 
> Jens Rantil
> Backend engineer
> Tink AB
> 
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
> 
> 


Re: Cassandra and Kubernetes and scaling

2016-09-11 Thread Jens Rantil
Hi Aiman,

I noticed you never got any reply. This might be of interest:
http://blog.kubernetes.io/2016/07/thousand-instances-of-cassandra-using-kubernetes-pet-set.html

Cheers,
Jens

On Tuesday, May 24, 2016, Aiman Parvaiz  wrote:

> Looking forward to hearing from the community about this.
>
> Sent from my iPhone
>
> > On May 24, 2016, at 10:19 AM, Mike Wojcikiewicz  > wrote:
> >
> > I saw a thread from April 2016 talking about Cassandra and Kubernetes,
> and have a few follow up questions.  It seems that especially after v1.2 of
> Kubernetes, and the upcoming 1.3 features, this would be a very viable
> option for running Cassandra.
> >
> > My questions pertain to HostIds and Scaling Up/Down, and are related:
> >
> > 1.  If a container's host dies and is then brought up on another host,
> can you start up with the same PersistentVolume as the original container
> had?  Which begs the question would the new container get a new HostId,
> implying it would need to bootstrap into the environment?   If it's a
> bootstrap, does the old one get deco'd/assassinated?
> >
> > 2. Scaling up/down.  Scaling up would be relatively easy, as it should
> just kick off Bootstrapping the node into the cluster, but what if you need
> to scale down?  Would the Container get deco'd by the scaling down process?
> or just terminated, leaving you with potential missing replicas
> >
> > 3. Scaling up and increasing the RF of a particular keyspace, would
> there be a clean way to do this with the kubernetes tooling?
> >
> > In the end I'm wondering how much of the Kubernetes + Cassandra involves
> nodetool, and how much is just a Docker image where you need to manage that
> all yourself (painfully)
> >
> > --
> > --mike
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Schema Disagreement vs Nodetool resetlocalschema

2016-09-11 Thread Jens Rantil
Hi Michael,

Did you ever get an answer on this? I'm curious to hear for future
reference.

Thanks,
Jens

On Monday, June 20, 2016, Michael Fong 
wrote:

> Hi,
>
>
>
> We have recently encountered several schema disagreement issues while
> upgrading Cassandra. In one of the cases, the 2-node cluster idled for over
> 30 minutes and their schemas remained unsynced. Due to other logic flows,
> Cassandra cannot be restarted, and hence we need to come up with an
> alternative on the fly. We are thinking of doing a nodetool resetlocalschema
> to force schema synchronization. How safe is this method? Do we need to
> disable the thrift/gossip protocols before performing this operation, and
> enable them back after the resync completes?
>
>
>
> Thanks in advance!
>
>
>
> Sincerely,
>
>
>
> Michael Fong
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: [ANNOUNCEMENT] Website update

2016-09-11 Thread Jens Rantil
Nice! The website also feels snappier!

On Friday, July 29, 2016, Sylvain Lebresne  wrote:

> Wanted to let everyone know that if you go to the Cassandra website
> (cassandra.apache.org), you'll notice that there has been some change.
> Outside
> of a face lift, the main change is a much improved documentation section
> (http://cassandra.apache.org/doc/). As indicated, that documentation is a
> work-in-progress and still has a few missing sections. That documentation is
> maintained in-tree, and contributions (through JIRA, as any other
> contribution) are more than welcome.
>
> Best,
> On behalf of the Apache Cassandra developers.
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc

2016-09-11 Thread Jens Rantil
Yes. `nodetool setstreamthroughput` is your friend.

On Sunday, September 11, 2016, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> Make sure there is no spike in the load-avg on the existing nodes, as that
> might affect your application read request latencies.
>
> On Sun, Sep 11, 2016, 17:10 Jens Rantil  > wrote:
>
>> Hi Bhuvan,
>>
>> I have done such expansion multiple times and can really recommend
>> bootstrapping a new DC and pointing your clients to it. The process is so
>> much faster and the documentation you referred to has worked out fine for
>> me.
>>
>> Cheers,
>> Jens
>>
>>
>> On Sunday, September 11, 2016, Bhuvan Rawal > > wrote:
>>
>>> Hi,
>>>
>>> We are running Cassandra 3.6 and want to bump up Cassandra nodes in an
>>> existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to
>>> leverage more memory instead of m4.2xlarge). Bootstrapping a node would
>>> take 7-8 hours.
>>>
>>> If this activity is performed serially then it will take 5-6 days. I had
>>> a look at CASSANDRA-7069 and a bit of discussion in the past at
>>> http://grokbase.com/t/cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster.
>>> Wanted to know if the limitation is still applicable and whether the race
>>> condition could occur in version 3.6.
>>>
>>> If this is not the case, can we add a new datacenter as mentioned here
>>> (opsAddDCToCluster) and bootstrap multiple nodes simultaneously by keeping
>>> auto_bootstrap false in cassandra.yaml and rebuilding nodes simultaneously
>>> in the new dc?
>>>
>>>
>>> Thanks & Regards,
>>> Bhuvan
>>>
>>
>>
>> --
>> Jens Rantil
>> Backend engineer
>> Tink AB
>>
>> Email: jens.ran...@tink.se
>> 
>> Phone: +46 708 84 18 32
>> Web: www.tink.se
>>
>>
>>

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: large number of pending compactions, sstables steadily increasing

2016-09-11 Thread Jens Rantil
I just want to chime in and say that we also had issues keeping up with
compaction once (with vnodes/SSD disks), and I also want to recommend
keeping an eye on your open file limit, which might bite you.

Cheers,
Jens

On Friday, August 19, 2016, Mark Rose  wrote:

> Hi Ezra,
>
> Are you making frequent changes to your rows (including TTL'ed
> values), or mostly inserting new ones? If you're only inserting new
> data, it's probable that size-tiered compaction would work better for
> you. If you are TTL'ing whole rows, consider date-tiered.
>
> If leveled compaction is still the best strategy, one way to catch up
> with compactions is to have less data per node -- in other words,
> use more machines. Leveled compaction is CPU expensive. You are CPU
> bottlenecked currently, or from the other perspective, you have too
> much data per node for leveled compaction.
>
> At this point, compaction is so far behind that you'll likely be
> getting high latency if you're reading old rows (since dozens to
> hundreds of uncompacted sstables will likely need to be checked for
> matching rows). You may be better off with size tiered compaction,
> even if it will mean always reading several sstables per read (higher
> latency than when leveled can keep up).
>
> How much data do you have per node? Do you update/insert to/delete
> rows? Do you TTL?
>
> Cheers,
> Mark
>
> On Wed, Aug 17, 2016 at 2:39 PM, Ezra Stuetzel  > wrote:
> > I have one node in my cluster on 2.2.7 (just upgraded from 2.2.6 hoping to
> > fix the issue) which seems to be stuck in a weird state -- with a large
> > number of pending compactions and sstables. The node is compacting about
> > 500gb/day, and the number of pending compactions is going up at about
> > 50/day. It is at about 2300 pending compactions now. I have tried
> > increasing the number of compaction threads and the compaction throughput,
> > which doesn't seem to help eliminate the many pending compactions.
> >
> > I have tried running 'nodetool cleanup' and 'nodetool compact'. The latter
> > has fixed the issue in the past, but most recently I was getting OOM
> > errors, probably due to the large number of sstables. I upgraded to 2.2.7
> > and am no longer getting OOM errors, but also it does not resolve the
> > issue. I do see this message in the logs:
> >
> >> INFO  [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985
> >> CompactionManager.java:610 - Cannot perform a full major compaction as
> >> repaired and unrepaired sstables cannot be compacted together. These two
> >> sets of sstables will be compacted separately.
> >
> > Below are the 'nodetool tablestats' comparing a normal and the problematic
> > node. You can see the problematic node has many, many more sstables, and
> > they are all in level 1. What is the best way to fix this? Can I just
> > delete those sstables somehow and then run a repair?
> >>
> >> Normal node
> >>>
> >>> keyspace: mykeyspace
> >>>
> >>> Read Count: 0
> >>>
> >>> Read Latency: NaN ms.
> >>>
> >>> Write Count: 31905656
> >>>
> >>> Write Latency: 0.051713177939359714 ms.
> >>>
> >>> Pending Flushes: 0
> >>>
> >>> Table: mytable
> >>>
> >>> SSTable count: 1908
> >>>
> >>> SSTables in each level: [11/4, 20/10, 213/100, 1356/1000, 306,
> 0,
> >>> 0, 0, 0]
> >>>
> >>> Space used (live): 301894591442
> >>>
> >>> Space used (total): 301894591442
> >>>
> >>>
> >>>
> >>> Problematic node
> >>>
> >>> Keyspace: mykeyspace
> >>>
> >>> Read Count: 0
> >>>
> >>> Read Latency: NaN ms.
> >>>
> >>> Write Count: 30520190
> >>>
> >>> Write Latency: 0.05171286705620116 ms.
> >>>
> >>> Pending Flushes: 0
> >>>
> >>> Table: mytable
> >>>
> >>> SSTable count: 14105
> >>>
> >>> SSTables in each level: [13039/4, 21/10, 206/100, 831, 0, 0, 0,
> >>> 0, 0]
> >>>
> >>> Space used (live): 561143255289
> >>>
> >>> Space used (total): 561143255289
> >
> > Thanks,
> >
> > Ezra
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc

2016-09-11 Thread sai krishnam raju potturi
Make sure there is no spike in the load-avg on the existing nodes, as that
might affect your application read request latencies.

On Sun, Sep 11, 2016, 17:10 Jens Rantil  wrote:

> Hi Bhuvan,
>
> I have done such expansion multiple times and can really recommend
> bootstrapping a new DC and pointing your clients to it. The process is so
> much faster and the documentation you referred to has worked out fine for
> me.
>
> Cheers,
> Jens
>
>
> On Sunday, September 11, 2016, Bhuvan Rawal  wrote:
>
>> Hi,
>>
>> We are running Cassandra 3.6 and want to bump up Cassandra nodes in an
>> existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to
>> leverage more memory instead of m4.2xlarge). Bootstrapping a node would
>> take 7-8 hours.
>>
>> If this activity is performed serially then it will take 5-6 days. I had
>> a look at CASSANDRA-7069 and a bit of discussion in the past at
>> http://grokbase.com/t/cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster.
>> Wanted to know if the limitation is still applicable and whether the race
>> condition could occur in version 3.6.
>>
>> If this is not the case, can we add a new datacenter as mentioned here
>> (opsAddDCToCluster) and bootstrap multiple nodes simultaneously by keeping
>> auto_bootstrap false in cassandra.yaml and rebuilding nodes simultaneously
>> in the new dc?
>>
>>
>> Thanks & Regards,
>> Bhuvan
>>
>
>
> --
> Jens Rantil
> Backend engineer
> Tink AB
>
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
>
>
>


Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc

2016-09-11 Thread Jens Rantil
Hi Bhuvan,

I have done such expansion multiple times and can really recommend
bootstrapping a new DC and pointing your clients to it. The process is so
much faster and the documentation you referred to has worked out fine for
me.

Cheers,
Jens

On Sunday, September 11, 2016, Bhuvan Rawal  wrote:

> Hi,
>
> We are running Cassandra 3.6 and want to bump up Cassandra nodes in an
> existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to
> leverage more memory instead of m4.2xlarge). Bootstrapping a node would
> take 7-8 hours.
>
> If this activity is performed serially then it will take 5-6 days. I had a
> look at CASSANDRA-7069 and a bit of discussion in the past at
> http://grokbase.com/t/cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster.
> Wanted to know if the limitation is still applicable and whether the race
> condition could occur in version 3.6.
>
> If this is not the case, can we add a new datacenter as mentioned here
> (opsAddDCToCluster) and bootstrap multiple nodes simultaneously by keeping
> auto_bootstrap false in cassandra.yaml and rebuilding nodes simultaneously
> in the new dc?
>
>
> Thanks & Regards,
> Bhuvan
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-11 Thread Jens Rantil
Hi,

This might be off-topic, but you could always use Zookeeper locking and/or
Apache Kafka topic keys for doing things like this.

Cheers,
Jens

On Tuesday, September 6, 2016, Bhuvan Rawal  wrote:

> Hi,
>
> We are working on a multi-threaded distributed design in which a thread
> reads the current state from Cassandra (a single partition, ~20 rows), does
> some computation, and saves it back. But it needs to be ensured that, in
> between that thread's read and write, no other thread has saved any
> operation on that partition.
>
> We have thought of a solution for this: *having a write_time column*
> in the schema and making it static. Every time the thread picks up a job, a
> read will be performed with LOCAL_QUORUM. When writing into Cassandra, the
> batch will contain an LWT (IF write_time = read time); otherwise the read
> will be performed and the computation done again, and so on. This will
> ensure that, while saving, the partition is in the state it was read from.
>
> In order to avoid race conditions we need to ensure a couple of things:
>
> 1. While saving data in a batch with a single partition (*Rows may be
> Updates, Deletes, Inserts)* are they Isolated per replica node. (Not
> necessarily on a cluster as a whole). Is there a possibility of client
> reading partial rows?
>
> 2. If we do a LOCAL_QUORUM read and LOCAL_QUORUM writes in this case could
> there a chance of inconsistency in this case (When LWT is being used in
> batches).
>
> 3. Is it possible to use multiple LWT in a single Batch? In general how
> does LWT performs with Batch and is Paxos acted on before batch execution?
>
> Can someone help us with this?
>
> Thanks & Regards,
> Bhuvan
>
>
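The compare-and-set write described in the quoted question comes down to a single-partition conditional batch. Here is a minimal, stdlib-only sketch of the CQL such a batch would contain (table and column names are hypothetical, not from the thread); with the Java driver you would execute the batch and check ResultSet.wasApplied() to decide whether to re-read and retry:

```java
public class LwtBatchSketch {
    // Builds a single-partition conditional batch: the UPDATE's IF clause
    // makes the whole batch apply only if write_time is unchanged since
    // the read. All statements target the same partition key, which is a
    // requirement for conditions inside a batch.
    public static String buildBatch(int jobId, long readWriteTime, long newWriteTime) {
        return String.join("\n",
                "BEGIN BATCH",
                "  UPDATE jobs SET write_time = " + newWriteTime
                        + " WHERE job_id = " + jobId
                        + " IF write_time = " + readWriteTime + ";",
                "  INSERT INTO jobs (job_id, step, state) VALUES ("
                        + jobId + ", 1, 'done');",
                "APPLY BATCH;");
    }

    public static void main(String[] args) {
        System.out.println(buildBatch(42, 1000L, 2000L));
    }
}
```

If the condition fails, the batch reports [applied] = false and returns the current column values, which the worker can use for its next attempt.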

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se



Strangeloop?

2016-09-11 Thread Eric Evans
Hi,

It may be somewhat late for such an email, but Strangeloop is next
week in St. Louis, and I'm curious how many from the Cassandra
community might be there.

I'll be there for the Papers We Love Conference on Wednesday, in
addition to the conference itself on Friday and Saturday (I'm speaking
on Saturday).  I'd love to chat with anyone interested over
beers/coffee/whatever, and if there were enough folks, we could set up
something a little more formal.

Let me know!

Cheers,

-- 
Eric Evans
john.eric.ev...@gmail.com


Bootstrapping multiple cassandra nodes simultaneously in existing dc

2016-09-11 Thread Bhuvan Rawal
Hi,

We are running Cassandra 3.6 and want to grow an existing datacenter from
3 to 12 Cassandra nodes (we plan to move to r3.xlarge machines instead of
m4.2xlarge to leverage more memory). Bootstrapping a node takes 7-8 hours.

If this activity is performed serially it will take 5-6 days. I had a
look at CASSANDRA-7069 and a bit of past discussion at
http://grokbase.com/t/cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster.
I wanted to know whether that limitation still applies and whether the
race condition can occur in version 3.6.

If this is not the case, can we add a new datacenter as mentioned in the
opsAddDCToCluster docs, bootstrap multiple nodes simultaneously by
keeping auto_bootstrap false in cassandra.yaml, and rebuild the nodes
simultaneously in the new DC?


Thanks & Regards,
Bhuvan
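For reference, the new-datacenter route asked about above comes down to a small config change plus a parallel rebuild; a hedged sketch (datacenter names are assumptions):

```yaml
# cassandra.yaml on each node of the new DC:
# join the ring and own token ranges without streaming any data
auto_bootstrap: false
```

Once all new nodes are up, each one streams its data from the source DC with `nodetool rebuild -- <existing-dc-name>`; because rebuild streams from an explicitly named source DC rather than bootstrapping, the rebuilds can run in parallel across the new nodes.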


Re: How to define blob column in Java?

2016-09-11 Thread Andrew Tolbert
Hi Alexandr,

I am assuming you are referring to the @Table annotation in the mapping
module of the DataStax Java Driver for Apache Cassandra (please correct
me if I am wrong).

You can achieve this with any of those three types by using a custom
codec, but it works as-is with ByteBuffer.  Here's a quick example:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.Bytes;
import com.datastax.driver.mapping.Mapper;
import com.datastax.driver.mapping.MappingManager;
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;

import java.nio.ByteBuffer;

public class MapperBlobExample {

    @Table(keyspace = "ex", name = "blob_ex")
    static class BlobEx {

        @PartitionKey
        int k;

        @Column
        ByteBuffer b;

        int getK() {
            return k;
        }

        void setK(int k) {
            this.k = k;
        }

        ByteBuffer getB() {
            return b;
        }

        void setB(ByteBuffer b) {
            this.b = b;
        }
    }

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            Session session = cluster.connect();
            session.execute("CREATE KEYSPACE IF NOT EXISTS ex WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1};");
            session.execute("CREATE TABLE IF NOT EXISTS ex.blob_ex "
                    + "(k int PRIMARY KEY, b blob);");

            MappingManager manager = new MappingManager(session);
            Mapper<BlobEx> mapper = manager.mapper(BlobEx.class);

            // insert row
            BlobEx ex = new BlobEx();
            ex.setK(0);
            ex.setB(Bytes.fromHexString("0xffee"));
            mapper.save(ex);

            // retrieve row
            BlobEx ex0 = mapper.get(0);
            System.out.println(Bytes.toHexString(ex0.getB()));
        } finally {
            cluster.close();
        }
    }
}

There are a few pitfalls around using ByteBuffer with the driver that you
should be aware of (this example covers them).  The java-driver-user
mailing list can also help.

Thanks!
Andy

On Sun, Sep 11, 2016 at 1:50 AM Alexandr Porunov wrote:

> Hello,
>
> I am using the @Table annotation to define tables in Cassandra. How
> should I properly define the blob type in Java? With ByteBuffer, byte[],
> or String?
>
> Sincerely,
> Alexandr
>