Re: Apache Cassandra transactions commit and rollback

2018-12-07 Thread Hiroyuki Yamada
Hi Ramya,

Scalar DB is one of the options.
https://github.com/scalar-labs/scalardb

But first of all, please re-think your design and whether you really need it.
For example, if eventual consistency between multiple rows is acceptable and
the writes are idempotent, then you should simply go with C* writes plus
retries. Using transactions is basically the last resort.
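
Just to illustrate the idea, here is a rough sketch using the DataStax Python
driver (the keyspace, table, retry count and backoff are only examples, not a
recommendation):

import time

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement


def upsert_with_retries(session, user_id, email, attempts=3):
    # INSERT in CQL is an upsert, so writing the same values twice has the
    # same effect as writing them once; retrying on a timeout is safe here.
    stmt = SimpleStatement(
        "INSERT INTO my_ks.user_profile (user_id, email) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM,
        is_idempotent=True,
    )
    for attempt in range(1, attempts + 1):
        try:
            session.execute(stmt, (user_id, email))
            return
        except Exception:  # in real code, catch only the driver's timeout errors
            if attempt == attempts:
                raise
            time.sleep(0.1 * attempt)  # simple backoff before retrying


cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
upsert_with_retries(session, "user-42", "user42@example.com")
cluster.shutdown()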

Thanks,
Hiro

On Wed, Nov 28, 2018 at 10:27 PM Ramya K  wrote:
>
> Hi All,
>
>   I'm exploring Cassandra for our project and would like to know the best
> practices for handling transactions in real time. Please also suggest any
> drivers or tools that are available for this.
>
>   I've read about the Apache Kundera transaction layer over Cassandra; are
> there any bottlenecks with it?
>
>   Please suggest your views on this.
>
> Regards,
> Ramya.




Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-07 Thread Oleksandr Shulgin
On Thu, Dec 6, 2018 at 3:39 PM Riccardo Ferrari  wrote:

> To be honest I've never seen the OOM killer in action on those instances. My
> Xmx was 8GB just like yours, which makes me think you have some process
> competing for memory. Is that the case? Do you have any cron job, any backup,
> anything that could trip the OOM killer?
>

Riccardo,

As I've mentioned previously, apart from Docker running Cassandra on the JVM,
there is only a small number of housekeeping processes: cron to trigger log
rotation, a log-shipping agent, a node metrics exporter (Prometheus) and some
other small things.  None of them comes anywhere near Cassandra in its memory
requirements, and they routinely sit near the bottom of memory usage reports
from atop and similar tools.  Their overhead seems to be minimal.
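
For completeness, here is the kind of quick check one could run to rank
processes by resident memory (just a sketch reading /proc directly, so
Linux-only; it shows roughly the same thing atop already does):

import os

def rss_by_process():
    # Collect (VmRSS in kB, pid, process name) for every readable /proc entry.
    results = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/status") as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
            rss_kb = int(fields.get("VmRSS", "0 kB").split()[0])
            results.append((rss_kb, pid, fields.get("Name", "?").strip()))
        except (OSError, ValueError):
            continue  # process exited or is not readable; skip it
    return sorted(results, reverse=True)

for rss_kb, pid, name in rss_by_process()[:10]:
    print(f"{rss_kb / 1024:10.1f} MB  pid={pid:>7}  {name}")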

> My unresponsiveness was seconds long. This is/was bad because the gossip
> protocol was going crazy, marking nodes down, with all the consequences that
> can have in a distributed system: think of hints, the dynamic snitch, and
> whatever else depends on node availability...
> Can you share some numbers about your `tpstats` or system load in general?
>

Here's some pretty typical tpstats output from one of the nodes:

Pool Name                  Active  Pending   Completed  Blocked  All time blocked
MutationStage                   0        0   319319724        0                 0
ViewMutationStage               0        0           0        0                 0
ReadStage                       0        0    80006984        0                 0
RequestResponseStage            0        0   258548356        0                 0
ReadRepairStage                 0        0     2707455        0                 0
CounterMutationStage            0        0           0        0                 0
MiscStage                       0        0           0        0                 0
CompactionExecutor              1       55     1552918        0                 0
MemtableReclaimMemory           0        0        4042        0                 0
PendingRangeCalculator          0        0         111        0                 0
GossipStage                     0        0     6343859        0                 0
SecondaryIndexManagement        0        0           0        0                 0
HintsDispatcher                 0        0         226        0                 0
MigrationStage                  0        0           0        0                 0
MemtablePostFlush               0        0        4046        0                 0
ValidationExecutor              1        1        1510        0                 0
Sampler                         0        0           0        0                 0
MemtableFlushWriter             0        0        4042        0                 0
InternalResponseStage           0        0        5890        0                 0
AntiEntropyStage                0        0        5532        0                 0
CacheCleanupExecutor            0        0           0        0                 0
Repair#250                      1        1           1        0                 0
Native-Transport-Requests       2        0   260447405        0                18

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
HINT 0
MUTATION 1
COUNTER_MUTATION 0
BATCH_STORE  0
BATCH_REMOVE 0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0
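
(In case it is useful: a rough sketch of how one could keep an eye on these
numbers by parsing `nodetool tpstats` periodically; the thresholds below are
arbitrary, not something we actually alert on.)

import subprocess

def check_tpstats():
    # Run `nodetool tpstats` and warn about pools with pending or blocked tasks.
    output = subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    ).stdout
    for line in output.splitlines():
        parts = line.split()
        # Pool rows look like: <name> <active> <pending> <completed> <blocked> <all time blocked>
        if len(parts) == 6 and parts[1].isdigit():
            name, active, pending, completed, blocked, all_blocked = parts
            if int(pending) > 10 or int(blocked) > 0:
                print(f"WARN {name}: pending={pending}, blocked={blocked}")

check_tpstats()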

Speaking of CPU utilization, it is consistently within 30-60% on all nodes
(and even lower at night).


> On the tuning side I just went through the following article:
> https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html
>
> No rollbacks, just moving forward! Right now we are upgrading the instance
> size to something more recent than m1.xlarge (for many different reasons,
> including security, ECU and network). Nevertheless, it might be a good idea
> to upgrade to the 3.x branch to take advantage of its better off-heap memory
> management.
>

One thing we have noticed very recently is that our nodes are indeed running
low on memory.  It now even seems that the IO is a side effect of impending
OOM, not the other way round, as we had initially thought.

After a fresh JVM start the memory allocation looks roughly like this:

             total       used       free     shared    buffers     cached
Mem:           14G        14G       173M       1.1M        12M       3.2G
-/+ buffers/cache:         11G       3.4G
Swap:           0B         0B         0B
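
(For reference, a minimal sketch of how one could log the relevant
/proc/meminfo counters over time to capture the trend described below; the
one-minute interval is arbitrary.)

import time

FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached")

def meminfo_mb():
    # Read the selected counters from /proc/meminfo and convert kB to MB.
    values = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            if key in FIELDS:
                values[key] = int(rest.split()[0]) // 1024
    return values

while True:
    snapshot = meminfo_mb()
    print(time.strftime("%Y-%m-%d %H:%M:%S"),
          " ".join(f"{key}={mb}M" for key, mb in snapshot.items()))
    time.sleep(60)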

Then, within a number of days, the allocated disk cache shrinks all the way
down to unreasonably small values, such as only 150M.  At the same time "free"
stays at the original level and "used" grows all the way up to 14G.  Shortly
after that the node becomes unavailable because of the IO and ultimately afte