RE: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Ngoc Minh VO
No. We do not use update.
All inserts are idempotent and there is no read-before-write query.

On the corrupted data row, we have verified that the data was only written once.

Thanks for your answer!

From: Laing, Michael [mailto:michael.la...@nytimes.com]
Sent: Wednesday, 25 November 2015 15:39
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

You don't have any syntax in your application anywhere such as:

UPDATE data SET field5 = field5 + [ 1,2,3 ] WHERE field1=...;

Just a quick idempotency check :)
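To make the idempotency distinction concrete, here is a small simulation (plain Python with hypothetical helper names, not driver code) of why an append survives replay as a duplicate while an overwrite does not:

```python
# Simulate delivering the same client mutation twice (e.g. a client
# retry or a replayed hint) to see which write shapes survive replay.

def apply_append(stored, values):
    # Non-idempotent, like: UPDATE data SET field5 = field5 + [...] ...
    return stored + values

def apply_overwrite(stored, values):
    # Idempotent, like: INSERT ... VALUES (...) or SET field5 = [...]
    return list(values)

payload = ["a", "b", "c"]

# Deliver the same mutation twice.
appended = apply_append(apply_append([], payload), payload)
overwritten = apply_overwrite(apply_overwrite([], payload), payload)

print(appended)     # ['a', 'b', 'c', 'a', 'b', 'c'] -- the doubling reported
print(overwritten)  # ['a', 'b', 'c']
```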

On Wed, Nov 25, 2015 at 9:16 AM, Jack Krupansky wrote:
Is the data corrupted exactly the same way on all three nodes and in both data 
centers, or just on one or two nodes or in only one data center?

Are both columns doubled in the same row, or only one of them in a particular 
row?

Does sound like a bug though, worthy of a Jira ticket.

-- Jack Krupansky

On Wed, Nov 25, 2015 at 4:05 AM, Ngoc Minh VO wrote:
Hello all,

We encounter an issue on our Production environment that cannot be reproduced
on our Test environment: a list<T> value (T = double or text) is randomly
“multiplied” by 2 (i.e. the value sent to C* = [a, b, c], the value stored in
C* = [a, b, c, a, b, c]).

I know that it sounds weird, but we just want to know whether it is a known
issue (we found nothing with Google…). We are working on a small dataset to
narrow down the issue with log data and may create a ticket for the DataStax
Java Driver or Cassandra teams.

Cassandra v2.0.14
DataStax Java Driver v2.1.7.1
OS RHEL6
Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)

The only difference between the Prod and UAT clusters is the multi-datacenter
mode on the Prod one.
We do not insert the same data twice into the same column of any specific row.
All inserts/updates are idempotent!

Data table:
CREATE TABLE data (
    field1 text,
    field2 int,
    field3 text,
    field4 double,
    field5 list<double>, -- randomly corrupted: contains [1, 2, 3, 1, 2, 3] instead of [1, 2, 3]
    field6 text,
    field7 list<text>,   -- randomly corrupted: contains [a, b, c, a, b, c] instead of [a, b, c]
    PRIMARY KEY ((field1, field2), field3)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Thanks in advance for your help.
Best regards,
Minh

This message and any attachments (the "message") is
intended solely for the intended addressees and is confidential.
If you receive this message in error,or are not the intended recipient(s),
please delete it and any copies from your systems and immediately notify
the sender. Any unauthorized view, use that does not comply with its purpose,
dissemination or disclosure, either whole or partial, is prohibited. Since the 
internet
cannot guarantee the integrity of this message which may not be reliable, BNP 
PARIBAS
(and its subsidiaries) shall not be liable for the message if modified, changed 
or falsified.
Do not print this message unless it is necessary,consider the environment.





Re: I am a DataStax certified Cassandra architect now :)

2015-11-25 Thread Ariel Weisberg
Hi,

Congratulations! I hope the certification brings good things for you.

Regards, Ariel


On Sun, Nov 22, 2015, at 01:00 PM, Prem Yadav wrote:
> Just letting the community know that I just passed the Cassandra
> architect certification with flying colors :). Have to say I learnt a
> lot from this forum.
>
> Thanks, Prem


Re: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Jack Krupansky
Be sure to include your actual insert statement. Also, what consistency
level was used for the insert (all, quorum, local quorum, one, or...)?


-- Jack Krupansky

On Wed, Nov 25, 2015 at 11:43 AM, Ngoc Minh VO wrote:

> No. We do not use update.
>
> All inserts are idempotent and there is no read-before-write query.
>
>
>
> On the corrupted data row, we have verified that the data was only written
> once.
>
>
>
> Thanks for your answer!


Re: Help diagnosing performance issue

2015-11-25 Thread Antoine Bonavita

Sebastian (and others, help is always appreciated),

After 24h of running OK, read latencies started to degrade (up to 20 ms) and I
had to ramp volumes down again.


The degradation is clearly linked to the number of read IOPS, which went up to
1.65k/s after 24h.


If anybody can give me hints on what I should look at, I'm very happy to 
do so.


A.

On 11/23/2015 12:07 PM, Antoine Bonavita wrote:

Sebastian,

I tried to ramp up volume with this new setting and ran into the same
problems.

After that I restarted my nodes. This pretty much instantly got read
latencies back to normal (< 5ms) on the 32G nodes.

I am currently ramping up volumes again and here is what I am seeing on
32G nodes:
* Read latencies are OK (<5ms)
* A lot of read IOPS (~ 400 read/s)
* I enabled logging for the DateTieredCompactionStrategy and I only get these
kinds of lines:
DEBUG [CompactionExecutor:186] 2015-11-23 12:02:45,915
DateTieredCompactionStrategy.java:137 - Compaction buckets are []
DEBUG [CompactionExecutor:186] 2015-11-23 12:03:16,704
DateTieredCompactionStrategy.java:137 - Compaction buckets are
[[BigTableReader(path='/var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-6452-big-Data.db')]]

* When I run pcstat I still get about 100 *-Data.db files loaded at 15%
(which is what I was seeing with max_sstable_age_days set at 5).

I'm really happy with the first item in my list but the other items seem
to indicate something is still wrong and it does not look like it's
compaction.

Any help would be truly appreciated.

A.

On 11/20/2015 12:58 AM, Antoine Bonavita wrote:

Sebastian,

I took into account your suggestion and set max_sstable_age_days to 1.

I left the TTL at 432000 and the gc_grace_seconds at 172800. So I expect
SSTables older than 7 days to get deleted. Am I right?
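A quick sanity check on that arithmetic, using the TTL (432000 s) and gc_grace_seconds (172800 s) quoted above:

```python
# default_time_to_live = 432000 s (5 days), gc_grace_seconds = 172800 s
# (2 days). A fully expired sstable becomes droppable once TTL + gc_grace
# has elapsed since the data was written.
TTL = 432_000
GC_GRACE = 172_800

drop_after_days = (TTL + GC_GRACE) / 86_400
print(drop_after_days)  # 7.0 -- matching the "older than 7 days" expectation
```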

I did not change dclocal_read_repair_chance because I have only one DC
at this point in time. Did you mean that I should set read_repair_chance
to 0 ?

Thanks again for your time and help. Really appreciated.

A.


On 11/19/2015 02:36 AM, Sebastian Estevez wrote:

When you say drop you mean reduce the value (to 1 day for example),
not "don't set the value", right ?


Yes.

If I set max sstable age days to 1, my understanding is that
SSTables with expired data (5 days) are not going to be compacted
ever. And therefore my disk usage will keep growing forever. Did I
miss something here ?


We will expire sstables whose highest TTL is beyond gc_grace_seconds, as of
CASSANDRA-5228. This is nice because the sstable is just dropped for free: no
need to scan it and remove tombstones, which is very expensive. And DTCS
guarantees that all the data within an sstable is close together in time.

So, if I set max sstable age days to 1, I have to run repairs at
least once a day, correct ?

I'm afraid I don't get your point about painful compactions.


I was referring to the problems described in CASSANDRA-9644.





All the best,


Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Wed, Nov 18, 2015 at 5:53 PM, Antoine Bonavita wrote:

Sebastian,

Your help is very much appreciated. I re-read the blog post and also
https://labs.spotify.com/2014/12/18/date-tiered-compaction/ but some
things are still confusing me.

Please see my questions inline below.

On 11/18/2015 04:21 PM, Sebastian Estevez wrote:

Yep, I think you've mixed up your DTCS levers. I would read, or
re-read
Marcus's post
http://www.datastax.com/dev/blog/datetieredcompactionstrategy

*TL;DR:*

   * *base_time_seconds* is the size of your initial window
   * *max_sstable_age_days* is the time after which you stop compacting sstables
   * *default_time_to_live* is the time after which data expires and sstables start to become available for GC (432000 is 5 days)


 Could it be that compaction is putting 

Re: Unable to use using multiple network interfaces in Cassandra 3.0

2015-11-25 Thread Paulo Motta
Hello Sergey,

Currently Cassandra listens on one interface (the listen_address), so you
can only use multiple interfaces if your NAT configuration can route from
your public IP address to your private interface, as typically happens on
EC2 and other clouds. We're currently working to support listening on
multiple network interfaces to support this use case. You can follow the
progress and get more background on this ticket:
https://issues.apache.org/jira/browse/CASSANDRA-9748. There's a preliminary
patch available if you're willing to patch your Cassandra install.
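As a rough sketch of the routing behavior described above (this mirrors what cloud snitches such as Ec2MultiRegionSnitch do; the function and the datacenter names are illustrative, not Cassandra code):

```python
def address_for_peer(peer_dc, my_dc, private_ip, public_ip):
    # Nodes in the same datacenter talk over the private interface;
    # cross-DC (or NAT'd) traffic uses the broadcast/public address.
    return private_ip if peer_dc == my_dc else public_ip

# Using the first node's addresses from the config in this thread:
print(address_for_peer("dc1", "dc1", "10.12.0.1", "172.20.30.91"))  # 10.12.0.1
print(address_for_peer("dc2", "dc1", "10.12.0.1", "172.20.30.91"))  # 172.20.30.91
```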

Regards,

Paulo

2015-11-25 15:36 GMT-08:00 Sergey Panov :

> Hello,
>
> We ran into an issue where Cassandra 3.0 does not work when we tried to set
> up private and public network usage.
> The following doc was used:
> http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configMultiNetworks.html
>
> First node (172.20.30.91) settings
> - seeds: "172.20.30.91,172.20.30.92,172.20.30.93"
> # private IP
> listen_address: 10.12.0.1
> # public IP
> rpc_address: 172.20.30.91
> broadcast_address: 172.20.30.91
> internode_compression: none
>
> ..
>
> Fourth node (172.20.30.94) settings
> - seeds: "172.20.30.91,172.20.30.92,172.20.30.93"
> # private IP
> listen_address: 10.12.0.4
> # public IP
> rpc_address: 172.20.30.94
> broadcast_address: 172.20.30.94
> internode_compression: none
>
>
> All seed nodes detect only themselves; the other nodes show errors:
>
> CassandraDaemon.java:702 - Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
>
>
> Please advise. Thank you!
>
> --
>
> Sergey Panov
>
>


Unable to use using multiple network interfaces in Cassandra 3.0

2015-11-25 Thread Sergey Panov

Hello,

We ran into an issue where Cassandra 3.0 does not work when we tried to set
up private and public network usage.
The following doc was used: 
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configMultiNetworks.html


First node (172.20.30.91) settings
- seeds: "172.20.30.91,172.20.30.92,172.20.30.93"
# private IP
listen_address: 10.12.0.1
# public IP
rpc_address: 172.20.30.91
broadcast_address: 172.20.30.91
internode_compression: none

..

Fourth node (172.20.30.94) settings
- seeds: "172.20.30.91,172.20.30.92,172.20.30.93"
# private IP
listen_address: 10.12.0.4
# public IP
rpc_address: 172.20.30.94
broadcast_address: 172.20.30.94
internode_compression: none


All seed nodes detect only themselves; the other nodes show errors:

CassandraDaemon.java:702 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1344) ~[apache-cassandra-3.0.0.jar:3.0.0]
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:529) ~[apache-cassandra-3.0.0.jar:3.0.0]
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:777) ~[apache-cassandra-3.0.0.jar:3.0.0]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:694) ~[apache-cassandra-3.0.0.jar:3.0.0]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:581) ~[apache-cassandra-3.0.0.jar:3.0.0]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:345) [apache-cassandra-3.0.0.jar:3.0.0]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:561) [apache-cassandra-3.0.0.jar:3.0.0]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) [apache-cassandra-3.0.0.jar:3.0.0]



Please advise. Thank you!

--

Sergey Panov



OpsCenter does not work with Cassandra 3.0

2015-11-25 Thread Sergey Panov

Hello,

Today we tried to set up DataStaxOpsCenter-5.2.2.2015102711-linux-x64 to
work with Cassandra 3.0.


OpsCenter logs have the following:

2015-11-26 02:17:39+0300 [] INFO: Starting factory 
0x2521b00>
2015-11-26 02:17:40+0300 [] INFO: Stopping factory 
0x2521b00>
2015-11-26 02:17:40+0300 [] WARN: [control connection] Error connecting to 172.20.30.91: Unexpected response during Connection setup: ProtocolError('Server protocol version (4) does not match the specified driver protocol version (2). Consider setting Cluster.protocol_version to 4.',)
2015-11-26 02:17:40+0300 [] ERROR: Control connection failed to connect, shutting down Cluster: ('Unable to connect to any servers', {u'172.20.30.91': ProtocolError("Unexpected response during Connection setup: ProtocolError('Server protocol version (4) does not match the specified driver protocol version (2). Consider setting Cluster.protocol_version to 4.',)",)})
2015-11-26 02:17:40+0300 [] WARN: ProcessingError while calling CreateClusterConfController: Unable to connect to cluster. Error is: Unable to connect to any seed nodes, tried [u'172.20.30.91']


How can we switch protocol_version to 4?
Does OpsCenter officially support the latest version of Cassandra?
Did anybody try to setup it?

Please advise. Thank you!

--

Sergey Panov



Re: OpsCenter does not work with Cassandra 3.0

2015-11-25 Thread Alex Popescu
Hi Sergey,

There were some changes to the system schema tables in the released version
of Cassandra 3.0 that are preventing tools like OpsCenter and DevCenter from
connecting. We are working on releasing updated versions that are fully
compatible with C* 3.0.
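For reference, the mismatch in Sergey's log can be modeled simply. Cassandra 3.0 speaks native protocol v3/v4 only, while the driver bundled with this OpsCenter release pins v2; the sketch below is hypothetical, with the message text taken from the log:

```python
# Cassandra 3.0 dropped native protocol v1/v2; only v3 and v4 remain.
SERVER_SUPPORTED = {3, 4}

def handshake(client_version):
    # Toy model of the version check behind the OpsCenter error --
    # not the actual server code.
    if client_version not in SERVER_SUPPORTED:
        return ("Server protocol version (4) does not match the specified "
                f"driver protocol version ({client_version}). "
                "Consider setting Cluster.protocol_version to 4.")
    return "OK"

print(handshake(2))  # reproduces the message seen in the OpsCenter log
print(handshake(4))  # OK
```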

Thanks for your understanding,

On Wed, Nov 25, 2015 at 3:45 PM, Sergey Panov  wrote:

> Hello,
>
> Today we tried to setup DataStaxOpsCenter-5.2.2.2015102711-linux-x64 to
> work with Cassandra 3.0.
>
>
>
> How can we switch protocol_version to 4?
> Does OpsCenter officially support the latest version of Cassandra?
> Did anybody try to setup it?
>
> Please advise. Thank you!
>
> --
>
> Sergey Panov
>
>


-- 
Bests,

Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax


Re: I am a DataStax certified Cassandra architect now :)

2015-11-25 Thread Neha Dave
Congrats Prem!!!
I am also planning to take the certification.
Could you provide some tips?

regards
Neha

On Wed, Nov 25, 2015 at 10:36 PM, Ariel Weisberg  wrote:

> Hi,
>
> Congratulations! I hope the certification brings good things for you.
>
> Regards,
> Ariel
>
>
> On Sun, Nov 22, 2015, at 01:00 PM, Prem Yadav wrote:
>
> Just letting the community know that I just passed the Cassandra architect
> certification with flying colors :).
> Have to say I learnt a lot from this forum.
>
> Thanks,
> Prem
>
>
>


[ANNOUNCE] CFP open for ApacheCon North America 2016

2015-11-25 Thread Rich Bowen
Community growth starts by talking with those interested in your
project. ApacheCon North America is coming, are you?

We are delighted to announce that the Call For Presentations (CFP) is
now open for ApacheCon North America. You can submit your proposed
sessions at
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp
for big data talks and
http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
for all other topics.

ApacheCon North America will be held in Vancouver, Canada, May 9-13th
2016. ApacheCon has been running every year since 2000, and is the place
to build your project communities.

While we will consider individual talks, we prefer to see related sessions
that are likely to draw users and community members. When submitting your
talk, work with your project community and with related communities to come
up with a full program that will walk attendees through the basics and on
into mastery of your project in example use cases. Content that introduces
what's new in your latest release is also of particular interest, especially
when it builds upon existing well-known application models. The goal should
be to showcase your project in ways that will attract participants and
encourage engagement in your community. Please remember to involve your whole
project community (user and dev lists) when building content. This is your
chance to create a project-specific event within the broader ApacheCon
conference.

Content at ApacheCon North America will be cross-promoted as
mini-conferences, such as ApacheCon Big Data, and ApacheCon Mobile, so
be sure to indicate which larger category your proposed sessions fit into.

Finally, please plan to attend ApacheCon, even if you're not proposing a
talk. The biggest value of the event is community building, and we count
on you to make it a place where your project community is likely to
congregate, not just for the technical content in sessions, but for
hackathons, project summits, and good old fashioned face-to-face networking.

-- 
rbo...@apache.org
http://apache.org/


Re: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread DuyHai Doan
There were several bugs in the past related to lists in CQL.

Indeed, the timestamps used for list columns are computed server-side using a
special algorithm. I wonder whether, in the case of read-repair and/or
hinted-handoff, the original timestamp (the one generated by the coordinator
at the first insert/update) is used, or whether the server generates another
one using its algorithm; that might explain the behavior.
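A toy model of that failure mode (plain Python; the key generation below is a stand-in for Cassandra's server-generated list cell names, not the real algorithm): if a replayed mutation gets fresh element keys instead of reusing the originals, the list accumulates duplicates:

```python
import uuid

def replay_write(cells, values):
    # Each list element is stored under a fresh time-based key. If a
    # replayed mutation (hint, read-repair) generates NEW keys instead
    # of reusing the coordinator's originals, the elements accumulate
    # rather than overwrite -- producing [a, b, c, a, b, c].
    for v in values:
        cells[uuid.uuid1()] = v
    return cells

cells = {}
replay_write(cells, ["a", "b", "c"])   # original write
replay_write(cells, ["a", "b", "c"])   # same logical write, new keys
print(sorted(cells.values()))          # ['a', 'a', 'b', 'b', 'c', 'c']
```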




RE: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Ngoc Minh VO
Our insert/select queries use CL = QUORUM.

We don’t use BatchStatement to import data but executeAsync(Statement) with a 
fixed-size queue.

Regards,

From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Wednesday, 25 November 2015 18:09
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

Be sure to include your actual insert statement. Also, what consistency level
was used for the insert (all, quorum, local quorum, one, or...)?


-- Jack Krupansky


list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Ngoc Minh VO
Hello all,

We encounter an issue in our Production environment that we cannot reproduce 
in our Test environment: a list<T> (T = double or text) value is randomly 
"multiplied" by 2 (i.e. the value sent to C* is [a, b, c], but the value stored in C* is 
[a, b, c, a, b, c]).

I know it sounds weird, but we just want to know whether this is a known 
issue (we found nothing with Google...). We are working on a small dataset to 
narrow down the issue with log data and maybe create a ticket for the DataStax 
Java Driver or Cassandra teams.

Cassandra v2.0.14
DataStax Java Driver v2.1.7.1
OS RHEL6
Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)

The only difference between the Prod and UAT clusters is the multi-datacenter mode 
on the Prod one.
We never insert the same data twice into the same column of any given row. 
All inserts/updates are idempotent!

Data table:
CREATE TABLE data (
    field1 text,
    field2 int,
    field3 text,
    field4 double,
    field5 list<double>, -- randomly having corrupted data, containing [1, 2, 3, 1, 2, 3] instead of [1, 2, 3]
    field6 text,
    field7 list<text>,   -- randomly having corrupted data, containing [a, b, c, a, b, c] instead of [a, b, c]
    PRIMARY KEY ((field1, field2), field3)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Thanks in advance for your help.
Best regards,
Minh




Compaction And Write performance

2015-11-25 Thread aeljami.ext
Hi all,

Does compaction throughput impact write performance?

Can increasing the value of compaction_throughput_mb_per_sec improve insert 
performance? If so, could you explain the concept to me?

Thanks.

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete 
altere, deforme ou falsifie. Merci

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorization.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, France Telecom - Orange shall not be liable if this 
message was modified, changed or falsified.
Thank you.



Re: Compaction And Write performance

2015-11-25 Thread Prem Yadav
Compaction is done to improve reads. The compaction process is very CPU
intensive and can slow writes down, since writes are also CPU-bound.



On Wed, Nov 25, 2015 at 11:12 AM,  wrote:

> Hi all,
>
>
>
> Does compaction throughput impact write performance ?
>
>
>
> Increasing the value of *compaction_throughput_mb_per_sec* can improve
> the insert data ? If yes, is it possible to explain to me the concept ?
>
>
>
> Thanks.
>
> _
>
> Ce message et ses pieces jointes peuvent contenir des informations 
> confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu 
> ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
> electroniques etant susceptibles d'alteration,
> France Telecom - Orange decline toute responsabilite si ce message a ete 
> altere, deforme ou falsifie. Merci
>
> This message and its attachments may contain confidential or privileged 
> information that may be protected by law;
> they should not be distributed, used or copied without authorization.
> If you have received this email in error, please notify the sender and delete 
> this message and its attachments.
> As emails may be altered, France Telecom - Orange shall not be liable if this 
> message was modified, changed or falsified.
> Thank you.
>
>


Re: Compaction And Write performance

2015-11-25 Thread Kiran mk
Yes, to an extent, if you have decent machines that are not making full use of
their resources. By default the compaction throughput is capped at 16 MB/s,
which makes compaction run slowly, sometimes for hours on end, so compactions
lag behind and pending compaction jobs pile up. You can increase the value up
to 128 MB/s (if I am not wrong).

Increasing it to 32-64 MB/s, after assessing the load, would definitely give
you good write performance.

But if your machines are already I/O intensive and overloaded, then never
try to change the value.

Best Regards,
Kiran.M.K.
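To make the knob concrete: the throttle described above can be changed at runtime with nodetool. A sketch of the usual commands follows; the 64 MB/s value is only an example to be tuned against your own I/O headroom, and getcompactionthroughput may not exist on older nodetool versions:

```shell
# Show the current cap (defaults to 16 MB/s); newer nodetool versions only
nodetool getcompactionthroughput

# Raise the cap at runtime; takes effect immediately, but reverts to the
# compaction_throughput_mb_per_sec value in cassandra.yaml on restart
nodetool setcompactionthroughput 64

# 0 disables throttling entirely; risky on clusters already I/O-bound
nodetool setcompactionthroughput 0
```

To make the change permanent, also set compaction_throughput_mb_per_sec in cassandra.yaml.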

On Wed, Nov 25, 2015 at 4:42 PM,  wrote:

> Hi all,
>
>
>
> Does compaction throughput impact write performance ?
>
>
>
> Increasing the value of *compaction_throughput_mb_per_sec* can improve
> the insert data ? If yes, is it possible to explain to me the concept ?
>
>
>
> Thanks.
>
> _
>
> Ce message et ses pieces jointes peuvent contenir des informations 
> confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu 
> ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
> electroniques etant susceptibles d'alteration,
> France Telecom - Orange decline toute responsabilite si ce message a ete 
> altere, deforme ou falsifie. Merci
>
> This message and its attachments may contain confidential or privileged 
> information that may be protected by law;
> they should not be distributed, used or copied without authorization.
> If you have received this email in error, please notify the sender and delete 
> this message and its attachments.
> As emails may be altered, France Telecom - Orange shall not be liable if this 
> message was modified, changed or falsified.
> Thank you.
>
>


-- 
Best Regards,
Kiran.M.K.


Re: Strategy tools for taking snapshots to load in another cluster instance

2015-11-25 Thread Romain Hardouin
My previous answer (sstableloader) allows you to move from a larger to a smaller 
cluster.

Sent from Yahoo Mail on Android

On Tue, Nov 24, 2015 at 11:30, Anishek Agarwal wrote:
Peer,
That talks about having a similar-sized cluster; I was wondering if there is a 
way to move from a larger to a smaller cluster. I will try a few things as soon 
as I get time and update here.
On Thu, Nov 19, 2015 at 5:48 PM, Peer, Oded  wrote:


Have you read the DataStax documentation?

http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html

 

 

From: Romain Hardouin [mailto:romainh...@yahoo.fr]
Sent: Wednesday, November 18, 2015 3:59 PM
To: user@cassandra.apache.org
Subject: Re: Strategy tools for taking snapshots to load in another cluster 
instance

 

You can take a snapshot via nodetool, then load the sstables on your test cluster 
with sstableloader: 
docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html

Sent from Yahoo Mail on Android

From: "Anishek Agarwal" 
Date: Wed, Nov 18, 2015 at 11:24
Subject: Strategy tools for taking snapshots to load in another cluster instance

Hello

 

We have a 5-node prod cluster and a 3-node test cluster. Is there a way I can take 
a snapshot of a table in prod and load it into the test cluster? The Cassandra 
versions are the same.

 

Even if there is a tool that can help with this it will be great.

 

If not, how do people handle scenarios where prod data is required in 
staging/test clusters to verify that things are correct? Does the 
cluster size have to be the same to allow copying of the relevant snapshot data, etc.?

 

 

thanks

anishek
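For later readers, the snapshot-then-stream flow discussed above looks roughly like this. Hosts, keyspace, and table names are placeholders, and the data path depends on your data_file_directories setting:

```shell
# 1. On each prod node, snapshot the keyspace (or a single table with -cf)
nodetool snapshot -t for_test mykeyspace

# 2. Copy the snapshot's sstables into a staging directory laid out as
#    <keyspace>/<table>, e.g. from
#    /var/lib/cassandra/data/mykeyspace/mytable/snapshots/for_test/
#    into ./mykeyspace/mytable/

# 3. Stream into the test cluster. sstableloader re-partitions the data as
#    it streams, so the destination cluster does not need the same node
#    count or token layout as the source.
sstableloader -d test_node1,test_node2 ./mykeyspace/mytable
```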


  


Re: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Jack Krupansky
Is the data corrupted exactly the same way on all three nodes and in both
data centers, or just on one or two nodes or in only one data center?

Are both columns doubled in the same row, or only one of them in a
particular row?

Does sound like a bug though, worthy of a Jira ticket.

-- Jack Krupansky

On Wed, Nov 25, 2015 at 4:05 AM, Ngoc Minh VO 
wrote:

> Hello all,
>
>
>
> We encounter an issue on our Production environment that cannot be
> reproduced on Test environment: list<T> (T = double or text) value is
> randomly “multiplied” by 2 (i.e. value sent to C*= [a, b, c], value stored
> in C* = [a, b, c, a, b, c]).
>
>
>
> I know that it sounds weird but we just want to know whether it is a known
> issue (found nothing with Google…). We are working on a small dataset to
> narrow down the issue with log data and maybe create a ticket for the DataStax
> Java Driver or Cassandra teams.
>
>
>
> Cassandra v2.0.14
>
> DataStax Java Driver v2.1.7.1
>
> OS RHEL6
>
> Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
>
> UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)
>
>
>
> The only difference between Prod and UAT cluster is the multi-datacenter
> mode on Prod one.
>
> We do not insert twice the same data on the same column of any specific
> row. All inserts/updates are idempotent!
>
>
>
> Data table:
>
> CREATE TABLE data (
>
> field1 text,
>
> field2 int,
>
> field3 text,
>
> field4 double,
>
> field5 list<double>, -- randomly having corrupted data, containing [1,
> 2, 3, 1, 2, 3] instead of [1, 2, 3]
>
> field6 text,
>
> field7 list<text>,   -- randomly having corrupted data, containing [a,
> b, c, a, b, c] instead of [a, b, c]
>
> PRIMARY KEY ((field1, field2), field3)
>
> ) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
>
>
>
> Thanks in advance for your help.
>
> Best regards,
>
> Minh
>


Re: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Laing, Michael
You don't have any syntax in your application anywhere such as:

UPDATE data SET field5 = field5 + [ 1,2,3 ] WHERE field1=...;

Just a quick idempotency check :)

On Wed, Nov 25, 2015 at 9:16 AM, Jack Krupansky 
wrote:

> Is the data corrupted exactly the same way on all three nodes and in both
> data centers, or just on one or two nodes or in only one data center?
>
> Are both columns doubled in the same row, or only one of them in a
> particular row?
>
> Does sound like a bug though, worthy of a Jira ticket.
>
> -- Jack Krupansky
>
> On Wed, Nov 25, 2015 at 4:05 AM, Ngoc Minh VO 
> wrote:
>
>> Hello all,
>>
>>
>>
>> We encounter an issue on our Production environment that cannot be
>> reproduced on Test environment: list<T> (T = double or text) value is
>> randomly “multiplied” by 2 (i.e. value sent to C*= [a, b, c], value stored
>> in C* = [a, b, c, a, b, c]).
>>
>>
>>
>> I know that it sounds weird but we just want to know whether it is a
>> known issue (found nothing with Google…). We are working on a small dataset
>> to narrow down the issue with log data and maybe create a ticket for the
>> DataStax Java Driver or Cassandra teams.
>>
>>
>>
>> Cassandra v2.0.14
>>
>> DataStax Java Driver v2.1.7.1
>>
>> OS RHEL6
>>
>> Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
>>
>> UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)
>>
>>
>>
>> The only difference between Prod and UAT cluster is the multi-datacenter
>> mode on Prod one.
>>
>> We do not insert twice the same data on the same column of any specific
>> row. All inserts/updates are idempotent!
>>
>>
>>
>> Data table:
>>
>> CREATE TABLE data (
>>
>> field1 text,
>>
>> field2 int,
>>
>> field3 text,
>>
>> field4 double,
>>
>> field5 list<double>, -- randomly having corrupted data, containing
>> [1, 2, 3, 1, 2, 3] instead of [1, 2, 3]
>>
>> field6 text,
>>
>> field7 list<text>,   -- randomly having corrupted data, containing
>> [a, b, c, a, b, c] instead of [a, b, c]
>>
>> PRIMARY KEY ((field1, field2), field3)
>>
>> ) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
>>
>>
>>
>> Thanks in advance for your help.
>>
>> Best regards,
>>
>> Minh
>>
>
>


RE: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Ngoc Minh VO
Hello,

The data are corrupted on all 6 replicas (3 per datacenter). I used 
consistency level ONE and queried every node -> same result.

In our use case, only 1 of the 4 data columns (field4, 5, 6, 7) contains 
data; the other 3 contain NULL.

We are trying to create a small dataset for a Jira ticket. It is strange that 
nobody else has encountered this issue.
Minh
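One mechanism worth ruling out here (an educated guess, not a confirmed diagnosis, and the row values below are made up for illustration): as I understand the storage engine, list writes are not idempotent at the storage level, so a replayed write can double the elements even though the statement itself looks idempotent:

```sql
-- Overwriting a list is implemented as a range tombstone at (ts - 1) plus
-- one cell per element at ts, each keyed by a fresh timeuuid. If the driver
-- replays the statement after a timeout, and the replay's timestamp is not
-- strictly greater than ts (same client timestamp, or a second coordinator
-- whose clock runs slightly behind, which is easier across datacenters),
-- the replay's tombstone does not shadow the first attempt's cells, and
-- the merged row reads back as [a, b, c, a, b, c].
INSERT INTO data (field1, field2, field3, field7)
VALUES ('k1', 1, 'c1', ['a', 'b', 'c']);
```

Checking the driver's retry policy and the clock skew between coordinators (especially across datacenters) would confirm or rule this out.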

From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: mercredi 25 novembre 2015 15:16
To: user@cassandra.apache.org
Subject: Re: list data value multiplied x2 in multi-datacenter environment

Is the data corrupted exactly the same way on all three nodes and in both data 
centers, or just on one or two nodes or in only one data center?

Are both columns doubled in the same row, or only one of them in a particular 
row?

Does sound like a bug though, worthy of a Jira ticket.

-- Jack Krupansky

On Wed, Nov 25, 2015 at 4:05 AM, Ngoc Minh VO 
> wrote:
Hello all,

We encounter an issue on our Production environment that cannot be reproduced 
on Test environment: list<T> (T = double or text) value is randomly 
“multiplied” by 2 (i.e. value sent to C*= [a, b, c], value stored in C* = [a, 
b, c, a, b, c]).

I know that it sounds weird but we just want to know whether it is a known 
issue (found nothing with Google…). We are working on a small dataset to narrow 
down the issue with log data and maybe create a ticket for the DataStax Java Driver 
or Cassandra teams.

Cassandra v2.0.14
DataStax Java Driver v2.1.7.1
OS RHEL6
Prod Cluster topology = 16 nodes over 2 datacenters (RF = 3 per DC)
UAT Cluster topology = 6 nodes on 1 datacenter (RF = 3)

The only difference between Prod and UAT cluster is the multi-datacenter mode 
on Prod one.
We never insert the same data twice into the same column of any given row. 
All inserts/updates are idempotent!

Data table:
CREATE TABLE data (
field1 text,
field2 int,
field3 text,
field4 double,
field5 list<double>, -- randomly having corrupted data, containing [1, 2, 
3, 1, 2, 3] instead of [1, 2, 3]
field6 text,
field7 list<text>,   -- randomly having corrupted data, containing [a, b, 
c, a, b, c] instead of [a, b, c]
PRIMARY KEY ((field1, field2), field3)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Thanks in advance for your help.
Best regards,
Minh
