Re: too many compactions pending and compaction is slow on few tables

2017-04-07 Thread Carlos Rolo
It's not a good idea to run LCS on spinning disks. Change to STCS, and reduce the
concurrent compactors to 2 (if you have more than 2). Check if that helps.
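A sketch of what that change might look like (keyspace and table names here are placeholders, and on Cassandra 2.1 the compactor count is a cassandra.yaml setting rather than a nodetool command):

```shell
# Hypothetical keyspace/table names -- adjust to your schema.
# Switch the table from LCS to size-tiered compaction:
cqlsh -e "ALTER TABLE my_ks.my_table
          WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"

# On Cassandra 2.1, lower the number of compaction threads by setting
# this in cassandra.yaml and restarting the node:
#   concurrent_compactors: 2
```

Note that changing the strategy makes the node reorganize its existing sstables, so expect a burst of compaction activity right after the change.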

On Apr 7, 2017 20:18, "Matija Gobec"  wrote:

> It does, as the "new" data, even if the values are the same, has a new
> write timestamp.
> Spinning disks are hard to run LCS on. Do you maybe have some kind of
> non-striped RAID in place?
>
> On Fri, Apr 7, 2017 at 8:46 PM, Giri P  wrote:
>
>> Does LCS try compacting already-compacted files if it sees the same key
>> loaded again?
>>
>> On Fri, Apr 7, 2017 at 11:39 AM, Giri P  wrote:
>>
>>> cassandra version : 2.1
>>> volume : initially loading 28 days worth of data around 1 TB and then we
>>>  process hourly
>>> load: only cassandra running on nodes
>>> disks: spinning disks
>>>
>>>
>>>
>>> On Fri, Apr 7, 2017 at 11:27 AM, Jonathan Haddad 
>>> wrote:
>>>
 What version of Cassandra? How much data? How often are you reloading
 it? Is compaction throttled? What disks are you using? Any other load on
 the machine?
 On Fri, Apr 7, 2017 at 11:19 AM Giri P  wrote:

> Hi,
>
> we are continuously loading a table that uses the LCS compaction strategy
> with bloom filters disabled, and compactions are not catching up.
> Compaction also runs slowly on that table, even after we increased the
> throughput and the number of concurrent compactors.
>
> Can someone point me to what I should be looking at to tune this?
>
> Thanks
> Giri
>

>>>
>>
>


Re: too many compactions pending and compaction is slow on few tables

2017-04-07 Thread Matija Gobec
It does, as the "new" data, even if the values are the same, has a new
write timestamp.
Spinning disks are hard to run LCS on. Do you maybe have some kind of
non-striped RAID in place?

On Fri, Apr 7, 2017 at 8:46 PM, Giri P  wrote:

> Does LCS try compacting already-compacted files if it sees the same key
> loaded again?
>
> On Fri, Apr 7, 2017 at 11:39 AM, Giri P  wrote:
>
>> cassandra version : 2.1
>> volume : initially loading 28 days worth of data around 1 TB and then we
>>  process hourly
>> load: only cassandra running on nodes
>> disks: spinning disks
>>
>>
>>
>> On Fri, Apr 7, 2017 at 11:27 AM, Jonathan Haddad 
>> wrote:
>>
>>> What version of Cassandra? How much data? How often are you reloading
>>> it? Is compaction throttled? What disks are you using? Any other load on
>>> the machine?
>>> On Fri, Apr 7, 2017 at 11:19 AM Giri P  wrote:
>>>
 Hi,

 we are continuously loading a table that uses the LCS compaction strategy
 with bloom filters disabled, and compactions are not catching up.
 Compaction also runs slowly on that table, even after we increased the
 throughput and the number of concurrent compactors.

 Can someone point me to what I should be looking at to tune this?

 Thanks
 Giri

>>>
>>
>


Re: too many compactions pending and compaction is slow on few tables

2017-04-07 Thread Giri P
Does LCS try compacting already-compacted files if it sees the same key
loaded again?

On Fri, Apr 7, 2017 at 11:39 AM, Giri P  wrote:

> cassandra version : 2.1
> volume : initially loading 28 days worth of data around 1 TB and then we
>  process hourly
> load: only cassandra running on nodes
> disks: spinning disks
>
>
>
> On Fri, Apr 7, 2017 at 11:27 AM, Jonathan Haddad 
> wrote:
>
>> What version of Cassandra? How much data? How often are you reloading it?
>> Is compaction throttled? What disks are you using? Any other load on the
>> machine?
>> On Fri, Apr 7, 2017 at 11:19 AM Giri P  wrote:
>>
>>> Hi,
>>>
>>> we are continuously loading a table that uses the LCS compaction strategy
>>> with bloom filters disabled, and compactions are not catching up.
>>> Compaction also runs slowly on that table, even after we increased the
>>> throughput and the number of concurrent compactors.
>>>
>>> Can someone point me to what I should be looking at to tune this?
>>>
>>> Thanks
>>> Giri
>>>
>>
>


Re: too many compactions pending and compaction is slow on few tables

2017-04-07 Thread Giri P
cassandra version : 2.1
volume : initially loading 28 days' worth of data (around 1 TB), then we
 process hourly
load: only cassandra running on nodes
disks: spinning disks



On Fri, Apr 7, 2017 at 11:27 AM, Jonathan Haddad  wrote:

> What version of Cassandra? How much data? How often are you reloading it?
> Is compaction throttled? What disks are you using? Any other load on the
> machine?
> On Fri, Apr 7, 2017 at 11:19 AM Giri P  wrote:
>
>> Hi,
>>
>> we are continuously loading a table that uses the LCS compaction strategy
>> with bloom filters disabled, and compactions are not catching up.
>> Compaction also runs slowly on that table, even after we increased the
>> throughput and the number of concurrent compactors.
>>
>> Can someone point me to what I should be looking at to tune this?
>>
>> Thanks
>> Giri
>>
>


Re: too many compactions pending and compaction is slow on few tables

2017-04-07 Thread Jonathan Haddad
What version of Cassandra? How much data? How often are you reloading it?
Is compaction throttled? What disks are you using? Any other load on the
machine?
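For anyone triaging a similar case, most of these questions can be answered from the node itself. A few commands worth running (assuming a reasonably recent nodetool; exact subcommands vary slightly by version):

```shell
# What version is running?
nodetool version

# Are compactions piling up, and on which tables?
nodetool compactionstats

# Is compaction throttled? (reported in MB/s; 0 means unthrottled)
nodetool getcompactionthroughput

# Disk type and utilization (Linux):
iostat -x 5
```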
On Fri, Apr 7, 2017 at 11:19 AM Giri P  wrote:

> Hi,
>
> we are continuously loading a table that uses the LCS compaction strategy
> with bloom filters disabled, and compactions are not catching up.
> Compaction also runs slowly on that table, even after we increased the
> throughput and the number of concurrent compactors.
>
> Can someone point me to what I should be looking at to tune this?
>
> Thanks
> Giri
>


Re: Migrating from Datastax Distribution to Apache Cassandra

2017-04-07 Thread Michael Shuler
Example DDC 3.7.0 to Apache Cassandra 3.10 upgrade with all default
configs, no data, and both the DDC and Apache Cassandra lines in
sources.list (sorry for any weird wrapping, but I think the list strips
attachments):

mshuler@hana:~$ apt-cache policy datastax-ddc cassandra
datastax-ddc:
  Installed: (none)
  Candidate: 3.7.0
  Version table:
 3.7.0 0
500 https://debian.datastax.com/datastax-ddc/ 3.7/main amd64
Packages
cassandra:
  Installed: (none)
  Candidate: 3.10
  Version table:
 3.10 0
500 http://www.apache.org/dist/cassandra/debian/ 310x/main amd64
Packages
mshuler@hana:~$
mshuler@hana:~$ sudo apt-get install datastax-ddc
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  datastax-ddc-tools
The following NEW packages will be installed:
  datastax-ddc datastax-ddc-tools
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 28.4 MB of archives.
After this operation, 38.0 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 https://debian.datastax.com/datastax-ddc/ 3.7/main datastax-ddc
all 3.7.0 [28.4 MB]
Get:2 https://debian.datastax.com/datastax-ddc/ 3.7/main
datastax-ddc-tools all 3.7.0 [4,484 B]
Fetched 28.4 MB in 3s (8,210 kB/s)
Selecting previously unselected package datastax-ddc.
(Reading database ... 171852 files and directories currently installed.)
Preparing to unpack .../datastax-ddc_3.7.0_all.deb ...
Unpacking datastax-ddc (3.7.0) ...
Selecting previously unselected package datastax-ddc-tools.
Preparing to unpack .../datastax-ddc-tools_3.7.0_all.deb ...
Unpacking datastax-ddc-tools (3.7.0) ...
Processing triggers for systemd (215-17+deb8u6) ...
Setting up datastax-ddc (3.7.0) ...
vm.max_map_count = 1048575
net.ipv4.tcp_keepalive_time = 300
update-rc.d: warning: start and stop actions are no longer supported;
falling back to defaults
Setting up datastax-ddc-tools (3.7.0) ...
Processing triggers for systemd (215-17+deb8u6) ...
mshuler@hana:~$
mshuler@hana:~$ sudo apt-get install cassandra cassandra-tools
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  datastax-ddc datastax-ddc-tools
The following NEW packages will be installed:
  cassandra cassandra-tools
0 upgraded, 2 newly installed, 2 to remove and 0 not upgraded.
Need to get 29.2 MB of archives.
After this operation, 939 kB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://www.apache.org/dist/cassandra/debian/ 310x/main cassandra
all 3.10 [29.2 MB]
Get:2 http://www.apache.org/dist/cassandra/debian/ 310x/main
cassandra-tools all 3.10 [4,558 B]
Fetched 29.2 MB in 2s (13.6 MB/s)
(Reading database ... 172046 files and directories currently installed.)
Removing datastax-ddc-tools (3.7.0) ...
Removing datastax-ddc (3.7.0) ...
Selecting previously unselected package cassandra.
(Reading database ... 171870 files and directories currently installed.)
Preparing to unpack .../cassandra_3.10_all.deb ...
Unpacking cassandra (3.10) ...
Selecting previously unselected package cassandra-tools.
Preparing to unpack .../cassandra-tools_3.10_all.deb ...
Unpacking cassandra-tools (3.10) ...
Processing triggers for systemd (215-17+deb8u6) ...
Setting up cassandra (3.10) ...
Installing new version of config file /etc/cassandra/cassandra-env.sh ...
Installing new version of config file /etc/cassandra/cassandra.yaml ...
Installing new version of config file /etc/cassandra/jvm.options ...
Installing new version of config file /etc/cassandra/logback-tools.xml ...
Installing new version of config file /etc/cassandra/logback.xml ...
vm.max_map_count = 1048575
net.ipv4.tcp_keepalive_time = 300
update-rc.d: warning: start and stop actions are no longer supported;
falling back to defaults
Setting up cassandra-tools (3.10) ...
mshuler@hana:~$

-- 
Warm regards,
Michael


too many compactions pending and compaction is slow on few tables

2017-04-07 Thread Giri P
Hi,

we are continuously loading a table that uses the LCS compaction strategy
with bloom filters disabled, and compactions are not catching up.
Compaction also runs slowly on that table, even after we increased the
throughput and the number of concurrent compactors.

Can someone point me to what I should be looking at to tune this?

Thanks
Giri


Re: Migrating from Datastax Distribution to Apache Cassandra

2017-04-07 Thread Michael Shuler
This is prudent advice, but a rolling upgrade from DDC 3.7.0 to Apache
Cassandra 3.10, after updating your sources.list, should also work fine.
Just back up all your configurations, and if your data is mission
critical, follow good backup strategy for that, too. Testing the upgrade
in your production staging environment is also very prudent.

The DDC deb packages were built directly from the Apache Cassandra
debian/ contents, after a little patching out of the java dependency.

The default configuration file and data locations are identical.

Do keep in mind, if you have a custom JDK install that didn't come from
a deb package that satisfies the java dependency, the Apache Cassandra
deb will pull in OpenJDK - you can use it or not by setting the
appropriate configurations.

http://cassandra.apache.org/download/ has the Apache Cassandra 3.10
sources list and gpg key info for deb installations.
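Concretely, adding the Apache repo before the package swap might look like this (repo line and KEYS URL as published on the download page at the time; 310x targets the 3.10 series):

```shell
# Add the Apache Cassandra 3.10.x (310x) deb repository:
echo "deb http://www.apache.org/dist/cassandra/debian 310x main" | \
    sudo tee /etc/apt/sources.list.d/cassandra.sources.list

# Import the repository signing keys:
curl -s https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -

sudo apt-get update
# Then: sudo apt-get install cassandra cassandra-tools
```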

-- 
Kind regards,
Michael

On 04/07/2017 11:36 AM, daemeon reiydelle wrote:
> Having done variants of this, I would suggest you bring up new nodes at
> approximately the same Apache version as a separate data center, in your
> same cluster. Replication strategy may need to be tweaked
> 
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
> 
> On Fri, Apr 7, 2017 at 1:55 AM, Eren Yilmaz  > wrote:
> 
> Hi,
> 
> 
> We have Cassandra 3.7 installation on Ubuntu, from Datastax
> distribution (using the repo). Since Datastax has announced that
> they will no longer support a community Cassandra distribution, I
> want to migrate to Apache distribution. Are there any differences
> between distributions? Can I use the upgrading procedures as
> described in
> 
> https://docs.datastax.com/en/latest-upgrade/upgrade/cassandra/upgrdCassandraDetails.html
> 
> ?
> 
> 
> Thanks,
> 
> Eren
> 
> 



Re: Migrating from Datastax Distribution to Apache Cassandra

2017-04-07 Thread daemeon reiydelle
Having done variants of this, I would suggest you bring up new nodes at
approximately the same Apache version as a separate data center, in your
same cluster. Replication strategy may need to be tweaked


Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Apr 7, 2017 at 1:55 AM, Eren Yilmaz 
wrote:

> Hi,
>
>
>
> We have Cassandra 3.7 installation on Ubuntu, from Datastax distribution
> (using the repo). Since Datastax has announced that they will no longer
> support a community Cassandra distribution, I want to migrate to Apache
> distribution. Are there any differences between distributions? Can I use
> the upgrading procedures as described in
> https://docs.datastax.com/en/latest-upgrade/upgrade/cassandra/upgrdCassandraDetails.html?
>
>
>
> Thanks,
>
> Eren
>


Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

2017-04-07 Thread Jonathan Haddad
Hey Jerry - very happy to hear the post answered your questions.  Alex
wrote another great post on TWCS you might find useful, since you're using
it: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html



On Fri, Apr 7, 2017 at 8:20 AM Jerry Lam  wrote:

> Hi Jon,
>
> This Cassandra community is very helpful!!! Thanks for sharing this
> blogpost with me. It answers all my questions related to TWCS with
> clustering key and limit clause!
>
> Best Regards,
>
> Jerry
>
>
>
> On Fri, Apr 7, 2017 at 10:30 AM, Jon Haddad 
> wrote:
>
> Alex Dejanovski wrote a good post on how the LIMIT clause works and why it
> doesn’t (until 3.4) work the way you think it would.
>
>
> http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
>
> On Apr 7, 2017, at 7:23 AM, Jerry Lam  wrote:
>
> Hi Jan,
>
> Thank you for the clarification and knowledge sharing.
>
> A follow-up question is:
>
> Does Cassandra need to read all sstables for customer_id = 1L if my query
> is:
>
> select view_id from customer_view where customer_id = 1L limit 1
>
> Since I have date_day as the clustering key, sorted in descending order,
> I'm assuming that the above query will return the latest view_id for
> customer_id 1L.
>
> Since I'm using TWCS, is Cassandra smart enough to query just the latest
> sstable that matches the partition key (customer_id = 1L), or does it have
> to go through the entire merge process?
>
> Thank you,
>
> Jerry
>
>
> On Fri, Apr 7, 2017 at 2:08 AM,  wrote:
>
> Hi Jerry,
>
>
>
> the compaction strategy just tells Cassandra how to compact your sstables
> and with TWCS when to stop compacting further. But of course your data can
> and most likely will live in multiple sstables.
>
>
>
> The magic that happens is that the coordinator node for your request will
> merge the data for you on the fly. It is an easy job, as your data per
> sstable is already sorted.
>
>
>
> But be careful if you end up with a worst case: if a customer_id is
> inserted every hour, you can end up reading many sstables, decreasing
> read performance, if the data should be kept for a year or so.
>
>
>
> Jan
>
>
>
> Gesendet von meinem Windows 10 Phone
>
>
>
> *Von: *Jerry Lam 
> *Gesendet: *Freitag, 7. April 2017 00:30
> *An: *user@cassandra.apache.org
> *Betreff: *How does clustering key works with
> TimeWindowCompactionStrategy (TWCS)
>
>
>
> Hi guys,
>
>
>
> I'm a new and happy user of Cassandra. We are using Cassandra for time
> series data so we choose TWCS because of its predictability and its ease of
> configuration.
>
>
>
> My question is we have a table with the following schema:
>
>
>
> CREATE TABLE IF NOT EXISTS customer_view (
>
> customer_id bigint,
>
> date_day Timestamp,
>
> view_id bigint,
>
> PRIMARY KEY (customer_id, date_day)
>
> ) WITH CLUSTERING ORDER BY (date_day DESC)
>
>
>
> What I understand is that the data will be ordered by date_day within the
> partition using the clustering key. However, the same customer_id can be
> inserted into this partition several times during the day, and TWCS says
> it will only compact the sstables within the window interval set in the
> configuration (in our case 1 hour).
>
>
>
> How does Cassandra guarantee the clustering key order when the same
> customer_id appears in several sstables? Does it need to do a merge and
> then sort to find the latest view_id for the customer_id? Or is there
> some magic happening behind the scenes?
>
>
>
> Best Regards,
>
>
>
> Jerry
>
>
>
>
>
>
>


Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

2017-04-07 Thread Jerry Lam
Hi Jon,

This Cassandra community is very helpful!!! Thanks for sharing this
blogpost with me. It answers all my questions related to TWCS with
clustering key and limit clause!

Best Regards,

Jerry



On Fri, Apr 7, 2017 at 10:30 AM, Jon Haddad 
wrote:

> Alex Dejanovski wrote a good post on how the LIMIT clause works and why it
> doesn’t (until 3.4) work the way you think it would.
>
> http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
>
> On Apr 7, 2017, at 7:23 AM, Jerry Lam  wrote:
>
> Hi Jan,
>
> Thank you for the clarification and knowledge sharing.
>
> A follow-up question is:
>
> Does Cassandra need to read all sstables for customer_id = 1L if my query
> is:
>
> select view_id from customer_view where customer_id = 1L limit 1
>
> Since I have date_day as the clustering key, sorted in descending order,
> I'm assuming that the above query will return the latest view_id for
> customer_id 1L.
>
> Since I'm using TWCS, is Cassandra smart enough to query just the latest
> sstable that matches the partition key (customer_id = 1L), or does it have
> to go through the entire merge process?
>
> Thank you,
>
> Jerry
>
>
> On Fri, Apr 7, 2017 at 2:08 AM,  wrote:
>
>> Hi Jerry,
>>
>>
>>
>> the compaction strategy just tells Cassandra how to compact your sstables
>> and with TWCS when to stop compacting further. But of course your data can
>> and most likely will live in multiple sstables.
>>
>>
>>
>> The magic that happens is that the coordinator node for your request will
>> merge the data for you on the fly. It is an easy job, as your data per
>> sstable is already sorted.
>>
>>
>>
>> But be careful if you end up with a worst case: if a customer_id is
>> inserted every hour, you can end up reading many sstables, decreasing
>> read performance, if the data should be kept for a year or so.
>>
>>
>>
>> Jan
>>
>>
>>
>> Gesendet von meinem Windows 10 Phone
>>
>>
>>
>> *Von: *Jerry Lam 
>> *Gesendet: *Freitag, 7. April 2017 00:30
>> *An: *user@cassandra.apache.org
>> *Betreff: *How does clustering key works with
>> TimeWindowCompactionStrategy (TWCS)
>>
>>
>>
>> Hi guys,
>>
>>
>>
>> I'm a new and happy user of Cassandra. We are using Cassandra for time
>> series data so we choose TWCS because of its predictability and its ease of
>> configuration.
>>
>>
>>
>> My question is we have a table with the following schema:
>>
>>
>>
>> CREATE TABLE IF NOT EXISTS customer_view (
>>
>> customer_id bigint,
>>
>> date_day Timestamp,
>>
>> view_id bigint,
>>
>> PRIMARY KEY (customer_id, date_day)
>>
>> ) WITH CLUSTERING ORDER BY (date_day DESC)
>>
>>
>>
>> What I understand is that the data will be ordered by date_day within the
>> partition using the clustering key. However, the same customer_id can be
>> inserted into this partition several times during the day, and TWCS says
>> it will only compact the sstables within the window interval set in the
>> configuration (in our case 1 hour).
>>
>>
>>
>> How does Cassandra guarantee the clustering key order when the same
>> customer_id appears in several sstables? Does it need to do a merge and
>> then sort to find the latest view_id for the customer_id? Or is there
>> some magic happening behind the scenes?
>>
>>
>>
>> Best Regards,
>>
>>
>>
>> Jerry
>>
>>
>>
>
>
>


Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Jacob Shadix
I don't see an issue with the size of the data / node. You can attempt the
rebuild again and play around with throughput if your network can handle it.

It can be changed on-the-fly with nodetool:

 nodetool setstreamthroughput

This article is also worth a read -
https://support.datastax.com/hc/en-us/articles/205409646-How-to-performance-tune-data-streaming-activities-like-repair-and-bootstrap
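For example (the 200 Mb/s figure below is only an illustration; pick a value your network and disks can sustain):

```shell
# Show the current inter-node streaming throughput cap (megabits/s):
nodetool getstreamthroughput

# Raise the cap; takes effect immediately, no restart needed:
nodetool setstreamthroughput 200

# Setting it to 0 removes the throttle entirely -- use with care.
```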

-- Jacob Shadix

On Fri, Apr 7, 2017 at 9:23 AM, Roland Otta 
wrote:

> good point!
>
> on the source side i can see the following error
>
> ERROR [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532
> StreamSession.java:529 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9]
> Streaming error occurred on session with peer 10.192.116.1 through 192.168.
> 0.114
> org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe
> at 
> org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:145)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.streaming.compress.
> CompressedStreamWriter.lambda$write$0(CompressedStreamWriter.java:90)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.
> applyToChannel(BufferedDataOutputStreamPlus.java:350)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.streaming.compress.
> CompressedStreamWriter.write(CompressedStreamWriter.java:90)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.streaming.messages.
> OutgoingFileMessage.serialize(OutgoingFileMessage.java:91)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.
> serialize(OutgoingFileMessage.java:48) ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.
> serialize(OutgoingFileMessage.java:40) ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.streaming.messages.
> StreamMessage.serialize(StreamMessage.java:48)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.streaming.ConnectionHandler$
> OutgoingMessageHandler.sendMessage(ConnectionHandler.java:370)
> ~[apache-cassandra-3.7.jar:3.7]
> at org.apache.cassandra.streaming.ConnectionHandler$
> OutgoingMessageHandler.run(ConnectionHandler.java:342)
> ~[apache-cassandra-3.7.jar:3.7]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Caused by: java.io.IOException: Broken pipe
> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> ~[na:1.8.0_77]
> at 
> sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428)
> ~[na:1.8.0_77]
> at 
> sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493)
> ~[na:1.8.0_77]
> at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608)
> ~[na:1.8.0_77]
> at 
> org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:141)
> ~[apache-cassandra-3.7.jar:3.7]
> ... 10 common frames omitted
> DEBUG [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532
> ConnectionHandler.java:110 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9]
> Closing stream connection handler on /10.192.116.1
> INFO  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532
> StreamResultFuture.java:187 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9]
> Session with /10.192.116.1 is complete
> WARN  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532
> StreamResultFuture.java:214 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9]
> Stream failed
>
>
> the dataset is approx 300GB / Node.
>
> does that mean that cassandra does not try to reconnect (for streaming) in
> case of short network dropouts?
>
> On Fri, 2017-04-07 at 08:53 -0400, Jacob Shadix wrote:
>
> Did you look at the logs on the source DC as well? How big is the dataset?
>
> -- Jacob Shadix
>
> On Fri, Apr 7, 2017 at 7:16 AM, Roland Otta 
> wrote:
>
> Hi!
>
> we are on 3.7.
>
> we have some debug messages ... but i guess they are not related to that
> issue
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,440 FailureDetector.java:456 -
> Ignoring interval time of 2002469610 for /192.168.0.27
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 -
> Ignoring interval time of 2598593732 for /10.192.116.4
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 -
> Ignoring interval time of 2002612298 for /10.192.116.5
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 -
> Ignoring interval time of 2002660534 for /10.192.116.9
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 -
> Ignoring interval time of 2027212880 for /10.192.116.3
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 -
> Ignoring interval time of 2027279042 for /192.168.0.188
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 -
> Ignoring interval time of 2027313992 for 

Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

2017-04-07 Thread Jon Haddad
Alex Dejanovski wrote a good post on how the LIMIT clause works and why it 
doesn’t (until 3.4) work the way you think it would.

http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
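As background for readers of this thread, the table under discussion, declared with explicit TWCS options for the 1-hour windows mentioned in the quoted messages, would look roughly like this (a sketch; option names per the TWCS compaction subproperties):

```shell
# Run the DDL via cqlsh; keyspace selection omitted for brevity.
cqlsh <<'EOF'
CREATE TABLE IF NOT EXISTS customer_view (
    customer_id bigint,
    date_day timestamp,
    view_id bigint,
    PRIMARY KEY (customer_id, date_day)
) WITH CLUSTERING ORDER BY (date_day DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'HOURS',
                    'compaction_window_size': 1};
EOF
```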

> On Apr 7, 2017, at 7:23 AM, Jerry Lam  wrote:
> 
> Hi Jan,
> 
> Thank you for the clarification and knowledge sharing. 
> 
> A follow-up question is:
> 
> Does Cassandra need to read all sstables for customer_id = 1L if my query is:
> 
> select view_id from customer_view where customer_id = 1L limit 1
> 
> Since I have date_day as the clustering key, sorted in descending order,
> I'm assuming that the above query will return the latest view_id for
> customer_id 1L.
>
> Since I'm using TWCS, is Cassandra smart enough to query just the latest
> sstable that matches the partition key (customer_id = 1L), or does it have
> to go through the entire merge process?
> 
> Thank you,
> 
> Jerry
> 
> 
> On Fri, Apr 7, 2017 at 2:08 AM,  > wrote:
> Hi Jerry,
> 
>  
> 
> the compaction strategy just tells Cassandra how to compact your sstables and 
> with TWCS when to stop compacting further. But of course your data can and 
> most likely will live in multiple sstables.
> 
>  
> 
> The magic that happens is that the coordinator node for your request will
> merge the data for you on the fly. It is an easy job, as your data per
> sstable is already sorted.
>
>
>
> But be careful if you end up with a worst case: if a customer_id is
> inserted every hour, you can end up reading many sstables, decreasing
> read performance, if the data should be kept for a year or so.
> 
>  
> 
> Jan
> 
>  
> 
> Gesendet von meinem Windows 10 Phone
> 
>  
> 
> Von: Jerry Lam 
> Gesendet: Freitag, 7. April 2017 00:30
> An: user@cassandra.apache.org 
> Betreff: How does clustering key works with TimeWindowCompactionStrategy 
> (TWCS)
> 
>  
> 
> Hi guys,
> 
>  
> 
> I'm a new and happy user of Cassandra. We are using Cassandra for time series 
> data so we choose TWCS because of its predictability and its ease of 
> configuration.
> 
>  
> 
> My question is we have a table with the following schema:
> 
>  
> 
> CREATE TABLE IF NOT EXISTS customer_view (
> 
> customer_id bigint,
> 
> date_day Timestamp,
> 
> view_id bigint,
> 
> PRIMARY KEY (customer_id, date_day)
> 
> ) WITH CLUSTERING ORDER BY (date_day DESC)
> 
>  
> 
> What I understand is that the data will be ordered by date_day within the
> partition using the clustering key. However, the same customer_id can be
> inserted into this partition several times during the day, and TWCS says
> it will only compact the sstables within the window interval set in the
> configuration (in our case 1 hour).
>
>
>
> How does Cassandra guarantee the clustering key order when the same
> customer_id appears in several sstables? Does it need to do a merge and
> then sort to find the latest view_id for the customer_id? Or is there
> some magic happening behind the scenes?
> 
>  
> 
> Best Regards,
> 
>  
> 
> Jerry
> 
>  
> 
> 



Re: Node always dieing

2017-04-07 Thread Cogumelos Maravilha
There's a tweak.

I forgot to put this in the new instance:

In /lib/udev/rules.d/40-vm-hotadd.rules:

# On Hyper-V and Xen Virtual Machines we want to add memory and cpus
# as soon as they appear
ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply"
ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply"
GOTO="vm_hotadd_end"
LABEL="vm_hotadd_apply"
# Memory hotadd request
#SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", ATTR{state}="online"
# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1"
LABEL="vm_hotadd_end"

Don't ask where this code came from! Just sharing workarounds.
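If you apply the same rules file on a running instance, reloading udev afterwards avoids a reboot (standard udevadm usage; adjust the subsystem match as needed):

```shell
# Re-read rules files and replay "add" events for CPUs so the
# hot-add rules take effect without rebooting:
sudo udevadm control --reload-rules
sudo udevadm trigger --subsystem-match=cpu --action=add
```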
On 04/06/2017 06:13 PM, Carlos Rolo wrote:
> i3 are having those issues more than the other instances it seems. Not
> the first report I heard about.
> Regards,
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>  
> Pythian - Love your data
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> linkedin.com/in/carlosjuzarterolo
> Mobile: +351 918 918 100
> www.pythian.com 
> On Thu, Apr 6, 2017 at 5:36 PM, Cogumelos Maravilha
> > wrote:
>
> Yes, but this time I'm going to give it lots of time between killing and
> pickup.
>
> Thanks a lot.
> On 04/06/2017 05:31 PM, Avi Kivity wrote:
>>
>> Your disk is bad.  Kill that instance and hope someone else gets it.
>>
>> On 04/06/2017 07:27 PM, Cogumelos Maravilha wrote:
>>>
>>> Interesting
>>>
>>> [  720.693768] blk_update_request: I/O error, dev nvme0n1,
>>> sector 1397303056 [  750.698840] blk_update_request: I/O error,
>>> dev nvme0n1, sector 1397303080 [ 1416.202103]
>>> blk_update_request: I/O error, dev nvme0n1, sector 1397303080
>>>
>>> On 04/06/2017 05:26 PM, Avi Kivity wrote:

 Is there anything in dmesg?

 On 04/06/2017 07:25 PM, Cogumelos Maravilha wrote:
>
> Now dies and restart (systemd) without logging why
>
> system.log
>
> INFO  [Native-Transport-Requests-2] 2017-04-06 16:06:55,362
> AuthCache.java:172 - (Re)initializing RolesCache (validity
> period /update interval/max entries) (2000/2000/1000) INFO 
> [main] 2017-04-06 16:17:42,535 YamlConfigurationLoader.java:89
> - Configuration location: file:/etc/cassandra/cassandra.yaml
>
> debug.log DEBUG [GossipStage:1] 2017-04-06 16:16:56,272
> FailureDetector.java:457 - Ignoring interval time of
> 2496703934 for /10.100.120.52  DEBUG
> [GossipStage:1] 2017-04-06 16:16:59,090
> FailureDetector.java:457 - Ignoring interval time of
> 2818071981 for /10.100.120.161  INFO 
> [main] 2017-04-06 16:17:42,535 YamlConfigurationLoader.java:89
> - Configuration location: file:/etc/cassandra/cassandra.yaml
> DEBUG [main] 2017-04-06 16:17:42,540
> YamlConfigurationLoader.java:108 - Loading settings from
> file:/etc/cassandra/cassandra.yaml
> On 04/06/2017 04:18 PM, Cogumelos Maravilha wrote:
>> find /mnt/cassandra/ \! -user cassandra
>> nothing
>>
>> I've found some "strange" solutions on Internet
>> chmod -R 2777 /tmp
>> chmod -R 2775 cassandra folder
>>
>> Let's give it some time and see the result
>> On 04/06/2017 03:14 PM, Michael Shuler wrote:
>>> All it takes is one frustrated `sudo cassandra` run. Checking only 
>>> the
>>> top level directory ownership is insufficient, since root could own
>>> files/dirs created below the top level. Find all files not owned by 
>>> user
>>> cassandra:  `find /mnt/cassandra/ \! -user cassandra`
>>>
>>> Just another thought.
>>>
>>> -- Michael
>>>
>>> On 04/06/2017 05:23 AM, Cogumelos Maravilha wrote:
 From cassandra.yaml:

 hints_directory: /mnt/cassandra/hints
 data_file_directories:
 - /mnt/cassandra/data
 commitlog_directory: /mnt/cassandra/commitlog
 saved_caches_directory: /mnt/cassandra/saved_caches

 drwxr-xr-x   3 cassandra cassandra   23 Apr  5 16:03 mnt/

 drwxr-xr-x 6 cassandra cassandra  68 Apr  5 16:17 ./
 drwxr-xr-x 3 cassandra cassandra  23 Apr  5 16:03 ../
 drwxr-xr-x 2 cassandra cassandra  80 Apr  6 10:07 commitlog/
 drwxr-xr-x 8 cassandra cassandra 124 Apr  5 16:17 data/
 drwxr-xr-x 2 cassandra 

Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

2017-04-07 Thread Jerry Lam
Hi Jan,

Thank you for the clarification and knowledge sharing.

A follow-up question is:

Does Cassandra need to read all sstables for customer_id = 1L if my query
is:

select view_id from customer_view where customer_id = 1L limit 1

Since date_day is the clustering key, sorted in descending order, I'm
assuming that the above query will return the latest view_id for
customer_id 1L.

Since I'm using TWCS, is Cassandra smart enough to query only the
latest sstable that matches the partition key (customer_id = 1L), or
does it have to go through the entire merge process?

Thank you,

Jerry


On Fri, Apr 7, 2017 at 2:08 AM,  wrote:

> Hi Jerry,
>
>
>
> the compaction strategy just tells Cassandra how to compact your sstables
> and with TWCS when to stop compacting further. But of course your data can
> and most likely will live in multiple sstables.
>
>
>
> The magic is that the coordinator node for your request will
> merge the data for you on the fly. It is an easy job, as your data per
> sstable is already sorted.
>
>
>
> But be careful of the worst case: if a customer_id is inserted
> every hour, you can end up reading many sstables, decreasing
> read performance, if the data is kept for a year or so.
>
>
>
> Jan
>
>
>
> Gesendet von meinem Windows 10 Phone
>
>
>
> *Von: *Jerry Lam 
> *Gesendet: *Freitag, 7. April 2017 00:30
> *An: *user@cassandra.apache.org
> *Betreff: *How does clustering key works with
> TimeWindowCompactionStrategy (TWCS)
>
>
>
> Hi guys,
>
>
>
> I'm a new and happy user of Cassandra. We are using Cassandra for time
> series data so we choose TWCS because of its predictability and its ease of
> configuration.
>
>
>
> My question is we have a table with the following schema:
>
>
>
> CREATE TABLE IF NOT EXISTS customer_view (
>
> customer_id bigint,
>
> date_day Timestamp,
>
> view_id bigint,
>
> PRIMARY KEY (customer_id, date_day)
>
> ) WITH CLUSTERING ORDER BY (date_day DESC)
>
>
>
> What I understand is that the data will be ordered by date_day within the
> partition using the clustering key. However, the same customer_id can be
> inserted to this partition several times during the day, and TWCS says
> it will only compact the sstables within the window interval set in the
> configuration (in our case 1 hour).
>
>
>
> How does Cassandra guarantee the clustering key order when the same
> customer_id appears in several sstables? Does it need to do a merge and
> then sort to find out the latest view_id for the customer_id? Or is there
> some magic happening behind the scenes?
>
>
>
> Best Regards,
>
>
>
> Jerry
>
>
>
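The coordinator-side merge discussed in this thread can be sketched in a few lines. This is a simplified illustration of the idea, not Cassandra's actual read path; the table data, names, and timestamps below are invented:

```python
import heapq

# Each "sstable" holds rows for one partition, already sorted by the
# clustering key (date_day) in descending order, as Cassandra stores them.
# A row here is (date_day, view_id, write_timestamp); all values are
# made up for illustration.
sstable_a = [(20170407, 101, 3), (20170406, 90, 1)]
sstable_b = [(20170407, 105, 5), (20170405, 80, 2)]

def merged_rows(*sstables):
    """k-way merge of per-sstable sorted runs, newest date_day first.

    Cells sharing a clustering key are reconciled last-write-wins:
    the cell with the highest write timestamp survives.
    """
    runs = heapq.merge(*sstables, key=lambda r: (-r[0], -r[2]))
    last_key = None
    for date_day, view_id, ts in runs:
        if date_day != last_key:  # first cell seen has the newest write time
            last_key = date_day
            yield (date_day, view_id)

# "SELECT view_id ... LIMIT 1" only needs the first merged row:
latest = next(merged_rows(sstable_a, sstable_b))
```

In practice Cassandra also keeps per-sstable metadata (such as min/max clustering values and timestamps) that lets it skip sstables that cannot contribute to a query, but that is an optimization layered on top of this merge, not a replacement for it.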


Re: Copy from CSV on OS X problem with varint values <= -2^63

2017-04-07 Thread Brice Dutheil
@Boris, what formula did you use on homebrew and what is the git version of
this formula ?

Anyway the current cassandra formula is here :
https://github.com/Homebrew/homebrew-core/blob/master/Formula/cassandra.rb

I am not a Homebrew developer; the formula does a lot of fancy stuff, yet I
see a resource for the Python driver that seems to be 3.8.0.
At first sight nothing on driver 3.10. Could it be a bad (brew) bottle?

— Brice

On Fri, Apr 7, 2017 at 1:00 AM, Boris Babic  wrote:

Stefania
>
> Downloading and simply running from the folder, without homebrew
> interference, it now looks like the driver matches what you say in the last
> email. I will try writing varints again to confirm it works.
>
> cqlsh --debug
> Using CQL driver:  apache-cassandra-3.10/bin/../lib/cassandra-driver-internal-
> only-3.7.0.post0-2481531.zip/cassandra-driver-3.7.0.post0-
> 2481531/cassandra/__init__.py'>
> Using connect timeout: 5 seconds
> Using 'utf-8' encoding
> Using ssl: False
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
>
>
>
> On Apr 6, 2017, at 11:58 AM, Stefania Alborghetti  datastax.com> wrote:
>
> It doesn't look like the embedded driver, it should come from a zip file
> labeled with version 3.7.0.post0-2481531 for cassandra 3.10:
>
> Using CQL driver:  cassandra/bin/../lib/cassandra-driver-internal-
> only-3.7.0.post0-2481531.zip/cassandra-driver-3.7.0.post0-
> 2481531/cassandra/__init__.py'>
>
> Sorry, I should have posted this example in my previous email, rather than
> an example based on the non-embedded driver.
>
> I don't know who to contact regarding homebrew installation, but you
> could download the Cassandra package, unzip it, and run cqlsh and Cassandra
> from that directory?
>
>
> On Thu, Apr 6, 2017 at 4:59 AM, Boris Babic  wrote:
> Stefania
>
> This is the output of my --debug, I never touched CQLSH_NO_BUNDLED and did
> not know about it.
> As you can see I have used homebrew to install Cassandra and looks like
> its the embedded version as it sits under the Cassandra folder ?
>
> cqlsh --debug
> Using CQL driver:  3.10_1/libexec/vendor/lib/python2.7/site-packages/cassandra/__init__.pyc'>
> Using connect timeout: 5 seconds
> Using 'utf-8' encoding
> Using ssl: False
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
>
>
> On Apr 5, 2017, at 12:07 PM, Stefania Alborghetti  datastax.com> wrote:
>
> You are welcome.
>
> I traced the problem to a commit of the Python driver that shipped in
> version 3.8 of the driver. It is fixed in 3.8.1. More details
> on CASSANDRA-13408. I don't think it's related to the OS.
>
> Since Cassandra 3.10 ships with an older version of the driver embedded in
> a zip file in the lib folder, and this version is not affected,
> I'm guessing that either the embedded version does not work on OS X, or you
> are manually using a different version of the driver by
> setting CQLSH_NO_BUNDLED (which is why I could reproduce it on my laptop).
>
> You can run cqlsh with --debug to see the version of the driver that cqlsh
> is using, for example:
>
> cqlsh --debug
> Using CQL driver:  dist-packages/cassandra_driver-3.8.1-py2.7-linux-x86_
> 64.egg/cassandra/__init__.pyc'>
>
> Can you confirm if you were overriding the Python driver by setting
> CQLSH_NO_BUNDLED and the version of the driver?
>
>
>
> On Tue, Apr 4, 2017 at 6:12 PM, Boris Babic  wrote:
> Thanks Stefania, going from memory I don't think I noticed this on Windows,
> but haven't got a machine handy to test it on at the moment.
>
> On Apr 4, 2017, at 19:44, Stefania Alborghetti  datastax.com> wrote:
>
> I've reproduced the same problem on Linux, and I've opened
> CASSANDRA-13408. As a workaround, disable prepared statements and it
> will work (WITH HEADER = TRUE AND PREPAREDSTATEMENTS = False).
>
> On Tue, Apr 4, 2017 at 5:02 PM, Boris Babic  wrote:
>
> On Apr 4, 2017, at 7:00 PM, Boris Babic  wrote:
>
> Hi
>
> I’m testing the write of various datatypes on OS X for fun, running
> cassandra 3.10 on a single laptop instance, and from what I can see
> varint should map to java.math.BigInteger and have no problems with
> Long.MIN_VALUE, -9223372036854775808, but I can’t see what I’m doing wrong.
>
> cqlsh: 5.0.1
> cassandra 3.10
> osx el capitan.
>
> data.csv:
>
> id,varint
> -2147483648,-9223372036854775808
> 2147483647,9223372036854775807
>
> COPY mykeyspace.data (id,varint) FROM 'data.csv' WITH HEADER=true;
>
>   Failed to make batch statement: Received an argument of invalid type
> for column "varint". Expected: ,
> Got: ; (descriptor 'bit_length' requires a 'int' object 
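The boundary involved can be seen without Cassandra at all: Python ints are arbitrary-precision, like Cassandra's varint, while the buggy driver path effectively forced values through a fixed 64-bit codec. A minimal sketch of that boundary (the struct-based check is an illustration, not the driver's actual code):

```python
import struct

INT64_MIN = -2**63        # -9223372036854775808, the value from data.csv
INT64_MAX = 2**63 - 1     #  9223372036854775807

def fits_int64(n):
    """True if n can be packed as a signed 64-bit big-endian integer."""
    try:
        struct.pack('>q', n)
        return True
    except struct.error:
        return False

# varint is arbitrary-precision (java.math.BigInteger on the server),
# so values outside this range are still legal for the column type:
for n in (INT64_MIN, INT64_MIN - 1, INT64_MAX, INT64_MAX + 1):
    print(n, fits_int64(n))
```

This is why the `PREPAREDSTATEMENTS = False` workaround mentioned earlier in the thread sidesteps the problem: the value is then sent as CQL text and parsed server-side as a BigInteger instead of being bound through the affected codec path.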

Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Roland Otta
good point!

on the source side i can see the following error

ERROR [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 
StreamSession.java:529 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] 
Streaming error occurred on session with peer 10.192.116.1 through 192.168.0.114
org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe
at 
org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:145) 
~[apache-cassandra-3.7.jar:3.7]
at 
org.apache.cassandra.streaming.compress.CompressedStreamWriter.lambda$write$0(CompressedStreamWriter.java:90)
 ~[apache-cassandra-3.7.jar:3.7]
at 
org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.applyToChannel(BufferedDataOutputStreamPlus.java:350)
 ~[apache-cassandra-3.7.jar:3.7]
at 
org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:90)
 ~[apache-cassandra-3.7.jar:3.7]
at 
org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:91)
 ~[apache-cassandra-3.7.jar:3.7]
at 
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48)
 ~[apache-cassandra-3.7.jar:3.7]
at 
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40)
 ~[apache-cassandra-3.7.jar:3.7]
at 
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:48)
 ~[apache-cassandra-3.7.jar:3.7]
at 
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:370)
 ~[apache-cassandra-3.7.jar:3.7]
at 
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342)
 ~[apache-cassandra-3.7.jar:3.7]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_77]
at 
sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428) 
~[na:1.8.0_77]
at 
sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493) 
~[na:1.8.0_77]
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608) 
~[na:1.8.0_77]
at 
org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:141) 
~[apache-cassandra-3.7.jar:3.7]
... 10 common frames omitted
DEBUG [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 
ConnectionHandler.java:110 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] 
Closing stream connection handler on /10.192.116.1
INFO  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 
StreamResultFuture.java:187 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] 
Session with /10.192.116.1 is complete
WARN  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 
StreamResultFuture.java:214 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] 
Stream failed


the dataset is approx 300GB / Node.

does that mean that cassandra does not try to reconnect (for streaming) in case 
of short network dropouts?

On Fri, 2017-04-07 at 08:53 -0400, Jacob Shadix wrote:
Did you look at the logs on the source DC as well? How big is the dataset?

-- Jacob Shadix

On Fri, Apr 7, 2017 at 7:16 AM, Roland Otta 
> wrote:
Hi!

we are on 3.7.

we have some debug messages ... but i guess they are not related to that issue
DEBUG [GossipStage:1] 2017-04-07 13:11:00,440 FailureDetector.java:456 - 
Ignoring interval time of 2002469610 for /192.168.0.27
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - 
Ignoring interval time of 2598593732 for /10.192.116.4
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - 
Ignoring interval time of 2002612298 for /10.192.116.5
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - 
Ignoring interval time of 2002660534 for /10.192.116.9
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - 
Ignoring interval time of 2027212880 for /10.192.116.3
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - 
Ignoring interval time of 2027279042 for /192.168.0.188
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - 
Ignoring interval time of 2027313992 for /10.192.116.10

beside that the debug.log is clean

all the mentioned cassandra.yaml parameters are the shipped defaults 
(streaming_socket_timeout_in_ms does not exist at all in my cassandra.yaml)
i also checked the pending compactions. there are no pending compactions at the 
moment.

bg - roland otta

On Fri, 2017-04-07 at 06:47 -0400, Jacob Shadix wrote:
What version are you running? Do you see any errors in the system.log 
(SocketTimeout, for instance)?

And 

Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Jacob Shadix
Did you look at the logs on the source DC as well? How big is the dataset?

-- Jacob Shadix

On Fri, Apr 7, 2017 at 7:16 AM, Roland Otta 
wrote:

> Hi!
>
> we are on 3.7.
>
> we have some debug messages ... but i guess they are not related to that
> issue
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,440 FailureDetector.java:456 -
> Ignoring interval time of 2002469610 for /192.168.0.27
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 -
> Ignoring interval time of 2598593732 for /10.192.116.4
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 -
> Ignoring interval time of 2002612298 for /10.192.116.5
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 -
> Ignoring interval time of 2002660534 for /10.192.116.9
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 -
> Ignoring interval time of 2027212880 for /10.192.116.3
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 -
> Ignoring interval time of 2027279042 for /192.168.0.188
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 -
> Ignoring interval time of 2027313992 for /10.192.116.10
>
> beside that the debug.log is clean
>
> all the mentioned cassandra.yaml parameters are the shipped defaults (
> streaming_socket_timeout_in_ms does not exist at all in my cassandra.yaml)
> i also checked the pending compactions. there are no pending compactions
> at the moment.
>
> bg - roland otta
>
> On Fri, 2017-04-07 at 06:47 -0400, Jacob Shadix wrote:
>
> What version are you running? Do you see any errors in the system.log
> (SocketTimeout, for instance)?
>
> And what values do you have for the following in cassandra.yaml:
> - stream_throughput_outbound_megabits_per_sec
> - compaction_throughput_mb_per_sec
> - streaming_socket_timeout_in_ms
>
> -- Jacob Shadix
>
> On Fri, Apr 7, 2017 at 6:00 AM, Roland Otta 
> wrote:
>
> hi,
>
we are trying to set up a new datacenter and are initializing the data
with nodetool rebuild.
>
> after some hours it seems that the node stopped streaming (at least
> there is no more streaming traffic on the network interface).
>
> nodetool netstats shows that the streaming is still in progress
>
> Mode: NORMAL
> Bootstrap 6918dc90-1ad6-11e7-9f16-51230e2be4e9
> Rebuild 41606030-1ad9-11e7-9f16-51230e2be4e9
> /192.168.0.26
> Receiving 257 files, 145444246572 bytes total. Already received
> 1 files, 1744027 bytes total
> bds/adcounter_total 76456/47310255 bytes(0%) received from
> idx:0/192.168.0.26
> bds/upselling_event 1667571/1667571 bytes(100%) received
> from idx:0/192.168.0.26
> /192.168.0.188
> /192.168.0.27
> Receiving 169 files, 79355302464 bytes total. Already received
> 1 files, 81585975 bytes total
> bds/ad_event_history 81585975/81585975 bytes(100%) received
> from idx:0/192.168.0.27
> /192.168.0.189
> Receiving 140 files, 19673034809 bytes total. Already received
> 1 files, 5996604 bytes total
> bds/adcounter_per_day 5956840/42259846 bytes(14%) received
> from idx:0/192.168.0.189
> bds/user_event 39764/39764 bytes(100%) received from
> idx:0/192.168.0.189
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         2              3         0
Small messages                  n/a         0       68632465         0
Gossip messages                 n/a         0         217661         0
>
>
>
> it is in that state for approx 15 hours now
>
> does it make sense waiting for the streaming to finish or do i have to
> restart the node, discard data and restart the rebuild?
>
>
>


Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Roland Otta
Hi!

we are on 3.7.

we have some debug messages ... but i guess they are not related to that issue
DEBUG [GossipStage:1] 2017-04-07 13:11:00,440 FailureDetector.java:456 - 
Ignoring interval time of 2002469610 for /192.168.0.27
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - 
Ignoring interval time of 2598593732 for /10.192.116.4
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - 
Ignoring interval time of 2002612298 for /10.192.116.5
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - 
Ignoring interval time of 2002660534 for /10.192.116.9
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - 
Ignoring interval time of 2027212880 for /10.192.116.3
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - 
Ignoring interval time of 2027279042 for /192.168.0.188
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - 
Ignoring interval time of 2027313992 for /10.192.116.10

beside that the debug.log is clean

all the mentioned cassandra.yaml parameters are the shipped defaults 
(streaming_socket_timeout_in_ms does not exist at all in my cassandra.yaml)
i also checked the pending compactions. there are no pending compactions at the 
moment.

bg - roland otta

On Fri, 2017-04-07 at 06:47 -0400, Jacob Shadix wrote:
What version are you running? Do you see any errors in the system.log 
(SocketTimeout, for instance)?

And what values do you have for the following in cassandra.yaml:
- stream_throughput_outbound_megabits_per_sec
- compaction_throughput_mb_per_sec
- streaming_socket_timeout_in_ms

-- Jacob Shadix

On Fri, Apr 7, 2017 at 6:00 AM, Roland Otta 
> wrote:
hi,

we are trying to set up a new datacenter and are initializing the data
with nodetool rebuild.

after some hours it seems that the node stopped streaming (at least
there is no more streaming traffic on the network interface).

nodetool netstats shows that the streaming is still in progress

Mode: NORMAL
Bootstrap 6918dc90-1ad6-11e7-9f16-51230e2be4e9
Rebuild 41606030-1ad9-11e7-9f16-51230e2be4e9
/192.168.0.26
Receiving 257 files, 145444246572 bytes total. Already received
1 files, 1744027 bytes total
bds/adcounter_total 76456/47310255 bytes(0%) received from
idx:0/192.168.0.26
bds/upselling_event 1667571/1667571 bytes(100%) received
from idx:0/192.168.0.26
/192.168.0.188
/192.168.0.27
Receiving 169 files, 79355302464 bytes total. Already received
1 files, 81585975 bytes total
bds/ad_event_history 81585975/81585975 bytes(100%) received
from idx:0/192.168.0.27
/192.168.0.189
Receiving 140 files, 19673034809 bytes total. Already received
1 files, 5996604 bytes total
bds/adcounter_per_day 5956840/42259846 bytes(14%) received
from idx:0/192.168.0.189
bds/user_event 39764/39764 bytes(100%) received from
idx:0/192.168.0.189
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         2              3         0
Small messages                  n/a         0       68632465         0
Gossip messages                 n/a         0         217661         0



it is in that state for approx 15 hours now

does it make sense waiting for the streaming to finish or do i have to
restart the node, discard data and restart the rebuild?




Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Jacob Shadix
What version are you running? Do you see any errors in the system.log
(SocketTimeout, for instance)?

And what values do you have for the following in cassandra.yaml:
> - stream_throughput_outbound_megabits_per_sec
> - compaction_throughput_mb_per_sec
> - streaming_socket_timeout_in_ms

-- Jacob Shadix

On Fri, Apr 7, 2017 at 6:00 AM, Roland Otta 
wrote:

> hi,
>
> we are trying to set up a new datacenter and are initializing the data
> with nodetool rebuild.
>
> after some hours it seems that the node stopped streaming (at least
> there is no more streaming traffic on the network interface).
>
> nodetool netstats shows that the streaming is still in progress
>
> Mode: NORMAL
> Bootstrap 6918dc90-1ad6-11e7-9f16-51230e2be4e9
> Rebuild 41606030-1ad9-11e7-9f16-51230e2be4e9
> /192.168.0.26
> Receiving 257 files, 145444246572 bytes total. Already received
> 1 files, 1744027 bytes total
> bds/adcounter_total 76456/47310255 bytes(0%) received from
> idx:0/192.168.0.26
> bds/upselling_event 1667571/1667571 bytes(100%) received
> from idx:0/192.168.0.26
> /192.168.0.188
> /192.168.0.27
> Receiving 169 files, 79355302464 bytes total. Already received
> 1 files, 81585975 bytes total
> bds/ad_event_history 81585975/81585975 bytes(100%) received
> from idx:0/192.168.0.27
> /192.168.0.189
> Receiving 140 files, 19673034809 bytes total. Already received
> 1 files, 5996604 bytes total
> bds/adcounter_per_day 5956840/42259846 bytes(14%) received
> from idx:0/192.168.0.189
> bds/user_event 39764/39764 bytes(100%) received from
> idx:0/192.168.0.189
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         2              3         0
> Small messages                  n/a         0       68632465         0
> Gossip messages                 n/a         0         217661         0
>
>
>
> it is in that state for approx 15 hours now
>
> does it make sense waiting for the streaming to finish or do i have to
> restart the node, discard data and restart the rebuild?
>


cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Roland Otta
hi,

we are trying to set up a new datacenter and are initializing the data
with nodetool rebuild.

after some hours it seems that the node stopped streaming (at least
there is no more streaming traffic on the network interface).

nodetool netstats shows that the streaming is still in progress

Mode: NORMAL
Bootstrap 6918dc90-1ad6-11e7-9f16-51230e2be4e9
Rebuild 41606030-1ad9-11e7-9f16-51230e2be4e9
/192.168.0.26
Receiving 257 files, 145444246572 bytes total. Already received
1 files, 1744027 bytes total
bds/adcounter_total 76456/47310255 bytes(0%) received from
idx:0/192.168.0.26
bds/upselling_event 1667571/1667571 bytes(100%) received
from idx:0/192.168.0.26
/192.168.0.188
/192.168.0.27
Receiving 169 files, 79355302464 bytes total. Already received
1 files, 81585975 bytes total
bds/ad_event_history 81585975/81585975 bytes(100%) received
from idx:0/192.168.0.27
/192.168.0.189
Receiving 140 files, 19673034809 bytes total. Already received
1 files, 5996604 bytes total
bds/adcounter_per_day 5956840/42259846 bytes(14%) received
from idx:0/192.168.0.189
bds/user_event 39764/39764 bytes(100%) received from
idx:0/192.168.0.189
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         2              3         0
Small messages                  n/a         0       68632465         0
Gossip messages                 n/a         0         217661         0



it is in that state for approx 15 hours now

does it make sense waiting for the streaming to finish or do i have to
restart the node, discard data and restart the rebuild?
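When a rebuild looks stalled, it can help to snapshot `nodetool netstats` periodically and compare the byte counters rather than eyeballing the output. A small sketch that totals the per-file progress lines (the line format is assumed from the paste above, not from any documented schema):

```python
import re

# Matches per-file progress lines like:
# "bds/adcounter_total 76456/47310255 bytes(0%) received from idx:0/192.168.0.26"
LINE = re.compile(r'(\S+) (\d+)/(\d+) bytes\((\d+)%\)')

def stream_progress(netstats_text):
    """Return (received_bytes, total_bytes, percent) across all listed files."""
    received = total = 0
    for m in LINE.finditer(netstats_text):
        received += int(m.group(2))
        total += int(m.group(3))
    pct = 100.0 * received / total if total else 0.0
    return received, total, pct

sample = """
bds/adcounter_total 76456/47310255 bytes(0%) received from idx:0/192.168.0.26
bds/upselling_event 1667571/1667571 bytes(100%) received from idx:0/192.168.0.26
"""
print(stream_progress(sample))
```

Running this on two snapshots taken a few minutes apart shows whether bytes are actually still moving; if the totals are frozen, the session is likely dead and the rebuild will have to be restarted.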


Migrating from Datastax Distribution to Apache Cassandra

2017-04-07 Thread Eren Yilmaz
Hi,

We have a Cassandra 3.7 installation on Ubuntu, from the Datastax distribution
(using the repo). Since Datastax has announced that they will no longer support
a community Cassandra distribution, I want to migrate to the Apache
distribution. Are there any differences between the distributions? Can I use
the upgrade procedure described in
https://docs.datastax.com/en/latest-upgrade/upgrade/cassandra/upgrdCassandraDetails.html?

Thanks,
Eren
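The open-source parts of the two distributions are built from the same code, so the migration is essentially a package-repository switch rather than a code upgrade. A sketch of the repo change on Ubuntu (the repo URL and the `37x` series name are assumptions based on the 3.7-era Apache Debian repo; check the Apache Cassandra download page for the series matching your version, and back up cassandra.yaml plus your data directories first):

```
# /etc/apt/sources.list.d/cassandra.sources.list
# Replace the DataStax community repo line with the Apache one, e.g.:
deb http://www.apache.org/dist/cassandra/debian 37x main
```

After importing the Apache repo keys and running apt-get update, installing the `cassandra` package over the existing one should keep your configuration and data directories in place, but treat it like any upgrade: one node at a time, with `nodetool drain` before stopping each node.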


Re: The changing clustering key

2017-04-07 Thread Monmohan Singh
*"your primary goal is to fetch a user by dept_id and user_id and
additionally keep versions of the user data?"*
My primary goal was to just fetch users for a dept, sorted by modified
date. Now Cassandra's limitation that mod_date can't be a clustering
key if it can be updated forces me to keep all versions. I was looking for
any standard design practices around this, because it seems like a very
common use case to me.

Thanks for the link, I do understand the purpose of different keys :)
Regards
Monmohan


On Fri, 7 Apr 2017 at 13:57  wrote:

> Hi,
>
>
>
> your primary goal is to fetch a user by dept_id and user_id and
> additionally keep versions of the user data?
>
>
>
> {
>
>dept_id text,
>
>user_id text,
>
>mod_date timestamp,
>
>user_name text,
>
>PRIMARY KEY ((dept_id,user_id), mod_date)
>
>WITH CLUSTERING ORDER BY (mod_date DESC);
>
> }
>
>
>
> There is a difference between partition key and clustering keys. My suggestion
> will end up with all versions of a particular (dept_id, user_id) on one
> partition (say node), with all versions of your data in that partition in
> descending order by mod_date.
>
>
>
> For a normal lookup you do not need to know mod_date; a simple SELECT *
> FROM users WHERE dept_id=foo AND user_id=bar LIMIT 1 will do.
>
>
>
> http://datascale.io/cassandra-partitioning-and-clustering-keys-explained/
>
>
>
>
>
>
>
> Gesendet von meinem Windows 10 Phone
>
>
>
> *Von: *Monmohan Singh 
> *Gesendet: *Donnerstag, 6. April 2017 13:54
> *An: *user@cassandra.apache.org
> *Betreff: *The changing clustering key
>
>
>
> Dear Cassandra experts,
>
> I have a data modeling question for cases where data needs to be sorted by
> keys which can be modified.
>
> So , say we have a user table
>
> {
>
>dept_id text,
>
>user_id text,
>
>user_name text,
>
>mod_date timestamp
>
>PRIMARY KEY (dept_id,user_id)
>
> }
>
> Now I can query cassandra to get all users by a dept_id
>
> What if I wanted to query to get all users in a dept, sorted by mod_date.
>
> So, one way would be to
>
> {
>
>dept_id text,
>
>user_id text,
>
>mod_date timestamp,
>
>user_name text,
>
>PRIMARY KEY (dept_id,user_id, mod_date)
>
> }
>
> But, mod_date changes every time user name is updated. So it can't be part
> of clustering key.
>
>
>
> Attempt 1:  Don't update the row but instead create new record for every
> update. So, say the record for user foo is like below
>
> {'dept_id1','user_id1',TimeStamp1','foo'} and then the name was changed to
> 'bar' and then to 'baz' . In that case we add another row to table, so the
> table data would look like
>
>
>
> {'dept_id1','user_id1',TimeStamp3','baz'}
>
> {'dept_id1','user_id1',TimeStamp2','bar'}
>
> {'dept_id1','user_id1',TimeStamp1','foo'}
>
>
>
> Now we can get all users in a dept, sorted by mod_date but it presents a
> different problem. The data returned is duplicated.
>
>
>
> Attempt 2 : Add another column to identify the head record much like a
> linked list
>
> {
>
>dept_id text,
>
>user_id text,
>
>mod_date timestamp,
>
>user_name text,
>
>next_record text
>
>PRIMARY KEY (dept_id, user_id, mod_date)
>
> }
>
> Every time an update happens it adds a row and also adds the PK of new
> record except in the latest record.
>
>
>
> {'dept_id1','user_id1',TimeStamp3','baz','HEAD'}
>
> {'dept_id1','user_id1',TimeStamp2','bar','dept_id1#user_id1#TimeStamp3'}
>
> {'dept_id1','user_id1',TimeStamp1','foo','dept_id1#user_id1#TimeStamp2'}
>
> and also add a secondary index to 'next_record' column.
>
>
>
> Now I can support get all users in a dept, sorted by mod_date by
>
> SELECT * from USERS where dept_id=':dept' AND next_record='HEAD' order by
> mod_date.
>
>
>
> But it looks like a fairly involved solution, and perhaps I am missing
> something, a simpler solution ...
>
>
>
> The other option is delete and insert, but for high-frequency changes I
> think Cassandra has issues with tombstones.
>
>
>
> Thanks for helping on this.
>
> Regards
>
> Monmohan
>
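The schema suggested in the reply above, written out as complete CQL (the table name is illustrative): partitioning on (dept_id, user_id) keeps every version of one user in a single partition, newest first, so the latest-version lookup needs no mod_date at all.

```cql
-- Sketch of the suggested versioned-user table; name is illustrative.
CREATE TABLE users_by_dept (
    dept_id   text,
    user_id   text,
    mod_date  timestamp,
    user_name text,
    PRIMARY KEY ((dept_id, user_id), mod_date)
) WITH CLUSTERING ORDER BY (mod_date DESC);

-- Latest version of one user: simply the first row in the partition.
SELECT user_name, mod_date
  FROM users_by_dept
 WHERE dept_id = 'dept1' AND user_id = 'u1'
 LIMIT 1;
```

Note that this serves the per-user lookup; listing all users of a department sorted by mod_date still needs a separate table or index, since user_id is part of the partition key here.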


AW: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

2017-04-07 Thread j.kesten
Hi Jerry,

the compaction strategy just tells Cassandra how to compact your sstables and 
with TWCS when to stop compacting further. But of course your data can and most 
likely will live in multiple sstables. 

The magic is that the coordinator node for your request will merge 
the data for you on the fly. It is an easy job, as your data per sstable is 
already sorted.

But be careful of the worst case: if a customer_id is inserted every hour, 
you can end up reading many sstables, decreasing read performance, if the 
data is kept for a year or so.

Jan

Gesendet von meinem Windows 10 Phone

Von: Jerry Lam
Gesendet: Freitag, 7. April 2017 00:30
An: user@cassandra.apache.org
Betreff: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

Hi guys,

I'm a new and happy user of Cassandra. We are using Cassandra for time series 
data so we choose TWCS because of its predictability and its ease of 
configuration.

My question is we have a table with the following schema:

CREATE TABLE IF NOT EXISTS customer_view (
customer_id bigint,
date_day Timestamp,
view_id bigint,
PRIMARY KEY (customer_id, date_day)
) WITH CLUSTERING ORDER BY (date_day DESC)

What I understand is that the data will be ordered by date_day within the 
partition using the clustering key. However, the same customer_id can be 
inserted to this partition several times during the day, and TWCS says it 
will only compact the sstables within the window interval set in the 
configuration (in our case 1 hour). 

How does Cassandra guarantee the clustering key order when the same customer_id 
appears in several sstables? Does it need to do a merge and then sort to find 
out the latest view_id for the customer_id? Or is there some magic happening 
behind the scenes?

Best Regards,

Jerry