Re: large system hint partition

2016-09-18 Thread Ezra Stuetzel
Yeah I tried that, but oddly the table had nothing in it.

I changed the compaction strategy from leveled to size-tiered and ran a major
compaction on each node. I haven't seen the message logged on any node in a
few days, which makes me think that fixed it, since it was normally logged
multiple times per day.
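
For anyone who hits the same warning later, a minimal sketch of that change
(assuming your build allows ALTER TABLE on the system keyspace, which some
releases block, so verify on 2.2.7 first; the nodetool call is standard):

    -- cqlsh: switch system.hints from leveled to size-tiered compaction
    ALTER TABLE system.hints
      WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

    # then, on each node, force a major compaction of the hints table
    nodetool compact system hints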

On Sun, Sep 18, 2016 at 4:29 AM, Carlos Alonso  wrote:

> By inspecting the contents of your system.hints table, specifically the
> host_id column, you can see the destination host of those hints and check
> whether it is one of the alive or dead ones.
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 18 September 2016 at 04:35, Ezra Stuetzel 
> wrote:
>
>> Hey Nicolas,
>>
>> There are no dead nodes. 'nodetool status' and 'nodetool describecluster'
>> both show 4 healthy nodes. In the past we eliminated some nodes using
>> 'nodetool assassinate'. However, I checked the system.peers table on all 4
>> of our nodes and they each show 3 peers, as expected. So it doesn't appear
>> that any node has awareness of an unreachable node that could be causing
>> hints to back up. Any ideas for further troubleshooting what the hints
>> are?
>>
>> Thanks,
>> Ezra
>>
>> On Fri, Sep 16, 2016 at 4:13 PM, Nicolas Douillet <
>> nicolas.douil...@gmail.com> wrote:
>>
>>> Hi Ezra,
>>>
>>> Do you have a dead node in your cluster?
>>> The coordinator stores a hint in its local system.hints table when a
>>> replica node is dead or doesn't respond to a write request.
>>>
>>> --
>>> Nicolas
>>>
>>>
>>>
>>> On Sat, Sep 17, 2016 at 00:12, Ezra Stuetzel 
>>> wrote:
>>>
 What would be the likely causes of large system.hints partitions?
 Normally, large partition warnings are for user-defined tables that users
 are writing large partitions to. In this case, it appears C* is writing
 large partitions to the system.hints table. Gossip is not backed up.

 version: C* 2.2.7

 WARN  [MemtableFlushWriter:134] 2016-09-16 04:27:39,220
 BigTableWriter.java:184 - Writing large partition
 system/hints:7ce838aa-f30f-494a-8caa-d44d1440e48b (128181097 bytes)


 Thanks,

 Ezra

>>>
>>
>


Re: Having secondary indices limited to analytics dc

2016-09-18 Thread Bhuvan Rawal
Created CASSANDRA-12663; please feel free to make edits. From a bird's-eye
view it seems a bit inefficient to keep doing computations and generating
data which may never be put to use. (A user may never read via secondary
indices on the primary transactional DC, but he/she is currently forced to
create them on every DC in the cluster.)

On Mon, Sep 19, 2016 at 1:05 AM, Jonathan Haddad  wrote:

> I don't see why having per-DC indexes would be an issue from a technical
> standpoint. I suggest putting in a JIRA for it; it's a good idea (if one
> doesn't exist already). Post back to the ML with the issue #.
>
> On Sun, Sep 18, 2016 at 12:26 PM Bhuvan Rawal  wrote:
>
>> Could this be possible with the change log (CDC) feature implemented in
>> CASSANDRA-8844? I.e. have two clusters (with different schema definitions
>> for secondary indices) and segregate the analytics workload onto the other
>> cluster, with a CDC log shipper enabled on the parent DC, which takes care
>> of the transactional workload?
>>
>> On Sun, Sep 18, 2016 at 9:30 PM, Dorian Hoxha 
>> wrote:
>>
>>> The only way I know of is Elassandra.
>>> You spin up nodes in dc1 as Elassandra (having data + indexes) and in dc2
>>> as Cassandra (having only data).
>>>
>>> On Sun, Sep 18, 2016 at 5:43 PM, Bhuvan Rawal 
>>> wrote:
>>>
 Hi,

 Is it possible to have secondary indices (SASI or native ones) defined
 on a table restricted to a particular DC? For instance, it is very much
 possible in MySQL to have a parent server on which writes are done
 without any indices (other than the required ones) and to have indices
 on the replica DBs; this keeps the parent database lightweight and free
 from building a secondary index on every write.

 For analytics and auditing purposes it is essential to serve access
 patterns different from those modeled around a partition-key fetch. Only
 a limited number of reads are needed by users, but if an index is
 enabled cluster-wide it requires an index write for every row written to
 that table, on every single node in every DC, even one that may only be
 serving read operations.

 What could be the potential means of solving this problem inside
 Cassandra (without having to ship the data off into Elasticsearch,
 etc.)?

 Best Regards,
 Bhuvan

>>>
>>>
>>


Re: Having secondary indices limited to analytics dc

2016-09-18 Thread Jonathan Haddad
I don't see why having per-DC indexes would be an issue from a technical
standpoint. I suggest putting in a JIRA for it; it's a good idea (if one
doesn't exist already). Post back to the ML with the issue #.

On Sun, Sep 18, 2016 at 12:26 PM Bhuvan Rawal  wrote:

> Could this be possible with the change log (CDC) feature implemented in
> CASSANDRA-8844? I.e. have two clusters (with different schema definitions
> for secondary indices) and segregate the analytics workload onto the other
> cluster, with a CDC log shipper enabled on the parent DC, which takes care
> of the transactional workload?
>
> On Sun, Sep 18, 2016 at 9:30 PM, Dorian Hoxha 
> wrote:
>
>> The only way I know of is Elassandra.
>> You spin up nodes in dc1 as Elassandra (having data + indexes) and in dc2
>> as Cassandra (having only data).
>>
>> On Sun, Sep 18, 2016 at 5:43 PM, Bhuvan Rawal 
>> wrote:
>>
>>> Hi,
>>>
>>> Is it possible to have secondary indices (SASI or native ones) defined
>>> on a table restricted to a particular DC? For instance, it is very much
>>> possible in MySQL to have a parent server on which writes are done
>>> without any indices (other than the required ones) and to have indices
>>> on the replica DBs; this keeps the parent database lightweight and free
>>> from building a secondary index on every write.
>>>
>>> For analytics and auditing purposes it is essential to serve access
>>> patterns different from those modeled around a partition-key fetch. Only
>>> a limited number of reads are needed by users, but if an index is
>>> enabled cluster-wide it requires an index write for every row written to
>>> that table, on every single node in every DC, even one that may only be
>>> serving read operations.
>>>
>>> What could be the potential means of solving this problem inside
>>> Cassandra (without having to ship the data off into Elasticsearch,
>>> etc.)?
>>>
>>> Best Regards,
>>> Bhuvan
>>>
>>
>>
>


Re: Having secondary indices limited to analytics dc

2016-09-18 Thread Bhuvan Rawal
Could this be possible with the change log (CDC) feature implemented in
CASSANDRA-8844? I.e. have two clusters (with different schema definitions
for secondary indices) and segregate the analytics workload onto the other
cluster, with a CDC log shipper enabled on the parent DC, which takes care
of the transactional workload?
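
For concreteness, a hedged sketch of what that split could look like (table
and index names are illustrative; the cdc table property only exists once
CASSANDRA-8844 ships, from 3.8 onwards):

    -- transactional cluster: no secondary index, CDC enabled for shipping
    CREATE TABLE ks.events (id uuid PRIMARY KEY, kind text, payload text)
        WITH cdc = true;

    -- analytics cluster, fed by the CDC log shipper: same table plus index
    CREATE TABLE ks.events (id uuid PRIMARY KEY, kind text, payload text);
    CREATE INDEX events_kind_idx ON ks.events (kind);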

On Sun, Sep 18, 2016 at 9:30 PM, Dorian Hoxha 
wrote:

> The only way I know of is Elassandra.
> You spin up nodes in dc1 as Elassandra (having data + indexes) and in dc2
> as Cassandra (having only data).
>
> On Sun, Sep 18, 2016 at 5:43 PM, Bhuvan Rawal  wrote:
>
>> Hi,
>>
>> Is it possible to have secondary indices (SASI or native ones) defined on
>> a table restricted to a particular DC? For instance, it is very much
>> possible in MySQL to have a parent server on which writes are done
>> without any indices (other than the required ones) and to have indices on
>> the replica DBs; this keeps the parent database lightweight and free from
>> building a secondary index on every write.
>>
>> For analytics and auditing purposes it is essential to serve access
>> patterns different from those modeled around a partition-key fetch. Only
>> a limited number of reads are needed by users, but if an index is enabled
>> cluster-wide it requires an index write for every row written to that
>> table, on every single node in every DC, even one that may only be
>> serving read operations.
>>
>> What could be the potential means of solving this problem inside
>> Cassandra (without having to ship the data off into Elasticsearch, etc.)?
>>
>> Best Regards,
>> Bhuvan
>>
>
>


Re: Having secondary indices limited to analytics dc

2016-09-18 Thread Dorian Hoxha
The only way I know of is Elassandra.
You spin up nodes in dc1 as Elassandra (having data + indexes) and in dc2 as
Cassandra (having only data).

On Sun, Sep 18, 2016 at 5:43 PM, Bhuvan Rawal  wrote:

> Hi,
>
> Is it possible to have secondary indices (SASI or native ones) defined on
> a table restricted to a particular DC? For instance, it is very much
> possible in MySQL to have a parent server on which writes are done without
> any indices (other than the required ones) and to have indices on the
> replica DBs; this keeps the parent database lightweight and free from
> building a secondary index on every write.
>
> For analytics and auditing purposes it is essential to serve access
> patterns different from those modeled around a partition-key fetch. Only a
> limited number of reads are needed by users, but if an index is enabled
> cluster-wide it requires an index write for every row written to that
> table, on every single node in every DC, even one that may only be serving
> read operations.
>
> What could be the potential means of solving this problem inside Cassandra
> (without having to ship the data off into Elasticsearch, etc.)?
>
> Best Regards,
> Bhuvan
>


Having secondary indices limited to analytics dc

2016-09-18 Thread Bhuvan Rawal
Hi,

Is it possible to have secondary indices (SASI or native ones) defined on a
table restricted to a particular DC? For instance, it is very much possible
in MySQL to have a parent server on which writes are done without any
indices (other than the required ones) and to have indices on the replica
DBs; this keeps the parent database lightweight and free from building a
secondary index on every write.

For analytics and auditing purposes it is essential to serve access patterns
different from those modeled around a partition-key fetch. Only a limited
number of reads are needed by users, but if an index is enabled cluster-wide
it requires an index write for every row written to that table, on every
single node in every DC, even one that may only be serving read operations.
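
To make the cluster-wide behaviour concrete, a minimal sketch (keyspace,
table, and column names are illustrative): both index flavours below are
part of the global schema, so every replica in every DC builds and maintains
them on each write; there is no per-DC scoping clause today.

    -- native secondary index
    CREATE INDEX users_email_idx ON ks.users (email);

    -- SASI variant (available from Cassandra 3.4 onwards)
    CREATE CUSTOM INDEX users_email_sasi ON ks.users (email)
        USING 'org.apache.cassandra.index.sasi.SASIIndex';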

What could be the potential means of solving this problem inside Cassandra
(without having to ship the data off into Elasticsearch, etc.)?

Best Regards,
Bhuvan


Nodetool repair

2016-09-18 Thread Lokesh Shrivastava
Hi,

I tried to run the nodetool repair command on one of my keyspaces and found
that it took a lot more time than I anticipated. Is there a way to know the
ETA of a manual repair in advance, before triggering it? I believe repair
performs the following operations:

1) Major compaction
2) Exchange of Merkle trees with neighbouring nodes.

Is there any other operation performed during a manual repair? What happens
if I kill the process in the middle?
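
In case it helps, a hedged sketch of commands that can bound a repair's
scope and watch its progress (2.x nodetool; exact flags vary by version):

    # repair only this node's primary ranges, one keyspace at a time
    nodetool repair -pr my_keyspace

    # while it runs: validation compactions and streaming in flight
    nodetool compactionstats
    nodetool netstats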

Thanks.
Lokesh


Re: large system hint partition

2016-09-18 Thread Carlos Alonso
By inspecting the contents of your system.hints table, specifically the
host_id column, you can see the destination host of those hints and check
whether it is one of the alive or dead ones.
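
A minimal sketch of that check, assuming the 2.2 schema in which the
partition key of system.hints is named target_id (it holds the host id of
the node the hints are destined for):

    -- cqlsh: list the hosts that hints are queued for
    SELECT DISTINCT target_id FROM system.hints;

    -- then compare against the Host ID column of 'nodetool status'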

Carlos Alonso | Software Engineer | @calonso 

On 18 September 2016 at 04:35, Ezra Stuetzel 
wrote:

> Hey Nicolas,
>
> There are no dead nodes. 'nodetool status' and 'nodetool describecluster'
> both show 4 healthy nodes. In the past we eliminated some nodes using
> 'nodetool assassinate'. However, I checked the system.peers table on all 4
> of our nodes and they each show 3 peers, as expected. So it doesn't appear
> that any node has awareness of an unreachable node that could be causing
> hints to back up. Any ideas for further troubleshooting what the hints
> are?
>
> Thanks,
> Ezra
>
> On Fri, Sep 16, 2016 at 4:13 PM, Nicolas Douillet <
> nicolas.douil...@gmail.com> wrote:
>
>> Hi Ezra,
>>
>> Do you have a dead node in your cluster?
>> The coordinator stores a hint in its local system.hints table when a
>> replica node is dead or doesn't respond to a write request.
>>
>> --
>> Nicolas
>>
>>
>>
>> On Sat, Sep 17, 2016 at 00:12, Ezra Stuetzel 
>> wrote:
>>
>>> What would be the likely causes of large system.hints partitions?
>>> Normally, large partition warnings are for user-defined tables that
>>> users are writing large partitions to. In this case, it appears C* is
>>> writing large partitions to the system.hints table. Gossip is not backed
>>> up.
>>>
>>> version: C* 2.2.7
>>>
>>> WARN  [MemtableFlushWriter:134] 2016-09-16 04:27:39,220
>>> BigTableWriter.java:184 - Writing large partition
>>> system/hints:7ce838aa-f30f-494a-8caa-d44d1440e48b (128181097 bytes)
>>>
>>>
>>> Thanks,
>>>
>>> Ezra
>>>
>>
>