Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Rahul Ramesh
Hi Jan,
I checked it. There are no old keyspaces or tables.
Thanks for your pointer. I started looking inside the directories, and I
see a lot of snapshot directories inside the table directories. These
directories are consuming the space.

However, these snapshots are not shown when I issue listsnapshots:
./bin/nodetool listsnapshots
Snapshot Details:
There are no snapshots

Can I safely delete those snapshots? Why is listsnapshots not showing
them? Also, in the future, how can we find out if there are snapshots?
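
For what it's worth, this is roughly how I located them on disk
(substitute your own data directories, e.g. /HDD1, for the default path
below):

find /var/lib/cassandra/data -type d -name snapshots -exec du -sh {} \;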

Thanks,
Rahul

Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Rahul Ramesh
One update: I cleared the snapshots using the nodetool clearsnapshot
command, and the disk space has been recovered.
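
For the record, I ran this on each node; with no arguments it clears the
snapshots of all keyspaces on that node:

./bin/nodetool clearsnapshot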

Because of this issue, I have mounted one more drive on the server, and
there are some data files on it. How can I migrate the data so that I can
decommission that drive?
Will it work if I just copy all the contents of the table directories to
one of the remaining drives?

Thanks for all the help.

Regards,
Rahul


Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Jan Kesten
Hi Rahul,

it should work as you would expect: simply copy the sstables from your
extra disk over to the original one. To minimize the node's downtime, you
can do something like this (rough commands are sketched below):

- rsync the files while the node is still running (sstables are
immutable) to copy most of the data
- edit cassandra.yaml to remove the additional data directory
- shut down the node
- rsync again (just in case a new sstable was written while the first
rsync was running)
- restart
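
Roughly, with made-up paths (adjust the mount points and service commands
to your setup):

# 1) while the node is still running (sstables are immutable):
rsync -av /mnt/extra/cassandra/data/ /HDD1/cassandra/data/
# 2) edit cassandra.yaml: drop the extra path from data_file_directories
# 3) stop the node
sudo service cassandra stop
# 4) catch any sstable written while the first rsync ran
rsync -av /mnt/extra/cassandra/data/ /HDD1/cassandra/data/
# 5) restart
sudo service cassandra start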

HTH
Jan


Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Carlos Rolo
You can check whether the snapshots exist in the snapshots folder.
Repairs stream sstables over, which can temporarily increase disk space.
But I think Carlos Alonso might be correct: running compactions might be
the issue.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com


Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Carlos Alonso
I'd also have a look at possible running compactions.

If you have big column families with STCS then large compactions may be
happening.

Check it with nodetool compactionstats

Carlos Alonso | Software Engineer | @calonso 



Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Rahul Ramesh
Thanks for your suggestion.

Compaction was happening on one of the large tables. The disk space did
not decrease much after the compaction, so I ran an external compaction.
The disk space decreased by around 10%. However, it is still consuming
close to 750Gb for a load of 250Gb.

I even restarted Cassandra, thinking there might be some open files.
However, it didn't help much.

Is there any way to find out why so much disk space is being consumed?

I checked for open files using lsof; there are not any.

*Recovery:*
Just a wild thought:
I am using a replication factor of 2 and I have two nodes. If I delete
the complete data on one of the nodes, will I be able to recover all the
data from the active node?
I don't want to pursue this path, as I want to find out the root cause of
the issue!


Any help will be greatly appreciated.

Thank you,

Rahul


Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Jan Kesten
Hi Rahul,

just an idea: did you have a look at the data directories on disk
(/var/lib/cassandra/data)? It could be that there are some from old
keyspaces that have been deleted and snapshotted before. Try something
like "du -sh /var/lib/cassandra/data/*" to verify which keyspace is
consuming your space.
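
If one keyspace stands out, going one level deeper shows the per-table
usage, including any snapshots directories inside the table directories
(assuming the default layout):

du -sh /var/lib/cassandra/data/*/*
du -sh /var/lib/cassandra/data/*/*/snapshots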

Jan

Sent from my iPhone



Cassandra is consuming a lot of disk space

2016-01-12 Thread Rahul Ramesh
We have a 2-node Cassandra cluster with a replication factor of 2.

The load on the nodes is around 350Gb:

Datacenter: Cassandra
==
Address      Rack   Status  State   Load      Owns      Token
                                                        -5072018636360415943
172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%   -7068746880841807701
172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%   -5072018636360415943

However, if I use df -h,

/dev/xvdf   252G  223G   17G  94% /HDD1
/dev/xvdg   493G  456G   12G  98% /HDD2
/dev/xvdh   197G  167G   21G  90% /HDD3


HDD1, 2, and 3 contain only Cassandra data. It amounts to close to 1Tb on
one of the machines, and on the other machine it is close to 650Gb.

I started a repair 2 days ago; after running the repair, the disk space
consumption has actually increased.
I also checked whether this is because of snapshots: nodetool
listsnapshots intermittently lists a snapshot, but it goes away after
some time.

Can somebody please help me understand:
1. Why is so much disk space consumed?
2. Why did it increase after repair?
3. Is there any way to recover from this state?


Thanks,
Rahul


Re: Cassandra is consuming a lot of disk space

2016-01-12 Thread Kevin O'Connor
Have you tried restarting? It's possible there are open file handles to
sstables that have been compacted away. You can verify by running lsof
and grepping for DEL or deleted.
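
Something along these lines should show them (matching on the process
name is an assumption; adjust to however your Cassandra runs):

lsof -p $(pgrep -f CassandraDaemon) | grep -E 'DEL|deleted'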

If it's not that, you can run nodetool cleanup on each node to scan all of
the sstables on disk and remove anything that it's not responsible for.
Generally this would only work if you added nodes recently.
