Re: 8 million Cassandra data files on disk

Jonathan Ellis Tue, 02 Aug 2011 14:46:16 -0700

I stand corrected.  There are several dozen reasons to upgrade, AND that one. :)


On Tue, Aug 2, 2011 at 4:42 PM, Yiming Sun <yiming....@gmail.com> wrote:
> Hi Jonathan,
>
> Good to know.  We will certainly upgrade to 0.7.8.
>
> Also, here is the link to that post I came across earlier:
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Files-not-deleted-after-compaction-and-GCed-td5960453.html
>
> best,
>
> -- Y.
>
> On Tue, Aug 2, 2011 at 5:36 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> I don't remember a removing-compacted-files bug in 0.7.0, but you
>> should absolutely upgrade to 0.7.8 for several dozen other fixes,
>> including some severe ones -- see NEWS.txt.
>>
>> On Tue, Aug 2, 2011 at 4:29 PM, Yiming Sun <yiming....@gmail.com> wrote:
>> > Hi Jeremiah,
>> >
>> > Thank you for the information; it is certainly a relief.  Two questions,
>> > though:
>> >
>> > 1. I came across an old thread that seemed to say Cassandra 0.7.0 has a
>> > bug and doesn't remove these compacted files properly.  Should we upgrade
>> > to a newer version that has this bug fixed?
>> >
>> > 2. Do we have to run the garbage collection manually via JConsole?  Is
>> > there any way to force a GC from our code? (We are using Hector as our
>> > Java client.)
>> >
>> > Thanks!
>> >
>> >
>> >
>> > On Tue, Aug 2, 2011 at 5:19 PM, Jeremiah Jordan
>> > <jeremiah.jor...@morningstar.com> wrote:
>> >>
>> >> Connect with JConsole and run a garbage collection.
>> >> All of the data files that have a -Compacted marker file with the same
>> >> name will be deleted the next time a full garbage collection runs, or
>> >> when the node is restarted.  They have already been combined into new
>> >> files; the old ones just haven't been deleted yet.
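The reply above uses JConsole, and the second question in the thread asks whether the GC can be forced from code instead. One option, which does not go through Hector at all, is to call the JVM's standard `MemoryMXBean.gc()` over JMX; this is the same operation JConsole's "Perform GC" button triggers. The following is a minimal sketch, not an official Cassandra tool: the class name `ForceRemoteGc` is hypothetical, and it assumes the node's JMX port is reachable without authentication and is 8080 (the default in Cassandra 0.7's `cassandra-env.sh`; later releases moved to 7199).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper: asks a remote JVM (e.g. a Cassandra node) to run a full GC over JMX. */
public class ForceRemoteGc {

    /** Builds the standard RMI-based JMX service URL for host:port. */
    public static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Connects to host:port and invokes gc() on the remote java.lang:type=Memory MBean. */
    public static void forceGc(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        // Assumes JMX authentication is disabled on the node.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            memory.gc(); // effectively System.gc() inside the remote JVM
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: Cassandra 0.7 default JMX port 8080; pass the port explicitly otherwise.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;
        forceGc(host, port);
        System.out.println("Requested full GC on " + host + ":" + port);
    }
}
```

`MemoryMXBean.gc()` is documented as effectively equivalent to `System.gc()`, so it is a request rather than a guarantee, and it will likely have no effect if the node runs with `-XX:+DisableExplicitGC`.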
>> >>
>> >> On Tue, 2011-08-02 at 16:09 -0400, Yiming Sun wrote:
>> >> > Hi,
>> >> >
>> >> > I am new to Cassandra and am hoping someone can help me understand
>> >> > the large number of small data files that Cassandra generates on disk.
>> >> >
>> >> > We are using Cassandra because we are dealing with thousands to
>> >> > millions of small text files on disk.  We are experimenting with it
>> >> > in the hope that, by loading the files' contents into Cassandra, we
>> >> > will get more efficient disk usage, because Cassandra aggregates them
>> >> > into bigger files (one file per column family, according to the wiki).
>> >> >
>> >> > But after we pushed a subset of the files into a single-node Cassandra
>> >> > v0.7.0 instance, we noticed that the Cassandra data directory for the
>> >> > keyspace contains 8.5 million very small files, most of them named
>> >> >
>> >> >     <SuperColumnFamilyName>-e-<nnnnn>.Filter.db
>> >> >     <SuperColumnFamilyName>-e-<nnnnn>.Compacted.db
>> >> >     <SuperColumnFamilyName>-e-<nnnnn>.Index.db
>> >> >     <SuperColumnFamilyName>-e-<nnnnn>.Statistics.db
>> >> >
>> >> > Among these files, the Compacted.db files are always empty, the Filter
>> >> > and Index files are under 100 bytes, and the Statistics files are
>> >> > around 4 KB.
>> >> >
>> >> > What are these files?  Why are there so many of them?  We originally
>> >> > hoped that Cassandra would solve our problem with small files, but now
>> >> > it doesn't seem to help: we still end up with tons of small files.  Is
>> >> > there any way to reduce or combine these small files?
>> >> >
>> >> > Thanks.
>> >> >
>> >> > -- Y.
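To check whether most of these files are obsolete SSTables waiting for cleanup, as Jeremiah's reply explains, one could count the files in the keyspace's data directory that belong to an SSTable with a Compacted marker. The following is a minimal Java sketch; the class and method names are hypothetical, and the file-name pattern is an assumption that accepts both the `<SuperColumnFamilyName>-e-<nnnnn>.Component.db` spelling used in the question and the `-Compacted` spelling used in the reply.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reports SSTables that carry a Compacted marker and are awaiting deletion. */
public class CompactedSSTableReport {
    // Assumed layout: <ColumnFamily>-<version>-<generation><sep><Component>[.db],
    // where <sep> is '-' or '.', to cover both spellings seen in this thread.
    private static final Pattern SSTABLE_FILE =
            Pattern.compile("^(.+)-([a-z]+)-(\\d+)[-.]([A-Za-z]+)(?:\\.db)?$");

    /** Builds the "cf-version-generation" key that all components of one SSTable share. */
    private static String key(Matcher m) {
        return m.group(1) + "-" + m.group(2) + "-" + m.group(3);
    }

    /** Returns the keys of SSTables that have a Compacted marker component. */
    public static Set<String> compactedSSTables(List<String> fileNames) {
        Set<String> compacted = new TreeSet<>();
        for (String name : fileNames) {
            Matcher m = SSTABLE_FILE.matcher(name);
            if (m.matches() && m.group(4).equals("Compacted")) {
                compacted.add(key(m));
            }
        }
        return compacted;
    }

    /** Counts every component file (marker included) that belongs to a compacted SSTable. */
    public static int obsoleteFileCount(List<String> fileNames) {
        Set<String> compacted = compactedSSTables(fileNames);
        int count = 0;
        for (String name : fileNames) {
            Matcher m = SSTABLE_FILE.matcher(name);
            if (m.matches() && compacted.contains(key(m))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Pass the keyspace's data directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        System.out.println("Files in directory:        " + names.size());
        System.out.println("SSTables marked Compacted: " + compactedSSTables(names).size());
        System.out.println("Files awaiting deletion:   " + obsoleteFileCount(names));
    }
}
```

If nearly all files fall into the last category, they should disappear after a full GC or a restart rather than requiring any change to how the data is stored.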
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>



