I stand corrected. There are several dozen reasons to upgrade, AND that one. :)
On Tue, Aug 2, 2011 at 4:42 PM, Yiming Sun <yiming....@gmail.com> wrote:
> Hi Jonathan,
>
> Good to know. We will certainly upgrade to 0.7.8.
>
> Also, here is the link to that post I came across earlier:
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Files-not-deleted-after-compaction-and-GCed-td5960453.html
>
> best,
>
> -- Y.
>
> On Tue, Aug 2, 2011 at 5:36 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> I don't remember a removing-compacted-files bug in 0.7.0, but you
>> should absolutely upgrade to 0.7.8 for several dozen other fixes,
>> including some severe ones -- see NEWS.txt.
>>
>> On Tue, Aug 2, 2011 at 4:29 PM, Yiming Sun <yiming....@gmail.com> wrote:
>> > Hi Jeremiah,
>> >
>> > Thank you for the information - it certainly is a relief. Two questions
>> > though:
>> >
>> > 1. I came across an old thread which seemed to be saying 0.7.0 cassandra
>> > has a bug and doesn't remove these compact files properly. Should we
>> > upgrade to a newer version that has this bug fixed?
>> >
>> > 2. Do we must do the garbage collection via Jconsole manually? Is there
>> > anyway I can force the GC in our code? (we are using Hector as our java
>> > client).
>> >
>> > Thanks!
>> >
>> > On Tue, Aug 2, 2011 at 5:19 PM, Jeremiah Jordan
>> > <jeremiah.jor...@morningstar.com> wrote:
>> >>
>> >> Connect with jconsole and run garbage collection.
>> >> All of the files that have a -Compacted with the same name will get
>> >> deleted the next time a full garbage collection runs, or when the node
>> >> is restarted. They have already been combined into new files, the old
>> >> ones just haven't been deleted yet.
>> >>
>> >> On Tue, 2011-08-02 at 16:09 -0400, Yiming Sun wrote:
>> >> > Hi,
>> >> >
>> >> > I am new to Cassandra, and am hoping someone could help me understand
>> >> > the (large amount of small) data files on disk that Cassandra
>> >> > generates.
>> >> >
>> >> > The reason we are using Cassandra is because we are dealing with
>> >> > thousands to millions of small text files on disk, so we are
>> >> > experimenting with Cassandra hoping that by dropping the files
>> >> > contents into Cassandra, it will achieve more efficient disk usage
>> >> > because Cassandra is going to aggregate them into bigger files (one
>> >> > file per column family, according to the wiki).
>> >> >
>> >> > But after we pushed a subset of the files into a single node
>> >> > Cassandra v0.7.0 instance, we noted that in the Cassandra data
>> >> > directory for the keyspace, there are 8.5 million very small files,
>> >> > most are named
>> >> >
>> >> > <SuperColumnFamilyName>-e-<nnnnn>.Filter.db
>> >> > <SuperColumnFamilyName>-e-<nnnnn>.Compacted.db
>> >> > <SuperColumnFamilyName>-e-<nnnnn>.Index.db
>> >> > <SuperColumnFamilyName>-e-<nnnnn>.Statistics.db
>> >> >
>> >> > and among these files, the Compacted.db are always empty, Filter and
>> >> > Index are under 100 bytes, and Statistics are around 4k.
>> >> >
>> >> > What are these files? Why are there so many of them? We originally
>> >> > hope that Cassandra was going to solve our issue with the small files
>> >> > we have, but now it doesn't seem to help -- we still end up with tons
>> >> > of small files. Is there any way to reduce/combine these small
>> >> > files?
>> >> >
>> >> > Thanks.
>> >> >
>> >> > -- Y.
>> >>
>> >
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
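
On the second question quoted above (forcing the GC from code instead of clicking through jconsole): the collection has to run inside the Cassandra JVM, so calling System.gc() in the Hector client process would not help. A minimal sketch, assuming the node exposes JMX without authentication and that the JMX port is 8080 (believed to be the 0.7.x default, but check cassandra-env.sh on the node), is to call the standard java.lang Memory MXBean remotely, which is the same operation jconsole's "Perform GC" button performs. The names RemoteGc, jmxUrl, and requestGc are made up for this illustration and are not part of Cassandra or Hector.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: asks a remote JVM (e.g. a Cassandra node) to run a GC over JMX.
final class RemoteGc {

    // Builds the usual JMX-over-RMI service URL for a host and port.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Invokes gc() on the platform Memory MXBean of the JVM behind the connection.
    static void requestGc(MBeanServerConnection conn) throws IOException {
        MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        memory.gc();
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        // Assumption: 8080 is the node's JMX port; pass another port as the second argument.
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;
        JMXServiceURL url = new JMXServiceURL(jmxUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            requestGc(connector.getMBeanServerConnection());
        }
    }
}
```

Note that gc() is only a request to the target JVM, equivalent to System.gc() there; a JVM started with -XX:+DisableExplicitGC will ignore it, in which case the leftover files would only go away on restart as described in the quoted reply.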