You're the only one I see in the thread who's made any reference to HDFS. The OP even noted that his question is about C*, not HDFS.
On Tue, May 30, 2017 at 2:59 PM daemeon reiydelle <daeme...@gmail.com> wrote:

> Did you notice that HDFS is the distributed file system used?
>
> *Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872*
>
> *“All men dream, but not equally. Those who dream by night in the dusty
> recesses of their minds wake up in the day to find it was vanity, but the
> dreamers of the day are dangerous men, for they may act their dreams with
> open eyes, to make it possible.” — T.E. Lawrence*
>
> On Tue, May 30, 2017 at 2:18 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> This isn't an HDFS mailing list.
>>
>> On Tue, May 30, 2017 at 2:14 PM daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> no, 3tb is small. 30-50tb of hdfs space is typical these days per hdfs
>>> node. Depends somewhat on whether there is a mix of more and less
>>> frequently accessed data. But even storing only hot data, never saw
>>> anything less than 20tb hdfs per node.
>>>
>>> On Tue, May 30, 2017 at 2:00 PM, tommaso barbugli <tbarbu...@gmail.com>
>>> wrote:
>>>
>>>> Am I the only one thinking 3TB is way too much data for a single node
>>>> on a VM?
>>>>
>>>> On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol <
>>>> dan...@sendwithus.com> wrote:
>>>>
>>>>> I don't believe incremental repair is enabled, I have never enabled it
>>>>> on the cluster, and unless it's the default then it is off. Also I don't
>>>>> see a setting in cassandra.yaml for it.
>>>>>
>>>>> On May 30 2017, at 1:10 pm, daemeon reiydelle <daeme...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Unless there is a bug, snapshots are excluded (they are not HDFS
>>>>>> anyway!) from nodetool status.
>>>>>>
>>>>>> Out of curiosity, is incremental repair enabled? This is almost
>>>>>> certainly a rat hole, but there was an issue a few releases back where
>>>>>> load would only increase until the node was restarted. Had been fixed
>>>>>> ages ago, but wondering what happens if you restart a node, IF you have
>>>>>> incremental enabled.
>>>>>>
>>>>>> On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com>
>>>>>> wrote:
>>>>>>
>>>>>> Can you please check if you have incremental backup enabled and
>>>>>> snapshots are occupying the space.
>>>>>>
>>>>>> Run the nodetool clearsnapshot command.
>>>>>>
>>>>>> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <
>>>>>> dan...@sendwithus.com> wrote:
>>>>>>
>>>>>> It's 3-4 TB per node, and by "load rises" I mean the load as
>>>>>> reported by nodetool status.
>>>>>>
>>>>>> On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> When you say "the load rises ... ", could you clarify what you mean
>>>>>> by "load"? That has a specific meaning in Linux, and in e.g. Cloudera
>>>>>> Manager. But in neither case would that be relevant to transient or
>>>>>> persisted disk. Am I missing something?
>>>>>>
>>>>>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <
>>>>>> tbarbu...@gmail.com> wrote:
>>>>>>
>>>>>> 3-4 TB per node or in total?
>>>>>>
>>>>>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <
>>>>>> dan...@sendwithus.com> wrote:
>>>>>>
>>>>>> I should also mention that I am running Cassandra 3.10 on the cluster.
>>>>>>
>>>>>> On May 29 2017, at 9:43 am, Daniel Steuernol <dan...@sendwithus.com>
>>>>>> wrote:
>>>>>>
>>>>>> The cluster is running with RF=3; right now each node is storing
>>>>>> about 3-4 TB of data. I'm using r4.2xlarge EC2 instances; these have 8
>>>>>> vCPUs, 61 GB of RAM, and the disks attached for the data drive are gp2
>>>>>> SSD EBS volumes with 10k IOPS. I guess this brings up the question of
>>>>>> what's a good marker to decide on whether to increase disk space vs
>>>>>> provisioning a new node?
>>>>>>
>>>>>> On May 29 2017, at 9:35 am, tommaso barbugli <tbarbu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> This is not normal. Possibly a capacity problem. What's the RF, how
>>>>>> much data do you store per node and what kind of servers do you use (core
>>>>>> count, RAM, disk, ...)?
>>>>>>
>>>>>> Cheers,
>>>>>> Tommaso
>>>>>>
>>>>>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol <
>>>>>> dan...@sendwithus.com> wrote:
>>>>>>
>>>>>> I am running a 6 node cluster, and I have noticed that the reported
>>>>>> load on each node rises throughout the week and grows way past the actual
>>>>>> disk space used and available on each node. Also eventually latency for
>>>>>> operations suffers and the nodes have to be restarted. A couple of
>>>>>> questions on this: is this normal? Also, does Cassandra need to be
>>>>>> restarted every few days for best performance? Any insight on this
>>>>>> behaviour would be helpful.
>>>>>>
>>>>>> Cheers,
>>>>>> Daniel
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org