You're the only one I see in the thread that's made any reference to HDFS.
The OP even noted that his question is about C*, not HDFS.

On Tue, May 30, 2017 at 2:59 PM daemeon reiydelle <daeme...@gmail.com>
wrote:

> Did you notice that HDFS is the distributed file system used?
>
> *Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872*
>
> *“All men dream, but not equally. Those who dream by night in the dusty
> recesses of their minds wake up in the day to find it was vanity, but the
> dreamers of the day are dangerous men, for they may act their dreams with
> open eyes, to make it possible.” — T.E. Lawrence*
>
>
> On Tue, May 30, 2017 at 2:18 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> This isn't an HDFS mailing list.
>>
>> On Tue, May 30, 2017 at 2:14 PM daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>>> No, 3 TB is small. 30-50 TB of HDFS space per HDFS node is typical these
>>> days. It depends somewhat on whether there is a mix of more and less
>>> frequently accessed data, but even when storing only hot data, I never saw
>>> anything less than 20 TB of HDFS per node.
>>>
>>> On Tue, May 30, 2017 at 2:00 PM, tommaso barbugli <tbarbu...@gmail.com>
>>> wrote:
>>>
>>>> Am I the only one thinking 3TB is way too much data for a single node
>>>> on a VM?
>>>>
>>>> On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol <
>>>> dan...@sendwithus.com> wrote:
>>>>
>>>>> I don't believe incremental repair is enabled. I have never enabled it
>>>>> on the cluster, so unless it's the default, it is off. Also, I don't
>>>>> see a setting in cassandra.yaml for it.
>>>>>
>>>>>
>>>>>
>>>>> On May 30 2017, at 1:10 pm, daemeon reiydelle <daeme...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Unless there is a bug, snapshots are excluded (they are not HDFS
>>>>>> anyway!) from nodetool status.
>>>>>>
>>>>>> Out of curiosity, is incremental repair enabled? This is almost
>>>>>> certainly a rat hole, but there was an issue a few releases back where
>>>>>> load would only increase until the node was restarted. It was fixed ages
>>>>>> ago, but I am wondering what happens if you restart a node, IF you have
>>>>>> incremental repair enabled.
>>>>>>
>>>>>> On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com>
>>>>>> wrote:
>>>>>>
>>>>>> Can you please check whether you have incremental backup enabled and
>>>>>> whether snapshots are occupying the space?
>>>>>>
>>>>>> Run the nodetool clearsnapshot command.
>>>>>>
>>>>>> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <
>>>>>> dan...@sendwithus.com> wrote:
>>>>>>
>>>>>> It's 3-4 TB per node, and by "load rises" I mean the load as
>>>>>> reported by nodetool status.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> When you say "the load rises ... ", could you clarify what you mean
>>>>>> by "load"? That term has a specific meaning in Linux and in, e.g.,
>>>>>> Cloudera Manager, but in neither case would it be relevant to transient
>>>>>> or persisted disk. Am I missing something?
>>>>>>
>>>>>>
>>>>>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <
>>>>>> tbarbu...@gmail.com> wrote:
>>>>>>
>>>>>> 3-4 TB per node or in total?
>>>>>>
>>>>>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <
>>>>>> dan...@sendwithus.com> wrote:
>>>>>>
>>>>>> I should also mention that I am running Cassandra 3.10 on the cluster.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On May 29 2017, at 9:43 am, Daniel Steuernol <dan...@sendwithus.com>
>>>>>> wrote:
>>>>>>
>>>>>> The cluster is running with RF=3, and right now each node is storing
>>>>>> about 3-4 TB of data. I'm using r4.2xlarge EC2 instances, which have 8
>>>>>> vCPUs and 61 GB of RAM, and the disks attached for the data drive are gp2
>>>>>> SSD EBS volumes with 10k IOPS. I guess this brings up the question of
>>>>>> what's a good marker for deciding whether to increase disk space or
>>>>>> provision a new node?
>>>>>>
>>>>>>
>>>>>> On May 29 2017, at 9:35 am, tommaso barbugli <tbarbu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> This is not normal. Possibly a capacity problem. What's the RF, how
>>>>>> much data do you store per node, and what kind of servers do you use
>>>>>> (core count, RAM, disk, ...)?
>>>>>>
>>>>>> Cheers,
>>>>>> Tommaso
>>>>>>
>>>>>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol <
>>>>>> dan...@sendwithus.com> wrote:
>>>>>>
>>>>>>
>>>>>> I am running a 6 node cluster, and I have noticed that the reported
>>>>>> load on each node rises throughout the week and grows way past the actual
>>>>>> disk space used and available on each node. Eventually, latency for
>>>>>> operations also suffers and the nodes have to be restarted. A couple of
>>>>>> questions on this: is this normal? Also, does Cassandra need to be
>>>>>> restarted every few days for best performance? Any insight on this
>>>>>> behaviour would be helpful.
>>>>>>
>>>>>> Cheers,
>>>>>> Daniel
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>>>>>> additional commands, e-mail: user-h...@cassandra.apache.org
>>>>>>
>>>>
>>>
>
