Hi,

I think you could try setting the limit on the number of open files to
unlimited and see how it goes when you start the tablet server.
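
Before changing the limit, it may help to confirm which limit the running
process actually has: a value raised in an interactive shell does not
necessarily apply to a service started by the init system. A minimal sketch
for Linux, assuming /proc is available (fd_report is a helper name made up
for this example, and the kudu-tserver process name in the comment is an
assumption about your install):

```shell
# Print the soft and hard "Max open files" limits and the current number of
# open descriptors for a given PID (Linux only; reads /proc).
fd_report() {
  pid="$1"
  # Fields in /proc/<pid>/limits: "Max open files <soft> <hard> files"
  limits=$(awk '/^Max open files/ {print $4, $5}' "/proc/$pid/limits")
  open=$(ls "/proc/$pid/fd" | wc -l)
  echo "pid=$pid soft_hard_limit=\"$limits\" open_fds=$open"
}

# Example: inspect the current shell. For the tablet server, pass its PID
# instead, e.g. fd_report "$(pgrep -o kudu-tserver)".
fd_report "$$"
```

If the limit reported for the tablet server process is lower than what
ulimit -n shows in your shell, the service is not picking up your setting.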

I think the best way forward is to add tablet servers to the cluster.
Ideally, you want your data replicated: consider creating tables with
replication factor 3 and having at least 4 tablet servers in your
cluster.  Once you have added the new tablet servers, don't forget to run
the rebalancer tool (kudu cluster rebalance ...).
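
As a rough sketch, the sequence after adding the tablet servers could look
like the following. The master address is a placeholder (7051 is the default
master RPC port), and run is a small wrapper made up for this example that
only prints the commands unless DRY_RUN=0 is set. Please check
kudu cluster rebalance --help for the options available in your version.

```shell
# Master RPC address(es), comma-separated. The host name is a placeholder.
MASTERS="${MASTERS:-master-host:7051}"
# Default to printing the commands; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Echo a command and, unless in dry-run mode, execute it.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# 1. Check cluster health: the new tablet servers should be listed.
run kudu cluster ksck "$MASTERS"
# 2. Spread tablet replicas across the tablet servers.
run kudu cluster rebalance "$MASTERS"
# 3. Re-check health once the rebalancer has finished.
run kudu cluster ksck "$MASTERS"
```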


HTH,

Alexey

On Mon, Oct 7, 2019 at 2:31 AM Faraz Mateen <fmat...@an10.io> wrote:

> Alexey,
>
> Thank you for the response. Having too many partitions is exactly what the
> problem is. When I restart the tserver, it tries to open files against each
> tablet and eventually crashes.
>
> Is there a way to get around this and recover my data? Is there any config
> I can change to run the tserver? Or can I add a new tablet server and
> migrate existing tablets?
>
> On Sat, Oct 5, 2019 at 10:05 PM Alexey Serbin <aser...@cloudera.com>
> wrote:
>
>> Hi,
>>
>> Most likely the issue happened because of the high number of tablet
>> replicas at the tablet server.  In case of a spike in the input data
>> rate, higher compaction activity might require more file descriptors
>> than usual, since more files are opened.
>>
>> How many tablet replicas does that tablet server have?  It's not
>> recommended to have too many:
>> https://kudu.apache.org/docs/known_issues.html#_scale
>>
>> To understand what has happened, you need to take a look at the logs of
>> the tablet server.  This might be useful:
>> https://kudu.apache.org/docs/troubleshooting.html
>>
>> Overall, if there is only one (?) tablet server in the whole Kudu
>> cluster, why have 39 partitions per table?  I guess that's some sort of
>> proof-of-concept/toy setup, but anyway.  Since all the tablet replicas end
>> up at the same single tablet server, I don't see any benefit from
>> partitioning in that setup.  For the tablet server, it simply means an
>> x-times increase in the number of open file descriptors and increased
>> memory usage.
>>
>>
>> Kind regards,
>>
>> Alexey
>>
>> On Fri, Oct 4, 2019 at 4:21 AM Faraz Mateen <fmat...@an10.io> wrote:
>>
>>> Hi all,
>>>
>>> I am facing a problem with my kudu setup where the tablet server crashes
>>> with a "too many open files" error.
>>> The setup consists of a single master and a single tablet server. Tables
>>> are created with 39 partitions per table. However, not all partitions
>>> have data that corresponds to them.
>>> Yesterday my tserver crashed, and when I try to restart it, it fails
>>> with the error:
>>>
>>> I1004 03:50:39.896301  5669 ts_tablet_manager.cc:1173] T
>>> cab85f15f06748d0b59161d9f3da55f7 P ee14d248ac994d0eb60dbb0db4ab3b09:
>>> Registered tablet (data state: TABLET_DATA_READY)
>>> W1004 03:50:39.923184  5687 os-util.cc:165] could not read
>>> /proc/self/status: IO error: /proc/self/status: Too many open files (error
>>> 24)
>>> I1004 03:50:39.939460  5669 ts_tablet_manager.cc:1173] T
>>> d8d68ce6f6ea49479c00d29709869f1f P ee14d248ac994d0eb60dbb0db4ab3b09:
>>> Registered tablet (data state: TABLET_DATA_READY)
>>>
>>> I have already modified ulimit of the machine:
>>>
>>> root@vm-3:~# ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 63923
>>> max locked memory       (kbytes, -l) 16384
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 65535
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) 8192
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 65535
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>>>
>>> *Set up Details:*
>>> Single master and tserver setup on a single VM.
>>> 4 cores, 550GB hard disk, 16GB RAM
>>> Kudu version 1.8 on ubuntu, installed through debian packages.
>>> Before the crash, data was being inserted in kudu at a very high rate.
>>> RAM usage was around 87% and disk usage was around 84%.
>>>
>>> Here is what I have tried so far:
>>> 1- Set ulimit -n to 65535.
>>> 2- Reboot the vm to get rid of stale processes.
>>> 3- Set block_manager_max_open_files to 32000 in tserver flag file.
>>>
>>> What I want to know now is:
>>> 1- Why am I hitting this problem? Is this due to low resources on the VM
>>> or high number of tablets on a single tserver?
>>> 2- How can I get around this problem, recover my data and kudu services?
>>>
>>> Would really appreciate some help on this.
>>> --
>>> Faraz Mateen
>>>
>>
>
> --
> Faraz Mateen
>
