Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-19 Thread paveldimow
Thank you Sean,

I will try to optimize things in our setup and will monitor 
https://github.com/influxdata/influxdb/issues/7142 for any updates.

Thank you.

-- 
Remember to include the version number!
--- 
You received this message because you are subscribed to the Google Groups 
"InfluxData" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to influxdb+unsubscr...@googlegroups.com.
To post to this group, send email to influxdb@googlegroups.com.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/44c8aa2f-8151-4b5f-8788-c7064c754833%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-18 Thread Sean Beckett
Pavel,

The official recommendations are rough guidelines only. A use case storing
a 10KB string in a field will need far more RAM and storage than a use case
storing a single integer. A query over the previous 5 seconds will use much
less CPU and RAM than a query covering the past month.

There is currently no way to affect the timing of TSM compactions. Follow
https://github.com/influxdata/influxdb/issues/7474 for possible progress.

> CQ runs every 30 minutes and downsamples data

Over what time frame? If the CQ is running over the previous 30 minutes,
that's about 280k points. If it's running over the previous day, that's
13.5 million points. Huge difference in the RAM needed.
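A quick back-of-the-envelope check of those window sizes, using the workload figures Pavel gives later in the thread (~40k series, one point per series every 5 minutes). Exact arithmetic lands a little below Sean's round numbers, which are approximations:

```python
# Rough point-count check for a CQ window, assuming ~40k series
# each reporting once every 5 minutes (figures from this thread).
series = 40_000
interval_s = 5 * 60  # one point per series every 300 seconds

def points_in_window(window_s: int) -> int:
    """Points a CQ must read when it aggregates over `window_s` seconds."""
    return series * (window_s // interval_s)

print(points_in_window(30 * 60))    # 30-minute window -> 240000
print(points_in_window(24 * 3600))  # full-day window  -> 11520000
```

Either way the conclusion holds: a CQ over a day touches roughly 50x more points than one over 30 minutes, with RAM usage to match.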

On Tue, Oct 18, 2016 at 4:20 AM, Pavel Dimow  wrote:

> Hi Sean,
>
> our CQ runs every 30 minutes and yes, I guess it's complex (I can post it
> here), but our box's CPU and RAM were sized according to the official
> recommendations. Also, I can't find a way to reschedule TSM compactions,
> which I think could solve the issue.
>
> On Mon, Oct 17, 2016 at 11:30 PM, Sean Beckett  wrote:
>
>> It looks like the combination of the CQ running at 20:00 and the TSM
>> compactions also kicking off at 20:00 are consuming the available RAM on
>> the box. As Mathias mentioned, a CQ that takes 3.5 minutes to complete
>> sounds very RAM intensive. Either add RAM or reduce the complexity of that
>> CQ so it can complete with less RAM.
>>
>> On Mon, Oct 17, 2016 at 2:50 PM, Mathias Herberts <
>> mathias.herbe...@gmail.com> wrote:
>>
>>> Your CQ completed in 3m27s, does it manipulate a very large amount of
>>> data?

Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-18 Thread Sean Beckett
> 40k metrics every 5 minutes. Every metric has 2 tags and around 70 fields.

40k points every 300 seconds, each with 70 fields. That's about 10k
values per second. If the load is coming all at once every 300 seconds,
that might cause RAM spikes. Ideally the batches can be evened out to avoid
spikes.
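Sean's suggestion to even out the batches can be sketched as a client-side pacing loop. This is hypothetical illustration, not anything built into InfluxDB: `send_batch` stands in for whatever HTTP client posts line protocol to `/write`, and the batch size and pause are tuning knobs, not recommendations.

```python
# Minimal sketch: smooth a 5-minute burst of ~40k points into
# smaller paced batches instead of one giant POST to /write.
import time

def chunked(points, batch_size=5000):
    """Yield successive batches of at most `batch_size` points."""
    for i in range(0, len(points), batch_size):
        yield points[i:i + batch_size]

def write_smoothly(points, send_batch, batch_size=5000, pause_s=1.0):
    """Send points in small batches with a pause between them, so the
    server sees a steady stream rather than a single RAM spike."""
    for batch in chunked(points, batch_size):
        send_batch(batch)  # placeholder for the real HTTP write
        time.sleep(pause_s)

# Example: 40k dummy line-protocol points become 8 batches of 5k.
points = [f"m,host=h{i} v=1i" for i in range(40_000)]
sent = []
write_smoothly(points, sent.append, pause_s=0.0)
print(len(sent))  # 8
```

The same effect can often be had for free by letting the collector (e.g. Telegraf) flush on a short interval with jitter rather than accumulating five minutes of data.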

On Tue, Oct 18, 2016 at 3:34 AM, Pavel Dimow  wrote:

> Hi Mathias,
>
> what's considered a large amount of data in InfluxDB? I have around
> 40k metrics every 5 minutes. Every metric has 2 tags and around 70 fields.
> A CQ runs every 30 minutes and downsamples the data (I keep 7 days of raw
> data and 3 months of downsampled data). Is that too much for a 40GB RAM box?
>
> On Mon, Oct 17, 2016 at 10:50 PM, Mathias Herberts <
> mathias.herbe...@gmail.com> wrote:
>
>> Your CQ completed in 3m27s, does it manipulate a very large amount of
>> data?
>>

Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-18 Thread Pavel Dimow
Hi Sean,

our CQ runs every 30 minutes and yes, I guess it's complex (I can post it
here), but our box's CPU and RAM were sized according to the official
recommendations. Also, I can't find a way to reschedule TSM compactions,
which I think could solve the issue.

On Mon, Oct 17, 2016 at 11:30 PM, Sean Beckett  wrote:

> It looks like the combination of the CQ running at 20:00 and the TSM
> compactions also kicking off at 20:00 are consuming the available RAM on
> the box. As Mathias mentioned, a CQ that takes 3.5 minutes to complete
> sounds very RAM intensive. Either add RAM or reduce the complexity of that
> CQ so it can complete with less RAM.
>
> On Mon, Oct 17, 2016 at 2:50 PM, Mathias Herberts <
> mathias.herbe...@gmail.com> wrote:
>
>> Your CQ completed in 3m27s, does it manipulate a very large amount of
>> data?
>>

Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-18 Thread Pavel Dimow
Hi Mathias,

what's considered a large amount of data in InfluxDB? I have around
40k metrics every 5 minutes. Every metric has 2 tags and around 70 fields.
A CQ runs every 30 minutes and downsamples the data (I keep 7 days of raw
data and 3 months of downsampled data). Is that too much for a 40GB RAM box?

On Mon, Oct 17, 2016 at 10:50 PM, Mathias Herberts <
mathias.herbe...@gmail.com> wrote:

> Your CQ completed in 3m27s, does it manipulate a very large amount of data?
>

Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-17 Thread Sean Beckett
It looks like the combination of the CQ running at 20:00 and the TSM
compactions also kicking off at 20:00 are consuming the available RAM on
the box. As Mathias mentioned, a CQ that takes 3.5 minutes to complete
sounds very RAM intensive. Either add RAM or reduce the complexity of that
CQ so it can complete with less RAM.

On Mon, Oct 17, 2016 at 2:50 PM, Mathias Herberts <
mathias.herbe...@gmail.com> wrote:

> Your CQ completed in 3m27s, does it manipulate a very large amount of data?
>

Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-17 Thread Mathias Herberts
Your CQ completed in 3m27s, does it manipulate a very large amount of data?


Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-17 Thread paveldimow
Heh, believe it or not, once again I got an OOM error! And it becomes really
'funny' that it happens at the same time every day. Look at this:

Oct 17 20:04:11 node1 kernel: kthreadd invoked oom-killer: gfp_mask=0x3000d0, 
order=2, oom_score_adj=0

When I look at the InfluxDB log I see this:

Oct 17 20:03:29 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:03:29 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 0252f60c-9494-11e6-b227- 239168
Oct 17 20:03:33 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:03:32 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 044496f0-9494-11e6-b228- 198616
Oct 17 20:03:36 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:03:36 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 0655bab6-9494-11e6-b229- 141794
Oct 17 20:03:36 node1 influxd: [tsm1] 2016/10/17 20:03:36 Compacting cache for 
/var/lib/influxdb/data/macdb/seven_days/579
Oct 17 20:03:37 node1 influxd: [continuous_querier] 2016/10/17 20:03:37 
finished continuous query cq_30m (2016-10-17 19:30:00 +0200 CEST to 
2016-10-17 20:00:00 +0200 CEST) in 3m37.079684067s
Oct 17 20:03:38 node1 influxd: [tsm1] 2016/10/17 20:03:38 Compacting cache for 
/var/lib/influxdb/data/macdb/three_months/575
Oct 17 20:03:38 node1 influxd: [tsm1] 2016/10/17 20:03:38 Snapshot for path 
/var/lib/influxdb/data/macdb/seven_days/579 deduplicated in 368.123875ms
Oct 17 20:03:40 node1 influxd: [tsm1] 2016/10/17 20:03:40 Snapshot for path 
/var/lib/influxdb/data/macdb/three_months/575 deduplicated in 485.553334ms
Oct 17 20:04:02 node1 influxd: [tsm1wal] 2016/10/17 20:04:02 Removing 
/var/lib/influxdb/wal/macdb/seven_days/579/_01024.wal
Oct 17 20:04:02 node1 influxd: [tsm1wal] 2016/10/17 20:04:02 Removing 
/var/lib/influxdb/wal/macdb/seven_days/579/_01025.wal
Oct 17 20:04:02 node1 influxd: [tsm1wal] 2016/10/17 20:04:02 Removing 
/var/lib/influxdb/wal/macdb/seven_days/579/_01026.wal
Oct 17 20:04:02 node1 influxd: [tsm1] 2016/10/17 20:04:02 Snapshot for path 
/var/lib/influxdb/data/macdb/seven_days/579 written in 25.602303689s
Oct 17 20:04:02 node1 influxd: [tsm1] 2016/10/17 20:04:02 beginning level 1 
compaction of group 0, 2 TSM files
Oct 17 20:04:02 node1 influxd: [tsm1] 2016/10/17 20:04:02 compacting level 1 
group (0) /var/lib/influxdb/data/macdb/seven_days/579/00359-00001.tsm (#0)
Oct 17 20:04:03 node1 influxd: [tsm1] 2016/10/17 20:04:02 compacting level 1 
group (0) /var/lib/influxdb/data/macdb/seven_days/579/00360-00001.tsm (#1)
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 1616a065-9494-11e6-b22f- 152030
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 161979a7-9494-11e6-b238- 133364
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 16169ee0-9494-11e6-b22b- 152118
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 1616a01c-9494-11e6-b22e- 153616
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 1616a070-9494-11e6-b230- 153639
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 16169f73-9494-11e6-b22c- 153690
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 1616d89c-9494-11e6-b237- 152262
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 16169fb1-9494-11e6-b22d- 153706
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 1616c37e-9494-11e6-b234- 152707
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 1616cbfb-9494-11e6-b235- 152636
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 1616cc3e-9494-11e6-b236- 152641
Oct 17 20:04:03 node1 influxd: [httpd] 192.168.11.24 - writter 
[17/Oct/2016:20:04:02 +0200] "POST /write?db=macdb=s HTTP/1.1" 204 0 
"-" "-" 1616c2eb-9494-11e6-b233- 153000
Oct 17 20:04:03 node1 influxd: 

Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-14 Thread paveldimow
Hi,

I will try to investigate this problem further and in more detail. I am pretty 
sure there is no timestamp collision, but I need a little more time to gather 
all the relevant data.

Again, thank you very much for your help!

-- 
Remember to include the version number!
--- 
You received this message because you are subscribed to the Google Groups 
"InfluxData" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to influxdb+unsubscr...@googlegroups.com.
To post to this group, send email to influxdb@googlegroups.com.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/3517c8ac-689a-4143-8dc7-d9ce2c6cfb6c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-14 Thread paveldimow
Hi,

I have increased the shard duration to two weeks for 'three_months', and I will 
monitor RAM usage for the next few days.

Thank you very much for your help.



Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-14 Thread dragande
Hi,

I have increased the shard duration to two weeks for 'three_months'. We will 
see how this affects memory usage.

Thank you very much for your help!



Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-14 Thread Sean Beckett
It looks like your CQ does lead to RAM spikes close to the capacity of the
box. Your shard durations are what's triggering the issue, I believe. With
1-day shards in a 90-day retention policy, there are a lot of housecleaning
tasks to do each night at midnight UTC. When each shard expires, the series
index has to be updated and a series of compactions kicks off. Compactions
are RAM and CPU intensive.

First recommendation, use ALTER RETENTION POLICY to raise the shard
duration for `three_months` to at least a week, but even a month would be
good. It will reduce the frequency of the TSM compactions, and with fewer
files the compactions will be less resource intensive.

Also, queries should touch as few shards as possible. If you are often
querying for more than 12 hours of data then raising the shard duration
will reduce the RAM needs of those queries.
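
For reference, a sketch of the statements involved, using the database and
policy names from this thread. The `1w` value is illustrative (Sean suggests
at least a week), and on InfluxDB 1.x the new shard group duration only
applies to shard groups created after the change:

```sql
-- Raise the shard group duration for the downsampled retention policy.
ALTER RETENTION POLICY "three_months" ON "macdb" SHARD DURATION 1w

-- Confirm the new shardGroupDuration.
SHOW RETENTION POLICIES ON "macdb"
```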

On Fri, Oct 14, 2016 at 3:29 AM,  wrote:

> Hi Sean,
>
> here is the graph from out NMS about memory usage
>
> https://s18.postimg.org/a6buyzna1/memory.png
>
> and I would say we have spikes, but they are not every 24h but rather
> every 30 minutes, and I guess that's because of the CQ we use for
> downsampling. I can post that CQ if it helps.
>
> We were aware of cardinality when we designed our solution, and currently we
> have 81,761 series, which I guess is quite OK for this amount of RAM.
>
> > SHOW RETENTION POLICIES ON macdb
> name          duration    shardGroupDuration  replicaN  default
> default       0           168h0m0s            1         false
> seven_days    168h0m0s    24h0m0s             1         true
> three_months  2160h0m0s   24h0m0s             1         false
>
>
> Yes, we always have successful writes, and InfluxDB returns a 204 response.
> By "the whole measurement is missing" I mean that, for example, we have one
> measurement at 10:00 pm in both InfluxDB and our file (we also write the
> data to a file for debugging), and the next measurement at 10:05 pm in both
> InfluxDB and the file, but the 10:10 pm measurement is missing from
> InfluxDB even though it is in the file and InfluxDB still returned a 204
> response.
>



-- 
Sean Beckett
Director of Support and Professional Services
InfluxDB



Re: [influxdb] InfluxDB restarts every 24 hours and some data is missing

2016-10-13 Thread Sean Beckett
> Now, we have two issues. One is that the server restarts every 24h due to
OOM; look at this:

Does the RAM use spike every 24 hours, or does it slowly grow?

One of your tags is a MAC address. That has very high cardinality. How many
series are there in your system?
http://docs.influxdata.com/influxdb/v1.0/troubleshooting/frequently-asked-questions/#why-does-series-cardinality-matter
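
As a back-of-the-envelope check, series cardinality is bounded above by the
product of the number of distinct values of each tag key in a measurement.
The counts below are illustrative assumptions (roughly 40k MAC addresses, two
status values), not figures taken from this deployment:

```python
from math import prod

# Hypothetical distinct-value counts for the two tags mentioned in the thread.
tag_value_counts = {
    "mac": 40_000,  # roughly one MAC address per polled device
    "status": 2,    # e.g. two possible status values
}

# Upper bound on series in the "cm" measurement: product of tag cardinalities.
upper_bound = prod(tag_value_counts.values())
print(upper_bound)  # 80000 -- same order as the ~81k series reported in the thread
```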

You also have 71 fields per point. Are you running any CQs to downsample
them?

What are the retention policy settings? (SHOW RETENTION POLICIES ON
)

> The other issue is that some data is missing. For example, we see that a
whole measurement is missing, while I am pretty sure it was written, because
at the same time we write to InfluxDB we also write to a file, and we don't
see any errors from InfluxDB. We write 1000 measurements in one batch.

Do the InfluxDB logs show successful writes? Is the client receiving a 204
response to the writes?

What does "see the whole measurement is missing" mean? Can you show actual
CLI queries? This could be syntax issues.
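
One thing worth ruling out is a timestamp collision: InfluxDB keeps at most
one point per measurement + tag set + timestamp, so a second write with the
same key silently replaces the first while still returning 204. A minimal
sketch of that behavior (the line-protocol samples are made up for
illustration; this is not InfluxDB code):

```python
def last_write_wins(lines):
    """Emulate InfluxDB's dedup rule: one point per (series key, timestamp)."""
    kept = {}
    for line in lines:
        # Simplified line protocol: "measurement,tags fields timestamp"
        series, fields, ts = line.split(" ")
        kept[(series, ts)] = fields  # a later duplicate overwrites the earlier one
    return kept

batch = [
    "cm,mac=a2faa63e1c00,status=1 snr_us2=31.41 1474563439",
    "cm,mac=a2faa63e1c00,status=1 snr_us2=29.80 1474563439",  # same series + timestamp
    "cm,mac=b4faa63e1d01,status=1 snr_us2=33.02 1474563439",
]
stored = last_write_wins(batch)
print(len(stored))  # 2 -- three writes accepted, but only two points remain
```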


On Thu, Oct 13, 2016 at 8:49 AM, Pavel  wrote:

> Hi guys,
>
> I have a really strange problem with our InfluxDB server. First of all, we
> are running the latest version, 1.0.2. We use InfluxDB to store performance
> statistics from around 40k devices. We do this regularly, every 5 minutes.
> The server in question has 40GB RAM and 16 CPUs. We keep data for 7 days,
> and after that we use a CQ to downsample it and store it for 3 months.
> An example of one measurement looks like this:
>
> cm,mac=a2faa63e1c00,status=1 host_if="C5/1/4/UB",sw_rev="EPC3008",
> model="EPC3008-v302r125531-101220c",fl_ins=5,fl_miss=256,fl_padj=15,
> fl_crc=0,fl_flap=37,fl_hit=19482,fl_ltime="Sep 22 12:44:08",
> status_us2="sta",cw_good_us2=157500,cw_uncorr_us2=507,cw_corr_us2=19064,
> tx_pwr_us2=45.00,snr_us2=31.41,rx_pwr_us2=29.00,status_us3="sta",
> cw_good_us3=237573,cw_uncorr_us3=2,cw_corr_us3=6909,tx_pwr_us3=45.00,
> snr_us3=34.18,rx_pwr_us3=29.00,cm_ip="172.16.11.15",mtc_mode=1,
> wideband_capable=1,prim_ds="Mo5/1/1:9",init_reason="POWER_ON",tto="6h11m",
> docsIfSigQUncorrectables.49=31,docsIfSigQSignalNoise.48=390,
> docsIfSigQCorrecteds.53=16281,docsIfSigQCorrecteds.50=16003,
> docsIfSigQUncorrectables.51=144,docsIfSigQUncorrectables.48=179,
> docsIfSigQUncorrectables.3=18,docsIfSigQSignalNoise.3=389,
> docsIfSigQSignalNoise.54=398,docsIfDownChannelPower.50=0,
> docsIfDownChannelPower.49=-6,docsIfSigQUnerroreds.51=765089373,
> docsIfSigQCorrecteds.3=17400,docsIfSigQCorrecteds.49=16433,
> docsIfSigQSignalNoise.52=399,docsIfDownChannelPower.48=-18,
> ifHCOutOctets.1=368789007,docsIfDownChannelPower.54=-9,
> docsIfSigQCorrecteds.54=16376,ifHCInOctets.1=48467216,
> docsIfSigQUnerroreds.52=765083145,docsIfSigQCorrecteds.48=17615,
> docsIfSigQSignalNoise.51=394,docsIfSigQUnerroreds.50=765097168,
> docsIfSigQCorrecteds.51=16009,docsIfSigQSignalNoise.53=399,
> docsIfDownChannelPower.53=-2,docsIfSigQUnerroreds.53=765074315,
> docsIfSigQSignalNoise.50=393,docsIfSigQUnerroreds.48=765110628,
> docsIfSigQUncorrectables.53=13,docsIfSigQUnerroreds.3=765195092,
> docsIfSigQUncorrectables.54=74,docsIfDownChannelPower.51=0,
> docsIfSigQUnerroreds.54=765068049,docsIfDownChannelPower.3=-14,
> docsIfSigQSignalNoise.49=394,docsIfDownChannelPower.52=1,
> docsIfSigQUnerroreds.49=765105625,docsIfSigQCorrecteds.52=15876,
> docsIfSigQUncorrectables.50=38,docsIfSigQUncorrectables.52=16 1474563439
>
> Now, we have two issues. One is that the server restarts every 24h due to
> OOM; look at this:
>
> Sep 29 20:34:21 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> Sep 30 20:04:15 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> Oct  3 20:04:32 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> Oct  4 20:04:35 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x200da, order=0, oom_score_adj=0
> Oct  5 20:04:45 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
> Oct  6 20:04:46 node1 kernel: influxd invoked oom-killer:
> gfp_mask=0x280da, order=0, oom_score_adj=0
>
> and so on. The other issue is that some data is missing. For example, we
> see that a whole measurement is missing, while I am pretty sure it was
> written, because at the same time we write to InfluxDB we also write to a
> file, and we don't see any errors from InfluxDB. We write 1000 measurements
> in one batch.
>
> I would really appreciate some help in resolving this issue since
> everything else works perfectly.
>
> Thank you.
>