Thanks for the explanation.
To summarize: the segment is created by picking the larger of the two values,
`segment.flush.threshold.rows` and `segment.flush.threshold.segment.size`,
or even earlier if the time threshold is reached first?

Thanks

On Wed, Oct 20, 2021, 4:35 PM Sajjad Moradi <moradi.saj...@gmail.com> wrote:

> Arpit,
>
> `segment.flush.threshold.size` and `segment.flush.threshold.rows` are the
> same parameter; the former is deprecated. If you specify threshold.rows,
> the value of threshold.size is ignored. That means your effective flush
> threshold parameters are:
> segment.flush.threshold.rows:"0"
> segment.flush.threshold.time:"6h"
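> As a side note, these settings live under streamConfigs in the table
> config. A minimal sketch, assuming the standard "realtime."-prefixed key
> names used at the stream-config level (adjust to your actual config):
>
>   "streamConfigs": {
>     ...
>     "realtime.segment.flush.threshold.rows": "0",
>     "realtime.segment.flush.threshold.time": "6h",
>     "realtime.segment.flush.threshold.segment.size": "150M"
>   }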
> On the other hand, setting threshold.rows to zero kicks off the
> auto-tuning process: initially, 100K rows are consumed. Then the size of
> the consumed segment with 100K rows is compared with the value of the
> `segment.flush.threshold.segment.size` parameter, which indicates the
> desired segment size. If the consumed segment is smaller than the desired
> size, the 100K row threshold is increased to generate a bigger segment.
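> As a simplified worked example (take it as a sketch of the idea -- the
> actual updater smooths the rows-to-size ratio over segments and handles
> edge cases differently): if the first 100K-row segment comes out at
> 100MB and the desired size is 150M, the row threshold for the next
> segment is scaled up roughly in proportion:
>
>   new threshold ~= 100,000 * (150MB / 100MB) = 150,000 rows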
> Hope that answers your question.
>
>
> On Wed, Oct 20, 2021 at 6:07 AM Arpit Jain <jain.arp...@gmail.com> wrote:
>
>> I have checked the logs and can't find any obvious errors. Both segments
>> are in "consuming" state.
>> I do see the line below in the logs, and I am not sure how it picks this
>> number, and whether it has stopped consuming because of this limit:
>> Stopping consumption due to row limit nRows=100000,
>> numRowsIndexed=100000, numRowsConsumed=100000
>>
>> Thanks
>>
>> On Wed, Oct 20, 2021, 1:39 PM Mayank Shrivastava <
>> mayanks.apa...@gmail.com> wrote:
>>
>>> Hi Arpit,
>>>
>>> 1. You can check the external view of the real-time table (via the
>>> Swagger API or the ZK browser in the console). Segments showing as ONLINE
>>> have been flushed to disk; ones showing as CONSUMING are still in memory
>>> and not yet committed to disk.
>>> 2. Can you run the debug API from Swagger to see if there are any errors
>>> on the server? (Example below.)
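>>> For example, you can hit the controller directly (assuming it runs at
>>> localhost:9000 and your table is named "myTable" -- substitute your own
>>> values):
>>>
>>>   curl "http://localhost:9000/tables/myTable/externalview?tableType=realtime"
>>>
>>> The debug API is listed in the Swagger UI as well; in recent versions it
>>> is exposed at /debug/tables/{tableName}.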
>>>
>>> Also, for faster turnaround, please join the Apache Pinot Slack
>>> community as well.
>>>
>>> Thanks
>>> Mayank
>>>
>>> > On Oct 20, 2021, at 3:10 AM, Arpit Jain <jain.arp...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I have set up a Pinot 0.8.0 cluster for real-time data ingestion from
>>> > Kafka. It is able to consume data, but I believe it consumes just
>>> > 100,000 docs and then stops.
>>> > Reading the docs, it should flush after a certain period of time or
>>> > number of rows, but I think that's not happening.
>>> > I have the following questions:
>>> > 1. How do I confirm if it's flushing to disk?
>>> > 2. Why is it consuming only 100K docs?
>>> > My settings are:
>>> > segment.flush.threshold.rows:"0"
>>> > segment.flush.threshold.size:"10000000"
>>> > segment.flush.threshold.time:"6h"
>>> > segment.flush.segment.size:"150M"
>>> >
>>> > Any inputs welcome.
>>> >
>>> > Regards,
>>> > Arpit
>>>
