Thanks for the explanation. To summarize: the segment is committed based on the larger of the two thresholds, `segment.flush.threshold.rows` and `segment.flush.threshold.segment.size`, and even earlier if the time threshold is hit first?
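To make sure I've got it right, this is the streamConfigs sketch I'm now planning to use (the Kafka topic/broker values are placeholders for my setup, the usual consumer factory and decoder entries I already have are omitted, and I've kept the threshold key names as written in this thread; I believe the current docs prefix them with "realtime."):

    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "myTopic",
      "stream.kafka.broker.list": "localhost:9092",
      "segment.flush.threshold.rows": "0",
      "segment.flush.threshold.time": "6h",
      "segment.flush.threshold.segment.size": "150M"
    }

And to confirm flushing, I'll watch the table's external view as Mayank suggested, e.g.

    curl http://localhost:9000/tables/myTable_REALTIME/externalview

(controller host and table name are placeholders for mine) and check that segments move from CONSUMING to ONLINE.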
Thanks

On Wed, Oct 20, 2021, 4:35 PM Sajjad Moradi <moradi.saj...@gmail.com> wrote:

> Arpit,
>
> `segment.flush.threshold.size` and `segment.flush.threshold.rows` are the
> same parameter; the former is deprecated. If you specify threshold.rows,
> the value for threshold.size is ignored. That means your effective flush
> threshold parameters are:
> segment.flush.threshold.rows: "0"
> segment.flush.threshold.time: "6h"
> On the other hand, setting threshold.rows to zero kicks off the
> auto-tuning process: initially 100K rows are consumed, then the memory
> size of the consumed 100K-row segment is compared with the value of the
> `segment.flush.threshold.segment.size` parameter, which indicates the
> desired segment size. If the consumed segment is smaller than the desired
> size, the 100K row limit is increased to generate a bigger segment.
> Hope that answers your question.
>
> On Wed, Oct 20, 2021 at 6:07 AM Arpit Jain <jain.arp...@gmail.com> wrote:
>
>> I have checked the logs and can't find any obvious errors. Both segments
>> are in "consuming" state.
>> I do see the line below in the logs, and I am not sure how it picks this
>> number and whether it has stopped consuming because of this limit:
>> Stopping consumption due to row limit nRows=100000,
>> numRowsIndexed=100000, numRowsConsumed=100000
>>
>> Thanks
>>
>> On Wed, Oct 20, 2021, 1:39 PM Mayank Shrivastava <
>> mayanks.apa...@gmail.com> wrote:
>>
>>> Hi Arpit,
>>>
>>> 1. You can check the external view of the real-time table (via the
>>> swagger API or the ZK browser in the console). Segments showing as ONLINE
>>> are flushed to disk; ones showing as CONSUMING are still in memory and not
>>> yet committed to disk.
>>> 2. Can you run the debug API from swagger to see if there are any errors
>>> on the server?
>>>
>>> Also, for faster turnaround, please join the Apache Pinot Slack
>>> community as well.
>>>
>>> Thanks
>>> Mayank
>>>
>>> > On Oct 20, 2021, at 3:10 AM, Arpit Jain <jain.arp...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I have set up a Pinot 0.8.0 cluster for real-time data ingestion from
>>> > Kafka. It is able to consume data, but I believe it consumes just
>>> > 100000 docs and then stops.
>>> > Reading the docs, it should flush after a certain period of time or
>>> > number of rows, but I think that's not happening.
>>> > I have two questions:
>>> > 1. How do I confirm whether it's flushing to disk?
>>> > 2. Why is it only consuming 100K docs?
>>> > My settings are:
>>> > segment.flush.threshold.rows: "0"
>>> > segment.flush.threshold.size: "10000000"
>>> > segment.flush.threshold.time: "6h"
>>> > segment.flush.segment.size: "150M"
>>> >
>>> > Any inputs welcome.
>>> >
>>> > Regards,
>>> > Arpit