Re: Property spark.sql.streaming.minBatchesToRetain
OK got it! Thanks!

On Tue, 9 Mar 2021 at 21:17, Jungtaek Lim wrote:
> That property decides how many log files to retain in the checkpoint
> (one log file is created per batch per type; types include offsets,
> commits, etc.).
>
> Unless you're struggling with a small-files problem in the checkpoint,
> you shouldn't need to tune the value. I guess that's why the
> configuration is marked as "internal", meaning only some admins need to
> know about it.
Re: Property spark.sql.streaming.minBatchesToRetain
That property decides how many log files to retain in the checkpoint (one log file is created per batch per type; types include offsets, commits, etc.).

Unless you're struggling with a small-files problem in the checkpoint, you shouldn't need to tune the value. I guess that's why the configuration is marked as "internal", meaning only some admins need to know about it.

On Wed, Mar 10, 2021 at 3:58 AM German Schiavon wrote:
> Hey Maxim,
>
> OK! I didn't see them.
>
> Is this property documented somewhere?
>
> Thanks!
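As a rough illustration of the retention described in this reply, the sketch below models "keep the log files of the newest N batches, one file per batch per log type". It is a simplified model and not Spark's implementation: `RetentionSketch`, `retainedBatches`, and `retainedFileCount` are hypothetical names, and the exact purge boundary Spark uses may differ by one batch.

```scala
// Simplified model of checkpoint log retention; NOT Spark's actual code.
object RetentionSketch {
  // Hypothetical helper: ids of the batches whose log files are kept when
  // only the newest `minBatchesToRetain` batches are retained.
  def retainedBatches(latestBatchId: Long, minBatchesToRetain: Int): Seq[Long] = {
    require(latestBatchId >= 0, "latestBatchId must be non-negative")
    require(minBatchesToRetain > 0, "minBatchesToRetain must be positive")
    val oldestKept = math.max(0L, latestBatchId - minBatchesToRetain + 1)
    oldestKept to latestBatchId
  }

  // Rough file count: one log file per retained batch per log type.
  def retainedFileCount(latestBatchId: Long,
                        minBatchesToRetain: Int,
                        logTypes: Seq[String]): Long =
    retainedBatches(latestBatchId, minBatchesToRetain).size.toLong * logTypes.size

  def main(args: Array[String]): Unit = {
    // Assumed log types, taken from the examples named in the reply.
    val logTypes = Seq("offsets", "commits")
    // With the default of 100 and latest batch id 249, batches 150..249 are kept.
    println(retainedBatches(249L, 100).head)        // 150
    println(retainedFileCount(249L, 100, logTypes)) // 200
  }
}
```

In this model, lowering the value shrinks the number of small files kept in the checkpoint directory, at the cost of fewer batches remaining recoverable.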
Re: Property spark.sql.streaming.minBatchesToRetain
Hey Maxim,

OK! I didn't see them.

Is this property documented somewhere?

Thanks!

On Tue, 9 Mar 2021 at 13:57, Maxim Gekk wrote:
> Hi German,
>
> It is used at least at:
> 1. https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L56
> 2. https://github.com/apache/spark/blob/e7e016192f882cfb430d706c2099e58e1bcc014c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L84
Re: Property spark.sql.streaming.minBatchesToRetain
Hi German,

It is used at least at:
1. https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L56
2. https://github.com/apache/spark/blob/e7e016192f882cfb430d706c2099e58e1bcc014c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L84

Maxim Gekk
Software Engineer
Databricks, Inc.

On Tue, Mar 9, 2021 at 3:27 PM German Schiavon wrote:
> Hello all,
>
> I wanted to ask whether this property is still active. I can't find it
> in the doc https://spark.apache.org/docs/latest/configuration.html or
> anywhere in the code (only in tests).
>
> If it is no longer used, should we remove it?
>
>   val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain")
>     .internal()
>     .doc("The minimum number of batches that must be retained and made recoverable.")
>     .version("2.1.1")
>     .intConf
>     .createWithDefault(100)
Property spark.sql.streaming.minBatchesToRetain
Hello all,

I wanted to ask whether this property is still active. I can't find it in the doc https://spark.apache.org/docs/latest/configuration.html or anywhere in the code (only in tests).

If it is no longer used, should we remove it?

  val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain")
    .internal()
    .doc("The minimum number of batches that must be retained and made recoverable.")
    .version("2.1.1")
    .intConf
    .createWithDefault(100)