Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-10 Thread German Schiavon
OK got it!


Thanks!

On Tue, 9 Mar 2021 at 21:17, Jungtaek Lim 
wrote:

> That property decides how many log files to retain in the checkpoint (a log
> file is created per batch per type; types are things like offsets, commits,
> etc.).
>
> Unless you're struggling with a small-files problem on the checkpoint, you
> wouldn't need to tune the value. I guess that's why the configuration is
> marked as "internal", meaning only admins need to know about such a
> configuration.
>
> On Wed, Mar 10, 2021 at 3:58 AM German Schiavon 
> wrote:
>
>> Hey Maxim,
>>
>> ok! I didn't see them.
>>
>> Is this property documented somewhere?
>>
>> Thanks!
>>
>> On Tue, 9 Mar 2021 at 13:57, Maxim Gekk 
>> wrote:
>>
>>> Hi German,
>>>
>>> It is used at least in:
>>> 1.
>>> https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L56
>>> 2.
>>> https://github.com/apache/spark/blob/e7e016192f882cfb430d706c2099e58e1bcc014c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L84
>>>
>>> Maxim Gekk
>>>
>>> Software Engineer
>>>
>>> Databricks, Inc.
>>>
>>>
>>> On Tue, Mar 9, 2021 at 3:27 PM German Schiavon 
>>> wrote:
>>>
 Hello all,

 I wanted to ask whether this property is still active. I can't find it in
 the docs (https://spark.apache.org/docs/latest/configuration.html) or
 anywhere in the code (only in tests).

 If it is no longer used, should we remove it?

 val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain")
   .internal()
   .doc("The minimum number of batches that must be retained and made recoverable.")
   .version("2.1.1")
   .intConf
   .createWithDefault(100)




Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread Jungtaek Lim
That property decides how many log files to retain in the checkpoint (a log
file is created per batch per type; types are things like offsets, commits,
etc.).

Unless you're struggling with a small-files problem on the checkpoint, you
wouldn't need to tune the value. I guess that's why the configuration is
marked as "internal", meaning only admins need to know about such a
configuration.
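
For illustration, here is a rough sketch (not from the thread) of what those
per-type log files look like on disk; the checkpoint path is a made-up example
and the snippet simply counts the files kept for each type:

import java.nio.file.{Files, Paths}

// Hypothetical checkpoint location; substitute your query's checkpointLocation.
val checkpoint = Paths.get("/tmp/my-query-checkpoint")

// Structured Streaming writes one log file per micro-batch under each
// per-type directory; minBatchesToRetain is the minimum number kept
// around when older entries are purged.
for (logType <- Seq("offsets", "commits")) {
  val dir = checkpoint.resolve(logType)
  if (Files.isDirectory(dir)) {
    println(s"$logType: ${Files.list(dir).count()} log files")
  }
}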

On Wed, Mar 10, 2021 at 3:58 AM German Schiavon 
wrote:

> Hey Maxim,
>
> ok! I didn't see them.
>
> Is this property documented somewhere?
>
> Thanks!
>
> On Tue, 9 Mar 2021 at 13:57, Maxim Gekk  wrote:
>
>> Hi German,
>>
>> It is used at least in:
>> 1.
>> https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L56
>> 2.
>> https://github.com/apache/spark/blob/e7e016192f882cfb430d706c2099e58e1bcc014c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L84
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Tue, Mar 9, 2021 at 3:27 PM German Schiavon 
>> wrote:
>>
>>> Hello all,
>>>
>>> I wanted to ask whether this property is still active. I can't find it in
>>> the docs (https://spark.apache.org/docs/latest/configuration.html) or
>>> anywhere in the code (only in tests).
>>>
>>> If it is no longer used, should we remove it?
>>>
>>> val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain")
>>>   .internal()
>>>   .doc("The minimum number of batches that must be retained and made recoverable.")
>>>   .version("2.1.1")
>>>   .intConf
>>>   .createWithDefault(100)
>>>
>>>


Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread German Schiavon
Hey Maxim,

ok! I didn't see them.

Is this property documented somewhere?

Thanks!

On Tue, 9 Mar 2021 at 13:57, Maxim Gekk  wrote:

> Hi German,
>
> It is used at least in:
> 1.
> https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L56
> 2.
> https://github.com/apache/spark/blob/e7e016192f882cfb430d706c2099e58e1bcc014c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L84
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Tue, Mar 9, 2021 at 3:27 PM German Schiavon 
> wrote:
>
>> Hello all,
>>
>> I wanted to ask whether this property is still active. I can't find it in
>> the docs (https://spark.apache.org/docs/latest/configuration.html) or
>> anywhere in the code (only in tests).
>>
>> If it is no longer used, should we remove it?
>>
>> val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain")
>>   .internal()
>>   .doc("The minimum number of batches that must be retained and made recoverable.")
>>   .version("2.1.1")
>>   .intConf
>>   .createWithDefault(100)
>>
>>


Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread Maxim Gekk
Hi German,

It is used at least in:
1.
https://github.com/apache/spark/blob/a093d6feefb0e086d19c86ae53bf92df12ccf2fa/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala#L56
2.
https://github.com/apache/spark/blob/e7e016192f882cfb430d706c2099e58e1bcc014c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L84

Maxim Gekk

Software Engineer

Databricks, Inc.


On Tue, Mar 9, 2021 at 3:27 PM German Schiavon 
wrote:

> Hello all,
>
> I wanted to ask whether this property is still active. I can't find it in
> the docs (https://spark.apache.org/docs/latest/configuration.html) or
> anywhere in the code (only in tests).
>
> If it is no longer used, should we remove it?
>
> val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain")
>   .internal()
>   .doc("The minimum number of batches that must be retained and made recoverable.")
>   .version("2.1.1")
>   .intConf
>   .createWithDefault(100)
>
>


Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread German Schiavon
Hello all,

I wanted to ask whether this property is still active. I can't find it in
the docs (https://spark.apache.org/docs/latest/configuration.html) or
anywhere in the code (only in tests).

If it is no longer used, should we remove it?

val MIN_BATCHES_TO_RETAIN = buildConf("spark.sql.streaming.minBatchesToRetain")
  .internal()
  .doc("The minimum number of batches that must be retained and made recoverable.")
  .version("2.1.1")
  .intConf
  .createWithDefault(100)
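
As a rough usage sketch (not from the thread): being a SQL conf, the entry can
be overridden like any other setting when building the session. The app name,
master, and the value 200 below are arbitrary examples, not recommendations:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("min-batches-to-retain-example") // hypothetical app name
  .master("local[*]")
  // Keep at least 200 batches of offset/commit log files recoverable
  // instead of the default 100.
  .config("spark.sql.streaming.minBatchesToRetain", "200")
  .getOrCreate()

// The effective value can be read back through the runtime conf.
println(spark.conf.get("spark.sql.streaming.minBatchesToRetain"))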