Re: Removing Kinesis in Spark 4

2024-01-20 Thread Nicholas Chammas
Oh, that’s a very interesting dashboard. I was familiar with the Matomo snippet 
but never looked up where exactly those metrics were going.

I see that the Kinesis docs do indeed have around 650 views in the past month, 
but for Kafka I see 11K and 1.3K views for the Structured Streaming and DStream 
docs, respectively. Big difference there, though maybe that's because Kinesis 
doesn’t have docs for structured streaming. Hard to tell.





These statistics also raise questions about the future of the R API, though 
that’s a topic for another thread.



Nick


> On Jan 20, 2024, at 1:05 PM, Sean Owen wrote:
> 
> I'm not aware of much usage, but that doesn't mean a lot.
> 
> FWIW, in the past month or so, the Kinesis docs page got about 700 views, 
> compared to about 1400 for Kafka
> https://analytics.apache.org/index.php?module=CoreHome&action=index&date=yesterday&period=day&idSite=40#?idSite=40&period=range&date=2023-12-15,2024-01-20&category=General_Actions&subcategory=Actions_SubmenuPageTitles
> 
> Those are "low" in general, compared to the views for streaming pages, which 
> got tens of thousands of views.
> 
> I do feel like it's unmaintained, and do feel like it might be a stretch to 
> leave it lying around until Spark 5.
> It's not exactly unused though.
> 
> I would not object to removing it unless there is some voice of support here.
> 
> On Sat, Jan 20, 2024 at 10:38 AM Nicholas Chammas wrote:
>> From the dev thread: What else could be removed in Spark 4? 
>> 
>>> On Aug 17, 2023, at 1:44 AM, Yang Jie wrote:
>>> 
>>> I would like to know how we should handle the two Kinesis-related modules 
>>> in Spark 4.0. They have a very low frequency of code updates, and because 
>>> the corresponding tests are not continuously executed in any GitHub Actions 
>>> pipeline, I think they significantly lack quality assurance. On top of 
>>> that, I am not certain if the test cases, which require AWS credentials in 
>>> these modules, get verified during each Spark version release.
>> 
>> Did we ever reach a decision about removing Kinesis in Spark 4?
>> 
>> I was cleaning up some docs related to Kinesis and came across a reference 
>> to some Java API docs that I could not find. And looking around I came 
>> across both this email thread and this thread on JIRA about potentially 
>> removing Kinesis.
>> 
>> But as far as I can tell we haven’t made a clear decision one way or the 
>> other.
>> 
>> Nick
>> 



Re: Removing Kinesis in Spark 4

2024-01-20 Thread Sean Owen
I'm not aware of much usage, but that doesn't mean a lot.

FWIW, in the past month or so, the Kinesis docs page got about 700 views,
compared to about 1400 for Kafka
https://analytics.apache.org/index.php?module=CoreHome&action=index&date=yesterday&period=day&idSite=40#?idSite=40&period=range&date=2023-12-15,2024-01-20&category=General_Actions&subcategory=Actions_SubmenuPageTitles

Those are "low" in general, compared to the views for streaming pages,
which got tens of thousands of views.

I do feel like it's unmaintained, and do feel like it might be a stretch to
leave it lying around until Spark 5.
It's not exactly unused though.

I would not object to removing it unless there is some voice of support
here.

On Sat, Jan 20, 2024 at 10:38 AM Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> From the dev thread: What else could be removed in Spark 4?
> 
>
> On Aug 17, 2023, at 1:44 AM, Yang Jie wrote:
>
> I would like to know how we should handle the two Kinesis-related modules
> in Spark 4.0. They have a very low frequency of code updates, and because
> the corresponding tests are not continuously executed in any GitHub Actions
> pipeline, I think they significantly lack quality assurance. On top of
> that, I am not certain if the test cases, which require AWS credentials in
> these modules, get verified during each Spark version release.
>
>
> Did we ever reach a decision about removing Kinesis in Spark 4?
>
> I was cleaning up some docs related to Kinesis and came across a reference
> to some Java API docs that I could not find. And looking around I came
> across both this email thread and this thread on JIRA about potentially
> removing Kinesis.
>
> But as far as I can tell we haven’t made a clear decision one way or the
> other.
>
> Nick
>
>


Removing Kinesis in Spark 4

2024-01-20 Thread Nicholas Chammas
From the dev thread: What else could be removed in Spark 4? 

> On Aug 17, 2023, at 1:44 AM, Yang Jie wrote:
> 
> I would like to know how we should handle the two Kinesis-related modules in 
> Spark 4.0. They have a very low frequency of code updates, and because the 
> corresponding tests are not continuously executed in any GitHub Actions 
> pipeline, I think they significantly lack quality assurance. On top of 
> that, I am not certain if the test cases, which require AWS credentials in 
> these modules, get verified during each Spark version release.

Did we ever reach a decision about removing Kinesis in Spark 4?

I was cleaning up some docs related to Kinesis and came across a reference to 
some Java API docs that I could not find. And looking around I came across 
both this email thread and this thread on JIRA about potentially removing 
Kinesis.

But as far as I can tell we haven’t made a clear decision one way or the other.

Nick



Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-20 Thread Pavan Kotikalapudi
Here is the link to the voting thread:
https://lists.apache.org/thread/rlwqrw6ddxdkbvkp78kpd0zgvglgbbp8

Thank you,

Pavan

On Wed, Jan 17, 2024 at 7:15 PM Pavan Kotikalapudi 
wrote:

> Thanks for the +1, I will propose voting in a new thread now.
>
> - Pavan
>
> On Wed, Jan 17, 2024 at 5:28 PM Mich Talebzadeh 
> wrote:
>
>> I think we have discussed this enough, and I consider it a useful
>> feature. I propose a vote on it.
>>
>> + 1 for me
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>> 
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 8 Aug 2023 at 01:30, Pavan Kotikalapudi
>>  wrote:
>>
>>> Hi Spark Dev,
>>>
>>> I have extended traditional dynamic resource allocation (DRA) to work
>>> for the Structured Streaming use case.
>>>
>>> Here is an initial implementation draft PR:
>>> https://github.com/apache/spark/pull/42352
>>> and the design doc:
>>> https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
>>>
>>> Please review and let me know what you think.
>>>
>>> Thank you,
>>>
>>> Pavan
>>>
>>