[jira] [Created] (SPARK-47213) Proposal about moving on from the Shepherd terminology in SPIPs to "Mentor"

2024-02-28 Thread Mich Talebzadeh (Jira)
Mich Talebzadeh created SPARK-47213:
---

 Summary: Proposal about moving on from the Shepherd terminology in 
SPIPs to "Mentor"
 Key: SPARK-47213
 URL: https://issues.apache.org/jira/browse/SPARK-47213
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.5.1
 Environment: Documentation. SPIP form submission
Reporter: Mich Talebzadeh
 Fix For: 4.0.0


As an active member, I am proposing that we replace the current terminology 
"SPIP Shepherd" with the more respectful and inclusive term "SPIP Mentor." Over 
the past few years we have tried to replace a number of older terms with more 
acceptable ones.

While some may not find "Shepherd" offensive, it can unintentionally imply 
passivity or dependence on the part of community members, which might not 
accurately reflect their expertise and contributions. Additionally, the 
shepherd-sheep dynamic might be interpreted as hierarchical, which does not 
align with the collaborative and open nature of the Spark community.

*"SPIP Mentor"* better emphasizes the collaborative nature of the process, 
focusing on supporting and guiding members while respecting their strengths and 
contributions. It also avoids any potentially offensive or hierarchical 
connotations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47212) When creating Jira please use the word "mentor" instead of "Shepherd" in SPIP

2024-02-28 Thread Mich Talebzadeh (Jira)
Mich Talebzadeh created SPARK-47212:
---

 Summary: When creating Jira please use the word "mentor" instead 
of "Shepherd" in SPIP
 Key: SPARK-47212
 URL: https://issues.apache.org/jira/browse/SPARK-47212
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.3.4
Reporter: Mich Talebzadeh


As an active member, I am proposing that we replace the current terminology 
"SPIP Shepherd" with the more respectful and inclusive term "SPIP Mentor." Over 
the past few years we have tried to replace a number of older terms with more 
acceptable ones.

While some may not find "Shepherd" offensive, it can unintentionally imply 
passivity or dependence on the part of community members, which might not 
accurately reflect their expertise and contributions. Additionally, the 
shepherd-sheep dynamic might be interpreted as hierarchical, which does not 
align with the collaborative and open nature of the Spark community.

*"SPIP Mentor"* better emphasizes the collaborative nature of the process, 
focusing on supporting and guiding members while respecting their strengths and 
contributions. It also avoids any potentially offensive or hierarchical 
connotations.






[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation

2024-02-27 Thread Mich Talebzadeh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821433#comment-17821433
 ] 

Mich Talebzadeh commented on SPARK-24815:
-

Some thoughts on this, if I may.

 

This enhancement request provides a solid foundation for improving dynamic 
allocation in Structured Streaming. Adding more specific details, outlining 
potential benefits, and addressing potential challenges can further strengthen 
the proposal and increase its chances of being implemented.

So these are my thoughts:



- Pluggable Dynamic Allocation: This suggestion reflects good design principles, 
allowing for flexibility and future improvements. We should elaborate on the 
benefits of a pluggable approach, such as customization and integration with 
external resource-management tools.

- Separate Algorithm for Structured Streaming: This is crucial for adapting 
allocation strategies to the unique nature of streaming workloads. Outlining how 
a separate algorithm might differ from its batch counterpart would also be 
useful.

- Warning for Enabled Core Dynamic Allocation: This is a valuable warning to 
prevent accidental misuse and raise awareness among users. Also consider 
suggesting the warning level (e.g. info, warning, error) and potential content 
to provide clarity.

- Briefly mention potential challenges or trade-offs associated with 
implementing these proposals. Suggesting relevant discussions, resources, or 
alternative approaches could strengthen the enhancement request; a sketch of 
the configuration in question follows below.
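For reference, a minimal sketch of the configuration in question: enabling the 
core (batch) dynamic allocation algorithm, which today also governs a 
structured streaming job. The keys below are real Spark configuration settings; 
the values are illustrative only.

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dynamicAllocationSketch")  # illustrative name
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "10")
    # Batch heuristics: request executors on task backlog, release on idle.
    .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
    .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
    .getOrCreate()
)
{code}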

 

> Structured Streaming should support dynamic allocation
> --
>
> Key: SPARK-24815
> URL: https://issues.apache.org/jira/browse/SPARK-24815
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core, Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Karthik Palaniappan
>Priority: Minor
>  Labels: pull-request-available
>
> For batch jobs, dynamic allocation is very useful for adding and removing 
> containers to match the actual workload. On multi-tenant clusters, it ensures 
> that a Spark job is taking no more resources than necessary. In cloud 
> environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured 
> streaming job, the batch dynamic allocation algorithm kicks in. It requests 
> more executors if the task backlog is a certain size, and removes executors 
> if they idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a 
> particular implementation in SparkContext.scala (this should be a separate 
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the 
> batch algorithm. Eventually, continuous processing might need its own 
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when 
> Core's dynamic allocation is enabled
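On quick thought 3 above: Spark does not emit such a warning today, but a 
user-side stand-in is straightforward; a minimal sketch, assuming the streaming 
query itself is started elsewhere:

{code:python}
import warnings

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Warn when the batch dynamic allocation algorithm would also govern
# structured streaming queries in this application.
if spark.conf.get("spark.dynamicAllocation.enabled", "false") == "true":
    warnings.warn("spark.dynamicAllocation.enabled=true: the batch dynamic "
                  "allocation algorithm will also apply to structured "
                  "streaming queries.")
{code}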






[jira] [Comment Edited] (SPARK-24815) Structured Streaming should support dynamic allocation

2024-02-26 Thread Mich Talebzadeh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820915#comment-17820915
 ] 

Mich Talebzadeh edited comment on SPARK-24815 at 2/26/24 11:58 PM:
---

Now that the ticket is reopened, let us review the submitted documents. It has 
six votes as of today. I have volunteered to mentor it until a committer comes 
forward. I hope this helps speed up the process and time to delivery.


was (Author: mich.talebza...@gmail.com):
Now that the ticket is reopened let us review the submitted documents. This has 
got 6 votes for now. I volunteered to mentor it until a committer comes forward 
to it. Hope this helps to speed up the process and time to delivery.







[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation

2024-02-26 Thread Mich Talebzadeh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820915#comment-17820915
 ] 

Mich Talebzadeh commented on SPARK-24815:
-

Now that the ticket is reopened, let us review the submitted documents. It has 
six votes for now. I have volunteered to mentor it until a committer comes 
forward. I hope this helps speed up the process and time to delivery.







[jira] [Commented] (SPARK-43929) Add date time functions to Scala and Python

2023-06-03 Thread Mich Talebzadeh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728985#comment-17728985
 ] 

Mich Talebzadeh commented on SPARK-43929:
-

It might be a good idea to make these date and time functions more exhaustive. 
Today we write our own code to do the job of these functions. Some may already 
exist or may already be mentioned in the list:


{code:java}
Name    Description
ADDDATE()    Add time values (intervals) to a date value
ADDTIME()    Add time
CONVERT_TZ()    Convert from one time zone to another
CURDATE()    Return the current date
CURRENT_DATE(), CURRENT_DATE    Synonyms for CURDATE()
CURRENT_TIME(), CURRENT_TIME    Synonyms for CURTIME()
CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP    Synonyms for NOW()
CURTIME()    Return the current time
DATE()    Extract the date part of a date or datetime expression
DATE_ADD()    Add time values (intervals) to a date value
DATE_FORMAT()    Format date as specified
DATE_SUB()    Subtract a time value (interval) from a date
DATEDIFF()    Subtract two dates
DAY()    Synonym for DAYOFMONTH()
DAYNAME()    Return the name of the weekday
DAYOFMONTH()    Return the day of the month (0-31)
DAYOFWEEK()    Return the weekday index of the argument
DAYOFYEAR()    Return the day of the year (1-366)
EXTRACT()    Extract part of a date
FROM_DAYS()    Convert a day number to a date
FROM_UNIXTIME()    Format Unix timestamp as a date
GET_FORMAT()    Return a date format string
HOUR()    Extract the hour
LAST_DAY    Return the last day of the month for the argument
LOCALTIME(), LOCALTIME    Synonym for NOW()
LOCALTIMESTAMP, LOCALTIMESTAMP()    Synonym for NOW()
MAKEDATE()    Create a date from the year and day of year
MAKETIME()    Create time from hour, minute, second
MICROSECOND()    Return the microseconds from argument
MINUTE()    Return the minute from the argument
MONTH()    Return the month from the date passed
MONTHNAME()    Return the name of the month
NOW()    Return the current date and time
PERIOD_ADD()    Add a period to a year-month
PERIOD_DIFF()    Return the number of months between periods
QUARTER()    Return the quarter from a date argument
SEC_TO_TIME()    Converts seconds to 'hh:mm:ss' format
SECOND()    Return the second (0-59)
STR_TO_DATE()    Convert a string to a date
SUBDATE()    Synonym for DATE_SUB() when invoked with three arguments
SUBTIME()    Subtract times
SYSDATE()    Return the time at which the function executes
TIME()    Extract the time portion of the expression passed
TIME_FORMAT()    Format as time
TIME_TO_SEC()    Return the argument converted to seconds
TIMEDIFF()    Subtract time
TIMESTAMP()    With a single argument, this function returns the date or 
datetime expression; with two arguments, the sum of the arguments
TIMESTAMPADD()    Add an interval to a datetime expression
TIMESTAMPDIFF()    Return the difference of two datetime expressions, using the 
units specified
TO_DAYS()    Return the date argument converted to days
TO_SECONDS()    Return the date or datetime argument converted to seconds since 
Year 0
UNIX_TIMESTAMP()    Return a Unix timestamp
UTC_DATE()    Return the current UTC date
UTC_TIME()    Return the current UTC time
UTC_TIMESTAMP()    Return the current UTC date and time
WEEK()    Return the week number
WEEKDAY()    Return the weekday index
WEEKOFYEAR()    Return the calendar week of the date (1-53)
YEAR()    Return the year
YEARWEEK()    Return the year and week{code}
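For comparison, several of these already have Spark equivalents in 
pyspark.sql.functions; a minimal sketch (the function names are real PySpark 
APIs, the data is illustrative):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dateTimeSketch").getOrCreate()

df = spark.createDataFrame([("2023-06-03",)], ["s"]).select(F.to_date("s").alias("d"))

df.select(
    F.date_add("d", 7).alias("plus_week"),          # ADDDATE()/DATE_ADD()
    F.date_format("d", "yyyy/MM/dd").alias("fmt"),  # DATE_FORMAT()
    F.dayofweek("d").alias("dow"),                  # DAYOFWEEK()
    F.last_day("d").alias("eom"),                   # LAST_DAY
    F.weekofyear("d").alias("week"),                # WEEKOFYEAR()
    F.unix_timestamp("d").alias("epoch"),           # UNIX_TIMESTAMP()
).show()
{code}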

> Add date time functions to Scala and Python
> ---
>
> Key: SPARK-43929
> URL: https://issues.apache.org/jira/browse/SPARK-43929
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * date_diff
> * date_from_unix_date
> * date_part
> * dateadd
> * datepart
> * day
> * weekday
> * convert_timezone
> * extract
> * now
> * timestamp_micros
> * timestamp_millis
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client






[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-03-06 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Attachment: sparkStructuredStreaming01.png

> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.2.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
> Attachments: sparkStructuredStreaming01.png
>
>
> Spark Structured Streaming is a very useful tool in dealing with Event Driven 
> Architecture. In an Event Driven Architecture, there is generally a main loop 
> that listens for events and then triggers a call-back function when one of 
> those events is detected. In a streaming application the application waits to 
> receive the source messages at a set interval or whenever they happen, and 
> reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully. 
> Gracefully means that the Spark application handles the last streaming message 
> completely and then terminates the application. This is different from 
> invoking interrupts such as CTRL-C.
> Of course one can terminate the process with one of the following:
>  # query.awaitTermination() # Waits for the termination of this query, with 
> stop() or with error
>  # query.awaitTermination(timeoutMs) # Returns true if this query is 
> terminated within the timeout in milliseconds.
> The first one waits until an interrupt signal is received. The second one 
> counts down the timeout and exits when the timeout in milliseconds is reached.
> The issue is that one needs to predict how long the streaming job needs to 
> run. Any interrupt at the terminal or OS level (kill process) may leave the 
> processing terminated without proper completion of the streaming process.
> I have devised a method that allows one to terminate the Spark application 
> internally after processing the last received message. Within say 2 seconds 
> of the confirmation of shutdown, the process will invoke a graceful shutdown.
> This new feature proposes a solution that handles the message currently being 
> processed gracefully, waits for it to complete, and shuts down the streaming 
> process for a given topic without loss of data or orphaned transactions.
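A minimal sketch of the two termination patterns quoted above (note that in 
PySpark the timeout argument to awaitTermination is given in seconds, not 
milliseconds; the rate-source query is purely illustrative):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("awaitTerminationSketch").getOrCreate()

# Illustrative query: a rate source streamed to the console sink.
query = (
    spark.readStream.format("rate").load()
    .writeStream.format("console")
    .start()
)

# Pattern 1: block until query.stop() is called or the query fails.
# query.awaitTermination()

# Pattern 2: block for at most 60 seconds; returns True if the query
# terminated within that window, False otherwise.
finished = query.awaitTermination(60)
if not finished:
    query.stop()
{code}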






[jira] [Commented] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-27 Thread Mich Talebzadeh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693883#comment-17693883
 ] 

Mich Talebzadeh commented on SPARK-42485:
-

Yes that is the intention. I have not had a chance to complete the 
documentation yet. 

 

[[SPIP] Shutting down spark structured streaming when the streaming process 
completed current process - Google 
Docs|https://docs.google.com/document/d/1SljobKKHiB2M7Md7raBOMM7o2EW6nglH-hEM1dtjUQg/edit#heading=h.ud7930xhlsm6]







[jira] [Commented] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-21 Thread Mich Talebzadeh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691790#comment-17691790
 ] 

Mich Talebzadeh commented on SPARK-42485:
-

Hi Boyang,

Please find my responses below.

 # Can you share any use cases you have that will benefit from this feature?
 --> Sure, I will add these to the SPIP document in due course.
 # Is there a SPIP doc written for this yet? If there is a SPIP doc written, 
please link it in the JIRA.
 --> I have created an outline but it is not there yet.  [[SPIP] Shutting down 
spark structured streaming when the streaming process completed current process 
- Google 
Docs|https://docs.google.com/document/d/1SljobKKHiB2M7Md7raBOMM7o2EW6nglH-hEM1dtjUQg/edit#heading=h.ud7930xhlsm6]
 # In regards to this statement: "I have devised a method that allows one to 
terminate the spark application internally after processing the last received 
message. Within say 2 seconds of the confirmation of shutdown, the process will 
invoke a graceful shutdown." Do you mean the query will gracefully shut down 
after the most recent micro-batch is done processing?
 --> To qualify: it shuts down gracefully once the last message has been 
processed successfully.

This is the original case that I posted on 24 April 2021 to the user group:

"""

How to shut down the topic doing work for the message being processed, wait for 
it to complete, and shut down the streaming process for a given topic.

I thought about this and looked at options. Using sensors to implement this, as 
Airflow does, would be expensive, as for example reading a file from object 
storage or from an underlying database would incur additional I/O overheads 
through continuous polling.

So the design had to be incorporated into the streaming process itself. What I 
came up with was the addition of a control topic (I call it newtopic below), 
which keeps running, triggered every 2 seconds say, and is in json format with 
the following structure:

root
 |-- newtopic_value: struct (nullable = true)
 |    |-- uuid: string (nullable = true)
 |    |-- timeissued: timestamp (nullable = true)
 |    |-- queue: string (nullable = true)
 |    |-- status: string (nullable = true)

In the above, queue refers to the business topic and status is set to 'true', 
meaning carry on processing the business stream. This control topic streaming 
can be restarted anytime, and status can be set to 'false' if we want to stop 
the streaming queue for a given business topic.
 
ac7d0b2e-dc71-4b3f-a17a-500cd9d38efe    
{"uuid":"ac7d0b2e-dc71-4b3f-a17a-500cd9d38efe", 
"timeissued":"2021-04-23T08:54:06", "queue":"md", "status":"true"}
 
64a8321c-1593-428b-ae65-89e45ddf0640    
{"uuid":"64a8321c-1593-428b-ae65-89e45ddf0640", 
"timeissued":"2021-04-23T09:49:37", "queue":"md", "status":"false"}
 
So how can I stop the business queue when the current business topic message 
has been processed? Let us say the source is sending data for a business topic 
every 30 seconds, and our control topic sends a one-liner as above every 2 
seconds.
 
In your writeStream, add the following lines to be able to identify the topic 
by name:
 
trigger(processingTime='30 seconds'). \
queryName('md'). \
 
Next, the controlling topic (called newtopic) has the following:
 
foreachBatch(sendToControl). \
trigger(processingTime='2 seconds'). \
queryName('newtopic'). \
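For clarity, a minimal sketch of how the two queries above might be wired 
together (assumptions: streamingDataFrame is the business stream, dfControl the 
control-topic stream, and sendToBigQuery a hypothetical sink handler; 
sendToControl is the method shown next):

{code:python}
businessQuery = (
    streamingDataFrame.writeStream
    .foreachBatch(sendToBigQuery)          # hypothetical business sink handler
    .trigger(processingTime='30 seconds')
    .queryName('md')
    .start()
)

controlQuery = (
    dfControl.writeStream
    .foreachBatch(sendToControl)           # the control handler defined below
    .trigger(processingTime='2 seconds')
    .queryName('newtopic')
    .start()
)

# Block until one of the queries stops, e.g. when sendToControl calls stop().
spark.streams.awaitAnyTermination()
{code}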
 
That method sendToControl does what is needed:
 
from pyspark.sql.functions import col

def sendToControl(dfnewtopic, batchId):
    if len(dfnewtopic.take(1)) > 0:
        #print(f"""newtopic batchId is {batchId}""")
        #dfnewtopic.show(10,False)
        queue = dfnewtopic.select(col("queue")).collect()[0][0]
        status = dfnewtopic.select(col("status")).collect()[0][0]

        # stop the business query 'md' when the control topic says so
        if (queue == 'md') and (status == 'false'):
            # obtain the active session via the author's helper module
            spark_session = s.spark_session(config['common']['appName'])
            active = spark_session.streams.active
            for e in active:
                #print(e)
                name = e.name
                if name == 'md':
                    print(f"""Terminating streaming process {name}""")
                    e.stop()
    else:
        print("DataFrame newtopic is empty")
 
This seems to work: I checked to ensure that in this case data was written and 
saved to the target sink (a BigQuery table). It waits until the data is written 
completely, meaning the current streaming message is processed, so there is 
some latency there.
 
This is the output:
 
Terminating streaming process md

wrote to DB  ## this is the flag  I added to ensure the current 

[jira] [Commented] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-19 Thread Mich Talebzadeh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690947#comment-17690947
 ] 

Mich Talebzadeh commented on SPARK-42485:
-

Done, thanks.







[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-19 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Affects Version/s: 3.2.2







[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-19 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Affects Version/s: (was: 3.3.2)







[jira] [Commented] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-19 Thread Mich Talebzadeh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690847#comment-17690847
 ] 

Mich Talebzadeh commented on SPARK-42485:
-

How about Target Version?







[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-19 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Target Version/s:   (was: 3.3.2)







[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Description: 
Spark Structured Streaming is a very useful tool in dealing with Event Driven 
Architecture. In an Event Driven Architecture, there is generally a main loop 
that listens for events and then triggers a call-back function when one of 
those events is detected. In a streaming application the application waits to 
receive the source messages in a set interval or whenever they happen and 
reacts accordingly.

There are occasions that you may want to stop the Spark program gracefully. 
Gracefully meaning that Spark application handles the last streaming message 
completely and terminates the application. This is different from invoking 
interrupts such as CTRL-C.

Of course one can terminate the process based on the following
 # query.awaitTermination() # Waits for the termination of this query, with 
stop() or with error

 # query.awaitTermination(timeoutMs) # Returns true if this query is terminated 
within the timeout in milliseconds.

So the first one above waits until an interrupt signal is received. The second 
one will count the timeout and will exit when timeout in milliseconds is 
reached.

The issue is that one needs to predict how long the streaming job needs to run. 
Clearly any interrupt at the terminal or OS level (kill process), may end up 
the processing terminated without a proper completion of the streaming process.

I have devised a method that allows one to terminate the spark application 
internally after processing the last received message. Within say 2 seconds of 
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to handle the topic doing 
work for the message being processed gracefully, wait for it to complete and 
shutdown the streaming process for a given topic without loss of data or 
orphaned transactions

  was:
Spark Structured Streaming AKA SSS is a very useful tool in dealing with Event 
Driven Architecture. In an Event Driven Architecture, there is generally a main 
loop that listens for events and then triggers a call-back function when one of 
those events is detected. In a streaming application the application waits to 
receive the source messages in a set interval or whenever they happen and 
reacts accordingly.

There are occasions that you may want to stop the Spark program gracefully. 
Gracefully meaning that Spark application handles the last streaming message 
completely and terminates the application. This is different from invoking 
interrupts such as CTRL-C.

Of course one can terminate the process based on the following
 # query.awaitTermination() # Waits for the termination of this query, with 
stop() or with error

 # query.awaitTermination(timeoutMs) # Returns true if this query is terminated 
within the timeout in milliseconds.

So the first one above waits until an interrupt signal is received. The second 
one will count the timeout and will exit when timeout in milliseconds is 
reached.

The issue is that one needs to predict how long the streaming job needs to run. 
Clearly any interrupt at the terminal or OS level (kill process), may end up 
the processing terminated without a proper completion of the streaming process.

I have devised a method that allows one to terminate the spark application 
internally after processing the last received message. Within say 2 seconds of 
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to handle the topic doing 
work for the message being processed gracefully, wait for it to complete and 
shutdown the streaming process for a given topic without loss of data or 
orphaned transactions



[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Description: 
Spark Structured Streaming AKA SSS is a very useful tool in dealing with Event 
Driven Architecture. In an Event Driven Architecture, there is generally a main 
loop that listens for events and then triggers a call-back function when one of 
those events is detected. In a streaming application the application waits to 
receive the source messages in a set interval or whenever they happen and 
reacts accordingly.

There are occasions that you may want to stop the Spark program gracefully. 
Gracefully meaning that Spark application handles the last streaming message 
completely and terminates the application. This is different from invoking 
interrupts such as CTRL-C.

Of course one can terminate the process based on the following
 # query.awaitTermination() # Waits for the termination of this query, with 
stop() or with error

 # query.awaitTermination(timeoutMs) # Returns true if this query is terminated 
within the timeout in milliseconds.

So the first one above waits until an interrupt signal is received. The second 
one will count the timeout and will exit when timeout in milliseconds is 
reached.

The issue is that one needs to predict how long the streaming job needs to run. 
Clearly any interrupt at the terminal or OS level (kill process), may end up 
the processing terminated without a proper completion of the streaming process.

I have devised a method that allows one to terminate the spark application 
internally after processing the last received message. Within say 2 seconds of 
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to handle the topic doing 
work for the message being processed gracefully, wait for it to complete and 
shutdown the streaming process for a given topic without loss of data or 
orphaned transactions

  was:
Spark Structured Streaming AKA SSS is a very useful tool in dealing with Event 
Driven Architecture. In an Event Driven Architecture, there is generally a main 
loop that listens for events and then triggers a call-back function when one of 
those events is detected. In a streaming application the application waits to 
receive the source messages in a set interval or whenever they happen and 
reacts accordingly.

There are occasions that you may want to stop the Spark program gracefully. 
Gracefully meaning that Spark application handles the last streaming message 
completely and terminates the application. This is different from invoking 
interrupts such as CTRL-C.

Of course one can terminate the process based on the following
 # query.awaitTermination() # Waits for the termination of this query, with 
stop() or with error

 # query.awaitTermination(timeoutMs) # Returns true if this query is terminated 
within the timeout in milliseconds.

So the first one above waits until an interrupt signal is received. The second 
one will count the timeout and will exit when timeout in milliseconds is 
reached.

The issue is that one needs to predict how long the streaming job needs to run. 
Clearly any interrupt at the terminal or OS level (kill process), may end up 
the processing terminated without a proper completion of the streaming process.

 

I have devised a method that allows one to terminate the spark application 
internally after processing the last received message. Within say 2 seconds of 
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to handle the topic doing 
work for the message being processed gracefully, wait for it to complete and 
shutdown the streaming process for a given topic without loss of data or 
orphaned transactions



[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Description: 
Spark Structured Streaming AKA SSS is a very useful tool in dealing with Event 
Driven Architecture. In an Event Driven Architecture, there is generally a main 
loop that listens for events and then triggers a call-back function when one of 
those events is detected. In a streaming application the application waits to 
receive the source messages in a set interval or whenever they happen and 
reacts accordingly.

There are occasions that you may want to stop the Spark program gracefully. 
Gracefully meaning that Spark application handles the last streaming message 
completely and terminates the application. This is different from invoking 
interrupts such as CTRL-C.

Of course one can terminate the process based on the following
 # query.awaitTermination() # Waits for the termination of this query, with 
stop() or with error

 # query.awaitTermination(timeoutMs) # Returns true if this query is terminated 
within the timeout in milliseconds.

So the first one above waits until an interrupt signal is received. The second 
one will count the timeout and will exit when timeout in milliseconds is 
reached.

The issue is that one needs to predict how long the streaming job needs to run. 
Clearly any interrupt at the terminal or OS level (kill process), may end up 
the processing terminated without a proper completion of the streaming process.

 

I have devised a method that allows one to terminate the spark application 
internally after processing the last received message. Within say 2 seconds of 
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to handle the topic doing 
work for the message being processed gracefully, wait for it to complete and 
shutdown the streaming process for a given topic without loss of data or 
orphaned transactions

  was:How to shutdown the topic doing work for the message being 
processed, wait for it to complete and shutdown the streaming process for a 
given topic.








[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Shepherd: Dongjoon Hyun
Target Version/s: 3.3.2
  Labels: SPIP  (was: )

> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
>
> How to shutdown the topic doing work for the message being 
> processed, wait for it to complete and shutdown the streaming process for a 
> given topic.






[jira] [Created] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)
Mich Talebzadeh created SPARK-42485:
---

 Summary: SPIP: Shutting down spark structured streaming when the 
streaming process completed current process
 Key: SPARK-42485
 URL: https://issues.apache.org/jira/browse/SPARK-42485
 Project: Spark
  Issue Type: New Feature
  Components: Structured Streaming
Affects Versions: 3.3.2
Reporter: Mich Talebzadeh


How to shutdown the topic doing work for the message being 
processed, wait for it to complete and shutdown the streaming process for a 
given topic.






[jira] [Created] (SPARK-17047) Spark 2 cannot create ORC table when CLUSTERED.

2016-08-13 Thread Dr Mich Talebzadeh (JIRA)
Dr Mich Talebzadeh created SPARK-17047:
--

 Summary: Spark 2 cannot create ORC table when CLUSTERED.
 Key: SPARK-17047
 URL: https://issues.apache.org/jira/browse/SPARK-17047
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Dr Mich Talebzadeh


This no longer works with the CLUSTERED BY clause in Spark 2:




CREATE TABLE test.dummy2
 (
 ID INT
   , CLUSTERED INT
   , SCATTERED INT
   , RANDOMISED INT
   , RANDOM_STRING VARCHAR(50)
   , SMALL_VC VARCHAR(10)
   , PADDING  VARCHAR(10)
)
CLUSTERED BY (ID) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="SNAPPY",
"orc.create.index"="true",
"orc.bloom.filter.columns"="ID",
"orc.bloom.filter.fpp"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="1" )

scala> HiveContext.sql(sqltext)
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: CREATE TABLE ... CLUSTERED BY(line 2, pos 0)
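For context, later Spark releases expose bucketing through the DataFrameWriter 
API instead of DDL; a minimal sketch, assuming a hypothetical source table and 
PySpark 2.3+ (where bucketBy became available in Python):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = spark.table("test.dummy_source")   # hypothetical source table

# Write a bucketed ORC table; bucketBy requires saveAsTable.
(df.write
   .format("orc")
   .option("compression", "snappy")
   .bucketBy(256, "ID")
   .sortBy("ID")
   .saveAsTable("test.dummy2"))
{code}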


