[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup

2024-04-24 Thread Jira


 [ https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell updated SPARK-47819:
--
Fix Version/s: 3.5.2

> Use asynchronous callback for execution cleanup
> ---
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0, 3.5.1
>Reporter: Xi Lyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> Expired sessions are regularly checked and cleaned up by a maintenance 
> thread. However, currently, this process is synchronous. Therefore, in rare 
> cases, interrupting the execution thread of a query in a session can take 
> hours, causing the entire maintenance process to stall, resulting in a large 
> amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup, 
> avoiding synchronous joins of execution threads, and preventing the 
> maintenance thread from stalling in the above scenarios. To be more specific, 
> instead of calling {{runner.join()}} in ExecuteHolder.close(), we set a 
> post-cleanup function as the callback through 
> {{runner.processOnCompletion}}, which will be called asynchronously once 
> the execution runner is completed or interrupted. In this way, the 
> maintenance thread won't get blocked on {{join}}ing an execution thread.
>  
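The description above contrasts joining the execution thread inside close() with registering a post-cleanup callback that runs once the runner completes. The following is a minimal Python sketch of that pattern under stated assumptions. It is not Spark Connect's Scala implementation. `ExecutionRunner`, `process_on_completion`, `Holder`, `slow_work` and `fast_work` are hypothetical stand-ins that are loosely modeled on the `runner.join()` / `runner.processOnCompletion` names in the issue. The sketch also assumes that interruption is cooperative, so a runner may keep running long after it is interrupted.

```python
import threading
import time
from typing import Callable, List


class ExecutionRunner:
    """Hypothetical stand-in for an execution-thread runner (not Spark's API)."""

    def __init__(self, work: Callable[[threading.Event], None]) -> None:
        self._interrupted = threading.Event()
        self._lock = threading.Lock()
        self._done = False
        self._callbacks: List[Callable[[], None]] = []
        self._thread = threading.Thread(target=self._run, args=(work,), daemon=True)

    def start(self) -> None:
        self._thread.start()

    def interrupt(self) -> None:
        # Interruption is only a request; the work decides when to stop,
        # so the thread may keep running for a long time afterwards.
        self._interrupted.set()

    def process_on_completion(self, callback: Callable[[], None]) -> None:
        # Register a callback for when the work finishes. If it has already
        # finished, run the callback immediately on the caller's thread.
        with self._lock:
            if not self._done:
                self._callbacks.append(callback)
                return
        callback()

    def _run(self, work: Callable[[threading.Event], None]) -> None:
        try:
            work(self._interrupted)
        finally:
            # Mark completion and take the pending callbacks under the lock,
            # then run them outside it, on the runner's own thread.
            with self._lock:
                self._done = True
                callbacks, self._callbacks = self._callbacks, []
            for callback in callbacks:
                callback()


class Holder:
    """Hypothetical holder whose close() is called by a maintenance thread."""

    def __init__(self, name: str, runner: ExecutionRunner, released: List[str]) -> None:
        self.name = name
        self.runner = runner
        self.released = released

    def close(self) -> None:
        self.runner.interrupt()
        # Synchronous variant (blocks until the runner exits):
        #     self.runner._thread.join()
        # Asynchronous variant: defer cleanup until the runner completes.
        self.runner.process_on_completion(self._post_cleanup)

    def _post_cleanup(self) -> None:
        # Stand-in for releasing the execution's resources.
        self.released.append(self.name)


def slow_work(interrupted: threading.Event) -> None:
    time.sleep(1.0)  # ignores the interruption request for a while


def fast_work(interrupted: threading.Event) -> None:
    interrupted.wait(timeout=5.0)  # returns as soon as interruption is requested


released: List[str] = []
holders = [
    Holder("slow", ExecutionRunner(slow_work), released),
    Holder("fast", ExecutionRunner(fast_work), released),
]
for holder in holders:
    holder.runner.start()

start = time.monotonic()
for holder in holders:  # the maintenance loop
    holder.close()
elapsed = time.monotonic() - start
print(f"maintenance loop returned after {elapsed:.3f}s; released so far: {released}")
```

In this sketch, close() returns as soon as the callback is registered, and whichever thread finishes the work runs the cleanup. Registration and completion both take the same lock. A callback registered after the runner has already finished is therefore run immediately rather than being lost.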



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup

2024-04-15 Thread Jira


 [ https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell updated SPARK-47819:
--
Affects Version/s: 3.5.1
   3.5.0

> Use asynchronous callback for execution cleanup
> ---
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0, 3.5.1
>Reporter: Xi Lyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Expired sessions are regularly checked and cleaned up by a maintenance 
> thread. However, currently, this process is synchronous. Therefore, in rare 
> cases, interrupting the execution thread of a query in a session can take 
> hours, causing the entire maintenance process to stall, resulting in a large 
> amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup, 
> avoiding synchronous joins of execution threads, and preventing the 
> maintenance thread from stalling in the above scenarios. To be more specific, 
> instead of calling {{runner.join()}} in ExecuteHolder.close(), we set a 
> post-cleanup function as the callback through 
> {{runner.processOnCompletion}}, which will be called asynchronously once 
> the execution runner is completed or interrupted. In this way, the 
> maintenance thread won't get blocked on {{join}}ing an execution thread.
>  






[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup

2024-04-12 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47819:
---
Labels: pull-request-available  (was: )

> Use asynchronous callback for execution cleanup
> ---
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xi Lyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Expired sessions are regularly checked and cleaned up by a maintenance 
> thread. However, currently, this process is synchronous. Therefore, in rare 
> cases, interrupting the execution thread of a query in a session can take 
> hours, causing the entire maintenance process to stall, resulting in a large 
> amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup, 
> avoiding synchronous joins of execution threads, and preventing the 
> maintenance thread from stalling in the above scenarios. To be more specific, 
> instead of calling {{runner.join()}} in ExecuteHolder.close(), we set a 
> post-cleanup function as the callback through 
> {{runner.processOnCompletion}}, which will be called asynchronously once 
> the execution runner is completed or interrupted. In this way, the 
> maintenance thread won't get blocked on {{join}}ing an execution thread.
>  






[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup

2024-04-11 Thread Xi Lyu (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xi Lyu updated SPARK-47819:
---
Description: 
Expired sessions are regularly checked and cleaned up by a maintenance thread. 
However, currently, this process is synchronous. Therefore, in rare cases, 
interrupting the execution thread of a query in a session can take hours, 
causing the entire maintenance process to stall, resulting in a large amount of 
memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, 
avoiding synchronous joins of execution threads, and preventing the maintenance 
thread from stalling in the above scenarios. To be more specific, instead of 
calling {{runner.join()}} in ExecuteHolder.close(), we set a post-cleanup 
function as the callback through {{runner.processOnCompletion}}, which will 
be called asynchronously once the execution runner is completed or interrupted. 
In this way, the maintenance thread won't get blocked on {{join}}ing an 
execution thread.

 

  was:
Expired sessions are regularly checked and cleaned up by a maintenance thread. 
However, currently, this process is synchronous. Therefore, in occasional 
cases, interrupting the execution thread of a query in a session can take 
hours, causing the entire maintenance process to stall, resulting in a large 
amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, 
avoiding synchronous joins of execution threads, and preventing the maintenance 
thread from stalling in the above occasional scenarios. To be more specific, 
instead of calling {{runner.join()}} in ExecuteHolder.close(), we set a 
post-cleanup function as the callback through 
{{runner.processOnCompletion}}, which will be called asynchronously once 
the execution runner is completed or interrupted. In this way, the maintenance 
thread won't get blocked on {{join}}ing an execution thread.

 


> Use asynchronous callback for execution cleanup
> ---
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xi Lyu
>Priority: Major
> Fix For: 4.0.0
>
>
> Expired sessions are regularly checked and cleaned up by a maintenance 
> thread. However, currently, this process is synchronous. Therefore, in rare 
> cases, interrupting the execution thread of a query in a session can take 
> hours, causing the entire maintenance process to stall, resulting in a large 
> amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup, 
> avoiding synchronous joins of execution threads, and preventing the 
> maintenance thread from stalling in the above scenarios. To be more specific, 
> instead of calling {{runner.join()}} in ExecuteHolder.close(), we set a 
> post-cleanup function as the callback through 
> {{runner.processOnCompletion}}, which will be called asynchronously once 
> the execution runner is completed or interrupted. In this way, the 
> maintenance thread won't get blocked on {{join}}ing an execution thread.
>  






[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup

2024-04-11 Thread Xi Lyu (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xi Lyu updated SPARK-47819:
---
Description: 
Expired sessions are regularly checked and cleaned up by a maintenance thread. 
However, currently, this process is synchronous. Therefore, in occasional 
cases, interrupting the execution thread of a query in a session can take 
hours, causing the entire maintenance process to stall, resulting in a large 
amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, 
avoiding synchronous joins of execution threads, and preventing the maintenance 
thread from stalling in the above occasional scenarios. To be more specific, 
instead of calling {{runner.join()}} in ExecuteHolder.close(), we set a 
post-cleanup function as the callback through 
{{runner.processOnCompletion}}, which will be called asynchronously once 
the execution runner is completed or interrupted. In this way, the maintenance 
thread won't get blocked on {{join}}ing an execution thread.

 

  was:
Expired sessions are regularly checked and cleaned up by a maintenance thread. 
However, currently, this process is synchronous. Therefore, in occasional 
cases, interrupting the execution thread of a query in a session can take 
hours, causing the entire maintenance process to stall, resulting in a large 
amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, 
avoiding synchronous joins of execution threads, and preventing the maintenance 
thread from stalling in the above occasional scenarios.

 


> Use asynchronous callback for execution cleanup
> ---
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xi Lyu
>Priority: Major
> Fix For: 4.0.0
>
>
> Expired sessions are regularly checked and cleaned up by a maintenance 
> thread. However, currently, this process is synchronous. Therefore, in 
> occasional cases, interrupting the execution thread of a query in a session 
> can take hours, causing the entire maintenance process to stall, resulting in 
> a large amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup, 
> avoiding synchronous joins of execution threads, and preventing the 
> maintenance thread from stalling in the above occasional scenarios. To be 
> more specific, instead of calling {{runner.join()}} in 
> ExecuteHolder.close(), we set a post-cleanup function as the callback 
> through {{runner.processOnCompletion}}, which will be called 
> asynchronously once the execution runner is completed or interrupted. In this 
> way, the maintenance thread won't get blocked on {{join}}ing an execution 
> thread.
>  






[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup

2024-04-11 Thread Xi Lyu (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xi Lyu updated SPARK-47819:
---
Description: 
Expired sessions are regularly checked and cleaned up by a maintenance thread. 
However, currently, this process is synchronous. Therefore, in occasional 
cases, interrupting the execution thread of a query in a session can take 
hours, causing the entire maintenance process to stall, resulting in a large 
amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, 
avoiding synchronous joins of execution threads, and preventing the maintenance 
thread from stalling in the above occasional scenarios.

 

  was:
Expired sessions are regularly checked and cleaned up by a maintenance thread. 
However, currently, this process is synchronous. Therefore, in occasional 
cases, interrupting the execution thread of a query in a session can take 
hours, causing the entire maintenance process to stall, resulting in a large 
amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, 
avoiding synchronous joins of execution threads, and preventing the maintenance 
thread from stalling in the above occasional scenarios.

A minimal example of the problem:
{code:python}
import pyspark.sql.functions as F

# `spark` is assumed to be an existing Spark Connect SparkSession.
df = spark.range(10)
for i in range(200):
    # The df.columns call causes a new Analyze request in every iteration.
    if str(i) not in df.columns:
        df = df.withColumn(str(i), F.col("id") + i)
df.show()
{code}
 


> Use asynchronous callback for execution cleanup
> ---
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xi Lyu
>Priority: Major
> Fix For: 4.0.0
>
>
> Expired sessions are regularly checked and cleaned up by a maintenance 
> thread. However, currently, this process is synchronous. Therefore, in 
> occasional cases, interrupting the execution thread of a query in a session 
> can take hours, causing the entire maintenance process to stall, resulting in 
> a large amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup, 
> avoiding synchronous joins of execution threads, and preventing the 
> maintenance thread from stalling in the above occasional scenarios.
>  






[jira] [Updated] (SPARK-47819) Use asynchronous callback for execution cleanup

2024-04-11 Thread Xi Lyu (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-47819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xi Lyu updated SPARK-47819:
---
Description: 
Expired sessions are regularly checked and cleaned up by a maintenance thread. 
However, currently, this process is synchronous. Therefore, in occasional 
cases, interrupting the execution thread of a query in a session can take 
hours, causing the entire maintenance process to stall, resulting in a large 
amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, 
avoiding synchronous joins of execution threads, and preventing the maintenance 
thread from stalling in the above occasional scenarios.

A minimal example of the problem:
{code:python}
import pyspark.sql.functions as F

# `spark` is assumed to be an existing Spark Connect SparkSession.
df = spark.range(10)
for i in range(200):
    # The df.columns call causes a new Analyze request in every iteration.
    if str(i) not in df.columns:
        df = df.withColumn(str(i), F.col("id") + i)
df.show()
{code}
 

  was:
Expired sessions are regularly checked and cleaned up by a maintenance thread. 
However, currently, this process is synchronous. Therefore, in occasional 
cases, interrupting the execution thread of a query in a session can take 
hours, causing the entire maintenance process to stall, resulting in a large 
amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, 
avoiding synchronous joins of execution threads, and preventing the maintenance 
thread from stalling in the above occasional scenarios.


> Use asynchronous callback for execution cleanup
> ---
>
> Key: SPARK-47819
> URL: https://issues.apache.org/jira/browse/SPARK-47819
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Xi Lyu
>Priority: Major
> Fix For: 4.0.0
>
>
> Expired sessions are regularly checked and cleaned up by a maintenance 
> thread. However, currently, this process is synchronous. Therefore, in 
> occasional cases, interrupting the execution thread of a query in a session 
> can take hours, causing the entire maintenance process to stall, resulting in 
> a large amount of memory not being cleared.
> We address this by introducing asynchronous callbacks for execution cleanup, 
> avoiding synchronous joins of execution threads, and preventing the 
> maintenance thread from stalling in the above occasional scenarios.
> A minimal example of the problem:
> {code:python}
> import pyspark.sql.functions as F
> 
> # `spark` is assumed to be an existing Spark Connect SparkSession.
> df = spark.range(10)
> for i in range(200):
>     # The df.columns call causes a new Analyze request in every iteration.
>     if str(i) not in df.columns:
>         df = df.withColumn(str(i), F.col("id") + i)
> df.show()
> {code}
>  


