Re: [PR] [SPARK-46620][PS][CONNECT] Introduce a basic fallback mechanism for frame methods [spark]

via GitHub Wed, 24 Jan 2024 20:17:25 -0800


zhengruifeng commented on code in PR #44869:
URL: https://github.com/apache/spark/pull/44869#discussion_r1465822177



##########
python/pyspark/pandas/frame.py:
##########
@@ -13446,10 +13447,46 @@ def _index_normalized_frame(level: int, psser_or_psdf: DataFrameOrSeries) -> "Da
 
         return psdf
 
+    def _fall_back_frame(self, method: str) -> Callable:
+        def _internal_fall_back_function_(*inputs: Any, **kwargs: Any):
+            log_advice(
+                f"`{method}` is executed in fallback mode. It loads partial data into the driver's memory"
+                f" to infer the schema, and loads all data into one executor's memory to compute. "
+                "It should only be used if the pandas DataFrame is expected to be small."
+            )
+            input_df = self.copy()
+
+            uid = str(uuid.uuid4()).replace("-", "")
+            tmp_agg_column_name = f"__tmp_aggregate_col_for_frame_{method}_{uid}__"
+            tmp_idx_column_name = f"__tmp_index_col_for_frame_{method}_{uid}__"

Review Comment:
   So I just do the following conversion:
   
   - input psdf index -> input psdf normal column -> reset pdf index -> fallback computation
   - fallback computation -> output pdf index -> output pdf normal column -> reset psdf index
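
   The two conversion chains above can be sketched roughly as follows. This is a pandas-only illustration, not the code in the PR: `fall_back_round_trip` is a hypothetical helper, the Spark boundary is left out entirely, and it assumes a single-level index and a method that takes no arguments and returns a DataFrame. Only the temporary column naming pattern is taken from the diff.

```python
import uuid

import pandas as pd


def fall_back_round_trip(pdf: pd.DataFrame, method: str) -> pd.DataFrame:
    """Hypothetical helper: carry the index as a normal column around a pandas call."""
    uid = uuid.uuid4().hex
    tmp_idx = f"__tmp_index_col_for_frame_{method}_{uid}__"
    original_name = pdf.index.name

    # Input side: index -> normal column (the shape that would cross the Spark boundary).
    flat_in = pdf.rename_axis(tmp_idx).reset_index()

    # Inside the fallback: turn the column back into the pandas index, then compute.
    restored = flat_in.set_index(tmp_idx).rename_axis(original_name)
    result = getattr(restored, method)()

    # Output side: output index -> normal column -> index again on the way back.
    output_name = result.index.name
    flat_out = result.rename_axis(tmp_idx).reset_index()
    return flat_out.set_index(tmp_idx).rename_axis(output_name)


pdf = pd.DataFrame({"a": [1, 2, 3]}, index=pd.Index([10, 20, 30], name="k"))
out = fall_back_round_trip(pdf, "cumsum")
```

   The unique suffix keeps the temporary column from colliding with a user column, which is presumably why the diff builds the name from a UUID.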



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


