Structured streaming help on releasing memory

2022-05-09 Thread Xavi Gervilla
Hi Team, I'm developing a streaming project that obtains tweets in real time and, after applying some ML models and transformations, generates a treemap of the data. The problem I'm facing is that after processing each batch and after the timestamp is completed, memory isn't released. I've be

Spark on K8s - repeating annoying exception

2022-05-09 Thread Shay Elbaz
Hi all, I apologize for reposting this from Stack Overflow, but it got very little attention and no comments. I'm using a Spark 3.2.1 image that was built from the official distribution via `docker-image-tool.sh`, on a Kubernetes 1.18 cluster. Everything works fine, except for this error message on

Re: How do I read parquet with python object

2022-05-09 Thread Sean Owen
That's a Parquet library error. It might be this: https://issues.apache.org/jira/browse/PARQUET-1633 , which is fixed in recent versions of Parquet. You didn't say which versions of the libraries you are using, but try the latest Spark.

How do I read parquet with python object

2022-05-09 Thread ben
# python:
import pandas as pd
a = pd.DataFrame([[1, [2.3, 1.2]]], columns=['a', 'b'])
a.to_parquet('a.parquet')

# pyspark:
d2 = spark.read.parquet('a.parquet')

will return error:
An error was encountered:
An error occurred while calling o277.showString.
: org.apache.spark.SparkException: Job