Haejoon Lee created SPARK-47543:
-----------------------------------

             Summary: Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation.
                 Key: SPARK-47543
                 URL: https://issues.apache.org/jira/browse/SPARK-47543
             Project: Spark
          Issue Type: Bug
          Components: Connect, PySpark
    Affects Versions: 4.0.0
            Reporter: Haejoon Lee

Currently, PyArrow infers a dictionary field in a Pandas DataFrame as StructType instead of MapType, so Spark cannot handle the schema properly:

{code:java}
>>> import pandas as pd
>>> import pyarrow as pa
>>> pdf = pd.DataFrame({"str_col": ['second'], "dict_col": [{'first': 0.7, 'second': 0.3}]})
>>> pa.Schema.from_pandas(pdf)
str_col: string
dict_col: struct<first: double, second: double>
  child 0, first: double
  child 1, second: double
{code}

We cannot handle this case because we rely on PyArrow for schema creation.
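
To make the difference between the two inferences concrete, here is a minimal pure-Python sketch. It is not PySpark's or PyArrow's actual inference code: the helper names (`common_type_name`, `infer_as_struct`, `infer_as_map`) and the `TYPE_NAMES` table are hypothetical and exist only for illustration. Struct-style inference fixes one named field per dictionary key, whereas map-style inference only needs a single key type and a single value type, so rows with different keys still share one schema.

```python
from typing import Any, Dict, List

# Hypothetical lookup from Python types to Spark-style type names (illustration only).
TYPE_NAMES = {str: "string", float: "double", int: "bigint", bool: "boolean"}


def common_type_name(values: List[Any]) -> str:
    """Return the single type name shared by all values, or raise if they differ."""
    kinds = {type(v) for v in values}
    if len(kinds) != 1:
        names = sorted(k.__name__ for k in kinds)
        raise TypeError(f"expected exactly one value type, got {names}")
    return TYPE_NAMES[kinds.pop()]


def infer_as_struct(rows: List[Dict[Any, Any]]) -> str:
    """Struct-style inference: one named field per key of the first row."""
    fields = ", ".join(
        f"{key}: {common_type_name([value])}" for key, value in rows[0].items()
    )
    return f"struct<{fields}>"


def infer_as_map(rows: List[Dict[Any, Any]]) -> str:
    """Map-style inference: one key type and one value type across all rows."""
    keys = [key for row in rows for key in row]
    values = [value for row in rows for value in row.values()]
    return f"map<{common_type_name(keys)}, {common_type_name(values)}>"


# The second row uses a key that the first row does not have.
rows = [{"first": 0.7, "second": 0.3}, {"third": 1.0}]
print(infer_as_struct(rows[:1]))  # struct<first: double, second: double>
print(infer_as_map(rows))         # map<string, double>
```

Under these assumptions, the struct result depends on which keys happen to appear in the row being inspected, while the map result stays the same as new keys appear. That is the property that makes `MapType` the more natural target for a column of Python dictionaries.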