amaliujia commented on code in PR #40310:
URL: https://github.com/apache/spark/pull/40310#discussion_r1127370577

##########
python/pyspark/sql/connect/session.py:
##########
@@ -235,6 +235,9 @@ def createDataFrame(
     # If no schema supplied by user then get the names of columns only
     if schema is None:
         _cols = [str(x) if not isinstance(x, str) else x for x in data.columns]
+    elif isinstance(schema, (list, tuple)) and _num_cols < len(data.columns):
+        _cols = _cols + [f"_{i + 1}" for i in range(_num_cols, len(data.columns))]

Review Comment:
   In fact, we could probably go a bit further: we need to make sure that no user-provided column name is the same as an auto-generated one. That said, the probability of a collision is small, so this may not be a big concern.
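
   For illustration, the collision check suggested here could look roughly like the sketch below. `pad_column_names` is a hypothetical helper, not part of PySpark; it reuses the `_{i + 1}` naming from the diff and, as an assumption, raises a `ValueError` on a collision rather than renaming silently.

```python
from typing import List, Sequence


def pad_column_names(user_cols: Sequence[str], total_cols: int) -> List[str]:
    """Hypothetical helper: pad user-supplied names with "_N" placeholders.

    Follows the f"_{i + 1}" naming from the diff, but raises if a generated
    name is already taken by a user-supplied one (an assumed policy).
    """
    cols = list(user_cols)
    taken = set(cols)
    for i in range(len(cols), total_cols):
        generated = f"_{i + 1}"
        if generated in taken:
            raise ValueError(
                f"Auto-generated column name {generated!r} collides with a "
                "user-provided column name"
            )
        cols.append(generated)
        taken.add(generated)
    return cols


print(pad_column_names(["a", "b"], 4))  # ['a', 'b', '_3', '_4']
```

   With this sketch, `pad_column_names(["_3", "b"], 3)` would raise, since the generated name for the third column is `_3`. Whether to raise or to pick a different placeholder is a design choice left open here.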