Herman van Hövell created SPARK-42690: -----------------------------------------
Summary: Implement CSV/JSON parsing functions
Key: SPARK-42690
URL: https://issues.apache.org/jira/browse/SPARK-42690
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.4.0
Reporter: Herman van Hövell

Implement the following two methods in `DataFrameReader`:

{code:scala}
/**
 * Loads a `Dataset[String]` storing JSON objects (<a href="http://jsonlines.org/">JSON Lines
 * text format or newline-delimited JSON</a>) and returns the result as a `DataFrame`.
 *
 * Unless the schema is specified using the `schema` function, this function goes through the
 * input once to determine the input schema.
 *
 * @param jsonDataset input Dataset with one JSON object per record
 * @since 3.4.0
 */
def json(jsonDataset: Dataset[String]): DataFrame

/**
 * Loads a `Dataset[String]` storing CSV rows and returns the result as a `DataFrame`.
 *
 * If the schema is not specified using the `schema` function and the `inferSchema` option is
 * enabled, this function goes through the input once to determine the input schema.
 *
 * If the schema is not specified using the `schema` function and the `inferSchema` option is
 * disabled, it treats all columns as string types and reads only the first line to determine
 * the names and the number of fields.
 *
 * If the `enforceSchema` option is set to `false`, only the CSV header in the first line is
 * checked to conform to the specified or inferred schema.
 *
 * @note if the `header` option is set to `true` when calling this API, all lines identical to
 * the header are removed.
 *
 * @param csvDataset input Dataset with one CSV row per record
 * @since 3.4.0
 */
def csv(csvDataset: Dataset[String]): DataFrame
{code}

This requires a new relation message. We cannot use `Project`, because the output schema is not known up front: it depends on the input data (for example, on schema inference or on the CSV header).

{code:java}
message Parse {
  // (Required) Input relation to Parse. The input is expected to have a single text column.
  Relation input = 1;

  // (Required) The expected format of the text.
  ParseFormat format = 2;

  enum ParseFormat {
    PARSE_FORMAT_UNSPECIFIED = 0;
    PARSE_FORMAT_CSV = 1;
    PARSE_FORMAT_JSON = 2;
  }
}
{code}
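To make the schema argument concrete, here is a minimal Scala sketch under stated assumptions. `ParseSketch`, the `Parse` case class, `ParseFormat`, `csvSchema`, `csvRows` and `planParse` are hypothetical names that only mirror the message above; they are not the generated Spark Connect classes. Only the CSV path with `inferSchema` disabled (as described in the Scaladoc) is modeled, the `header` option is a plain `Boolean` parameter, and lines are split naively on commas without handling quotes. It shows that the column names and the number of columns are derived from the data itself.

```scala
// Hypothetical, simplified model of the proposed Parse relation. These are NOT the
// generated Spark Connect protobuf classes; the names only mirror the message above.
object ParseSketch {
  sealed trait ParseFormat
  object ParseFormat {
    case object Unspecified extends ParseFormat
    case object Csv extends ParseFormat
    case object Json extends ParseFormat
  }

  // `input` stands in for the single text column of the input relation.
  final case class Parse(input: Seq[String], format: ParseFormat)

  // Simplified schema: (column name, type name) pairs.
  type Schema = Seq[(String, String)]

  // Naive split on commas; real CSV parsing also handles quotes and escapes.
  private def fields(line: String): Seq[String] = line.split(",", -1).toSeq

  // Models the `inferSchema = false` path: every column is a string, and only the
  // first line is read to find the column names and the number of fields.
  def csvSchema(lines: Seq[String], header: Boolean): Schema =
    lines.headOption match {
      case None => Seq.empty
      case Some(first) =>
        val fs = fields(first)
        val names: Seq[String] =
          if (header) fs
          else fs.indices.map(i => s"_c$i") // Spark's default names without a header
        names.map(name => name -> "string")
    }

  // With `header = true`, every line identical to the header line is dropped.
  def csvRows(lines: Seq[String], header: Boolean): Seq[Seq[String]] = {
    val kept =
      if (header) lines.filterNot(line => lines.headOption.contains(line))
      else lines
    kept.map(fields)
  }

  // The output schema is computed from the data, so it cannot be written down
  // as a fixed projection before the input has been read.
  def planParse(p: Parse, header: Boolean): (Schema, Seq[Seq[String]]) =
    p.format match {
      case ParseFormat.Csv => (csvSchema(p.input, header), csvRows(p.input, header))
      case ParseFormat.Json =>
        throw new UnsupportedOperationException("JSON is not modeled in this sketch")
      case ParseFormat.Unspecified =>
        throw new IllegalArgumentException("Parse.format must be CSV or JSON")
    }
}
```

For example, the input `Seq("id,name", "1,alice", "id,name", "2,bob")` yields columns `id` and `name` with two rows when `header = true`, but columns `_c0` and `_c1` with all four rows when `header = false`: the same text produces differently shaped outputs depending on the data and the options.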