hejsgpuom62c created SPARK-27050: ------------------------------------ Summary: Bean Encoder serializes data in a wrong order if input schema is not ordered Key: SPARK-27050 URL: https://issues.apache.org/jira/browse/SPARK-27050 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: hejsgpuom62c
Steps to reproduce. Define schema like this {code:java} StructType valid = StructType.fromDDL( "broker_name string, order integer, server_name string, " + "storages array<struct<timestamp: timestamp, storage: double>>" );{code} {code:java} package com.example; import java.io.Serializable; import lombok.Data; import lombok.AllArgsConstructor; import lombok.NoArgsConstructor; @Data @NoArgsConstructor @AllArgsConstructor public class Entity implements Serializable { private String broker_name; private String server_name; private Integer order; private Storage[] storages; }{code} {code:java} package com.example; import java.io.Serializable; import lombok.Data; import lombok.AllArgsConstructor; import lombok.NoArgsConstructor; @Data @NoArgsConstructor @AllArgsConstructor public class Storage implements Serializable { private java.sql.Timestamp timestamp; private Double storage; }{code} Create a JSON file with the following content: {code:java} [ { "broker_name": "A1", "server_name": "S1", "order": 1, "storages": [ { "timestamp": "2018-10-29 23:11:44.000", "storage": 12.5 } ] } ]{code} Process data as {code:java} Dataset<Entity> ds = spark.read().option("multiline", "true").schema(valid).json("/path/to/file") .as(Encoders.bean(Entity.class)); ds .groupByKey((MapFunction<Entity, String>) o -> o.getBroker_name(), Encoders.STRING()) .reduceGroups((ReduceFunction<Entity>)(e1, e2) -> e1) .map((MapFunction<Tuple2<String, Entity>, Entity>) tuple -> tuple._2, Encoders.bean(Entity.class)) .show(10, false);{code} The result will be: {code:java} +-----------+-----+-----------+--------------------------------------------------------+ |broker_name|order|server_name|storages | +-----------+-----+-----------+--------------------------------------------------------+ |A1 |1 |S1 |[[7.612815958429577E-309, 148474-03-19 22:14:3232.5248]]| +-----------+-----+-----------+--------------------------------------------------------+ {code} Source https://stackoverflow.com/q/54987724 -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org