> meaning that double and integer
I meant to write: "meaning that double and bigint ..." :)
On Tue, Nov 15, 2022 at 8:54 AM Andrew Otto <o...@wikimedia.org> wrote:

> > Also thanks for showing me your pattern with the SchemaConversions and stuff. Feels pretty clean and worked like a charm :)
> Glad to hear it, that is very cool!
>
> > converts number to double always. I wonder, did you make this up?
> Yes, we chose the mapping. We chose to do number -> double and integer -> bigint because both of those are wider than their float/int counterparts, meaning that double and integer will work in more cases. Of course, this is not an optimal usage of bits, but at least things won't break.
>
> > all kinds of fields like double, float, big decimal… they all get mapped to number by my converter
> It is possible to make some non-JSONSchema convention in the JSONSchema to map to more specific types. This is done for example with format: date-time in our code, to map from an ISO-8601 string to a timestamp. I just did a quick google to find some example of someone else already doing this and found this doc from IBM <https://www.ibm.com/docs/en/cics-ts/5.3?topic=mapping-json-schema-c-c> saying they use JSONSchema's format to specify a float, like
>
>     type: number
>     format: float
>
> This seems like a pretty good idea to me, and we should probably do this at WMF too! However, it would be a custom convention, and not in the JSONSchema spec itself, so when you convert back to a JSONSchema, you'd have to codify this convention to do so (and nothing outside of your code would really respect it).
>
> On Tue, Nov 15, 2022 at 4:23 AM Theodor Wübker <theo.wueb...@inside-m2m.de> wrote:
>
>> Yes, you are right. Schemas are not so nice in JSON. When implementing and testing my converter from DataType to JsonSchema I noticed that your converter from JsonSchema to DataType converts number to double always. I wonder, did you make this up?
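[Editor's note: an illustrative sketch, not WMF or Flink code, of how the `format`-based convention Andrew describes above could drive a JSONSchema-to-type mapping. All names are hypothetical; the `int32` narrowing hint is an assumption, added by analogy with `float`. Python is used for brevity instead of Java.]

```python
# Hypothetical sketch: map one JSONSchema property to a SQL-style type name.
# Defaults widen (number -> DOUBLE, integer -> BIGINT) so values never get
# truncated; a non-standard "format" hint narrows the type.
def json_schema_to_sql_type(prop):
    t = prop.get("type")
    fmt = prop.get("format")
    if t == "number":
        return "FLOAT" if fmt == "float" else "DOUBLE"
    if t == "integer":
        # "int32" as a narrowing hint is an assumption, mirroring "float".
        return "INT" if fmt == "int32" else "BIGINT"
    if t == "string":
        # The format: date-time convention mentioned above.
        return "TIMESTAMP" if fmt == "date-time" else "STRING"
    if t == "boolean":
        return "BOOLEAN"
    raise ValueError(f"unmapped JSONSchema type: {t!r}")
```

With this, `{"type": "number", "format": "float"}` narrows to FLOAT, while a bare `{"type": "number"}` still falls back to the wide DOUBLE default.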
>> Because the table that specifies the mapping <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/> only does it for DataType -> JsonSchema.
>>
>> It's generally unfortunate that JSON Schema only offers so little possibility to specify type information… now when I have a Flink DataType with all kinds of fields like double, float, big decimal… they all get mapped to number by my converter - in return, when I use yours, they are all mapped to a Flink DataType double again. So I lose a lot of precision.
>>
>> I guess for my application it would in general be better to use Avro or Protobuf, since they retain a lot more type information when you convert them back and forth…
>>
>> Also thanks for showing me your pattern with the SchemaConversions and stuff. Feels pretty clean and worked like a charm :)
>>
>> -Theo
>>
>> On 10. Nov 2022, at 15:02, Andrew Otto <o...@wikimedia.org> wrote:
>>
>> > I find it interesting that the Mapping from DataType to AvroSchema does exist in Flink (see AvroSchemaConverter), but for all the other formats there is no such Mapping,
>> Yah, but I guess for JSON, there isn't a clear 'schema' to be had. There of course is JSONSchema, but it isn't a real java-y type system; it's just more JSON for which there exist validators.
>>
>> On Thu, Nov 10, 2022 at 2:12 AM Theodor Wübker <theo.wueb...@inside-m2m.de> wrote:
>>
>>> Great, I will have a closer look at what you sent. Your idea seems very good, it would be a very clean solution to be able to plug in different SchemaConversions that a (Row) DataType can be mapped to. I will probably try to implement it like this. I find it interesting that the Mapping from DataType to AvroSchema does exist in Flink (see AvroSchemaConverter), but for all the other formats there is no such Mapping.
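[Editor's note: the precision loss Theo describes above can be shown with a tiny sketch. The type names are simplified stand-ins, not real Flink classes: several source types collapse onto JSONSchema `number`, so the reverse mapping can only ever pick one of them back.]

```python
# Simplified stand-in mappings illustrating the lossy round trip:
# DataType -> JSONSchema collapses float/double/decimal to "number",
# and JSONSchema -> DataType must then pick a single wide type back.
TO_JSON_SCHEMA = {"FLOAT": "number", "DOUBLE": "number", "DECIMAL": "number",
                  "INT": "integer", "BIGINT": "integer"}
FROM_JSON_SCHEMA = {"number": "DOUBLE", "integer": "BIGINT"}

def round_trip(flink_type):
    return FROM_JSON_SCHEMA[TO_JSON_SCHEMA[flink_type]]
```

Here `round_trip("FLOAT")` and `round_trip("DECIMAL")` both come back as "DOUBLE": the float/decimal distinction is gone after one conversion cycle, which is exactly the loss being discussed.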
>>> Maybe this would be something that would interest more people, so when I am finished perhaps I can suggest putting the solution into the flink-json and flink-protobuf packages.
>>>
>>> -Theo
>>>
>>> On 9. Nov 2022, at 21:24, Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>> Interesting, yeah I think you'll have to implement code to recurse through the (Row) DataType and somehow auto-generate the JSONSchema you want.
>>>
>>> We abstracted the conversions from JSONSchema to other type systems in this JsonSchemaConverter <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/types/JsonSchemaConverter.java>. There's nothing special going on here; I've seen versions of this schema conversion code over and over again in different frameworks. This one just allows us to plug in a SchemaConversions <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/types/SchemaConversions.java> implementation to provide the mappings to the output type system (like the Flink DataType mappings <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json/DataTypeSchemaConversions.java> I linked to before), rather than hardcoding the output types.
>>>
>>> If I were trying to do what you are doing (in our codebase)... I'd create a Flink DataTypeConverter<T> that iterated through a (Row) DataType, and a SchemaConversions<JsonNode> implementation that mapped to the JsonNode that represented the JSONSchema. (If not using Jackson... then you could use another Java JSON object than JsonNode).
>>> You could also make a SchemaConversions<ProtobufSchema> (with whatever Protobuf class to use... I'm not familiar with Protobuf) and then use the same DataTypeConverter to convert to ProtobufSchema. AND THEN... I'd wonder if the input schema recursion code itself could be abstracted too, so that it would work for either JsonSchema OR DataType OR whatever, but anyway that is probably too crazy and too much for what you are doing... but it would be cool! :p
>>>
>>> On Wed, Nov 9, 2022 at 9:52 AM Theodor Wübker <theo.wueb...@inside-m2m.de> wrote:
>>>
>>>> I want to register the result-schema in a schema registry, as I am pushing the result-data to a Kafka topic. The result-schema is not known at compile-time, so I need to find a way to compute it at runtime from the resulting Flink schema.
>>>>
>>>> -Theo
>>>>
>>>> (resent - again sorry, I forgot to add the others in the cc)
>>>>
>>>> On 9. Nov 2022, at 14:59, Andrew Otto <o...@wikimedia.org> wrote:
>>>>
>>>> > I want to convert the schema of a Flink table to both Protobuf *schema* and JSON *schema*
>>>> Oh, you want to convert from Flink Schema TO JSONSchema? Interesting. That would indeed be something that is not usually done. Just curious, why do you want to do this?
>>>>
>>>> On Wed, Nov 9, 2022 at 8:46 AM Andrew Otto <o...@wikimedia.org> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> I see you are talking about JSONSchema, not just JSON itself.
>>>>>
>>>>> We're trying to do a similar thing at Wikimedia and have developed some tooling around this.
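[Editor's note: a sketch of the pattern Andrew describes above, in Python rather than Java for brevity. One generic walker recurses over a nested row type, and a pluggable conversions object, mirroring the SchemaConversions idea, decides what each node maps to. All names here are hypothetical, not the actual wikimedia-event-utilities API.]

```python
# Hypothetical sketch of the SchemaConversions pattern: the walker is
# generic; the conversions object owns the target type system.
class JsonSchemaConversions:
    """Target type system: JSONSchema, built as plain dicts."""
    ATOMIC = {"DOUBLE": {"type": "number"},
              "BIGINT": {"type": "integer"},
              "STRING": {"type": "string"}}

    def atomic(self, type_name):
        return dict(self.ATOMIC[type_name])

    def row(self, fields):
        return {"type": "object",
                "properties": {name: schema for name, schema in fields}}

def convert_row(row_type, conversions):
    """Recurse a row type given as {field: type-name-or-nested-row-dict}."""
    fields = []
    for name, t in row_type.items():
        if isinstance(t, dict):  # nested row: recurse with the same walker
            fields.append((name, convert_row(t, conversions)))
        else:                    # atomic leaf: delegate to the conversions
            fields.append((name, conversions.atomic(t)))
    return conversions.row(fields)
```

The point of the abstraction is that a hypothetical ProtobufSchemaConversions with the same `atomic`/`row` methods could be swapped in and `convert_row` would be reused unchanged.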
>>>>> JsonSchemaFlinkConverter <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json/JsonSchemaFlinkConverter.java> has some logic to convert from JSONSchema Jackson ObjectNodes to Flink Table DataType or Table SchemaBuilder, or Flink DataStream TypeInformation[Row]. Some of the conversions from JSONSchema to Flink type are opinionated. You can see the mappings here <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json/DataTypeSchemaConversions.java>.
>>>>>
>>>>> On Wed, Nov 9, 2022 at 2:33 AM Theodor Wübker <theo.wueb...@inside-m2m.de> wrote:
>>>>>
>>>>>> Thanks for your reply Yaroslav! The way I do it with Avro seems similar to what you pointed out:
>>>>>>
>>>>>>     ResolvedSchema resultSchema = resultTable.getResolvedSchema();
>>>>>>     DataType type = resultSchema.toSinkRowDataType();
>>>>>>     org.apache.avro.Schema converted = AvroSchemaConverter.convertToSchema(type.getLogicalType());
>>>>>>
>>>>>> I mentioned the ResolvedSchema because it is my starting point after the SQL operation. It seemed to me that I can not retrieve something that contains more schema information from the table, so I got myself this. About your other answers: It seems the classes you mentioned can be used to serialize actual data? However, this is not quite what I want to do. Essentially I want to convert the schema of a Flink table to both Protobuf *schema* and JSON *schema* (for Avro, as you can see, I have it already). It seems odd that this is not easily possible, because converting from a JSON schema to a schema of Flink is possible using the JsonRowSchemaConverter.
>>>>>> However, the other way is not implemented, it seems. This is how I got a Table Schema (that I can use in a table descriptor) from a JSON schema:
>>>>>>
>>>>>>     TypeInformation<Row> type = JsonRowSchemaConverter.convert(json);
>>>>>>     DataType row = TableSchema.fromTypeInfo(type).toPhysicalRowDataType();
>>>>>>     Schema schema = Schema.newBuilder().fromRowDataType(row).build();
>>>>>>
>>>>>> Sidenote: I use deprecated methods here, so if there is a better approach please let me know! But it shows that in Flink it's easily possible to create a Schema for a TableDescriptor from a JSON Schema - the other way is just not so trivial, it seems. And for Protobuf, so far I don't have any solutions, not even creating a Flink Schema from a Protobuf Schema - not to mention the other way around.
>>>>>>
>>>>>> -Theo
>>>>>>
>>>>>> (resent because I accidentally only responded to you, not the mailing list - sorry)