> meaning that double and integer
I meant to write: "meaning that double and bigint ... "
:)

On Tue, Nov 15, 2022 at 8:54 AM Andrew Otto <o...@wikimedia.org> wrote:

> > Also thanks for showing me your pattern with the SchemaConversions and
> stuff. Feels pretty clean and worked like a charm :)
> Glad to hear it, that is very cool!
>
> > always converts number to double. I wonder, did you make this up?
> Yes, we chose that mapping.  We chose to do number -> double and
> integer -> bigint because both of those are wider than their float/int
> counterparts, meaning that double and integer will work in more cases.  Of
> course, this is not an optimal usage of bits, but at least things won't
> break.
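>
> Roughly, the widening idea looks like this (a simplified sketch for
> illustration only, method name made up, not our actual converter code):
>
>   import org.apache.flink.table.api.DataTypes;
>   import org.apache.flink.table.types.DataType;
>
>   // Simplified sketch: map JSONSchema primitive type names to the
>   // wider Flink DataType, so values always fit.
>   static DataType jsonSchemaTypeToDataType(String jsonSchemaType) {
>       switch (jsonSchemaType) {
>           case "number":  return DataTypes.DOUBLE(); // wider than FLOAT
>           case "integer": return DataTypes.BIGINT(); // wider than INT
>           case "string":  return DataTypes.STRING();
>           case "boolean": return DataTypes.BOOLEAN();
>           default: throw new IllegalArgumentException(jsonSchemaType);
>       }
>   }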
>
> > all kinds of fields like double, float, big decimal… they all get
> mapped to number by my converter
> It is possible to adopt a convention in the JSONSchema, outside of the
> spec, to map to more specific types.  This is done for example with format:
> date-time in our code, to map from an ISO-8601 string to a timestamp.  I
> just did a quick google to find examples of someone else already doing
> this and found this doc from IBM
> <https://www.ibm.com/docs/en/cics-ts/5.3?topic=mapping-json-schema-c-c> saying
> they use JSONSchema's format to specify a float, like
>
>   type: number
>   format: float
>
> This seems like a pretty good idea to me, and we should probably do this
> at WMF too!  However, it would be a custom convention, and not in the
> JSONSchema spec itself, so when you convert back to a JSONSchema, you'd
> have to codify this convention to do so (and nothing outside of your code
> would really respect it).
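>
> If we adopted that convention, the converter could branch on format when it
> sees a number.  Hypothetical sketch (none of this is implemented anywhere
> yet):
>
>   import com.fasterxml.jackson.databind.JsonNode;
>   import org.apache.flink.table.api.DataTypes;
>   import org.apache.flink.table.types.DataType;
>
>   // Hypothetical: honor a custom "format" hint when mapping a JSONSchema
>   // number. "format: float" would be our own convention, not part of
>   // the JSONSchema spec.
>   static DataType numberToDataType(JsonNode numberSchema) {
>       JsonNode format = numberSchema.get("format");
>       if (format != null && "float".equals(format.asText())) {
>           return DataTypes.FLOAT();
>       }
>       return DataTypes.DOUBLE(); // default widening
>   }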
>
>
>
>
>
>
> On Tue, Nov 15, 2022 at 4:23 AM Theodor Wübker <theo.wueb...@inside-m2m.de>
> wrote:
>
>> Yes, you are right. Schemas are not so nice in JSON. When implementing
>> and testing my converter from DataType to JsonSchema, I noticed that your
>> converter from JsonSchema to DataType always converts number to double. I
>> wonder, did you make this up? The table that specifies the mapping
>> <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/>
>> only does it for DataType -> JsonSchema.
>>
>> It's generally unfortunate that JSON Schema offers so little
>> possibility to specify type information… now when I have a Flink DataType
>> with all kinds of fields like double, float, big decimal… they all get
>> mapped to number by my converter, and in turn, when I use yours, they are
>> all mapped to the Flink DataType double again. So I lose a lot of precision.
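>>
>> For example, after one round trip with these mappings, the original types
>> are gone:
>>
>>   FLOAT          -> number -> DOUBLE
>>   DOUBLE         -> number -> DOUBLE
>>   DECIMAL(10, 2) -> number -> DOUBLE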
>>
>> I guess for my application it would in general be better to use Avro or
>> Protobuf, since they retain a lot more type information when you convert
>> them back and forth…
>> Also thanks for showing me your pattern with the SchemaConversions and
>> stuff. Feels pretty clean and worked like a charm :)
>>
>> -Theo
>>
>>
>> On 10. Nov 2022, at 15:02, Andrew Otto <o...@wikimedia.org> wrote:
>>
>> >  I find it interesting that the Mapping from DataType to AvroSchema
>> does exist in Flink (see AvroSchemaConverter), but for all the other
>> formats there is no such Mapping,
>> Yah, but I guess for JSON, there isn't a clear 'schema' to be had.  There
>> of course is JSONSchema, but it isn't a real java-y type system; it's just
>> more JSON for which there exist validators.
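>>
>> For example, this complete JSONSchema is itself just a JSON document:
>>
>>   {
>>     "type": "object",
>>     "properties": {
>>       "temperature": { "type": "number" }
>>     }
>>   }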
>>
>>
>>
>> On Thu, Nov 10, 2022 at 2:12 AM Theodor Wübker <
>> theo.wueb...@inside-m2m.de> wrote:
>>
>>> Great, I will have a closer look at what you sent. Your idea seems very
>>> good; it would be a very clean solution to be able to plug in different
>>> SchemaConversions that a (Row) DataType can be mapped to. I will probably
>>> try to implement it like this. I find it interesting that the Mapping from
>>> DataType to AvroSchema does exist in Flink (see AvroSchemaConverter), but
>>> for all the other formats there is no such Mapping. Maybe this would be
>>> something that would interest more people, so when I am finished, perhaps
>>> I can suggest putting the solution into the flink-json and flink-protobuf
>>> packages.
>>>
>>> -Theo
>>>
>>> On 9. Nov 2022, at 21:24, Andrew Otto <o...@wikimedia.org> wrote:
>>>
>>> Interesting, yeah I think you'll have to implement code to recurse
>>> through the (Row) DataType and somehow auto-generate the JSONSchema you
>>> want.
>>>
>>> We abstracted the conversions from JSONSchema to other type systems in
>>> this JsonSchemaConverter
>>> <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/types/JsonSchemaConverter.java>.
>>> There's nothing special going on here, I've seen versions of this schema
>>> conversion code over and over again in different frameworks. This one just
>>> allows us to plug in a SchemaConversions
>>> <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/types/SchemaConversions.java>
>>>  implementation
>>> to provide the mappings to the output type system (like the Flink DataType
>>> mappings
>>> <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json/DataTypeSchemaConversions.java>
>>>  I
>>> linked to before), rather than hardcoding the output types.
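>>>
>>> The plug-in interface is roughly shaped like this (heavily simplified
>>> here; see the linked source for the real method signatures):
>>>
>>>   import java.util.Map;
>>>
>>>   // Heavily simplified sketch of the SchemaConversions idea: one method
>>>   // per input schema concept, returning whatever the target type system
>>>   // uses to represent it.
>>>   interface SchemaConversions<T> {
>>>       T typeString();
>>>       T typeBoolean();
>>>       T typeInteger();
>>>       T typeDecimal();
>>>       T typeArray(T elementType);
>>>       T typeRow(Map<String, T> fieldTypes);
>>>   }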
>>>
>>> If I were trying to do what you are doing (in our codebase)...I'd create
>>> a Flink DataTypeConverter<T> that iterated through a (Row) DataType and a
>>> SchemaConversions<JsonNode> implementation that mapped to the JsonNode that
>>> represented the JSONSchema.  (If not using Jackson...then you could use
>>> a Java JSON object other than JsonNode.)
>>> You could also make a SchemaConversions<ProtobufSchema> (with whatever
>>> Protobuf class makes sense...I'm not familiar with Protobuf) and then use
>>> the same DataTypeConverter to convert to ProtobufSchema.   AND THEN...I'd
>>> wonder if the input schema recursion code itself could be abstracted too,
>>> so that it would work for either JSONSchema OR DataType OR whatever, but
>>> anyway that is probably too crazy and too much for what you are doing...but
>>> it would be cool! :p
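>>>
>>> Untested, but the recursion I have in mind is something like this (only a
>>> few types covered, and without the SchemaConversions abstraction, just to
>>> show the shape of it):
>>>
>>>   import com.fasterxml.jackson.databind.ObjectMapper;
>>>   import com.fasterxml.jackson.databind.node.ObjectNode;
>>>   import org.apache.flink.table.types.logical.*;
>>>
>>>   static final ObjectMapper MAPPER = new ObjectMapper();
>>>
>>>   // Walk a Flink LogicalType and emit the corresponding JSONSchema as
>>>   // Jackson nodes. You'd call it with e.g.
>>>   // resultSchema.toSinkRowDataType().getLogicalType().
>>>   static ObjectNode toJsonSchema(LogicalType type) {
>>>       ObjectNode node = MAPPER.createObjectNode();
>>>       if (type instanceof RowType) {
>>>           node.put("type", "object");
>>>           ObjectNode props = node.putObject("properties");
>>>           for (RowType.RowField field : ((RowType) type).getFields()) {
>>>               props.set(field.getName(), toJsonSchema(field.getType()));
>>>           }
>>>       } else if (type instanceof DoubleType || type instanceof FloatType) {
>>>           node.put("type", "number");
>>>       } else if (type instanceof BigIntType || type instanceof IntType) {
>>>           node.put("type", "integer");
>>>       } else if (type instanceof VarCharType) {
>>>           node.put("type", "string");
>>>       } else if (type instanceof BooleanType) {
>>>           node.put("type", "boolean");
>>>       } else {
>>>           throw new UnsupportedOperationException(type.toString());
>>>       }
>>>       return node;
>>>   }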
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Nov 9, 2022 at 9:52 AM Theodor Wübker <
>>> theo.wueb...@inside-m2m.de> wrote:
>>>
>>>> I want to register the result-schema in a schema registry, as I am
>>>> pushing the result-data to a Kafka topic. The result-schema is not known at
>>>> compile-time, so I need to find a way to compute it at runtime from the
>>>> resulting Flink Schema.
>>>>
>>>> -Theo
>>>>
>>>> (resent - again sorry, I forgot to add the others in the cc)
>>>>
>>>> On 9. Nov 2022, at 14:59, Andrew Otto <o...@wikimedia.org> wrote:
>>>>
>>>> >  I want to convert the schema of a Flink table to both Protobuf
>>>> *schema* and JSON *schema*
>>>> Oh, you want to convert from Flink Schema TO JSONSchema?  Interesting.
>>>> That would indeed be something that is not usually done.  Just curious, why
>>>> do you want to do this?
>>>>
>>>> On Wed, Nov 9, 2022 at 8:46 AM Andrew Otto <o...@wikimedia.org> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> I see you are talking about JSONSchema, not just JSON itself.
>>>>>
>>>>> We're trying to do a similar thing at Wikimedia and have developed
>>>>> some tooling around this.
>>>>>
>>>>> JsonSchemaFlinkConverter
>>>>> <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json/JsonSchemaFlinkConverter.java>
>>>>> has some logic to convert from JSONSchema Jackson ObjectNodes to Flink
>>>>> Table DataType or Table SchemaBuilder, or Flink DataStream
>>>>> TypeInformation<Row>.  Some of the conversions from JSONSchema to Flink
>>>>> type are opinionated.  You can see the mappings here
>>>>> <https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json/DataTypeSchemaConversions.java>
>>>>> .
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 9, 2022 at 2:33 AM Theodor Wübker <
>>>>> theo.wueb...@inside-m2m.de> wrote:
>>>>>
>>>>>> Thanks for your reply Yaroslav! The way I do it with Avro seems
>>>>>> similar to what you pointed out:
>>>>>>
>>>>>> ResolvedSchema resultSchema = resultTable.getResolvedSchema();
>>>>>> DataType type = resultSchema.toSinkRowDataType();
>>>>>> org.apache.avro.Schema converted =
>>>>>>     AvroSchemaConverter.convertToSchema(type.getLogicalType());
>>>>>>
>>>>>> I mentioned the ResolvedSchema because it is my starting point after
>>>>>> the SQL operation. It seemed to me that I cannot retrieve anything that
>>>>>> contains more schema information from the table, so I got myself this.
>>>>>> About your other answers: it seems the classes you mentioned can be used
>>>>>> to serialize actual data? However, this is not quite what I want to do.
>>>>>> Essentially I want to convert the schema of a Flink table to both
>>>>>> Protobuf *schema* and JSON *schema* (for Avro, as you can see, I have
>>>>>> it already). It seems odd that this is not easily possible, because
>>>>>> converting from a JSON schema to a Schema of Flink is possible using the
>>>>>> JsonRowSchemaConverter. However, the other way is not implemented, it
>>>>>> seems. This is how I got a Table Schema (that I can use in a table
>>>>>> descriptor) from a JSON schema:
>>>>>>
>>>>>> TypeInformation<Row> type = JsonRowSchemaConverter.convert(json);
>>>>>> DataType row = TableSchema.fromTypeInfo(type).toPhysicalRowDataType();
>>>>>> Schema schema = Schema.newBuilder().fromRowDataType(row).build();
>>>>>>
>>>>>> Sidenote: I use deprecated methods here, so if there is a better
>>>>>> approach, please let me know! But it shows that in Flink it's easily
>>>>>> possible to create a Schema for a TableDescriptor from a JSON Schema -
>>>>>> the other way is just not so trivial, it seems. And for Protobuf, so far
>>>>>> I don’t have any solutions, not even creating a Flink Schema from a
>>>>>> Protobuf Schema - not to mention the other way around.
>>>>>>
>>>>>> -Theo
>>>>>>
>>>>>> (resent because I accidentally only responded to you, not the Mailing
>>>>>> list - sorry)
>>>>>>
>>>>>>
>>>>
>>>
>>
