Of course! I think some BeamSQL folks should be involved as well, as this
directly affects SQL work. Anton especially has expressed interest in Row
and schemas.
Reuven
On Mon, Mar 5, 2018 at 4:30 AM Jean-Baptiste Onofré wrote:
Cool,
can I work with you on this (sharing a branch for instance) ?
Thanks !
Regards
JB
On 03/05/2018 01:01 PM, Reuven Lax wrote:
Yes, I do have a PoC in progress. The Beam Row class was being refactored,
so I paused to wait for that to finish.
On Sun, Mar 4, 2018 at 8:24 PM Jean-Baptiste Onofré wrote:
Hi Reuven,
I'm reviving this discussion as I think it would be a great addition.
We had some discussion on the fly, but I think that now, as a base for discussion,
it would be great to have a feature branch where we can start a sketch/impl and
discuss.
@Reuven, did you start a PoC with what you proposed:
-
On Mon, Feb 5, 2018 at 9:06 PM, Kenneth Knowles wrote:
> Joining late, but very interested. Commented on the doc. Since there's a
> forked discussion between doc and thread, I want to say this on the thread:
>
> 1. I have used JSON schema in production for describing the
I would add a use case: a single serialization mechanism across a pipeline.
JSON can handle generic records (JsonObject) as well as POJO
serialization, and both are compatible. Compared to the Avro built-in mechanism,
it is not intrusive in the models, which is a key feature of an API. It also
Joining late, but very interested. Commented on the doc. Since there's a
forked discussion between doc and thread, I want to say this on the thread:
1. I have used JSON schema in production for describing the structure of
analytics events and it is OK but not great. If you are sure your data is
None: JSON-P (the spec, so no specific impl is required) as the record API and a
custom light wrapper for the schema, like
https://github.com/Talend/component-runtime/blob/master/component-form/component-form-model/src/main/java/org/talend/sdk/component/form/model/jsonschema/JsonSchema.java
(note this code
Which json library are you thinking of? At least in Java, there's always
been a problem of no good standard Json library.
On Mon, Feb 5, 2018 at 12:03 PM, Romain Manni-Bucau
wrote:
On Feb 5, 2018 at 19:54, "Reuven Lax" wrote:
Multiplying by 1.0 doesn't really solve the right problems. The number type
used by JavaScript (and by extension, the standard for JSON) only has 53
bits of precision. I've seen many, many bugs caused by this - the
input data may easily contain numbers too large for 53 bits.
In addition,
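To illustrate the 53-bit point above, here is a minimal Java sketch (not from the thread; the class and method names are made up for illustration). It assumes a JSON number is read as an IEEE 754 double, as JavaScript does, and checks whether a 64-bit integer survives that conversion exactly.

```java
import java.math.BigDecimal;

public class JsonNumberPrecision {
    // 2^53: up to this magnitude every integer is exactly representable as a double.
    static final long TWO_POW_53 = 1L << 53;

    // Returns true if the long has exactly the same value once converted to a double.
    // BigDecimal(double) is exact, so the comparison does not hide any rounding.
    static boolean fitsInDouble(long value) {
        return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) == 0;
    }

    public static void main(String[] args) {
        long tooBig = TWO_POW_53 + 1; // 9007199254740993 needs 54 bits of precision

        System.out.println(fitsInDouble(TWO_POW_53)); // true
        System.out.println(fitsInDouble(tooBig));     // false

        // The value a double-based JSON parser would hand back: rounded to 2^53.
        System.out.println((long) (double) tooBig);   // 9007199254740992
    }
}
```

Identifiers such as 64-bit database keys or nanosecond timestamps can fall in the range where this rounding occurs, which is why a schema that distinguishes integer widths matters here.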
I'm off tonight, but can we try to do it next week (tomorrow)? If not, please
reply to this thread with the outcomes and I'll catch up tomorrow morning.
On Feb 4, 2018 at 20:23, "Reuven Lax" wrote:
Cool, let's chat about this on slack for a bit (which I realized I've been
signed out of for some time).
Reuven
On Sun, Feb 4, 2018 at 9:21 AM, Jean-Baptiste Onofré
wrote:
Sorry guys, I was off today. Happy to be part of the party too ;)
Regards
JB
On 02/04/2018 06:19 PM, Reuven Lax wrote:
Romain, since you're interested, maybe the two of us should put together a
proposal for how to set these things (hints, schema) on PCollections? I
don't think it'll be hard - the previous list thread on hints already
agreed on a general approach, and we would just need to flesh it out.
BTW in the
On Sun, Feb 4, 2018 at 8:42 AM, Romain Manni-Bucau
wrote:
2018-02-04 17:37 GMT+01:00 Reuven Lax :
I'm not sure where proto comes from here. Proto is one example of a type
that has a schema, but only one example.
1. In the initial prototype I want to avoid modifying the PCollection API.
So I think it's best to create a special SchemaCoder, and pass the schema
into this coder. Later we might
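As a rough illustration of the "special SchemaCoder" idea above, the sketch below passes a schema into a coder at construction time and uses it to drive encoding of a generic row. Everything here is hypothetical and self-contained (the `Schema`, `FieldType`, and `SchemaCoder` classes are invented for this example and are not the Beam API or the prototype's actual design).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaCoderSketch {
    // Hypothetical field types; a real schema would need many more.
    enum FieldType { INT64, STRING }

    // Hypothetical schema: ordered field names with their types.
    static final class Schema {
        final List<String> names;
        final List<FieldType> types;

        Schema(List<String> names, List<FieldType> types) {
            if (names.size() != types.size()) {
                throw new IllegalArgumentException("names and types must align");
            }
            this.names = names;
            this.types = types;
        }
    }

    // Hypothetical coder: the schema travels with the coder, so the
    // collection API itself does not need to change.
    static final class SchemaCoder {
        private final Schema schema;

        SchemaCoder(Schema schema) { this.schema = schema; }

        byte[] encode(List<Object> row) {
            if (row.size() != schema.types.size()) {
                throw new IllegalArgumentException("row does not match schema");
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                for (int i = 0; i < schema.types.size(); i++) {
                    switch (schema.types.get(i)) {
                        case INT64: out.writeLong((Long) row.get(i)); break;
                        case STRING: out.writeUTF((String) row.get(i)); break;
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }

        List<Object> decode(byte[] data) {
            List<Object> row = new ArrayList<>();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
                // Field order and types come from the schema, not from the bytes.
                for (FieldType type : schema.types) {
                    switch (type) {
                        case INT64: row.add(in.readLong()); break;
                        case STRING: row.add(in.readUTF()); break;
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return row;
        }
    }

    public static void main(String[] args) {
        Schema schema = new Schema(
            Arrays.asList("id", "name"),
            Arrays.asList(FieldType.INT64, FieldType.STRING));
        SchemaCoder coder = new SchemaCoder(schema);
        List<Object> row = Arrays.<Object>asList(42L, "beam");
        System.out.println(coder.decode(coder.encode(row)).equals(row)); // true
    }
}
```

The design choice being illustrated is only that the schema is attached through the coder; how the schema is eventually exposed on the collection is left open in the thread.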
@Reuven: is the proto only about passing schema or also the generic type?
There are 2.5 topics to solve this issue:
1. How to pass schema
1.a. hints?
2. What is the generic record type associated with a schema and how to
express a schema relative to it
I would be happy to help on 1.a and 2
One more thing. If anyone here has experience with various OSS metadata
stores (e.g. Kafka Schema Registry is one example), would you like to
collaborate on implementation? I want to make sure that source schemas can
be stored in a variety of OSS metadata stores, and be easily pulled into a
Beam
Hi all,
If there are no concerns, I would like to start working on a prototype.
It's just a prototype, so I don't think it will have the final API (e.g.
for the prototype I'm going to avoid changing the API of PCollection, and use
a "special" Coder instead). Also even once we go beyond prototype,
If you need help on the json part I'm happy to help. To give a few hints on
what is very doable: we can add an avro module to johnzon (asf json{p,b}
impl) to back jsonp by avro (guess it will be one of the first to be asked)
for instance.
Romain Manni-Bucau
@rmannibucau
Hmm, it is a hint semantically or it is deducible from the transform. Doing
the union of both you cover all cases. Then how it is forwarded from the
transform to the runtime is in runner API not the user (pipeline) API so
I'm not sure I see the case you reference where it has a semantic API. Can
I don't think "hint" is the right API, as schema is not a hint (it has
semantic meaning). However I think the API for schema should look similar
to any "hint" API.
On Wed, Jan 31, 2018 at 11:40 AM, Romain Manni-Bucau
wrote:
On Jan 31, 2018 at 20:16, "Reuven Lax" wrote:
As to the question of how a schema should be specified, I want to support
several common schema formats. So if a user has a Json schema, or an Avro
schema, or a Calcite schema, etc. there should be adapters that allow
setting a schema from any of them. I don't think we should prefer one over
the
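The adapter idea above could look roughly like the following sketch. It is purely illustrative: the `Schema` and `SchemaAdapter` types are invented here, and the "JSON-schema-like" input is simplified to a map from property name to JSON type name rather than a real JSON Schema document.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaAdapterSketch {
    // Hypothetical neutral schema: ordered field name -> neutral type name.
    static final class Schema {
        final Map<String, String> fields;

        Schema(Map<String, String> fields) {
            this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
        }
    }

    // Hypothetical adapter contract: one implementation per external schema format.
    interface SchemaAdapter<T> {
        Schema toSchema(T external);
    }

    // Toy adapter for a simplified JSON-schema-like "properties" map.
    static final class JsonLikeAdapter implements SchemaAdapter<Map<String, String>> {
        @Override
        public Schema toSchema(Map<String, String> properties) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> property : properties.entrySet()) {
                switch (property.getValue()) {
                    case "integer": fields.put(property.getKey(), "INT64"); break;
                    case "number": fields.put(property.getKey(), "DOUBLE"); break;
                    case "string": fields.put(property.getKey(), "STRING"); break;
                    default:
                        throw new IllegalArgumentException(
                            "unsupported type: " + property.getValue());
                }
            }
            return new Schema(fields);
        }
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("userId", "integer");
        properties.put("country", "string");

        Schema schema = new JsonLikeAdapter().toSchema(properties);
        System.out.println(schema.fields); // {userId=INT64, country=STRING}
    }
}
```

An Avro or Calcite adapter would implement the same interface against its own schema type, so no single external format is preferred by the neutral representation.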
Hi,
I think we should avoid mixing two things in the discussion (and so the
document):
1. The element of the collection and the schema itself are two different things.
By essence, Beam should not enforce any schema. That's why I think it's a good
idea to set the schema optionally on the
On Jan 30, 2018 at 01:09, "Reuven Lax" wrote:
On Mon, Jan 29, 2018 at 12:17 PM, Romain Manni-Bucau
wrote:
Hi
I have some questions on this: how would hierarchical schemas work? It seems
they are not really supported by the ecosystem (outside of custom stuff) :(. How
would it integrate smoothly with other generic record types - N bridges?
Concretely, I wonder if using a JSON API couldn't be beneficial: json-p is a
Hi Reuven,
Thanks for the update! As I'm working with you on this, I fully agree, and it's
a great doc gathering the ideas.
It's clearly something we have to add asap in Beam, because it would allow new
use cases for our users (in a simple way) and open new areas for the runners
(for instance dataframe