[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2017-05-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2762




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-12-01 Thread mushketyk
Github user mushketyk commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r90505208
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/AvroDeserializationSchemaTest.java ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.connectors.kafka;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.Row;
+import org.apache.flink.streaming.util.serialization.AvroRowDeserializationSchema;
+import org.apache.flink.streaming.util.serialization.AvroRowSerializationSchema;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.apache.flink.streaming.connectors.kafka.internals.TypeUtil.toTypeInfo;
+import static org.junit.Assert.assertEquals;
+
+public class AvroDeserializationSchemaTest {
+
+   private static final String[] FIELD_NAMES = new String[]{"f1", "f2", "f3"};
+   private static final TypeInformation[] FIELD_TYPES = toTypeInfo(new Class[]{Integer.class, Boolean.class, String.class});
+
+   private AvroRowSerializationSchema serializationSchema = new AvroRowSerializationSchema(
+   FIELD_NAMES, FIELD_TYPES
+   );
+   private AvroRowDeserializationSchema deserializationSchema = new AvroRowDeserializationSchema(
+   FIELD_NAMES, FIELD_TYPES
+   );
+
+   @Test
+   public void serializeAndDeserializeRow() throws IOException {
+   Row row = createRow();
+
+   byte[] bytes = serializationSchema.serialize(row);
+   Row resultRow = deserializationSchema.deserialize(bytes);
+
+   assertEqualsRows(row, resultRow);
+   }
+
+   @Test
+   public void serializeRowSeveralTimes() throws IOException {
+   Row row = createRow();
+
+   serializationSchema.serialize(row);
+   serializationSchema.serialize(row);
+   byte[] bytes = serializationSchema.serialize(row);
+   Row resultRow = deserializationSchema.deserialize(bytes);
+
+   assertEqualsRows(row, resultRow);
+   }
+
+   @Test
+   public void deserializeRowSeveralTimes() throws IOException {
+   Row row = createRow();
+
+   byte[] bytes = serializationSchema.serialize(row);
+   deserializationSchema.deserialize(bytes);
+   deserializationSchema.deserialize(bytes);
+   Row resultRow = deserializationSchema.deserialize(bytes);
+
+   assertEqualsRows(row, resultRow);
+   }
+
+   private Row createRow() {
+   Row row = new Row(3);
--- End diff --

Makes sense. I'll add these tests.




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-12-01 Thread mushketyk
Github user mushketyk commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r90504858
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaAvroTableSource.java ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.kafka;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.sources.StreamTableSource;
+import org.apache.flink.streaming.connectors.kafka.internals.TypeUtil;
+import org.apache.flink.streaming.util.serialization.AvroRowDeserializationSchema;
+import org.apache.flink.streaming.util.serialization.DeserializationSchema;
+
+import java.util.Properties;
+
+/**
+ * A version-agnostic Kafka Avro {@link StreamTableSource}.
+ *
+ * The version-specific Kafka consumers need to extend this class and
+ * override {@link #getKafkaConsumer(String, Properties, DeserializationSchema)}}.
+ *
+ * The field names are used to parse the Avro file and so are the types.
+ */
+public abstract class KafkaAvroTableSource extends KafkaTableSource {
+
+   /**
+* Creates a generic Kafka Avro {@link StreamTableSource}.
+*
+* @param topic  Kafka topic to consume.
+* @param properties Properties for the Kafka consumer.
+* @param fieldNames Row field names.
+* @param fieldTypes Row field types.
+*/
+   KafkaAvroTableSource(
+   String topic,
+   Properties properties,
+   String[] fieldNames,
+   Class[] fieldTypes) {
+
+   super(topic, properties, createDeserializationSchema(fieldTypes), fieldNames, fieldTypes);
+   }
+
+   /**
+* Creates a generic Kafka Avro {@link StreamTableSource}.
+*
+* @param topic  Kafka topic to consume.
+* @param properties Properties for the Kafka consumer.
+* @param fieldNames Row field names.
+* @param fieldTypes Row field types.
+*/
+   KafkaAvroTableSource(
+   String topic,
+   Properties properties,
+   String[] fieldNames,
+   TypeInformation[] fieldTypes) {
+
+   super(topic, properties, createDeserializationSchema(fieldTypes), fieldNames, fieldTypes);
+   }
+
+   private static AvroRowDeserializationSchema createDeserializationSchema(
+   TypeInformation[] fieldTypes) {
+
+   return new AvroRowDeserializationSchema(new String[]{"f1", "f2", "f3"}, fieldTypes);
--- End diff --

Sorry, accidentally typed it.
Will fix.




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-12-01 Thread mushketyk
Github user mushketyk commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r90504943
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/util/serialization/AvroRowDeserializationSchema.java ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.util.serialization;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.io.Decoder;
+import org.apache.avro.io.DecoderFactory;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.Row;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+
+import static org.apache.flink.streaming.connectors.kafka.internals.TypeUtil.createRowAvroSchema;
+
+/**
+ * Deserialization schema from Avro to {@link Row}.
+ *
+ * Deserializes the byte[] messages in Avro format and reads
+ * the specified fields.
+ *
+ * Failure during deserialization are forwarded as wrapped IOExceptions.
+ */
+public class AvroRowDeserializationSchema extends AbstractDeserializationSchema<Row> {
+
+   /** Field names in a row */
+   private final String[] fieldNames;
+   /** Types to parse fields as. Indices match fieldNames indices. */
+   private final TypeInformation[] fieldTypes;
+   /** Avro deserialization schema */
+   private final Schema schema;
+   /** Reader that deserializes byte array into a record */
+   private final DatumReader<GenericRecord> datumReader;
+   /** Record to deserialize byte array to */
+   private final GenericRecord record;
+
+   /**
+* Creates a Avro deserializtion schema for the given type classes.
--- End diff --

Good catch. Thank you.




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-12-01 Thread mushketyk
Github user mushketyk commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r90504878
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaAvroTableSource.java ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.kafka;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.sources.StreamTableSource;
+import org.apache.flink.streaming.connectors.kafka.internals.TypeUtil;
+import org.apache.flink.streaming.util.serialization.AvroRowDeserializationSchema;
+import org.apache.flink.streaming.util.serialization.DeserializationSchema;
+
+import java.util.Properties;
+
+/**
+ * A version-agnostic Kafka Avro {@link StreamTableSource}.
+ *
+ * The version-specific Kafka consumers need to extend this class and
+ * override {@link #getKafkaConsumer(String, Properties, DeserializationSchema)}}.
+ *
+ * The field names are used to parse the Avro file and so are the types.
+ */
+public abstract class KafkaAvroTableSource extends KafkaTableSource {
+
+   /**
+* Creates a generic Kafka Avro {@link StreamTableSource}.
+*
+* @param topic  Kafka topic to consume.
+* @param properties Properties for the Kafka consumer.
+* @param fieldNames Row field names.
+* @param fieldTypes Row field types.
+*/
+   KafkaAvroTableSource(
+   String topic,
+   Properties properties,
+   String[] fieldNames,
+   Class[] fieldTypes) {
+
+   super(topic, properties, createDeserializationSchema(fieldTypes), fieldNames, fieldTypes);
+   }
+
+   /**
+* Creates a generic Kafka Avro {@link StreamTableSource}.
+*
+* @param topic  Kafka topic to consume.
+* @param properties Properties for the Kafka consumer.
+* @param fieldNames Row field names.
+* @param fieldTypes Row field types.
+*/
+   KafkaAvroTableSource(
+   String topic,
+   Properties properties,
+   String[] fieldNames,
+   TypeInformation[] fieldTypes) {
+
+   super(topic, properties, createDeserializationSchema(fieldTypes), fieldNames, fieldTypes);
+   }
+
+   private static AvroRowDeserializationSchema createDeserializationSchema(
+   TypeInformation[] fieldTypes) {
+
+   return new AvroRowDeserializationSchema(new String[]{"f1", "f2", "f3"}, fieldTypes);
+   }
+
+   private static AvroRowDeserializationSchema createDeserializationSchema(
+   Class[] fieldTypes) {
+
+   return new AvroRowDeserializationSchema(new String[]{"f1", "f2", "f3"}, TypeUtil.toTypeInfo(fieldTypes));
--- End diff --

Ditto. 




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-12-01 Thread mushketyk
Github user mushketyk commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r90504916
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/util/serialization/AvroRowSerializationSchema.java ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.util.serialization;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumWriter;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.DatumWriter;
+import org.apache.avro.io.Encoder;
+import org.apache.avro.io.EncoderFactory;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.Row;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import static org.apache.flink.streaming.connectors.kafka.internals.TypeUtil.createRowAvroSchema;
+
+/**
+ * Serialization schema that serializes an object into a Avro bytes.
+ * 
+ */
+public class AvroRowSerializationSchema implements SerializationSchema<Row> {
+
+   /** Field names in a Row */
+   private final String[] fieldNames;
+   /** Avro serialization schema */
+   private final Schema schema;
+   /** Writer to serialize Avro GeneralRecord into a byte array */
+   private final DatumWriter<GenericRecord> datumWriter;
+   /** Output stream to serialize records into byte array */
+   private final ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
+   /** Low level class for serialization of Avro values */
+   private final Encoder encoder = EncoderFactory.get().directBinaryEncoder(arrayOutputStream, null);
--- End diff --

Ok, thank you for the suggestion.




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-11-17 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r88463790
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/util/serialization/AvroRowDeserializationSchema.java ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.util.serialization;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.io.Decoder;
+import org.apache.avro.io.DecoderFactory;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.Row;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+
+import static org.apache.flink.streaming.connectors.kafka.internals.TypeUtil.createRowAvroSchema;
+
+/**
+ * Deserialization schema from Avro to {@link Row}.
+ *
+ * Deserializes the byte[] messages in Avro format and reads
+ * the specified fields.
+ *
+ * Failure during deserialization are forwarded as wrapped IOExceptions.
+ */
+public class AvroRowDeserializationSchema extends AbstractDeserializationSchema<Row> {
+
+   /** Field names in a row */
+   private final String[] fieldNames;
+   /** Types to parse fields as. Indices match fieldNames indices. */
+   private final TypeInformation[] fieldTypes;
+   /** Avro deserialization schema */
+   private final Schema schema;
+   /** Reader that deserializes byte array into a record */
+   private final DatumReader<GenericRecord> datumReader;
+   /** Record to deserialize byte array to */
+   private final GenericRecord record;
+
+   /**
+* Creates a Avro deserializtion schema for the given type classes.
+*
+* @param fieldNames
+* @param fieldTypes Type classes to parse Avro fields as.
+*/
+   public AvroRowDeserializationSchema(String[] fieldNames, TypeInformation[] fieldTypes) {
+   this.schema = createRowAvroSchema(fieldNames, fieldTypes);
+   this.fieldNames = fieldNames;
+   this.fieldTypes = fieldTypes;
+   this.datumReader = new GenericDatumReader<>(schema);
+   this.record = new GenericData.Record(schema);
+   }
+
+   @Override
+   public Row deserialize(byte[] message) throws IOException {
+   readRecord(message);
+   return convertRecordToRow();
+   }
+
+   private void readRecord(byte[] message) throws IOException {
+   ByteArrayInputStream arrayInputStream = new ByteArrayInputStream(message);
--- End diff --

creating a new `ByteArrayInputStream` and `Decoder` for each record is quite expensive. Can we reuse them as you did in the serializer?
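
For reference, a minimal sketch of such reuse, built on Avro's `DecoderFactory.binaryDecoder(byte[], BinaryDecoder)` overload, which accepts the previous decoder for reuse (the class name and structure here are illustrative, not the PR's final code):

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;

/** Illustrative sketch: reuse the decoder and target record across calls. */
public class ReusingAvroDeserializer {

    private final DatumReader<GenericRecord> datumReader;
    private GenericRecord record;   // reused as the read target
    private BinaryDecoder decoder;  // null on the first call, then reused

    public ReusingAvroDeserializer(Schema schema) {
        this.datumReader = new GenericDatumReader<>(schema);
        this.record = new GenericData.Record(schema);
    }

    public GenericRecord deserialize(byte[] message) throws IOException {
        // Handing the previous decoder back to the factory lets Avro reuse
        // its internal buffer instead of allocating a new input stream and
        // decoder per record.
        decoder = DecoderFactory.get().binaryDecoder(message, decoder);
        record = datumReader.read(record, decoder);
        return record;
    }
}
```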




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-11-17 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r88464767
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/AvroDeserializationSchemaTest.java ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.connectors.kafka;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.Row;
+import org.apache.flink.streaming.util.serialization.AvroRowDeserializationSchema;
+import org.apache.flink.streaming.util.serialization.AvroRowSerializationSchema;
+import org.junit.Test;
+
+import java.io.IOException;
+
+import static org.apache.flink.streaming.connectors.kafka.internals.TypeUtil.toTypeInfo;
+import static org.junit.Assert.assertEquals;
+
+public class AvroDeserializationSchemaTest {
+
+   private static final String[] FIELD_NAMES = new String[]{"f1", "f2", "f3"};
+   private static final TypeInformation[] FIELD_TYPES = toTypeInfo(new Class[]{Integer.class, Boolean.class, String.class});
+
+   private AvroRowSerializationSchema serializationSchema = new AvroRowSerializationSchema(
+   FIELD_NAMES, FIELD_TYPES
+   );
+   private AvroRowDeserializationSchema deserializationSchema = new AvroRowDeserializationSchema(
+   FIELD_NAMES, FIELD_TYPES
+   );
+
+   @Test
+   public void serializeAndDeserializeRow() throws IOException {
+   Row row = createRow();
+
+   byte[] bytes = serializationSchema.serialize(row);
+   Row resultRow = deserializationSchema.deserialize(bytes);
+
+   assertEqualsRows(row, resultRow);
+   }
+
+   @Test
+   public void serializeRowSeveralTimes() throws IOException {
+   Row row = createRow();
+
+   serializationSchema.serialize(row);
+   serializationSchema.serialize(row);
+   byte[] bytes = serializationSchema.serialize(row);
+   Row resultRow = deserializationSchema.deserialize(bytes);
+
+   assertEqualsRows(row, resultRow);
+   }
+
+   @Test
+   public void deserializeRowSeveralTimes() throws IOException {
+   Row row = createRow();
+
+   byte[] bytes = serializationSchema.serialize(row);
+   deserializationSchema.deserialize(bytes);
+   deserializationSchema.deserialize(bytes);
+   Row resultRow = deserializationSchema.deserialize(bytes);
+
+   assertEqualsRows(row, resultRow);
+   }
+
+   private Row createRow() {
+   Row row = new Row(3);
--- End diff --

A bit more test data would be good: 
- rows with `null` values
- more complex types like `DateTime`, `BigInteger`, `BigDecimal`
- custom POJOs
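
A minimal sketch of one such fixture (hypothetical values; assumes the `Row.setField(int, Object)` mutator and that `FIELD_TYPES` is widened so the schema declares matching, nullable field types):

```java
// Hypothetical fixture for the quoted test class: a row mixing a null
// value with more complex types.
private Row createComplexRow() {
    Row row = new Row(3);
    row.setField(0, null);                                // null value
    row.setField(1, new java.math.BigDecimal("123.45"));  // complex type
    row.setField(2, new java.util.Date(0L));              // temporal type
    return row;
}
```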




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-11-17 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r88451582
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaAvroTableSource.java ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.kafka;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.sources.StreamTableSource;
+import org.apache.flink.streaming.connectors.kafka.internals.TypeUtil;
+import org.apache.flink.streaming.util.serialization.AvroRowDeserializationSchema;
+import org.apache.flink.streaming.util.serialization.DeserializationSchema;
+
+import java.util.Properties;
+
+/**
+ * A version-agnostic Kafka Avro {@link StreamTableSource}.
+ *
+ * The version-specific Kafka consumers need to extend this class and
+ * override {@link #getKafkaConsumer(String, Properties, DeserializationSchema)}}.
+ *
+ * The field names are used to parse the Avro file and so are the types.
+ */
+public abstract class KafkaAvroTableSource extends KafkaTableSource {
+
+   /**
+* Creates a generic Kafka Avro {@link StreamTableSource}.
+*
+* @param topic  Kafka topic to consume.
+* @param properties Properties for the Kafka consumer.
+* @param fieldNames Row field names.
+* @param fieldTypes Row field types.
+*/
+   KafkaAvroTableSource(
+   String topic,
+   Properties properties,
+   String[] fieldNames,
+   Class[] fieldTypes) {
+
+   super(topic, properties, createDeserializationSchema(fieldTypes), fieldNames, fieldTypes);
+   }
+
+   /**
+* Creates a generic Kafka Avro {@link StreamTableSource}.
+*
+* @param topic  Kafka topic to consume.
+* @param properties Properties for the Kafka consumer.
+* @param fieldNames Row field names.
+* @param fieldTypes Row field types.
+*/
+   KafkaAvroTableSource(
+   String topic,
+   Properties properties,
+   String[] fieldNames,
+   TypeInformation[] fieldTypes) {
+
+   super(topic, properties, createDeserializationSchema(fieldTypes), fieldNames, fieldTypes);
+   }
+
+   private static AvroRowDeserializationSchema createDeserializationSchema(
+   TypeInformation[] fieldTypes) {
+
+   return new AvroRowDeserializationSchema(new String[]{"f1", "f2", "f3"}, fieldTypes);
+   }
+
+   private static AvroRowDeserializationSchema createDeserializationSchema(
+   Class[] fieldTypes) {
+
+   return new AvroRowDeserializationSchema(new String[]{"f1", "f2", "f3"}, TypeUtil.toTypeInfo(fieldTypes));
--- End diff --

why are the field names set to `new String[]{"f1", "f2", "f3"}`?
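
Judging by the replies earlier in this thread ("Sorry, accidentally typed it. Will fix." / "Ditto."), the hardcoded names were unintentional. A sketch of the presumable fix, threading the constructor's `fieldNames` through instead of the placeholder:

```java
// Sketch of the presumable fix: pass the caller's field names through
// rather than the hardcoded {"f1", "f2", "f3"} placeholder.
KafkaAvroTableSource(
        String topic,
        Properties properties,
        String[] fieldNames,
        TypeInformation[] fieldTypes) {

    super(topic, properties,
        new AvroRowDeserializationSchema(fieldNames, fieldTypes),
        fieldNames, fieldTypes);
}
```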




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-11-17 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r88461076
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/util/serialization/AvroRowSerializationSchema.java ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.util.serialization;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumWriter;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.DatumWriter;
+import org.apache.avro.io.Encoder;
+import org.apache.avro.io.EncoderFactory;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.Row;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import static org.apache.flink.streaming.connectors.kafka.internals.TypeUtil.createRowAvroSchema;
+
+/**
+ * Serialization schema that serializes an object into a Avro bytes.
+ * 
+ */
+public class AvroRowSerializationSchema implements SerializationSchema<Row> {
+
+   /** Field names in a Row */
+   private final String[] fieldNames;
+   /** Avro serialization schema */
+   private final Schema schema;
+   /** Writer to serialize Avro GeneralRecord into a byte array */
+   private final DatumWriter<GenericRecord> datumWriter;
+   /** Output stream to serialize records into byte array */
+   private final ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
+   /** Low level class for serialization of Avro values */
+   private final Encoder encoder = EncoderFactory.get().directBinaryEncoder(arrayOutputStream, null);
--- End diff --

use `binaryEncoder` instead of `directBinaryEncoder` to get a buffering encoder
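
For reference, a minimal sketch of the buffered variant, using `EncoderFactory.binaryEncoder(OutputStream, BinaryEncoder)` with its reuse parameter; note the buffered encoder must be flushed before the bytes are read (names and structure are illustrative, not the PR's final code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;

/** Illustrative sketch: buffered binary encoder, reused across calls. */
public class ReusingAvroSerializer {

    private final DatumWriter<GenericRecord> datumWriter;
    private final ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
    private BinaryEncoder encoder;  // null on the first call, then reused

    public ReusingAvroSerializer(Schema schema) {
        this.datumWriter = new GenericDatumWriter<>(schema);
    }

    public byte[] serialize(GenericRecord record) throws IOException {
        arrayOutputStream.reset();
        // binaryEncoder(...) buffers writes internally, unlike
        // directBinaryEncoder(...) which writes through per value.
        encoder = EncoderFactory.get().binaryEncoder(arrayOutputStream, encoder);
        datumWriter.write(record, encoder);
        encoder.flush();  // required: push buffered bytes to the stream
        return arrayOutputStream.toByteArray();
    }
}
```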




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-11-17 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r88451554
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaAvroTableSource.java ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.kafka;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.sources.StreamTableSource;
+import org.apache.flink.streaming.connectors.kafka.internals.TypeUtil;
+import org.apache.flink.streaming.util.serialization.AvroRowDeserializationSchema;
+import org.apache.flink.streaming.util.serialization.DeserializationSchema;
+
+import java.util.Properties;
+
+/**
+ * A version-agnostic Kafka Avro {@link StreamTableSource}.
+ *
+ * The version-specific Kafka consumers need to extend this class and
+ * override {@link #getKafkaConsumer(String, Properties, DeserializationSchema)}}.
+ *
+ * The field names are used to parse the Avro file and so are the types.
+ */
+public abstract class KafkaAvroTableSource extends KafkaTableSource {
+
+   /**
+* Creates a generic Kafka Avro {@link StreamTableSource}.
+*
+* @param topic  Kafka topic to consume.
+* @param properties Properties for the Kafka consumer.
+* @param fieldNames Row field names.
+* @param fieldTypes Row field types.
+*/
+   KafkaAvroTableSource(
+   String topic,
+   Properties properties,
+   String[] fieldNames,
+   Class[] fieldTypes) {
+
+   super(topic, properties, createDeserializationSchema(fieldTypes), fieldNames, fieldTypes);
+   }
+
+   /**
+* Creates a generic Kafka Avro {@link StreamTableSource}.
+*
+* @param topic  Kafka topic to consume.
+* @param properties Properties for the Kafka consumer.
+* @param fieldNames Row field names.
+* @param fieldTypes Row field types.
+*/
+   KafkaAvroTableSource(
+   String topic,
+   Properties properties,
+   String[] fieldNames,
+   TypeInformation[] fieldTypes) {
+
+   super(topic, properties, createDeserializationSchema(fieldTypes), fieldNames, fieldTypes);
+   }
+
+   private static AvroRowDeserializationSchema createDeserializationSchema(
+   TypeInformation[] fieldTypes) {
+
+   return new AvroRowDeserializationSchema(new String[]{"f1", "f2", "f3"}, fieldTypes);
--- End diff --

why are the field names set to `new String[]{"f1", "f2", "f3"}`?




[GitHub] flink pull request #2762: [FLINK-3871] Add Kafka TableSource with Avro seria...

2016-11-17 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/2762#discussion_r88461322
  
--- Diff: flink-streaming-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/util/serialization/AvroRowDeserializationSchema.java ---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.streaming.util.serialization;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericDatumReader;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.io.DatumReader;
+import org.apache.avro.io.Decoder;
+import org.apache.avro.io.DecoderFactory;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.table.Row;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+
+import static org.apache.flink.streaming.connectors.kafka.internals.TypeUtil.createRowAvroSchema;
+
+/**
+ * Deserialization schema from Avro to {@link Row}.
+ *
+ * Deserializes the byte[] messages in Avro format and reads
+ * the specified fields.
+ *
+ * Failure during deserialization are forwarded as wrapped IOExceptions.
+ */
+public class AvroRowDeserializationSchema extends AbstractDeserializationSchema<Row> {
+
+   /** Field names in a row */
+   private final String[] fieldNames;
+   /** Types to parse fields as. Indices match fieldNames indices. */
+   private final TypeInformation[] fieldTypes;
+   /** Avro deserialization schema */
+   private final Schema schema;
+   /** Reader that deserializes byte array into a record */
+   private final DatumReader<GenericRecord> datumReader;
+   /** Record to deserialize byte array to */
+   private final GenericRecord record;
+
+   /**
+* Creates a Avro deserializtion schema for the given type classes.
--- End diff --

Typo: "deserializtion" is missing an "a" (should be "deserialization").

