[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-21 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-713414467 @bvaradar You are correct. It worked fine once the config was added. For some reason, kafkacat was not showing the tombstone record.

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-20 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-712680562 @bvaradar Yes, I think that's the tombstone event. You can disable it with configs
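
A tombstone is a Kafka record with a key and a null value, which Debezium emits after a delete event by default. The comment above does not name the config it refers to; on the Debezium side, the connector property `tombstones.on.delete` controls this behaviour, and that may be what is meant. As a minimal sketch of the consumer-side alternative, assuming records arrive as hypothetical `(key, value)` pairs, tombstones can be skipped before deserialization:

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical record shape for illustration: (key, value), where a
# tombstone carries a key but a value of None.
Record = Tuple[str, Optional[bytes]]


def drop_tombstones(records: Iterable[Record]) -> Iterator[Record]:
    """Yield only records that have a value; tombstones are skipped."""
    for key, value in records:
        if value is not None:
            yield key, value


# A delete event for key "3" followed by its tombstone.
sample = [("3", b"delete-event"), ("3", None)]
kept = list(drop_tombstones(sample))
```

This is not Hudi's or Debezium's own code, only an illustration of why a null-valued record needs special handling before it reaches an Avro decoder.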

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-19 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-712404711 Not sure if this will be of any help, but I am attaching the latest logs. I can see these messages towards the end ``` at

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-19 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-712377184 @bvaradar I can provide all the SQL statements in Postgres that I'm using to reproduce this, though: ``` DROP TABLE public.motor_crash_violation_incidents; CREATE

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-18 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-711553007 @bvaradar We are using the Debezium Postgres connector of Confluent Kafka

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-16 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-710034993 AvroKafkaSource : ``` spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.4 --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-16 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-710023639 @bvaradar Isn't the ordering field set by ``` --source-ordering-field _ts_ms ```? Then precombine should be looking at _ts_ms for deletion, right?
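
The question above concerns how the ordering field drives precombine. As a rough sketch of the general idea only (not Hudi's implementation), precombine keeps, for each record key, the record with the largest ordering value. The field names `inc_id` and `_ts_ms` come from the thread; the `op` field and the `precombine` helper are assumptions made for illustration:

```python
from typing import Dict, Iterable, List


def precombine(records: Iterable[dict], key_field: str, ordering_field: str) -> List[dict]:
    """Keep one record per key: the one with the largest ordering value."""
    latest: Dict[object, dict] = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        # On a tie, the later record in the batch wins.
        if current is None or rec[ordering_field] >= current[ordering_field]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical batch: an update and a later delete for the same key.
batch = [
    {"inc_id": 3, "_ts_ms": 100, "op": "u"},
    {"inc_id": 3, "_ts_ms": 200, "op": "d"},
    {"inc_id": 1, "_ts_ms": 150, "op": "c"},
]
deduped = precombine(batch, "inc_id", "_ts_ms")
```

Under this model a delete only wins if its ordering value is present and not smaller than that of earlier records for the same key, which is why the location of `_ts_ms` in a delete event matters.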

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-15 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-709557478 @bvaradar I thought that at first. To confirm, I retried the scenario multiple times. I'm getting the same error every time. Only during deletes

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-14 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-708500901 @bvaradar I changed the Postgres configuration, and now the Debezium delete action no longer produces a null value in "before": ``` {"before": {"Value": {"inc_id": 3, "year":

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-14 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-708433619 @bvaradar The patch worked for inserts and upserts, but not for deletes. I think this issue comes from the way Debezium writes the delete changes into Kafka

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-13 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-708003843 @bvaradar Thanks! It seems to ingest properly. I will test all scenarios, such as deletes, and let you know. Thanks for such amazing support!

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-13 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-707746861 @bvaradar I'm getting the following error when applying the patch: ``` error: corrupt patch at line 252 ``` This is my first time applying a git patch, so I might be doing something silly

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-12 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-707061282 @bvaradar Please find the files attached below: [Downloads.zip](https://github.com/apache/hudi/files/5364821/Downloads.zip)

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-12 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-706904989 @bvaradar The JSON I provided is the output of the kafkacat utility, which prints records as JSON. In our process, the Kafka key is a String and the value is Avro. Now the different

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-09 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-705431204

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-09 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-706123886 The following is the Kafka data as consumed using kafkacat: ``` {"before": null, "after": {"Value": {"inc_id": 1, "year": {"int": 2016}, "violation_desc": {"string": "DRIVING

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-09 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-706110206 @bvaradar Thanks for noticing it. I think that solved the previous error, but it now produces the following error: ``` 20/10/09 10:32:09 INFO AppInfoParser: Kafka version

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-09 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-706075094 Avro Payload : ``` package org.apache.hudi.common.model; import org.apache.hudi.common.util.Option; import org.apache.avro.generic.GenericRecord;

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-09 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-706072486 @bvaradar I have changed all the code to match what you sent earlier, so the HoodieDeleateField is not present now

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-09 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-706057675 @bvaradar Please find the details : ``` spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer --jars

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-09 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-705990567 ``` { "connect.name": "airflow.public.motor_crash_violation_incidents.Envelope", "fields": [ { "default": null, "name": "before",

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-08 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-705969068 @bvaradar My bad. I'm attaching the logs [yarn-logs.txt](https://github.com/apache/hudi/files/5352619/logs.txt)

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-08 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-705431204 @bvaradar Please find the logs below: [log.txt](https://github.com/apache/hudi/files/5346518/log.txt)

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-07 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-70517 I found this error message in the logs: ``` INFO DAGScheduler: Job 11 finished: sum at DeltaSync.java:406, took 0.108824 s 20/10/07 20:27:07 ERROR DeltaSync:

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-07 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-705148134 ![hudi-kafka](https://user-images.githubusercontent.com/40498599/95378569-b880d400-0901-11eb-8f5c-9268b4b3a92f.JPG)

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-07 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-705121899 @bvaradar I reverted the code to the previous version and ran the DeltaStreamer, but something is causing an error and the data is getting rolled back: ``` 20/10/07 18:34:18

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-07 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-704918685 @bvaradar I followed your instructions but tried to add the _is_hoodie_deleted column to the dataset using the following code for testing. I'm getting the following error with

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-06 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-704569933 @bvaradar I implemented the transformer class as `public class DebeziumCustomTransformer implements Transformer { private static final Logger LOG =

[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-06 Thread GitBox
ashishmgofficial commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-704557449 @bvaradar So in this case, should we provide an updated schema file for the target?