I ended up not using the DeserializationSchema and went with an AvroInputFormat for the file-reading case instead. I would have preferred to keep the code simpler, but the map solution turned out to be a lot more complicated: the raw data I have is in Avro binary format, so I cannot just read it as text and map it afterwards.
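For reference, a minimal sketch of this approach. MyRecord stands in for a generated Avro specific-record class and the HDFS path is illustrative (both are my assumptions, not from the thread); AvroInputFormat comes from the flink-avro module.

import org.apache.flink.api.java.io.AvroInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroDumpReader {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // AvroInputFormat splits the file and deserializes each record itself,
        // so no manual byte-array handling or extra map step is needed.
        // MyRecord and the path below are hypothetical placeholders.
        AvroInputFormat<MyRecord> avroFormat =
                new AvroInputFormat<>(new Path("hdfs:///dumps/events.avro"), MyRecord.class);

        DataStream<MyRecord> records = env.createInput(avroFormat);
        records.print();

        env.execute("read avro dump");
    }
}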
cheers Martin

On Fri, Feb 12, 2016 at 10:47 PM, Nick Dimiduk <ndimi...@apache.org> wrote:
> My input file contains newline-delimited JSON records, one per text line.
> The records on the Kafka topic are JSON blobs encoded as UTF-8 and written
> as bytes.
>
> On Fri, Feb 12, 2016 at 1:41 PM, Martin Neumann <mneum...@sics.se> wrote:
>> I'm trying the same thing now.
>>
>> I guess you need to read the file as byte arrays somehow to make it work.
>> Which read function did you use? The mapper is not hard to write, but the
>> byte-array handling gives me a headache.
>>
>> cheers Martin
>>
>> On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk <ndimi...@apache.org> wrote:
>>> Hi Martin,
>>>
>>> I have the same use case: I wanted to be able to load dumps of data in
>>> the same format as on the Kafka queue. I created a new application main,
>>> call it the "job" as opposed to the "flow". I refactored my flow-building
>>> code a bit so it can all be reused via a factory method. I then
>>> implemented a MapFunction that simply calls my existing deserializer,
>>> created a new DataStream from the flat file, and tacked on the MapFunction
>>> step. The resulting DataStream is type-compatible with the Kafka consumer
>>> that starts the "flow" application, so I pass it into the factory method.
>>> Tweak the ParameterTool options for the "job" application, et voilà!
>>>
>>> Sorry I don't have example code for you (though a sketch of the pattern
>>> appears at the end of this thread); this would be a good example to
>>> contribute back to the community's example library, though.
>>>
>>> Good luck!
>>> -n
>>>
>>> On Fri, Feb 12, 2016 at 2:25 AM, Martin Neumann <mneum...@sics.se> wrote:
>>>> It's not only about testing; I will also need to run things against
>>>> different datasets. I want to reuse as much of the code as possible to
>>>> load the same data from a file instead of Kafka.
>>>>
>>>> Is there a simple way to load the data from a file using the same
>>>> conversion classes I would use to transform the records when reading
>>>> them from Kafka, or do I have to write a new Avro deserializer
>>>> (InputFormat)?
>>>>
>>>> On Fri, Feb 12, 2016 at 2:06 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>> Hey,
>>>>>
>>>>> A very simple thing you could do is set up a simple Kafka producer in
>>>>> a Java program that feeds the data into a topic. This also has the
>>>>> additional benefit that you are actually testing against Kafka.
>>>>>
>>>>> Cheers,
>>>>> Gyula
>>>>>
>>>>> Martin Neumann <mneum...@sics.se> wrote (on Fri, Feb 12, 2016 at 0:20):
>>>>>> Hej,
>>>>>>
>>>>>> I have a streaming program reading data from Kafka, where the data is
>>>>>> in Avro. I have my own DeserializationSchema to deal with it.
>>>>>>
>>>>>> For testing reasons I want to read a dump from HDFS instead. Is there
>>>>>> a way to use the same DeserializationSchema to read from an Avro file
>>>>>> stored on HDFS?
>>>>>>
>>>>>> cheers Martin
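Here is a minimal sketch of the "job"-vs-"flow" pattern Nick describes above, not his actual code. Event, MyJsonDeserializationSchema, the topic name, and the HDFS path are hypothetical stand-ins; the consumer class matches the Kafka 0.8 connector current at the time. The key point is that both entry points feed the same factory method.

import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;

public class JobVsFlow {

    // Shared pipeline: everything downstream of the source lives here,
    // so both entry points reuse the same logic.
    static DataStream<Event> buildFlow(DataStream<Event> source) {
        return source; // ... real transformations go here
    }

    // "flow" entry point: Kafka source using the existing schema.
    static void flowMain(StreamExecutionEnvironment env, Properties kafkaProps) {
        DataStream<Event> events = env.addSource(
                new FlinkKafkaConsumer08<>("events", new MyJsonDeserializationSchema(), kafkaProps));
        buildFlow(events);
    }

    // "job" entry point: newline-delimited JSON dump on HDFS. A map step
    // re-encodes each text line to bytes and calls the very same deserializer,
    // so the resulting stream is type-compatible with the Kafka one.
    static void jobMain(StreamExecutionEnvironment env, String path) {
        MyJsonDeserializationSchema schema = new MyJsonDeserializationSchema();
        DataStream<Event> events = env.readTextFile(path)
                .map(line -> schema.deserialize(line.getBytes(StandardCharsets.UTF_8)))
                .returns(Event.class); // type hint for the lambda
        buildFlow(events);
    }
}

Note that this line-by-line trick works for Nick's newline-delimited JSON but not for Avro binary dumps, which is exactly why Martin switched to AvroInputFormat above.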