My input file contains newline-delimited JSON records, one per line. The records on the Kafka topic are the same JSON blobs, UTF-8 encoded and written as bytes.
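A minimal sketch of the "job" main described in the thread below (MyEvent, MyEventSchema, and buildFlow are hypothetical placeholders for your own event type, your existing DeserializationSchema, and the shared factory method; only the shape of the approach comes from this thread):

import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FileBackedJob {

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        final StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder for the DeserializationSchema that the Kafka consumer
        // in the "flow" application already uses.
        final MyEventSchema schema = new MyEventSchema();

        // Read the dump as lines of text, then run each line through the
        // same schema the Kafka source uses. The resulting DataStream<MyEvent>
        // is type-compatible with the Kafka-backed one.
        DataStream<MyEvent> events = env
                .readTextFile(params.getRequired("input"))
                .map(new MapFunction<String, MyEvent>() {
                    @Override
                    public MyEvent map(String line) throws Exception {
                        // Records were written as UTF-8 bytes, so re-encode
                        // the line before handing it to the schema.
                        return schema.deserialize(
                                line.getBytes(StandardCharsets.UTF_8));
                    }
                });

        // Hypothetical factory method that the Kafka-backed "flow" main
        // also calls with its DataStream<MyEvent>.
        buildFlow(events);

        env.execute("file-backed job");
    }
}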
On Fri, Feb 12, 2016 at 1:41 PM, Martin Neumann <mneum...@sics.se> wrote:

> I'm trying the same thing now.
>
> I guess you need to read the file as byte arrays somehow to make it work.
> What read function did you use? The mapper is not hard to write, but the
> byte array handling gives me a headache.
>
> cheers Martin
>
> On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk <ndimi...@apache.org> wrote:
>
>> Hi Martin,
>>
>> I have the same use case: I wanted to be able to load from dumps of data
>> in the same format as is on the Kafka queue. I created a new application
>> main, call it the "job" instead of the "flow". I refactored the code that
>> builds the flow so it can all be reused via a factory method. I then
>> implemented a MapFunction that simply calls my existing deserializer.
>> Create a new DataStream from the flat file and tack on the MapFunction
>> step. The resulting DataStream is then type-compatible with the Kafka
>> consumer that starts the "flow" application, so I pass it into the
>> factory method. Tweak the ParameterTool options for the "job"
>> application, et voilà!
>>
>> Sorry I don't have example code for you; this would be a good example to
>> contribute back to the community's example library, though.
>>
>> Good luck!
>> -n
>>
>> On Fri, Feb 12, 2016 at 2:25 AM, Martin Neumann <mneum...@sics.se> wrote:
>>
>>> It's not only about testing; I will also need to run things against
>>> different datasets. I want to reuse as much of the code as possible to
>>> load the same data from a file instead of Kafka.
>>>
>>> Is there a simple way of loading the data from a file using the same
>>> conversion classes that I would use to transform the records when I read
>>> them from Kafka, or do I have to write a new Avro deserializer
>>> (InputFormat)?
>>>
>>> On Fri, Feb 12, 2016 at 2:06 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>
>>>> Hey,
>>>>
>>>> A very simple thing you could do is to set up a simple Kafka producer
>>>> in a Java program that feeds the data into a topic [see the producer
>>>> sketch after the quoted thread]. This also has the additional benefit
>>>> that you are actually testing against Kafka.
>>>>
>>>> Cheers,
>>>> Gyula
>>>>
>>>> Martin Neumann <mneum...@sics.se> wrote (Fri, Feb 12, 2016, 0:20):
>>>>
>>>>> Hej,
>>>>>
>>>>> I have a stream program reading data from Kafka, where the data is in
>>>>> Avro. I have my own DeserializationSchema to deal with it.
>>>>>
>>>>> For testing purposes I want to read a dump from HDFS instead. Is there
>>>>> a way to use the same DeserializationSchema to read from an Avro file
>>>>> stored on HDFS?
>>>>>
>>>>> cheers Martin
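For completeness, a rough sketch of the producer approach Gyula suggests above (the broker address, topic name, and file path are assumptions; it just replays a newline-delimited dump into a topic as UTF-8 bytes, matching what the live pipeline sees):

import java.io.BufferedReader;
import java.io.FileReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DumpReplayer {

    public static void main(String[] args) throws Exception {
        String file = args[0];   // path to the newline-delimited dump
        String topic = args[1];  // target topic (placeholder)

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
             BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Write each record exactly as the Kafka source would see it:
                // a UTF-8 encoded byte array.
                producer.send(new ProducerRecord<byte[], byte[]>(
                        topic, line.getBytes(StandardCharsets.UTF_8)));
            }
            producer.flush();
        }
    }
}

The streaming job can then consume the topic unchanged, which is the benefit Gyula points out: the Kafka path itself gets exercised, not just the deserializer.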