Re: Read all json files from a hdfs partition folder

2018-12-13 Thread Andrey Zagrebin
Hi Rakesh, So the problem is that you want your Flink job to monitor the '/data/ingestion/ingestion-raw-product' path for new files and process them as they appear, right? Can you try env.readFile, but with watchType = FileProcessingMode.PROCESS_CONTINUOUSLY? You can see an example in how
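A minimal sketch of the suggestion above, assuming a Flink 1.x DataStream job with a TextInputFormat; the class name, scan interval, and setNestedFileEnumeration call are illustrative choices, not from the thread:

```java
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class MonitorDirectory {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String path = "hdfs://localhost:8020/data/ingestion/ingestion-raw-product";
        TextInputFormat format = new TextInputFormat(new Path(path));
        // Descend into the date/hour partition subfolders as well.
        format.setNestedFileEnumeration(true);

        // PROCESS_CONTINUOUSLY re-scans the directory (here every 10 s)
        // and emits the contents of files that newly appear.
        DataStream<String> lines = env.readFile(
                format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 10_000L);

        lines.print();
        env.execute("monitor-hdfs-directory");
    }
}
```

Note that with PROCESS_CONTINUOUSLY a modified file is re-processed in full, so appending to existing files breaks exactly-once semantics; writing new files per partition avoids this.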

Re: Read all json files from a hdfs partition folder

2018-12-12 Thread Andrey Zagrebin
Actually, does it not work if you just provide the directory in env.readTextFile, as in your code example, or what is the problem?

> On 12 Dec 2018, at 17:24, Andrey Zagrebin wrote:
>
> Hi,
>
> If the question is how to read all files from an hdfs directory,
> in general, each file is potentially a dif

Re: Read all json files from a hdfs partition folder

2018-12-12 Thread Andrey Zagrebin
Hi, If the question is how to read all files from an hdfs directory, in general, each file is potentially a different DataSet (not DataStream). It needs to be decided how to combine/join them in the Flink pipeline. If the files are small enough, you could list them as string paths and use env.fromColl

Read all json files from a hdfs partition folder

2018-12-12 Thread Rakesh Kumar
Hi, I wanted to read all json files from hdfs with partition folder.

public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    //path //hdfs://localhost:8020/data/ingestion/ingestion.raw.product/2018/12/05/23
    DataStream
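The truncated snippet above can be sketched end to end as follows, assuming the one-shot reading suggested later in the thread (pointing env.readTextFile at the directory reads every file inside it); any downstream JSON parsing is left out since the preview does not show it:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReadPartitionFolder {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Reading the partition directory itself picks up all files in it.
        DataStream<String> jsonLines = env.readTextFile(
                "hdfs://localhost:8020/data/ingestion/ingestion.raw.product/2018/12/05/23");

        jsonLines.print();
        env.execute("read-partition-folder");
    }
}
```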