Re: Reading csv-files in parallel

2018-05-09 Thread Fabian Hueske
L together in Scala code ? > > > > Best, Esa > > > > *From:* Fabian Hueske <fhue...@gmail.com> > *Sent:* Tuesday, May 8, 2018 10:26 PM > > *To:* Esa Heikkinen <esa.heikki...@student.tut.fi> > *Cc:* user@flink.apache.org > *Subject:* Re: Reading csv-files in pa

RE: Reading csv-files in parallel

2018-05-09 Thread Esa Heikkinen
.org Subject: Re: Reading csv-files in parallel Hi, the Table API / SQL and the DataSet API can be used together in the same program. So you could read the data with a custom input format or a TextInputFormat and a custom MapFunction parser and hand it to SQL afterwards. The program would be a r

Re: Reading csv-files in parallel

2018-05-08 Thread Fabian Hueske
> > > > Esa > > > > *From:* Fabian Hueske <fhue...@gmail.com> > *Sent:* Tuesday, May 8, 2018 2:00 PM > > *To:* Esa Heikkinen <esa.heikki...@student.tut.fi> > *Cc:* user@flink.apache.org > *Subject:* Re: Reading csv-files in parallel > > >

RE: Reading csv-files in parallel

2018-05-08 Thread Esa Heikkinen
(state-machine-based) logic for reading csv-files in a certain order. Esa From: Fabian Hueske <fhue...@gmail.com> Sent: Tuesday, May 8, 2018 2:00 PM To: Esa Heikkinen <esa.heikki...@student.tut.fi> Cc: user@flink.apache.org Subject: Re: Reading csv-files in parallel Hi, the easiest approac

Re: Reading csv-files in parallel

2018-05-08 Thread Fabian Hueske
at read csv-files > parallel ? > > > > Best, Esa > > > > *From:* Fabian Hueske <fhue...@gmail.com> > *Sent:* Monday, May 7, 2018 3:48 PM > *To:* Esa Heikkinen <esa.heikki...@student.tut.fi> > *Cc:* user@flink.apache.org > *Subject:* Re: Reading csv-

RE: Reading csv-files in parallel

2018-05-08 Thread Esa Heikkinen
Monday, May 7, 2018 3:48 PM To: Esa Heikkinen <esa.heikki...@student.tut.fi> Cc: user@flink.apache.org Subject: Re: Reading csv-files in parallel Hi Esa, you can certainly read CSV files in parallel. This works very well in a batch query. For streaming queries that expect data to be ingested

Re: Reading csv-files in parallel

2018-05-07 Thread Fabian Hueske
Hi Esa, you can certainly read CSV files in parallel. This works very well in a batch query. For streaming queries that expect data to be ingested in timestamp order, this is much more challenging, because you 1) need to read the files in the right order and 2) cannot split files (unless you

Re: Reading csv-files

2018-03-01 Thread Fabian Hueske
*From:* Fabian Hueske [mailto:fhue...@gmail.com] > *Sent:* Thursday, March 1, 2018 11:23 AM > > *To:* Esa Heikkinen <esa.heikki...@student.tut.fi> > *Cc:* user@flink.apache.org > *Subject:* Re: Reading csv-files > > > > Hi Esa, > > IMO, the easiest appro

RE: Reading csv-files

2018-03-01 Thread Esa Heikkinen
a Heikkinen <esa.heikki...@student.tut.fi<mailto:esa.heikki...@student.tut.fi>> Cc: user@flink.apache.org<mailto:user@flink.apache.org> Subject: Re: Reading csv-files Yes, that is mostly correct. You can of course read files in parallel, assign watermarks, and obtain a DataStream with co

Re: Reading csv-files

2018-03-01 Thread Fabian Hueske
Esa > > > > *From:* Fabian Hueske [mailto:fhue...@gmail.com] > *Sent:* Tuesday, February 27, 2018 11:27 PM > *To:* Esa Heikkinen <esa.heikki...@student.tut.fi> > *Cc:* user@flink.apache.org > *Subject:* Re: Reading csv-files > > > > Yes, that is mostly correct.

RE: Reading csv-files

2018-02-28 Thread Esa Heikkinen
: user@flink.apache.org Subject: Re: Reading csv-files Yes, that is mostly correct. You can of course read files in parallel, assign watermarks, and obtain a DataStream with correct timestamps and watermarks. If you do that, you should ensure that each parallel source task reads the files in the

Re: Reading csv-files

2018-02-27 Thread Fabian Hueske
Yes, that is mostly correct. You can of course read files in parallel, assign watermarks, and obtain a DataStream with correct timestamps and watermarks. If you do that, you should ensure that each parallel source task reads the files in the order of increasing timestamps. As I said before, you
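
The ordering requirement in this reply can be illustrated with a small sketch. It is plain Python, not Flink API: the file names, the timestamps, and the helpers `assign_files` and `combined_watermark` are all hypothetical, and the round-robin assignment is only one possible way to keep each task's file list in increasing timestamp order.

```python
from typing import Dict, List, Tuple

# Hypothetical input: (file_name, first_timestamp) pairs. Names and values are made up.
FileInfo = Tuple[str, int]


def assign_files(files: List[FileInfo], parallelism: int) -> Dict[int, List[FileInfo]]:
    """Round-robin the timestamp-sorted files over the tasks, so that each
    task's own list is still in increasing timestamp order."""
    ordered = sorted(files, key=lambda f: f[1])
    plan: Dict[int, List[FileInfo]] = {task: [] for task in range(parallelism)}
    for i, info in enumerate(ordered):
        plan[i % parallelism].append(info)
    return plan


def combined_watermark(task_watermarks: List[int]) -> int:
    """An operator with several inputs can only advance to the minimum
    watermark reported by those inputs."""
    return min(task_watermarks)


files = [("c.csv", 300), ("a.csv", 100), ("d.csv", 400), ("b.csv", 200)]
plan = assign_files(files, parallelism=2)
for task, assigned in plan.items():
    print(task, [name for name, _ in assigned])
```

Because the combined watermark is the minimum over all tasks, a single task that reads its files out of order holds back, or invalidates, event time for the whole job.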

Re: Reading csv-files

2018-02-27 Thread Esa Heikkinen
Hi, thanks for the answer. All csv-files are already present and they will not change during the processing. Because Flink can read many streams in parallel, I think it is also possible to read many csv-files in parallel. From what I have understood, it is possible to convert csv-files to

Re: Reading csv-files

2018-02-27 Thread Fabian Hueske
Hi Esa, Reading records from files with timestamps that need watermarks can be tricky. If you are aware of Flink's watermark mechanism, you know that records should be ingested in (roughly) increasing timestamp order. This means that files usually cannot be split (i.e., need to be read by a single
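
A rough way to see why splitting a file hurts is the plain-Python sketch below. It uses a deliberately simplified watermark model (highest timestamp seen so far minus an allowed lag), which is an assumption made for illustration and not Flink's actual implementation. The function `late_records` and the sample timestamps are hypothetical.

```python
from typing import Iterable, List


def late_records(timestamps: Iterable[int], max_out_of_orderness: int = 0) -> List[int]:
    """Return the timestamps that arrive behind the current watermark.

    Simplified model (an assumption): the watermark is the highest timestamp
    seen so far minus the allowed out-of-orderness.
    """
    late: List[int] = []
    max_seen = None
    for ts in timestamps:
        if max_seen is not None and ts < max_seen - max_out_of_orderness:
            late.append(ts)
        max_seen = ts if max_seen is None else max(max_seen, ts)
    return late


in_order = [1, 2, 3, 4, 5, 6]     # one reader, file read front to back
interleaved = [1, 4, 2, 5, 3, 6]  # two halves of the same file read concurrently

print(late_records(in_order))
print(late_records(interleaved))
print(late_records(interleaved, max_out_of_orderness=2))
```

In this model the interleaved read only produces no late records when the allowed lag covers the timestamp gap between the two halves, which matches the "roughly increasing" wording above.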