Hi Fabian,

I'm having issues with a custom InputFormat implementation for my existing
code.

Can this be used in combination with a custom BatchTableSource?
As I understand your solution, I should move my source to an implementation
like:

tableEnvironment
  .connect(...)
  .withFormat(...)
  .withSchema(...)
  .inAppendMode()
  .registerTableSource("MyTable")

right?

I currently have a BatchTableSource class which produces a DataSet<Row> from
a single GeoJSON file.
That doesn't sound compatible with a custom InputFormat, does it?
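To make sure I understand the split-based reading, here is a plain-Java sketch (standard library only, no Flink classes; the file contents and split boundaries are made up for illustration) of how a file-based format consumes a split as a stream, record by record, rather than loading the whole file first:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SplitReadSketch {

    // Read the byte range [start, end) of a file record by record,
    // the way a file-based input format consumes a processing split.
    static int countLinesInSplit(Path file, long start, long end) throws IOException {
        int lines = 0;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);                     // jump to the split's first byte
            while (raf.getFilePointer() < end) { // stop at the split boundary
                if (raf.readLine() == null) {    // one record at a time, never the whole file
                    break;
                }
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("split", ".txt");
        Files.write(tmp, "a\nb\nc\nd\n".getBytes(StandardCharsets.UTF_8));
        // Treat the whole file as a single split: four records read incrementally.
        System.out.println(countLinesInSplit(tmp, 0, Files.size(tmp))); // prints 4
        Files.delete(tmp);
    }
}
```

If that matches how an InputFormat behaves, the single-file GeoJSON case should stream fine as well.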

Thanks in advance for any additional hint, all the best

François

On Mon, Feb 4, 2019 at 12:10, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi,
>
> The files will be read in a streaming fashion.
> Typically files are broken down into processing splits that are
> distributed to tasks for reading.
> How a task reads a file split depends on the implementation, but usually
> the format reads the split as a stream and does not read the split as a
> whole before emitting records.
>
> Best,
> Fabian
>
> On Mon, Feb 4, 2019 at 12:06, françois lacombe <
> francois.laco...@dcbrain.com> wrote:
>
>> Hi Fabian,
>>
>> Thank you for this input.
>> This is interesting.
>>
>> With such an input format, will the whole file be loaded into memory
>> before being processed, or will it be streamed?
>>
>> All the best
>> François
>>
>> On Tue, Jan 29, 2019 at 22:20, Fabian Hueske <fhue...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> You can point a file-based input format to a directory and the input
>>> format should read all files in that directory.
>>> That works as well for TableSources that internally use file-based
>>> input formats.
>>> Is that what you are looking for?
>>>
>>> Best, Fabian
>>>
>>> On Mon, Jan 28, 2019 at 17:22, françois lacombe <
>>> francois.laco...@dcbrain.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm wondering whether it's possible, and what the best way would be, to
>>>> load multiple files with a JSON source into a JDBC sink.
>>>> I'm running Flink 1.7.0.
>>>>
>>>> Let's say I have about 1500 files with the same structure (same format,
>>>> schema, everything) and I want to load them with a *batch* job.
>>>> Can Flink handle loading each and every file in a single source
>>>> and send the data to my JDBC sink?
>>>> I wish I could provide the URL of the directory containing my thousand
>>>> files to the batch source so it loads all of them sequentially.
>>>> My sources and sinks are currently implemented for BatchTableSource; I
>>>> guess the cost of making them available for streaming would be quite
>>>> high for me at the moment.
>>>>
>>>> Has anyone ever done this?
>>>> Am I wrong to expect to achieve this with a batch job?
>>>>
>>>> All the best
>>>>
>>>> François Lacombe
>>>>
>>>>
>>>> <http://www.dcbrain.com/>   <https://twitter.com/dcbrain_feed?lang=fr>
>>>>    <https://www.linkedin.com/company/dcbrain>
>>>> <https://www.youtube.com/channel/UCSJrWPBLQ58fHPN8lP_SEGw>
>>>>
>>>> [image: Arbre vert.jpg] Think of the planet, print this paper only if
>>>> necessary
>>>>
>>>
>>
>>
>
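For reference, the directory-loading idea from the quoted thread can be sketched in plain Java (standard library only, no Flink classes; directory and file names are made up for illustration). A file-based source pointed at a directory enumerates the files and hands them over for processing one after another:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirectoryLoadSketch {

    // Collect every regular file in dir in a stable order, the way a
    // file-based source enumerates a directory of input files.
    static List<Path> listInputs(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("inputs");
        Files.write(dir.resolve("b.json"), "{}".getBytes());
        Files.write(dir.resolve("a.json"), "{}".getBytes());
        // Each file would be handed to the format sequentially.
        for (Path p : listInputs(dir)) {
            System.out.println(p.getFileName()); // prints a.json then b.json
        }
    }
}
```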
