Re: streaming pdf

2018-11-19 Thread Jörn Franke
And you have to write your own input format, but this is not so complicated (and probably recommended anyway for the PDF case).
> On 20.11.2018 at 08:06, Jörn Franke wrote:
>
> Well, I am not so sure about the use cases, but what about using
> StreamingContext.fileStream?
>
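The key property of a custom input format for PDFs is that a file is never split: each PDF has to arrive as exactly one record. In Hadoop terms that would be a FileInputFormat subclass whose isSplitable returns false, plugged into StreamingContext.fileStream. The Spark-free Python sketch below only models that contract. The helper name whole_file_records is hypothetical and not part of any Spark or Hadoop API.

```python
import os
from typing import Iterable, Iterator, Tuple


def whole_file_records(paths: Iterable[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield one (path, content) record per file, never splitting a file.

    Hypothetical helper: it mimics the contract of a non-splittable
    whole-file input format, not an actual Spark or Hadoop API.
    """
    for path in paths:
        with open(path, "rb") as f:
            # Read the entire file so the PDF header, body, xref table and
            # trailer all end up in the same record.
            yield os.path.abspath(path), f.read()
```

For example, whole_file_records(["a.pdf", "b.pdf"]) yields two records, each holding one complete file. Reading whole files keeps the design simple, but every PDF must fit in memory on the worker that reads it.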

Re: streaming pdf

2018-11-19 Thread Jörn Franke
Well, I am not so sure about the use cases, but what about using StreamingContext.fileStream?
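StreamingContext.fileStream monitors a directory and, in each batch, reads the files that have newly appeared there through the given input format; its optional filter and newFilesOnly parameters narrow that selection. As a rough, Spark-free illustration of the per-batch selection, the Python sketch below returns the files not returned by earlier calls. The name poll_new_files is hypothetical, and Spark itself selects files by modification time rather than by remembering paths.

```python
import os
from typing import Callable, List, Set


def poll_new_files(directory: str,
                   seen: Set[str],
                   path_filter: Callable[[str], bool] = lambda p: True) -> List[str]:
    """Return files that appeared in `directory` since the previous call.

    Simplified, Spark-free model of one fileStream batch: instead of
    modification-time windows, it remembers returned paths in `seen`.
    """
    batch = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only regular files that pass the filter and were not seen before.
        if os.path.isfile(path) and path not in seen and path_filter(path):
            seen.add(path)
            batch.append(path)
    return batch
```

Spark's streaming guide asks that files be moved or renamed atomically into the monitored directory; the same caveat applies here, since otherwise a PDF that is still being written could be picked up half-finished.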

Re: streaming pdf

2018-11-19 Thread Nicolas Paris
On Mon, Nov 19, 2018 at 07:23:10AM +0100, Jörn Franke wrote:
> Why does it have to be a stream?
>
Right now I manage the pipelines as Spark batch processing. Moving to streaming would bring some improvements, such as:
- simplification of the pipeline
- more frequent data ingestion
- better resource

Re: streaming pdf

2018-11-18 Thread Jörn Franke
Why does it have to be a stream?
> On 18.11.2018 at 23:29, Nicolas Paris wrote:
>
> Hi
>
> I have pdf to load into spark with at least
> format. I have considered some options:
>
> - spark streaming does not provide a native file stream for binary with
> variable size (binaryRecordStream

streaming pdf

2018-11-18 Thread Nicolas Paris
Hi

I have PDFs to load into Spark with at least format. I have considered some options:
- Spark Streaming does not provide a native file stream for variable-size binary files (binaryRecordsStream assumes a constant record size), so I would have to write my own receiver.
- Structured Streaming
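To see why a constant record size does not fit PDFs, the following Spark-free Python sketch cuts one file's bytes into fixed-length records, which is the assumption behind binaryRecordsStream(directory, recordLength). The helper fixed_length_records is hypothetical. It rejects a file whose size is not a multiple of the record length, and even when the size happens to match, a single PDF ends up spread across several records.

```python
from typing import List


def fixed_length_records(data: bytes, record_length: int) -> List[bytes]:
    """Cut one file's bytes into records of exactly `record_length` bytes.

    Hypothetical model of the fixed-size assumption behind
    binaryRecordsStream; this sketch rejects a short trailing chunk.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if len(data) % record_length != 0:
        raise ValueError(
            f"{len(data)} bytes is not a multiple of record_length={record_length}")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]
```

For instance, b"abcdef" with a record length of 3 becomes [b"abc", b"def"], whereas a 19-byte file cannot be cut into 8-byte records at all. Variable-size PDFs hit exactly this mismatch.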