You would also have to write your own input format, but that is not very
complicated (and it is probably recommended for the PDF case anyway).
> On 20.11.2018 at 08:06, Jörn Franke wrote:
>
> Well, I am not so sure about the use cases, but what about using
> StreamingContext.fileStream?
>
Well, I am not so sure about the use cases, but what about using
StreamingContext.fileStream?
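
To make the fileStream suggestion a bit more concrete without needing a cluster, here is a minimal plain-Python sketch of the behaviour one would want from a directory-monitoring stream combined with a whole-file input format: each polling round yields one (filename, bytes) record per newly seen file, whatever its size. This is not Spark code and does not reproduce Spark's actual file-detection logic; `poll_new_files`, `demo` and the `seen` set are hypothetical names used only for illustration.

```python
import os
import tempfile


def poll_new_files(directory, seen):
    """Return one (filename, content) record per file not seen in earlier polls."""
    batch = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name in seen or not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            # The whole file becomes a single record; it is never split.
            batch.append((name, handle.read()))
        seen.add(name)
    return batch


def demo():
    """Simulate two polling rounds over a temporary directory (illustration only)."""
    with tempfile.TemporaryDirectory() as directory:
        seen = set()
        with open(os.path.join(directory, "a.pdf"), "wb") as handle:
            handle.write(b"%PDF-1.4 short")
        first = poll_new_files(directory, seen)
        with open(os.path.join(directory, "b.pdf"), "wb") as handle:
            handle.write(b"%PDF-1.4 a much longer document body")
        second = poll_new_files(directory, seen)
        return first, second
```

Calling `demo()` returns two batches: the first contains only `a.pdf`, the second only `b.pdf`, each with its complete byte content. A custom input format would play the role of the "one record per file" rule here.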
On Mon, Nov 19, 2018 at 07:23:10AM +0100, Jörn Franke wrote:
> Why does it have to be a stream?
>
Right now I manage the pipelines as Spark batch processing. Moving to
streaming would bring some improvements, such as:
- simplification of the pipeline
- more frequent data ingestion
- better resource
Why does it have to be a stream?
> On 18.11.2018 at 23:29, Nicolas Paris wrote:
>
> Hi
>
> I have pdf to load into spark with at least
> format. I have considered some options:
>
> - spark streaming does not provide a native file stream for binary with
> variable size (binaryRecordStream
Hi
I have PDF files to load into Spark with at least
format. I have considered some options:
- Spark Streaming does not provide a native file stream for binary files of
variable size (binaryRecordsStream requires a constant record length), so I
would have to write my own receiver.
- Structured streaming
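
The constant-size limitation mentioned in the first option can be illustrated with a small plain-Python sketch. It is not Spark code and makes no claim about how Spark handles a trailing partial record; it only shows why cutting data into fixed-length records does not line up with documents of different sizes, whereas a (name, content) record per file does. The helper names `fixed_length_records` and `whole_file_records` and the sample byte strings are made up for the example.

```python
def fixed_length_records(data, record_length):
    """Cut a byte string into consecutive chunks of record_length bytes."""
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def whole_file_records(files):
    """Keep each file as a single (name, content) record, whatever its size."""
    return [(name, content) for name, content in files.items()]


# Two stand-in "PDFs" of different sizes (hypothetical sample data).
files = {
    "small.pdf": b"%PDF-1.4 tiny",           # 13 bytes
    "large.pdf": b"%PDF-1.4 " + b"x" * 40,   # 49 bytes
}

# Fixed-length framing: the document is scattered over several records,
# and the last one is shorter than the declared record length.
chunks = fixed_length_records(files["large.pdf"], 16)

# Whole-file framing: one record per document, regardless of its length.
records = whole_file_records(files)
```

With a record length of 16, the 49-byte file ends up in four chunks and the last chunk holds a single byte, so no single record corresponds to a document. The whole-file variant keeps the two documents as two records.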