Re: streaming pdf

2018-11-18 Thread Jörn Franke
Why does it have to be a stream?

> On 18.11.2018 at 23:29, Nicolas Paris wrote:
> 
> Hi
> 
> I have pdf to load into spark with at least 
> format. I have considered some options:
> 
> - Spark Streaming does not provide a native file stream for binary files
>  of variable size (binaryRecordsStream requires a constant record size),
>  so I would have to write my own receiver.
> 
> - Structured Streaming can process avro/parquet/orc files
>  containing pdfs, but this makes things more complicated than
>  monitoring a simple folder containing pdfs.
> 
> - Kafka is not designed to handle messages > 100KB, and for this reason
>  it is not a good option to use in the stream pipeline.
> 
> Does anybody have a suggestion?
> 
> Thanks,
> 
> -- 
> nicolas
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 




streaming pdf

2018-11-18 Thread Nicolas Paris
Hi

I have pdf to load into spark with at least 
format. I have considered some options:

- Spark Streaming does not provide a native file stream for binary files
  of variable size (binaryRecordsStream requires a constant record size),
  so I would have to write my own receiver.

- Structured Streaming can process avro/parquet/orc files
  containing pdfs, but this makes things more complicated than
  monitoring a simple folder containing pdfs.

- Kafka is not designed to handle messages > 100KB, and for this reason
  it is not a good option to use in the stream pipeline.

Does anybody have a suggestion?

Thanks,

-- 
nicolas
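
The "monitor a simple folder" idea from the message above can be sketched in
plain Python. This is only an illustration of the polling logic, not a Spark
API: the function name new_pdfs and the in-memory `seen` set are assumptions
made for this sketch, and a custom receiver would need its own bookkeeping.
Each polling pass returns every new PDF as one whole, variable-size binary
record.

```python
import tempfile
from pathlib import Path


def new_pdfs(folder, seen):
    """Return (path, bytes) pairs for PDFs in `folder` not yet listed in `seen`.

    `seen` is a set of file names; it is updated in place so that the next
    polling pass skips files that were already returned.
    """
    batch = []
    for path in sorted(Path(folder).glob("*.pdf")):
        if path.name in seen:
            continue
        # The whole file becomes a single record, whatever its size.
        batch.append((str(path), path.read_bytes()))
        seen.add(path.name)
    return batch


def demo():
    """Simulate two polling passes over a folder that receives two PDFs."""
    with tempfile.TemporaryDirectory() as folder:
        seen = set()
        Path(folder, "a.pdf").write_bytes(b"%PDF-1.4 small")
        first = new_pdfs(folder, seen)
        Path(folder, "b.pdf").write_bytes(b"%PDF-1.4 " + b"x" * 1000)
        second = new_pdfs(folder, seen)
        # Report only record sizes: the two records differ in length.
        return [len(d) for _, d in first], [len(d) for _, d in second]
```

If streaming is not a hard requirement, a batch job can get a similar
one-record-per-file view with SparkContext.binaryFiles, which reads each file
in a directory whole.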




CVE-2018-17190: Unsecured Apache Spark standalone executes user code

2018-11-18 Thread Sean Owen
Severity: Low

Vendor: The Apache Software Foundation

Versions Affected:
All versions of Apache Spark

Description:
Spark's standalone resource manager accepts code to execute on a 'master' host,
that then runs that code on 'worker' hosts. The master itself does not, by
design, execute user code. A specially-crafted request to the master can,
however, cause the master to execute code too. Note that this does not affect
standalone clusters with authentication enabled. While the master host
typically has less outbound access to other resources than a worker, the
execution of code on the master is nevertheless unexpected.

Mitigation:
Enable authentication on any Spark standalone cluster that is not otherwise
secured from unwanted access, for example by network-level restrictions. Use
spark.authenticate and related security properties described at
https://spark.apache.org/docs/latest/security.html
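
As a rough illustration of the mitigation above, a spark-defaults.conf
fragment might look like the following. The two property names are taken from
the Spark security documentation; the secret value is a placeholder used only
for this sketch, and the same secret is assumed to be configured on every node
of the standalone cluster.

```
# spark-defaults.conf (sketch; the secret below is a placeholder)
spark.authenticate         true
spark.authenticate.secret  <shared-secret>
```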

Credit:
Andre Protas, Apple Information Security

References:
https://spark.apache.org/security.html
