date:20180726

Re: How to read json data from kafka and store to hdfs with spark structured streaming?

2018-07-26 Thread Tathagata Das

Are you writing multiple streaming query outputs to the same location? If so, I can see this error occurring. Multiple streaming queries writing to the same directory is not supported.
On Tue, Jul 24, 2018 at 3:38 PM, dddaaa wrote:
> I'm trying to read json messages from kafka and store them in

Re: Backpressure initial rate not working

2018-07-26 Thread Biplob Biswas

Hi Todd,
Thanks a lot, that works. Although I am curious whether you know why the initialRate setting is not kicking in? But for now the pipeline is usable again. Thanks a lot.
Thanks & Regards
Biplob Biswas
On Thu, Jul 26, 2018 at 3:03 PM Todd Nist wrote:
> Have you tried reducing the

Re: Exceptions with simplest Structured Streaming example

2018-07-26 Thread Tathagata Das

Unfortunately, your output is not visible in the email that we see. Was it an image that somehow got removed? It may be best to copy the output text (i.e. the error message) into the email.
On Thu, Jul 26, 2018 at 5:41 AM, Jonathan Apple wrote:
> Hello,
>
> There is a streaming Word Count example at

Re: Exceptions with simplest Structured Streaming example

2018-07-26 Thread Jonathan Apple

(My apologies; I used Nabble to post and it stripped out the HTML.) The original message is below, but note that we just had the issue solved on Stack Overflow: https://stackoverflow.com/questions/51541134/pyspark-exceptions-with-simplest-structured-streaming-example Turns out it's a known issue

Re: Backpressure initial rate not working

2018-07-26 Thread Biplob Biswas

Did anyone face a similar issue? Is there any viable way to solve this?
Thanks & Regards
Biplob Biswas
On Wed, Jul 25, 2018 at 4:23 PM Biplob Biswas wrote:
> I have enabled the spark.streaming.backpressure.enabled setting and also
> set spark.streaming.backpressure.initialRate to 15000, but my

Re: Backpressure initial rate not working

2018-07-26 Thread Todd Nist

Hi Biplob,
How many partitions are on the topic you are reading from, and have you set the maxRatePerPartition? IIRC, Spark back pressure is calculated as follows:
*Spark back pressure:*
Back pressure is calculated off of the following:
• maxRatePerPartition=200
• batchInterval 30s
• 3

Re: Backpressure initial rate not working

2018-07-26 Thread Biplob Biswas

Hi Todd,
Thanks for the reply. I have the maxRatePerPartition set as well. Below is the spark-submit config we used and still got the issue. Also, the *batch interval is set at 10s* and the *number of partitions on the topic is set to 4*:
spark2-submit --name "${YARN_NAME}" \
  --master yarn \

Exceptions with simplest Structured Streaming example

2018-07-26 Thread Jonathan Apple

Hello,
There is a streaming Word Count example at the beginning of the Structured Streaming Programming Guide. First, we execute *nc -lk * in a separate terminal. Next, following the Python code, we have

Re: Backpressure initial rate not working

2018-07-26 Thread Todd Nist

Have you tried reducing the maxRatePerPartition to a lower value? Based on your settings, I believe you are going to be able to pull *600K* worth of messages from Kafka, basically:
• maxRatePerPartition=15000
• batchInterval 10s
• 4 partitions on Ingest topic
This results in a maximum
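
The arithmetic behind the *600K* figure in the message above can be sketched in Python. This is only a sketch, assuming the per-batch ceiling is the product of the three listed settings, with maxRatePerPartition read as records per second per partition. The function name max_records_per_batch is hypothetical and is not part of Spark.

```python
def max_records_per_batch(max_rate_per_partition: int,
                          batch_interval_s: int,
                          num_partitions: int) -> int:
    """Upper bound on records one batch can pull from Kafka.

    Assumes max_rate_per_partition is in records per second per partition
    (hypothetical helper for illustration, not a Spark API).
    """
    return max_rate_per_partition * batch_interval_s * num_partitions


# Settings quoted in the thread: 15000 records/s/partition, 10 s batches, 4 partitions.
ceiling = max_records_per_batch(15000, 10, 4)
print(ceiling)  # 600000
```

Under the same assumption, lowering maxRatePerPartition lowers this ceiling proportionally, which is the change the message suggests trying.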

Optimizing a join with bucketing

2018-07-26 Thread Vitaliy Pisarev

I am joining two entities. One of the entities weighs ~0.5 TB. The other weighs ~16 GB. Both are stored in Parquet. Another trait of the problem is that the "smaller" entity does not change, so I figured I'd pre-bucket it to improve performance.
* What are the guidelines for deciding the best

10 matches