Yes, we do something very similar and it's working well:

Kafka ->
Spark Streaming (write temp files, serialized RDDs) ->
Spark Batch Application (build partitioned Parquet files on HDFS; this is
needed because building reasonably sized Parquet files is too slow to do
inside the streaming job) ->
query with
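A rough sketch of the batch step described above, in Scala - the paths, the
Event type, and the partition column are placeholders, and the staging
format is whatever the streaming job writes (here assumed to be RDDs saved
with saveAsObjectFile):

    import org.apache.spark.sql.SparkSession

    // Hypothetical record type staged by the streaming job.
    case class Event(id: String, eventDate: String, payload: String)

    object ParquetCompactor {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("parquet-compactor").getOrCreate()
        import spark.implicits._

        // Read the serialized RDDs the streaming stage wrote as temp files.
        val staged = spark.sparkContext
          .objectFile[Event]("hdfs:///staging/events/*")
          .toDS()

        // Rewriting in batch lets each partition accumulate enough data
        // for reasonably sized Parquet files.
        staged.write
          .mode("append")
          .partitionBy("eventDate")
          .parquet("hdfs:///warehouse/events")

        spark.stop()
      }
    }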
> ...: 88
>
> Topic: CEQReceiver   Partition: 4   Leader: 89   Replicas: 89   Isr: 89
>
> *From:* Jeff Nadler [mailto:jnad...@srcginc.com]
> *Sent:* Wednesday, September 14, 2016 12:46 PM
> *To:* Rachana Srivastava
> *Cc:* user@spark.apache.org; d...@spark
Have you checked your Kafka brokers to be certain that data is going to all
5 partitions? We use something very similar (but in Scala) and have no
problems.
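One quick way to check, using the stock Kafka CLI of that era (host
addresses are placeholders) - describe the topic and compare per-partition
end offsets over time:

    kafka-topics.sh --describe --topic CEQReceiver --zookeeper zk1:2181

    kafka-run-class.sh kafka.tools.GetOffsetShell \
      --broker-list broker1:9092 --topic CEQReceiver --time -1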
Also you might not get the best response blasting both user+dev lists like
this. Normally you'd want to use 'user' only.
-Jeff
Yes I'll test that next.

On Sep 9, 2016 5:36 PM, "Cody Koeninger" <c...@koeninger.org> wrote:

> Does the same thing happen if you're only using direct stream plus back
> pressure, not the receiver stream?
>
> On Sep 9, 2016 6:41 PM, "Jeff Nadler" <jnad...@srcginc.com> wrote:
it is eventually consuming 1 record / second / partition.
This happens even though there's no scheduling delay, and the
receiver-based stream does not appear to be throttled.
Anyone ever see anything like this?
Thanks!
Jeff Nadler
Aerohive Networks
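A sketch of the direct-stream-plus-backpressure test Cody suggests above,
against the spark-streaming-kafka-0-10 API; broker list, group id, and
batch interval are assumptions:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val conf = new SparkConf()
      .setAppName("direct-stream-only-test")
      .set("spark.streaming.backpressure.enabled", "true")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "direct-stream-only-test",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("CEQReceiver"), kafkaParams))

    // Minimal work per batch, so throughput reflects ingestion only.
    stream.count().print()

    ssc.start()
    ssc.awaitTermination()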
Your receiver must extend Receiver[String]. Try changing it to extend
Receiver[Message]?
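Something like this minimal sketch, where Message stands in for whatever
type the original code actually stores:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    // Placeholder for the real message type.
    case class Message(body: String)

    // The Receiver type parameter must match what store() is given and
    // what the resulting DStream[Message] will carry.
    class MessageReceiver
        extends Receiver[Message](StorageLevel.MEMORY_AND_DISK_2) {

      def onStart(): Unit = {
        new Thread("message-receiver") {
          override def run(): Unit = {
            while (!isStopped()) {
              store(Message("..."))  // replace with a real read of the source
            }
          }
        }.start()
      }

      // Nothing to clean up; the thread exits once isStopped() is true.
      def onStop(): Unit = ()
    }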
On Mon, Oct 12, 2015 at 2:03 PM, Something Something <
mailinglist...@gmail.com> wrote:
> In my custom receiver for Spark Streaming I have code such as this:
>
>
Gerard - any chance this is related to task locality waiting? Can you
try (just as a diagnostic) something like this, and see whether the
unexpected delay goes away?

.set("spark.locality.wait", "0")
On Tue, Oct 6, 2015 at 12:00 PM, Gerard Maas wrote:
> Hi Cody,
>
> The job is
Spark standalone doesn't come with a UI for submitting jobs. Some Hadoop
distros might; for example, EMR in AWS has a job-submit UI.

Spark submit just calls a REST API, so you could build any UI you want on
top of that...
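For standalone cluster mode there's a submission endpoint on the master
(port 6066 by default). The exact JSON shape below is from memory and worth
verifying against your Spark version:

    curl -X POST http://master:6066/v1/submissions/create \
      --header "Content-Type:application/json" \
      --data '{
        "action": "CreateSubmissionRequest",
        "appResource": "hdfs:///apps/myapp.jar",
        "mainClass": "com.example.Main",
        "appArgs": [],
        "clientSparkVersion": "1.5.1",
        "environmentVariables": { "SPARK_ENV_LOADED": "1" },
        "sparkProperties": {
          "spark.master": "spark://master:7077",
          "spark.app.name": "myapp",
          "spark.submit.deployMode": "cluster",
          "spark.jars": "hdfs:///apps/myapp.jar"
        }
      }'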
On Tue, Oct 6, 2015 at 9:37 AM, shahid qadri wrote:

>> ...submit pyspark job, can you
>> point me to Spark submit REST api
>>
>> On Oct 6, 2015, at 10:25 PM, Jeff Nadler <jnad...@srcginc.com> wrote:
>>
>>
>> Spark standalone doesn't come with a UI for submitting jobs. Some
>> Hadoop distros might; for example, EMR in AWS has a job-submit UI.
While investigating performance challenges in a Streaming application using
UpdateStateByKey, I found that serialization of state was a meaningful (not
dominant) portion of our execution time.
In StateDStream.scala, serialized persistence is required:
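Paraphrasing from memory of the 1.x-era sources (verify against your
version), the line in the StateDStream constructor is roughly:

    // StateDStream forces serialized caching of the state RDDs:
    super.persist(StorageLevel.MEMORY_ONLY_SER)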
You can run multiple Spark clusters against one ZK cluster. Just use this
config to set independent ZK roots for each cluster:
spark.deploy.zookeeper.dir
The directory in ZooKeeper to store recovery state (default: /spark).
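For example (hosts and paths are placeholders), each cluster's masters get
the same recovery settings except for the dir:

    # masters of cluster A
    spark.deploy.recoveryMode=ZOOKEEPER
    spark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181
    spark.deploy.zookeeper.dir=/spark-cluster-a

    # masters of cluster B: identical, except
    spark.deploy.zookeeper.dir=/spark-cluster-b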
-Jeff
From: Sean Owen so...@cloudera.com
To: Akhil Das
JavaDStreamLike is used for Java code, so returning a Scala DStream is not
reasonable. You can fix this by submitting a PR, or I can help you fix it.
Thanks
Jerry
*From:* Jeff Nadler [mailto:jnad...@srcginc.com]
*Sent:* Monday, January 19, 2015 2:04 PM
*To:* user@spark.apache.org
*Subject:*