Hi,
We have been using Spark Kafka streaming for real-time processing with
success. The scale of this stream has been increasing with data growth, and
we have been able to scale up by adding more brokers to the Kafka cluster,
adding more partitions to the topic, and adding more executors to the Spark
application.
"The dataset is 100gb at most, the spills can up to 10T-100T"
-- I have had the same experiences, although not to this extreme (the
spills were < 10T while the input was ~ 100s gb) and haven't found any
solution yet. I don't believe this is related to input data format. in my
case, I got my input
> - ... to redeploy Spark)
> - write / find equivalent code yourself
>
> If you want to build a patched version of the subproject and need a hand,
> just ask on the list.
>
>
> On Fri, Jan 22, 2016 at 1:30 PM, Charles Chao
> wrote:
>
Hi,

I have been using DirectKafkaInputDStream in Spark Streaming to consume Kafka
messages and it's been working very well. Now I have the need to batch-process
messages from Kafka, for example, retrieve all messages every hour and process
them, outputting to destinations like Hive or HDFS. What would be the
recommended way to do this?
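For the batch use case described above, Spark 1.x offers KafkaUtils.createRDD, which reads a fixed set of offset ranges as an RDD instead of a stream. A minimal sketch; the broker list, topic name, offsets, and output path are placeholders, and in practice the fromOffset of each run would come from offsets persisted by the previous run (e.g. in ZooKeeper, HDFS, or a database):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

object HourlyKafkaBatch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hourly-kafka-batch"))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

    // One OffsetRange per topic-partition; the numeric offsets here are
    // placeholders for the saved position of the previous hourly run.
    val offsetRanges = Array(
      OffsetRange("my-topic", 0, 0L, 10000L),
      OffsetRange("my-topic", 1, 0L, 10000L)
    )

    // Reads exactly the given offset ranges as a batch RDD of (key, value).
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges)

    // Process, then write out; a plain HDFS write is shown here, while a
    // Hive destination would go through HiveContext instead.
    rdd.map(_._2).saveAsTextFile("hdfs:///data/kafka/hourly")
  }
}
```

Because the offset ranges are explicit, the job gets exactly-once batch semantics for the read side as long as the offsets are committed only after the output succeeds.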
>
>DR
>
>On 09/09/2015 11:50 AM, Charles Chao wrote:
>> I have encountered the same problem after migrating from 1.2.2 to 1.3.0.
>> After some searching this appears to be a bug introduced in 1.3. Hopefully
>> it's fixed in 1.4.
>>
>> Thanks,
I have encountered the same problem after migrating from 1.2.2 to 1.3.0.
After some searching this appears to be a bug introduced in 1.3. Hopefully
it's fixed in 1.4.
Thanks,
Charles
On 9/9/15, 7:30 AM, "David Rosenstrauch" wrote:
>Standalone.
>
>On 09/08/2015 11:18 PM, Jeff Zhang wrote: