Hi Junfeng,

Is your Kafka topic partitioned?

Are you referring to the duration or the CPU time spent by the job as being 20%
to 50% higher than when running in local mode?
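
To make that comparison concrete, here is a minimal sketch of how the two
figures could be computed separately. The function name and the numbers are
placeholders I made up for illustration, not measurements from your job; you
would substitute the batch durations and task CPU times you actually observe.

```python
def relative_overhead(local_seconds: float, yarn_seconds: float) -> float:
    """Return how much slower the YARN run is, as a percentage of the local run."""
    if local_seconds <= 0:
        raise ValueError("local_seconds must be positive")
    return (yarn_seconds - local_seconds) / local_seconds * 100.0


# Placeholder numbers for illustration only (assumed, not taken from this thread).
duration_overhead = relative_overhead(local_seconds=10.0, yarn_seconds=13.0)
cpu_overhead = relative_overhead(local_seconds=8.0, yarn_seconds=8.0)

print(f"batch duration overhead: {duration_overhead:.0f}%")
print(f"task CPU time overhead:  {cpu_overhead:.0f}%")
```

If the duration is higher but the CPU time is roughly the same, the extra time
is probably spent waiting (scheduling, network, I/O) rather than computing.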

Thanks & Regards
Gopal 


> On 09-Apr-2018, at 11:42 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> 
> Probably network / shuffling cost? Or broadcast variables? Can you provide 
> more details about what you do, and some timings?
> 
>> On 9. Apr 2018, at 07:07, Junfeng Chen <darou...@gmail.com> wrote:
>> 
>> I have written a Spark Streaming application that reads Kafka data, 
>> converts the JSON data to Parquet, and saves it to HDFS. 
>> What puzzles me is that the app's processing time in YARN mode is 20% to 
>> 50% longer than in local mode. My cluster has three nodes with three 
>> node managers, and all three hosts have the same hardware: 40 cores and 
>> 256 GB of memory.
>> 
>> Why is that, and how can I fix it?
>> 
>> Regards,
>> Junfeng Chen
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


