Re: how to integrate Apache Kafka with Spark?

2016-12-28 Thread Tushar Adeshara
Please see the links below, depending on your version of Spark:


2.x  http://spark.apache.org/docs/latest/streaming-kafka-integration.html



1.6.x https://spark.apache.org/docs/1.6.3/streaming-kafka-integration.html
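
For reference, a minimal sketch of the 2.x direct-stream approach from the guide
above. It assumes the spark-streaming-kafka-0-10 artifact is on the classpath and
a Kafka broker at localhost:9092 with a topic named "events"; the broker address,
group id, and topic name are all placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaWordCount")
        val ssc  = new StreamingContext(conf, Seconds(10))

        // Kafka consumer configuration; broker address and group id are placeholders.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "localhost:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "spark-streaming-example",
          "auto.offset.reset"  -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        // Direct stream over the "events" topic.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Simple word count over the message values, printed each batch.
        stream.map(record => record.value)
              .flatMap(line => line.split("\\s+"))
              .map(word => (word, 1L))
              .reduceByKey(_ + _)
              .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

The 1.6.x API is different (its KafkaUtils.createDirectStream takes the Kafka
parameters and a topic set directly), so follow the 1.6.3 guide if you are on
that line.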


Regards,
Tushar Adeshara
Technical Specialist - Analytics Practice
Persistent Systems Ltd. | Partners in Innovation | 
www.persistentsys.com



From: sathyanarayanan mudhaliyar <sathyanarayananmudhali...@gmail.com>
Sent: 28 December 2016 12:27
To: user@spark.apache.org
Subject: how to integrate Apache Kafka with Spark?

How do I take input from Apache Kafka into Apache Spark Streaming for stream
processing?

-sathya




Re: Spark - Partitions

2017-10-13 Thread Tushar Adeshara
You can also try coalesce, as it avoids a full shuffle.
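
For illustration, a minimal sketch of the difference, assuming a Hive-enabled
SparkSession and hypothetical database/table names:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("CoalesceExample")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical transformation that produces ~2000 shuffle partitions.
    val result = spark.table("db.source_table")
      .groupBy("some_key")
      .count()

    // coalesce(200) merges existing partitions without a full shuffle, so the
    // write emits roughly 200 files instead of 2000. It can only reduce the
    // partition count, and coalescing too aggressively can leave a few large,
    // slow tasks.
    result.coalesce(200)
      .write
      .mode("overwrite")
      .saveAsTable("db.target_table")

    // repartition(200) gives evenly sized partitions at the cost of a full shuffle:
    // result.repartition(200).write.mode("overwrite").saveAsTable("db.target_table")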


Regards,
Tushar Adeshara
Technical Specialist – Analytics Practice
Cell: +91-81490 04192
Persistent Systems Ltd. | Partners in Innovation | 
www.persistentsys.com



From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
Sent: 13 October 2017 09:35
To: user @spark
Subject: Spark - Partitions

Hi,

I am reading a Hive query and writing the data back into Hive after doing some
transformations.

I have changed the setting spark.sql.shuffle.partitions to 2000, and since then
the job completes fast, but the main problem is that I am getting 2000 files for
each partition, each about 10 MB in size.

Is there a way to get the same performance but write a smaller number of files?

I am trying repartition now but would like to know if there are any other 
options.

Thanks,
Asmath
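
Another option (a sketch only; database, table, and column names are made up) is
to keep spark.sql.shuffle.partitions high for the expensive stages and repartition
by the Hive partition column just before the write, so each partition directory
receives one larger file rather than up to 2000 small ones:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder()
      .appName("WriteFewerFiles")
      .enableHiveSupport()
      .getOrCreate()

    // Keep high parallelism for the heavy transformations.
    spark.conf.set("spark.sql.shuffle.partitions", "2000")

    val transformed = spark.table("db.source_table")
      .groupBy("event_date", "some_key")
      .count()

    // Hash-partitioning by the Hive partition column puts all rows for a given
    // event_date in the same task, so each partition directory gets one file.
    transformed
      .repartition(col("event_date"))
      .write
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("db.target_table")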