Re: Spark - Partitions
You can also try coalesce(), as it avoids a full shuffle: it merges existing partitions in place rather than redistributing all the data the way repartition() does.

Regards,
Tushar Adeshara
Technical Specialist – Analytics Practice
Cell: +91-81490 04192
Persistent Systems Ltd. | Partners in Innovation | www.persistentsys.com

From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
Sent: 13 October 2017 09:35
To: user @spark
Subject: Spark - Partitions

Hi,

I am reading a Hive query and writing the data back into Hive after doing some transformations. I changed the setting spark.sql.shuffle.partitions to 2000, and since then the job completes fast, but the main problem is that I am getting 2000 files for each partition, each about 10 MB in size. Is there a way to get the same performance but write fewer files? I am trying repartition now, but would like to know if there are any other options.

Thanks,
Asmath

DISCLAIMER: This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
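A minimal sketch of the coalesce suggestion, assuming hypothetical Hive table names `source_table` and `target_table` and an illustrative target of 50 output files (none of these appear in the original thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ReduceOutputFiles")
  .enableHiveSupport()
  .getOrCreate()

// Keep shuffle parallelism high so the transformations stay fast,
// as in the original question.
spark.conf.set("spark.sql.shuffle.partitions", "2000")

// Hypothetical source query; the real transformations go here.
val transformed = spark.sql("SELECT * FROM source_table")

// coalesce(50) merges the 2000 shuffle partitions down to 50 before
// the write, so only ~50 files are produced, without the full shuffle
// that repartition(50) would trigger.
transformed
  .coalesce(50)
  .write
  .mode("overwrite")
  .insertInto("target_table")
```

One caveat worth noting: because coalesce is a narrow transformation, Spark may push the reduced partition count upstream and shrink the parallelism of the preceding stage; if the job slows down as a result, repartition(50) before the write trades one extra shuffle for keeping the transformations at full parallelism.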
Re: how to integrate Apache Kafka with spark ?
Please see the links below, depending on your version of Spark.

2.x: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
1.6.x: https://spark.apache.org/docs/1.6.3/streaming-kafka-integration.html

Regards,
Tushar Adeshara
Technical Specialist - Analytics Practice
Persistent Systems Ltd. | Partners in Innovation | www.persistentsys.com

From: sathyanarayanan mudhaliyar <sathyanarayananmudhali...@gmail.com>
Sent: 28 December 2016 12:27
To: user@spark.apache.org
Subject: how to integrate Apache Kafka with spark ?

How do I take input from Apache Kafka into Apache Spark Streaming for stream processing?

-sathya
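To make the linked guides concrete, here is a sketch of the direct-stream approach from the spark-streaming-kafka-0-10 integration guide for Spark 2.x; the broker address (`localhost:9092`), topic name (`example-topic`), and consumer group id are placeholder assumptions:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("KafkaStreamExample")
val ssc = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

// Standard Kafka consumer configuration; adjust for your cluster.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092", // assumed broker address
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val topics = Array("example-topic") // assumed topic name
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

// Each record is a Kafka ConsumerRecord; here we count words per batch.
stream.map(record => record.value)
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .print()

ssc.start()
ssc.awaitTermination()
```

Note that the 0-10 direct stream maps Kafka partitions 1:1 onto Spark partitions and manages offsets itself, which is why the linked guide recommends it over the older receiver-based approach for Spark 2.x.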