Try setting this value to your block size. The value is given in bytes, so for a 128 MB block size:

set mapred.min.split.size=134217728;

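In case it helps with picking the number, here is a small Python sketch of the arithmetic. It assumes the split size is chosen as max(minSize, min(maxSize, blockSize)), which is how Hadoop's FileInputFormat in the newer mapreduce API computes it; Hive's combine input format may group files differently, so treat the mapper count as a rough estimate only. The function names and the 1 GB example file are made up for illustration.

```python
# Rough sketch: convert a block size to bytes and estimate the number of splits.
# Assumption: split size = max(min_size, min(max_size, block_size)).

MB = 1024 * 1024


def split_size(block_size, min_size, max_size):
    """Return the split size in bytes under the assumed formula."""
    return max(min_size, min(max_size, block_size))


def estimated_splits(file_size, size):
    """Ceiling division: approximate number of splits (mappers) for one file."""
    return -(-file_size // size)


block = 128 * MB  # value to use for a 128 MB block size
size = split_size(block_size=block, min_size=block, max_size=256 * MB)

print("mapred.min.split.size =", block)
print("splits for a hypothetical 1 GB file:", estimated_splits(1024 * MB, size))
```

A smaller max split size than the block size gives more, smaller splits (more mappers); a larger min split size gives fewer, bigger ones.
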
Sent from my iPhone

On May 7, 2012, at 10:11 PM, Bhavesh Shah <[email protected]> wrote:

> Thanks Nitin for your reply.
>
> In short my Task is
> 1) Initially I want to import the data from MS SQL Server into HDFS using
> SQOOP.
> 2) Through Hive I am processing the data and generating the result in one
> table
> 3) That result containing table from Hive is again exported to MS SQL SERVER
> back.
>
> Actually the data which I am importing from MS SQL Server is very large (near
> about 5,00,000 entries in one table. Like wise I have 30 tables). For this I
> have written a task in Hive which contains only queries (And each query has
> used a lot of joins in it). So due to this the performance is very poor on
> my single local machine ( It takes near about 3 hrs to execute completely). I
> have observed that when I have submitted a single query to Hive CLI it took
> 10-11 jobs to execute completely.
>
> set mapred.min.split.size
> set mapred.max.split.size
> Should this value to be set in bootstrap action while submitting jobs to
> amazon EMR? What value to be set for it as I don't know?
>
> --
> Regards,
> Bhavesh Shah
>
> On Tue, May 8, 2012 at 10:31 AM, Nitin Pawar <[email protected]> wrote:
>
> 1) check the jobtracker url to see how many maps/reducers have been launched
> 2) if you have a large dataset and wants to execute it fast, you set
> mapred.min.split.size and mapred.max.split.size to an optimal value so that
> more mappers will be launched and will finish
> 3) if you are doing joins, there are different ways to go according to the
> data you have and size of data
>
> it will be helpful if you can let us know your datasizes and query details
>
> On Tue, May 8, 2012 at 10:07 AM, Bhavesh Shah <[email protected]> wrote:
>
> Hello all,
> I have written a Hive JDBC code and created a JAR of it. I am running that
> JAR on 10 cluster.
> But the problem as I am using the 10 cluster still the performance is same as
> that on single cluster.
>
> What to do to improve the performance of Hive Jobs? Is there anything
> configuration setting to set before the submitting Hive Jobs to cluster?
> One more thing I want to know is that How can we come to know that is job
> running on all cluster?
>
> Please let me know if anyone knows about it?
>
> --
> Regards,
> Bhavesh Shah
>
> --
> Nitin Pawar
