I failed to note how many maps and reducers there were, because for some reason my instance got terminated. :( I also want to know one thing: if we use multiple nodes, what should the count of maps and reducers be? I am confused about that. How do I decide it?
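For reference, in Hadoop 1.x the map count is not set directly: it falls out of the number of input splits (roughly the input size divided by the split/block size), while the reducer count can be set per job. Below is a minimal sketch of the relevant Hive session settings, assuming a Hadoop 1.x / Hive setup; the values are only illustrative assumptions, not recommendations:

    -- Maps are driven by input splits, so they are influenced indirectly
    -- through the block/split size rather than by setting a count:
    SET dfs.block.size=134217728;        -- 128 MB block size (illustrative)

    -- Reducers can be set explicitly for the session...
    SET mapred.reduce.tasks=20;          -- illustrative; often sized to the cluster's cores

    -- ...or left for Hive to estimate, one reducer per this many input bytes:
    SET hive.exec.reducers.bytes.per.reducer=1000000000;  -- 1 GB (Hive's old default)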
Also I want to try different properties like block size, compressed output, the size of the in-memory buffer, parallel execution, etc. (a sketch of these settings follows after the quoted thread below). Will all of these properties matter for increasing the performance?

Nitin, you have read my whole use case. Is what I did to implement it with the help of Hadoop correct? Is it possible to increase the performance?

Thanks, Nitin, for your reply. :)

--
Regards,
Bhavesh Shah

On Mon, May 14, 2012 at 2:07 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> With a 10-node cluster the performance should improve.
> How many maps and reducers are being launched?
>
>
> On Mon, May 14, 2012 at 1:18 PM, Bhavesh Shah <bhavesh25s...@gmail.com> wrote:
>
>> I have nearly 1 billion records in my relational database. Locally I am
>> currently using just a single-node cluster, but I also tried this on
>> Amazon Elastic MapReduce with 10 nodes, and the time taken to execute
>> the complete program was the same as on my single local machine.
>>
>>
>> On Mon, May 14, 2012 at 1:13 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>>
>>> How many records?
>>>
>>> What is your Hadoop cluster setup? How many nodes?
>>> If you are running Hadoop as a single-node setup on a normal desktop, I
>>> doubt it will be of any help.
>>>
>>> You need a stronger cluster setup for better query runtimes, and of
>>> course query optimization, which I guess you have already taken care of.
>>>
>>>
>>> On Mon, May 14, 2012 at 12:39 PM, Bhavesh Shah <bhavesh25s...@gmail.com> wrote:
>>>
>>>> Hello all,
>>>> My use case is:
>>>> 1) I have a relational database (MS SQL Server) which holds a very
>>>> large amount of data.
>>>> 2) I want to analyze this huge data set and generate reports on it
>>>> after the analysis.
>>>> In this way I have to generate various reports based on different
>>>> analyses.
>>>>
>>>> I tried to implement this using Hive. What I did is:
>>>> 1) I imported all the tables into Hive from MS SQL Server using Sqoop.
>>>> 2) I wrote many queries in Hive, which execute via JDBC against the
>>>> Hive Thrift server.
>>>> 3) I am getting the correct results, in table form, as expected.
>>>> 4) But the problem is that the time required to execute is far too
>>>> long.
>>>> (My complete program executes in about 3-4 hours on a *small
>>>> amount of data*.)
>>>>
>>>> I decided to do this using Hive, and as I said, Hive takes that much
>>>> time to execute, while my organization expects this task to complete
>>>> in less than half an hour.
>>>>
>>>> Now, after spending so much time on the complete execution of this
>>>> task, what should I do? I want to ask one thing:
>>>> *Is this use case possible with Hive?* If it is possible, what should
>>>> I do in my program to increase the performance?
>>>> *And if it is not possible, what is another good way to implement this
>>>> use case?*
>>>>
>>>> Please reply.
>>>> Thanks
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Bhavesh Shah
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>
>>
>> --
>> Regards,
>> Bhavesh Shah
>
>
> --
> Nitin Pawar
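For reference, the properties mentioned at the top of this mail can be tried per Hive session before running the queries. A minimal sketch, again assuming a Hadoop 1.x / Hive setup; the property values are illustrative assumptions to experiment with, not tuned recommendations:

    -- Compress intermediate map output and final job output to cut I/O:
    SET mapred.compress.map.output=true;
    SET hive.exec.compress.intermediate=true;
    SET hive.exec.compress.output=true;

    -- Map-side in-memory sort buffer, in MB (the old default was 100):
    SET io.sort.mb=200;                        -- illustrative

    -- Let Hive run independent stages of a query in parallel:
    SET hive.exec.parallel=true;
    SET hive.exec.parallel.thread.number=8;    -- illustrative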