With a 10-node cluster the performance should improve. How many mappers and reducers are being launched?
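
For a rough idea of what to expect, here is a minimal Python sketch of how the task counts usually scale with input size. The function names are hypothetical, and the defaults (64 MB splits, about 1 GB of input per reducer, a cap of 999 reducers) are assumptions based on common Hadoop and Hive defaults; your cluster's values for dfs.block.size, hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max may differ, and the job log remains the authoritative source.

```python
import math


def estimate_map_tasks(input_bytes, split_bytes=64 * 1024 * 1024):
    """Roughly one map task per input split (split size is an assumption)."""
    return max(1, math.ceil(input_bytes / split_bytes))


def estimate_reduce_tasks(input_bytes,
                          bytes_per_reducer=1_000_000_000,
                          max_reducers=999):
    """Input size divided by bytes-per-reducer, capped at a maximum.

    Both limits are assumed defaults, not values read from a real cluster.
    """
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


if __name__ == "__main__":
    # Compare a small table with a large one (sizes are illustrative only).
    for label, size in [("50 MB", 50 * 1024 ** 2), ("10 GB", 10 * 1024 ** 3)]:
        print(label, estimate_map_tasks(size), estimate_reduce_tasks(size))
```

If the input is small enough to fit in one or two splits, only one or two map tasks run no matter how many nodes are available, which would be consistent with a 10-node cluster taking as long as a single machine.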

On Mon, May 14, 2012 at 1:18 PM, Bhavesh Shah <bhavesh25s...@gmail.com> wrote:

> I have about 1 billion records in my relational database.
> Locally I am currently using just a single-node cluster, but I also tried
> this on Amazon Elastic MapReduce with 10 nodes. The time taken to execute
> the complete program there is the same as on my single local machine.
>
> On Mon, May 14, 2012 at 1:13 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:
>
>> How many records?
>>
>> What is your Hadoop cluster setup? How many nodes?
>> If you are running Hadoop on a single-node setup on a normal desktop,
>> I doubt it will be of any help.
>>
>> You need a stronger cluster setup for better query runtimes, and of
>> course query optimization, which I guess you have already taken care of.
>>
>> On Mon, May 14, 2012 at 12:39 PM, Bhavesh Shah <bhavesh25s...@gmail.com> wrote:
>>
>>> Hello all,
>>> My use case is:
>>> 1) I have a relational database (MS SQL Server) which holds a very
>>>    large amount of data.
>>> 2) I want to analyze this data and generate reports from the analysis.
>>>    In this way I have to generate various reports based on different
>>>    analyses.
>>>
>>> I tried to implement this using Hive. What I did is:
>>> 1) I imported all the tables into Hive from MS SQL Server using Sqoop.
>>> 2) I wrote many Hive queries, which are executed using JDBC on the
>>>    Hive Thrift Server.
>>> 3) I am getting the correct results in table form, as expected.
>>> 4) But the problem is that the execution takes far too long.
>>>    (My complete program takes about 3-4 hours on a *small amount of
>>>    data*.)
>>>
>>> I decided to do this using Hive, and as I said above, that is how long
>>> Hive takes to execute. My organization expects this task to complete
>>> in less than about half an hour.
>>>
>>> Now, after spending so much time on the complete execution of this
>>> task, what should I do?
>>> I want to ask one thing:
>>> *Is this use case possible with Hive?* If so, what should I do in my
>>> program to improve the performance?
>>> *And if not, what is another good way to implement this use case?*
>>>
>>> Please reply.
>>> Thanks
>>>
>>> --
>>> Regards,
>>> Bhavesh Shah
>>
>> --
>> Nitin Pawar
>
> --
> Regards,
> Bhavesh Shah

--
Nitin Pawar