I don't know how many maps and reducers were launched, because my instance
got terminated for some reason.   :(
One thing I want to know: if we use multiple nodes, what should the count
of maps and reducers be?
I am confused about that. How do I decide it?
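For rough intuition (this is my own sketch, not something from the thread; the 64 MB block size and 2 reduce slots per node below are assumed Hadoop 1.x-era defaults and will differ per cluster): in classic MapReduce, one map task is launched per input split, which by default is roughly one HDFS block, so the map count follows from input size divided by block size; for reducers the Hadoop documentation suggests roughly 0.95 or 1.75 times (nodes x reduce slots per node).

```python
import math

def estimate_map_count(total_input_bytes, block_size_bytes=64 * 1024 * 1024):
    """One map task per input split; with defaults a split is ~one HDFS block.

    64 MB is the classic Hadoop 1.x default block size (assumption)."""
    return max(1, math.ceil(total_input_bytes / block_size_bytes))

def estimate_reduce_count(num_nodes, reduce_slots_per_node=2, factor=0.95):
    """Rule of thumb from the Hadoop docs: 0.95 (or 1.75) times the total
    reduce slots in the cluster. 2 slots/node is an assumed default."""
    return max(1, int(factor * num_nodes * reduce_slots_per_node))

# Example: ~1 billion rows at ~100 bytes each is roughly 100 GB of input.
maps = estimate_map_count(100 * 1024**3)   # 1600 maps with 64 MB blocks
reduces = estimate_reduce_count(10)        # 19 reducers on a 10-node cluster
print(maps, reduces)
```

On EMR the actual slots per node depend on the instance type, so treat these numbers only as a starting point for tuning.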

I also want to try different properties such as block size, compressed
output, in-memory buffer size, parallel execution, etc.
Do all these properties matter for improving performance?
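For reference, the properties mentioned above correspond to real Hadoop/Hive settings of that era (Hadoop 1.x / early Hive; names may differ in newer versions). A sketch of what one might try from the Hive CLI:

```sql
-- compress intermediate map output to cut shuffle I/O
SET mapred.compress.map.output=true;
-- in-memory sort buffer used by map tasks, in MB
SET io.sort.mb=200;
-- let independent stages of a Hive query run in parallel
SET hive.exec.parallel=true;
-- the HDFS block size (dfs.block.size) influences the number of map tasks,
-- but it is set cluster-side in hdfs-site.xml and applies at file write time
```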

Nitin, you have read my whole use case. Is what I did to implement it with
Hadoop correct?
Is it possible to increase the performance?

Thanks, Nitin, for your reply.   :)

-- 
Regards,
Bhavesh Shah


On Mon, May 14, 2012 at 2:07 PM, Nitin Pawar <nitinpawar...@gmail.com>wrote:

> with a 10 node cluster the performance should improve.
> how many maps and reducers are being launched?
>
>
> On Mon, May 14, 2012 at 1:18 PM, Bhavesh Shah <bhavesh25s...@gmail.com>wrote:
>
>> I have nearly 1 billion records in my relational database.
>> Currently I am using just a single-node cluster locally, but I also tried
>> this on Amazon Elastic MapReduce with 10 nodes. The time taken to execute
>> the complete program was the same as on my single local machine.
>>
>>
>> On Mon, May 14, 2012 at 1:13 PM, Nitin Pawar <nitinpawar...@gmail.com>wrote:
>>
>>> how many # records?
>>>
>>> what is your hadoop cluster setup? how many nodes?
>>> if you are running hadoop on a single node setup with normal desktop, i
>>> doubt it will be of any help.
>>>
>>> You need a stronger cluster setup for better query runtimes and, of
>>> course, query optimization, which I guess you have already taken care of.
>>>
>>>
>>>
>>> On Mon, May 14, 2012 at 12:39 PM, Bhavesh Shah 
>>> <bhavesh25s...@gmail.com>wrote:
>>>
>>>> Hello all,
>>>> My Use Case is:
>>>> 1) I have a relational database (MS SQL Server) with a very large
>>>> amount of data.
>>>> 2) I want to analyze this huge data set and generate reports from the
>>>> analysis.
>>>> In this way I have to generate various reports based on different
>>>> analyses.
>>>>
>>>> I tried to implement this using Hive. What I did is:
>>>> 1) I imported all tables from MS SQL Server into Hive using Sqoop.
>>>> 2) I wrote many queries in Hive, which are executed over JDBC against
>>>> the Hive Thrift Server.
>>>> 3) I am getting the correct results in table form, as expected.
>>>> 4) But the problem is that the execution time is far too long.
>>>>    (My complete program executes in about 3-4 hours on a *small
>>>> amount of data*.)
>>>>
>>>>
>>>> I decided to do this using Hive, and as I said above, Hive consumes a
>>>> lot of time for execution. My organization expects this task to
>>>> complete in less than about half an hour.
>>>>
>>>> Now, after spending so much time on the complete execution of this
>>>> task, what should I do?
>>>> I want to ask one thing:
>>>> *Is this use case possible with Hive?* If so, what should I do in my
>>>> program to increase the performance?
>>>> *And if not, what is a better way to implement this use case?*
>>>>
>>>> Please reply.
>>>> Thanks
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Bhavesh Shah
>>>>
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>>
>>
>>
>> --
>> Regards,
>> Bhavesh Shah
>>
>>
>
>
> --
> Nitin Pawar
>
>
