Dear Tariq,
No, it is exactly the opposite: we compute the similarity between
documents and insert the results into the database, almost 2,000,000
records in every table.
Best Regards
On 02/19/2013 06:41 PM, Mohammad Tariq wrote:
Hello Masoud,
So you want to pull your data from SQL Server into your Hadoop
cluster first and then do the processing. Please correct me if I am
wrong. You can do that using Sqoop, as mentioned by Hemanth sir. BTW,
what exactly is the kind of processing you are planning to do on
your data?
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Tue, Feb 19, 2013 at 6:44 AM, Hemanth Yamijala
<[email protected] <mailto:[email protected]>> wrote:
Hi,
You could consider using Sqoop: http://sqoop.apache.org/. There
seems to be a SQL Server connector from Microsoft:
http://www.microsoft.com/en-gb/download/details.aspx?id=27584
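Something along these lines should work for pulling one table over
(I have not tested it; the hostname, credentials, and table/directory
names below are just placeholders):

    sqoop import \
      --connect "jdbc:sqlserver://dbhost:1433;databaseName=yourdb" \
      --username youruser --password yourpass \
      --table your_table \
      --target-dir /user/masoud/your_table \
      --num-mappers 8

You would run one such import per table, or script a loop over the
300 tables.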
Thanks
Hemanth
On Tuesday, February 19, 2013, Masoud wrote:
Hello Tariq,
Our database is SQL Server 2008,
and we don't need to develop a professional app; we just need
to develop it fast and get our experiment results soon.
Thanks
On 02/18/2013 11:58 PM, Hemanth Yamijala wrote:
What database is this? Was HBase mentioned?
On Monday, February 18, 2013, Mohammad Tariq wrote:
Hello Masoud,
You can use the Bulk Load feature. You might find it more
efficient than the normal client APIs or using the TableOutputFormat.
The bulk load feature uses a MapReduce job to output table data
in HBase's internal data format, and then directly loads the
generated StoreFiles into a running cluster. Using bulk load will use
less CPU and network resources than simply using the HBase API.
For detailed info you can go here:
http://hbase.apache.org/book/arch.bulk.load.html
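Just to give you a head start, the job setup could look roughly like
this (the table name "similarities", the column family "sim", and the
CSV input layout are made-up examples; adjust them to your schema):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Turns one CSV line "docA,docB,score" into a Put for the HFiles.
  static class SimilarityMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] f = line.toString().split(",");
      byte[] row = Bytes.toBytes(f[0] + "_" + f[1]);
      Put put = new Put(row);
      put.add(Bytes.toBytes("sim"), Bytes.toBytes("score"),
          Bytes.toBytes(f[2]));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "similarity-bulkload");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(SimilarityMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Sorts and partitions the output so the HFiles line up with the
    // table's regions; also picks the right reducer for Put values.
    HFileOutputFormat.configureIncrementalLoad(job,
        new HTable(conf, "similarities"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Once the job finishes, you load the generated files into the table
with the completebulkload tool:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <output-dir> similarities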
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, Feb 18, 2013 at 5:00 PM, Masoud
<[email protected]> wrote:
Dear All,
We are going to run our experiment on a collection of scientific
papers. We must insert data into our database for later
analysis: almost 300 tables, each with 2,000,000 records.
As you know, it takes a lot of time to do this with a single
machine, so we are going to use our Hadoop cluster (32 machines)
and divide the 300 insertion tasks between them.
I need some hints to make progress faster:
1- As far as I know we don't need a Reducer; a Mapper alone is
enough.
2- So we just need to implement a Mapper class with the needed
code. A rough sketch of what I have in mind is below.
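(This is just a sketch; the JDBC URL, the credentials, and the actual
insert logic are placeholders. The input file would simply list the
300 table names, one per line, so each map task handles one table.)

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ParallelInsert {

  // One input line = one table name, so each map task inserts one table.
  static class InsertMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text tableName, Context ctx)
        throws IOException, InterruptedException {
      try {
        Connection conn = DriverManager.getConnection(
            "jdbc:sqlserver://dbhost:1433;databaseName=ourdb",
            "user", "pass");
        insertRecords(conn, tableName.toString());
        conn.close();
      } catch (SQLException e) {
        throw new IOException(e);
      }
    }

    // Placeholder: batch-insert the ~2,000,000 similarity records
    // for this table (PreparedStatement + executeBatch() would go here).
    private void insertRecords(Connection conn, String table)
        throws SQLException {
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "parallel-insert");
    job.setJarByClass(ParallelInsert.class);
    job.setMapperClass(InsertMapper.class);
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0); // map-only, as in point 1
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}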
Please let me know if you have any pointers,
Best Regards
Masoud
--
Masoud Reyhani Hamedani
Ph.D. Candidate
Department of Electronics and Computer Engineering, Hanyang University
Data Mining and Knowledge Engineering Lab,
Room 803 IT/BT Building 17
Haengdang-dong, Sungdong-gu Seoul, Republic of Korea, 133-791
Tel: +82-2-2220-4567
[email protected]
http://agape.hanyang.ac.kr