Dear Tariq

No, exactly the opposite: we compute the similarity between documents and insert the results into the database, with almost 2,000,000 records in every table.

Best Regards

On 02/19/2013 06:41 PM, Mohammad Tariq wrote:
Hello Masoud,

So you want to pull your data from SQL Server to your Hadoop cluster first and then do the processing. Please correct me if I am wrong. You can do that using Sqoop as mentioned by Hemanth sir. BTW, what exactly is the kind of processing you are planning to do on your data?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Tue, Feb 19, 2013 at 6:44 AM, Hemanth Yamijala <[email protected]> wrote:

    Hi,

    You could consider using Sqoop: http://sqoop.apache.org/ There
    seems to be a SQL Server connector from Microsoft:
    http://www.microsoft.com/en-gb/download/details.aspx?id=27584
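
    For illustration only, a Sqoop export along these lines could push
    the computed results from HDFS into SQL Server; the host, database,
    table, and directory names below are placeholders, not values from
    this thread:

        sqoop export \
          --connect "jdbc:sqlserver://dbhost:1433;databaseName=papersdb" \
          --username sqoop_user --password '****' \
          --table similarity_001 \
          --export-dir /user/masoud/similarity_001 \
          --input-fields-terminated-by '\t' \
          -m 8

    Sqoop runs the export as a map-only MapReduce job (-m sets the number
    of parallel mappers), so the inserts are already spread across the
    cluster.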

    Thanks
    Hemanth

    On Tuesday, February 19, 2013, Masoud wrote:

        Hello Tariq,

        Our database is SQL Server 2008,
        and we don't need to develop a professional app, we just need
        to develop it fast and get our experiment results soon.
        Thanks


        On 02/18/2013 11:58 PM, Hemanth Yamijala wrote:
        What database is this? Was HBase mentioned?

        On Monday, February 18, 2013, Mohammad Tariq wrote:

            Hello Masoud,
            You can use the Bulk Load feature. You might find it more
            efficient than the normal client APIs or using the
            TableOutputFormat.

            The bulk load feature uses a MapReduce job to output table
            data in HBase's internal data format, and then directly
            loads the generated StoreFiles into a running cluster.
            Using bulk load will use less CPU and network resources
            than simply using the HBase API.

            For detailed info you can go here:
            http://hbase.apache.org/book/arch.bulk.load.html
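
            As a rough illustration (not from the thread), the driver for
            the HFile-preparation job could look something like this,
            assuming an existing HBase table named "similarity" and a
            hypothetical SimilarityMapper that emits
            ImmutableBytesWritable/Put pairs; the generated files are then
            loaded with the completebulkload tool:

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.Path;
            import org.apache.hadoop.hbase.HBaseConfiguration;
            import org.apache.hadoop.hbase.client.HTable;
            import org.apache.hadoop.hbase.client.Put;
            import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
            import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
            import org.apache.hadoop.mapreduce.Job;
            import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
            import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

            public class BulkLoadDriver {
              public static void main(String[] args) throws Exception {
                Configuration conf = HBaseConfiguration.create();
                Job job = new Job(conf, "prepare-hfiles");
                job.setJarByClass(BulkLoadDriver.class);
                // SimilarityMapper (hypothetical) emits <ImmutableBytesWritable, Put>
                job.setMapperClass(SimilarityMapper.class);
                job.setMapOutputKeyClass(ImmutableBytesWritable.class);
                job.setMapOutputValueClass(Put.class);
                FileInputFormat.addInputPath(job, new Path(args[0]));    // raw similarity data
                FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HFile output dir

                // Sorts and partitions the mapper output to match the table's regions.
                HTable table = new HTable(conf, "similarity");
                HFileOutputFormat.configureIncrementalLoad(job, table);

                System.exit(job.waitForCompletion(true) ? 0 : 1);
                // Then: hadoop jar hbase-<version>.jar completebulkload <output dir> similarity
              }
            }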

            Warm Regards,
            Tariq
            https://mtariq.jux.com/
            cloudfront.blogspot.com


            On Mon, Feb 18, 2013 at 5:00 PM, Masoud
            <[email protected]> wrote:


                Dear All,

                We are going to run our experiment on a set of
                scientific papers.
                We must insert data into our database for later
                consideration: almost 300 tables, each with
                2,000,000 records.
                As you know, it takes a lot of time to do this on a
                single machine, so we are going to use our Hadoop
                cluster (32 machines) and divide the 300 insertion
                tasks between them.
                I need some hints to progress faster:
                1- As far as I know, we don't need a Reducer; just a
                Mapper is enough.
                2- So we only need to implement the Mapper class with
                the needed code.
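
                For what it's worth, a minimal map-only sketch along
                these lines could do the per-table inserts over JDBC;
                the connection string, table name, columns, and the
                tab-separated input format are assumptions, not
                something given in this thread:

                import java.io.IOException;
                import java.sql.Connection;
                import java.sql.DriverManager;
                import java.sql.PreparedStatement;
                import java.sql.SQLException;

                import org.apache.hadoop.io.LongWritable;
                import org.apache.hadoop.io.NullWritable;
                import org.apache.hadoop.io.Text;
                import org.apache.hadoop.mapreduce.Mapper;

                // Map-only job: call job.setNumReduceTasks(0) in the driver so no Reducer runs.
                // Nothing is written via context; the work is the JDBC side effect, so the
                // driver can use NullOutputFormat.
                public class InsertMapper
                    extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

                  private Connection conn;
                  private PreparedStatement stmt;
                  private int pending = 0;

                  @Override
                  protected void setup(Context context) throws IOException {
                    try {
                      // Placeholder connection string and table name.
                      conn = DriverManager.getConnection(
                          "jdbc:sqlserver://dbhost:1433;databaseName=papersdb", "user", "pass");
                      stmt = conn.prepareStatement(
                          "INSERT INTO similarity_001 (doc1, doc2, score) VALUES (?, ?, ?)");
                    } catch (SQLException e) {
                      throw new IOException(e);
                    }
                  }

                  @Override
                  protected void map(LongWritable offset, Text line, Context context)
                      throws IOException {
                    // Assumes input lines of the form: doc1<TAB>doc2<TAB>score
                    String[] f = line.toString().split("\t");
                    try {
                      stmt.setString(1, f[0]);
                      stmt.setString(2, f[1]);
                      stmt.setDouble(3, Double.parseDouble(f[2]));
                      stmt.addBatch();
                      if (++pending >= 1000) {   // flush in batches to limit round trips
                        stmt.executeBatch();
                        pending = 0;
                      }
                    } catch (SQLException e) {
                      throw new IOException(e);
                    }
                  }

                  @Override
                  protected void cleanup(Context context) throws IOException {
                    try {
                      stmt.executeBatch();       // flush whatever is left
                      stmt.close();
                      conn.close();
                    } catch (SQLException e) {
                      throw new IOException(e);
                    }
                  }
                }

                Sqoop's export tool or Hadoop's DBOutputFormat can do much
                the same thing without hand-written JDBC, if that turns out
                to be easier.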

                Please let me know if there is any other point I should consider.

                Best Regards
                Masoud










--
Masoud Reyhani Hamedani
Ph.D. Candidate
Department of Electronics and Computer Engineering, Hanyang University
Data Mining and Knowledge Engineering Lab,
Room 803 IT/BT Building 17
Haengdang-dong, Sungdong-gu Seoul, Republic of Korea, 133-791
Tel: +82-2-2220-4567
[email protected]
http://agape.hanyang.ac.kr
