Hi Brian,
the --num-mappers parameter will limit the number of parallel map tasks
importing your data, which should decrease the load on your server.
However, you're right that by limiting --num-mappers to a small number,
you will increase the amount of data transferred by each mapper.
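
For example, to run an import with only two parallel tasks (the
connection string, credentials, and table name below are placeholders
for your own values):

  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser \
    --table mytable \
    --num-mappers 2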

Another way of limiting the data per run is the --where parameter (for
table imports). Its value can be basically anything that is valid inside
the WHERE clause of the generated SQL statement, so you can use it to
form your batches almost arbitrarily. For example, if your table has an
auto-increment integer primary key, you can very easily specify the
range of keys that you want to import in each call.
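
Something like this, assuming a primary key column named id (the column
name and key range are just examples):

  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser \
    --table mytable \
    --where "id >= 1000000 AND id < 2000000" \
    --num-mappers 2

Each periodic run can then advance the key range to pick up the next
batch.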

I'm not sure what your use case is, but it appears to me that you're
importing your tables on a periodic basis, each time with a full dump.
If that is right, you might consider Sqoop's "incremental import"
support:

http://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html#_incremental_imports
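
A minimal sketch in append mode, again with placeholder names, where
Sqoop only imports rows whose id is greater than the last value seen in
the previous run:

  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser \
    --table mytable \
    --incremental append \
    --check-column id \
    --last-value 2000000

Sqoop prints the new --last-value to use at the end of each incremental
run, so you can feed it into the next invocation.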

Jarcec

On Thu, May 24, 2012 at 12:04:22AM -0700, Brian Tran wrote:
> Hi Sqoop gurus,
> 
> I currently use Sqoop to import from MySQL into HDFS.
> 
> Some of the tables that I import have grown so large that a full dump
> significantly slows down the host.
> 
> I would like to split the imports into smaller chunks, but limit the number
> of chunks I download in parallel to avoid significant load on the server.
> 
> Is there anything in Sqoop that provides this functionality?
> 
> The closest thing I could find in the Sqoop user guide was the
> --num-mappers option, but using it to download in smaller chunks would
> increase the server load as all the chunks are downloaded in parallel.
> 
> Thanks!
> 
> Brian
