Hello Wei, OK, great.

In addition to my previous comments, we often see customers using more splits than appropriate. When considering splitting we also have to consider the cost of creating the additional map tasks, meaning that importing/exporting 1M rows (just an example) can often be faster with a single map task than with 2, 4, etc.
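For reference, the split/map-task count is controlled with -m (--num-mappers). A minimal sketch of the single-map-task baseline worth testing first; the JDBC URL, credentials, table, and paths below are all placeholders, not anything from Wei's environment:

    # One map task: a single query against the source, no splitting at all.
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username sqoop_user \
      --password-file /user/sqoop/.pw \
      --table orders \
      --target-dir /data/orders \
      -m 1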
Markus Kemper
Customer Operations Engineer
www.cloudera.com

On Fri, Jul 1, 2016 at 11:46 AM, Wei Yan <ywsk...@gmail.com> wrote:

> Markus, the [1,12] is an example :)
> The table has 100+ million records.
>
> On Fri, Jul 1, 2016 at 7:57 AM, Markus Kemper <mar...@cloudera.com> wrote:
>
>> Hello Wei,
>>
>> In addition to Liz's request, can you share the volumes (depth and width)
>> of the data you are working with?
>>
>> I am curious to know if they are really as small (12 rows) as previously
>> noted.
>>
>> Markus Kemper
>> Customer Operations Engineer
>> www.cloudera.com
>>
>> On Fri, Jul 1, 2016 at 10:48 AM, Erzsebet Szilagyi <liz.szila...@cloudera.com> wrote:
>>
>>> Hi Wei,
>>> Let us know if fine-tuning the number of map tasks solved your problem
>>> or we should dig further into it.
>>> Thanks,
>>> Liz
>>>
>>> On Fri, Jul 1, 2016 at 7:57 AM, Wei Yan <ywsk...@gmail.com> wrote:
>>>
>>>> Thanks, Erzsebet and Markus. Tuning the number of map tasks can be a
>>>> reasonable solution here, and I'll try that.
>>>> As Sqoop 1 runs as a MapReduce job, I think it's hard to have both (1)
>>>> many small queries and (2) a limited number of concurrently executing
>>>> queries.
>>>>
>>>> -Wei
>>>>
>>>> On Thu, Jun 30, 2016 at 3:50 PM, Erzsebet Szilagyi <liz.szila...@cloudera.com> wrote:
>>>>
>>>>> Hi Wei,
>>>>> Markus (in CC) offered the following explanation:
>>>>>
>>>>> "The Sqoop 1 default is 4 map tasks. When working with customers I
>>>>> usually start with 1 and double the number of map tasks (e.g. 1, 2, 4, 8)
>>>>> until finding a performance sweet spot, while keeping in mind the
>>>>> potential RDBMS impact.
>>>>>
>>>>> Estimating the real RDBMS impact is often challenging for some of the
>>>>> following reasons:
>>>>> 1. DBAs are often not present.
>>>>> 2. Jobs are often reviewed in isolation (excluding other simultaneous
>>>>> Sqoop or non-Sqoop workloads).
>>>>> 3. Tests are often performed against smaller data volumes and/or virtual
>>>>> resources than what will be in production (this includes the RDBMS,
>>>>> network, and Hadoop cluster).
>>>>> 4. There is not a uniform way to monitor/analyze impact across RDBMS
>>>>> vendors.
>>>>> 4.1. I have not really tried to review Sqoop console debug output from
>>>>> a DB impact context; perhaps it could be used.
>>>>> 5. Once deployed, production job volumes often change.
>>>>>
>>>>> Thanks, Markus"
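That probing strategy can be scripted. A rough sketch reusing the placeholder names from the earlier example; in practice you would also watch the database side while each run executes:

    # Time the same import at 1, 2, 4, and 8 map tasks and compare.
    # --split-by needs a roughly evenly distributed numeric column;
    # "id" here is a placeholder, not a column from Wei's table.
    for m in 1 2 4 8; do
      echo "=== ${m} map task(s) ==="
      time sqoop import \
        --connect jdbc:mysql://dbhost/sales \
        --username sqoop_user \
        --password-file /user/sqoop/.pw \
        --table orders \
        --split-by id \
        --target-dir /data/orders_m${m} \
        -m "${m}"
    done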
>>>>> On Wed, Jun 29, 2016 at 7:35 PM, Wei Yan <ywsk...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I would like to check whether Sqoop supports this type of ingestion:
>>>>>> consider that we have records with the range [1,12] and 3 mappers. By
>>>>>> default, the 3 mappers will be assigned [1,4], [5,8], [9,12].
>>>>>>
>>>>>> I am not sure whether we can split the range into smaller ones, like
>>>>>> [1], [2], [3], ..., [12], while still using 3 mappers instead of 12.
>>>>>> We want this feature because: (1) if we configure a smaller number of
>>>>>> mappers, each mapper is assigned a larger range and takes much longer
>>>>>> to finish, and the infrastructure may kill long-running queries; (2)
>>>>>> if we configure a larger number of mappers, each mapper gets a smaller
>>>>>> range, but we generate lots of network traffic to the database, which
>>>>>> is also bad. What we would like is: still 12 ranges, but 3 mappers,
>>>>>> and at most 3 concurrent connections.
>>>>>>
>>>>>> I would appreciate any help here.
>>>>>>
>>>>>> -Wei
>>>>>
>>>>> --
>>>>> Erzsebet Szilagyi
>>>>> Software Engineer
>>>>> www.cloudera.com
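As far as I know, Sqoop 1 ties the number of splits to the number of map tasks, so there is no built-in option for what Wei describes. One rough workaround sketch is to drive many small single-mapper imports yourself and throttle the concurrency outside Sqoop; again, every name below is a placeholder:

    # 12 single-value ranges, but at most 3 Sqoop jobs (and so at most
    # 3 database connections doing data transfer) running at a time.
    seq 1 12 | xargs -P 3 -I {} \
      sqoop import \
        --connect jdbc:mysql://dbhost/sales \
        --username sqoop_user \
        --password-file /user/sqoop/.pw \
        --table orders \
        --where "id = {}" \
        --target-dir /data/orders_part_{} \
        -m 1

Note that each sub-range becomes its own MapReduce job, so the per-job startup cost mentioned above is paid 12 times; this only sketches the shape of the behavior Wei asked about, not a recommended production setup.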