David,

What database are you importing from? The description I gave was for data types that map to the BigDecimalSplitter. The user guide might be referring to the IntegerSplitter, which will add the remainder to the last value.
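
The two behaviours discussed in this thread can be compared with a small Python sketch. It only illustrates the arithmetic and is not Sqoop's source code. The function names are made up, and treating the upper bound as exclusive (max + 1) is an assumption, chosen because it reproduces the ranges reported in the original question.

```python
def truncated_step_splits(lo, hi, parts):
    """Split [lo, hi) with a step rounded down to an integer.

    Points are lo + step * k for k = 0..parts. If the last point falls
    short of hi, hi is appended, so the leftover becomes its own
    (possibly tiny) final range.
    """
    step = (hi - lo) // parts
    points = [lo + step * k for k in range(parts + 1)]
    if points[-1] < hi:
        points.append(hi)
    return list(zip(points, points[1:]))


def remainder_in_last_splits(lo, hi, parts):
    """Split [lo, hi) into exactly `parts` ranges.

    The step is rounded down as above, but the final range is stretched
    to end at hi, so the leftover is absorbed by the last range.
    """
    step = (hi - lo) // parts
    points = [lo + step * k for k in range(parts)] + [hi]
    return list(zip(points, points[1:]))


if __name__ == "__main__":
    # Values from the original question, with hi = max + 1 (assumption).
    for lo_hi in truncated_step_splits(2110, 288272192, 3):
        print(lo_hi)
    # Values from the user guide example.
    for lo_hi in remainder_in_last_splits(0, 1001, 4):
        print(lo_hi)
```

With lo=2110, hi=288272192 and parts=3 (the divisor used in the quoted calculation), the first function yields the four reported ranges, including the one-value range [288272191, 288272192). With lo=0, hi=1001 and parts=4, the second yields the user guide's (0, 250), (250, 500), (500, 750), (750, 1001). Note that 288272191 - 2110 divides by 3 exactly, so the .33... remainder only appears when the exclusive bound 288272192 is used.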
-Abe

On Wed, Jun 19, 2013 at 1:23 PM, David Kincaid <[email protected]> wrote:

> Thanks. We didn't specify the number of mappers, so it's giving us 4. I
> understand your explanation, but it seems to conflict with the Sqoop user
> guide (
> http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_controlling_parallelism
> ):
>
> "When performing parallel imports, Sqoop needs a criterion by which it
> can split the workload. Sqoop uses a *splitting column* to split the
> workload. By default, Sqoop will identify the primary key column (if
> present) in a table and use it as the splitting column. The low and high
> values for the splitting column are retrieved from the database, and the
> map tasks operate on evenly-sized components of the total range. For
> example, if you had a table with a primary key column of id whose minimum
> value was 0 and maximum value was 1000, and Sqoop was directed to use 4
> tasks, Sqoop would run four processes which each execute SQL statements of
> the form SELECT * FROM sometable WHERE id >= lo AND id < hi, with (lo, hi)
> set to (0, 250), (250, 500), (500, 750), and (750, 1001) in the different
> tasks."
>
>
> On Wed, Jun 19, 2013 at 3:14 PM, Abraham Elmahrek <[email protected]> wrote:
>
>> Hey David,
>>
>> Here's the algorithm:
>> Split lengths are defined by (max - min)/(# mappers) and whatever is left
>> is tacked on at the end. So in this case, (288272191-2110)/3 =
>> 96090027.33... So I'm assuming the .33... is rounded down and split lengths
>> will be of length 96090027. Sqoop will then create splits with the
>> following points: (min) + (range length)*(n). We can see that
>> 2110 + 96090027*0 = 2110, 2110 + 96090027*1 = 96092137,
>> 2110 + 96090027*2 = 192182164, and 2110 + 96090027*3 = 288272191
>> will be generated based off of this algorithm.
>> The last point to be added will be 288272192 because the max value is
>> not part of the generated split points. Then Sqoop will distribute the
>> work accordingly based off of these points, as you've pointed out above.
>>
>> Just to be sure, did you configure Sqoop to use 3 mappers?
>>
>> Hope this helps,
>> -Abe
>>
>>
>> On Wed, Jun 19, 2013 at 8:33 AM, David Kincaid <[email protected]> wrote:
>>
>>> We're seeing a strange thing happen with a sqoop import job with the way
>>> the key range is getting distributed among the 4 mappers that are running.
>>> The minimum key value is 2110 and the maximum value is 288272191. We are
>>> getting one mapper that is only getting one record to import. Here is the
>>> distribution among the mappers:
>>>
>>> [2110, 96092137)
>>> [96092137, 192182164)
>>> [192182164, 288272191)
>>> [288272191, 288272192)
>>>
>>> You can see that the fourth mapper is given a range with only one value
>>> in it. Could someone help me understand what is going on?
>>>
>>> Thanks,
>>>
>>> Dave
