[jira] [Commented] (SQOOP-331) Support boundary query on the command line

[email protected] (Commented) (JIRA) Mon, 26 Sep 2011 13:00:42 -0700

    [ https://issues.apache.org/jira/browse/SQOOP-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114886#comment-13114886 ]

[email protected] commented on SQOOP-331:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1946/#review2034
-----------------------------------------------------------

Jarek, the changes look good. Regarding the test case problem you mentioned, 
I don't think it is really a test case issue. On the contrary, it seems to be 
a bug to me. Here is why: the purpose of the boundary query is to limit the 
overall size of the ingest from the database. It does not and should not 
matter how many mappers are used to perform the ingest. You can fix this by 
modifying the DataDrivenDBInputFormat.getSplits() method and ensuring that the 
single split it generates for the one-mapper case uses the boundary query if 
one is specified.
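The following is a minimal sketch of the invariant described above. It assumes the boundary query has already been run and has returned a numeric minimum and maximum for the split column. The class `SplitSketch`, its method `splitConditions`, and the chunking scheme are hypothetical illustrations, not Sqoop's actual `DataDrivenDBInputFormat.getSplits()` code. The only point is that the one-mapper path keeps the `[min, max]` bounds rather than falling back to an unbounded split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Sqoop's DataDrivenDBInputFormat: turns the
// [min, max] pair returned by a boundary query into per-mapper WHERE clauses.
class SplitSketch {

  static List<String> splitConditions(String col, long min, long max, int numMappers) {
    if (numMappers < 1) {
      throw new IllegalArgumentException("numMappers must be at least 1");
    }
    List<String> conditions = new ArrayList<>();
    if (numMappers == 1) {
      // One mapper: emit a single split that is still bounded by [min, max],
      // instead of an unconditional split that would read every row.
      conditions.add(range(col, min, max));
      return conditions;
    }
    // Several mappers: cut the inclusive range [min, max] into contiguous chunks.
    // (Overflow for extreme long values is ignored in this sketch.)
    long chunk = Math.max(1, (max - min + 1) / numMappers);
    long lo = min;
    for (int i = 0; i < numMappers && lo <= max; i++) {
      // The last mapper absorbs any remainder so the whole range is covered.
      long hi = (i == numMappers - 1) ? max : Math.min(max, lo + chunk - 1);
      conditions.add(range(col, lo, hi));
      lo = hi + 1;
    }
    return conditions;
  }

  // Builds an inclusive range predicate on the split column.
  private static String range(String col, long lo, long hi) {
    return col + " >= " + lo + " AND " + col + " <= " + hi;
  }
}
```

Consider the literal boundary query "select 1, 2" from the testing discussion, which gives min = 1 and max = 2. With one mapper, the sketch yields the single condition `id >= 1 AND id <= 2`. With four mappers, it yields `id >= 1 AND id <= 1` and `id >= 2 AND id <= 2`. Either way, the same rows are selected, so the size of the ingest no longer depends on the mapper count.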

/src/docs/user/import.txt
<https://reviews.apache.org/r/1946/#comment4582>

    nit: The BOUNDARY need not be all capitalized.

/src/docs/user/import.txt
<https://reviews.apache.org/r/1946/#comment4583>

    Please remove the trailing whitespace.

/src/docs/user/import.txt
<https://reviews.apache.org/r/1946/#comment4584>

    Trailing whitespace.

- Arvind

On 2011-09-20 08:29:02, Jarek Jarcec wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1946/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-09-20 08:29:02)
bq.  
bq.  
bq.  Review request for Sqoop and Arvind Prabhakar.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  I've incorporated all of Arvind's suggestions (hopefully :-)).
bq.  
bq.  
bq.  This addresses bug SQOOP-331.
bq.      https://issues.apache.org/jira/browse/SQOOP-331
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/docs/man/import-args.txt 1171925 
bq.    /src/docs/user/import.txt 1171925 
bq.    /src/java/com/cloudera/sqoop/SqoopOptions.java 1171925 
bq.    /src/java/com/cloudera/sqoop/manager/SqlManager.java 1171925 
bq.    /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1171925 
bq.    /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1171925 
bq.    /src/java/com/cloudera/sqoop/tool/ImportTool.java 1171925 
bq.    /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1171925 
bq.  
bq.  Diff: https://reviews.apache.org/r/1946/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  I'm still having trouble creating meaningful tests for this patch. I've 
come up with two different approaches, but I wasn't able to get either of 
them running:
bq.  
bq.  1) Use the boundary query to limit the imported data (e.g. "select 1, 2"). 
This is a completely wrong use of this parameter, but I thought it might be 
acceptable for testing purposes. Unfortunately, the underlying code uses this 
query only when it creates more than one map task, and I was not able to force 
it to create more than one. That makes sense, because the -m parameter is also 
only a hint.
bq.  
bq.  2) Parse the logs. Fortunately, the class responsible for creating splits 
logs the boundary query it uses, so it is possible to parse those logs and look 
for that query. But I'm not sure how to do this properly.
bq.  
bq.  Any ideas are welcome.
bq.  
bq.  Jarcec
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jarek
bq.  
bq.
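The log-parsing approach (option 2 above) could be handled by capturing the log lines emitted during a test run and scanning them for the query. The following is a hedged sketch of that idea. It assumes the split-generating class writes a line containing a fixed marker followed by the query it ran. The marker string `BoundingValsQuery: ` and the class `BoundaryQueryLogScanner` are assumptions for illustration and should be checked against the real log output. How the lines are captured (for example, through a test-only log appender) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper: extracts boundary queries from captured log lines.
class BoundaryQueryLogScanner {
  // ASSUMPTION: marker text preceding the query in the log; verify against real output.
  static final String MARKER = "BoundingValsQuery: ";

  // Returns every query found after MARKER, in the order the lines were logged.
  static List<String> findBoundaryQueries(List<String> logLines) {
    List<String> queries = new ArrayList<>();
    for (String line : logLines) {
      int idx = line.indexOf(MARKER);
      if (idx >= 0) {
        queries.add(line.substring(idx + MARKER.length()).trim());
      }
    }
    return queries;
  }
}
```

Matching on a marker substring rather than on a full log layout keeps the check independent of timestamps and logger configuration. A test can then assert that the returned list contains the boundary query it passed in.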

> Support boundary query on the command line
> ------------------------------------------
>
>                 Key: SQOOP-331
>                 URL: https://issues.apache.org/jira/browse/SQOOP-331
>             Project: Sqoop
>          Issue Type: New Feature
>          Components: tools
>    Affects Versions: 1.4.0
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>         Attachments: SQOOP-331.patch
>
>
> It would be nice if Sqoop could accept, on the command line, a query that 
> fetches the minimum and maximum values used to create splits in 
> DataDrivenDBInputFormat.
> Normally Sqoop generates the query that fetches the minimum and maximum 
> values for creating splits in the following form: SELECT min($split_by_column), 
> max($split_by_column) FROM $table WHERE $cmd_where. In my use case, I needed 
> to import only a portion of the data, with ranges on the split_by_column that 
> I had already preselected and stored in a special table holding the data 
> ranges and the corresponding primary key values. So my auto-generated query 
> looked like this: SELECT min(id), max(id) FROM table WHERE id >= min_id and 
> id <= max_id. Since min_id and max_id are already known, that query is 
> obviously pointless and just creates unnecessary load on the database server. 
> It would be nice to supply my own boundary query that uses the extra table 
> with data ranges.
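The following is a minimal sketch of the requested behavior, assuming the boundary query arrives as a plain string. When the user supplies a query, it is used verbatim. Otherwise, the default `SELECT min(...), max(...)` form described above is built. The class `BoundaryQueryBuilder` is hypothetical, and so are the `ranges` table and its `lo`, `hi`, and `batch` columns in the usage note. The sketch also assumes that a user query returns the minimum and maximum as the two columns of its first row.

```java
// Hypothetical sketch: picks the query used to fetch split boundaries.
class BoundaryQueryBuilder {

  static String boundaryQuery(String splitCol, String table, String where, String userQuery) {
    if (userQuery != null && !userQuery.trim().isEmpty()) {
      // User-supplied query (e.g. one reading a separate ranges table) wins as-is.
      return userQuery;
    }
    // Default form: SELECT min(col), max(col) FROM table [WHERE cond]
    StringBuilder sb = new StringBuilder();
    sb.append("SELECT min(").append(splitCol).append("), max(")
      .append(splitCol).append(") FROM ").append(table);
    if (where != null && !where.trim().isEmpty()) {
      sb.append(" WHERE ").append(where);
    }
    return sb.toString();
  }
}
```

For example, `boundaryQuery("id", "t", null, "SELECT lo, hi FROM ranges WHERE batch = 7")` returns the custom query unchanged. The database therefore never scans `t` just to recompute bounds that are already known.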

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
