> On 2011-09-23 02:04:19, Arvind Prabhakar wrote:
> > Thanks for taking this up Jarek. This is a very important functionality 
> > extension for Sqoop. Looking at the code change, I feel that you have tried 
> > to implement the super-set functionality such that the user can:
> > 
> > * Either map SQL types directly to Java/Hive types, or
> > * Map specific columns to Java/Hive types.
> > 
> > Between these two, I feel that the later is more relevant use-case for 
> > Sqoop consumers. Mapping SQL types to Java/Hive types can be done by 
> > extending the Manager and that in itself is not as flexible for the user as 
> > the other option of mapping specific columns to a data type. Even when 
> > considering the option to map columns to specific data types, the user may 
> > not necessarily know what column names Sqoop will use. If these column 
> > names do not match, the default mapping will be used silently and that 
> > could lead to other problems.
> > 
> > Therefore I suggest the following:
> > * Introduce a new option that tells Sqoop to generate a mapping file for 
> > the job. This file could be a java properties file that contains the names 
> > of columns as read by Sqoop and their default mappings and does not run the 
> > actual job.
> > * Introduce another new option that tells Sqoop to use a given mapping file 
> > for the job.
> > 
> > So the typical workflow would be - if you want to run an import you would 
> > do the following:
> > * run: sqoop import --connect .... --genrate-mapping-only 
> > /path/to/mapping-file
> > * manually modify the mapping file to override the default types where 
> > necessary
> > * run: sqoop import --connect .... --use-mapping-file /path/to/mapping-file
> > 
> > What do you think about this approach?
> 
> Jarek Jarcec wrote:
>     Hi Arvind,
>     thank you for your review. I just noticed that I did not include basic 
> help for my changes, so obviously you have to dig it into source code to find 
> out what and how is working. I'm sorry for that. I'll not forgot to include 
> basic help next time.
>     
>     I did not realized that SQL types can be mapped by extending Manager 
> class. My original goal here was to offer user chance to change any type 
> mapping out of the box, without any source code changes. SQL type mapping can 
> be simulated by the column mapping (on precondition that user do know names 
> of of all "problematic" columns), so I don't have problems to implement only 
> the second part as you suggested.
>     
>     I think that in most cases the column names are known or user can force 
> them to be known (for example by using "as": SELECT bla-lba-bla AS 
> known_value), so I would say that user can pass the column names without 
> reading the mapping in a normal situation. I don't see a problem to add a 
> option that would generate mapping file that user can read to find out the 
> names used by sqoop, but I would prefer to pass new mapping on command line 
> rather than in separate file. Reason for that is than most of the times, I'm 
> executing sqoop using oozie and in this case it's not easy to guarantee that 
> given property file is located on all nodes.
>     
>     What do you think?
>     
>     Jarcec
> 
> Arvind Prabhakar wrote:
>     Thanks Jarek. I agree with your assessment that you could use projections 
> to tie down the column names, although it will be come more verbose for users 
> who simply want to import a table. I understand your preference of specifying 
> it on the command line for Oozie integration - that makes sense. 
>     
>     Overall - I think your approach to this is good and we can implement it 
> that way. The one suggestion I have is that if a mapping cannot be applied, 
> it should raise an exception that fails the job, as opposed to defaulting to 
> the built-in mapping. For example if a user specifies a column name that does 
> not exist - that should be an error.

Hi Arvind,
let me summarize our conversation so that we're both on the same page. I'll 
change my patch to support only mapping based on column name which will be 
specified on command line. I'll throw an exception in case that some mapping 
will not be used.

For the second part "creating mapping for columns and save it to output file", 
I would suggest to create new JIRA bug and place code for that to codegen tool. 
I think that it is better place for such functionality because import tool 
should be used for importing data. Stopping import just because user specified 
parameter "--generate-mapping" doesn't seem to me as good idea. On the other 
hand codegen tool is meant for generating code, so I would hack it to generate 
the mapping as well in case that "--generate-mapping" parameter is present.

Do you have any objections?

Jarcec


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1975/#review2032
-----------------------------------------------------------


On 2011-09-20 08:28:11, Jarek Jarcec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1975/
> -----------------------------------------------------------
> 
> (Updated 2011-09-20 08:28:11)
> 
> 
> Review request for Sqoop and Arvind Prabhakar.
> 
> 
> Summary
> -------
> 
> This is not fully featured patch yet, it's more only preview of what I have 
> in my mind when I created the bug and how would I image to solve it. I would 
> like to check with community whether this is acceptable solution and if so, 
> I'll finish the patch.
> 
> Things that are missing and I'll add them if this way will be accepted:
> * Tests
> * Documentation
> * Supporting for type names (so that user don't have to type the integer 
> constants on command line)
> 
> Any feedback will be greatly appreciated.
> 
> 
> This addresses bug sqoop-342.
>     https://issues.apache.org/jira/browse/sqoop-342
> 
> 
> Diffs
> -----
> 
>   /src/java/com/cloudera/sqoop/SqoopOptions.java 1172203 
>   /src/java/com/cloudera/sqoop/hive/HiveTypes.java 1172203 
>   /src/java/com/cloudera/sqoop/hive/TableDefWriter.java 1172203 
>   /src/java/com/cloudera/sqoop/manager/ConnManager.java 1172203 
>   /src/java/com/cloudera/sqoop/manager/SqlManager.java 1172203 
>   /src/java/com/cloudera/sqoop/mapreduce/JdbcExportJob.java 1172203 
>   /src/java/com/cloudera/sqoop/orm/ClassWriter.java 1172203 
>   /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1172203 
> 
> Diff: https://reviews.apache.org/r/1975/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jarek
> 
>

Reply via email to