There are a couple of options I had to specify explicitly to make it work with
the --query option: 1) --target-dir and 2) --split-by (or --m).
We import the data into Hive by hourly partition.
sqoop import --options-file "credential.txt"
    --query "select col1, col2, 'col3' as col3, col10 from TABLE
             where created between 1353960000000 and 1353963600000
             and \$CONDITIONS"
    --target-dir '/tmp/TABLENAME/12-01-2002-00-00-00'
    --hive-import
    --hive-partition-key part
    --hive-partition-value 'PARTITION_VALUE'
    --hive-overwrite
    --hive-delims-replacement
    --split-by created   (or -m 1 instead of --split-by)
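The reason --split-by (or a single mapper) is needed, as far as I understand it,
is that with a free-form --query import Sqoop substitutes the literal
\$CONDITIONS token with the per-mapper split predicate it derives from the
--split-by column. Roughly, each mapper ends up running something like the
query below (illustrative only; the actual boundary values are computed by
Sqoop at run time):

    select col1, col2, 'col3' as col3, col10 from TABLE
    where created between 1353960000000 and 1353963600000
    and ( created >= <lower bound> AND created < <upper bound> )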
Previously we were using the --table option, so we did not have to specify the
column names, target directory, or split-by option.
With this approach the existing schema in Hive stays intact and no changes are
required.
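Since the windows are hourly, we drive the import from a small wrapper script.
Something along these lines (a rough sketch only, assuming GNU date; the paths,
options file, and partition naming below are placeholders for our actual setup):

    #!/bin/bash
    # Hour to import, passed as "YYYY-MM-DD HH:00:00" (UTC)
    HOUR="$1"

    # The 'created' column holds epoch milliseconds, so build the hourly window
    START_S=$(date -u -d "$HOUR" +%s)
    END_S=$(( START_S + 3600 ))
    START_MS="${START_S}000"
    END_MS="${END_S}000"

    # Target directory / partition value derived from the hour (placeholder format)
    SUFFIX=$(date -u -d "$HOUR" +%m-%d-%Y-%H-%M-%S)

    sqoop import --options-file "credential.txt" \
      --query "select col1, col2, 'col3' as col3, col10 from TABLE where created between $START_MS and $END_MS and \$CONDITIONS" \
      --split-by created \
      --target-dir "/tmp/TABLENAME/$SUFFIX" \
      --hive-import \
      --hive-partition-key part \
      --hive-partition-value "$SUFFIX" \
      --hive-overwrite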
On Sun, Dec 2, 2012 at 9:11 AM, Jarek Jarcec Cecho <[email protected]> wrote:
> Hi Anand,
> would you mind providing details on your question about partition support
> in query argument? I'm not sure what you mean and how it connects to
> something that is supported by --table argument.
>
> Jarcec
>
> On Sat, Dec 01, 2012 at 08:13:34PM -0800, Anandha L Ranganathan wrote:
> > Hi Jarcec,
> >
> > I went with second option.
> >
> > Giving "select col1, col2, NULL as col3, col4" was throwing a
> > NullPointerException.
> >
> > So I went with
> >
> > "Select col1,col2,'col3' as col3, col4"
> >
> > Thanks
> > Anand
> >
> > The second option (--query) looks good, but does it support partitions?
> >
> >
> > On Sat, Dec 1, 2012 at 10:03 AM, Jarek Jarcec Cecho <[email protected]> wrote:
> >
> > > Hi Anandha,
> > > I have a couple of suggestions that could help in your scenario:
> > >
> > > 1) Rename your Hive table to "TABLENAME_OLD" or something and let Sqoop
> > > create a new Hive table with the proper schema. This way you will not lose
> > > any data nor the appropriate schema; however, you will end up with two
> > > tables.
> > >
> > > 2) Change your table-based import to a query-based import and select one
> > > extra column:
> > >
> > > --query "select col1, col2, NULL as col3, col4, ..."
> > >
> > > Jarcec
> > >
> > > On Fri, Nov 30, 2012 at 06:15:02PM -0800, Anandha L Ranganathan wrote:
> > > > In my Sqoop import, one of the columns in the source table was deleted,
> > > > and that results in a data issue: the imported data is off by one column.
> > > > The removed column was in the middle of the schema; if it were the last
> > > > column, there would be no problem.
> > > >
> > > > Data is imported from MySQL to Hive using Sqoop. I am using
> > > > sqoop-1.3.0.
> > > >
> > > > Here is the syntax.
> > > >
> > > > sqoop import --hive-import
> > > >     --options-file 'credential.txt'
> > > >     --table 'TABLENAME'
> > > >     --where 'created between 1353960000000 and 1353963600000'
> > > >     --hive-partition-key part
> > > >     --hive-partition-value 'PARTITION_VALUE'
> > > >     --hive-overwrite
> > > >     --hive-delims-replacement
> > > >
> > > >
> > > >
> > > > Now the problem is that one of the columns in the source DB was removed.
> > > >
> > > > I tried a workaround by including the --columns option:
> > > >
> > > > 1) By hardcoding the third column with quotes:
> > > >     --columns "col1,col2,'col3' as col3,col4"
> > > >
> > > > but this gives the error: Column name 'col3' not in table
> > > >
> > > >
> > > > 2) Then I tried (col2 repeated twice):
> > > >     --columns "col1, col2, col2, col4"
> > > >
> > > > It threw an error:
> > > >     Imported Failed: Duplicate Column identifier specified:
> > > >
> > > >
> > > > 3) Then I tried (col2 as col3):
> > > >     --columns "col1, col2, col2 as col3, col4"
> > > >
> > > > ERROR tool.ImportTool: Imported Failed: Column name 'authid uuid' not
> > > > in table
> > > >
> > > > Could anybody suggest a workaround for this?
> > > >
> > > > Thanks
> > >
>