Thanks for your response, Jarek :) I've started a new import run with --hive-drop-import-delims added and --direct removed (since the two are mutually exclusive). We'll see how it goes.
Going to sleep now. I'll report back tomorrow :)

--
Felix


On Thu, Mar 21, 2013 at 12:42 AM, Jarek Jarcec Cecho <[email protected]> wrote:

> Hi Felix,
> we've seen similar behaviour in the past when the data itself contains
> Hive special characters like new line characters. Would you mind trying
> your import with --hive-drop-import-delims to see if it helps?
>
> Jarcec
>
> On Wed, Mar 20, 2013 at 11:27:58PM -0400, Felix GV wrote:
> > Hello,
> >
> > I'm trying to import a full table from MySQL to Hadoop/Hive. It works with
> > certain parameters, but when I try to do an ETL that's somewhat more
> > complex, I start getting bogus rows in my resulting table.
> >
> > This works:
> >
> > sqoop import \
> >   --connect 'jdbc:mysql://backup.general.db/general?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull' \
> >   --username xxxxx \
> >   --password xxxxx \
> >   --hive-import \
> >   --hive-overwrite \
> >   -m 23 \
> >   --direct \
> >   --hive-table profile_felix_test17 \
> >   --split-by id \
> >   --table Profile
> >
> > But if I use a --query instead of a --table, then I start getting bogus
> > records (and by that, I mean rows that have a non-sensically high primary
> > key that doesn't exist in my source database and null for the rest of the
> > cells).
> >
> > The output I get with the above query is not exactly the way I want it.
> > Using --query, I can get the data in the format I want (by transforming
> > some stuff inside MySQL), but then I also get the bogus rows, which pretty
> > much makes the Hive table unusable.
> >
> > I tried various combinations of parameters and it's hard to pin-point
> > exactly what causes the problem, so it could be more intricate than my
> > above simplistic description. That being said, removing --table and adding
> > the following params definitely breaks it:
> >
> >   --target-dir /tests/sqoop/general/profile_felix_test \
> >   --query "select * from Profile WHERE \$CONDITIONS"
> >
> > (Ultimately, I want to use a query that's more complex than this, but even
> > a simple query like this breaks...)
> >
> > Any ideas why this would happen and how to solve it?
> >
> > Is this the kind of problem that Sqoop2's cleaner architecture intends to
> > solve?
> >
> > I use CDH 4.2, BTW.
> >
> > Thanks :) !
> >
> > --
> > Felix
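
To illustrate the failure mode Jarcec describes, here is a minimal Python sketch of how a newline embedded in a text column can turn into an extra, mostly-null row in a line-oriented format such as Hive's default text storage (fields separated by \x01, rows by \n). It is a toy model, not Sqoop's or Hive's actual code: the sample rows and the serialize/parse helpers are hypothetical, and the exact symptoms in a real table may differ. The drop_delims option mimics what --hive-drop-import-delims is documented to do, which is to strip \n, \r and \01 from string fields.

```python
# Hive's default text format: Ctrl-A (\x01) between fields, newline between rows.
FIELD_DELIM = "\x01"
ROW_DELIM = "\n"
HIVE_DELIMS = ("\n", "\r", "\x01")


def serialize(rows, drop_delims=False):
    """Write rows as delimited text, optionally stripping Hive delimiters from each field."""
    lines = []
    for row in rows:
        fields = [str(value) for value in row]
        if drop_delims:
            for delim in HIVE_DELIMS:
                fields = [field.replace(delim, "") for field in fields]
        lines.append(FIELD_DELIM.join(fields))
    return ROW_DELIM.join(lines)


def parse(text, num_columns):
    """Read the text back line by line, padding short rows with None (like NULL cells)."""
    rows = []
    for line in text.split(ROW_DELIM):
        fields = line.split(FIELD_DELIM)
        fields += [None] * (num_columns - len(fields))
        rows.append(fields[:num_columns])
    return rows


# Hypothetical sample data: the second row's text column contains a newline.
source_rows = [(1, "alice", "likes hiking"), (2, "bob", "line one\nline two")]

broken = parse(serialize(source_rows), num_columns=3)
fixed = parse(serialize(source_rows, drop_delims=True), num_columns=3)

print(len(broken), broken[-1])  # an extra row whose remaining cells are None
print(len(fixed), fixed[-1])    # row count matches the source again
```

In this model the two source rows come back as three when the delimiters are left in place, with the tail of the split field landing in the first column of a bogus row. Stripping the delimiters restores the row count, at the cost of altering the text itself.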
