The --direct option is, most of the time, incompatible with many of the parsing options in Sqoop (unless a connector has been specifically adapted to support them).
Some of the connectors do not allow --direct and --query together (Postgres and Netezza, for example). Maybe the other connectors should check for this as well. But I am not sure what the intersection of --query and --hive-drop-import-delims would be; whether the source is a table or a query should not affect the output processing. Do you mean --direct and --hive-drop-import-delims?

Thanks

Venkat

On Thu, Mar 21, 2013 at 5:32 PM, Felix GV <[email protected]> wrote:
> So... the verdict appears to be the following:
>
> Using --direct with --table produces correct results.
> Using --query instead of --table absolutely requires using
> --hive-drop-import-delims.
> --hive-drop-import-delims is not compatible with --direct.
> Sqoop allows you to use --query and --direct together, but produces bogus
> results.
>
> Considering the nature of --hive-drop-import-delims, I would say it's quite
> likely that the above observations only hold true if the data in the
> imported table actually contains the conflicting delimiting characters.
>
> In any case, don't take my word for it: test on your own dataset if you're
> having any problem. Hopefully the above observations can provide a good
> starting point, though.
>
> Maybe Sqoop should spit out an error or a warning if --query is used
> without --hive-drop-import-delims...?
>
> --
> Felix
>
>
> On Thu, Mar 21, 2013 at 4:46 PM, Felix GV <[email protected]> wrote:
>>
>> I seem to be getting proper output with the above parameters, BTW.
>>
>> I'll try to re-integrate the rest of my more complex ETL query into that
>> sqoop job...
>>
>> Thanks :) !
>>
>> --
>> Felix
>>
>>
>> On Thu, Mar 21, 2013 at 12:47 AM, Felix GV <[email protected]> wrote:
>>>
>>> Thanks for your response, Jarek :)
>>>
>>> I've started a new import run with --hive-drop-import-delims added and
>>> --direct removed (since the two are mutually exclusive); we'll see how
>>> it goes.
>>>
>>> Going to sleep now. I'll report back tomorrow :)
>>>
>>> --
>>> Felix
>>>
>>>
>>> On Thu, Mar 21, 2013 at 12:42 AM, Jarek Jarcec Cecho <[email protected]>
>>> wrote:
>>>>
>>>> Hi Felix,
>>>> we've seen similar behaviour in the past when the data itself contains
>>>> Hive special characters like new line characters. Would you mind trying
>>>> your import with --hive-drop-import-delims to see if it helps?
>>>>
>>>> Jarcec
>>>>
>>>> On Wed, Mar 20, 2013 at 11:27:58PM -0400, Felix GV wrote:
>>>> > Hello,
>>>> >
>>>> > I'm trying to import a full table from MySQL to Hadoop/Hive. It works
>>>> > with certain parameters, but when I try to do an ETL that's somewhat
>>>> > more complex, I start getting bogus rows in my resulting table.
>>>> >
>>>> > This works:
>>>> >
>>>> > sqoop import \
>>>> > --connect 'jdbc:mysql://backup.general.db/general?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull' \
>>>> > --username xxxxx \
>>>> > --password xxxxx \
>>>> > --hive-import \
>>>> > --hive-overwrite \
>>>> > -m 23 \
>>>> > --direct \
>>>> > --hive-table profile_felix_test17 \
>>>> > --split-by id \
>>>> > --table Profile
>>>> >
>>>> > But if I use a --query instead of a --table, then I start getting
>>>> > bogus records (and by that, I mean rows that have a nonsensically
>>>> > high primary key that doesn't exist in my source database and null
>>>> > for the rest of the cells).
>>>> >
>>>> > The output I get with the above command is not exactly the way I want
>>>> > it. Using --query, I can get the data in the format I want (by
>>>> > transforming some stuff inside MySQL), but then I also get the bogus
>>>> > rows, which pretty much makes the Hive table unusable.
>>>> >
>>>> > I tried various combinations of parameters and it's hard to pinpoint
>>>> > exactly what causes the problem, so it could be more intricate than
>>>> > my simplistic description above. That being said, removing --table
>>>> > and adding the following params definitely breaks it:
>>>> >
>>>> > --target-dir /tests/sqoop/general/profile_felix_test \
>>>> > --query "select * from Profile WHERE \$CONDITIONS"
>>>> >
>>>> > (Ultimately, I want to use a query that's more complex than this, but
>>>> > even a simple query like this breaks...)
>>>> >
>>>> > Any ideas why this would happen and how to solve it?
>>>> >
>>>> > Is this the kind of problem that Sqoop2's cleaner architecture
>>>> > intends to solve?
>>>> >
>>>> > I use CDH 4.2, BTW.
>>>> >
>>>> > Thanks :) !
>>>> >
>>>> > --
>>>> > Felix
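For reference, the combination this thread converges on is the original command with --direct removed and --hive-drop-import-delims, --target-dir, and --query added. A minimal sketch, assembled from the parameters quoted above (the credentials are the thread's placeholders; adapt the query, table names, and paths to your own dataset):

sqoop import \
--connect 'jdbc:mysql://backup.general.db/general?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull' \
--username xxxxx \
--password xxxxx \
--hive-import \
--hive-overwrite \
--hive-drop-import-delims \
-m 23 \
--hive-table profile_felix_test17 \
--split-by id \
--target-dir /tests/sqoop/general/profile_felix_test \
--query "select * from Profile WHERE \$CONDITIONS"

As Felix notes above, whether the bogus rows appear at all likely depends on the imported data actually containing Hive's delimiter characters, so results may vary by dataset.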
