Oh, got it. Thanks, Jarcec! I'll try it and let you know if I get it working.

On Tue, Oct 23, 2012 at 12:58 PM, Jarek Jarcec Cecho <[email protected]> wrote:
> Hi Chalcy,
> let me try :-)
>
> Merge takes two directories on HDFS and updates logical rows in those
> files. Those files are most likely CSV files without any additional
> metadata. However, Sqoop needs that metadata: for example, how many
> columns are there? What are the column names and data types? What are
> the delimiters? Normally, such information is retrieved from the
> database; in the merge case, however, there is no connection to the
> database (as you correctly guessed). Therefore you need to supply the
> previously generated class.
>
> Does that help in understanding the issue you're facing?
>
> Jarcec
>
> On Tue, Oct 23, 2012 at 12:00:55PM -0400, Chalcy wrote:
> > Hi Jarcec,
> >
> > If we are merging two sets of HDFS data, I do not understand why we
> > would need a database connection. Could you explain?
> >
> > Thanks,
> > Chalcy
> >
> >
> > On Tue, Oct 23, 2012 at 10:59 AM, Jarek Jarcec Cecho <[email protected]> wrote:
> >
> > > Hi Chalcy,
> > > Sqoop needs to be able to parse the files you're trying to merge, as
> > > newer entries must be updated. Usually Sqoop generates a special class
> > > for this purpose based on the connection in use; in the merge case,
> > > however, there is no connection to the database, and therefore you need
> > > to specify such a class manually. This class is generated automatically
> > > when you run the import tool, and it can also be generated manually
> > > with the codegen tool [1]. You can find additional information about
> > > those two merge tool arguments in our user guide [2].
> > >
> > > Jarcec
> > >
> > > Links:
> > > 1: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal
> > > 2: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_merge_literal
> > >
> > > On Tue, Oct 23, 2012 at 09:41:07AM -0400, Chalcy wrote:
> > > > Hello Sqoop users,
> > > >
> > > > I tried to use sqoop merge and understand all the parameters except
> > > > --class-name and --jar-file. What should they be? Sqoop errors out
> > > > if I do not specify them.
> > > >
> > > > The command I am using is
> > > > sqoop merge --new-data user/hadoop/testincrement --onto
> > > > /user/hadoop/exisitngdata --target-dir /user/hadoop/mergeddir
> > > > --merge-key rowid
> > > >
> > > > Thanks,
> > > > Chalcy
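
For anyone finding this thread later, here is a rough sketch of the two steps described above: generate the record class with codegen, then pass it to merge through --jar-file and --class-name. The JDBC URL, table name, class name, and build directory are hypothetical placeholders, not values from this thread; only the HDFS paths and the merge key are taken from the original command. The jar location is also an assumption, so check the path that codegen reports in its log. By default the script only prints the commands; set RUN_SQOOP=1 to execute them.

```shell
#!/bin/sh
# Sketch only: two-step workflow for sqoop merge with a generated record class.
set -eu

JDBC_URL="jdbc:mysql://db.example.com/testdb"   # hypothetical database URL
TABLE="testtable"                               # hypothetical source table
CLASS_NAME="TestRecord"                         # hypothetical class name
BUILD_DIR="./sqoop-gen"                         # hypothetical output directory
# Assumed jar location; verify against the path codegen prints when it finishes.
JAR_FILE="${BUILD_DIR}/${CLASS_NAME}.jar"

# Print the command by default; run it only when RUN_SQOOP=1 is set.
print_or_run() {
  if [ "${RUN_SQOOP:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Step 1: generate the record class. This step needs the database connection,
# because the column names and types are read from the table.
print_or_run sqoop codegen \
  --connect "$JDBC_URL" \
  --table "$TABLE" \
  --class-name "$CLASS_NAME" \
  --outdir "$BUILD_DIR" \
  --bindir "$BUILD_DIR"

# Step 2: merge the two HDFS directories. No database connection is used here;
# the generated class tells Sqoop how to parse each record.
print_or_run sqoop merge \
  --new-data user/hadoop/testincrement \
  --onto /user/hadoop/exisitngdata \
  --target-dir /user/hadoop/mergeddir \
  --merge-key rowid \
  --jar-file "$JAR_FILE" \
  --class-name "$CLASS_NAME"
```

If the data was originally brought in with sqoop import, the class and jar generated during that import can be reused instead of running codegen again.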
