Hi, I am new to HBase/Hadoop concepts. Here is the scenario:
1) Our Hadoop cluster is installed on a remote system. Data is loaded into HBase through an HBase writer.
2) I am trying to install Pig on my local Mac OS X (version 10.6.5) so that I can fetch data from that remote system.

I downloaded the latest Pig release from http://pig.apache.org/releases.html (17 December, 2010: release 0.8.0 available) and did the following:

supp:~ rashmi$ export PATH=/Users/rashmi/Desktop/pig-0.8.0/bin:$PATH
supp:~ rashmi$ pig -help
Error: JAVA_HOME is not set.
supp:~ rashmi$ export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home

When I ran pig -help again, I got the following output:

supp:~ rashmi$ pig -help
Apache Pig version 0.8.0 (r1043805)
compiled Dec 08 2010, 17:26:09

USAGE: Pig [options] [-] : Run interactively in grunt shell.
       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
       Pig [options] [-f[ile]] file : Run cmds found in file.
  options include:
    -4, -log4jconf - Log4j configuration file, overrides log conf
    -b, -brief - Brief logging (no timestamps)
    -c, -check - Syntax check
    -d, -debug - Debug level, INFO is default
    -e, -execute - Commands to execute (within quotes)
    -f, -file - Path to the script to execute
    -h, -help - Display this message. You can specify topic to get help for that topic.
        properties is the only topic currently supported: -h properties.
    -i, -version - Display version information
    -l, -logfile - Path to client side log file; default is current working directory.
    -m, -param_file - Path to the parameter file
    -p, -param - Key value pair of the form param=val
    -r, -dryrun - Produces script with substituted parameters. Script is not executed.
    -t, -optimizer_off - Turn optimizations off. The following values are supported:
            SplitFilter - Split filter conditions
            MergeFilter - Merge filter conditions
            PushUpFilter - Filter as early as possible
            PushDownForeachFlatten - Join or explode as late as possible
            ColumnMapKeyPrune - Remove unused data
            LimitOptimizer - Limit as early as possible
            AddForEach - Add ForEach to remove unneeded columns
            MergeForEach - Merge adjacent ForEach
            LogicalExpressionSimplifier - Combine multiple expressions
            All - Disable all optimizations
        All optimizations are enabled by default. Optimization values are case insensitive.
    -v, -verbose - Print all error messages to screen
    -w, -warning - Turn warning logging on; also turns warning aggregation off
    -x, -exectype - Set execution mode: local|mapreduce, default is mapreduce.
    -F, -stop_on_failure - Aborts execution on the first failed job; default is off
    -M, -no_multiquery - Turn multiquery optimization off; default is on
    -P, -propertyFile - Path to property file

When I ran the pig command itself, I got the following error:

supp:~ rashmi$ pig
2011-02-22 12:48:26,319 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/rashmi/pig_1298359106317.log
2011-02-22 12:48:26,474 [main] ERROR org.apache.pig.Main - ERROR 4010: Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath).If you plan to use local mode, please put -x local option in command line
Details at logfile: /Users/rashmi/pig_1298359106317.log

My question is: what do I need to do so that I can connect to the remote Hadoop system and fetch data?

I read the documentation for this but couldn't get a clear idea, maybe because I'm not a Java developer. Could you please explain what changes I need to make in my case? I'll be highly grateful for this.

--
Thanks and regards,
Rashmi R B
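A minimal sketch of one commonly used approach, under stated assumptions: ERROR 4010 means Pig found neither core-site.xml nor hadoop-site.xml on its classpath, so in the default mapreduce mode it does not know which cluster to contact. The bin/pig launcher adds the entries of the PIG_CLASSPATH environment variable to the Java classpath, so pointing it at a directory that holds the remote cluster's client configuration gives Pig that information. The hostnames (namenode.example.com, jobtracker.example.com), the ports 8020/8021 and the directory $HOME/remote-hadoop-conf are hypothetical placeholders; copying core-site.xml and mapred-site.xml from the remote cluster's own conf directory is more reliable than writing them by hand. This also assumes the Mac can reach the NameNode and JobTracker over the network, and that the cluster runs a Hadoop version compatible with the Hadoop client bundled with Pig 0.8.0.

```shell
#!/bin/sh
# Sketch: make the remote cluster's Hadoop client config visible to Pig.
# ASSUMPTIONS (hypothetical placeholders, replace with your cluster's values):
#   namenode.example.com:8020    -> NameNode address (fs.default.name)
#   jobtracker.example.com:8021  -> JobTracker address (mapred.job.tracker)
#   $HOME/remote-hadoop-conf     -> local directory holding the config files
# Preferably copy core-site.xml and mapred-site.xml from the cluster instead.

CONF_DIR="${CONF_DIR:-$HOME/remote-hadoop-conf}"
mkdir -p "$CONF_DIR"

# core-site.xml tells the Hadoop client which file system (HDFS) to use.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
EOF

# mapred-site.xml tells the Hadoop client where to submit MapReduce jobs.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>
EOF

# bin/pig adds PIG_CLASSPATH entries to the Java classpath; this is where
# Pig looks for core-site.xml / hadoop-site.xml (see ERROR 4010 above).
# Any existing PIG_CLASSPATH value is kept after the new directory.
export PIG_CLASSPATH="$CONF_DIR${PIG_CLASSPATH:+:$PIG_CLASSPATH}"
echo "PIG_CLASSPATH=$PIG_CLASSPATH"

# Afterwards, start Pig in the default mapreduce mode with:  pig
```

With these files in place, running pig (without -x local) in the same shell should pick up the remote cluster's addresses. As the error message itself suggests, pig -x local remains an option for trying out scripts against local files while the cluster connection is being sorted out.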
