Hi,

I'm running a Pig script on my Windows machine, which does not have a Hadoop/Pig environment installed.
Some questions:

1. Can I run PigUnit test cases on Windows without any Hadoop/Pig environment set up?
2. Can I run PigUnit test cases in local mode through Eclipse if I configure the cluster details? If yes, where do I provide the cluster details?
3. Can I run PigUnit test cases in mapreduce mode through Eclipse if I configure the cluster details? If yes, where do I provide the cluster details?
4. Can I build the Maven jar on my Windows machine without running the test cases, and then deploy it to a cluster that has Hadoop/Pig?

I appreciate your help.

I executed a PigUnit test case and it errored out. The log below has the error details:

14/07/12 17:55:30 INFO pigunit.PigTest: Using default local mode
14/07/12 17:55:30 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: file:///
14/07/12 17:55:30 INFO pigunit.PigTest: -- Load users from hdfs
users = LOAD 'src/test/resources/input/users.txt' USING PigStorage(',') AS (id:long, firstName:chararray, lastName:chararray, country:chararray, city:chararray, company:chararray);
-- Load ratings from hdfs
awesomenessRating = LOAD 'src/test/resources/input/rating.txt' USING PigStorage(',') AS (userId:long, rating:long);
-- Join records by userId
joinedRecords = JOIN users BY id, awesomenessRating BY userId;
-- Filter users with awesomenessRating > 150
filteredRecords = FILTER joinedRecords BY awesomenessRating::rating > 150;
-- Generate fields that we are interested in
generatedRecords = FOREACH filteredRecords GENERATE users::id AS id, users::firstName AS firstName, users::country AS country, awesomenessRating::rating AS rating;
-- Store results
STORE generatedRecords INTO 'src/test/resources/results/awesomeness' USING PigStorage();
14/07/12 17:55:30 INFO util.Utils: Default bootup file C:\Users\krkrishnamoorthy/.pigbootup not found
users = LOAD 'src/test/resources/input/users.txt' USING PigStorage(',') AS (id:long, firstName:chararray, lastName:chararray, country:chararray, city:chararray, company:chararray);
--> users = LOAD 'src/test/resources/input/users.txt' USING PigStorage(',') AS (id:long,firstName:chararray,lastName:chararray,country:chararray,city:chararray,company:chararray);
awesomenessRating = LOAD 'src/test/resources/input/rating.txt' USING PigStorage(',') AS (userId:long, rating:long);
--> awesomenessRating = LOAD 'src/test/resources/input/awesomeness-rating.txt' USING PigStorage(',') AS (userId:long, rating:long);
STORE generatedRecords INTO 'src/test/resources/results/awesomeness' USING PigStorage();
--> none
14/07/12 17:55:31 INFO pigstats.ScriptState: Pig features used in the script: HASH_JOIN
14/07/12 17:55:31 INFO optimizer.LogicalPlanOptimizer: {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, FilterLogicExpressionSimplifier, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
14/07/12 17:55:31 INFO mapReduceLayer.MRCompiler: File concatenation threshold: 100 optimistic? false
14/07/12 17:55:31 INFO mapReduceLayer.MRCompiler$LastInputStreamingOptimizer: Rewrite: POPackage->POForEach to POJoinPackage
14/07/12 17:55:31 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
14/07/12 17:55:31 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
14/07/12 17:55:31 INFO pigstats.ScriptState: Pig script settings are added to the job
14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Setting up single store job
14/07/12 17:55:31 INFO data.SchemaTupleFrontend: Key [pig.schematuple] is false, will not generate code.
14/07/12 17:55:31 INFO data.SchemaTupleFrontend: Starting process to move generated code to distributed cache
14/07/12 17:55:31 INFO data.SchemaTupleFrontend: Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: C:\Users\KRKRIS~1\AppData\Local\Temp\1405212931260-0
14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Reduce phase detected, estimating # of required reducers.
14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
14/07/12 17:55:31 INFO mapReduceLayer.InputSizeReducerEstimator: BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=-1
14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Setting Parallelism to 1
14/07/12 17:55:31 INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for submission.
14/07/12 17:55:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/12 17:55:31 ERROR security.UserGroupInformation: PriviledgedActionException as:krkrishnamoorthy cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging to 0700
14/07/12 17:55:31 INFO mapReduceLayer.MapReduceLauncher: 0% complete
14/07/12 17:55:31 WARN mapReduceLayer.MapReduceLauncher: Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
14/07/12 17:55:31 INFO mapReduceLayer.MapReduceLauncher: job null has failed! Stop running all dependent jobs
14/07/12 17:55:31 INFO mapReduceLayer.MapReduceLauncher: 100% complete
14/07/12 17:55:31 WARN mapReduceLayer.Launcher: There is no log file to write to.
14/07/12 17:55:31 ERROR mapReduceLayer.Launcher: Backend error message during job submission
java.io.IOException: Failed to set permissions of path: \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
    at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
    at java.lang.Thread.run(Thread.java:744)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
14/07/12 17:55:31 ERROR pigstats.SimplePigStats: ERROR: Failed to set permissions of path: \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging to 0700
14/07/12 17:55:31 ERROR pigstats.PigStatsUtil: 1 map reduce job(s) failed!
14/07/12 17:55:31 INFO pigstats.SimplePigStats: Detected Local mode. Stats reported below may be incomplete
14/07/12 17:55:31 INFO pigstats.SimplePigStats: Script Statistics:

HadoopVersion    PigVersion    UserId              StartedAt              FinishedAt             Features
1.2.1            0.12.0        krkrishnamoorthy    2014-07-12 17:55:31    2014-07-12 17:55:31    HASH_JOIN

Failed!

Failed Jobs:
JobId    Alias                                    Feature      Message    Outputs
N/A      awesomenessRating,joinedRecords,users    HASH_JOIN    Message: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
    at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
    at java.lang.Thread.run(Thread.java:744)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
    file:/tmp/temp49116140/tmp1118481539,

Input(s):
Failed to read data from "file:///C:/Users/krkrishnamoorthy/workspace/test/pig-unit-example/src/test/resources/input/awesomeness-rating.txt"
Failed to read data from "file:///C:/Users/krkrishnamoorthy/workspace/test/pig-unit-example/src/test/resources/input/users.txt"

Output(s):
Failed to produce result in "file:/tmp/temp49116140/tmp1118481539"

Job DAG:
null

14/07/12 17:55:32 INFO mapReduceLayer.MapReduceLauncher: Failed!

Thanks,
Krishnan