Before casting fields to the schema you specified, the loader needs to split each record into fields. For PigStorage (the loader used in your script), the default field separator is '\t'. Since the data file doesn't use '\t' to mark the field boundary, the loader reads the whole record into a single field, which leaves nothing for B2, so it comes out empty.
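To make the splitting step concrete, here is a rough sketch in Python. It is a simplified model of delimiter-based splitting, not Pig's actual implementation; the helper name `split_record` and its padding behaviour are my own assumptions for illustration. It shows why a record with no tab ends up entirely in the first field, and what a tab-separated version of the same record would give.

```python
from typing import List, Optional


def split_record(record: str, delimiter: str = "\t",
                 num_fields: int = 2) -> List[Optional[str]]:
    """Split one text record into num_fields fields (hypothetical helper).

    Fields that are missing from the record are filled with None,
    mimicking an empty value in the output.
    """
    fields: List[Optional[str]] = list(record.split(delimiter))
    # Pad with None when the record has fewer fields than the schema.
    fields += [None] * (num_fields - len(fields))
    return fields[:num_fields]


# A record as it appears in nested.txt: bags separated by a comma, no tab.
comma_record = "{(8,9),(0,1)},{(8,9),(1,1)}"
# The same record with a tab between the two bags.
tab_record = "{(8,9),(0,1)}\t{(8,9),(1,1)}"

print(split_record(comma_record))  # whole record in field 1, field 2 is None
print(split_record(tab_record))    # one bag per field
```

Note that in this model, splitting on ',' instead would not help either, because commas also occur inside the bags and tuples. Separating the two bags with a tab in the data file is the simpler fix.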
-Richard

On 4/29/11 7:12 AM, "Zeynep PEHLIVAN" <[email protected]> wrote:

Hi to all,

I am a newbie and I am just testing small scripts for training. My question is about the result of the script below in local mode:

grunt> cat nested.txt
{(8,9),(0,1)},{(8,9),(1,1)}
{(2,3),(4,5)},{(2,3),(4,5)}
{(6,7),(3,7)},{(2,2),(3,7)}
grunt> A = LOAD 'nested.txt' AS (B1:bag{T1:tuple(t1:int,t2:int)},B2:bag{T2:tuple(f1:int,f2:int)});
grunt> DUMP A;
({(8,9),(0,1)},)
({(2,3),(4,5)},)
({(6,7),(3,7)},)

Why is B2 not displayed? When I executed the same script with PigPen, B2 is displayed, but this time I have only one result instead of three. You can find the screenshot in the attachment.

When I use the grunt shell, I get all the messages below before the result is displayed, and it takes too much time. Should I use a parameter with pig -x local to avoid this? Or did I make errors in my installation?

THANKS IN ADVANCE

grunt> A = LOAD 'nested.txt' AS (B1:bag{T1:tuple(t1:int,t2:int)},B2:bag{T2:tuple(f1:int,f2:int)});
grunt> DUMP A;
2011-04-29 15:37:44,954 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-04-29 15:37:44,954 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-04-29 15:37:44,955 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:44,959 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: A: Store(file:/tmp/temp643030084/tmp-1663465556:org.apache.pig.impl.io.InterStorage) - scope-48 Operator Key: scope-48)
2011-04-29 15:37:44,959 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-04-29 15:37:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-04-29 15:37:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-04-29 15:37:44,961 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:44,964 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:44,966 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-04-29 15:37:44,966 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-04-29 15:37:46,270 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-04-29 15:37:46,273 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,275 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-04-29 15:37:46,295 [Thread-57] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,300 [Thread-57] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,308 [Thread-57] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-04-29 15:37:46,308 [Thread-57] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-04-29 15:37:46,308 [Thread-57] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-04-29 15:37:46,402 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,407 [Thread-66] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-04-29 15:37:46,407 [Thread-66] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-04-29 15:37:46,407 [Thread-66] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-04-29 15:37:46,442 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,446 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,449 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,452 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,486 [Thread-66] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting
2011-04-29 15:37:46,486 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,489 [Thread-66] INFO org.apache.hadoop.mapred.LocalJobRunner -
2011-04-29 15:37:46,489 [Thread-66] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0005_m_000000_0 is allowed to commit now
2011-04-29 15:37:46,489 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,494 [Thread-66] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0005_m_000000_0' to file:/tmp/temp643030084/tmp-1663465556
2011-04-29 15:37:46,496 [Thread-66] INFO org.apache.hadoop.mapred.LocalJobRunner -
2011-04-29 15:37:46,496 [Thread-66] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0005_m_000000_0' done.
2011-04-29 15:37:46,776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005
2011-04-29 15:37:46,776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-04-29 15:37:51,778 [main] WARN org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local_0005
2011-04-29 15:37:51,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-04-29 15:37:51,778 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-04-29 15:37:51,778 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion   PigVersion   UserId      StartedAt             FinishedAt            Features
0.20.2          0.8.1        pehlivanz   2011-04-29 15:37:44   2011-04-29 15:37:51   UNKNOWN

Success!

Job Stats (time in seconds):
JobId            Alias   Feature    Outputs
job_local_0005   A       MAP_ONLY   file:/tmp/temp643030084/tmp-1663465556,

Input(s):
Successfully read records from: "file:///home/pehlivanz/PIG/pig-0.8.1/tutorial/scripts/testzp/nested.txt"

Output(s):
Successfully stored records in: "file:/tmp/temp643030084/tmp-1663465556"

Job DAG:
job_local_0005

2011-04-29 15:37:51,778 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:51,781 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2011-04-29 15:37:51,782 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:51,784 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-04-29 15:37:51,785 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
