Hi Mohit,
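 XMLLoader looks for the start and end tags of the tag name passed as its
string argument. In your input each record ends with <abc> instead of </abc>,
so there are no end tags and it read 0 records. Note that the example below
also passes the bare tag name, 'abc', rather than '<abc>'.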

Example: 
raw = LOAD 'sample_xml' using
org.apache.pig.piggybank.storage.XMLLoader('abc') as (document:chararray);
dump raw;

cat sample_xml
<abc><def></def></abc>
<abc><def></def></abc>
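
To see why the missing end tag matters, here is a rough Python sketch of start/end tag matching. It is only an illustration of the idea, not XMLLoader's actual implementation; the function name extract_records and the regex are my own assumptions, and it ignores nesting of the same tag.

```python
import re


def extract_records(text, tag):
    """Return every <tag>...</tag> span in text (simplified, non-nested).

    Hypothetical helper for illustration only; XMLLoader's real parsing
    logic may differ.
    """
    name = re.escape(tag)
    # Start tag (optionally with attributes), shortest body, then the end tag.
    pattern = re.compile(r"<{0}(?:\s[^>]*)?>.*?</{0}>".format(name), re.DOTALL)
    return pattern.findall(text)


# Well-formed input: each record is closed with </abc>.
good = "<abc><def></def></abc>\n<abc><def></def></abc>\n"
# Input from the original mail: records end with <abc>, so no end tag exists.
bad = "<abc><def></def><abc>\n<abc><def></def><abc>\n"

print(len(extract_records(good, "abc")))  # 2
print(len(extract_records(bad, "abc")))   # 0
```

With the well-formed file two records are found; with the original file nothing ever closes a record, which matches the "Successfully read 0 records" line in your log.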

Thanks
Vivek
On 2/21/12 11:02 PM, "Mohit Anchlia" <[email protected]> wrote:

> I am trying to use XMLLoader to process the files but it doesn't seem to be
> quite working. For the first pass I am just trying to dump all the contents
> but it's saying 0 records found:
> 
> bash-3.2$ hadoop fs -cat /examples/testfile.txt
> 
> <abc><def></def><abc>
> 
> <abc><def></def><abc>
> 
> register 'pig-0.8.1-cdh3u3/contrib/piggybank/java/piggybank.jar'
> 
> raw = LOAD '/examples/testfile.txt' using
> org.apache.pig.piggybank.storage.XMLLoader('<abc>') as (document:chararray);
> 
> dump raw;
> 
> 2012-02-21 09:22:18,947 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 50% complete
> 
> 2012-02-21 09:22:24,998 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 
> 2012-02-21 09:22:24,999 [main] INFO org.apache.pig.tools.pigstats.PigStats
> - Script Statistics:
> 
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 
> 0.20.2-cdh3u3 0.8.1-cdh3u3 hadoop 2012-02-21 09:22:12 2012-02-21 09:22:24
> UNKNOWN
> 
> Success!
> 
> Job Stats (time in seconds):
> 
> JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime
> MinReduceTime AvgReduceTime Alias Feature Outputs
> 
> job_201202201638_0012 1 0 2 2 2 0 0 0 raw MAP_ONLY
> hdfs://dsdb1:54310/tmp/temp1968655187/tmp-358114646,
> 
> Input(s):
> 
> Successfully read 0 records (402 bytes) from: "/examples/testfile.txt"
> 
> Output(s):
> 
> Successfully stored 0 records in:
> "hdfs://dsdb1:54310/tmp/temp1968655187/tmp-358114646"
> 
> Counters:
> 
> Total records written : 0
> 
> Total bytes written : 0
> 
> Spillable Memory Manager spill count : 0
> 
> Total bags proactively spilled: 0
> 
> Total records proactively spilled: 0
> 
> Job DAG:
> 
> job_201202201638_0012
> 
> 
> 
> 2012-02-21 09:22:25,004 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Success!
> 
> 2012-02-21 09:22:25,011 [main] INFO
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
> to process : 1
> 
> 2012-02-21 09:22:25,011 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
> paths to process : 1
> 
> grunt> quit
