Did you set heap size to 0?

Sent from my iPhone
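For reference, the setting being asked about lives in hadoop-env.sh; a sketch of a sane, non-zero configuration (the 2000 MB value is illustrative only, not taken from this thread):

```shell
# hadoop-env.sh (Hadoop 1.x) -- HADOOP_HEAPSIZE sets the heap, in MB,
# for the JVMs launched by the Hadoop start-up scripts; left unset, it
# defaults to 1000 MB. A value of 0 (or "00") leaves those JVMs with
# essentially no heap at all.
export HADOOP_HEAPSIZE=2000   # illustrative value, not from the thread
```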
On Mar 28, 2012, at 12:12 PM, "Herbert Mühlburger" <[email protected]> wrote:

> Hi,
>
> On 28.03.12 18:28, Jonathan Coveney wrote:
>> - dev@pig
>> + user@pig
>
> You are right, this fits better on user@pig.
>
>> What command are you using to run this? Are you upping the max heap?
>
> I created a Pig script wiki.pig with the following content:
>
> ===
> register piggybank.jar;
>
> pages = load
> '/user/herbert/enwiki-latest-pages-meta-history1.xml-p000000010p000002162.bz2'
> using org.apache.pig.piggybank.storage.XMLLoader('page') as
> (page:chararray);
> pages = limit pages 1;
> dump pages;
> ===
>
> and used the command
>
> % pig wiki.pig
>
> to run the Pig script.
>
> I use the current Hadoop 1.0.1. My version of Pig is checked out from trunk
> and built by myself.
>
> The only thing I customized was setting HADOOP_HEAPSIZE 00 in
> hadoop-env.sh (the default heap size was 1000 MB).
>
> Kind regards,
> Herbert
>
>> 2012/3/28 Herbert Mühlburger <[email protected]>
>>
>>> Hi,
>>>
>>> I would like to use Pig to work with Wikipedia dump files. It works
>>> fine with an input file of around 8 GB whose XML element contents are
>>> not too big.
>>>
>>> Currently I would like to use the file
>>> "enwiki-latest-pages-meta-history1.xml-p000000010p000002162.bz2"
>>> (around 2 GB compressed), which can be found here:
>>>
>>> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history1.xml-p000000010p000002162.bz2
>>>
>>> Is it possible that, because the content of a <page></page> XML
>>> element can become very large (several GB, for instance), the
>>> piggybank XMLLoader has problems loading elements split on <page>?
>>>
>>> I hope somebody can help me with this.
>>>
>>> I've tried to call the following Pig Latin script:
>>>
>>> ========
>>> register piggybank.jar;
>>>
>>> pages = load '/user/herbert/enwiki-latest-pages-meta-history1.xml-p000000010p000002162.bz2'
>>>     using org.apache.pig.piggybank.storage.XMLLoader('page')
>>>     as (page:chararray);
>>> pages = limit pages 1;
>>> dump pages;
>>> ========
>>>
>>> and always get the following error (the generated logfile is attached):
>>>
>>> ========
>>> 2012-03-28 14:49:54,695 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-SNAPSHOT (rexported) compiled Mrz 28 2012, 08:21:45
>>> 2012-03-28 14:49:54,696 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/herbert/Documents/workspace/pig-wikipedia/pig_1332938994693.log
>>> 2012-03-28 14:49:54,936 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /Users/herbert/.pigbootup not found
>>> 2012-03-28 14:49:55,189 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
>>> 2012-03-28 14:49:55,403 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
>>> 2012-03-28 14:49:55,845 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
>>> 2012-03-28 14:49:56,021 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
>>> 2012-03-28 14:49:56,067 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
>>> 2012-03-28 14:49:56,067 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
>>> 2012-03-28 14:49:56,171 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>> 2012-03-28 14:49:56,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>> 2012-03-28 14:49:56,274 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5733074907123320640.jar
>>> 2012-03-28 14:49:59,720 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5733074907123320640.jar created
>>> 2012-03-28 14:49:59,736 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
>>> 2012-03-28 14:49:59,795 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>> hdfs://localhost:9000/user/herbert/enwiki-latest-pages-meta-history1.xml-p000000010p000002162.bz2
>>> 2012-03-28 14:50:00,152 [Thread-11] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
>>> 2012-03-28 14:50:00,169 [Thread-11] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 35
>>> 2012-03-28 14:50:00,299 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>> 2012-03-28 14:50:01,277 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201203281105_0009
>>> 2012-03-28 14:50:01,278 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201203281105_0009
>>> 2012-03-28 14:50:23,145 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1% complete
>>> 2012-03-28 14:50:29,206 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 2% complete
>>> 2012-03-28 14:50:38,288 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 4% complete
>>> 2012-03-28 14:53:17,686 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 7% complete
>>> 2012-03-28 14:53:41,529 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 9% complete
>>> 2012-03-28 14:55:05,775 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 10% complete
>>> 2012-03-28 14:55:32,685 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 12% complete
>>> 2012-03-28 14:56:21,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 13% complete
>>> 2012-03-28 14:58:36,797 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201203281105_0009 has failed! Stop running all dependent jobs
>>> 2012-03-28 14:58:36,799 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>> 2012-03-28 14:58:36,850 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
>>> 2012-03-28 14:58:36,850 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>>> 2012-03-28 14:58:36,854 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
>>>
>>> HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
>>> 1.0.1	0.11.0-SNAPSHOT	herbert	2012-03-28 14:49:56	2012-03-28 14:58:36	LIMIT
>>>
>>> Failed!
>>>
>>> Failed Jobs:
>>> JobId	Alias	Feature	Message	Outputs
>>> job_201203281105_0009	pages		Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201203281105_0009_m_000003	hdfs://localhost:9000/tmp/temp1813558187/tmp250990633,
>>>
>>> Input(s):
>>> Failed to read data from "/user/herbert/enwiki-latest-pages-meta-history1.xml-p000000010p000002162.bz2"
>>>
>>> Output(s):
>>> Failed to produce result in "hdfs://localhost:9000/tmp/temp1813558187/tmp250990633"
>>>
>>> Counters:
>>> Total records written : 0
>>> Total bytes written : 0
>>> Spillable Memory Manager spill count : 0
>>> Total bags proactively spilled: 0
>>> Total records proactively spilled: 0
>>>
>>> Job DAG:
>>> job_201203281105_0009
>>>
>>> 2012-03-28 14:58:36,855 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
>>> 2012-03-28 14:58:36,891 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
>>> Details at logfile: /Users/herbert/Documents/workspace/pig-wikipedia/pig_1332938994693.log
>>> pig wiki.pig 8,48s user 2,72s system 2% cpu 8:46,07 total
>>> ========
>>>
>>> Thank you very much and kind regards,
>>> Herbert
>>>
>>
>
> --
> ================================================================
> Herbert Muehlburger
> Software Development and Business Management
> Graz University of Technology
> www.muehlburger.at www.twitter.com/hmuehlburger
> ================================================================
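The question about very large <page> elements comes down to whether the loader materializes a whole element as one record. A minimal Python sketch of that tag-buffering strategy (an illustration of the general technique, not the actual piggybank Java source):

```python
# Simplified sketch of what a tag-based loader in the style of piggybank's
# XMLLoader does conceptually: scan the stream for <page>, buffer every
# line until the matching </page>, and emit the whole element as ONE
# record. Memory use is therefore proportional to the largest element,
# so a multi-GB <page> needs multi-GB of task heap regardless of how
# the input file is split.

def records(stream, tag="page"):
    start, end = "<%s>" % tag, "</%s>" % tag
    buf, inside = [], False
    for line in stream:
        if not inside and start in line:
            inside = True
            line = line[line.index(start):]
        if inside:
            buf.append(line)          # the whole element is held in memory
            if end in line:
                inside = False
                yield "".join(buf)
                buf = []

xml = ("<doc>\n<page>\n<title>A</title>\n</page>\n"
       "<page>\n<title>B</title>\n</page>\n</doc>\n")
pages = list(records(iter(xml.splitlines(True))))
print(len(pages))  # 2
```

If this mirrors the real loader's behavior, then one element larger than the task heap will always fail; note also that in Hadoop 1.x HADOOP_HEAPSIZE only sizes the daemon and client JVMs, while the map tasks that actually run the loader take their heap from mapred.child.java.opts (the -Xmx setting).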
