Looks like you don't have Thrift on your classpath, or you have the wrong version of Thrift.
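
One quick way to check is to look inside the jars you are putting on the classpath for the class named in the exception. The script below is only a rough sketch, not anything that ships with Pig or Cassandra: the helper name `jars_containing` is made up for illustration, and it assumes each argument is either a single jar or a directory of jars. It relies only on the fact that a Java class `a.b.C` is stored in a jar as the entry `a/b/C.class`.

```python
import sys
import zipfile
from pathlib import Path

# Class named in the ClassNotFoundException from the MR-mode log.
MISSING_CLASS = "org.apache.thrift.TException"


def jars_containing(class_name, classpath_entries):
    """Return the jar paths (as strings) whose archive contains class_name.

    Hypothetical helper for illustration only.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for item in classpath_entries:
        path = Path(item)
        # Assumption for this sketch: a directory entry is a folder of jars,
        # and any other entry is a single jar file.
        candidates = sorted(path.glob("*.jar")) if path.is_dir() else [path]
        for jar in candidates:
            # Skip missing files and anything that is not a zip archive.
            if not zipfile.is_zipfile(jar):
                continue
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(str(jar))
    return hits


if __name__ == "__main__":
    # Usage: python find_class.py /path/to/lib /path/to/some.jar ...
    found = jars_containing(MISSING_CLASS, sys.argv[1:])
    print("\n".join(found) if found else MISSING_CLASS + " not found")
```

If nothing is printed for the directories you expect, the jar is probably missing. If more than one jar shows up, that may point to the version mismatch mentioned above.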
Pig may be doing something weird with splits in local mode. It would be great if you could determine whether (assuming you fix the classpath) the problem happens in local mode only, or both in local and MR modes.

D

On Sun, Apr 1, 2012 at 9:49 PM, Dan Feldman <[email protected]> wrote:
> Hi Dmitriy,
>
> Apologies for the delay - our server was misbehaving so it took a while to
> get everything set up on a new one. In any case, we basically cloned
> Cassandra from the old one to the new one - running Pig in local mode still
> produces wrong number of results. Now, we never ran the scripts in MR mode,
> so I don't know whether this is related to the original problem or not, but
> this is the error I get when running on top of hadoop:
>
> ===============================================================================
> ....
> *2012-04-01 21:32:38,781 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - job job_201203301228_0003 has failed! Stop running all dependent jobs
> 2012-04-01 21:32:38,782 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2012-04-01 21:32:38,790 [main] ERROR
> org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to
> recreate exception from backed error: Error:
> java.lang.ClassNotFoundException: org.apache.thrift.TException
> 2012-04-01 21:32:38,790 [main] ERROR
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2012-04-01 21:32:38,791 [main] INFO
> org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics*:
> ....
> ================================================================================
>
> Thanks,
> Dan F
>
>
> On Thu, Mar 29, 2012 at 6:20 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> What happens when you run in MR mode instead of local mode?
>>
>> On Thu, Mar 29, 2012 at 11:24 AM, Dan Feldman <[email protected]> wrote:
>> > Hi,
>> >
>> > I'm loading a bunch of data into Pig using CassandraStorage. When I do a
>> > dump and/or store, the amount of data that is outputted is actually only
>> > 2-3% of the amount of data in Cassandra database.
>> >
>> > My Cassandra data consists of (for now) 4-5 wide rows where each data entry
>> > is a super column ordered by TimeUUID.
>> >
>> > So, my script now looks like
>> >
>> > rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage() AS
>> > (key:chararray, columns: bag{(t:chararray, subcolumns: bag{(name,
>> > value)})});
>> > store rows into 'directory/test';
>> >
>> > The output that I get when I run the script looks like this (I highlighted
>> > the warnings):
>> >
>> > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> > *2012-03-29 11:10:58,063 [main] INFO org.apache.pig.Main - Logging error
>> > messages to: /directory/pig_1333044658058.log
>> > 2012-03-29 11:10:58,105 [main] INFO
>> > org.apache.pig.tools.parameters.PreprocessorContext - Executing command :
>> > date "+%y%m%d%H%M%S"
>> > 2012-03-29 11:10:58,268 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
>> > to hadoop file system at: file:///
>> > 2012-03-29 11:10:59,018 [main] INFO
>> > org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
>> > script: UNKNOWN
>> > 2012-03-29 11:10:59,182 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
>> > File concatenation threshold: 100 optimistic? false
>> > 2012-03-29 11:10:59,211 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>> > - MR plan size before optimization: 1
>> > 2012-03-29 11:10:59,211 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>> > - MR plan size after optimization: 1
>> > 2012-03-29 11:10:59,251 [main] INFO
>> > org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
>> > to the job
>> > 2012-03-29 11:10:59,269 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>> > - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>> > 2012-03-29 11:10:59,292 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>> > - Setting up single store job
>> > 2012-03-29 11:10:59,334 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>> > - 1 map-reduce job(s) waiting for submission.
>> > 2012-03-29 11:10:59,361 [Thread-1] WARN org.apache.hadoop.mapred.JobClient
>> > - No job jar file set. User classes may not be found. See JobConf(Class)
>> > or JobConf#setJar(String).
>> > 2012-03-29 11:10:59,437 [Thread-1] INFO
>> > org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
>> > paths (combined) to process : 1
>> > 2012-03-29 11:10:59,836 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>> > - HadoopJobId: job_local_0001
>> > 2012-03-29 11:10:59,836 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>> > - 0% complete
>> > 2012-03-29 11:11:01,185 [Thread-2] INFO org.apache.hadoop.mapred.Task -
>> > Task:attempt_local_0001_m_000000_0 is done. And is in the process of
>> > commiting
>> > 2012-03-29 11:11:01,189 [Thread-2] INFO
>> > org.apache.hadoop.mapred.LocalJobRunner -
>> > 2012-03-29 11:11:01,189 [Thread-2] INFO org.apache.hadoop.mapred.Task -
>> > Task attempt_local_0001_m_000000_0 is allowed to commit now
>> > 2012-03-29 11:11:01,192 [Thread-2] INFO
>> > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output
>> > of task 'attempt_local_0001_m_000000_0' to file:/root/root/test
>> > 2012-03-29 11:11:02,714 [Thread-2] INFO
>> > org.apache.hadoop.mapred.LocalJobRunner -
>> > 2012-03-29 11:11:02,714 [Thread-2] INFO org.apache.hadoop.mapred.Task -
>> > Task 'attempt_local_0001_m_000000_0' done.
>> > 2012-03-29 11:11:04,842 [main] WARN
>> > org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for
>> > job job_local_0001
>> > 2012-03-29 11:11:04,845 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>> > - 100% complete
>> > 2012-03-29 11:11:04,845 [main] INFO
>> > org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats
>> > reported below may be incomplete
>> > 2012-03-29 11:11:04,847 [main] INFO
>> > org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
>> >
>> > HadoopVersion  PigVersion      UserId  StartedAt            FinishedAt           Features
>> > 0.20.203.0     0.9.3-SNAPSHOT  root    2012-03-29 11:10:59  2012-03-29 11:11:04  UNKNOWN
>> >
>> > Success!
>> >
>> > Job Stats (time in seconds):
>> > JobId           Alias  Feature   Outputs
>> > job_local_0001  rows   MAP_ONLY  file:///root/directory/test,
>> >
>> > Input(s):
>> > Successfully read records from: "cassandra://Keyspace/ColumnFamily"
>> >
>> > Output(s):
>> > Successfully stored records in: "file:///root/directory/test"
>> >
>> > Job DAG:
>> > job_local_0001
>> >
>> >
>> > 2012-03-29 11:11:04,849 [main] INFO
>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>> > - Success!*
>> >
>> > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >
>> >
>> > Now, I don't know whether it's related or not to the problem, but I
>> > recently noticed that ILLUSTRATE dumps the data to the terminal before
>> > actually illustrating the schema. It outputs the same amount of data (about
>> > 2-3% of the total) as it would if I just ran DUMP or STORE.
>> >
>> > I'm using Pig 0.93 in local mode with Cassandra 1.0.8
>> >
>> >
>> > P.S. I tried setting -Dpig.splitCombination=false as was suggested by Matt
>> > in
>> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Pig-not-reading-all-cassandra-data-td5982283.html,
>> > but it didn't help...
>> >
>> >
>> > Thanks for your help!
>> > Dan F.
>>
