Sorry - I've been kind of out of it this weekend; we've been talking about it on IRC. What I'd like to do is get a small set of data and a script that reproduces what you're trying to do, and then try various things in my own environment. That way we can more easily log a Cassandra ticket if it can't be made to work with what's currently there. I'll respond to this thread when we have something to go forward with.
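In the meantime, the workaround discussed downthread (declare the output schema and do the casts inside the Python UDF itself, so Pig never receives an untyped bytearray) can be sketched roughly like this. This is a hypothetical sketch: the field names and column layout are illustrative, not taken from the real data.

```python
# Hypothetical sketch of a Jython UDF body that casts Cassandra column
# values itself instead of handing raw byte strings back to Pig.
# In an actual Pig script it would carry the decorator:
#   @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
# Field names (domain, spam, size, time) are made up for illustration.

def to_typed_tuple(columns):
    """columns is a bag of (name, value) pairs; values arrive as raw bytes."""
    fields = {name: value for name, value in columns}
    return (
        fields["domain"].decode("utf-8"),  # chararray
        int(fields["spam"]),               # int
        int(fields["size"]),               # long
        float(fields["time"]),             # float
    )
```

Because the values cross the Pig boundary already typed, expressions like `time + size` or `SUM(data.size)` never see a `DataByteArray`.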
On Apr 24, 2011, at 3:28 PM, Dmitriy Ryaboy wrote:

> Sigh. @jeromatron, @thedatachef -- this one's on you :). Told you you need
> the LoadCaster...
>
> D
>
> On Sun, Apr 24, 2011 at 1:17 PM, pob <[email protected]> wrote:
>
>> hello,
>>
>> thanks, but without success ;/
>>
>> grunt> pom = foreach rows generate myUDF.toTuple($1);
>> grunt> describe pom
>> pom: {y: {t: (domain: bytearray,spam: bytearray,size: bytearray,time: bytearray)}}
>> grunt> data = foreach pom generate flatten($0) as (domain, spam, size, time);
>> grunt> data = foreach data generate (chararray) domain, (int) spam, (long) size, (float) time;
>> grunt> describe data;
>> data: {domain: chararray,spam: int,size: long,time: float}
>>
>> z = foreach data generate time+size;
>>
>> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:92)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> 2011-04-24 22:16:06,129 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
>>
>> z = foreach data generate time
>>
>> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>
>> 2011/4/24 Dmitriy Ryaboy <[email protected]>
>>
>>> Try this:
>>>
>>> data = foreach pom generate flatten($0) as (domain, spam, size, time);
>>> data = foreach data generate (chararray) domain, (int) spam, (long) size, (float) time;
>>>
>>> Pig is inconsistent in what "as foo:type" does vs "(type) foo".
>>>
>>> D
>>>
>>> On Sun, Apr 24, 2011 at 10:44 AM, pob <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> but why can't I re-cast it during the flatten?
>>>>
>>>> data = foreach pom generate flatten($0) AS (domain:chararray, spam:int, size:long, time:float);
>>>>
>>>> grunt> z = foreach data generate time+size;
>>>>
>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>
>>>> 2011/4/24 Dmitriy Ryaboy <[email protected]>
>>>>
>>>>> I think it's the deep-casting issue from
>>>>> https://issues.apache.org/jira/browse/PIG-1758 .
>>>>> Should work in 0.9, but it didn't get into 0.8 or 0.8.1.
>>>>>
>>>>> D
>>>>>
>>>>> On Sun, Apr 24, 2011 at 9:52 AM, pob <[email protected]> wrote:
>>>>>
>>>>>> That's strange; pygmalion works fine (but there aren't any numerical
>>>>>> operations).
>>>>>>
>>>>>> I think I'm using C* 0.7.5, where it's supposed to be patched ;/ so idk :(
>>>>>>
>>>>>> 2011/4/24 Jacob Perkins <[email protected]>
>>>>>>
>>>>>>> That changes things entirely. There's some weirdness in the way data is
>>>>>>> read from Cassandra. Have you applied the latest patches (e.g.
>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-2387)?
>>>>>>>
>>>>>>> See also some UDFs for working with Cassandra data that Jeremy Hanna
>>>>>>> (@jeromatron) wrote:
>>>>>>>
>>>>>>> https://github.com/jeromatron/pygmalion
>>>>>>>
>>>>>>> Best of luck!
>>>>>>>
>>>>>>> --jacob
>>>>>>> @thedatachef
>>>>>>>
>>>>>>> On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
>>>>>>>> Maybe I forgot one more thing: the rows are taken from Cassandra.
>>>>>>>>
>>>>>>>> rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
>>>>>>>> CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
>>>>>>>>
>>>>>>>> I have no idea how to format the AS for a bag in a foreach.
>>>>>>>>
>>>>>>>> P.
>>>>>>>>
>>>>>>>> 2011/4/24 Jacob Perkins <[email protected]>
>>>>>>>>
>>>>>>>>> Strange, that looks right to me. What happens if you try the 'AS'
>>>>>>>>> statement anyhow?
>>>>>>>>>
>>>>>>>>> --jacob
>>>>>>>>> @thedatachef
>>>>>>>>>
>>>>>>>>> On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> pom = foreach rows generate myUDF.toTuple($1); -- reading data
>>>>>>>>>> describe pom
>>>>>>>>>> pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
>>>>>>>>>>
>>>>>>>>>> data = foreach pom generate flatten($0);
>>>>>>>>>> grunt> describe data;
>>>>>>>>>> data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
>>>>>>>>>>
>>>>>>>>>> I think they are cast fine, right?
>>>>>>>>>>
>>>>>>>>>> The UDF is a Python one with the decorator
>>>>>>>>>> @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> 2011/4/24 Jacob Perkins <[email protected]>
>>>>>>>>>>
>>>>>>>>>>> You're getting a 'ClassCastException' because the contents of the
>>>>>>>>>>> bags are DataByteArray and not long (or cannot be cast to long). I
>>>>>>>>>>> suspect that you're generating the contents of the bag in some way
>>>>>>>>>>> from a UDF, no?
>>>>>>>>>>>
>>>>>>>>>>> You need to either declare the output schema explicitly in the UDF
>>>>>>>>>>> or just use the 'AS' statement. For example, say you have a UDF
>>>>>>>>>>> that sums two numbers:
>>>>>>>>>>>
>>>>>>>>>>> data = LOAD 'foobar' AS (a:int, b:int);
>>>>>>>>>>> summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
>>>>>>>>>>> DUMP summed;
>>>>>>>>>>>
>>>>>>>>>>> --jacob
>>>>>>>>>>> @thedatachef
>>>>>>>>>>>
>>>>>>>>>>> On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
>>>>>>>>>>>> x = foreach g2 generate group, data.(size);
>>>>>>>>>>>> dump x;
>>>>>>>>>>>>
>>>>>>>>>>>> ((drm,0),{(464868)})
>>>>>>>>>>>> ((drm,1),{(464868)})
>>>>>>>>>>>> ((snezz,0),{(8073),(8073)})
>>>>>>>>>>>>
>>>>>>>>>>>> but:
>>>>>>>>>>>> x = foreach g2 generate group, SUM(data.size);
>>>>>>>>>>>>
>>>>>>>>>>>> 2011-04-24 18:02:18,910 [Thread-793] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
>>>>>>>>>>>> org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing sum in Initial
>>>>>>>>>>>>     at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
>>>>>>>>>>>>     at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
>>>>>>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
>>>>>>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
>>>>>>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
>>>>>>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>>>>>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>>>>>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>>>>>>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>>>>>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>>>>>>>>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>>>>>>>>>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>>>>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>>>>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>>>>>>>> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Long
>>>>>>>>>>>>     at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
>>>>>>>>>>>>     ... 14 more
>>>>>>>>>>>> 2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0038
>>>>>>>>>>>> 2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>>>>>>>> 2011-04-24 18:02:24,215 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0038 has failed! Stop running all dependent jobs
>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>>>>>>>>>>>>
>>>>>>>>>>>> Pig Stack Trace
>>>>>>>>>>>> ---------------
>>>>>>>>>>>> ERROR 1066: Unable to open iterator for alias x
>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias x
>>>>>>>>>>>>     at org.apache.pig.PigServer.openIterator(PigServer.java:754)
>>>>>>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>>>>>>>>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>>>>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>>>>>>>>>>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>>>>>>>>>>     at org.apache.pig.Main.run(Main.java:465)
>>>>>>>>>>>>     at org.apache.pig.Main.main(Main.java:107)
>>>>>>>>>>>> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>>>>>>>>>>>>     at org.apache.pig.PigServer.openIterator(PigServer.java:744)
>>>>>>>>>>>>     ... 7 more
