Sorry, I've been kind of out of it this weekend.  We've been talking about
this on IRC.  What I'd like to do is get a small data set and a script that
reproduce what you're trying to do, and then try various things in my own
environment.  That way we can more easily log a Cassandra ticket if it can't
be worked into what's currently there.  I'll respond to this thread when we
have something to go forward with.

On Apr 24, 2011, at 3:28 PM, Dmitriy Ryaboy wrote:

> Sigh. @jeromatron , @thedatachef -- this one's on you :). Toldya you need
> the LoadCaster...
> 
> 
> D
> 
> On Sun, Apr 24, 2011 at 1:17 PM, pob <[email protected]> wrote:
> 
>> hello,
>> 
>> Thanks, but without success ;/
>> 
>> 
>> grunt> pom = foreach rows generate myUDF.toTuple($1);
>> grunt> describe pom
>> pom: {y: {t: (domain: bytearray,spam: bytearray,size: bytearray,time:
>> bytearray)}}
>> grunt> data = foreach pom generate flatten($0) as (domain, spam, size, time);
>> grunt> data = foreach data generate (chararray) domain, (int) spam, (long) size, (float) time;
>> grunt> describe data;
>> data: {domain: chararray,spam: int,size: long,time: float}
>> 
>> z = foreach data generate time+size;
>> 
>> 
>> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:92)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> 2011-04-24 22:16:06,129 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
>> 
>> 
>> 
>> 
>> z = foreach data generate time
>> 
>> 
>> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to float.
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:529)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> 
>> 
>> 2011/4/24 Dmitriy Ryaboy <[email protected]>
>> 
>>> Try this:
>>> 
>>> data = foreach pom generate flatten($0) as (domain, spam, size, time);
>>> data = foreach data generate (chararray) domain, (int) spam, (long) size,
>>> (float) time;
>>> 
>>> Pig is inconsistent in what "as foo:type" does vs. "(type) foo".
>>> 
>>> D
>>> 
>>> On Sun, Apr 24, 2011 at 10:44 AM, pob <[email protected]> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> But why can't I re-cast it during flatten?
>>>> 
>>>> 
>>>> data = foreach pom generate flatten($0) AS (domain:chararray, spam:int,
>>>> size:long, time:float);
>>>> 
>>>> grunt> z = foreach data generate time+size;
>>>> 
>>>> 
>>>> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Float
>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:97)
>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>> 
>>>> 
>>>> 
>>>> 2011/4/24 Dmitriy Ryaboy <[email protected]>
>>>> 
>>>>> I think it's the deep-casting issue from
>>>>> https://issues.apache.org/jira/browse/PIG-1758 .
>>>>> Should work in 0.9 but didn't get into 0.8 or 0.8.1
>>>>> 
>>>>> D
>>>>> 
>>>>> On Sun, Apr 24, 2011 at 9:52 AM, pob <[email protected]> wrote:
>>>>> 
>>>>>> That's strange; pygmalion works fine (but there aren't any numerical
>>>>>> operations).
>>>>>>
>>>>>> I think I'm using C* 0.7.5, where it's supposed to be patched ;/ so idk :(
>>>>>> 
>>>>>> 
>>>>>> 2011/4/24 Jacob Perkins <[email protected]>
>>>>>> 
>>>>>>> That changes things entirely. There's some weirdness in the way data is
>>>>>>> read from Cassandra. Have you applied the latest patches (e.g.
>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-2387)?
>>>>>>>
>>>>>>> See also some UDFs for working with Cassandra data that Jeremy Hanna
>>>>>>> (@jeromatron) wrote:
>>>>>>> 
>>>>>>> https://github.com/jeromatron/pygmalion
>>>>>>> 
>>>>>>> 
>>>>>>> Best of luck!
>>>>>>> 
>>>>>>> --jacob
>>>>>>> @thedatachef
>>>>>>> 
>>>>>>> On Sun, 2011-04-24 at 18:31 +0200, pob wrote:
>>>>>>>> Maybe I forgot one more thing: the rows are taken from Cassandra.
>>>>>>>>
>>>>>>>> rows = LOAD 'cassandra://emailArchive/messagesMetaData' USING
>>>>>>>> CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
>>>>>>>>
>>>>>>>> I have no idea how to format AS for a bag in a foreach.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> P.
>>>>>>>> 
>>>>>>>> 2011/4/24 Jacob Perkins <[email protected]>
>>>>>>>> 
>>>>>>>>> Strange, that looks right to me. What happens if you try the 'AS'
>>>>>>>>> statement anyhow?
>>>>>>>>> 
>>>>>>>>> --jacob
>>>>>>>>> @thedatachef
>>>>>>>>> 
>>>>>>>>> On Sun, 2011-04-24 at 18:22 +0200, pob wrote:
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> pom = foreach rows generate myUDF.toTuple($1); -- reading data
>>>>>>>>>> describe pom
>>>>>>>>>> pom: {y: {t: (domain: chararray,spam: int,size: long,time: float)}}
>>>>>>>>>>
>>>>>>>>>> data = foreach pom generate flatten($0);
>>>>>>>>>> grunt> describe data;
>>>>>>>>>> data: {y::domain: chararray,y::spam: int,y::size: long,y::time: float}
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think they are cast fine, right?
>>>>>>>>>>
>>>>>>>>>> The UDF is a Python one with the decorator
>>>>>>>>>> @outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2011/4/24 Jacob Perkins <[email protected]>
>>>>>>>>>> 
>>>>>>>>>>> You're getting a 'ClassCastException' because the contents of the bags
>>>>>>>>>>> are DataByteArray and not long (or cannot be cast to long). I suspect
>>>>>>>>>>> that you're generating the contents of the bag in some way from a UDF,
>>>>>>>>>>> no?
>>>>>>>>>>>
>>>>>>>>>>> You need to either declare the output schema explicitly in the UDF or
>>>>>>>>>>> just use the 'AS' statement. For example, say you have a UDF that sums
>>>>>>>>>>> two numbers:
>>>>>>>>>>>
>>>>>>>>>>> data   = LOAD 'foobar' AS (a:int, b:int);
>>>>>>>>>>> summed = FOREACH data GENERATE MyFancySummingUDF(a,b) AS (sum:int);
>>>>>>>>>>> DUMP summed;
>>>>>>>>>>> 
>>>>>>>>>>> --jacob
>>>>>>>>>>> @thedatachef
>>>>>>>>>>> 
>>>>>>>>>>> On Sun, 2011-04-24 at 18:02 +0200, pob wrote:
>>>>>>>>>>>> x = foreach g2 generate group, data.(size);
>>>>>>>>>>>> dump x;
>>>>>>>>>>>> 
>>>>>>>>>>>> ((drm,0),{(464868)})
>>>>>>>>>>>> ((drm,1),{(464868)})
>>>>>>>>>>>> ((snezz,0),{(8073),(8073)})
>>>>>>>>>>>> 
>>>>>>>>>>>> but:
>>>>>>>>>>>> x = foreach g2 generate group, SUM(data.size);
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 2011-04-24 18:02:18,910 [Thread-793] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0038
>>>>>>>>>>>> org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing sum in Initial
>>>>>>>>>>>> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:87)
>>>>>>>>>>>> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:65)
>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>>>>>>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>>>>>>>>>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>>>>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>>>>>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>>>>>>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>>>>>>>> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Long
>>>>>>>>>>>> at org.apache.pig.builtin.LongSum$Initial.exec(LongSum.java:79)
>>>>>>>>>>>> ... 14 more
>>>>>>>>>>>> 2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0038
>>>>>>>>>>>> 2011-04-24 18:02:19,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
>>>>>>>>>>>> 2011-04-24 18:02:24,215 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0038 has failed! Stop running all dependent jobs
>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
>>>>>>>>>>>> 2011-04-24 18:02:24,216 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Pig Stack Trace
>>>>>>>>>>>> ---------------
>>>>>>>>>>>> ERROR 1066: Unable to open iterator for alias x
>>>>>>>>>>>> 
>>>>>>>>>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias x
>>>>>>>>>>>>        at org.apache.pig.PigServer.openIterator(PigServer.java:754)
>>>>>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>>>>>>>>>>        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>>>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>>>>>>>>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>>>>>>>>>>        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>>>>>>>>>>        at org.apache.pig.Main.run(Main.java:465)
>>>>>>>>>>>>        at org.apache.pig.Main.main(Main.java:107)
>>>>>>>>>>>> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>>>>>>>>>>>>        at org.apache.pig.PigServer.openIterator(PigServer.java:744)
>>>>>>>>>>>>        ... 7 more
>> 
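
Editorial sketch, not from any of the posters: the "ERROR 1075: Received a
bytearray from the UDF" message in the thread above suggests that the values
coming out of the UDF are still raw bytearrays even though the declared schema
says chararray/int/long/float. One possible workaround is to convert the
values inside the Python UDF itself rather than relying on the declared schema
or a later cast. The function and helper names below (to_typed_tuple, _as_text)
are hypothetical, and the sketch assumes the column values were written to
Cassandra as UTF-8 text; if they were stored as packed binary numbers, the
decoding step would have to differ. The fallback outputSchema stand-in only
exists so the file can be run under plain Python; under Pig the decorator is
expected to be supplied by the runtime.

```python
# Sketch only: to_typed_tuple and _as_text are hypothetical names, not the
# poster's actual myUDF.toTuple.

try:
    outputSchema  # expected to be supplied by Pig for Jython UDFs
except NameError:
    # Stand-in so the file also runs under plain Python, outside Pig.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


def _as_text(raw):
    """Decode byte strings as UTF-8; pass everything else through str()."""
    if hasattr(raw, "decode"):
        return raw.decode("utf-8")
    return str(raw)


@outputSchema("y:bag{t:tuple(domain:chararray, spam:int, size:long, time:float)}")
def to_typed_tuple(columns):
    """Turn (name, value) column pairs into a one-tuple bag of typed fields."""
    # Assumption: every value is UTF-8 text such as "464868" or "1.5".
    values = dict((_as_text(name), _as_text(value)) for name, value in columns)
    return [(
        values["domain"],        # chararray
        int(values["spam"]),     # int
        int(values["size"]),     # long
        float(values["time"]),   # float
    )]
```

Because the conversion happens before the tuple is returned, the declared
schema and the actual runtime types agree, so downstream expressions such as
time+size or SUM(data.size) would not need an extra cast. Whether this is
preferable to fixing the loader side is a separate question for the Cassandra
ticket mentioned at the top of the thread.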
