Hi Dmitriy, I tried the trunk version. You are right. I just remembered that a long time ago I could not put values of type String into a tuple in a LoadFunc.
On Thu, Oct 14, 2010 at 11:46 PM, Dmitriy Ryaboy <[email protected]> wrote:

> He should be able to put in Strings.
> Christian, do you have getLoadCaster implemented?
>
> From LoadFunc:
>
>     /**
>      * This will be called on the front end during planning and not on the
>      * back end during execution.
>      * @return the {@link LoadCaster} associated with this loader. Returning
>      * null indicates that casts from byte array are not supported for this
>      * loader.
>      * @throws IOException if there is an exception during LoadCaster
>      * construction
>      */
>     public LoadCaster getLoadCaster() throws IOException {
>         return new Utf8StorageConverter();
>     }
>
> -D
>
> On Thu, Oct 14, 2010 at 8:00 AM, Jeff Zhang <[email protected]> wrote:
>
>> Hi Christian,
>>
>> Like Dmitriy said, you should put Pig types into the tuple:
>>
>>     Tuple output = TupleFactory.getInstance().newTuple(3);
>>     output.set(0, new DataByteArray(((String) res.get("col1")).getBytes("UTF-8")));
>>     output.set(1, new DataByteArray(((String) res.get("col2")).getBytes("UTF-8")));
>>     output.set(2, new DataByteArray(((String) res.get("col3")).getBytes("UTF-8")));
>>
>> On Thu, Oct 14, 2010 at 8:22 PM, Christian Decker
>> <[email protected]> wrote:
>>
>>> Right now all my tuple values are of type String. Actually my code looks
>>> like this; it's still pretty basic, but it's doing what it's supposed to:
>>>
>>>     List<ColumnOrSuperColumn> cf =
>>>         (List<ColumnOrSuperColumn>) reader.getCurrentValue();
>>>     HashMap<String, Object> res = new HashMap<String, Object>();
>>>     for (ColumnOrSuperColumn c : cf) {
>>>         res.put(new String(c.column.name), new String(c.column.value));
>>>     }
>>>     Tuple output = TupleFactory.getInstance().newTuple(3);
>>>     output.set(0, res.get("col1"));
>>>     output.set(1, res.get("col2"));
>>>     output.set(2, res.get("col3"));
>>>
>>> Any idea?
>>> Regards,
>>> Chris
>>>
>>> On Tue, Oct 12, 2010 at 11:23 PM, Dmitriy Ryaboy <[email protected]> wrote:
>>>
>>>> What are the objects underlying col1, col2, and col3? You can only use
>>>> the set of objects Pig understands (so: String, the various Number
>>>> derivatives, DataByteArray, Map<String, Object>, Tuple, DataBag).
>>>>
>>>> -D
>>>>
>>>> On Tue, Oct 12, 2010 at 12:37 PM, Christian Decker
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm currently working on a simple Cassandra loader that reads an index
>>>>> and then works on that data. Now whenever I try to work on the
>>>>> retrieved data I get a strange error:
>>>>>
>>>>>     java.io.IOException: Type mismatch in key from map: expected
>>>>>     org.apache.pig.impl.io.NullableBytesWritable, recieved
>>>>>     org.apache.pig.impl.io.NullableText
>>>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:845)
>>>>>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
>>>>>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:115)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:234)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
>>>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>
>>>>> The script is pretty simple right now:
>>>>>
>>>>>     rows = LOAD 'cassandra://localhost:9160/...'
>>>>>         USING CassandraIndexReader() as (col1, col2, col3);
>>>>>     dump rows;
>>>>>     grouped = GROUP rows BY col1;
>>>>>     dump grouped;
>>>>>
>>>>> The first dump works fine, while the second just dies with the above
>>>>> error. Strangely, when I store it on disk and then load it with
>>>>> PigStorage() again, it works as expected.
>>>>>
>>>>> Am I doing something wrong with my custom loader?
>>>>>
>>>>> Regards,
>>>>> Chris
>>
>> --
>> Best Regards
>> Jeff Zhang

--
Best Regards

Jeff Zhang
