Hi Dmitriy, I tried the trunk version. You are right. I just remembered that a long time ago I could not put values of type String into a tuple in a LoadFunc.
On Thu, Oct 14, 2010 at 11:46 PM, Dmitriy Ryaboy <[email protected]> wrote:

> He should be able to put in Strings.
> Christian, do you have getLoadCaster implemented?
>
> From LoadFunc:
>
>     /**
>      * This will be called on the front end during planning and not on the
>      * back end during execution.
>      * @return the {@link LoadCaster} associated with this loader. Returning
>      * null indicates that casts from byte array are not supported for this
>      * loader.
>      * @throws IOException if there is an exception during LoadCaster
>      * construction
>      */
>     public LoadCaster getLoadCaster() throws IOException {
>         return new Utf8StorageConverter();
>     }
>
> -D
>
> On Thu, Oct 14, 2010 at 8:00 AM, Jeff Zhang <[email protected]> wrote:
>
>> Hi Christian,
>>
>> Like Dmitriy said, you should put Pig types into the tuple:
>>
>>     Tuple output = TupleFactory.getInstance().newTuple(3);
>>     output.set(0, new DataByteArray(((String) res.get("col1")).getBytes("UTF-8")));
>>     output.set(1, new DataByteArray(((String) res.get("col2")).getBytes("UTF-8")));
>>     output.set(2, new DataByteArray(((String) res.get("col3")).getBytes("UTF-8")));
>>
>> On Thu, Oct 14, 2010 at 8:22 PM, Christian Decker
>> <[email protected]> wrote:
>>
>>> Right now all my tuple values are of type String. Actually my code looks
>>> like this; it's still pretty basic, but it's doing what it's supposed to:
>>>
>>>     List<ColumnOrSuperColumn> cf =
>>>         (List<ColumnOrSuperColumn>) reader.getCurrentValue();
>>>     HashMap<String, Object> res = new HashMap<String, Object>();
>>>     for (ColumnOrSuperColumn c : cf) {
>>>         res.put(new String(c.column.name), new String(c.column.value));
>>>     }
>>>     Tuple output = TupleFactory.getInstance().newTuple(3);
>>>     output.set(0, res.get("col1"));
>>>     output.set(1, res.get("col2"));
>>>     output.set(2, res.get("col3"));
>>>
>>> Any idea?
>>> Regards,
>>> Chris
>>>
>>> On Tue, Oct 12, 2010 at 11:23 PM, Dmitriy Ryaboy <[email protected]> wrote:
>>>
>>>> What are the objects underlying col1, col2, and col3? You can only use
>>>> the set of objects Pig understands (so: String, the various Number
>>>> derivatives, DataByteArray, Map<String, Object>, Tuple, DataBag).
>>>>
>>>> -D
>>>>
>>>> On Tue, Oct 12, 2010 at 12:37 PM, Christian Decker
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm currently working on a simple Cassandra loader that reads an index
>>>>> and then works on that data. Now whenever I try to work on the
>>>>> retrieved data I get a strange error:
>>>>>
>>>>>     java.io.IOException: Type mismatch in key from map: expected
>>>>>     org.apache.pig.impl.io.NullableBytesWritable, recieved
>>>>>     org.apache.pig.impl.io.NullableText
>>>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:845)
>>>>>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
>>>>>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:115)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:234)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
>>>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
>>>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>
>>>>> The script is pretty simple right now:
>>>>>
>>>>>     rows = LOAD 'cassandra://localhost:9160/...'
>>>>>         USING CassandraIndexReader() as (col1, col2, col3);
>>>>>     dump rows;
>>>>>     grouped = GROUP rows BY col1;
>>>>>     dump grouped;
>>>>>
>>>>> The first dump works fine, while the second just dies with the above
>>>>> error. Strangely, when I store it on disk and then load it with
>>>>> PigStorage() again, it works as expected.
>>>>>
>>>>> Am I doing something wrong with my custom loader?
>>>>>
>>>>> Regards,
>>>>> Chris
>>
>> --
>> Best Regards
>> Jeff Zhang

--
Best Regards

Jeff Zhang
