Thanks Stan,

This would be a great help! I'll try to implement it. :-)

Praveenesh

On Sat, Feb 4, 2012 at 8:10 AM, Stan Rosenberg <[email protected]> wrote:
> Hi Praveenesh,
>
> Maybe this will get you started.
>
> Suppose we have the desired schema parsed and stored in 'map' of type
> LinkedHashMap<String, String>. The key is your field name, and the
> value denotes the data type, e.g., 'string', 'int', etc.
>
> Now, let's derive Pig's schema from this map:
>
>   Schema schema = new Schema(); // pig schema
>
>   for (Entry<String, String> entry : map.entrySet()) {
>     schema.add(new Schema.FieldSchema(entry.getKey(),
>         getPigType(entry.getValue())));
>   }
>
> where getPigType returns the corresponding Pig data type:
>
>   byte getPigType(String fieldType) {
>     if (fieldType.equalsIgnoreCase("string")) {
>       return DataType.CHARARRAY;
>     } else if (fieldType.equalsIgnoreCase("int")) {
>       return DataType.INTEGER;
>     } else if (fieldType.equalsIgnoreCase("long")) {
>       return DataType.LONG;
>     } else if (fieldType.equalsIgnoreCase("float")) {
>       return DataType.FLOAT;
>     } else if (fieldType.equalsIgnoreCase("double")) {
>       return DataType.DOUBLE;
>     } else if (fieldType.equalsIgnoreCase("boolean")) {
>       return DataType.BOOLEAN;
>     } else {
>       return DataType.CHARARRAY; // unknown types fall back to chararray
>     }
>   }
>
> Now, you'll want to implement 'getSchema' in your custom loader:
>
>   @Override
>   public ResourceSchema getSchema(String location, Job job)
>       throws IOException {
>     // I'd actually cache this result if the schema is fixed
>     return new ResourceSchema(schema);
>   }
>
> This should take care of the schema, except you'd probably also need to
> serialize it to the back-end so that you can enforce the schema inside
> 'getNext'.
>
> stan
>
> P.S. The above is essentially pseudo-code; I haven't actually
> type-checked it.
>
> On Fri, Feb 3, 2012 at 5:45 PM, praveenesh kumar <[email protected]> wrote:
> > Thanks Stan,
> > I was going through these only. I was wondering whether there is an
> > easy way to do it, or am I reading something wrong?
> > Now I will focus on what you have suggested, but I hope there is some
> > easy solution to my problem.
> >
> > Praveenesh
> >
> > On Sat, Feb 4, 2012 at 4:12 AM, Stan Rosenberg <[email protected]> wrote:
> >
> >> Hi Praveenesh,
> >>
> >> Assuming you have already read these:
> >>
> >> http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html
> >> http://pig.apache.org/docs/r0.9.2/udf.html#load-store-functions
> >>
> >> my next step would be to peruse the source code of some existing
> >> loaders, e.g., PigStorage.
> >>
> >> Best,
> >>
> >> stan
> >>
> >>
> >> On Fri, Feb 3, 2012 at 5:35 PM, praveenesh kumar <[email protected]> wrote:
> >> > Thanks Stan,
> >> > If you were facing this kind of scenario, how would you have
> >> > proceeded?
> >> > Can you give me some pointers on how to write a custom loader, or
> >> > some good tutorials on it?
> >> > What is the current practice for solving the above scenario in Pig?
> >> >
> >> > Praveenesh
> >> >
> >> >
> >> > On Sat, Feb 4, 2012 at 4:02 AM, Stan Rosenberg <[email protected]> wrote:
> >> >
> >> >> My hunch is you'll have to write a custom loader, but I'll let the
> >> >> experts chime in. E.g., the AvroStorage loader can parse the schema
> >> >> from a JSON file passed to it via the constructor. I don't think
> >> >> PigStorage has the same option.
> >> >>
> >> >> stan
> >> >>
> >> >> On Fri, Feb 3, 2012 at 7:35 AM, praveenesh kumar <[email protected]> wrote:
> >> >> > Hey guys,
> >> >> >
> >> >> > I am new to Pig.
> >> >> > I was wondering whether it is possible to pass a schema into the
> >> >> > Pig LOAD statement when loading the data for the first time.
> >> >> >
> >> >> > Suppose I have a huge dataset containing around 100 columns. Is
> >> >> > there a way to pass a schema defined in some other file (some
> >> >> > kind of meta file) into the Pig LOAD statement, or do I have to
> >> >> > define it every time inside the LOAD statement?
> >> >> >
> >> >> > Thanks,
> >> >> > Praveenesh
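
The thread assumes the schema has already been parsed into a LinkedHashMap<String, String>, but never shows that step. Below is a minimal sketch of it in plain Java, with no Pig dependency. Everything named here is hypothetical: the class SchemaMetaParser, its methods, and in particular the meta-file format (one "fieldName:type" entry per line, with "#" comment lines), which the thread does not specify. The type mapping follows the getPigType fallback logic from the thread, but emits Pig type names as strings rather than DataType constants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: not part of Pig, and the "name:type" line format is an assumption.
public class SchemaMetaParser {

    // Parses lines of the assumed form "fieldName:type" into an insertion-ordered map.
    // Blank lines and lines starting with '#' are skipped.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0 || colon == line.length() - 1) {
                throw new IllegalArgumentException("Expected name:type, got: " + raw);
            }
            String name = line.substring(0, colon).trim();
            String type = line.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
            map.put(name, type);
        }
        return map;
    }

    // Maps a meta-file type to a Pig type name; unknown types fall back to
    // chararray, mirroring the getPigType fallback described in the thread.
    static String pigTypeName(String fieldType) {
        switch (fieldType.toLowerCase(Locale.ROOT)) {
            case "int":     return "int";
            case "long":    return "long";
            case "float":   return "float";
            case "double":  return "double";
            case "boolean": return "boolean";
            default:        return "chararray";
        }
    }

    // Renders the map as a comma-separated "name:type" list, preserving field order.
    static String toPigSchemaString(Map<String, String> map) {
        List<String> fields = new ArrayList<>();
        for (Map.Entry<String, String> entry : map.entrySet()) {
            fields.add(entry.getKey() + ":" + pigTypeName(entry.getValue()));
        }
        return String.join(", ", fields);
    }

    public static void main(String[] args) {
        List<String> metaLines = Arrays.asList(
                "# sample meta file", "id:int", "name:string", "score:double");
        System.out.println(toPigSchemaString(parse(metaLines)));
    }
}
```

A LinkedHashMap is used so that the field order of the meta file is preserved, which matters because Pig schemas are positional. In a real loader the same map would feed the Schema.FieldSchema loop from the thread instead of being rendered as a string.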
