Thanks Stan,
This is a great help! I'll try to implement it. :-)

Praveenesh

On Sat, Feb 4, 2012 at 8:10 AM, Stan Rosenberg <
[email protected]> wrote:

> Hi Praveenesh,
>
> Maybe this will get you started.
>
> Suppose we have the desired schema parsed and stored in 'map' of type
> LinkedHashMap<String, String>.  The key is your field name, and the
> value denotes the data type, e.g., 'string', 'int',
> etc.
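
A minimal sketch of how such a 'map' might be built from a meta file, using only the JDK. The file format here (one "fieldName:type" pair per line, with '#' comments) is an assumption made for illustration, since the thread does not define one, and the class name SchemaFileParser is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: turns the lines of a schema meta file into the
// LinkedHashMap<String, String> described above.
class SchemaFileParser {

    // Assumed format: one "fieldName:type" pair per line.
    // Blank lines and lines starting with '#' are skipped.
    static LinkedHashMap<String, String> parse(List<String> lines) {
        // LinkedHashMap preserves insertion order, so the columns keep
        // the order in which they appear in the file.
        LinkedHashMap<String, String> map = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int sep = line.indexOf(':');
            if (sep <= 0 || sep == line.length() - 1) {
                throw new IllegalArgumentException("Malformed schema line: " + raw);
            }
            String name = line.substring(0, sep).trim();
            String type = line.substring(sep + 1).trim().toLowerCase(Locale.ROOT);
            map.put(name, type);
        }
        return map;
    }
}
```

The lines could come from something like Files.readAllLines(Paths.get("schema.meta")), where the path is again only a placeholder.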
>
> Now, let's derive pig's schema from this map:
>
> Schema schema = new Schema();  // pig schema
>
> for (Entry<String, String> entry : map.entrySet()) {
>    schema.add(new Schema.FieldSchema(entry.getKey(),
>            getPigType(entry.getValue())));
> }
>
> where getPigType returns the corresponding Pig data type:
>
>        // Maps a type name from the parsed schema to Pig's DataType constant.
>        byte getPigType(String fieldType) {
>                if (fieldType.equalsIgnoreCase("string")) {
>                        return DataType.CHARARRAY;
>                } else if (fieldType.equalsIgnoreCase("int")) {
>                        return DataType.INTEGER;
>                } else if (fieldType.equalsIgnoreCase("long")) {
>                        return DataType.LONG;
>                } else if (fieldType.equalsIgnoreCase("float")) {
>                        return DataType.FLOAT;
>                } else if (fieldType.equalsIgnoreCase("double")) {
>                        return DataType.DOUBLE;
>                } else if (fieldType.equalsIgnoreCase("boolean")) {
>                        return DataType.BOOLEAN;
>                } else {
>                        // Unknown type names fall back to chararray.
>                        return DataType.CHARARRAY;
>                }
>        }
>
>
> Now, you'll want to implement 'getSchema' in your custom loader:
>
> @Override
> public ResourceSchema getSchema(String location, Job job)
>         throws IOException {
>     // I'd actually cache this result if the schema is fixed
>     return new ResourceSchema(schema);
> }
>
> This should take care of the schema, except you'd probably also need to
> serialize it to the back-end so that you can enforce the schema inside
> 'getNext'.
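
A rough, JDK-only sketch of the two pieces mentioned above: flattening the schema map to a single string that can be shipped to the back-end, and converting raw text fields to typed values the way 'getNext' would need to. The names SchemaCodec, encode, decode and convert are hypothetical. On the Pig side, the encoded string would presumably be stored in the properties obtained from UDFContext, but that wiring is an assumption and is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Pig.
class SchemaCodec {

    // Encodes the schema as "name:type,name:type".
    // Assumes field names and type names contain neither ':' nor ','.
    static String encode(Map<String, String> schema) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : schema.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    // Inverse of encode; field order is preserved.
    static LinkedHashMap<String, String> decode(String encoded) {
        LinkedHashMap<String, String> schema = new LinkedHashMap<>();
        if (encoded.isEmpty()) {
            return schema;
        }
        for (String pair : encoded.split(",")) {
            int sep = pair.indexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException("Malformed pair: " + pair);
            }
            schema.put(pair.substring(0, sep), pair.substring(sep + 1));
        }
        return schema;
    }

    // Converts one raw text field according to its declared type name.
    // Unknown type names are kept as strings, mirroring the chararray
    // fallback in getPigType. Numeric values that fail to parse become
    // null (an assumption: the loader tolerates bad fields). For
    // "boolean", anything other than "true" (case-insensitive) is false.
    static Object convert(String raw, String fieldType) {
        try {
            switch (fieldType.toLowerCase(Locale.ROOT)) {
                case "int":     return Integer.valueOf(raw.trim());
                case "long":    return Long.valueOf(raw.trim());
                case "float":   return Float.valueOf(raw.trim());
                case "double":  return Double.valueOf(raw.trim());
                case "boolean": return Boolean.valueOf(raw.trim());
                default:        return raw;
            }
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

In a real loader, the front-end would call something like encode once, and 'getNext' would decode the string on first use and then apply convert to each field before adding it to the tuple.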
>
> stan
>
> P.S. The above is essentially pseudo-code; I haven't actually type-checked
> it.
>
> On Fri, Feb 3, 2012 at 5:45 PM, praveenesh kumar <[email protected]>
> wrote:
> > Thanks Stan,
> > I was just going through these. I was wondering whether there is an easy
> > way to do it, or whether I am reading something wrong.
> > Now I will focus on what you have suggested, but I hope there is some
> > easy solution to my problem.
> >
> > Praveenesh
> >
> > On Sat, Feb 4, 2012 at 4:12 AM, Stan Rosenberg <
> > [email protected]> wrote:
> >
> >> Hi Praveenesh,
> >>
> >> Assuming you have already read these:
> >>
> >> http://ofps.oreilly.com/titles/9781449302641/load_and_store_funcs.html
> >> http://pig.apache.org/docs/r0.9.2/udf.html#load-store-functions
> >>
> >> my next step would be to peruse the source code of some existing
> >> loaders, e.g., PigStorage.
> >>
> >> Best,
> >>
> >> stan
> >>
> >>
> >> On Fri, Feb 3, 2012 at 5:35 PM, praveenesh kumar <[email protected]>
> >> wrote:
> >> > Thanks Stan,
> >> > If you were facing this kind of scenario, how would you have proceeded?
> >> > Can you give me some pointers on how to write a custom loader, or some
> >> > good tutorials on it?
> >> > What is the current practice for solving the above scenario in Pig?
> >> >
> >> > Praveenesh
> >> >
> >> >
> >> > On Sat, Feb 4, 2012 at 4:02 AM, Stan Rosenberg <
> >> > [email protected]> wrote:
> >> >
> >> >> My hunch is you'll have to write a custom loader, but I'll let the
> >> >> experts chime in.  E.g., AvroStorage loader can parse the schema
> >> >> from a json file passed to it via the constructor.  I don't think
> >> >> PigStorage has the same option.
> >> >>
> >> >> stan
> >> >>
> >> >> On Fri, Feb 3, 2012 at 7:35 AM, praveenesh kumar <[email protected]>
> >> >> wrote:
> >> >> > Hey guys,
> >> >> >
> >> >> > I am new to Pig.
> >> >> > I was wondering whether it is possible to pass a schema in the Pig
> >> >> > LOAD statement when loading the data for the first time.
> >> >> >
> >> >> > Suppose I have a huge dataset containing around 100 columns. Is there
> >> >> > a way to pass a schema defined in some other file (some kind of meta
> >> >> > file) into the Pig LOAD statement, or do I have to define it every
> >> >> > time inside the LOAD statement?
> >> >> >
> >> >> > Thanks,
> >> >> > Praveenesh
> >> >>
> >>
>
