This would work, but the goal is to *not* invoke a local interactive Pig
session to execute a LOAD ... USING PigStorage() and pass the data into the
UDF.  I was hoping to keep this entirely within Java and JUnit.

Looking over the PigStorage
doc <https://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/PigStorage.html>,
would you know how to build this process directly from a bare PigStorage
Object, such as:

PigStorage pigstorage = new PigStorage();

Any ideas?
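
To make the target concrete, here is a rough, Pig-free sketch of the kind of
String-to-nested-Object parsing I mean. It is only a stand-in under stated
assumptions: BagTextParser is a hypothetical helper name, plain java.util
Lists take the place of Pig's DataBag and Tuple, every field is kept as a
String, and the Schema is not consulted at all, so no typing (e.g. long)
happens.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig): parses Pig-style bag/tuple text
 * into nested Lists. Bags and tuples both become List<Object>; every other
 * field stays a String.
 */
final class BagTextParser {
    private final String text;
    private int pos = 0;

    private BagTextParser(String text) {
        this.text = text;
    }

    /** Parses the whole input and rejects trailing characters. */
    static Object parse(String text) {
        BagTextParser p = new BagTextParser(text.trim());
        Object value = p.parseValue();
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Unexpected trailing text at index " + p.pos);
        }
        return value;
    }

    // A value is a bag "{...}", a tuple "(...)", or a plain field.
    private Object parseValue() {
        if (pos < text.length()) {
            char c = text.charAt(pos);
            if (c == '{') return parseGroup('}');
            if (c == '(') return parseGroup(')');
        }
        return parseAtom();
    }

    // Reads comma-separated values up to the matching closing delimiter.
    private List<Object> parseGroup(char close) {
        List<Object> items = new ArrayList<>();
        pos++; // consume the opening delimiter
        if (peek() == close) {
            pos++;
            return items; // empty bag or tuple
        }
        while (true) {
            items.add(parseValue());
            char c = peek();
            if (c == close) {
                pos++;
                return items;
            }
            if (c != ',') {
                throw new IllegalArgumentException(
                        "Expected ',' or '" + close + "' at index " + pos);
            }
            pos++; // consume the comma
        }
    }

    // A plain field runs until the next ',', ')' or '}'.
    private String parseAtom() {
        int start = pos;
        while (pos < text.length() && ",)}".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos);
    }

    private char peek() {
        if (pos >= text.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return text.charAt(pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("{(a,(b,d),f)}")); // prints [[a, [b, d], f]]
    }
}
```

A real version would hand back DataBag and Tuple Objects and cast each field
according to the Schema, which is the part I am still missing.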

-Dan

On Tue, Mar 19, 2013 at 12:08 PM, Jonathan Coveney <[email protected]> wrote:

> I definitely understand the benefits, I just wanted to understand your
> workflow so I could weigh in with what I would do.
>
> In your case, if you're going to be making these by hand, then I would
> mimic what PigStorage outputs, and then just load it in using PigStorage.
>
>
> 2013/3/19 Dan DeCapria, CivicScience <[email protected]>
>
> > By hand; creating a new JUnit method to test a specific use case against a
> > functional requirement in the UDF.
> >
> > The UDFs I am testing are part of a larger ETL testing initiative I have
> > been undertaking.  To ensure that the various states of legacy data are
> > correctly extracted and transformed into a Pig context, I am creating
> > specific JUnit tests for each UDF, with specific use cases as testing
> > methods.
> >
> > The motivation for using String inputs for the Data Objects and Schema
> > Objects is to improve on the conventional approach: creating Java Objects
> > and adding and appending nested Objects to create the desired complex type
> > DataBag Object to pass to the UDF as use case input. The simpler process
> > I'm looking for should improve scalability and rapid prototyping within
> > the testing scripts.  It will also make the process more approachable for
> > another programmer to write additional unit tests.
> >
> > -Dan
> >
> > On Tue, Mar 19, 2013 at 11:43 AM, Jonathan Coveney <[email protected]> wrote:
> >
> > > How are you planning on generating these cases? By hand? Or automated?
> > >
> > >
> > > 2013/3/19 Dan DeCapria, CivicScience <[email protected]>
> > >
> > > > String string_databag in this example was typed out by me, as the input
> > > > String for a JUnit test method. I am considering generating many of these
> > > > for case-specific unit testing of my UDFs.
> > > >
> > > > -Dan
> > > >
> > > > On Tue, Mar 19, 2013 at 11:27 AM, Jonathan Coveney <[email protected]> wrote:
> > > >
> > > > > how was string_databag generated?
> > > > >
> > > > >
> > > > > 2013/3/19 Dan DeCapria, CivicScience <[email protected]>
> > > > >
> > > > > > Expanding upon this, the following use case's Schema Object can be
> > > > > > resolved from inputs:
> > > > > >
> > > > > >         String string_databag = "{(a,(b,d),f)}";
> > > > > >         String string_schema =
> > > > > >             "b1:bag{t1:tuple(a:chararray,t2:tuple(b:chararray,d:long),f:long)}";
> > > > > >         Schema schema = Utils.getSchemaFromString(string_schema);
> > > > > >
> > > > > > Next step is to resolve a DataBag Object from String string_databag and
> > > > > > the Schema Object.
> > > > > >
> > > > > > -Dan
> > > > > >
> > > > > > On Tue, Mar 19, 2013 at 9:37 AM, Dan DeCapria, CivicScience <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Thank you for your reply.
> > > > > > >
> > > > > > > The problem is I cannot find a way to go from a String
> > > > > > > representation of a complex data type to a nested Object of Pig
> > > > > > > DataTypes. I looked over the Pig 0.10.1 docs, but cannot find a way
> > > > > > > to go from String and Schema to a Pig DataType Object.
> > > > > > >
> > > > > > > For context, I am generating these Strings for my own JUnit testing
> > > > > > > of other UDFs.  Currently, for complex types, I have to generate each
> > > > > > > nesting from Tuple and DataBag factories, append data, and nest them
> > > > > > > manually.  For larger unit tests, this process becomes unwieldy
> > > > > > > (hundreds of lines per method, non-dynamic), and it would be much
> > > > > > > simpler to go directly from a String and a Schema to a DataBag Object
> > > > > > > for UDF testing (few lines of code, easily modifiable).
> > > > > > >
> > > > > > > -Dan
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Mar 18, 2013 at 6:31 PM, Jonathan Coveney <[email protected]> wrote:
> > > > > > >
> > > > > > >> Why not just use PigStorage? This is essentially what it does. It
> > > > > > >> saves a bag as text, and then loads it again.
> > > > > > >>
> > > > > > >> I suppose the question becomes: why do you need to do this?
> > > > > > >>
> > > > > > >>
> > > > > > >> 2013/3/18 Dan DeCapria, CivicScience <[email protected]>
> > > > > > >>
> > > > > > >> > In Java, I am trying to convert a DataBag from its String
> > > > > > >> > representation with its schema String to a valid DataBag Object:
> > > > > > >> >
> > > > > > >> > String databag_string = "{(apples,1024)}";
> > > > > > >> > String schema_string = "b1:bag{t1:tuple(a:chararray,b:long)}";
> > > > > > >> >
> > > > > > >> > I've tried implementing something along the lines of this, but I
> > > > > > >> > believe it's in the wrong direction, and then I get stuck:
> > > > > > >> >
> > > > > > >> >         String[] aliases = {"b1", "t1", "a", "b"};
> > > > > > >> >         byte[] types = {DataType.BAG, DataType.TUPLE,
> > > > > > >> >                 DataType.CHARARRAY, DataType.LONG};
> > > > > > >> >         List<Schema.FieldSchema> fsList =
> > > > > > >> >                 new ArrayList<Schema.FieldSchema>();
> > > > > > >> >         for (int i = 0; i < aliases.length; i++) {
> > > > > > >> >             fsList.add(new Schema.FieldSchema(aliases[i], types[i]));
> > > > > > >> >         }
> > > > > > >> >         Schema origSchema = new Schema(fsList);
> > > > > > >> >         ResourceSchema rsSchema = new ResourceSchema(origSchema);
> > > > > > >> >         Schema genSchema = Schema.getPigSchema(rsSchema);
> > > > > > >> >         ResourceSchema.ResourceFieldSchema[] rfschema =
> > > > > > >> >                 rsSchema.getFields();
> > > > > > >> >         // ... lost here, maybe
> > > > > > >> >         // Utf8StorageConverter c = new Utf8StorageConverter(); ???
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > An ideal process would be along the lines of:
> > > > > > >> >
> > > > > > >> > DataBag d = BagFactory.getInstance().newDefaultBag();
> > > > > > >> > // ??? no idea what this process could be
> > > > > > >> > d.something(databag_string, schema_string);
> > > > > > >> > d.toString().equals(databag_string) == true.
> > > > > > >> >
> > > > > > >> > Thanks, -Dan
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Dan DeCapria
> > > > > > > CivicScience, Inc.
> > > > > > > Senior Informatics / DM / ML / BI Specialist
> > > > > > >
>



-- 
Dan DeCapria
CivicScience, Inc.
Senior Informatics / DM / ML / BI Specialist
