Sorry, it looks like my suggestion won't help unless you are able to specify the schema in the original LOAD statement.

If the number of fields is only known at runtime, but every row has the same number of fields and you know the position of the join key, then I have an ugly approach. First, sample the first line to get the number of fields. Then write a UDF that takes all the fields of a row. Pass the field count to the UDF and override the method public Schema outputSchema(Schema input) so that it returns a complete schema. Your exec method should return a tuple of the same length as the input tuple, converting each item to the data type you know it to be. The resulting relation should then have a valid schema.

What I don't know is how to pass the number to the UDF efficiently. I hope someone has a better suggestion.

Thanks,
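To make the two UDF pieces above more concrete, here is a minimal sketch in plain Java. It deliberately leaves out the Pig classes so that it runs on its own; in a real UDF, the logic of fieldNames/schemaString would sit inside outputSchema and the logic of convertRow inside exec of a class extending EvalFunc<Tuple>. The class name VariableWidthSchema, the method names, and the choice of long as the common data type are all assumptions made for illustration; the column names come from the FOREACH statement quoted below.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: models what the UDF would compute.
class VariableWidthSchema {

    // Names of the leading columns, taken from the quoted FOREACH statement.
    private static final String[] FIXED_NAMES = {"ed_style_id", "sale_start_month"};

    // One field name per column: the fixed names first, then
    // sale_month_1, sale_month_2, ... for however many columns remain.
    static List<String> fieldNames(int numFields) {
        if (numFields < FIXED_NAMES.length) {
            throw new IllegalArgumentException("too few fields: " + numFields);
        }
        List<String> names = new ArrayList<>(numFields);
        for (int i = 0; i < numFields; i++) {
            if (i < FIXED_NAMES.length) {
                names.add(FIXED_NAMES[i]);
            } else {
                names.add("sale_month_" + (i - FIXED_NAMES.length + 1));
            }
        }
        return names;
    }

    // Schema in "name:type, name:type" form. pigType is assumed to be the
    // single data type shared by all columns (for example "long").
    static String schemaString(int numFields, String pigType) {
        StringBuilder sb = new StringBuilder();
        for (String name : fieldNames(numFields)) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(name).append(':').append(pigType);
        }
        return sb.toString();
    }

    // Models exec: the output has the same length as the input and every
    // item is converted to the known type (long is assumed here).
    static List<Long> convertRow(List<?> row, int numFields) {
        if (row.size() != numFields) {
            throw new IllegalArgumentException(
                "expected " + numFields + " fields, got " + row.size());
        }
        List<Long> out = new ArrayList<>(numFields);
        for (Object item : row) {
            out.add(item == null ? null : Long.valueOf(item.toString().trim()));
        }
        return out;
    }
}
```

As for handing the field count to the UDF, one option that may be worth trying is the UDF constructor: a DEFINE statement accepts string constructor arguments, so the count found by sampling could be supplied there once per script rather than once per row. I have not tried this against your data.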
On Mon, Jan 7, 2013 at 5:48 PM, Chan, Tim <[email protected]> wrote:

> Hi Jinyuan,
>
> Since I don't know how many columns I will have, I do something like this.
>
> six_month_and_variable_month_sales_2 = FOREACH six_month_and_variable_month_sales
>     GENERATE $0 AS ed_style_id,
>              $1 AS sale_start_month,
>              $2 AS sale_month_1,
>              $3 AS sale_month_2,
>              $4 AS sale_month_3,
>              $5 AS sale_month_4,
>              $6 AS sale_month_5,
>              $7 AS sale_month_6,
>              $8 ..;
>
> I still get the same error when I try to join on this relation.
>
> On Mon, Jan 7, 2013 at 2:27 PM, Jinyuan Zhou <[email protected]> wrote:
>
> > If you can load it but join operation need the complete schema, then you
> > can try do a generate statement to project your original relation to
> > produce the one you can define schema for all fields.
> >
> > On Mon, Jan 7, 2013 at 2:19 PM, Chan, Tim <[email protected]> wrote:
> >
> > > Is it possible to declare a schema when doing a LOAD for data in which
> > > you do not know the total number of columns?
> > >
> > > For instance. I know the data contains 6 or more columns. These columns
> > > are of the same data type.
> > >
> > > I basically want to join this data with another data set, but I was
> > > getting the following error:
> > >
> > > ERROR 1109: Input (six_month_and_variable_month_sales) on which outer
> > > join is desired should have a valid schema
> >
> > --
> > -- Jinyuan (Jack) Zhou

--
-- Jinyuan (Jack) Zhou
