Thanks for the message.

There is an open issue about the public type / schema system that is
related to this topic:

You probably want to comment on that ticket as well.

On Sat, Jun 21, 2014 at 7:52 AM, guxiaobo1982 <> wrote:

> Hi,
> The current implementation of JavaSchemaRDD needs a dedicated JavaBean class
> to define the schema of each table. In applications built on the Spark SQL
> API, however, tables are a more dynamic component: whenever a new table is
> defined, we must create a new JavaBean and redeploy the whole application.
> So here is an idea for a more general schema registration method:
> Step 1: Define a new Java class named RowSchema in the API to hold column
> information; column name and data type are the most important pieces.
> Step 2: The actual data is stored simply as JavaRDD<Row>.
> Step 3: When loading data into a JavaRDD<Row>, the API provides a general map
> function, which takes a RowSchema object as a parameter, to map each line to
> a Row object.
> Step 4: Add a new applySchema method, which takes a RowSchema object as a
> parameter, to the JavaSQLContext class.
> Step 5: The registerAsTable and all other SQL-related methods of the
> JavaSQLContext class should handle the difference between schemas defined
> through a JavaBean and through a RowSchema. (That's the work of the API layer.)
> The API would look something like this:
> public class RowSchema {
>     public RowSchema(List<String> colNames, List<String> colDataTypes);
>     public String getColName(int i);   // return the name of column i
>     public int getColDataType(int i);  // return the data type of column i
>     public int getColNumber();         // return the number of columns
> }
> RowSchema rs = new RowSchema(……);
> JavaRDD<Row> table = ctx.textFile("file path").map(rs);
> JavaSchemaRDD schemaPeople = sqlCtx.applySchema(table, rs);
> schemaPeople.registerAsTable("people");
> Regards,
>  Xiaobo Gu
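
To make the proposal concrete, here is a minimal plain-Java sketch of what a
RowSchema along these lines could look like. It is only an illustration and
uses no Spark classes. Everything beyond the three accessors in the quoted
message is an assumption: the parseLine helper, the comma delimiter, the type
names "string", "int" and "double", and returning the data type as a String
rather than an integer code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed RowSchema; not part of any Spark API.
final class RowSchema {
    private final List<String> colNames;
    private final List<String> colDataTypes;

    RowSchema(List<String> colNames, List<String> colDataTypes) {
        if (colNames.size() != colDataTypes.size()) {
            throw new IllegalArgumentException("names and types must have the same length");
        }
        // Defensive copies so later changes to the caller's lists do not leak in.
        this.colNames = new ArrayList<>(colNames);
        this.colDataTypes = new ArrayList<>(colDataTypes);
    }

    String getColName(int i) { return colNames.get(i); }

    // Assumption: the type is exposed as its name instead of an integer code.
    String getColDataType(int i) { return colDataTypes.get(i); }

    int getColNumber() { return colNames.size(); }

    // Assumed stand-in for the "general map function" of Step 3: split one
    // comma-delimited line and convert each field to its declared type.
    Object[] parseLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != getColNumber()) {
            throw new IllegalArgumentException(
                "expected " + getColNumber() + " fields but got " + fields.length);
        }
        Object[] values = new Object[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = convert(fields[i].trim(), colDataTypes.get(i));
        }
        return values;
    }

    private static Object convert(String raw, String type) {
        switch (type) {
            case "string": return raw;
            case "int":    return Integer.valueOf(raw);
            case "double": return Double.valueOf(raw);
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }
}

final class RowSchemaDemo {
    public static void main(String[] args) {
        // The schema is built at runtime, so a new table needs no new JavaBean.
        RowSchema rs = new RowSchema(
            Arrays.asList("name", "age"),
            Arrays.asList("string", "int"));
        Object[] row = rs.parseLine("Michael, 29");
        System.out.println(Arrays.toString(row));
    }
}
```

In a real integration the Object[] would presumably be wrapped in a Row and
the schema passed to an applySchema overload, as in the quoted example. That
part is left out here because it depends on how the public schema API ends up
looking.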
