Hi All,

I would like to apply a regression to my data. One step of the workflow is to
prepare my data as a JavaRDD<LabeledPoint>, starting from a Dataset<Row> with
its header. So, what I did was the following:

== Step 1: transform the Dataset<Row> into a JavaRDD<Row>
JavaRDD<Row> dataPointsWithHeader = modelDS.toJavaRDD();


== Step 2: take the first row (I assumed it was the header)
Row header = dataPointsWithHeader.first();

== Step 3: eliminate the header row by filtering it out
JavaRDD<Row> dataPointsWithoutHeader =
        dataPointsWithHeader.filter((Row row) -> !row.equals(header));

The issues with the above approach are:

a) the result of Step 2 is not the header row;
b) Step 3 compares every row against the header, which is very inefficient if
there is a way to access the header directly.
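
To make the two issues concrete, here is a minimal plain-Java sketch of what I believe Steps 2 and 3 do. It does not use Spark: a List<String> stands in for a Row and a List for the RDD, and the class name, method name and sample values are hypothetical. It assumes the column names are not stored as a row, so the first element is an ordinary data row.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class HeaderFilterSketch {

    // Stand-in for Steps 2 and 3: take the first row, then drop every row equal to it.
    static List<List<String>> dropRowsEqualToFirst(List<List<String>> rows) {
        List<String> first = rows.get(0);            // Step 2: first()
        return rows.stream()
                .filter(row -> !row.equals(first))   // Step 3: every row is compared
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical data rows; no header row is present (assumption).
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1.0", "2.5"),
                Arrays.asList("0.0", "3.1"),
                Arrays.asList("1.0", "2.5"));
        // Prints [[0.0, 3.1]]: the first data row and its duplicate are both gone.
        System.out.println(dropRowsEqualToFirst(rows));
    }
}
```

Under that assumption the filter not only scans every row, it also removes real data: the first data row and any other row equal to it.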

My question is:

Is there an efficient way to access the header and eliminate it?

Many thanks in advance for your help and suggestions.

Regards,
Carlo
