Re: How to use StringIndexer for multiple input/output columns in Spark Java

2018-05-16, Bryan Cutler
Yes, the workaround is to create multiple StringIndexers, as you described.
OneHotEncoderEstimator was only added in Spark 2.3.0, so on 2.2.1 you will
have to use plain OneHotEncoder.
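A minimal Java sketch of this workaround for Spark 2.2.x. It assumes a `Dataset<Row>` whose categorical columns are string-typed and builds one StringIndexer per column, each followed by a single-column OneHotEncoder, all chained in one Pipeline. The class name `MultiColumnIndexing`, its helper methods, and the `_idx`/`_vec` output-column suffixes are hypothetical choices for illustration, not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.feature.OneHotEncoder;
import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical helper class: works around the lack of multi-column
// StringIndexer support by creating one single-column stage per input column.
public class MultiColumnIndexing {

    // Returns [indexer(col1), encoder(col1), indexer(col2), encoder(col2), ...].
    // The "_idx" and "_vec" suffixes are illustrative naming assumptions.
    public static PipelineStage[] buildStages(String[] categoricalCols) {
        List<PipelineStage> stages = new ArrayList<>();
        for (String col : categoricalCols) {
            // Maps each distinct string in `col` to a numeric label index.
            stages.add(new StringIndexer()
                    .setInputCol(col)
                    .setOutputCol(col + "_idx"));
            // Single-column OneHotEncoder (a Transformer in Spark 2.2.x),
            // turning the label index into a sparse one-hot vector.
            stages.add(new OneHotEncoder()
                    .setInputCol(col + "_idx")
                    .setOutputCol(col + "_vec"));
        }
        return stages.toArray(new PipelineStage[0]);
    }

    // Fits all stages together and applies them to the same dataset.
    public static Dataset<Row> indexAndEncode(Dataset<Row> df, String[] categoricalCols) {
        Pipeline pipeline = new Pipeline().setStages(buildStages(categoricalCols));
        PipelineModel model = pipeline.fit(df);
        return model.transform(df);
    }
}
```

For example, `MultiColumnIndexing.indexAndEncode(df, new String[] {"color", "size"})` would add `color_idx`, `color_vec`, `size_idx` and `size_vec` columns (the column names here are made up). Fitting all the indexers in one Pipeline keeps the learned label mappings together in a single PipelineModel, which can be saved and reapplied to new data.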


Re: How to use StringIndexer for multiple input/output columns in Spark Java

2018-05-15, Mina Aslani
Hi,

So, what is the workaround? Should I create multiple indexers (one for each
column), then create a pipeline and set its stages to include all the
StringIndexers?
I am using 2.2.1, as I cannot move to 2.3.0. It looks like
OneHotEncoderEstimator is broken; please see the email I sent today with the
subject:
OneHotEncoderEstimator - java.lang.NoSuchMethodError: org.apache.spark.sql.Dataset.withColumns

Regards,
Mina


Re: How to use StringIndexer for multiple input/output columns in Spark Java

2018-05-14, Nick Pentreath
Multi-column support for StringIndexer didn’t make it into Spark 2.3.0.

The PR is still in progress, I think; it should be available in 2.4.0.


Re: How to use StringIndexer for multiple input/output columns in Spark Java

2018-05-14, Mina Aslani
Please take a look at the API doc:
https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/ml/feature/StringIndexer.html


How to use StringIndexer for multiple input/output columns in Spark Java

2018-05-14, Mina Aslani
Hi,

There is no setInputCols/setOutputCols for StringIndexer in the Spark Java API.
How can multiple input/output columns be specified, then?

Regards,
Mina