Re: [Spark R]: Linear Mixed-Effects Models in Spark R

2018-03-26 Thread Nisha Muktewar
Look at LinkedIn's Photon ML package: https://github.com/linkedin/photon-ml

One of the caveats is/was that the input data has to be in Avro in a
specific format.

On Mon, Mar 26, 2018 at 1:46 PM, Josh Goldsborough <
joshgoldsboroughs...@gmail.com> wrote:

> The company I work for is trying to do some mixed-effects regression
> modeling in our new big data platform including SparkR.
>
> We can run lme4 through SparkR's support for native R, but it runs
> single-threaded.  So we're looking for tricks/techniques to process large
> data sets.
>
>
> This was asked a couple years ago:
> https://stackoverflow.com/questions/39790820/mixed-effects-models-in-spark-or-other-technology
>
> But I wanted to ask again, in case anyone had an answer now.
>
> Thanks,
> Josh Goldsborough
>


Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread Nisha Muktewar
The OneHotEncoder does *not* accept multiple columns.

You can use Michal's suggestion, which uses a Pipeline to set the stages
and then executes them.

The other option is to write a function that performs one-hot encoding on a
single column and returns a DataFrame with the encoded column, and then call
it once for each of the remaining columns.
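
A minimal sketch of the Pipeline approach for spark-shell follows. It is
not Michal's exact code. It assumes the column names from the thread below
("category", "newone"), a small hypothetical df1 built only for
illustration, and the made-up output suffixes "Index" and "Vec". Each column
gets its own StringIndexer and OneHotEncoder stage, since both take a single
input column:

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

// Hypothetical stand-in for the df1 in the thread (illustration only).
// `spark` is the SparkSession that spark-shell provides.
val df1 = spark.createDataFrame(Seq(
  ("a", "x"),
  ("b", "y"),
  ("a", "y")
)).toDF("category", "newone")

val featureCols = Array("category", "newone")

// Build one indexer + one encoder per column; the "Index" and "Vec"
// suffixes are assumptions chosen for this sketch.
val stages: Array[PipelineStage] = featureCols.flatMap { c =>
  val indexer = new StringIndexer()
    .setInputCol(c)
    .setOutputCol(s"${c}Index")
  val encoder = new OneHotEncoder()
    .setInputCol(s"${c}Index")
    .setOutputCol(s"${c}Vec")
  Seq[PipelineStage](indexer, encoder)
}

// Fit all stages in order, then apply them to the same DataFrame.
val encoded = new Pipeline().setStages(stages).fit(df1).transform(df1)
encoded.show()
```

The result keeps the original columns and adds one index column and one
encoded vector column per entry in featureCols.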




On Wed, Aug 17, 2016 at 10:59 AM, janardhan shetty <janardhan...@gmail.com>
wrote:

> I had already tried it this way:
>
> scala> val featureCols = Array("category","newone")
> featureCols: Array[String] = Array(category, newone)
>
> scala> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
> <console>:29: error: type mismatch;
>  found   : Array[String]
>  required: String
> val indexer = new StringIndexer().setInputCol(featureCols).setOutputCol("categoryIndex").fit(df1)
>
>
> On Wed, Aug 17, 2016 at 10:56 AM, Nisha Muktewar <ni...@cloudera.com>
> wrote:
>
>> I don't think it does. From the documentation:
>> https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder,
>> I see that it still accepts one column at a time.
>>
>> On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty <
>> janardhan...@gmail.com> wrote:
>>
>>> 2.0:
>>>
>>> One-hot encoding currently accepts a single input column. Is there a way
>>> to include multiple columns?
>>>
>>
>>
>


Re: Spark ML : One hot Encoding for multiple columns

2016-08-17 Thread Nisha Muktewar
I don't think it does. From the documentation:
https://spark.apache.org/docs/2.0.0-preview/ml-features.html#onehotencoder,
I see that it still accepts one column at a time.

On Wed, Aug 17, 2016 at 10:18 AM, janardhan shetty 
wrote:

> 2.0:
>
> One-hot encoding currently accepts a single input column. Is there a way
> to include multiple columns?
>