Thanks Thunder. Copying the code base is difficult, since we would need to copy
it in its entirety, including the transitive dependency files as well.
Complex operations that take a column as a whole, rather than operating on each
element row by row, are not possible as of now.

Trying to find a few pointers to solve this easily.
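One pattern that might help for the column-as-a-whole case: compute the column-level value with a regular aggregation first, then attach it back to every row. A minimal sketch, not a definitive solution; the column names ("amount", "share") and the statistic are hypothetical:

```scala
// Sketch: a column-as-a-whole operation done in two passes, since a UDF only
// sees one row at a time. Here each value is divided by the column total.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, sum, udf}

object ColumnWideSketch {
  // Pure per-row logic; the column-level total arrives as an ordinary argument.
  def share(value: Double, total: Double): Double =
    if (total == 0.0) 0.0 else value / total

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("column-wide").getOrCreate()
    import spark.implicits._

    val df = Seq(1.0, 3.0, 4.0).toDF("amount")

    // Pass 1: aggregate over the whole column (outside any UDF).
    val total = df.agg(sum(col("amount"))).first().getDouble(0)

    // Pass 2: apply the per-row function, feeding the total in as a literal.
    val shareUdf = udf(share _)
    df.withColumn("share", shareUdf(col("amount"), lit(total))).show()

    spark.stop()
  }
}
```

The same two-pass shape should cover other whole-column statistics (count, max, distinct count): aggregate first, then feed the scalar back in via lit or a join.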

On Mon, Sep 12, 2016 at 9:43 AM, Thunder Stumpges <
thunder.stump...@gmail.com> wrote:

> Hi Janardhan,
>
> I have run into similar issues and asked similar questions. I also ran
> into many problems with private code when trying to write my own
> Model/Transformer/Estimator. (you might be able to find my question to the
> group regarding this, I can't really tell if my emails are getting through,
> as I don't get any responses). For now I have resorted to copying out the
> code that I need from the spark codebase and into mine. I'm certain this is
> not the best, but it has to be better than "implementing it myself" which
> was what the only response to my question said to do.
>
> As for the transforms, I also asked a similar question. The only way I've
> seen it done in code is using a UDF. As you mention, the UDF can only
> access fields on a "row by row" basis. I have not gotten any replies at all
> on my question, but I also need to do some more complicated operation in my
> work (join to another model RDD, flat-map, calculate, reduce) in order to
> get the value for the new column. So far no great solution.
>
> Sorry I don't have any answers, but wanted to chime in that I am also a
> bit stuck on similar issues. Hope we can find a workable solution soon.
> Cheers,
> Thunder
>
>
>
> On Tue, Sep 6, 2016 at 1:32 PM janardhan shetty <janardhan...@gmail.com>
> wrote:
>
>> Noticed a few things about Spark transformers; just wanted to be clear.
>>
>> Unary transformer:
>>
>> createTransformFunc: IN => OUT  = { *item* => }
>> Here *item* is a single element and *NOT* the entire column.

>> I would like to get the number of elements in that particular column.
>> Since there is *no forward checking*, how can we get this information?
>> We have visibility into a single element, not the entire column.
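A workaround sketch for exactly this: since createTransformFunc only ever sees one element, compute the column-level count before calling transform() and hand it to the transformer as a Param. UnaryTransformer itself is public (a DeveloperApi), so this avoids the private traits; the class name, param name, and output format below are hypothetical:

```scala
// Sketch: a UnaryTransformer cannot see the whole column, so a column-level
// statistic (here the row count) is computed up front and passed in via a Param.
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.LongParam
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

class CountAwareTransformer(override val uid: String)
    extends UnaryTransformer[String, String, CountAwareTransformer] {

  def this() = this(Identifiable.randomUID("countAware"))

  // Column-level count, supplied by the caller before transform() is invoked.
  val colCount = new LongParam(this, "colCount", "number of elements in the input column")
  def setColCount(n: Long): this.type = set(colCount, n)

  override protected def createTransformFunc: String => String = { item =>
    // Each element now has access to the precomputed column-level count.
    s"$item (1 of ${$(colCount)})"
  }

  override protected def outputDataType: DataType = StringType
}
```

Caller side, roughly: val n = df.count(); new CountAwareTransformer().setInputCol("text").setOutputCol("out").setColCount(n).transform(df). The drawback is that the count is fixed at call time, so it must be recomputed per DataFrame.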
>>
>> On Sun, Sep 4, 2016 at 9:30 AM, janardhan shetty <janardhan...@gmail.com>
>> wrote:
>>
>>> In scala Spark ML Dataframes.
>>>
>>> On Sun, Sep 4, 2016 at 9:16 AM, Somasundaram Sekar <somasundar.sekar@
>>> tigeranalytics.com> wrote:
>>>
>>>> Can you try this
>>>>
>>>> https://www.linkedin.com/pulse/hive-functions-udfudaf-udtf-examples-gaurav-singh
>>>>
>>>> On 4 Sep 2016 9:38 pm, "janardhan shetty" <janardhan...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is there any chance that we can send multiple entire columns to a UDF
>>>>> and generate a new column for Spark ML?
>>>>> I see a similar approach in VectorAssembler, but I am not able to use a
>>>>> few classes/traits like HasInputCols, HasOutputCol, and
>>>>> DefaultParamsWritable, since they are private.
>>>>>
>>>>> Any leads/examples are appreciated in this regard.
>>>>>
>>>>> Requirement:
>>>>> *Input*: Multiple columns of a Dataframe
>>>>> *Output*:  Single new modified column
>>>>>
>>>>
>>>
>>
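To close the loop on the original requirement (multiple input columns, one derived output column), the plain-UDF route looks roughly like this; the column names and the combine logic are placeholders:

```scala
// Sketch: feeding several columns to a single UDF to derive one new column.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object MultiColumnUdf {
  // Per-row logic kept as a plain function so it is easy to unit-test.
  def combine(first: String, second: String): String = s"$first $second"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("multi-col-udf").getOrCreate()
    val df = spark.createDataFrame(Seq(("John", "Doe"), ("Jane", "Roe")))
      .toDF("firstName", "lastName")

    val combineUdf = udf(combine _)
    df.withColumn("fullName", combineUdf(col("firstName"), col("lastName"))).show()

    spark.stop()
  }
}
```

This still processes row by row, of course; it only answers the multiple-inputs-to-one-output part, not the whole-column part discussed above.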
