Thanks. Created: https://issues.apache.org/jira/browse/SPARK-26616
On Mon, Jan 14, 2019 at 9:19 PM Sean Owen wrote:
Yes that seems OK to me.
On Mon, Jan 14, 2019 at 9:40 AM Jatin Puri wrote:
Thanks for the response. Should I go ahead and create a JIRA ticket?
I can then send a pull request with the changes.
On Mon, Jan 14, 2019 at 8:18 PM Sean Owen wrote:
I think that's reasonable. The caller probably has the number of docs
already but sure, it's one long and is already computed. This would
have to be added to PySpark too.
On Mon, Jan 14, 2019 at 7:56 AM Jatin Puri wrote:
Hello.
As part of `org.apache.spark.ml.feature.IDFModel`, I think it is a good
idea to also expose:
1. Document frequency vector
2. Number of documents
We currently get both for free; they just need to be exposed as
public `val`s.
This avoids re-implementation for someone who needs to comp
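
For illustration only, here is a minimal sketch in plain Python (not Spark code) of why the two values come for free: fitting IDF already requires counting, per term, how many documents contain it, and counting the documents themselves. The function name `fit_idf` and the dense-list input are assumptions made for this example; the smoothed formula `log((m + 1) / (df + 1))` is the one described in the Spark MLlib documentation.

```python
import math
from typing import List, Sequence, Tuple


def fit_idf(docs: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], int]:
    """Return (idf, doc_freq, num_docs) for dense term-frequency rows.

    Hypothetical helper for illustration; it is not part of any Spark API.
    """
    num_docs = len(docs)
    num_terms = len(docs[0]) if docs else 0

    # Document frequency: number of documents in which each term occurs.
    doc_freq = [0] * num_terms
    for row in docs:
        for j, tf in enumerate(row):
            if tf > 0:
                doc_freq[j] += 1

    # Smoothed IDF, assuming the formula log((m + 1) / (df + 1)).
    idf = [math.log((num_docs + 1) / (df + 1)) for df in doc_freq]
    return idf, doc_freq, num_docs


# Three documents over a three-term vocabulary (made-up counts).
docs = [
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]
idf, doc_freq, num_docs = fit_idf(docs)
```

Here `doc_freq` and `num_docs` are intermediate results of computing `idf`, which is the sense in which exposing them costs nothing extra.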