You can create TF vectors and then call
RowMatrix.computeColumnSummaryStatistics to get the document frequencies
(numNonzeros in the returned summary). For tokenization and stemming, you
can use scalanlp/chalk. Yes, it is worth having a simple interface for
it. -Xiangrui
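
To make the idea concrete, here is a rough sketch in plain Python (not the
Spark API) of the same steps: build sparse TF vectors, count the nonzero
entries per column to get df, then weight by IDF. The helper names, the
whitespace tokenizer, and the smoothed IDF formula log((N + 1) / (df + 1))
are all assumptions made for illustration; a real pipeline would use a
proper tokenizer and stemmer.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace, no stemming.
    return text.lower().split()


def build_vocab(docs_tokens):
    # Assign each distinct term a column index in order of first appearance.
    vocab = {}
    for tokens in docs_tokens:
        for term in tokens:
            vocab.setdefault(term, len(vocab))
    return vocab


def tf_vector(tokens, vocab):
    # Sparse TF vector stored as {column index: raw term count}.
    counts = Counter(tokens)
    return {vocab[term]: float(n) for term, n in counts.items()}


def document_frequency(tf_vectors, num_cols):
    # df[j] = number of documents with a nonzero entry in column j,
    # i.e. the per-column count of nonzeros over the TF matrix.
    df = [0] * num_cols
    for vec in tf_vectors:
        for j, value in vec.items():
            if value != 0.0:
                df[j] += 1
    return df


def tfidf(tf_vectors, df, num_docs):
    # Assumed smoothed IDF variant: log((N + 1) / (df + 1)).
    idf = [math.log((num_docs + 1.0) / (d + 1.0)) for d in df]
    return [{j: v * idf[j] for j, v in vec.items()} for vec in tf_vectors]


docs = [
    "spark makes text mining fast",
    "naive bayes on text",
    "spark spark mllib",
]
tokens = [tokenize(d) for d in docs]
vocab = build_vocab(tokens)
tf = [tf_vector(t, vocab) for t in tokens]
df = document_frequency(tf, len(vocab))
weighted = tfidf(tf, df, len(docs))
```

The resulting sparse vectors (index to weight) are the kind of input you
would then convert to MLlib sparse vectors for Naive Bayes.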

On Fri, Jun 13, 2014 at 1:21 AM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi all,
>
> I want to perform text classification using Spark 1.0 Naïve Bayes. I am
> looking for a way to convert text into sparse vectors with a TF-IDF
> weighting scheme.
>
> I found that the MLI library supports this, but it is compatible with
> Spark 0.8.
>
> What options are available for text vectorization? Are there any
> pre-built APIs that can be used, or some other way to achieve this?
>
> Please suggest.
>
> Thanks,
> Stuti Awasthi
