Thanks a lot DB.

I will test it and let you know the results.

BR,
Aslan
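
For local testing, here is a minimal plain-Scala sketch of the z-score step discussed in the quoted messages. The thread calls zscore(...) without showing its body, so the helper below is a guess at it, and columnStats and standardizeLocal are hypothetical names introduced only for illustration. It assumes the sample standard deviation (n - 1 denominator) and copies the NaN convention for constant columns from DB's code. It runs on in-memory rows, not on an RDD.

```scala
// Hypothetical helper: an assumed body for the zscore(...) call in the thread.
def zscore(value: Double, mean: Double, stddev: Double): Double =
  if (stddev == 0.0) Double.NaN else (value - mean) / stddev

// Per-column mean and sample standard deviation (n - 1 denominator, an assumption).
def columnStats(rows: Seq[Array[Double]]): (Array[Double], Array[Double]) = {
  require(rows.nonEmpty, "need at least one row")
  val n = rows.size
  val dims = rows.head.length
  val mean = Array.tabulate(dims)(j => rows.map(_(j)).sum / n)
  val stddev = Array.tabulate(dims) { j =>
    if (n < 2) 0.0
    else math.sqrt(rows.map(r => math.pow(r(j) - mean(j), 2)).sum / (n - 1))
  }
  (mean, stddev)
}

// Standardize every column of a small in-memory data set.
def standardizeLocal(rows: Seq[Array[Double]]): Seq[Array[Double]] = {
  val (mean, stddev) = columnStats(rows)
  rows.map(r => Array.tabulate(r.length)(j => zscore(r(j), mean(j), stddev(j))))
}
```

Comparing the output of standardizeLocal on a small sample against the RDD version is one way to sanity-check the results.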


On Fri, Jun 13, 2014 at 12:34 AM, DB Tsai <dbt...@stanford.edu> wrote:

> Hi Aslan,
>
> I'm not sure whether the mlbase code is maintained against the current
> Spark master. The following is the code we use for standardization at my
> company. I intend to clean it up and submit a PR. You could use it
> for now.
>
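>   import breeze.linalg.{DenseVector => BDV}
>   import org.apache.spark.mllib.linalg.{Vector, Vectors}
>   import org.apache.spark.mllib.linalg.distributed.RowMatrix
>   import org.apache.spark.rdd.RDD
>
>   // Standardizes each column to zero mean and unit variance.
>   // Note: toBreeze and Vectors.fromBreeze may be package-private
>   // (private[mllib]) depending on the Spark version.
>   def standardize(data: RDD[Vector]): RDD[Vector] = {
>     val summarizer = new RowMatrix(data).computeColumnSummaryStatistics()
>     val mean = summarizer.mean
>     val variance = summarizer.variance
>
>     // Standardization always densifies the output, so the result
>     // is stored in a dense vector.
>     data.map { x =>
>       val n = x.toBreeze.length
>       val output = BDV.zeros[Double](n)
>       var i = 0
>       while (i < n) {
>         if (variance(i) == 0) {
>           // A constant column has no defined z-score.
>           output(i) = Double.NaN
>         } else {
>           output(i) = (x(i) - mean(i)) / math.sqrt(variance(i))
>         }
>         i += 1
>       }
>       Vectors.fromBreeze(output)
>     }
>   }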
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Thu, Jun 12, 2014 at 1:49 AM, Aslan Bekirov <aslanbeki...@gmail.com>
> wrote:
> > Hi DB,
> >
> > I found a piece of code that uses znorm to normalize data.
> >
> >
> > /**
> >  * build training data set from sample and summary data
> >  */
> >  val train_data = sample_data.map( v =>
> >    Array.tabulate[Double](field_cnt)(
> >      i => zscore(v._2(i),sample_mean(i),sample_stddev(i))
> >    )
> >  ).cache
> >
> > Please let me know if you see anything wrong with it.
> >
> > BR,
> > Aslan
> >
> >
> >
> > On Thu, Jun 12, 2014 at 11:13 AM, Aslan Bekirov <aslanbeki...@gmail.com>
> > wrote:
> >>
> >> Thanks a lot DB.
> >>
> >> I will try to do z-norm normalization using a map transformation.
> >>
> >>
> >> BR,
> >> Aslan
> >>
> >>
> >> On Thu, Jun 12, 2014 at 12:16 AM, DB Tsai <dbt...@stanford.edu> wrote:
> >>>
> >>> Hi Aslan,
> >>>
> >>> Currently, we don't have a utility function to do so. However, you
> >>> can easily implement this with another map transformation. I'm working
> >>> on this feature now, and there will be a couple of different
> >>> normalization options users can choose from.
> >>>
> >>> Sincerely,
> >>>
> >>> DB Tsai
> >>> -------------------------------------------------------
> >>> My Blog: https://www.dbtsai.com
> >>> LinkedIn: https://www.linkedin.com/in/dbtsai
> >>>
> >>>
> >>> On Wed, Jun 11, 2014 at 6:25 AM, Aslan Bekirov <aslanbeki...@gmail.com>
> >>> wrote:
> >>> > Hi All,
> >>> >
> >>> > I have to normalize a set of values in the range 0-500 to the [0, 1]
> >>> > range.
> >>> >
> >>> > Is there any util method in MLBase to normalize a large set of data?
> >>> >
> >>> > BR,
> >>> > Aslan
> >>
> >>
> >
>
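
The original question (scaling values from the 0-500 range to [0, 1]) is min-max scaling, which is different from the z-score standardization shown in the replies. A minimal plain-Scala sketch, assuming the bounds are either known up front or taken from the data; minMaxScale and minMaxScaleObserved are hypothetical names, not MLlib or MLBase functions:

```scala
// Min-max scaling with known bounds, e.g. lo = 0.0 and hi = 500.0.
def minMaxScale(values: Seq[Double], lo: Double, hi: Double): Seq[Double] = {
  require(hi > lo, "hi must be greater than lo")
  values.map(v => (v - lo) / (hi - lo))
}

// Variant that derives the bounds from the observed data.
// Mapping a constant input to 0.0 is an arbitrary choice made for this sketch.
def minMaxScaleObserved(values: Seq[Double]): Seq[Double] = {
  require(values.nonEmpty, "need at least one value")
  val lo = values.min
  val hi = values.max
  if (hi == lo) values.map(_ => 0.0)
  else values.map(v => (v - lo) / (hi - lo))
}
```

For example, minMaxScale(Seq(0.0, 250.0, 500.0), 0.0, 500.0) yields 0.0, 0.5 and 1.0. On an RDD[Double], the same expression should work inside a map transformation once lo and hi are known.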
