Thanks a lot DB. I will test it and let you know the results.

BR,
Aslan

On Fri, Jun 13, 2014 at 12:34 AM, DB Tsai <dbt...@stanford.edu> wrote:
> Hi Aslan,
>
> I'm not sure if the mlbase code is maintained for the current Spark
> master. The following is the code we use for standardization at my
> company. I intend to clean it up and submit a PR. You could use it
> for now.
>
> def standardize(data: RDD[Vector]): RDD[Vector] = {
>   val summarizer = new RowMatrix(data).computeColumnSummaryStatistics
>   val mean = summarizer.mean
>   val variance = summarizer.variance
>
>   // The standardization will always densify the output, so the output
>   // will be stored in dense vector.
>   data.map(x => {
>     val n = x.toBreeze.length
>     val output = BDV.zeros[Double](n)
>     var i = 0
>     while(i < n) {
>       if(variance(i) == 0) {
>         output(i) = Double.NaN
>       } else {
>         output(i) = (x(i) - mean(i)) / Math.sqrt(variance(i))
>       }
>       i += 1
>     }
>     Vectors.fromBreeze(output)
>   })
> }
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Thu, Jun 12, 2014 at 1:49 AM, Aslan Bekirov <aslanbeki...@gmail.com>
> wrote:
> > Hi DB,
> >
> > I found a piece of code that uses znorm to normalize data.
> >
> > /**
> >  * build training data set from sample and summary data
> >  */
> > val train_data = sample_data.map( v =>
> >   Array.tabulate[Double](field_cnt)(
> >     i => zscore(v._2(i),sample_mean(i),sample_stddev(i))
> >   )
> > ).cache
> >
> > Please comment if you find something wrong.
> >
> > BR,
> > Aslan
> >
> >
> > On Thu, Jun 12, 2014 at 11:13 AM, Aslan Bekirov <aslanbeki...@gmail.com>
> > wrote:
> >>
> >> Thanks a lot DB.
> >>
> >> I will try to do Znorm normalization using a map transformation.
> >>
> >> BR,
> >> Aslan
> >>
> >>
> >> On Thu, Jun 12, 2014 at 12:16 AM, DB Tsai <dbt...@stanford.edu> wrote:
> >>>
> >>> Hi Aslan,
> >>>
> >>> Currently, we don't have a utility function to do so. However, you
> >>> can easily implement this with another map transformation. I'm working
> >>> on this feature now, and there will be a couple of different
> >>> normalization options users can choose from.
> >>>
> >>> Sincerely,
> >>>
> >>> DB Tsai
> >>> -------------------------------------------------------
> >>> My Blog: https://www.dbtsai.com
> >>> LinkedIn: https://www.linkedin.com/in/dbtsai
> >>>
> >>>
> >>> On Wed, Jun 11, 2014 at 6:25 AM, Aslan Bekirov <aslanbeki...@gmail.com>
> >>> wrote:
> >>> > Hi All,
> >>> >
> >>> > I have to normalize a set of values in the range 0-500 to the [0, 1]
> >>> > range.
> >>> >
> >>> > Is there any util method in MLBase to normalize a large set of data?
> >>> >
> >>> > BR,
> >>> > Aslan
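
For anyone reading this thread later: the original question (mapping 0-500 onto [0, 1]) is min-max scaling, while the snippets above do z-score standardization, and the two give different results. Below is a rough sketch of both in plain Scala collections, with no Spark dependency, just to show the arithmetic. The object and function names (NormalizeSketch, minMax, zScore) are made up for illustration and are not part of MLlib or MLBase. The z-score here assumes the sample variance (n - 1 denominator), and it returns NaN for zero-variance input to mirror the convention in DB's code.

```scala
// Illustrative sketch only: hypothetical helpers, not an MLlib API.
object NormalizeSketch {

  // Min-max scaling: maps the known range [lo, hi] linearly onto [0, 1].
  def minMax(xs: Seq[Double], lo: Double, hi: Double): Seq[Double] = {
    require(hi > lo, "hi must be greater than lo")
    xs.map(x => (x - lo) / (hi - lo))
  }

  // Z-score: subtract the mean, divide by the sample standard deviation.
  // Assumption: sample variance with an (n - 1) denominator.
  def zScore(xs: Seq[Double]): Seq[Double] = {
    require(xs.size > 1, "need at least two values")
    val mean = xs.sum / xs.size
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / (xs.size - 1)
    // Zero variance gives NaN, as in the standardize() snippet in this thread.
    if (variance == 0.0) xs.map(_ => Double.NaN)
    else xs.map(x => (x - mean) / math.sqrt(variance))
  }
}
```

With these definitions, NormalizeSketch.minMax(Seq(0.0, 250.0, 500.0), 0.0, 500.0) gives 0.0, 0.5, 1.0. On an RDD[Double] the same min-max expression should drop straight into a single map call, since the range is known up front and no summary pass is needed.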