Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-02 Thread DB Tsai
. Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-03 Thread DB Tsai
this? Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Sun, Mar 2, 2014 at 10:23 AM, Debasish Das debasish.da...@gmail.com wrote: Hi DB, 1. Could you point to the BFGS repositories used to publish artifacts

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-03 Thread DB Tsai
. Is this getting merged to master, or will there be revisions to it? https://github.com/apache/spark/pull/53 Thanks. Deb Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-03 Thread DB Tsai
to fix the L-BFGS in breeze, and we can get OWL-QN and L-BFGS-B. What do you think? Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Mon, Mar 3, 2014 at 3:52 PM, DB Tsai dbt...@alpinenow.com wrote: Hi Deb

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-04 Thread DB Tsai
, let's have a design discussion around this. It may be more effective since we can design an architecture that has to work for both cases in the codebase, and it will be easier to think about the edge cases. Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-05 Thread DB Tsai
for you to investigate the issue? Or do I need to make it as a standalone test? Will send you the test later today. Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-06 Thread DB Tsai
Iteration 29: loss 0.30788249908237314, diff 0.23885980452569502 Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Wed, Mar 5, 2014 at 2:00 PM, David Hall d...@cs.berkeley.edu wrote: On Wed, Mar 5, 2014 at 1:57 PM

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-07 Thread DB Tsai
Hi Xiangrui, I think it doesn't matter whether we use Fortran/Breeze/RISO for optimizers since optimization only takes 1% of time. Most of the time is in gradientSum and lossSum parallel computation. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-07 Thread DB Tsai
Hi guys, The latest PR uses Breeze's L-BFGS implementation, which was introduced by Xiangrui's sparse input format work in SPARK-1212. https://github.com/apache/spark/pull/353 Now, it works with the new sparse framework! Any feedback would be greatly appreciated. Thanks. Sincerely, DB Tsai

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread DB Tsai
stepSize: Double, var numIterations: Int, var regParam: Double, var miniBatchFraction: Double Xiangrui, what do you think? For now, you can use my L-BFGS solver by copying and pasting the LogisticRegressionWithSGD code, and changing the optimizer to L-BFGS. Sincerely, DB Tsai

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread DB Tsai
computation per RDD is done on each of the workers... This miniBatchFraction is also a heuristic which I don't think makes sense for LogisticRegressionWithBFGS...does it ? On Tue, Apr 8, 2014 at 3:44 PM, DB Tsai dbt...@stanford.edu wrote: Hi Debasish, The L-BFGS solver will be in the master like

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread DB Tsai
I haven't experimented with it. That's the use-case in theory I could think of. ^^ However, from what I saw, BFGS converges really fast so that I only need 20~30 iterations in general. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn

Re: Jekyll documentation generation error

2014-04-23 Thread DB Tsai
/dbtsai/.rbenv/versions/2.1.1/lib/ruby/gems/2.1.0/gems/commander-4.1.6/lib/commander/import.rb:10:in `block in top (required)' Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Apr 22

Jekyll documentation generation error

2014-04-23 Thread DB Tsai
But what does SKIP_SCALADOC=1 mean? export SKIP_SCALADOC=1? Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Jekyll documentation generation error

2014-04-23 Thread DB Tsai
Matei, thanks. It works with kramdown. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Apr 22, 2014 at 11:38 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Try doing “gem install

MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
result. Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
a 1% (or so) sparse dataset to experiment with? On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote: Hi all, I'm benchmarking Logistic Regression in MLlib using the newly added optimizer LBFGS and GD. I'm using the same dataset and the same methodology in this paper, http

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
The figure showing the Log-Likelihood vs Time can be found here. https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf Let me know if you can not open it. Sincerely, DB Tsai --- My Blog

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
ps, it doesn't make sense to have weight and gradient sparse unless with strong L1 penalty. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Wed, Apr 23, 2014 at 10:17 PM, DB Tsai dbt

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
Not yet since it's running in the cluster. Will run locally with profiler. Thanks for help. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Wed, Apr 23, 2014 at 10:22 PM, David Hall d

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread DB Tsai
The figure showing the Log-Likelihood vs Time can be found here. https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf Let me know if you can not open it. Thanks. Sincerely, DB Tsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread DB Tsai
. The difference you saw is actually from dense feature or sparse feature vector. For LBFGS and GD dense feature, you can see the first iteration takes the same time. It's true for GD. I'm going to run rcv1.binary which only has 0.15% non-zero elements to verify the hypothesis. Sincerely, DB Tsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread DB Tsai
, Vectors.fromBreeze(gradientSum / miniBatchSize), stepSize, i, regParam) weights = update._1 regVal = update._2 timeStamp.append(System.nanoTime() - startTime) } Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-24 Thread DB Tsai
rcv1.binary is too sparse (0.15% non-zero elements), so dense format will not run due to out of memory. But sparse format runs really well. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-28 Thread DB Tsai
Also, how many rejection failures will terminate the optimization process? How is it related to numberOfImprovementFailures? Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-28 Thread DB Tsai
, miniBatchFraction, lbfgs, miniBatchSize) val states = lbfgs.iterations(new CachedDiffFunction(costFun), initialWeights.toBreeze.toDenseVector) Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com

Code Review for SPARK-1516: Throw exception in yarn client instead of System.exit

2014-04-29 Thread DB Tsai
straightforward, we wonder if this can be reviewed and have this in 1.0. Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-29 Thread DB Tsai
in Spark before we have a deeper understanding of how stochastic LBFGS works. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Apr 29, 2014 at 9:50 PM, David Hall d...@cs.berkeley.edu wrote

Re: reduce, transform, combine

2014-05-04 Thread DB Tsai
You could easily achieve this with mapPartitions. However, it seems that it cannot be done with an aggregate type of operation. I can see that it's a generally useful operation. For now, you could use mapPartitions. Sincerely, DB Tsai --- My Blog
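A minimal sketch of the reduce, transform, combine pattern suggested above, assuming the question was about reducing within each partition, transforming that partition-level result once, and then combining across partitions. Plain Scala collections stand in for an RDD here: each inner `Seq` plays the role of one partition, and the sum and square-root steps are placeholders chosen only for illustration.

```scala
object ReduceTransformCombine {
  // Each inner Seq stands in for one RDD partition (an assumption for illustration).
  def run(partitions: Seq[Seq[Int]]): Double = {
    val perPartition = partitions.map { part =>
      val reduced = part.sum        // step 1: reduce within the partition
      math.sqrt(reduced.toDouble)   // step 2: transform the partition-level result once
    }
    perPartition.sum                // step 3: combine the transformed results
  }

  def main(args: Array[String]): Unit = {
    println(run(Seq(Seq(1, 3), Seq(4, 5))))
  }
}
```

On a real RDD the same shape would presumably be `rdd.mapPartitions(it => Iterator(transform(it.reduce(op)))).reduce(combine)`, where `transform`, `op`, and `combine` are hypothetical user functions. Note that `it.reduce` throws on an empty partition, so empty partitions would need a guard.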

Re: mllib vector templates

2014-05-05 Thread DB Tsai
+1 It would be nice if we could use different types in Vector. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, May 5, 2014 at 2:41 PM, Debasish Das debasish.da...@gmail.com wrote: Hi

Re: mllib vector templates

2014-05-05 Thread DB Tsai
Breeze could take any type (Int, Long, Double, and Float) in the matrix template. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, May 5, 2014 at 2:56 PM, Debasish Das debasish.da

Re: Multinomial Logistic Regression

2014-05-13 Thread DB Tsai
_ = math.log(numerators(math.round(y - 1).toInt) / denominator) } (loglike, predicted) } Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, May 13, 2014 at 4:08 AM, Debasish Das

Calling external classes added by sc.addJar needs to be through reflection

2014-05-16 Thread DB Tsai
will not be seen. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
The reflection actually works. But you need to get the loader by `val loader = Thread.currentThread.getContextClassLoader` which is set by Spark executor. Our team verified this, and uses it as workaround. Sincerely, DB Tsai --- My Blog
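A minimal sketch of the workaround described above, assuming the class you need was shipped with sc.addJar. `java.util.ArrayList` is only a stand-in class name and `newInstanceOf` is a hypothetical helper, not part of Spark; the point is that the class is resolved through the thread's context class loader rather than the loader that loaded the calling code.

```scala
object ContextLoaderSketch {
  // Hypothetical helper: instantiate a class by name via the context class loader.
  def newInstanceOf(className: String): Any = {
    // Inside a Spark executor this loader is the one set up by Spark,
    // so it can see jars added dynamically.
    val loader = Thread.currentThread.getContextClassLoader
    val clazz = Class.forName(className, true, loader)
    // Requires a public no-argument constructor on the target class.
    clazz.getDeclaredConstructor().newInstance()
  }

  def main(args: Array[String]): Unit = {
    // Stand-in class name; replace with a class from the added jar.
    val obj = newInstanceOf("java.util.ArrayList")
    println(obj.getClass.getName)
  }
}
```

Methods on the returned object would still have to be called reflectively, or through an interface that is already visible to the calling code.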

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
The jars are included in my driver, and I can successfully use them in the driver. I'm working on a patch, and it's almost working. Will submit a PR soon. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https

Fwd: Calling external classes added by sc.addJar needs to be through reflection

2014-05-18 Thread DB Tsai
the customClassloader to create a wrapped class, and in this wrapped class, the classloader is inherited from the customClassloader so that users don't need to do reflection in the wrapped class. I'm working on this now. Sincerely, DB Tsai --- My Blog: https

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread DB Tsai
: Method = classOf[URLClassLoader].getDeclaredMethod("addURL", classOf[URL]) method.setAccessible(true) method.invoke(loader, url) } catch { case t: Throwable => { throw new IOException("Error, could not add URL to system classloader") } } } Sincerely, DB

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
the protected method `addURL` which will not work and throw exception if the code is wrapped in security manager. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Wed, May 21, 2014 at 1:13 PM, Sandy

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
the primary jar is not in the system loader but custom one, so when we reference those jars added dynamically, we can find it without reflection. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-21 Thread DB Tsai
environment. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Wed, May 21, 2014 at 2:57 PM, Koert Kuipers ko...@tresata.com wrote: db tsai, i do not think userClassPathFirst is working

Re: Standard preprocessing/scaling

2014-05-28 Thread DB Tsai
Sometimes for this case, I will just standardize without centering. I still get good results. Sent from my Google Nexus 5 On May 28, 2014 7:03 PM, Xiangrui Meng men...@gmail.com wrote: RowMatrix has a method to compute column summary statistics. There is a trade-off here because centering
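A minimal sketch of scaling without centering, as mentioned above: each column is divided by its sample standard deviation but the mean is not subtracted, so zero entries stay zero and sparse data stays sparse. The object and method names are hypothetical, the data is held in plain arrays rather than MLlib vectors, and mapping constant columns to 0.0 is an assumption made here to avoid dividing by zero.

```scala
object ScaleWithoutCentering {
  // Sample standard deviation of each column (n - 1 denominator); needs at least two rows.
  def columnStd(rows: Seq[Array[Double]]): Array[Double] = {
    val n = rows.length
    val dim = rows.head.length
    val mean = Array.tabulate(dim)(j => rows.map(_(j)).sum / n)
    Array.tabulate(dim) { j =>
      math.sqrt(rows.map(r => math.pow(r(j) - mean(j), 2)).sum / (n - 1))
    }
  }

  // Divide every value by its column's standard deviation; the mean is not subtracted.
  def scale(rows: Seq[Array[Double]]): Seq[Array[Double]] = {
    val std = columnStd(rows)
    rows.map(_.zipWithIndex.map { case (v, j) =>
      if (std(j) == 0.0) 0.0 else v / std(j) // constant column: assumption, map to 0.0
    })
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(Array(0.0, 2.0), Array(0.0, 4.0), Array(3.0, 6.0))
    scale(data).foreach(r => println(r.mkString(", ")))
  }
}
```

The mean is still used to compute the standard deviation; it is only the subtraction from each value that is skipped.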

Re: Int tolerance in LBFGS.setConvergenceTol causes problems

2014-06-17 Thread DB Tsai
Hi Gang, This is a bug, and I'm the one who did it :) Just add the comment to your PR. Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Jun 17, 2014 at 7:13 PM, Gang Bai

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-05 Thread DB Tsai
+1 On Jul 5, 2014 1:39 PM, Michael Armbrust mich...@databricks.com wrote: +1 I tested sql/hive functionality. On Sat, Jul 5, 2014 at 9:30 AM, Mark Hamstra m...@clearstorydata.com wrote: +1 On Fri, Jul 4, 2014 at 12:40 PM, Patrick Wendell pwend...@gmail.com wrote: I'll start

SBT gen-idea doesn't work well after merging SPARK-1776

2014-07-14 Thread DB Tsai
) at java.lang.reflect.Method.invoke(Method.java:597) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134) Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: [VOTE] Release Apache Spark 0.9.2 (RC1)

2014-07-17 Thread DB Tsai
+1 Tested with my Ubuntu Linux. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Thu, Jul 17, 2014 at 6:36 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Tested on Mac, verified

Re: OWLQN

2014-07-18 Thread DB Tsai
I'm working on it with weighted regularization. The problem is that OWLQN doesn't work nicely with Updater now since all the L1 logic should be in OWLQN instead of L1Updater. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-06 Thread DB Tsai
One related question: is the mllib jar independent of the hadoop version (i.e. it doesn't use the hadoop api directly)? Can I use an mllib jar compiled for one version of hadoop with another version of hadoop? Sent from my Google Nexus 5 On Aug 6, 2014 8:29 AM, Debasish Das debasish.da...@gmail.com wrote: Hi

Re: Buidling spark in Eclipse Kepler

2014-08-06 Thread DB Tsai
After sbt gen-idea, you can open the IntelliJ project directly without going through pom.xml. If you want to compile inside IntelliJ, you have to remove one of the mesos jars. This is an open issue, and you can find the details in JIRA. Sent from my Google Nexus 5 On Aug 6, 2014 8:54 PM, Ron Gonzalez

Re: [mllib] LogisticRegressionWithLBFGS interface is not consistent with LogisticRegressionWithSGD

2014-09-13 Thread DB Tsai
. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Sat, Sep 13, 2014 at 2:12 AM, Yanbo Liang yanboha...@gmail.com wrote: Hi All, I found that LogisticRegressionWithLBFGS interface

Re: Fwd: Breeze Library usage in Spark

2014-10-03 Thread DB Tsai
You don't have to include the breeze jar, which is already in the spark assembly jar. The native one is optional. Sent from my Google Nexus 5 On Oct 3, 2014 8:04 PM, Priya Ch learnings.chitt...@gmail.com wrote: yes. I have included breeze-0.9 in build.sbt file. I ll change this to 0.7. Apart from

Re: [MLlib] LogisticRegressionWithSGD and LogisticRegressionWithLBFGS converge with different weights.

2014-10-09 Thread DB Tsai
, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, Sep 29, 2014 at 11:45 AM, Yanbo Liang yanboha...@gmail.com wrote: Thank you for all your patient response. I can conclude that if the data

Re: [mllib] useFeatureScaling likes hardcode in LogisticRegressionWithLBFGS and is not comprehensive for users.

2014-11-26 Thread DB Tsai
, there are different strategies to do feature scaling for linear regression and logistic regression; as a result, we don't want to naively make it a public api without addressing the different use-cases. Sincerely, DB Tsai --- My Blog: https

Re: Protobuf version in mvn vs sbt

2014-12-05 Thread DB Tsai
As Marcelo said, CDH5.3 is based on hadoop 2.3, so please try ./make-distribution.sh -Pyarn -Phive -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.1.3 -DskipTests See the detail of how to change the profile at https://spark.apache.org/docs/latest/building-with-maven.html Sincerely, DB Tsai

Re: Protobuf version in mvn vs sbt

2014-12-05 Thread DB Tsai
oh, I meant to say cdh5.1.3 used by Jakub's company is based on 2.3. You can see it from the first part of the Cloudera's version number - 2.3.0-cdh 5.1.3. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https

CrossValidator API in new spark.ml package

2014-12-12 Thread DB Tsai
Hi Xiangrui, It seems that it's stateless so will be hard to implement regularization path. Any suggestion to extend it? Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: CrossValidator API in new spark.ml package

2014-12-12 Thread DB Tsai
Okay, I got it. In Estimator, fit(dataset: SchemaRDD, paramMaps: Array[ParamMap]): Seq[M] can be overridden to implement regularization path. Correct me if I'm wrong. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https

Re: Maximum size of vector that reduce can handle

2015-01-23 Thread DB Tsai
are small. By default, depth 2 is used, so if you have many partitions of a large vector, this may still cause issues. You can increase the depth so that in the final reduce in the driver, the number of partitions is very small. Sincerely, DB Tsai

Re: LinearRegressionWithSGD accuracy

2015-01-17 Thread DB Tsai
I'm working on LinearRegressionWithElasticNet using OWLQN now. This will do the data standardization internally so it's transparent to users. With OWLQN, you don't have to manually choose stepSize. Will send out PR soon next week. Sincerely, DB Tsai

Re: LinearRegressionWithSGD accuracy

2015-01-28 Thread DB Tsai
Hi Robin, You can try this PR out. It has built-in feature scaling and ElasticNet regularization (L1/L2 mix). This implementation converges stably to the same model as R's glmnet package. https://github.com/apache/spark/pull/4259 Sincerely, DB Tsai

Re: [mllib] Is there any bugs to divide a Breeze sparse vectors at Spark v1.3.0-rc3?

2015-03-15 Thread DB Tsai
It's a bug on Breeze's side. Once David fixes it and publishes it to maven, we can upgrade to breeze 0.11.2. Please file a jira ticket for this issue. thanks. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Sun, Mar 15, 2015 at 12:45

Re: LogisticGradient Design

2015-03-25 Thread DB Tsai
to avoid the second cache. In this case, the code will be more complicated, so I will split the code into two paths. Will be done in another PR. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Wed, Mar 25, 2015 at 11:57 AM, Joseph Bradley

Re: Regularization in MLlib

2015-04-14 Thread DB Tsai
Hi Theodore, I'm currently working on elastic-net regression in the ML framework, and I decided not to have any extra layer of abstraction for now but to focus on accuracy and performance. We may come up with a proper solution later. Any idea is welcome. Sincerely, DB Tsai

Re: Regularization in MLlib

2015-04-07 Thread DB Tsai
. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Tue, Apr 7, 2015 at 3:03 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi, Could anyone elaborate on the regularization in Spark? I've found that L1 and L2 are implemented

Re: MLlib: Anybody working on hierarchical topic models like HLDA?

2015-06-03 Thread DB Tsai
Is your HDP implementation based on distributed gibbs sampling? Thanks. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Wed, Jun 3, 2015 at 8:13 PM, Yang, Yuhao yuhao.y...@intel.com wrote: Hi Lorenz, I’m trying to build

Re: spark packages

2015-05-23 Thread DB Tsai
I thought LGPL is okay but GPL is not okay for Apache project. On Saturday, May 23, 2015, Patrick Wendell pwend...@gmail.com wrote: Yes - spark packages can include non ASF licenses. On Sat, May 23, 2015 at 6:16 PM, Debasish Das debasish.da...@gmail.com javascript:; wrote: Hi, Is it

Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-10-22 Thread DB Tsai
There is a JIRA for this. I know Holden is interested in this. On Thursday, October 22, 2015, YiZhi Liu wrote: > Would someone mind giving some hint? > > 2015-10-20 15:34 GMT+08:00 YiZhi Liu >: > > Hi all, > > > > I noticed that in

Re: Spark Implementation of XGBoost

2015-10-27 Thread DB Tsai
shrinkage). Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Oct 26, 2015 at 8:37 PM, Meihua Wu <rotationsymmetr...@gmail.com> wrote: > Hi DB Tsai, > > Thank you very much fo

Re: Spark Implementation of XGBoost

2015-10-26 Thread DB Tsai
Also, does it support categorical feature? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Oct 26, 2015 at 4:06 PM, DB Tsai <dbt...@dbtsai.com> wrote: > Interesting. For feature sub-sampling,

Re: Spark Implementation of XGBoost

2015-10-26 Thread DB Tsai
Interesting. For feature sub-sampling, is it per-node or per-tree? Do you think you can implement generic GBM and have it merged as part of Spark codebase? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon

Re: [Spark MLlib] about linear regression issue

2015-11-01 Thread DB Tsai
ear regression, but currently, there is no open source implementation in Spark. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Nov 1, 2015 at 9:22 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: > Dear All,

Re: What is the difference between ml.classification.LogisticRegression and mllib.classification.LogisticRegressionWithLBFGS

2015-10-12 Thread DB Tsai
those code to share more.) Sincerely, DB Tsai -- Blog: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D <https://pgp.mit.edu/pks/lookup?search=0x59DF55B8AF08DF8D> On Mon, Oct 12, 2015 at 1:24 AM, YiZhi Liu <javeli...@gmail.com>

Re: Export BLAS module on Spark MLlib

2015-11-30 Thread DB Tsai
maintenance cost. Once it's getting mature, and people are asking for them, we will gradually make them public. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sat, Nov 28, 2015 at 5:20 AM, Sasaki Kai

Re: Export BLAS module on Spark MLlib

2015-11-30 Thread DB Tsai
I used reflection initially, but I found it's very slow, especially in a tight loop. Maybe caching the reflection can help, which I never tried. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Nov 30, 2015

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-06 Thread DB Tsai
+1 for renaming the jar file. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Tue, Apr 5, 2016 at 8:02 PM, Chris Fregly <ch...@fregly.com> wrote: > perhaps renaming to Spark ML would actually clea

Re: welcoming Xiao Li as a committer

2016-10-05 Thread DB Tsai
Congrats, Xiao! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x9DCC1DBD7FC7BBB2 On Wed, Oct 5, 2016 at 2:36 PM, Fred Reiss <freiss@gmail.com> wrote: > Congratulations, Xiao! > > Fred > > > On

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-10 Thread DB Tsai
-1 I think that back-porting SPARK-20270 <https://github.com/apache/spark/pull/17577> and SPARK-18555 <https://github.com/apache/spark/pull/15994> is very important since it's a critical bug that na.fill will mess up the data in Long even when the data isn't null. Thanks. Sincere

Re: Welcoming Tejas Patil as a Spark committer

2017-10-06 Thread DB Tsai
Congratulations! On Wed, Oct 4, 2017 at 6:55 PM, Liwei Lin wrote: > Congratulations! > > Cheers, > Liwei > > On Wed, Oct 4, 2017 at 2:27 PM, Yuval Itzchakov wrote: >> >> Congratulations and Good luck! :) >> >> >> >> -- >> Sent from:

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-06 Thread DB Tsai
+1 Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Fri, Oct 6, 2017 at 7:46 AM, Felix Cheung <felixcheun...@hotmail.com> wrote: > Thanks Nick, Hyukjin. Yes this seems to be a longer stand

Re: Will higher order functions in spark SQL be pushed upstream?

2017-10-10 Thread DB Tsai
> datatypes? >> >> >> For parquet, this effort is primarily tracked via SPARK-4502 (see >> https://github.com/apache/spark/pull/16578) and is currently targeted for >> 2.3. -- Sincerely, DB Tsai -- PGP Key ID: 0x5CED8B896A6BDFA0 - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: Scala 2.12 support

2018-06-07 Thread DB Tsai
a blocker for us to move to newer version of Scala 2.12.x since the newer version of Scala 2.12.x has the same issue. In my opinion, Scala should fix the root cause and provide a stable hook for 3rd party developers to initialize their custom code. DB Tsai | Siri Open Source Technologies

Re: Scala 2.12 support

2018-06-07 Thread DB Tsai
ark context Web UI available at http://192.168.1.169:4040 Spark context available as 'sc' (master = local[*], app id = local-1528180279528). Spark session available as 'spark’. scala> DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Jun 7, 2018, at 5:49 P

Re: Scala 2.12 support

2018-06-07 Thread DB Tsai
4 PM, Holden Karau >> wrote: >> > I agree that's a little odd, could we not add the backspace terminal >> > character? Regardless even if not, I don't think that should be a >> blocker >> > for 2.12 support especially since it doesn't degrade the 2.11 >

Re: [MLLib] Logistic Regression and standadization

2018-04-24 Thread DB Tsai
, and the result should match R. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Apr 20, 2018, at 5:56 PM, Weichen Xu <weichen...@databricks.com> wrote: > > Right. If regularization item isn't zero, then enable/disable standardization > will ge

Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-17 Thread DB Tsai
I'll +1 on removing the legacy mllib code. Many users are confused about the APIs, and some of them have weird behaviors (for example, in gradient descent, the intercept is regularized, which it is not supposed to be). DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-30 Thread DB Tsai
are selected simultaneously. https://issues.apache.org/jira/browse/SPARK-25879 If we decide to not fix it in 2.4, we should at least document it in the release note to let users know. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID

Re: Java 11 support

2018-11-06 Thread DB Tsai
Given Oracle's new 6-month release model, I think the only realistic option is to only support and test LTS JDK. I'll send out two separate emails to dev to facilitate the discussion. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Nov 6, 2018, at 9:47 AM, sh

Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread DB Tsai
to work on bugs and issues that we may run into. What do you think? Thanks, DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc

Test and support only LTS JDK release?

2018-11-06 Thread DB Tsai
Given Oracle's new 6-month release model, I feel the only realistic option is to test and support only LTS JDK releases, such as JDK 11 and future LTS releases. I would like to have a discussion on this in the Spark community. Thanks, DB Tsai | Siri Open Source Technologies [not a contribution

Re: Test and support only LTS JDK release?

2018-11-06 Thread DB Tsai
OpenJDK will follow Oracle's release cycle, https://openjdk.java.net/projects/jdk/, a strict six months model. I'm not familiar with other non-Oracle VMs and Redhat support. DB Tsai | Siri Open Source Technologies [not a contribution] |  Appl

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread DB Tsai
Ideally, Spark 3 will support only Scala 2.12. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Nov 6, 2018, at 2:55 PM, Felix Cheung wrote: > > So to clarify, only scala 2.12 is supported in Spark 3? > > > From: Ryan Blu

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread DB Tsai
agree with Sean that this can make the dependencies really complicated; hence I support dropping Scala 2.11 in Spark 3.0 directly. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc > On Nov 6, 2018, at 11:38 AM, Sean Owen wrote: > > I think we should make S

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-08 Thread DB Tsai
if we want to change the alternative Scala version to 2.13 and drop 2.11 if we just want to support two Scala versions at one time. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Wed, Nov 7, 2018

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-23 Thread DB Tsai
-1 Agreed with Anton that this bug will potentially corrupt the data silently. As he is ready to submit a PR, I suggest waiting so the fix can be included. Thanks! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0

Re: Automated formatting

2018-11-21 Thread DB Tsai
I like the idea of checking only the diff. Even I am sometimes confused about the right style in Spark since I am working on multiple projects with slightly different coding styles. On Wed, Nov 21, 2018 at 1:36 PM Sean Owen wrote: > I know the PR builder runs SBT, but I presume this would just

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-21 Thread DB Tsai
+1 on removing Scala 2.11 support for 3.0 given Scala 2.11 is already EOL. On Tue, Nov 20, 2018 at 2:53 PM Sean Owen wrote: > PS: pull request at https://github.com/apache/spark/pull/23098 > Not going to merge it until there's clear agreement. > > On Tue, Nov 20, 2018 at 10:16 AM Ryan Blue

Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-08 Thread DB Tsai
+1 Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Tue, Jan 8, 2019 at 11:14 AM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.2.3. >

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread DB Tsai
RC9 was just cut. Will send out another thread once the build is finished. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Mon, Mar 25, 2019 at 5:10 PM Sean Owen wrote: > > That's all merged now. I

[VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-27 Thread DB Tsai
typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. DB Tsai | Siri Open Source Te

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-24 Thread DB Tsai
Hello Sean, By looking at SPARK-26961 PR, seems it's ready to go. Do you think we can merge it into 2.4 branch soon? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Sat, Mar 23, 2019 at 12:04 PM Sean Owen

[ANNOUNCE] Announcing Apache Spark 2.4.1

2019-04-04 Thread DB Tsai
. DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc
