RE: Spark MLlib: MultilayerPerceptronClassifier error?
Hi Alexander,

I used the same example from the MLP user guide, but in Java. I had modified the example slightly and introduced a nasty bug that I did not notice: the layers were inconsistent with the real feature count. After fixing that, MLP works on my test. So this was my oversight, sorry.

Best regards,
Mikhail Shiryaev

From: Ulanov, Alexander [mailto:alexander.ula...@hpe.com]
Sent: Tuesday, July 5, 2016 9:32 PM
To: Yanbo Liang <yblia...@gmail.com>; Shiryaev, Mikhail <mikhail.shiry...@intel.com>
Cc: user@spark.apache.org
Subject: RE: Spark MLlib: MultilayerPerceptronClassifier error?

> It seems that submitting an issue is premature. Could you share your code instead?
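For anyone hitting the same errors, here is a minimal sketch, in plain Java with no Spark dependency, of a check that the first and last entries of the layers array agree with a LIBSVM-format input before training. The class LayerCheck and its methods are hypothetical helpers, not part of Spark. The sketch assumes that the input dimension is the largest one-based feature index in the file and that the class count equals the number of distinct labels seen, which can undercount if some class never appears in the lines examined.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper (not part of Spark): compares a layers array with LIBSVM-format lines. */
final class LayerCheck {

    /** Largest one-based feature index found in the given LIBSVM lines. */
    public static int maxFeatureIndex(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            // tokens[0] is the label; the remaining tokens are index:value pairs.
            for (int i = 1; i < tokens.length; i++) {
                int colon = tokens[i].indexOf(':');
                max = Math.max(max, Integer.parseInt(tokens[i].substring(0, colon)));
            }
        }
        return max;
    }

    /** Number of distinct labels found in the given LIBSVM lines. */
    public static int distinctLabels(List<String> lines) {
        TreeSet<Double> labels = new TreeSet<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            labels.add(Double.parseDouble(line.trim().split("\\s+")[0]));
        }
        return labels.size();
    }

    /** True when layers[0] equals the feature count and the last layer equals the class count. */
    public static boolean layersMatch(int[] layers, List<String> lines) {
        return layers.length >= 2
            && layers[0] == maxFeatureIndex(lines)
            && layers[layers.length - 1] == distinctLabels(lines);
    }

    public static void main(String[] args) {
        // Made-up sample: 4 features, 3 classes.
        List<String> sample = Arrays.asList(
            "0 1:0.5 2:-0.2 4:1.0",
            "1 1:0.1 3:0.7",
            "2 2:0.3 4:-0.9");
        System.out.println(layersMatch(new int[] {4, 4, 3}, sample)); // prints true
        System.out.println(layersMatch(new int[] {5, 5, 3}, sample)); // prints false
    }
}
```

In a real job the lines would come from the training file itself, and the check would run before calling fit.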
RE: Spark MLlib: MultilayerPerceptronClassifier error?
Hi Mikhail,

I followed the MLP user guide and used the dataset and network configuration you mentioned. MLP trained without any issues with the default parameters, that is, a block size of 128 and 100 iterations.

Source code:

    scala> import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    scala> val data = sqlContext.read.format("libsvm").load("/data/aloi.scale")
    scala> val trainer = new MultilayerPerceptronClassifier().setLayers(Array(128, 128, 1000))
    scala> val model = trainer.fit(data)
    (after a while)
    model: org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel = mlpc_fb3bd70d2ef2

It seems that submitting an issue is premature. Could you share your code instead?

Best regards,
Alexander

Just in case, here is the link to the user guide:
https://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier

From: Yanbo Liang [mailto:yblia...@gmail.com]
Sent: Monday, July 04, 2016 9:58 PM
To: mshiryae <mikhail.shiry...@intel.com>
Cc: user@spark.apache.org
Subject: Re: Spark MLlib: MultilayerPerceptronClassifier error?

> Would you mind filing a JIRA to track this issue? I will take a look when I have time.
Re: Spark MLlib: MultilayerPerceptronClassifier error?
Would you mind filing a JIRA to track this issue? I will take a look when I have time.

2016-07-04 14:09 GMT-07:00 mshiryae <mikhail.shiry...@intel.com>:

> Has anybody already encountered such behavior?
> Is there a bug in MultilayerPerceptronClassifier, or am I using it incorrectly?
Spark MLlib: MultilayerPerceptronClassifier error?
Hi,

I am trying to train a model with MultilayerPerceptronClassifier.

It works on the sample data from data/mllib/sample_multiclass_classification_data.txt with 4 features, 3 classes and layers [4, 4, 3]. But when I try other input files with different numbers of features and classes (for example from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html), I get errors.

Example: input file aloi (128 features, 1000 classes, layers [128, 128, 1000]).

With block size = 1:

    ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to Infinity
    ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
    ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?

With the default block size = 128:

    java.lang.ArrayIndexOutOfBoundsException
        at java.lang.System.arraycopy(Native Method)
        at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:629)
        at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:628)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:628)
        at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:624)

Even if I modify the sample_multiclass_classification_data.txt file (renaming feature index 4 to 5 everywhere) and run with layers [5, 5, 3], I get the same errors as for the file above.

To summarize: I can't run training with the default block size and more than 4 features. If I set the block size to 1, some processing happens, but I get errors from LBFGS. This is reproducible with Spark 1.5.2 and with the master branch on GitHub (as of July 4th).

Has anybody already encountered such behavior? Is there a bug in MultilayerPerceptronClassifier, or am I using it incorrectly?

Thanks.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-MLlib-MultilayerPerceptronClassifier-error-tp27279.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
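One plausible reading of the ArrayIndexOutOfBoundsException raised from System.arraycopy in the trace above, given that this thread was resolved as a mismatch between the layers and the real feature count: if each sample is copied into a block that reserves one slot per declared input, a sample with fewer values than declared makes the copy run past the end of the source array. The sketch below is a simplified illustration under that assumption. StackingSketch and stack are hypothetical names, and this is not Spark's DataStacker code.

```java
import java.util.Arrays;

/** Simplified illustration only; hypothetical code, not Spark's DataStacker. */
final class StackingSketch {

    /** Copies each sample into a flat block that reserves inputSize slots per sample. */
    public static double[] stack(double[][] samples, int inputSize) {
        double[] block = new double[samples.length * inputSize];
        for (int i = 0; i < samples.length; i++) {
            // Fails when samples[i] holds fewer than inputSize values:
            // the copy would read past the end of the source array.
            System.arraycopy(samples[i], 0, block, i * inputSize, inputSize);
        }
        return block;
    }

    public static void main(String[] args) {
        double[][] samples = {{1.0, 2.0}, {3.0, 4.0}};

        // Declared input size matches the data: the copy succeeds.
        System.out.println(Arrays.toString(stack(samples, 2))); // prints [1.0, 2.0, 3.0, 4.0]

        // Declared input size is larger than the data: the copy fails.
        try {
            stack(samples, 3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("mismatch: " + e.getClass().getSimpleName());
        }
    }
}
```

Under this reading, comparing the first entry of the layers array with the actual vector size before training would surface the problem with a clearer message than the raw exception.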