RE: Spark MLlib: MultilayerPerceptronClassifier error?

2016-07-05 Thread Shiryaev, Mikhail
Hi Alexander,

I used the same example from the MLP user guide, but in Java.
I modified the example slightly and introduced a nasty bug that I had not 
noticed: the layer sizes were inconsistent with the actual feature count.
After fixing that, MLP works on my test.
So this was my oversight, sorry.

Best regards,
Mikhail Shiryaev
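
For anyone who runs into the same errors, here is a minimal sketch of the kind of consistency check that would catch the mismatch described above: it compares the first and last entries of the layers array with the feature and class counts found in LIBSVM-formatted lines. The names LayerCheck, DataShape, shapeOf and problems are hypothetical helpers, not Spark API, and the sketch assumes 1-based feature indices and class labels numbered 0 to k-1.

```scala
// Hypothetical helper, not part of Spark. It scans LIBSVM-formatted lines
// ("label index:value index:value ...") and compares them with `layers`.
object LayerCheck {
  final case class DataShape(numFeatures: Int, numClasses: Int)

  def shapeOf(lines: Seq[String]): DataShape = {
    val rows = lines.map(_.trim).filter(_.nonEmpty).map(_.split("\\s+").toSeq)
    // Assumption: feature indices are 1-based, so the largest index seen
    // is taken as the feature count.
    val featureIndices =
      rows.flatMap(_.drop(1)).map(entry => entry.takeWhile(_ != ':').toInt)
    val numFeatures = featureIndices.foldLeft(0)((a, b) => math.max(a, b))
    // Assumption: labels are 0, 1, ..., k-1, so the class count is max label + 1.
    val labels = rows.map(row => row.head.toDouble.toInt)
    val numClasses = labels.foldLeft(-1)((a, b) => math.max(a, b)) + 1
    DataShape(numFeatures, numClasses)
  }

  // Returns one message per mismatch; an empty result means the layers fit.
  def problems(layers: Array[Int], shape: DataShape): Seq[String] = {
    require(layers.length >= 2, "need at least an input and an output layer")
    val inputProblem =
      if (layers.head == shape.numFeatures) None
      else Some(s"input layer has ${layers.head} units but the data has ${shape.numFeatures} features")
    val outputProblem =
      if (layers.last == shape.numClasses) None
      else Some(s"output layer has ${layers.last} units but the data has ${shape.numClasses} classes")
    inputProblem.toSeq ++ outputProblem.toSeq
  }
}
```

For data with 4 features and 3 classes, LayerCheck.problems(Array(4, 4, 3), shape) is empty, while Array(5, 5, 3) reports one input-layer mismatch. Running such a check before calling fit makes the inconsistency visible without waiting for a training failure.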

From: Ulanov, Alexander [mailto:alexander.ula...@hpe.com]
Sent: Tuesday, July 5, 2016 9:32 PM
To: Yanbo Liang <yblia...@gmail.com>; Shiryaev, Mikhail 
<mikhail.shiry...@intel.com>
Cc: user@spark.apache.org
Subject: RE: Spark MLlib: MultilayerPerceptronClassifier error?

Hi Mikhail,

I followed the MLP user guide and used the dataset and network 
configuration you mentioned. The MLP trained without any issues using the default 
parameters, that is, a block size of 128 and 100 iterations.

Source code:
scala> import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val data = sqlContext.read.format("libsvm").load("/data/aloi.scale")
scala> val trainer = new MultilayerPerceptronClassifier().setLayers(Array(128, 128, 1000))
scala> val model = trainer.fit(data)
(after a while)
model: org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel = mlpc_fb3bd70d2ef2

It seems premature to file an issue. Could you share your code 
instead?

Best regards, Alexander

Just in case, here is the link to the user guide:
https://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier


From: Yanbo Liang [mailto:yblia...@gmail.com]
Sent: Monday, July 04, 2016 9:58 PM
To: mshiryae <mikhail.shiry...@intel.com>
Cc: user@spark.apache.org
Subject: Re: Spark MLlib: MultilayerPerceptronClassifier error?

Would you mind filing a JIRA to track this issue? I will take a look when I 
have time.

2016-07-04 14:09 GMT-07:00 mshiryae <mikhail.shiry...@intel.com>:
Hi,

I am trying to train a model with MultilayerPerceptronClassifier.

It works on the sample data from
data/mllib/sample_multiclass_classification_data.txt with 4 features, 3
classes and layers [4, 4, 3].
But when I use other input files with different numbers of features and classes
(for example, from
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html)
I get errors.

Example:
Input file aloi (128 features, 1000 classes, layers [128, 128, 1000]):


with block size = 1:
ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to Infinity
ERROR LBFGS: Failure! Resetting history: breeze.optimize.FirstOrderException: Line search failed
ERROR LBFGS: Failure again! Giving up and returning. Maybe the objective is just poorly behaved?


with default block size = 128:
 java.lang.ArrayIndexOutOfBoundsException
   at java.lang.System.arraycopy(Native Method)
   at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:629)
   at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3$$anonfun$apply$4.apply(Layer.scala:628)
   at scala.collection.immutable.List.foreach(List.scala:381)
   at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:628)
   at org.apache.spark.ml.ann.DataStacker$$anonfun$3$$anonfun$apply$3.apply(Layer.scala:624)
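
An ArrayIndexOutOfBoundsException raised from System.arraycopy, as in the trace above, is what one would expect if rows are copied into a fixed-width buffer whose width does not match the rows. The following is only an illustrative sketch of that mechanism under this assumption; StackSketch and stackRows are hypothetical names, and this is not the source of Spark's DataStacker.

```scala
// Hypothetical illustration, NOT Spark's DataStacker source: packs rows into
// one flat buffer, copying exactly `inputSize` values per row.
object StackSketch {
  def stackRows(rows: Seq[Array[Double]], inputSize: Int): Array[Double] = {
    val block = new Array[Double](rows.length * inputSize)
    rows.zipWithIndex.foreach { case (row, i) =>
      // System.arraycopy throws an IndexOutOfBoundsException when
      // row.length < inputSize, because the copy would read past the
      // end of `row`.
      System.arraycopy(row, 0, block, i * inputSize, inputSize)
    }
    block
  }
}
```

Calling StackSketch.stackRows with rows of length 2 and inputSize = 2 returns the concatenated values, while inputSize = 3 fails inside System.arraycopy. A declared input size larger than the real row length therefore fails at copy time rather than with a descriptive message.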



Even if I modify the sample_multiclass_classification_data.txt file (renumbering
every 4th feature as the 5th) and run with layers [5, 5, 3], I get the
same errors as for the file above.


To summarize:
I can't run training with the default block size and more than 4 features.
If I set the block size to 1, training starts but I get errors
from LBFGS.
This is reproducible with Spark 1.5.2 and with the master branch on GitHub (as of
July 4).

Has anybody already encountered this behavior?
Is there a bug in MultilayerPerceptronClassifier, or am I using it incorrectly?

Thanks.


