Re: Shouldn't the SparseVector constructor give an error when the declared number of elements is less than the array length?

2015-07-23 Thread Andrew Vykhodtsev
Hi Manoj,

Done.

https://issues.apache.org/jira/browse/SPARK-9277
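
As far as I can tell, the reason it only fails at training time is that
the dense weight vector is sized from the SparseVector's *declared*
size (2 here), while the stored indices run up to 5, so the first index
that does not fit (2) is exactly the one in the
ArrayIndexOutOfBoundsException. A plain-Python sketch of that mechanism
(illustrative names, not the actual MLlib internals):

    # Sketch: a sparse-dense dot product walks the sparse side's stored
    # indices and uses each one to index into the dense side.
    def sparse_dense_dot(weights, indices, values):
        total = 0.0
        for i, v in zip(indices, values):
            total += weights[i] * v  # blows up once i >= len(weights)
        return total

    weights = [0.0, 0.0]                # sized from the declared size, 2
    indices = [1, 2, 3, 4, 5]           # stored indices of the bad vector
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    sparse_dense_dot(weights, indices, values)  # IndexError at index 2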

On Thu, Jul 23, 2015 at 1:02 PM, Manoj Kumar  wrote:

> Hi,
>
> I think this should raise an error in both the Scala code and the Python API.
>
> Please open a JIRA.
>
> On Thu, Jul 23, 2015 at 4:22 PM, Andrew Vykhodtsev 
> wrote:
>
>> Dear Developers,
>>
>> I found that one can create a SparseVector inconsistently, and it will lead
>> to a Java error at runtime, for example when training
>> LogisticRegressionWithSGD.
>>
>> Here is the test case:
>>
>>
>> In [2]:
>> sc.version
>> Out[2]:
>> u'1.3.1'
>> In [13]:
>> from pyspark.mllib.linalg import SparseVector
>> from pyspark.mllib.regression import LabeledPoint
>> from pyspark.mllib.classification import LogisticRegressionWithSGD
>> In [3]:
>> x = SparseVector(2, {1:1, 2:2, 3:3, 4:4, 5:5})
>> In [10]:
>> l = LabeledPoint(0, x)
>> In [12]:
>> r = sc.parallelize([l])
>> In [14]:
>> m = LogisticRegressionWithSGD.train(r)
>>
>> Error:
>>
>>
>> Py4JJavaError: An error occurred while calling 
>> o86.trainLogisticRegressionModelWithSGD.
>> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 
>> in stage 11.0 failed 1 times, most recent failure: Lost task 7.0 in stage 
>> 11.0 (TID 47, localhost): *java.lang.ArrayIndexOutOfBoundsException: 2*
>>
>>
>>
>> Attached is the notebook with the scenario and the full message:
>>
>>
>>
>> Should I raise a JIRA for this? (Forgive me if there is already such a JIRA
>> and I did not notice it.)
>>
>>
>>
>>
>>
>
>
>
> --
> Godspeed,
> Manoj Kumar,
> http://manojbits.wordpress.com
> 
> http://github.com/MechCoder
>


Re: Shouldn't the SparseVector constructor give an error when the declared number of elements is less than the array length?

2015-07-23 Thread Manoj Kumar
Hi,

I think this should raise an error in both the Scala code and the Python API.

Please open a JIRA.
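
The fix could be as simple as checking every index against the declared
size at construction time, so the bad vector fails fast instead of deep
inside training. A rough sketch of the kind of check I mean (the helper
name and message are hypothetical, not an actual patch):

    # Sketch: reject any index that does not fit the declared size.
    def validate_sparse_vector(size, entries):
        for index in sorted(entries):
            if index < 0 or index >= size:
                raise ValueError(
                    "index %d is out of range for declared size %d"
                    % (index, size))

    validate_sparse_vector(2, {1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
    # ValueError: index 2 is out of range for declared size 2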

On Thu, Jul 23, 2015 at 4:22 PM, Andrew Vykhodtsev  wrote:

> Dear Developers,
>
> I found that one can create a SparseVector inconsistently, and it will lead
> to a Java error at runtime, for example when training
> LogisticRegressionWithSGD.
>
> Here is the test case:
>
>
> In [2]:
> sc.version
> Out[2]:
> u'1.3.1'
> In [13]:
> from pyspark.mllib.linalg import SparseVector
> from pyspark.mllib.regression import LabeledPoint
> from pyspark.mllib.classification import LogisticRegressionWithSGD
> In [3]:
> x = SparseVector(2, {1:1, 2:2, 3:3, 4:4, 5:5})
> In [10]:
> l = LabeledPoint(0, x)
> In [12]:
> r = sc.parallelize([l])
> In [14]:
> m = LogisticRegressionWithSGD.train(r)
>
> Error:
>
>
> Py4JJavaError: An error occurred while calling 
> o86.trainLogisticRegressionModelWithSGD.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 
> in stage 11.0 failed 1 times, most recent failure: Lost task 7.0 in stage 
> 11.0 (TID 47, localhost): *java.lang.ArrayIndexOutOfBoundsException: 2*
>
>
>
> Attached is the notebook with the scenario and the full message:
>
>
>
> Should I raise a JIRA for this? (Forgive me if there is already such a JIRA
> and I did not notice it.)
>
>
>
>
>



-- 
Godspeed,
Manoj Kumar,
http://manojbits.wordpress.com

http://github.com/MechCoder