Re: Streaming K-means not printing predictions

2016-04-28 Thread Ashutosh Kumar
It is reading the files now but throws another error complaining that the
vector sizes do not match. I saw this error reported on Stack Overflow:

http://stackoverflow.com/questions/30737361/getting-java-lang-illegalargumentexception-requirement-failed-while-calling-spa
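
The usual cause of that "requirement failed" error is a dimension mismatch:
the dimension passed to setRandomCenters must equal the size of every vector
in the training and test streams. A minimal sketch of the guard, assuming an
existing StreamingContext `ssc` and made-up paths:

    import org.apache.spark.mllib.clustering.StreamingKMeans
    import org.apache.spark.mllib.linalg.Vectors

    val dim = 3  // hypothetical feature count; must match the data
    val model = new StreamingKMeans()
      .setK(2)
      .setDecayFactor(1.0)
      .setRandomCenters(dim, 0.0)  // centers are created with `dim` components

    // Each input line like "[1.0,2.0,3.0]" must parse to exactly `dim`
    // values, or MLlib fails with "requirement failed".
    val trainingData = ssc.textFileStream("/tmp/kmeans/train")
      .map(Vectors.parse)
      .filter(_.size == dim)  // drop malformed lines before training
    model.trainOn(trainingData)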

Also, in the Scala example model.setRandomCenters takes two arguments,
whereas the Java method needs 3?
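
On the arity question: in the Scala API the third parameter of
setRandomCenters (the seed) has a default value, and Java cannot see Scala
default arguments, so from Java all three must be passed. A sketch (the seed
42L is arbitrary):

    // Scala: the seed defaults, so two arguments compile
    model.setRandomCenters(3, 0.0)

    // Java: no default parameters, so the seed is mandatory
    // model.setRandomCenters(3, 0.0, 42L);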

Any clues?
Thanks
Ashutosh


On Wed, Apr 27, 2016 at 9:59 PM, Ashutosh Kumar 
wrote:

> The problem seems to be that streamconxt.textFileStream(path) is not
> reading the file at all. It does not throw any exception either. I tried
> some tricks given on the mailing lists, like copying the file to the
> specified directory after starting the program and touching the file to
> change its timestamp, but no luck.
>
> Thanks
> Ashutosh
>
>
>
> On Wed, Apr 27, 2016 at 2:43 PM, Niki Pavlopoulou  wrote:
>
>> One of the reasons this happened to me (assuming everything is OK in
>> your streaming process) is running it in local mode: instead of
>> local[*], use local[4].
>>
>> On 26 April 2016 at 15:10, Ashutosh Kumar 
>> wrote:
>>
>>> I created a Streaming k-means based on the Scala example. It keeps
>>> running without any error but never prints predictions.
>>>
>>> [log snipped; the same log is quoted in full in the 2016-04-26 message
>>> below]

Re: Streaming K-means not printing predictions

2016-04-27 Thread Ashutosh Kumar
The problem seems to be that streamconxt.textFileStream(path) is not
reading the file at all. It does not throw any exception either. I tried
some tricks given on the mailing lists, like copying the file to the
specified directory after starting the program and touching the file to
change its timestamp, but no luck.
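
The empty "New files at time ... :" entries in the log quoted below mean the
directory scan is finding nothing new. textFileStream only picks up files
that appear in the monitored directory after the context starts, and they
should be moved in atomically rather than written in place. A minimal sketch
of that pattern, with made-up paths:

    import java.nio.file.{Files, Paths, StandardCopyOption}

    // assumes an already-built StreamingContext `ssc` watching
    // /tmp/kmeans/train via ssc.textFileStream(...)
    ssc.start()

    // Write the file elsewhere first, then rename it into the watched
    // directory; a rename on the same filesystem is atomic, so the
    // stream never sees a half-written file.
    Files.move(
      Paths.get("/tmp/staging/batch1.txt"),
      Paths.get("/tmp/kmeans/train/batch1.txt"),
      StandardCopyOption.ATOMIC_MOVE)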

Thanks
Ashutosh


On Wed, Apr 27, 2016 at 2:43 PM, Niki Pavlopoulou  wrote:

> One of the reasons this happened to me (assuming everything is OK in
> your streaming process) is running it in local mode: instead of
> local[*], use local[4].
>
> On 26 April 2016 at 15:10, Ashutosh Kumar 
> wrote:
>
>> I created a Streaming k-means based on the Scala example. It keeps
>> running without any error but never prints predictions.
>>
>> [log snipped; the same log is quoted in full in the 2016-04-26 message
>> below]
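
On Niki's local[*] suggestion quoted above: a Spark Streaming app generally
needs more than one local core (receiver-based sources occupy a whole
thread), so giving the local master several threads is a sensible first
check even though textFileStream uses no receiver. A hedged sketch (the app
name is made up):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setMaster("local[4]")           // four worker threads instead of one
      .setAppName("streaming-kmeans")  // hypothetical app name
    val ssc = new StreamingContext(conf, Seconds(5))  // 5 s batch interval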

Re: Streaming K-means not printing predictions

2016-04-26 Thread Prashant Sharma
Since you are reading from a file stream, I would suggest saving the output
to a file instead of printing it. There may be output the first time and
then no data in subsequent iterations.

Prashant Sharma
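
A rough sketch of that suggestion; predictOn and saveAsTextFiles are the
standard MLlib/DStream calls, but the variable names and paths here are
assumptions, not the original poster's code:

    import org.apache.spark.mllib.linalg.Vectors

    // assumes `ssc` and a trained StreamingKMeans `model` as sketched above
    val testData = ssc.textFileStream("/tmp/kmeans/test").map(Vectors.parse)
    val predictions = model.predictOn(testData)  // DStream[Int] of cluster ids

    // saveAsTextFiles writes one directory per batch interval, which is
    // easier to inspect afterwards than console print() output
    predictions.saveAsTextFiles("/tmp/kmeans/out")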



On Tue, Apr 26, 2016 at 7:40 PM, Ashutosh Kumar 
wrote:

> I created a Streaming k-means based on the Scala example. It keeps
> running without any error but never prints predictions.
>
> Here is the log:
>
> 19:15:05,050 INFO
> org.apache.spark.streaming.scheduler.InputInfoTracker - remove old
> batch metadata: 146167824 ms
> 19:15:10,001 INFO
> org.apache.spark.streaming.dstream.FileInputDStream   - Finding new
> files took 1 ms
> 19:15:10,001 INFO
> org.apache.spark.streaming.dstream.FileInputDStream   - New files
> at time 146167831 ms:
>
> 19:15:10,007 INFO
> org.apache.spark.streaming.dstream.FileInputDStream   - Finding new
> files took 2 ms
> 19:15:10,007 INFO
> org.apache.spark.streaming.dstream.FileInputDStream   - New files
> at time 146167831 ms:
>
> 19:15:10,014 INFO
> org.apache.spark.streaming.scheduler.JobScheduler - Added jobs
> for time 146167831 ms
> 19:15:10,015 INFO
> org.apache.spark.streaming.scheduler.JobScheduler - Starting
> job streaming job 146167831 ms.0 from job set of time 146167831 ms
> 19:15:10,028 INFO
> org.apache.spark.SparkContext - Starting
> job: collect at StreamingKMeans.scala:89
> 19:15:10,028 INFO
> org.apache.spark.scheduler.DAGScheduler   - Job 292
> finished: collect at StreamingKMeans.scala:89, took 0.41 s
> 19:15:10,029 INFO
> org.apache.spark.streaming.scheduler.JobScheduler - Finished
> job streaming job 146167831 ms.0 from job set of time 146167831 ms
> 19:15:10,029 INFO
> org.apache.spark.streaming.scheduler.JobScheduler - Starting
> job streaming job 146167831 ms.1 from job set of time 146167831 ms
> ---
> Time: 146167831 ms
> ---
>
> 19:15:10,036 INFO
> org.apache.spark.streaming.scheduler.JobScheduler - Finished
> job streaming job 146167831 ms.1 from job set of time 146167831 ms
> 19:15:10,036 INFO
> org.apache.spark.rdd.MapPartitionsRDD - Removing
> RDD 2912 from persistence list
> 19:15:10,037 INFO
> org.apache.spark.rdd.MapPartitionsRDD - Removing
> RDD 2911 from persistence list
> 19:15:10,037 INFO
> org.apache.spark.storage.BlockManager - Removing
> RDD 2912
> 19:15:10,037 INFO
> org.apache.spark.streaming.scheduler.JobScheduler - Total
> delay: 0.036 s for time 146167831 ms (execution: 0.021 s)
> 19:15:10,037 INFO
> org.apache.spark.rdd.UnionRDD - Removing
> RDD 2800 from persistence list
> 19:15:10,037 INFO
> org.apache.spark.storage.BlockManager - Removing
> RDD 2911
> 19:15:10,037 INFO
> org.apache.spark.streaming.dstream.FileInputDStream   - Cleared 1
> old files that were older than 146167825 ms: 1461678245000 ms
> 19:15:10,037 INFO
> org.apache.spark.rdd.MapPartitionsRDD - Removing
> RDD 2917 from persistence list
> 19:15:10,037 INFO
> org.apache.spark.storage.BlockManager - Removing
> RDD 2800
> 19:15:10,037 INFO
> org.apache.spark.rdd.MapPartitionsRDD - Removing
> RDD 2916 from persistence list
> 19:15:10,037 INFO
> org.apache.spark.rdd.MapPartitionsRDD - Removing
> RDD 2915 from persistence list
> 19:15:10,037 INFO
> org.apache.spark.rdd.MapPartitionsRDD - Removing
> RDD 2914 from persistence list
> 19:15:10,037 INFO
> org.apache.spark.rdd.UnionRDD - Removing
> RDD 2803 from persistence list
> 19:15:10,037 INFO
> org.apache.spark.streaming.dstream.FileInputDStream   - Cleared 1
> old files that were older than 146167825 ms: 1461678245000 ms
> 19:15:10,038 INFO
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker - Deleting
> batches ArrayBuffer()
> 19:15:10,038 INFO
> org.apache.spark.storage.BlockManager - Removing
> RDD 2917
> 19:15:10,038 INFO
> org.apache.spark.streaming.scheduler.InputInfoTracker - remove old
> batch metadata: 1461678245000 ms
> 19:15:10,038 INFO
> org.apache.spark.storage.BlockManager - Removing
> RDD 2914
> 19:15:10,038 INFO
> org.apache.spark.storage.BlockManager - Removing
> RDD 2916
> 19:15:10,038 INFO
> org.apache.spark.storage.BlockManager - Removing
> RDD 2915
> 19:15:10,038 INFO
> org.apache.spark.storage.BlockManager - Removing
> RDD 2803
> 19:15:15,001 INFO
> org.apache.spark.streaming.dstream.FileInputDStream   - Finding new
> files took 1 ms
>