[jira] [Comment Edited] (SPARK-15039) Kinesis receiver does not work in YARN

2016-05-03 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675
 ] 

Tsai Li Ming edited comment on SPARK-15039 at 5/3/16 1:16 PM:
--

[~zsxwing] Nothing suspicious in the logs. The Streaming tab shows 1 receiver but 0 events/sec.

[~jerryshao] I have not tested standalone mode yet, but `--master local[*]` works.

I will test in standalone mode.


was (Author: ltsai):
[~zsxwing] Nothing suspicious in the logs. The Streaming tab shows 1 receiver but 0 events/sec.

[~jerryshao] I have not tested standalone mode yet, but `--master local[*]` works.



> Kinesis receiver does not work in YARN
> --
>
> Key: SPARK-15039
> URL: https://issues.apache.org/jira/browse/SPARK-15039
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.6.0
> Environment: YARN
> HDP 2.4.0
>Reporter: Tsai Li Ming
>
> Hi,
> When using the PySpark Kinesis example, the application does not receive any 
> messages from Kinesis when submitted to a YARN cluster, although it works fine 
> in local mode.
> {code}
> spark-submit \
> --executor-cores 4 \
> --num-executors 4 \
> --packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
> {code}
> I had to downgrade the package to 1.5.1; 1.6.1 does not work either.
> Not sure whether this is related to SPARK-12453.
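
The `--packages` list in the report pins `spark-streaming-kinesis-asl_2.10:1.5.1` while the issue is filed against Spark 1.6.0, so a connector/cluster version mismatch is one thing worth ruling out, though it is only one possible factor. A minimal Python sketch for spotting such mismatches, assuming Maven coordinates of the form `groupId:artifactId:version` with an `_2.10`-style Scala suffix on the artifactId; `parse_packages` and `spark_version_mismatches` are hypothetical helpers, not part of Spark.

```python
import re
from typing import List, Tuple

# Hypothetical helper: assumes an "_<major>.<minor>" Scala binary-version suffix.
_SCALA_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<scala>\d+\.\d+)$")


def parse_packages(packages: str) -> List[Tuple[str, str, str, str]]:
    """Split a --packages value into (group, artifact, scala_suffix, version) tuples."""
    parsed = []
    for coord in packages.split(","):
        coord = coord.strip()
        if not coord:
            continue
        # Assumes plain group:artifact:version coordinates (no classifier).
        group, artifact, version = coord.split(":")
        match = _SCALA_SUFFIX.match(artifact)
        base, scala = (match["base"], match["scala"]) if match else (artifact, "")
        parsed.append((group, base, scala, version))
    return parsed


def spark_version_mismatches(packages: str, spark_version: str) -> List[str]:
    """Return org.apache.spark coordinates whose version differs from spark_version."""
    mismatches = []
    for group, base, scala, version in parse_packages(packages):
        if group == "org.apache.spark" and version != spark_version:
            artifact = f"{base}_{scala}" if scala else base
            mismatches.append(f"{group}:{artifact}:{version}")
    return mismatches


PACKAGES = (
    "com.databricks:spark-redshift_2.10:0.6.0,"
    "com.databricks:spark-csv_2.10:1.4.0,"
    "org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1"
)
print(spark_version_mismatches(PACKAGES, "1.6.0"))
# ['org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1']
```

Here the only flagged coordinate is the Kinesis connector; which connector version actually works on a given cluster still has to be confirmed by testing, as the reporter did.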



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15039) Kinesis receiver does not work in YARN

2016-05-03 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675
 ] 

Tsai Li Ming commented on SPARK-15039:
--

[~zsxwing] Nothing suspicious in the logs. The Streaming tab shows 1 receiver but 0 events/sec.

[~jerryshao] I have not tested standalone mode yet, but `--master local[*]` works.






[jira] [Updated] (SPARK-15039) Kinesis receiver does not work in YARN

2016-04-30 Thread Tsai Li Ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsai Li Ming updated SPARK-15039:
-
Description: 
Hi,

When using the PySpark Kinesis example, the application does not receive any messages from 
Kinesis when submitted to a YARN cluster, although it works fine in local mode.

{code}
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
{code}

I had to downgrade the package to 1.5.1; 1.6.1 does not work either.

Not sure whether this is related to SPARK-12453.

  was:
Hi,

When using the PySpark Kinesis example, the application does not receive any messages from 
Kinesis when submitted to a YARN cluster, although it works fine in local mode.

{code}
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
{code}

I had to downgrade the package to 1.5.1; 1.6.1 does not work either.





[jira] [Updated] (SPARK-15039) Kinesis receiver does not work in YARN

2016-04-30 Thread Tsai Li Ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsai Li Ming updated SPARK-15039:
-
Description: 
Hi,

When using the PySpark Kinesis example, the application does not receive any messages from 
Kinesis when submitted to a YARN cluster, although it works fine in local mode.

{code}
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
{code}

I had to downgrade the package to 1.5.1; 1.6.1 does not work either.

  was:
Hi,

When using the PySpark Kinesis example, the application does not receive any messages from 
Kinesis when submitted to a YARN cluster, although it works in local mode.

```
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
```

I had to downgrade the package to 1.5.1 to get it working.





[jira] [Created] (SPARK-15039) Kinesis receiver does not work in YARN

2016-04-30 Thread Tsai Li Ming (JIRA)
Tsai Li Ming created SPARK-15039:


 Summary: Kinesis receiver does not work in YARN
 Key: SPARK-15039
 URL: https://issues.apache.org/jira/browse/SPARK-15039
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.6.0
 Environment: YARN
HDP 2.4.0
Reporter: Tsai Li Ming


Hi,

When using the PySpark Kinesis example, the application does not receive any messages from 
Kinesis when submitted to a YARN cluster, although it works in local mode.

```
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
```

I had to downgrade the package to 1.5.1 to get it working.






[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel

2016-02-10 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623
 ] 

Tsai Li Ming commented on SPARK-3220:
-

I built Derrick's k-means implementation against Spark 1.6.0 and ran:

{code}
import com.massivedatascience.clusterer.KMeans
val clusters = KMeans.train(parsedData, numClusters, numIterations)
{code}

It took 41 minutes with the same dataset and settings, compared with 1 hour using MLlib. 
In both cases, there was enough memory to cache everything.

> K-Means clusterer should perform K-Means initialization in parallel
> ---
>
> Key: SPARK-3220
> URL: https://issues.apache.org/jira/browse/SPARK-3220
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Derrick Burns
>  Labels: clustering
>
> The LocalKMeans method should be replaced with a parallel implementation.  As 
> it stands now, it becomes a bottleneck for large data sets. 
> I have implemented this functionality in my version of the clusterer.  
> However, I see that there are hundreds of outstanding pull requests.  If 
> someone on the team wants to sponsor the pull request, I will create one.  
> Otherwise, I will just maintain my own private fork of the clusterer.
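
For context on the bottleneck described above: in Spark 1.x MLlib, k-means|| initialization collects weighted candidate centers to the driver and reduces them to k centers with `LocalKMeans.kMeansPlusPlus`, which runs k-means++ seeding followed by Lloyd iterations locally. A minimal single-threaded Python sketch of the weighted k-means++ seeding step, assuming the candidates fit in memory; `kmeans_pp_seed` is a hypothetical name rather than MLlib's API, and the Lloyd refinement is omitted.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points: List[Point], weights: List[float], k: int, seed: int = 0) -> List[Point]:
    """Weighted k-means++ seeding: pick up to k centers from weighted candidate points."""
    rng = random.Random(seed)
    # First center: sampled with probability proportional to each point's weight.
    centers = [rng.choices(points, weights=weights, k=1)[0]]
    # Squared distance from every point to its nearest center chosen so far.
    closest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Next center is sampled proportionally to weight * D(x)^2.
        scores = [w * d for w, d in zip(weights, closest)]
        if sum(scores) == 0:
            break  # every remaining point coincides with an existing center
        nxt = rng.choices(points, weights=scores, k=1)[0]
        centers.append(nxt)
        # One full pass over all n points per new center: O(n * k * d) overall.
        closest = [min(d, sq_dist(p, nxt)) for p, d in zip(points, closest)]
    return centers


pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
print(kmeans_pp_seed(pts, [1.0, 1.0, 1.0, 1.0], k=2, seed=42))
```

Because every new center requires a full pass over all candidates on one machine, a large candidate set can make this step dominate, which is the kind of serial work a parallel initialization would spread across the cluster.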






[jira] [Comment Edited] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel

2016-02-10 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623
 ] 

Tsai Li Ming edited comment on SPARK-3220 at 2/10/16 11:01 AM:
---

I built Derrick's k-means implementation against Spark 1.6.0 and ran:

{code}
import com.massivedatascience.clusterer.KMeans
val clusters = KMeans.train(parsedData, numClusters, numIterations)
{code}

It took 41 minutes with the same dataset and settings, compared with 1 hour using MLlib, 
though it slowed down during the _reduceByKeyLocally_ phase. In both cases, there 
was enough memory to cache everything.
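
`reduceByKeyLocally` merges the values for each key within every partition (like a combiner) and then returns the merged result to the driver as a local map, so the final merge of all partial maps happens on a single machine. A minimal pure-Python sketch of those semantics, assuming partitions are modeled as plain lists of `(key, value)` pairs; `reduce_by_key_locally` is a hypothetical stand-in, not Spark's implementation, and it does not show which step actually slowed down in this run.

```python
from functools import reduce
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def _combine_partition(pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]) -> Dict[K, V]:
    """Combiner step: merge values per key inside one partition (done on executors in Spark)."""
    merged: Dict[K, V] = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


def reduce_by_key_locally(partitions: List[List[Tuple[K, V]]], func: Callable[[V, V], V]) -> Dict[K, V]:
    """Emulate reduceByKeyLocally: per-partition combine, then merge every partial map in one place."""

    def merge_maps(left: Dict[K, V], right: Dict[K, V]) -> Dict[K, V]:
        for key, value in right.items():
            left[key] = func(left[key], value) if key in left else value
        return left

    partial_maps = [_combine_partition(p, func) for p in partitions]
    # In Spark this merge runs on the driver, touching up to (#partitions x #keys) entries.
    return reduce(merge_maps, partial_maps, {})


parts = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]
print(reduce_by_key_locally(parts, lambda x, y: x + y))
# {'a': 4, 'b': 6, 'c': 5}
```

With many partitions and many distinct keys, the driver-side merge has to process up to one entry per key per partition.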


was (Author: ltsai):
I built Derrick's k-means implementation against Spark 1.6.0 and ran:

{code}
import com.massivedatascience.clusterer.KMeans
val clusters = KMeans.train(parsedData, numClusters, numIterations)
{code}

It took 41 minutes with the same dataset and settings, compared with 1 hour using MLlib. 
In both cases, there was enough memory to cache everything.




[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel

2016-02-09 Thread Tsai Li Ming (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140434#comment-15140434
 ] 

Tsai Li Ming commented on SPARK-3220:
-

[~derrickburns], is your private fork at 
https://github.com/derrickburns/generalized-kmeans-clustering ?

I am running into the same problem, described here:
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Kmeans-using-1-core-only-Was-Slowness-in-Kmeans-calculating-fastSquaredDistance-td16304.html


