[jira] [Comment Edited] (SPARK-15039) Kinesis receiver does not work in YARN
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675 ]

Tsai Li Ming edited comment on SPARK-15039 at 5/3/16 1:16 PM:
--

[~zsxwing] Nothing suspicious in the logs. The streaming tab shows 1 receiver but 0 events/sec.

[~jerryshao] I have not tested standalone mode, but `--master local[*]` works. I will test in standalone mode.

was (Author: ltsai):
[~zsxwing] Nothing suspicious in the logs. The streaming tab shows 1 receiver but 0 events/sec.

[~jerryshao] I have not tested standalone mode, but `--master local[*]` works.

> Kinesis receiver does not work in YARN
> --
>
> Key: SPARK-15039
> URL: https://issues.apache.org/jira/browse/SPARK-15039
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.0
> Environment: YARN
> HDP 2.4.0
> Reporter: Tsai Li Ming
>
> Hi,
> Using the pyspark kinesis example, no messages are received from Kinesis
> when submitting to a YARN cluster, though it works fine in local mode.
> {code}
> spark-submit \
> --executor-cores 4 \
> --num-executors 4 \
> --packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
> {code}
> I had to downgrade the package to 1.5.1; 1.6.1 does not work either.
> Not sure whether this is related to SPARK-12453.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15039) Kinesis receiver does not work in YARN
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268675#comment-15268675 ]

Tsai Li Ming commented on SPARK-15039:
--

[~zsxwing] Nothing suspicious in the logs. The streaming tab shows 1 receiver but 0 events/sec.

[~jerryshao] I have not tested standalone mode, but `--master local[*]` works.
[jira] [Updated] (SPARK-15039) Kinesis receiver does not work in YARN
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsai Li Ming updated SPARK-15039:
-
Description:
Hi,
Using the pyspark kinesis example, no messages are received from Kinesis when submitting to a YARN cluster, though it works fine in local mode.
{code}
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
{code}
I had to downgrade the package to 1.5.1; 1.6.1 does not work either.
Not sure whether this is related to SPARK-12453.

was:
Hi,
Using the pyspark kinesis example, no messages are received from Kinesis when submitting to a YARN cluster, though it works fine in local mode.
{code}
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
{code}
I had to downgrade the package to 1.5.1; 1.6.1 does not work either.
[jira] [Updated] (SPARK-15039) Kinesis receiver does not work in YARN
[ https://issues.apache.org/jira/browse/SPARK-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsai Li Ming updated SPARK-15039:
-
Description:
Hi,
Using the pyspark kinesis example, no messages are received from Kinesis when submitting to a YARN cluster, though it works fine in local mode.
{code}
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
{code}
I had to downgrade the package to 1.5.1; 1.6.1 does not work either.

was:
Hi,
Using the pyspark kinesis example, no messages are received from Kinesis when submitting to a YARN cluster, though it works in local mode.
```
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
```
I had to downgrade the package to 1.5.1 before it could work.
[jira] [Created] (SPARK-15039) Kinesis receiver does not work in YARN
Tsai Li Ming created SPARK-15039:

Summary: Kinesis receiver does not work in YARN
Key: SPARK-15039
URL: https://issues.apache.org/jira/browse/SPARK-15039
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.6.0
Environment: YARN
HDP 2.4.0
Reporter: Tsai Li Ming

Hi,
Using the pyspark kinesis example, no messages are received from Kinesis when submitting to a YARN cluster, though it works in local mode.
```
spark-submit \
--executor-cores 4 \
--num-executors 4 \
--packages com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1
```
I had to downgrade the package to 1.5.1 before it could work.
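The comparison discussed in this thread (the same job submitted with `--master local[*]` versus to YARN) can be laid out side by side with a small sketch like the one below. It only prints the two command lines and does not run Spark. The script name `your_kinesis_app.py` is a hypothetical placeholder, since the report does not show the application script or its arguments, and `--master yarn` is an assumption about how the YARN run was launched.

```shell
# Sketch only: prints the two spark-submit invocations being compared.
# APP_SCRIPT is a hypothetical placeholder; the report omits the script and its arguments.
APP_SCRIPT="your_kinesis_app.py"

# Package coordinates exactly as given in the report.
PACKAGES="com.databricks:spark-redshift_2.10:0.6.0,com.databricks:spark-csv_2.10:1.4.0,org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.1"

build_cmd() {
  # $1 is the value passed to --master, e.g. 'yarn' or 'local[*]'.
  printf 'spark-submit --master %s --executor-cores 4 --num-executors 4 --packages %s %s\n' \
    "$1" "$PACKAGES" "$APP_SCRIPT"
}

build_cmd 'local[*]'   # reported in the thread as working
build_cmd 'yarn'       # assumed form of the YARN run that showed 0 events/sec
```

Keeping everything except `--master` identical makes it easier to attribute a difference in behaviour to the cluster manager rather than to the package list. Note that `--num-executors` is a YARN-only option, so it should have no effect on the local run.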
[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel
[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623 ]

Tsai Li Ming commented on SPARK-3220:
-

I built Derrick's kmeans against Spark 1.6.0 and ran
{code}
import com.massivedatascience.clusterer.KMeans
val clusters = KMeans.train(parsedData, numClusters, numIterations)
{code}
It took 41 minutes with the same dataset and settings, compared to 1 hour using MLlib. In both cases, there was enough memory to cache everything.

> K-Means clusterer should perform K-Means initialization in parallel
> ---
>
> Key: SPARK-3220
> URL: https://issues.apache.org/jira/browse/SPARK-3220
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Derrick Burns
> Labels: clustering
>
> The LocalKMeans method should be replaced with a parallel implementation. As
> it stands now, it becomes a bottleneck for large data sets.
> I have implemented this functionality in my version of the clusterer.
> However, I see that there are hundreds of outstanding pull requests. If
> someone on the team wants to sponsor the pull request, I will create one.
> Otherwise, I will just maintain my own private fork of the clusterer.
[jira] [Comment Edited] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel
[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140623#comment-15140623 ]

Tsai Li Ming edited comment on SPARK-3220 at 2/10/16 11:01 AM:
---

I built Derrick's kmeans against Spark 1.6.0 and ran
{code}
import com.massivedatascience.clusterer.KMeans
val clusters = KMeans.train(parsedData, numClusters, numIterations)
{code}
It took 41 minutes with the same dataset and settings, compared to 1 hour using MLlib, though it slowed down during the _reduceByKeyLocally_ phase. In both cases, there was enough memory to cache everything.

was (Author: ltsai):
I built Derrick's kmeans against Spark 1.6.0 and ran
{code}
import com.massivedatascience.clusterer.KMeans
val clusters = KMeans.train(parsedData, numClusters, numIterations)
{code}
It took 41 minutes with the same dataset and settings, compared to 1 hour using MLlib. In both cases, there was enough memory to cache everything.
[jira] [Commented] (SPARK-3220) K-Means clusterer should perform K-Means initialization in parallel
[ https://issues.apache.org/jira/browse/SPARK-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140434#comment-15140434 ]

Tsai Li Ming commented on SPARK-3220:
-

[~derrickburns], is your private fork at https://github.com/derrickburns/generalized-kmeans-clustering ?

I am having the same problem, described here:
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Kmeans-using-1-core-only-Was-Slowness-in-Kmeans-calculating-fastSquaredDistance-td16304.html