[jira] [Created] (SPARK-7940) Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH
Reynold Xin created SPARK-7940: -- Summary: Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH Key: SPARK-7940 URL: https://issues.apache.org/jira/browse/SPARK-7940 Project: Spark Issue Type: Improvement Components: Build Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
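A rule like this usually boils down to flagging a keyword that is not surrounded by whitespace (e.g. `match{`). A rough sketch of such a check in Python, purely for illustration — the real Scalastyle rule operates on tokens, and the regex and function names here are made up:

```python
import re

KEYWORDS = ["do", "try", "catch", "finally", "match"]
# Flag a keyword immediately followed by '{' (no space), or immediately
# preceded by ')' (no space). Both patterns are illustrative.
PATTERN = re.compile(
    r"\b(?:%s)\{|\)(?:%s)\b" % ("|".join(KEYWORDS), "|".join(KEYWORDS))
)

def violations(line):
    """Return the offending fragments found on one source line."""
    return [m.group(0) for m in PATTERN.finditer(line)]
```
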
[jira] [Commented] (SPARK-7938) Use errorprone in Spark
[ https://issues.apache.org/jira/browse/SPARK-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564300#comment-14564300 ] Josh Rosen commented on SPARK-7938: --- If possible, we should also integrate this into our SBT build so that these tests are run in pull request builders. If it's not possible to do that yet (e.g. we'd need to wait for someone to write an SBT plugin), then we can just do Maven for now and leave SBT to future work. > Use errorprone in Spark > --- > > Key: SPARK-7938 > URL: https://issues.apache.org/jira/browse/SPARK-7938 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin > Labels: starter > > We have quite a bit of low-level code written in Java (e.g. the unsafe module). > One nice thing about Java is that we can use better tools for finding common > errors, e.g. Google's Error Prone. > This is a ticket to integrate Error Prone into our Maven build.
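For reference, the common way to wire Error Prone into a Maven build at the time was to swap the compiler used by maven-compiler-plugin. A sketch of that configuration — version numbers are illustrative, not what Spark ultimately used:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <!-- Route compilation through javac with Error Prone checks enabled -->
        <compilerId>javac-with-errorprone</compilerId>
        <forceJavacCompilerUse>true</forceJavacCompilerUse>
      </configuration>
      <dependencies>
        <dependency>
          <groupId>org.codehaus.plexus</groupId>
          <artifactId>plexus-compiler-javac-errorprone</artifactId>
          <version>2.5</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
```

With this in place, Error Prone findings surface as regular compile errors/warnings, so a `mvn compile` in the pull request builder would catch them.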
[jira] [Commented] (SPARK-7541) Check model save/load for MLlib 1.4
[ https://issues.apache.org/jira/browse/SPARK-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564296#comment-14564296 ] yuhao yang commented on SPARK-7541: --- Oh, "checked" means I found no Python support for save/load for that model. > Check model save/load for MLlib 1.4 > --- > > Key: SPARK-7541 > URL: https://issues.apache.org/jira/browse/SPARK-7541 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: yuhao yang > > For each model which supports save/load methods, we need to verify: > * These methods are tested in unit tests in Scala and Python (if save/load is > supported in Python). > * If a model's name, data members, or constructors have changed _at all_, > then we likely need to support a new save/load format version. Different > versions must be tested in unit tests to ensure backwards compatibility > (i.e., verify we can load old model formats). > * Examples in the programming guide should include save/load when available. > It's important to try running each example in the guide whenever it is > modified (since there are no automated tests).
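The format-version check the ticket asks to verify can be illustrated with a minimal, self-contained sketch. This is plain Python with a made-up `ToyModel` format, not the actual MLlib save/load code; it only shows why a version bump plus a per-version loader keeps old saved models loadable:

```python
import json
import os
import tempfile

FORMAT_VERSION = "2.0"  # bump whenever the model's saved fields change

def save_model(path, weights, intercept):
    # Write metadata (class name + format version) next to the data,
    # mirroring the versioning pattern the ticket describes.
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "metadata.json"), "w") as f:
        json.dump({"class": "ToyModel", "version": FORMAT_VERSION}, f)
    with open(os.path.join(path, "data.json"), "w") as f:
        json.dump({"weights": weights, "intercept": intercept}, f)

def load_model(path):
    with open(os.path.join(path, "metadata.json")) as f:
        meta = json.load(f)
    with open(os.path.join(path, "data.json")) as f:
        data = json.load(f)
    if meta["version"] == "2.0":
        return data["weights"], data["intercept"]
    elif meta["version"] == "1.0":
        # Hypothetical old format without an intercept: default it,
        # so old saved models still load (backwards compatibility).
        return data["weights"], 0.0
    raise ValueError("unsupported format version: " + meta["version"])

path = os.path.join(tempfile.mkdtemp(), "toy_model")
save_model(path, [0.5, -1.2], 0.3)
weights, intercept = load_model(path)
```

A unit test for backwards compatibility would then save a fixture in each historical format and assert that `load_model` round-trips all of them.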
[jira] [Comment Edited] (SPARK-7541) Check model save/load for MLlib 1.4
[ https://issues.apache.org/jira/browse/SPARK-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564263#comment-14564263 ] yuhao yang edited comment on SPARK-7541 at 5/29/15 6:40 AM:
||model||Scala UT||python UT||changes||backwards compatibility||
|LogisticRegressionModel|LogisticRegressionSuite|LogisticRegressionModel doctests|no public change|y|
|NaiveBayesModel|NaiveBayesSuite|NaiveBayesModel doctests|save/load 2.0|y|
|SVMModel|SVMSuite|SVMModel doctests|no public change|y|
|GaussianMixtureModel|GaussianMixtureSuite|checked|New Saveable in 1.4|New Saveable in 1.4|
|KMeansModel|KMeansSuite|KMeansModel doctests|New Saveable in 1.4|New Saveable in 1.4|
|PowerIterationClusteringModel|PowerIterationClusteringSuite|checked|New Saveable in 1.4|New Saveable in 1.4|
|Word2VecModel|Word2VecSuite|checked|New Saveable in 1.4|New Saveable in 1.4|
|MatrixFactorizationModel|MatrixFactorizationModelSuite|MatrixFactorizationModel doctests|no public change|y|
|IsotonicRegressionModel|IsotonicRegressionSuite|IsotonicRegressionModel|New Saveable in 1.4|New Saveable in 1.4|
|LassoModel|LassoSuite|LassoModel doctests|no public change|y|
|LinearRegressionModel|LinearRegressionSuite|LinearRegressionModel doctests|no public change|y|
|RidgeRegressionModel|RidgeRegressionSuite|RidgeRegressionModel doctests|no public change|y|
|DecisionTreeModel|DecisionTreeSuite|dt_model.save|no public change|y|
|RandomForestModel|RandomForestSuite|rf_model.save|no public change|y|
|GradientBoostedTreesModel|GradientBoostedTreesSuite|gbt_model.save|no public change|y|
The contents above have been checked and no obvious issues were detected. Joseph, do you think we should add save/load wherever available in the example documents?
was (Author: yuhaoyan):
||model||Scala UT||python UT||changes||backwards Compatibility||
|LogisticRegressionModel|LogisticRegressionSuite|LogisticRegressionModel doctests|no public change|y|
|NaiveBayesModel|NaiveBayesSuite|NaiveBayesModel doctests|save/load 2.0|y|
|SVMModel|SVMSuite|SVMModel doctests|no public change|y|
|GaussianMixtureModel|GaussianMixtureSuite|checked|New Savable in 1.4|New Savable in 1.4|
|KMeansModel|KMeansSuite|KMeansModel doctests|New Savable in 1.4|New Savable in 1.4|
|PowerIterationClusteringModel|PowerIterationClusteringSuite|checked|New Savable in 1.4|New Savable in 1.4|
|Word2VecModel|Word2VecSuite|checked|New Savable in 1.4|New Savable in 1.4|
|MatrixFactorizationModel|MatrixFactorizationModelSuite|MatrixFactorizationModel doctests|no public change|y|
|IsotonicRegressionModel|IsotonicRegressionSuite|IsotonicRegressionModel|New Savable in 1.4|New Savable in 1.4|
|LassoModel|LassoSuite|LassoModel doctests|no public change|y|
|LinearRegressionModel|LinearRegressionSuite|LinearRegressionModel doctests|no public change|y|
|RidgeRegressionModel|RidgeRegressionSuite|RidgeRegressionModel doctests|no public change|y|
|DecisionTreeModel|DecisionTreeSuite|dt_model.save|no public change|y|
|RandomForestModel|RandomForestSuite|rf_model.save|no public change|y|
|GradientBoostedTreesModel|GradientBoostedTreesSuite|gbt_model.sav |
[jira] [Commented] (SPARK-6806) SparkR examples in programming guide
[ https://issues.apache.org/jira/browse/SPARK-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564293#comment-14564293 ] Apache Spark commented on SPARK-6806: - User 'shivaram' has created a pull request for this issue: https://github.com/apache/spark/pull/6490 > SparkR examples in programming guide > > > Key: SPARK-6806 > URL: https://issues.apache.org/jira/browse/SPARK-6806 > Project: Spark > Issue Type: New Feature > Components: Documentation, SparkR >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Critical > Fix For: 1.4.0 > > > Add R examples for Spark Core and DataFrame programming guide
[jira] [Commented] (SPARK-7938) Use errorprone in Spark
[ https://issues.apache.org/jira/browse/SPARK-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564292#comment-14564292 ] Yijie Shen commented on SPARK-7938: --- I'd love to take this :) > Use errorprone in Spark > --- > > Key: SPARK-7938 > URL: https://issues.apache.org/jira/browse/SPARK-7938 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin > Labels: starter > > We have quite a bit of low-level code written in Java (e.g. the unsafe module). > One nice thing about Java is that we can use better tools for finding common > errors, e.g. Google's Error Prone. > This is a ticket to integrate Error Prone into our Maven build.
[jira] [Commented] (SPARK-7541) Check model save/load for MLlib 1.4
[ https://issues.apache.org/jira/browse/SPARK-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564283#comment-14564283 ] Joseph K. Bradley commented on SPARK-7541: -- Awesome, thank you for the careful check! Q: In the "python UT" column, I understand what the "doctests" are, but what do the other entries mean? E.g., what does "checked" mean? Good point about having save/load in all relevant example docs. Would you mind putting together a PR for adding that to example code in the Markdown programming guide docs? > Check model save/load for MLlib 1.4 > --- > > Key: SPARK-7541 > URL: https://issues.apache.org/jira/browse/SPARK-7541 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: yuhao yang > > For each model which supports save/load methods, we need to verify: > * These methods are tested in unit tests in Scala and Python (if save/load is > supported in Python). > * If a model's name, data members, or constructors have changed _at all_, > then we likely need to support a new save/load format version. Different > versions must be tested in unit tests to ensure backwards compatibility > (i.e., verify we can load old model formats). > * Examples in the programming guide should include save/load when available. > It's important to try running each example in the guide whenever it is > modified (since there are no automated tests).
[jira] [Updated] (SPARK-7936) Add configuration for initial size and limit of hash for aggregation
[ https://issues.apache.org/jira/browse/SPARK-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated SPARK-7936: - Summary: Add configuration for initial size and limit of hash for aggregation (was: Add configuration for initial size of hash for aggregation and limit) > Add configuration for initial size and limit of hash for aggregation > > > Key: SPARK-7936 > URL: https://issues.apache.org/jira/browse/SPARK-7936 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Navis >Priority: Minor > > Partial aggregation takes a lot of memory and mostly cannot be completed if > it's not sliced into very small partitions (large in count). This patch is > for limiting entry size for partial aggregation. Initial size for hash is > just a bonus.
[jira] [Updated] (SPARK-7939) Make URL partition recognition return String by default for all partition column types and values
[ https://issues.apache.org/jira/browse/SPARK-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianshi Huang updated SPARK-7939: - Summary: Make URL partition recognition return String by default for all partition column types and values (was: Make URL partition recognition return String by default for all partition column values) > Make URL partition recognition return String by default for all partition > column types and values > - > > Key: SPARK-7939 > URL: https://issues.apache.org/jira/browse/SPARK-7939 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Jianshi Huang > > Imagine the following HDFS paths: > /data/split=00 > /data/split=01 > ... > /data/split=FF > If I have less than or equal to 10 partitions (00, 01, ... 09), currently > partition recognition will treat column 'split' as an integer column. > If I have more than 10 partitions, column 'split' will be recognized as > String... > This is very confusing. *So I'm suggesting to treat partition columns as > String by default*, and allow the user to specify types if needed. > Another example is date: > /data/date=2015-04-01 => 'date' is String > /data/date=20150401 => 'date' is Int > Jianshi
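The surprising behavior described in the ticket can be reproduced with a small sketch: infer a type per partition value, then merge across values. This is an illustration of the inference pattern, not Spark's actual partition-discovery code:

```python
def infer_value_type(v):
    # Per-value inference: integer if the string parses as one, else string.
    try:
        int(v)
        return "int"
    except ValueError:
        return "string"

def infer_column_type(values):
    # The column is int only if *every* partition value parses as int;
    # a single non-numeric value (e.g. hex '0A') flips it to string.
    types = {infer_value_type(v) for v in values}
    return "int" if types == {"int"} else "string"

decimal_splits = infer_column_type(["%02d" % i for i in range(10)])  # '00'..'09'
hex_splits = infer_column_type(["%02X" % i for i in range(16)])      # '00'..'0F'
dash_date = infer_column_type(["2015-04-01"])
plain_date = infer_column_type(["20150401"])
```

With ten decimal-named partitions every value parses as an integer, so 'split' comes back as int; add hex-named partitions and the same column becomes string, which is exactly the inconsistency motivating a String default.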
[jira] [Created] (SPARK-7939) Make URL partition recognition return String by default for all partition column values
Jianshi Huang created SPARK-7939: Summary: Make URL partition recognition return String by default for all partition column values Key: SPARK-7939 URL: https://issues.apache.org/jira/browse/SPARK-7939 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Jianshi Huang Imagine the following HDFS paths: /data/split=00 /data/split=01 ... /data/split=FF If I have less than or equal to 10 partitions (00, 01, ... 09), currently partition recognition will treat column 'split' as an integer column. If I have more than 10 partitions, column 'split' will be recognized as String... This is very confusing. *So I'm suggesting to treat partition columns as String by default*, and allow the user to specify types if needed. Another example is date: /data/date=2015-04-01 => 'date' is String /data/date=20150401 => 'date' is Int Jianshi
[jira] [Updated] (SPARK-7938) Use errorprone in Spark
[ https://issues.apache.org/jira/browse/SPARK-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-7938: --- Labels: starter (was: ) > Use errorprone in Spark > --- > > Key: SPARK-7938 > URL: https://issues.apache.org/jira/browse/SPARK-7938 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin > Labels: starter > > We have quite a bit of low-level code written in Java (e.g. the unsafe module). > One nice thing about Java is that we can use better tools for finding common > errors, e.g. Google's Error Prone. > This is a ticket to integrate Error Prone into our Maven build.
[jira] [Commented] (SPARK-7936) Add configuration for initial size of hash for aggregation and limit
[ https://issues.apache.org/jira/browse/SPARK-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564271#comment-14564271 ] Navis commented on SPARK-7936: -- Added two configurations: 1. spark.sql.aggregation.hash.initSize : initial size of the hash; applied to both final and partial aggregation. 2. spark.sql.partial.aggregation.maxEntry : max number of entries in the hash for partial aggregation; should not be used for final aggregation. > Add configuration for initial size of hash for aggregation and limit > > > Key: SPARK-7936 > URL: https://issues.apache.org/jira/browse/SPARK-7936 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Navis >Priority: Minor > > Partial aggregation takes a lot of memory and mostly cannot be completed if > it's not sliced into very small partitions (large in count). This patch is > for limiting entry size for partial aggregation. Initial size for hash is > just a bonus.
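The idea behind capping the partial-aggregation hash can be sketched in plain Python. This illustrates the technique only (not Spark SQL's actual operator): once the map-side hash is full, new keys pass through un-aggregated, and the final aggregation combines everything anyway, so results stay correct while memory is bounded:

```python
from collections import defaultdict

def partial_aggregate(rows, max_entries):
    """Map-side sum with a cap on distinct keys held in the hash.

    Keys already in the hash keep aggregating; once the hash holds
    max_entries keys, rows with new keys are emitted un-aggregated.
    """
    agg = {}
    passthrough = []
    for key, value in rows:
        if key in agg:
            agg[key] += value
        elif len(agg) < max_entries:
            agg[key] = value
        else:
            passthrough.append((key, value))
    return list(agg.items()) + passthrough

def final_aggregate(rows):
    # Reduce-side aggregation: no cap here, all keys are combined.
    agg = defaultdict(int)
    for key, value in rows:
        agg[key] += value
    return dict(agg)

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partial = partial_aggregate(rows, max_entries=2)  # "c" overflows the cap
result = final_aggregate(partial)
```

The trade-off is more shuffled rows in exchange for a bounded map-side hash, which is exactly what a `maxEntry`-style setting would tune.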
[jira] [Assigned] (SPARK-7936) Add configuration for initial size of hash for aggregation and limit
[ https://issues.apache.org/jira/browse/SPARK-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7936: --- Assignee: Apache Spark > Add configuration for initial size of hash for aggregation and limit > > > Key: SPARK-7936 > URL: https://issues.apache.org/jira/browse/SPARK-7936 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Navis >Assignee: Apache Spark >Priority: Minor > > Partial aggregation takes a lot of memory and mostly cannot be completed if > it's not sliced into very small partitions (large in count). This patch is > for limiting entry size for partial aggregation. Initial size for hash is > just a bonus.
[jira] [Commented] (SPARK-7936) Add configuration for initial size of hash for aggregation and limit
[ https://issues.apache.org/jira/browse/SPARK-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564270#comment-14564270 ] Apache Spark commented on SPARK-7936: - User 'navis' has created a pull request for this issue: https://github.com/apache/spark/pull/6488 > Add configuration for initial size of hash for aggregation and limit > > > Key: SPARK-7936 > URL: https://issues.apache.org/jira/browse/SPARK-7936 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Navis >Priority: Minor > > Partial aggregation takes a lot of memory and mostly cannot be completed if > it's not sliced into very small partitions (large in count). This patch is > for limiting entry size for partial aggregation. Initial size for hash is > just a bonus.
[jira] [Assigned] (SPARK-7936) Add configuration for initial size of hash for aggregation and limit
[ https://issues.apache.org/jira/browse/SPARK-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7936: --- Assignee: (was: Apache Spark) > Add configuration for initial size of hash for aggregation and limit > > > Key: SPARK-7936 > URL: https://issues.apache.org/jira/browse/SPARK-7936 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Navis >Priority: Minor > > Partial aggregation takes a lot of memory and mostly cannot be completed if > it's not sliced into very small partitions (large in count). This patch is > for limiting entry size for partial aggregation. Initial size for hash is > just a bonus.
[jira] [Created] (SPARK-7938) Use errorprone in Spark
Reynold Xin created SPARK-7938: -- Summary: Use errorprone in Spark Key: SPARK-7938 URL: https://issues.apache.org/jira/browse/SPARK-7938 Project: Spark Issue Type: Improvement Components: Build Reporter: Reynold Xin We have quite a bit of low-level code written in Java (e.g. the unsafe module). One nice thing about Java is that we can use better tools for finding common errors, e.g. Google's Error Prone. This is a ticket to integrate Error Prone into our Maven build.
[jira] [Commented] (SPARK-7937) Cannot compare Hive named_struct. (when using argmax, argmin)
[ https://issues.apache.org/jira/browse/SPARK-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564267#comment-14564267 ] Jianshi Huang commented on SPARK-7937: -- Blog post describing Hive's argmax/argmin technique: https://www.joefkelley.com/?p=727 Hive JIRA: https://issues.apache.org/jira/browse/HIVE-1128 Jianshi > Cannot compare Hive named_struct. (when using argmax, argmin) > - > > Key: SPARK-7937 > URL: https://issues.apache.org/jira/browse/SPARK-7937 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Jianshi Huang > > Imagine the following SQL: > Intention: get last used bank account country. > > {code:sql} > select bank_account_id, > max(named_struct( > 'src_row_update_ts', unix_timestamp(src_row_update_ts,'/M/D > HH:mm:ss'), > 'bank_country', bank_country)).bank_country > from bank_account_monthly > where year_month='201502' > group by bank_account_id > {code} > => > {noformat} > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 94 in stage 96.0 failed 4 times, most recent failure: Lost task 94.3 in > stage 96.0 (TID 22281, ): java.lang.RuntimeException: Type > StructType(StructField(src_row_update_ts,LongType,true), > StructField(bank_country,StringType,true)) does not support ordered operations > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.expressions.LessThan.ordering$lzycompute(predicates.scala:222) > at > org.apache.spark.sql.catalyst.expressions.LessThan.ordering(predicates.scala:215) > at > org.apache.spark.sql.catalyst.expressions.LessThan.eval(predicates.scala:235) > at > org.apache.spark.sql.catalyst.expressions.MaxFunction.update(aggregates.scala:147) > at > org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:165) > at > org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:149) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:70) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:724) > {noformat}
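The argmax-via-named_struct trick in the failing query depends on structs comparing lexicographically: `max` over (timestamp, country) pairs picks the row with the latest timestamp and carries the country along. Spark 1.4 fails because StructType has no ordering. A plain-Python sketch of the same idea, with made-up illustrative data:

```python
# Each record: (bank_account_id, src_row_update_ts, bank_country)
records = [
    ("acct1", 1422800000, "US"),
    ("acct1", 1425200000, "DE"),  # latest timestamp for acct1
    ("acct2", 1423000000, "FR"),
]

def last_country(records):
    # Group by account and keep the max (timestamp, country) pair.
    # Lexicographic tuple comparison is exactly the struct ordering
    # that max(named_struct(ts, country)) relies on in HiveQL.
    latest = {}
    for acct, ts, country in records:
        pair = (ts, country)
        if acct not in latest or pair > latest[acct]:
            latest[acct] = pair
    # Project out the second field, like ...).bank_country in the query.
    return {acct: pair[1] for acct, pair in latest.items()}
```

Since the timestamp is the first struct field, the comparison is driven by it; the country only breaks exact-timestamp ties.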
[jira] [Commented] (SPARK-7708) Incorrect task serialization with Kryo closure serializer
[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564266#comment-14564266 ] Josh Rosen commented on SPARK-7708: --- Also, it looks like Chill is still using Kryo 2.2.1 instead of a newer version because of some Storm incompatibilities or dependency problems: https://github.com/twitter/chill/commit/3869b0122660c908e189ff08b615bd7221956224#commitcomment-8362755. Therefore, it might be an uphill battle to do a version bump, since it may require involvement from the Kryo and/or Chill developers. If the only blocker for Chill is Storm compatibility issues that don't affect us, we might consider publishing our own fork of Chill under the org.apache.spark namespace, similar to how we used to publish custom versions of Pyrolite. If possible, though, I'd like to avoid that option and only use it as a last resort. I can't really spend much more time investigating this myself right now, but would really appreciate it if someone would dig into these issues in more detail and post a summary here. > Incorrect task serialization with Kryo closure serializer > - > > Key: SPARK-7708 > URL: https://issues.apache.org/jira/browse/SPARK-7708 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.2 >Reporter: Akshat Aranya > > I've been investigating the use of Kryo for closure serialization with Spark > 1.2, and it seems like I've hit upon a bug: > When a task is serialized before scheduling, the following log message is > generated: > [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, > , PROCESS_LOCAL, 302 bytes) > This message comes from TaskSetManager, which serializes the task using the > closure serializer. Before the message is sent out, the TaskDescription > (which includes the original task as a byte array) is serialized again into > a byte array with the closure serializer. 
I added a log message for this in > CoarseGrainedSchedulerBackend, which produces the following output: > [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132 > The serialized size of the TaskDescription (132 bytes) turns out to be _smaller_ > than the serialized task that it contains (302 bytes). This implies that > TaskDescription.buffer is not getting serialized correctly. > On the executor side, deserialization produces a null value for > TaskDescription.buffer.
[jira] [Updated] (SPARK-7937) Cannot compare Hive named_struct. (when using argmax, argmin)
[ https://issues.apache.org/jira/browse/SPARK-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianshi Huang updated SPARK-7937: - Description: Imagine the following SQL: Intention: get last used bank account country. {code:sql} select bank_account_id, max(named_struct( 'src_row_update_ts', unix_timestamp(src_row_update_ts,'/M/D HH:mm:ss'), 'bank_country', bank_country)).bank_country from bank_account_monthly where year_month='201502' group by bank_account_id {code} => {noformat} Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 94 in stage 96.0 failed 4 times, most recent failure: Lost task 94.3 in stage 96.0 (TID 22281, ): java.lang.RuntimeException: Type StructType(StructField(src_row_update_ts,LongType,true), StructField(bank_country,StringType,true)) does not support ordered operations at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.expressions.LessThan.ordering$lzycompute(predicates.scala:222) at org.apache.spark.sql.catalyst.expressions.LessThan.ordering(predicates.scala:215) at org.apache.spark.sql.catalyst.expressions.LessThan.eval(predicates.scala:235) at org.apache.spark.sql.catalyst.expressions.MaxFunction.update(aggregates.scala:147) at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:165) at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:149) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) {noformat} was: Imagine the following SQL: Intention: get last used bank account country. ``` sql select bank_account_id, max(named_struct( 'src_row_update_ts', unix_timestamp(src_row_update_ts,'/M/D HH:mm:ss'), 'bank_country', bank_country)).bank_country from bank_account_monthly where year_month='201502' group by bank_account_id ``` => ``` Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 94 in stage 96.0 failed 4 times, most recent failure: Lost task 94.3 in stage 96.0 (TID 22281, ): java.lang.RuntimeException: Type StructType(StructField(src_row_update_ts,LongType,true), StructField(bank_country,StringType,true)) does not support ordered operations at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.expressions.LessThan.ordering$lzycompute(predicates.scala:222) at org.apache.spark.sql.catalyst.expressions.LessThan.ordering(predicates.scala:215) at org.apache.spark.sql.catalyst.expressions.LessThan.eval(predicates.scala:235) at org.apache.spark.sql.catalyst.expressions.MaxFunction.update(aggregates.scala:147) at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:165) at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:149) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at
[jira] [Updated] (SPARK-7937) Cannot compare Hive named_struct. (when using argmax, argmin)
[ https://issues.apache.org/jira/browse/SPARK-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jianshi Huang updated SPARK-7937: - Description: Imagine the following SQL: Intention: get last used bank account country. ``` sql select bank_account_id, max(named_struct( 'src_row_update_ts', unix_timestamp(src_row_update_ts,'/M/D HH:mm:ss'), 'bank_country', bank_country)).bank_country from bank_account_monthly where year_month='201502' group by bank_account_id ``` => ``` Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 94 in stage 96.0 failed 4 times, most recent failure: Lost task 94.3 in stage 96.0 (TID 22281, ): java.lang.RuntimeException: Type StructType(StructField(src_row_update_ts,LongType,true), StructField(bank_country,StringType,true)) does not support ordered operations at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.expressions.LessThan.ordering$lzycompute(predicates.scala:222) at org.apache.spark.sql.catalyst.expressions.LessThan.ordering(predicates.scala:215) at org.apache.spark.sql.catalyst.expressions.LessThan.eval(predicates.scala:235) at org.apache.spark.sql.catalyst.expressions.MaxFunction.update(aggregates.scala:147) at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:165) at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:149) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) ``` was: (the same query and stack trace, previously without code formatting)
[jira] [Created] (SPARK-7937) Cannot compare Hive named_struct. (when using argmax, argmin)
Jianshi Huang created SPARK-7937: Summary: Cannot compare Hive named_struct. (when using argmax, argmin) Key: SPARK-7937 URL: https://issues.apache.org/jira/browse/SPARK-7937 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Jianshi Huang Imagine the following SQL: Intention: get last used bank account country. select bank_account_id, max(named_struct( 'src_row_update_ts', unix_timestamp(src_row_update_ts,'/M/D HH:mm:ss'), 'bank_country', bank_country)).bank_country from bank_account_monthly where year_month='201502' group by bank_account_id => Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 94 in stage 96.0 failed 4 times, most recent failure: Lost task 94.3 in stage 96.0 (TID 22281, ): java.lang.RuntimeException: Type StructType(StructField(src_row_update_ts,LongType,true), StructField(bank_country,StringType,true)) does not support ordered operations at scala.sys.package$.error(package.scala:27) at org.apache.spark.sql.catalyst.expressions.LessThan.ordering$lzycompute(predicates.scala:222) at org.apache.spark.sql.catalyst.expressions.LessThan.ordering(predicates.scala:215) at org.apache.spark.sql.catalyst.expressions.LessThan.eval(predicates.scala:235) at org.apache.spark.sql.catalyst.expressions.MaxFunction.update(aggregates.scala:147) at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:165) at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:149) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
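The failing query is an argmax pattern: for each account, pick the bank_country of the row with the largest timestamp, encoded as `max` over a `named_struct`. Spark 1.4 cannot order struct values, hence the error. A minimal Python sketch of the intended per-group logic (the data below is hypothetical):

```python
# Argmax sketch: for each account, keep the bank_country whose row has the
# largest src_row_update_ts -- the logic the named_struct/max trick encodes.
rows = [
    # (bank_account_id, src_row_update_ts, bank_country) -- hypothetical data
    (1, 100, "US"),
    (1, 250, "DE"),
    (2, 300, "FR"),
    (2, 120, "US"),
]

def last_country_per_account(rows):
    best = {}  # bank_account_id -> (timestamp, country) seen so far
    for account, ts, country in rows:
        if account not in best or ts > best[account][0]:
            best[account] = (ts, country)
    return {account: country for account, (ts, country) in best.items()}

print(last_country_per_account(rows))  # {1: 'DE', 2: 'FR'}
```

The same result can be obtained in SQL without ordering structs, e.g. by joining the table against a per-account `max(src_row_update_ts)` subquery.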
[jira] [Commented] (SPARK-7541) Check model save/load for MLlib 1.4
[ https://issues.apache.org/jira/browse/SPARK-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564263#comment-14564263 ] yuhao yang commented on SPARK-7541: ---
||model||Scala UT||python UT||changes||backwards compatibility||
|LogisticRegressionModel|LogisticRegressionSuite|LogisticRegressionModel doctests|no public change|y|
|NaiveBayesModel|NaiveBayesSuite|NaiveBayesModel doctests|save/load 2.0|y|
|SVMModel|SVMSuite|SVMModel doctests|no public change|y|
|GaussianMixtureModel|GaussianMixtureSuite|checked|New Savable in 1.4|New Savable in 1.4|
|KMeansModel|KMeansSuite|KMeansModel doctests|New Savable in 1.4|New Savable in 1.4|
|PowerIterationClusteringModel|PowerIterationClusteringSuite|checked|New Savable in 1.4|New Savable in 1.4|
|Word2VecModel|Word2VecSuite|checked|New Savable in 1.4|New Savable in 1.4|
|MatrixFactorizationModel|MatrixFactorizationModelSuite|MatrixFactorizationModel doctests|no public change|y|
|IsotonicRegressionModel|IsotonicRegressionSuite|IsotonicRegressionModel|New Savable in 1.4|New Savable in 1.4|
|LassoModel|LassoSuite|LassoModel doctests|no public change|y|
|LinearRegressionModel|LinearRegressionSuite|LinearRegressionModel doctests|no public change|y|
|RidgeRegressionModel|RidgeRegressionSuite|RidgeRegressionModel doctests|no public change|y|
|DecisionTreeModel|DecisionTreeSuite|dt_model.save|no public change|y|
|RandomForestModel|RandomForestSuite|rf_model.save|no public change|y|
|GradientBoostedTreesModel|GradientBoostedTreesSuite|gbt_model.save|no public change|y|
The contents above have been checked and no obvious issues were detected. Joseph, do you think we should add save/load wherever available in the example documents?
> Check model save/load for MLlib 1.4 > --- > > Key: SPARK-7541 > URL: https://issues.apache.org/jira/browse/SPARK-7541 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib, PySpark >Reporter: Joseph K. Bradley >Assignee: yuhao yang > > For each model which supports save/load methods, we need to verify: > * These methods are tested in unit tests in Scala and Python (if save/load is > supported in Python). > * If a model's name, data members, or constructors have changed _at all_, > then we likely need to support a new save/load format version. Different > versions must be tested in unit tests to ensure backwards compatibility > (i.e., verify we can load old model formats). > * Examples in the programming guide should include save/load when available. > It's important to try running each example in the guide whenever it is > modified (since there are no automated tests). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
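The checklist above (round-trip tests plus loading old format versions) can be sketched with a toy model. The versioned JSON layout below is purely illustrative, not MLlib's actual on-disk format:

```python
import json
import os
import tempfile

FORMAT_VERSION = "2.0"

class ToyModel:
    """Stand-in for an MLlib model with a versioned save/load format."""
    def __init__(self, weights):
        self.weights = weights

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"version": FORMAT_VERSION, "weights": self.weights}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            data = json.load(f)
        # Backwards compatibility: every format version ever written must
        # remain loadable, which is what the checklist asks tests to verify.
        if data["version"] in ("1.0", "2.0"):
            return cls(data["weights"])
        raise ValueError("unknown save format version: %s" % data["version"])

path = os.path.join(tempfile.mkdtemp(), "toy_model.json")
ToyModel([0.5, -1.2]).save(path)
loaded = ToyModel.load(path)
assert loaded.weights == [0.5, -1.2]  # round-trip check, as in the unit tests
```

A real MLlib suite would additionally keep fixture files written by old releases and assert they still load.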
[jira] [Commented] (SPARK-7927) Enforce whitespace for more tokens in style checker
[ https://issues.apache.org/jira/browse/SPARK-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564251#comment-14564251 ] Apache Spark commented on SPARK-7927: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6487 > Enforce whitespace for more tokens in style checker > --- > > Key: SPARK-7927 > URL: https://issues.apache.org/jira/browse/SPARK-7927 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 1.4.0 > > > Enforce whitespace on comma, colon, if, while, etc ... so we don't need to > keep spending time on this in code reviews. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7927) Enforce whitespace for more tokens in style checker
[ https://issues.apache.org/jira/browse/SPARK-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-7927. Resolution: Fixed Fix Version/s: 1.4.0 > Enforce whitespace for more tokens in style checker > --- > > Key: SPARK-7927 > URL: https://issues.apache.org/jira/browse/SPARK-7927 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 1.4.0 > > > Enforce whitespace on comma, colon, if, while, etc ... so we don't need to > keep spending time on this in code reviews. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7929) Remove Bagel examples
[ https://issues.apache.org/jira/browse/SPARK-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-7929. Resolution: Fixed Fix Version/s: 1.4.0 > Remove Bagel examples > - > > Key: SPARK-7929 > URL: https://issues.apache.org/jira/browse/SPARK-7929 > Project: Spark > Issue Type: Task > Components: Examples, GraphX >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 1.4.0 > > > Bagel has been deprecated for a while. We should remove the example code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7922) ALSModel in the pipeline API should return DataFrames for factors
[ https://issues.apache.org/jira/browse/SPARK-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-7922. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6468 [https://github.com/apache/spark/pull/6468] > ALSModel in the pipeline API should return DataFrames for factors > - > > Key: SPARK-7922 > URL: https://issues.apache.org/jira/browse/SPARK-7922 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.3.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > Fix For: 1.4.0 > > > This is to be more consistent with the pipeline API. It also helps maintain > consistent APIs across languages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7890) Document that Spark 2.11 now supports Kafka
[ https://issues.apache.org/jira/browse/SPARK-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7890: --- Assignee: Sean Owen (was: Iulian Dragos) > Document that Spark 2.11 now supports Kafka > --- > > Key: SPARK-7890 > URL: https://issues.apache.org/jira/browse/SPARK-7890 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Patrick Wendell >Assignee: Sean Owen >Priority: Critical > > The building-spark.html page needs to be updated. It's a simple fix, just > remove the caveat about Kafka. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7890) Document that Spark 2.11 now supports Kafka
[ https://issues.apache.org/jira/browse/SPARK-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564231#comment-14564231 ] Patrick Wendell commented on SPARK-7890: No - the JDBC component is not supported. > Document that Spark 2.11 now supports Kafka > --- > > Key: SPARK-7890 > URL: https://issues.apache.org/jira/browse/SPARK-7890 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Patrick Wendell >Assignee: Sean Owen >Priority: Critical > > The building-spark.html page needs to be updated. It's a simple fix, just > remove the caveat about Kafka. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7930) Shutdown hook deletes root local dir before SparkContext is stopped, throwing errors
[ https://issues.apache.org/jira/browse/SPARK-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7930. Resolution: Fixed Fix Version/s: 1.4.0 > Shutdown hook deletes root local dir before SparkContext is stopped, throwing > errors > > > Key: SPARK-7930 > URL: https://issues.apache.org/jira/browse/SPARK-7930 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Critical > Fix For: 1.4.0 > > > The shutdown hook for temp directories had priority 100 while SparkContext's had > 50, so the local root directory was deleted before the SparkContext was shut down. > This leads to scary errors from running jobs at the time of shutdown. This is > especially a problem when running streaming examples, where Ctrl-C is the > only way to shut down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
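The ordering bug above can be illustrated with a toy priority-based hook registry, where higher-priority hooks run first (as in Spark's ShutdownHookManager). Hook names and priorities here mirror the description but are otherwise illustrative:

```python
# Toy shutdown-hook registry: hooks run in decreasing priority order.
# With the temp-dir cleanup hook at priority 100 and the SparkContext stop
# at priority 50, the local dirs vanish while the context is still running.
events = []
hooks = []

def add_hook(priority, fn):
    hooks.append((priority, fn))

def run_hooks():
    for _, fn in sorted(hooks, key=lambda h: -h[0]):
        fn()

add_hook(100, lambda: events.append("delete temp dirs"))
add_hook(50, lambda: events.append("stop SparkContext"))
run_hooks()
print(events)  # ['delete temp dirs', 'stop SparkContext'] -- the bug
```

The fix amounts to choosing priorities so the context-stop hook runs before the cleanup hook.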
[jira] [Created] (SPARK-7936) Add configuration for initial size of hash for aggregation and limit
Navis created SPARK-7936: Summary: Add configuration for initial size of hash for aggregation and limit Key: SPARK-7936 URL: https://issues.apache.org/jira/browse/SPARK-7936 Project: Spark Issue Type: Improvement Components: SQL Reporter: Navis Priority: Minor Partial aggregation takes a lot of memory and mostly cannot be completed unless the data is sliced into very small partitions (i.e., a large number of them). This patch is for limiting entry size for partial aggregation. The initial size for the hash is just a bonus. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7708) Incorrect task serialization with Kryo closure serializer
[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564219#comment-14564219 ] Josh Rosen commented on SPARK-7708: --- I invested the time to dig into this because I was worried that this issue might impact us in 1.4 due to our increased serializer reuse. On closer analysis, though, I think we're safe. In 1.3.x, it appears that there are some cases where the old code _would_ re-use the same SerializerInstance and make multiple `serialize()` calls using the same `Output`. If the bug didn't manifest in those older versions and we didn't introduce any new cases of this pattern in 1.4.0, then I don't think we need to take any additional action for 1.4. It might be good to have someone else confirm this, though; my quick glances through IntelliJ suggest that things are okay. Regarding upgrading Kryo, there may be some considerations due to our use of Chill. I'm not sure whether Chill supports Kryo 3.x. We also need to be careful not to introduce bugs / regressions by upgrading to 2.23. Definitely give 2.23.0 a try, though, and let me know if it fixes the problem. If it does, you can modify your PR to bump to that version and try to copy the code from my gist into a KryoSerializerSuite regression test. > Incorrect task serialization with Kryo closure serializer > - > > Key: SPARK-7708 > URL: https://issues.apache.org/jira/browse/SPARK-7708 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.2 >Reporter: Akshat Aranya > > I've been investigating the use of Kryo for closure serialization with Spark > 1.2, and it seems like I've hit upon a bug: > When a task is serialized before scheduling, the following log message is > generated: > [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, > , PROCESS_LOCAL, 302 bytes) > This message comes from TaskSetManager which serializes the task using the > closure serializer. 
Before the message is sent out, the TaskDescription > (which included the original task as a byte array), is serialized again into > a byte array with the closure serializer. I added a log message for this in > CoarseGrainedSchedulerBackend, which produces the following output: > [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132 > The serialized size of TaskDescription (132 bytes) turns out to be _smaller_ > than serialized task that it contains (302 bytes). This implies that > TaskDescription.buffer is not getting serialized correctly. > On the executor side, the deserialization produces a null value for > TaskDescription.buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
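The failure mode the comment suspects (re-using one `Output` buffer across `serialize()` calls without resetting it) can be sketched in miniature. This is a toy model of the pattern, not Kryo's actual implementation:

```python
# Toy illustration of the suspected bug: a serializer that reuses one output
# buffer across serialize() calls, never resetting its write position, can
# silently truncate later payloads -- producing a "serialized container"
# smaller than the payload it wraps, as observed (132 bytes vs. 302 bytes).
class ReusingSerializer:
    def __init__(self):
        self.buf = bytearray(16)  # shared, fixed-size output buffer
        self.pos = 0              # never reset between calls: the bug

    def serialize(self, payload: bytes) -> bytes:
        start = self.pos
        end = min(start + len(payload), len(self.buf))  # silently clips
        self.buf[start:end] = payload[: end - start]
        self.pos = end
        return bytes(self.buf[start:end])

s = ReusingSerializer()
first = s.serialize(b"task-bytes-0123")    # nearly fills the buffer
second = s.serialize(b"task-description")  # truncated: only 1 byte written
print(len(first), len(second))  # 15 1
```

A correct serializer resets (or reallocates) the buffer per call, which is why a fresh `Output` per `serialize()` avoids the problem.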
[jira] [Resolved] (SPARK-7932) Scheduler delay shown in event timeline is incorrect
[ https://issues.apache.org/jira/browse/SPARK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-7932. --- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6484 [https://github.com/apache/spark/pull/6484] > Scheduler delay shown in event timeline is incorrect > > > Key: SPARK-7932 > URL: https://issues.apache.org/jira/browse/SPARK-7932 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > Fix For: 1.4.0 > > > In StagePage.scala, we round *down* to the nearest percent when computing the > proportion of a task's time spend in each phase of execution. Scheduler > delay is computed by taking 100 - sum(all other proportions), which means > that a few extra percent may go into the scheduler delay. As a result, > scheduler delay can appear larger in the visualization than it actually is. > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
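The rounding effect described above is easy to see numerically. A minimal sketch of the computation (the phase proportions are made-up values):

```python
import math

# Hypothetical proportions of task time in each non-scheduler phase (percent).
phases = [33.9, 33.9, 30.9]  # true scheduler delay would be 100 - 98.7 = 1.3%

# StagePage rounded each proportion *down* to the nearest percent...
shown = [math.floor(p) for p in phases]  # [33, 33, 30]

# ...and derived scheduler delay as whatever was left of 100%, so every
# discarded fraction of a percent inflates the scheduler-delay bar.
scheduler_delay = 100 - sum(shown)  # 4, roughly triple the true ~1.3%

print(shown, scheduler_delay)
```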
[jira] [Resolved] (SPARK-7926) Switch to the official Pyrolite release
[ https://issues.apache.org/jira/browse/SPARK-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-7926. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6472 [https://github.com/apache/spark/pull/6472] > Switch to the official Pyrolite release > --- > > Key: SPARK-7926 > URL: https://issues.apache.org/jira/browse/SPARK-7926 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > Fix For: 1.4.0 > > > Since there are official releases of Pyrolite on Maven Central, it is time > for us to switch to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7909) spark-ec2 and associated tools not py3 ready
[ https://issues.apache.org/jira/browse/SPARK-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564168#comment-14564168 ] Shivaram Venkataraman commented on SPARK-7909: -- The packages will get to S3 once the 1.4 release is finalized. We are still testing / voting on release candidates and you can follow these on the Spark developer mailing list. BTW I also have a change open at spark-ec2 for substituting the Spark version based on pattern https://github.com/mesos/spark-ec2/pull/116/files#diff-1d040c3294246f2b59643d63868fc2ad, so that should take care of picking up the binary once it's released. However, feel free to send out PRs for the other python3 print fixes you had to make in init.sh etc. > spark-ec2 and associated tools not py3 ready > > > Key: SPARK-7909 > URL: https://issues.apache.org/jira/browse/SPARK-7909 > Project: Spark > Issue Type: Improvement > Components: EC2 > Environment: ec2 python3 >Reporter: Matthew Goodman > > At present there is not a possible permutation of tools that supports Python3 > on both the launching computer and running cluster. There are a couple > problems involved: > - There is no prebuilt spark binary with python3 support. > - spark-ec2/spark/init.sh contains inline py3 unfriendly print statements > - Config files for cluster processes don't seem to make it to all nodes in a > working format. > I have fixes for some of this, but the config and running context debugging > remains elusive to me. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7909) spark-ec2 and associated tools not py3 ready
[ https://issues.apache.org/jira/browse/SPARK-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564166#comment-14564166 ] Matthew Goodman commented on SPARK-7909: Using the prebuilt binaries from the links provided yields a working cluster. Is there a timeline for when the Spark 1.4.0 binaries make the S3 bucket? I can add the link to the spark/init.sh script, but it will bounce until the binary is actually placed in the bucket. In either case I suspect the naming convention will be similar, so would a PR for the changes outlined above be a good step at this stage? > spark-ec2 and associated tools not py3 ready > > > Key: SPARK-7909 > URL: https://issues.apache.org/jira/browse/SPARK-7909 > Project: Spark > Issue Type: Improvement > Components: EC2 > Environment: ec2 python3 >Reporter: Matthew Goodman > > At present there is not a possible permutation of tools that supports Python3 > on both the launching computer and running cluster. There are a couple > problems involved: > - There is no prebuilt spark binary with python3 support. > - spark-ec2/spark/init.sh contains inline py3 unfriendly print statements > - Config files for cluster processes don't seem to make it to all nodes in a > working format. > I have fixes for some of this, but the config and running context debugging > remains elusive to me. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7935) sparkContext in SparkPlan is better defined as a val
[ https://issues.apache.org/jira/browse/SPARK-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7935: --- Assignee: Apache Spark > sparkContext in SparkPlan is better defined as a val > > > Key: SPARK-7935 > URL: https://issues.apache.org/jira/browse/SPARK-7935 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: baishuo >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7935) sparkContext in SparkPlan is better defined as a val
[ https://issues.apache.org/jira/browse/SPARK-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7935: --- Assignee: (was: Apache Spark) > sparkContext in SparkPlan is better defined as a val > > > Key: SPARK-7935 > URL: https://issues.apache.org/jira/browse/SPARK-7935 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: baishuo >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7935) sparkContext in SparkPlan is better defined as a val
[ https://issues.apache.org/jira/browse/SPARK-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564159#comment-14564159 ] Apache Spark commented on SPARK-7935: - User 'baishuo' has created a pull request for this issue: https://github.com/apache/spark/pull/6486 > sparkContext in SparkPlan is better defined as a val > > > Key: SPARK-7935 > URL: https://issues.apache.org/jira/browse/SPARK-7935 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: baishuo >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7929) Remove Bagel examples
[ https://issues.apache.org/jira/browse/SPARK-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564144#comment-14564144 ] Apache Spark commented on SPARK-7929: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6487 > Remove Bagel examples > - > > Key: SPARK-7929 > URL: https://issues.apache.org/jira/browse/SPARK-7929 > Project: Spark > Issue Type: Task > Components: Examples, GraphX >Reporter: Reynold Xin >Assignee: Reynold Xin > > Bagel has been deprecated for a while. We should remove the example code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7935) sparkContext in SparkPlan is better defined as a val
baishuo created SPARK-7935: -- Summary: sparkContext in SparkPlan is better defined as a val Key: SPARK-7935 URL: https://issues.apache.org/jira/browse/SPARK-7935 Project: Spark Issue Type: Improvement Components: SQL Reporter: baishuo Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7934) In some cases, Spark hangs in yarn-client mode.
Guoqiang Li created SPARK-7934: -- Summary: In some cases, Spark hangs in yarn-client mode. Key: SPARK-7934 URL: https://issues.apache.org/jira/browse/SPARK-7934 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1 Reporter: Guoqiang Li The log: {noformat} 15/05/29 10:20:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/05/29 10:20:20 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:20 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:20 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:20 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:20 INFO AbstractConnector: Started SocketConnector@0.0.0.0:54276 15/05/29 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 54276. 15/05/29 10:20:31 INFO SparkContext: Running Spark version 1.3.1 15/05/29 10:20:31 WARN SparkConf: The configuration option 'spark.yarn.user.classpath.first' has been replaced as of Spark 1.3 and may be removed in the future. Use spark.{driver,executor}.userClassPathFirst instead. 15/05/29 10:20:31 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:31 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark) 15/05/29 10:20:32 INFO Slf4jLogger: Slf4jLogger started 15/05/29 10:20:32 INFO Remoting: Starting remoting 15/05/29 10:20:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkdri...@10dian71.domain.test:55492] 15/05/29 10:20:33 INFO Utils: Successfully started service 'sparkDriver' on port 55492. 
15/05/29 10:20:33 INFO SparkEnv: Registering MapOutputTracker 15/05/29 10:20:33 INFO SparkEnv: Registering BlockManagerMaster 15/05/29 10:20:33 INFO DiskBlockManager: Created local directory at /tmp/spark-94c41fce-1788-484e-9878-88d1bf8c7247/blockmgr-b3d7ba9d-6656-408f-b9e2-683784493f22 15/05/29 10:20:33 INFO MemoryStore: MemoryStore started with capacity 4.1 GB 15/05/29 10:20:34 INFO HttpFileServer: HTTP File server directory is /tmp/spark-271bab98-b4e8-4b02-8267-0020a38f355b/httpd-92bb8c15-51a7-4b40-9d01-2fb01cfbb148 15/05/29 10:20:34 INFO HttpServer: Starting HTTP Server 15/05/29 10:20:34 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:34 INFO AbstractConnector: Started SocketConnector@0.0.0.0:38530 15/05/29 10:20:34 INFO Utils: Successfully started service 'HTTP file server' on port 38530. 15/05/29 10:20:34 INFO SparkEnv: Registering OutputCommitCoordinator 15/05/29 10:20:34 INFO Server: jetty-8.y.z-SNAPSHOT 15/05/29 10:20:34 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 15/05/29 10:20:34 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/05/29 10:20:34 INFO SparkUI: Started SparkUI at http://10dian71.domain.test:4040 15/05/29 10:20:34 INFO SparkContext: Added JAR file:/opt/spark/spark-1.3.0-cdh5/lib/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar at http://192.168.10.71:38530/jars/hadoop-lzo-0.4.15-gplextras5.0.1-SNAPSHOT.jar with timestamp 1432866034769 15/05/29 10:20:34 INFO SparkContext: Added JAR file:/opt/spark/classes/toona-assembly.jar at http://192.168.10.71:38530/jars/toona-assembly.jar with timestamp 1432866034972 15/05/29 10:20:35 INFO RMProxy: Connecting to ResourceManager at 10dian72/192.168.10.72:9080 15/05/29 10:20:36 INFO Client: Requesting a new application from cluster with 9 NodeManagers 15/05/29 10:20:36 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (10240 MB per container) 15/05/29 10:20:36 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 15/05/29 10:20:36 INFO Client: Setting up container launch context for our AM 15/05/29 10:20:36 INFO Client: Preparing resources for our AM container 15/05/29 10:20:37 INFO Client: Uploading resource file:/opt/spark/spark-1.3.0-cdh5/lib/spark-assembly-1.3.2-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar -> hdfs://ns1/user/spark/.sparkStaging/application_1429108701044_0881/spark-assembly-1.3.2-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar 15/05/29 10:20:39 INFO Client: Uploading resource hdfs://ns1:8020/input/lbs/recommend/toona/spark/conf -> hdfs://ns1/user/spark/.sparkStaging/application_1429108701044_0881/conf 15/05/29 10:20:41 INFO Client: Setting up the launch environment for our AM container 15/05/29 10:20:42 INFO SecurityManager: Changing view acls to: spark 15/05/29 10:20:42 INFO SecurityManager: Changing modify acls to: spark 15/05/29 10:20:42 INF
[jira] [Commented] (SPARK-7708) Incorrect task serialization with Kryo closure serializer
[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564092#comment-14564092 ] Akshat Aranya commented on SPARK-7708: -- Wow, that's some serious sleuthing! I will try the newer version of Kryo and see if the rest of my serialization problems go away. > Incorrect task serialization with Kryo closure serializer > - > > Key: SPARK-7708 > URL: https://issues.apache.org/jira/browse/SPARK-7708 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.2 >Reporter: Akshat Aranya > > I've been investigating the use of Kryo for closure serialization with Spark > 1.2, and it seems like I've hit upon a bug: > When a task is serialized before scheduling, the following log message is > generated: > [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, > , PROCESS_LOCAL, 302 bytes) > This message comes from TaskSetManager, which serializes the task using the > closure serializer. Before the message is sent out, the TaskDescription > (which includes the original task as a byte array) is serialized again into > a byte array with the closure serializer. I added a log message for this in > CoarseGrainedSchedulerBackend, which produces the following output: > [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132 > The serialized size of TaskDescription (132 bytes) turns out to be _smaller_ > than the serialized task that it contains (302 bytes). This implies that > TaskDescription.buffer is not getting serialized correctly. > On the executor side, the deserialization produces a null value for > TaskDescription.buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7826) Suppress extra calling getCacheLocs.
[ https://issues.apache.org/jira/browse/SPARK-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout resolved SPARK-7826. --- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 6352 [https://github.com/apache/spark/pull/6352] > Suppress extra calling getCacheLocs. > > > Key: SPARK-7826 > URL: https://issues.apache.org/jira/browse/SPARK-7826 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 1.5.0 > > > There are too many extra calls to the method {{getCacheLocs}} in {{DAGScheduler}}, > which involve Akka communication. > To improve {{DAGScheduler}} performance, suppress these extra calls. > In my application with over 1200 stages, the execution time dropped from 8.5 min > to 3.8 min with my patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
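The patch boils down to memoizing the lookup so that repeated calls for the same key skip the remote round trip. A minimal Java sketch of that idea (hypothetical names — `CacheLocsMemo` and `remoteLookup` are illustrative, not the actual DAGScheduler change):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the caching idea behind the patch (hypothetical names, not the
// actual DAGScheduler code): repeated lookups for the same key are served
// from a local map instead of repeating the remote (Akka) call.
public class CacheLocsMemo {
    private final Map<Integer, String> cache = new HashMap<>();
    private final Function<Integer, String> remoteLookup; // stands in for the Akka round trip
    private int remoteCalls = 0;

    public CacheLocsMemo(Function<Integer, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    public String getCacheLocs(int rddId) {
        // computeIfAbsent invokes the expensive lookup only on a cache miss
        return cache.computeIfAbsent(rddId, id -> {
            remoteCalls++;
            return remoteLookup.apply(id);
        });
    }

    public int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) {
        CacheLocsMemo memo = new CacheLocsMemo(id -> "host-" + id);
        memo.getCacheLocs(1);
        memo.getCacheLocs(1); // second call is a cache hit
        System.out.println(memo.remoteCalls()); // prints 1
    }
}
```

In the real scheduler the cached locations can go stale as blocks move, so the actual fix also has to invalidate the cache at the right points; the sketch omits invalidation entirely.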
[jira] [Updated] (SPARK-7933) The default merge script JIRA username / password should be empty
[ https://issues.apache.org/jira/browse/SPARK-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-7933: -- Description: It looks like this was changed accidentally a few months ago. (was: It looks like this was added accidentally when [~pwendell] merged a PR a few months ago.) Summary: The default merge script JIRA username / password should be empty (was: Patrick's username / password shouldn't be the defaults in the merge script) > The default merge script JIRA username / password should be empty > - > > Key: SPARK-7933 > URL: https://issues.apache.org/jira/browse/SPARK-7933 > Project: Spark > Issue Type: Bug > Components: Project Infra >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > Fix For: 1.4.0 > > > It looks like this was changed accidentally a few months ago. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7933) Patrick's username / password shouldn't be the defaults in the merge script
[ https://issues.apache.org/jira/browse/SPARK-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564074#comment-14564074 ] Patrick Wendell commented on SPARK-7933: Thanks - this was a dummy password I added in there, but yeah fine to have it be the empty string. > Patrick's username / password shouldn't be the defaults in the merge script > --- > > Key: SPARK-7933 > URL: https://issues.apache.org/jira/browse/SPARK-7933 > Project: Spark > Issue Type: Bug > Components: Project Infra >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > Fix For: 1.4.0 > > > It looks like this was added accidentally when [~pwendell] merged a PR a few > months ago. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7933) Patrick's username / password shouldn't be the defaults in the merge script
[ https://issues.apache.org/jira/browse/SPARK-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7933. Resolution: Fixed Fix Version/s: 1.4.0 > Patrick's username / password shouldn't be the defaults in the merge script > --- > > Key: SPARK-7933 > URL: https://issues.apache.org/jira/browse/SPARK-7933 > Project: Spark > Issue Type: Bug > Components: Project Infra >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > Fix For: 1.4.0 > > > It looks like this was added accidentally when [~pwendell] merged a PR a few > months ago. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7933) Patrick's username / password shouldn't be the defaults in the merge script
[ https://issues.apache.org/jira/browse/SPARK-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7933: --- Assignee: Apache Spark (was: Kay Ousterhout) > Patrick's username / password shouldn't be the defaults in the merge script > --- > > Key: SPARK-7933 > URL: https://issues.apache.org/jira/browse/SPARK-7933 > Project: Spark > Issue Type: Bug > Components: Project Infra >Reporter: Kay Ousterhout >Assignee: Apache Spark >Priority: Minor > > It looks like this was added accidentally when [~pwendell] merged a PR a few > months ago. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7933) Patrick's username / password shouldn't be the defaults in the merge script
[ https://issues.apache.org/jira/browse/SPARK-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7933: --- Assignee: Kay Ousterhout (was: Apache Spark) > Patrick's username / password shouldn't be the defaults in the merge script > --- > > Key: SPARK-7933 > URL: https://issues.apache.org/jira/browse/SPARK-7933 > Project: Spark > Issue Type: Bug > Components: Project Infra >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > > It looks like this was added accidentally when [~pwendell] merged a PR a few > months ago. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7933) Patrick's username / password shouldn't be the defaults in the merge script
[ https://issues.apache.org/jira/browse/SPARK-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564073#comment-14564073 ] Apache Spark commented on SPARK-7933: - User 'kayousterhout' has created a pull request for this issue: https://github.com/apache/spark/pull/6485 > Patrick's username / password shouldn't be the defaults in the merge script > --- > > Key: SPARK-7933 > URL: https://issues.apache.org/jira/browse/SPARK-7933 > Project: Spark > Issue Type: Bug > Components: Project Infra >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > > It looks like this was added accidentally when [~pwendell] merged a PR a few > months ago. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7933) Patrick's username / password shouldn't be the defaults in the merge script
Kay Ousterhout created SPARK-7933: - Summary: Patrick's username / password shouldn't be the defaults in the merge script Key: SPARK-7933 URL: https://issues.apache.org/jira/browse/SPARK-7933 Project: Spark Issue Type: Bug Components: Project Infra Reporter: Kay Ousterhout Assignee: Kay Ousterhout Priority: Minor It looks like this was added accidentally when [~pwendell] merged a PR a few months ago. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7932) Scheduler delay shown in event timeline is incorrect
[ https://issues.apache.org/jira/browse/SPARK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7932: --- Assignee: Kay Ousterhout (was: Apache Spark) > Scheduler delay shown in event timeline is incorrect > > > Key: SPARK-7932 > URL: https://issues.apache.org/jira/browse/SPARK-7932 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > > In StagePage.scala, we round *down* to the nearest percent when computing the > proportion of a task's time spent in each phase of execution. Scheduler > delay is computed by taking 100 - sum(all other proportions), which means > that a few extra percent may go into the scheduler delay. As a result, > scheduler delay can appear larger in the visualization than it actually is. > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7932) Scheduler delay shown in event timeline is incorrect
[ https://issues.apache.org/jira/browse/SPARK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564065#comment-14564065 ] Apache Spark commented on SPARK-7932: - User 'kayousterhout' has created a pull request for this issue: https://github.com/apache/spark/pull/6484 > Scheduler delay shown in event timeline is incorrect > > > Key: SPARK-7932 > URL: https://issues.apache.org/jira/browse/SPARK-7932 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Kay Ousterhout >Assignee: Kay Ousterhout >Priority: Minor > > In StagePage.scala, we round *down* to the nearest percent when computing the > proportion of a task's time spent in each phase of execution. Scheduler > delay is computed by taking 100 - sum(all other proportions), which means > that a few extra percent may go into the scheduler delay. As a result, > scheduler delay can appear larger in the visualization than it actually is. > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7932) Scheduler delay shown in event timeline is incorrect
[ https://issues.apache.org/jira/browse/SPARK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7932: --- Assignee: Apache Spark (was: Kay Ousterhout) > Scheduler delay shown in event timeline is incorrect > > > Key: SPARK-7932 > URL: https://issues.apache.org/jira/browse/SPARK-7932 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Kay Ousterhout >Assignee: Apache Spark >Priority: Minor > > In StagePage.scala, we round *down* to the nearest percent when computing the > proportion of a task's time spent in each phase of execution. Scheduler > delay is computed by taking 100 - sum(all other proportions), which means > that a few extra percent may go into the scheduler delay. As a result, > scheduler delay can appear larger in the visualization than it actually is. > cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7932) Scheduler delay shown in event timeline is incorrect
Kay Ousterhout created SPARK-7932: - Summary: Scheduler delay shown in event timeline is incorrect Key: SPARK-7932 URL: https://issues.apache.org/jira/browse/SPARK-7932 Project: Spark Issue Type: Bug Components: Web UI Reporter: Kay Ousterhout Assignee: Kay Ousterhout Priority: Minor In StagePage.scala, we round *down* to the nearest percent when computing the proportion of a task's time spent in each phase of execution. Scheduler delay is computed by taking 100 - sum(all other proportions), which means that a few extra percent may go into the scheduler delay. As a result, scheduler delay can appear larger in the visualization than it actually is. cc [~shivaram] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
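The rounding effect described in SPARK-7932 can be reproduced with made-up numbers (a sketch, not StagePage.scala itself): flooring each phase's percentage loses up to one percent per phase, and the `100 - sum` formula credits every lost percent to scheduler delay.

```java
// Sketch with hypothetical timings (not StagePage.scala itself): each phase's
// share is rounded *down* to a whole percent, and the leftover percent is
// credited to scheduler delay, inflating it in the timeline visualization.
public class SchedulerDelayRounding {
    public static void main(String[] args) {
        long totalMs = 700;                            // total task time, hypothetical
        long[] phaseMs = {233, 233, 233};              // three phases; true delay is ~1 ms
        int shownPercent = 0;
        for (long p : phaseMs) {
            shownPercent += (int) (100 * p / totalMs); // integer division floors: 33 each
        }
        int schedulerDelayPercent = 100 - shownPercent;
        System.out.println(schedulerDelayPercent);     // prints 1, though true delay is ~0.14%
    }
}
```

With three phases, up to 3% of the task's time can be misattributed this way, which is why the delay bars in the timeline could look noticeably too wide.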
[jira] [Commented] (SPARK-7708) Incorrect task serialization with Kryo closure serializer
[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564054#comment-14564054 ] Josh Rosen commented on SPARK-7708: --- I've opened https://github.com/EsotericSoftware/kryo/issues/312 to discuss this with the Kryo developers. Updating to Kryo 2.23.0 fixes the symptoms that we've observed here, but it would still be good to get confirmation that our re-use of {{Output}} is something that Kryo intends to support. > Incorrect task serialization with Kryo closure serializer > - > > Key: SPARK-7708 > URL: https://issues.apache.org/jira/browse/SPARK-7708 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.2 >Reporter: Akshat Aranya > > I've been investigating the use of Kryo for closure serialization with Spark > 1.2, and it seems like I've hit upon a bug: > When a task is serialized before scheduling, the following log message is > generated: > [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, > , PROCESS_LOCAL, 302 bytes) > This message comes from TaskSetManager, which serializes the task using the > closure serializer. Before the message is sent out, the TaskDescription > (which includes the original task as a byte array) is serialized again into > a byte array with the closure serializer. I added a log message for this in > CoarseGrainedSchedulerBackend, which produces the following output: > [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132 > The serialized size of TaskDescription (132 bytes) turns out to be _smaller_ > than the serialized task that it contains (302 bytes). This implies that > TaskDescription.buffer is not getting serialized correctly. > On the executor side, the deserialization produces a null value for > TaskDescription.buffer. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6950) Spark master UI believes some applications are in progress when they are actually completed
[ https://issues.apache.org/jira/browse/SPARK-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564042#comment-14564042 ] Haopu Wang commented on SPARK-6950: --- I hit this issue on 1.3.0 and 1.3.1. It can be reproduced using a very simple application and a standalone cluster (1 master and 1 slave). > Spark master UI believes some applications are in progress when they are > actually completed > --- > > Key: SPARK-6950 > URL: https://issues.apache.org/jira/browse/SPARK-6950 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.3.0 >Reporter: Matt Cheah > Fix For: 1.3.1 > > > In Spark 1.2.x, I was able to set my spark event log directory to be a > different location from the default, and after the job finishes, I can replay > the UI by clicking on the appropriate link under "Completed Applications". > Now, on a non-deterministic basis (but seems to happen most of the time), > when I click on the link under "Completed Applications", I instead get a > webpage that says: > Application history not found (app-20150415052927-0014) > Application myApp is still in progress. > I am able to view the application's UI using the Spark history server, so > something regressed in the Spark master code between 1.2 and 1.3, but that > regression does not apply in the history server use case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7708) Incorrect task serialization with Kryo closure serializer
[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564038#comment-14564038 ] Josh Rosen commented on SPARK-7708: --- Two hours later and I've now found where the state was hiding. I discovered this using the following isolated test project, which explains the 3-byte size difference: https://gist.github.com/JoshRosen/14ba69ef53af53ef2839 Intuitively, you might think that it's in {{Output}} because using a new {{Output}} solves the issue. However, it turns out that the state was hiding inside Kryo's {{JavaSerializer}} class:
{code}
public class JavaSerializer extends Serializer {
    private ObjectOutputStream objectStream;
    private Output lastOutput;

    public JavaSerializer() { }

    public void write(Kryo kryo, Output output, Object object) {
        try {
            if (output != this.lastOutput) {
                this.objectStream = new ObjectOutputStream(output);
                this.lastOutput = output;
            } else {
                this.objectStream.reset();
            }
            this.objectStream.writeObject(object);
            this.objectStream.flush();
        } catch (Exception var5) {
            throw new KryoException("Error during Java serialization.", var5);
        }
    }
    [...]
{code}
When you pass a new output, it opens a new ObjectOutputStream and writes a new stream header, but when you reuse the output it only writes a reset flag. The header is two shorts, which is four bytes, whereas the reset is only one byte, leading to the 3-byte difference. I'm not sure whether this is a bug in Kryo or whether our re-use of Output is unsafe. I'll email the developers to ask. 
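The byte accounting above can be checked directly against plain `java.io.ObjectOutputStream`, independent of Kryo (a standalone sketch): the constructor emits the 4-byte stream header (magic plus version), while `reset()` emits a single TC_RESET marker.

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

// Standalone check of the 3-byte difference described above: a fresh
// ObjectOutputStream writes a 4-byte header (STREAM_MAGIC + STREAM_VERSION),
// whereas reset() on an existing stream writes only the 1-byte TC_RESET marker.
public class StreamHeaderDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf); // constructor writes the header
        oos.flush();
        int headerBytes = buf.size();                  // 4
        oos.reset();                                   // writes only TC_RESET
        oos.flush();
        int resetBytes = buf.size() - headerBytes;     // 1
        System.out.println(headerBytes - resetBytes);  // prints 3
    }
}
```

This is why reusing the same {{Output}} (and hence the same cached `ObjectOutputStream`) produces a payload 3 bytes smaller than serializing through a fresh stream.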
> Incorrect task serialization with Kryo closure serializer > - > > Key: SPARK-7708 > URL: https://issues.apache.org/jira/browse/SPARK-7708 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.2 >Reporter: Akshat Aranya > > I've been investigating the use of Kryo for closure serialization with Spark > 1.2, and it seems like I've hit upon a bug: > When a task is serialized before scheduling, the following log message is > generated: > [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, > , PROCESS_LOCAL, 302 bytes) > This message comes from TaskSetManager, which serializes the task using the > closure serializer. Before the message is sent out, the TaskDescription > (which includes the original task as a byte array) is serialized again into > a byte array with the closure serializer. I added a log message for this in > CoarseGrainedSchedulerBackend, which produces the following output: > [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132 > The serialized size of TaskDescription (132 bytes) turns out to be _smaller_ > than the serialized task that it contains (302 bytes). This implies that > TaskDescription.buffer is not getting serialized correctly. > On the executor side, the deserialization produces a null value for > TaskDescription.buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7930) Shutdown hook deletes root local dir before SparkContext is stopped, throwing errors
[ https://issues.apache.org/jira/browse/SPARK-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7930: - Priority: Critical (was: Blocker) > Shutdown hook deletes root local dir before SparkContext is stopped, throwing > errors > > > Key: SPARK-7930 > URL: https://issues.apache.org/jira/browse/SPARK-7930 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Critical > > Shutdown hook for temp directories had priority 100 while SparkContext's was > 50, so the local root directory was deleted before SparkContext was shut down. > This leads to scary errors in running jobs at the time of shutdown. This is > especially a problem when running streaming examples, where Ctrl-C is the > only way to shut down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7931) Do not restart a socket receiver when the receiver is being shut down
[ https://issues.apache.org/jira/browse/SPARK-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7931: --- Assignee: Apache Spark (was: Tathagata Das) > Do not restart a socket receiver when the receiver is being shut down > > > Key: SPARK-7931 > URL: https://issues.apache.org/jira/browse/SPARK-7931 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Tathagata Das >Assignee: Apache Spark >Priority: Critical > > Attempts to restart the socket receiver when it is supposed to be stopped > cause undesirable error messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7931) Do not restart a socket receiver when the receiver is being shut down
[ https://issues.apache.org/jira/browse/SPARK-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564034#comment-14564034 ] Apache Spark commented on SPARK-7931: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/6483 > Do not restart a socket receiver when the receiver is being shut down > > > Key: SPARK-7931 > URL: https://issues.apache.org/jira/browse/SPARK-7931 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Critical > > Attempts to restart the socket receiver when it is supposed to be stopped > cause undesirable error messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7931) Do not restart a socket receiver when the receiver is being shut down
[ https://issues.apache.org/jira/browse/SPARK-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7931: --- Assignee: Tathagata Das (was: Apache Spark) > Do not restart a socket receiver when the receiver is being shut down > > > Key: SPARK-7931 > URL: https://issues.apache.org/jira/browse/SPARK-7931 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Critical > > Attempts to restart the socket receiver when it is supposed to be stopped > cause undesirable error messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7930) Shutdown hook deletes root local dir before SparkContext is stopped, throwing errors
[ https://issues.apache.org/jira/browse/SPARK-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-7930: - Priority: Blocker (was: Major) > Shutdown hook deletes root local dir before SparkContext is stopped, throwing > errors > > > Key: SPARK-7930 > URL: https://issues.apache.org/jira/browse/SPARK-7930 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Blocker > > Shutdown hook for temp directories had priority 100 while SparkContext's was > 50, so the local root directory was deleted before SparkContext was shut down. > This leads to scary errors in running jobs at the time of shutdown. This is > especially a problem when running streaming examples, where Ctrl-C is the > only way to shut down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7931) Do not restart a socket receiver when the receiver is being shut down
Tathagata Das created SPARK-7931: Summary: Do not restart a socket receiver when the receiver is being shut down Key: SPARK-7931 URL: https://issues.apache.org/jira/browse/SPARK-7931 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical Attempts to restart the socket receiver when it is supposed to be stopped cause undesirable error messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7930) Shutdown hook deletes root local dir before SparkContext is stopped, throwing errors
[ https://issues.apache.org/jira/browse/SPARK-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564025#comment-14564025 ] Apache Spark commented on SPARK-7930: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/6482 > Shutdown hook deletes root local dir before SparkContext is stopped, throwing > errors > > > Key: SPARK-7930 > URL: https://issues.apache.org/jira/browse/SPARK-7930 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > Shutdown hook for temp directories had priority 100 while SparkContext's was > 50, so the local root directory was deleted before SparkContext was shut down. > This leads to scary errors in running jobs at the time of shutdown. This is > especially a problem when running streaming examples, where Ctrl-C is the > only way to shut down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7930) Shutdown hook deletes root local dir before SparkContext is stopped, throwing errors
[ https://issues.apache.org/jira/browse/SPARK-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7930: --- Assignee: Tathagata Das (was: Apache Spark) > Shutdown hook deletes root local dir before SparkContext is stopped, throwing > errors > > > Key: SPARK-7930 > URL: https://issues.apache.org/jira/browse/SPARK-7930 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > Shutdown hook for temp directories had priority 100 while SparkContext's was > 50, so the local root directory was deleted before SparkContext was shut down. > This leads to scary errors in running jobs at the time of shutdown. This is > especially a problem when running streaming examples, where Ctrl-C is the > only way to shut down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7930) Shutdown hook deletes root local dir before SparkContext is stopped, throwing errors
[ https://issues.apache.org/jira/browse/SPARK-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7930: --- Assignee: Apache Spark (was: Tathagata Das) > Shutdown hook deletes root local dir before SparkContext is stopped, throwing > errors > > > Key: SPARK-7930 > URL: https://issues.apache.org/jira/browse/SPARK-7930 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Reporter: Tathagata Das >Assignee: Apache Spark > > Shutdown hook for temp directories had priority 100 while SparkContext's was > 50, so the local root directory was deleted before SparkContext was shut down. > This leads to scary errors in running jobs at the time of shutdown. This is > especially a problem when running streaming examples, where Ctrl-C is the > only way to shut down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7930) Shutdown hook deletes root local dir before SparkContext is stopped, throwing errors
Tathagata Das created SPARK-7930: Summary: Shutdown hook deletes root local dir before SparkContext is stopped, throwing errors Key: SPARK-7930 URL: https://issues.apache.org/jira/browse/SPARK-7930 Project: Spark Issue Type: Bug Components: Spark Core, Streaming Reporter: Tathagata Das Assignee: Tathagata Das Shutdown hook for temp directories had priority 100 while SparkContext's was 50, so the local root directory was deleted before SparkContext was shut down. This leads to scary errors in running jobs at the time of shutdown. This is especially a problem when running streaming examples, where Ctrl-C is the only way to shut down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
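The ordering problem in SPARK-7930 can be sketched in isolation (a simplified model, not Spark's actual ShutdownHookManager): hooks run in descending priority order, so a priority-100 temp-dir hook fires before a priority-50 SparkContext hook, and the fix amounts to reordering the priorities so the context stops before its directories disappear.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Simplified model of priority-ordered shutdown hooks (not Spark's actual
// ShutdownHookManager): higher priority runs first, so the priority-100
// temp-dir cleanup precedes the priority-50 SparkContext stop -- the bug.
public class HookOrder {
    record Hook(String name, int priority) {}

    public static List<String> runOrder(List<Hook> hooks) {
        return hooks.stream()
                .sorted(Comparator.comparingInt(Hook::priority).reversed())
                .map(Hook::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hook> hooks = List.of(
                new Hook("delete temp dirs", 100),
                new Hook("stop SparkContext", 50));
        System.out.println(runOrder(hooks)); // [delete temp dirs, stop SparkContext]
    }
}
```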
[jira] [Commented] (SPARK-7927) Enforce whitespace for more tokens in style checker
[ https://issues.apache.org/jira/browse/SPARK-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563999#comment-14563999 ] Apache Spark commented on SPARK-7927: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/6481 > Enforce whitespace for more tokens in style checker > --- > > Key: SPARK-7927 > URL: https://issues.apache.org/jira/browse/SPARK-7927 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > > Enforce whitespace on comma, colon, if, while, etc ... so we don't need to > keep spending time on this in code reviews. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7577) User guide update for Bucketizer
[ https://issues.apache.org/jira/browse/SPARK-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-7577. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6451 [https://github.com/apache/spark/pull/6451] > User guide update for Bucketizer > > > Key: SPARK-7577 > URL: https://issues.apache.org/jira/browse/SPARK-7577 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML >Reporter: Joseph K. Bradley >Assignee: Xusen Yin > Fix For: 1.4.0 > > > Copied from [SPARK-7443]: > {quote} > Now that we have algorithms in spark.ml which are not in spark.mllib, we > should start making subsections for the spark.ml API as needed. We can follow > the structure of the spark.mllib user guide. > * The spark.ml user guide can provide: (a) code examples and (b) info on > algorithms which do not exist in spark.mllib. > * We should not duplicate info in the spark.ml guides. Since spark.mllib is > still the primary API, we should provide links to the corresponding > algorithms in the spark.mllib user guide for more info. > {quote} > Note: I created a new subsection for links to spark.ml-specific guides in > this JIRA's PR: [SPARK-7557]. This transformer can go within the new > subsection. I'll try to get that PR merged ASAP. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7929) Remove Bagel examples
[ https://issues.apache.org/jira/browse/SPARK-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7929: --- Assignee: Apache Spark (was: Reynold Xin) > Remove Bagel examples > - > > Key: SPARK-7929 > URL: https://issues.apache.org/jira/browse/SPARK-7929 > Project: Spark > Issue Type: Task > Components: Examples, GraphX >Reporter: Reynold Xin >Assignee: Apache Spark > > Bagel has been deprecated for a while. We should remove the example code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7929) Remove Bagel examples
[ https://issues.apache.org/jira/browse/SPARK-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7929: --- Assignee: Reynold Xin (was: Apache Spark) > Remove Bagel examples > - > > Key: SPARK-7929 > URL: https://issues.apache.org/jira/browse/SPARK-7929 > Project: Spark > Issue Type: Task > Components: Examples, GraphX >Reporter: Reynold Xin >Assignee: Reynold Xin > > Bagel has been deprecated for a while. We should remove the example code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7929) Remove Bagel examples
[ https://issues.apache.org/jira/browse/SPARK-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563989#comment-14563989 ] Apache Spark commented on SPARK-7929: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6480 > Remove Bagel examples > - > > Key: SPARK-7929 > URL: https://issues.apache.org/jira/browse/SPARK-7929 > Project: Spark > Issue Type: Task > Components: Examples, GraphX >Reporter: Reynold Xin >Assignee: Reynold Xin > > Bagel has been deprecated for a while. We should remove the example code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7929) Remove Bagel examples
Reynold Xin created SPARK-7929: -- Summary: Remove Bagel examples Key: SPARK-7929 URL: https://issues.apache.org/jira/browse/SPARK-7929 Project: Spark Issue Type: Task Components: Examples, GraphX Reporter: Reynold Xin Assignee: Reynold Xin Bagel has been deprecated for a while. We should remove the example code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7038) [Streaming] Spark Sink requires spark assembly in classpath
[ https://issues.apache.org/jira/browse/SPARK-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563985#comment-14563985 ] Hari Shreedharan commented on SPARK-7038: - [~vanzin] - Does adding the shade plugin to the pom for the sink fix this issue? > [Streaming] Spark Sink requires spark assembly in classpath > --- > > Key: SPARK-7038 > URL: https://issues.apache.org/jira/browse/SPARK-7038 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1 >Reporter: Hari Shreedharan > > In 1.3.0 Spark, we shaded Guava, which meant that the Spark Sink's guava > dependency is not standard guava anymore - thus the one from Flume's > classpath does not work and can throw a NoClassDefFoundError while using > Spark Sink. > We must pull in the guava dependency into the Spark Sink jar. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
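As a sketch of what the comment above asks about, a maven-shade-plugin configuration in the sink module's pom could bundle Guava into the sink jar and relocate it, so the sink no longer depends on whichever Guava is on Flume's classpath. The relocated package name below mirrors the one Spark uses for its own shaded Guava, but the whole fragment is an assumption, not the actual fix:

```xml
<!-- Hypothetical pom.xml fragment for the spark-streaming-flume-sink module:
     shade Guava into the sink jar and relocate it so it cannot clash with
     the Guava on Flume's classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <artifactSet>
          <includes>
            <include>com.google.guava:guava</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.spark-project.guava</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```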
[jira] [Commented] (SPARK-7819) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error
[ https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563978#comment-14563978 ] Yin Huai commented on SPARK-7819: - btw, the fix I did is https://github.com/apache/spark/commit/572b62cafe4bc7b1d464c9dcfb449c9d53456826. > Isolated Hive Client Loader appears to cause Native Library > libMapRClient.4.0.2-mapr.so already loaded in another classloader error > --- > > Key: SPARK-7819 > URL: https://issues.apache.org/jira/browse/SPARK-7819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Fi >Priority: Critical > Attachments: stacktrace.txt, test.py > > > In reference to the pull request: https://github.com/apache/spark/pull/5876 > I have been running the Spark 1.3 branch for some time with no major hiccups, > and recently switched to the Spark 1.4 branch. > I build my spark distribution with the following build command: > {noformat} > make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive > -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver > {noformat} > When running a python script containing a series of smoke tests I use to > validate the build, I encountered an error under the following conditions: > * start a spark context > * start a hive context > * run any hive query > * stop the spark context > * start a second spark context > * run any hive query > ** ERROR > From what I can tell, the Isolated Class Loader is hitting a MapR class that > is loading its native library (presumedly as part of a static initializer). > Unfortunately, the JVM prohibits this the second time around. > I would think that shutting down the SparkContext would clear out any > vestigials of the JVM, so I'm surprised that this would even be a problem. > Note: all other smoke tests we are running passes fine. > I will attach the stacktrace and a python script reproducing the issue (at > least for my environment and build). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7819) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error
[ https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-7819: Target Version/s: 1.4.1, 1.5.0 (was: 1.4.1) > Isolated Hive Client Loader appears to cause Native Library > libMapRClient.4.0.2-mapr.so already loaded in another classloader error > --- > > Key: SPARK-7819 > URL: https://issues.apache.org/jira/browse/SPARK-7819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Fi >Priority: Critical > Attachments: stacktrace.txt, test.py > > > In reference to the pull request: https://github.com/apache/spark/pull/5876 > I have been running the Spark 1.3 branch for some time with no major hiccups, > and recently switched to the Spark 1.4 branch. > I build my spark distribution with the following build command: > {noformat} > make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive > -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver > {noformat} > When running a python script containing a series of smoke tests I use to > validate the build, I encountered an error under the following conditions: > * start a spark context > * start a hive context > * run any hive query > * stop the spark context > * start a second spark context > * run any hive query > ** ERROR > From what I can tell, the Isolated Class Loader is hitting a MapR class that > is loading its native library (presumedly as part of a static initializer). > Unfortunately, the JVM prohibits this the second time around. > I would think that shutting down the SparkContext would clear out any > vestigials of the JVM, so I'm surprised that this would even be a problem. > Note: all other smoke tests we are running passes fine. > I will attach the stacktrace and a python script reproducing the issue (at > least for my environment and build). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7837) NPE when save as parquet in speculative tasks
[ https://issues.apache.org/jira/browse/SPARK-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563977#comment-14563977 ] Yin Huai commented on SPARK-7837: - We have made the parquet reader side robust to files left in _temporary. So, this problem should have a much smaller impact. I am re-targeting it to 1.5. Will keep an eye on it and investigate the root cause. > NPE when save as parquet in speculative tasks > - > > Key: SPARK-7837 > URL: https://issues.apache.org/jira/browse/SPARK-7837 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Yin Huai >Priority: Critical > > The query is like {{df.orderBy(...).saveAsTable(...)}}. > When there is no partitioning columns and there is a skewed key, I found the > following exception in speculative tasks. After these failures, seems we > could not call {{SparkHadoopMapRedUtil.commitTask}} correctly. > {code} > java.lang.NullPointerException > at > parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146) > at > parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112) > at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73) > at > org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:115) > at > org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:385) > at > org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:150) > at > org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:122) > at > org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:122) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) > at org.apache.spark.scheduler.Task.run(Task.scala:70) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > 
at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7819) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error
[ https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-7819: Target Version/s: 1.4.1 (was: 1.4.0) > Isolated Hive Client Loader appears to cause Native Library > libMapRClient.4.0.2-mapr.so already loaded in another classloader error > --- > > Key: SPARK-7819 > URL: https://issues.apache.org/jira/browse/SPARK-7819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Fi >Priority: Critical > Attachments: stacktrace.txt, test.py > > > In reference to the pull request: https://github.com/apache/spark/pull/5876 > I have been running the Spark 1.3 branch for some time with no major hiccups, > and recently switched to the Spark 1.4 branch. > I build my spark distribution with the following build command: > {noformat} > make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive > -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver > {noformat} > When running a python script containing a series of smoke tests I use to > validate the build, I encountered an error under the following conditions: > * start a spark context > * start a hive context > * run any hive query > * stop the spark context > * start a second spark context > * run any hive query > ** ERROR > From what I can tell, the Isolated Class Loader is hitting a MapR class that > is loading its native library (presumedly as part of a static initializer). > Unfortunately, the JVM prohibits this the second time around. > I would think that shutting down the SparkContext would clear out any > vestigials of the JVM, so I'm surprised that this would even be a problem. > Note: all other smoke tests we are running passes fine. > I will attach the stacktrace and a python script reproducing the issue (at > least for my environment and build). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7837) NPE when save as parquet in speculative tasks
[ https://issues.apache.org/jira/browse/SPARK-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-7837: Target Version/s: 1.5.0 (was: 1.4.0) > NPE when save as parquet in speculative tasks > - > > Key: SPARK-7837 > URL: https://issues.apache.org/jira/browse/SPARK-7837 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Yin Huai >Priority: Critical > > The query is like {{df.orderBy(...).saveAsTable(...)}}. > When there is no partitioning columns and there is a skewed key, I found the > following exception in speculative tasks. After these failures, seems we > could not call {{SparkHadoopMapRedUtil.commitTask}} correctly. > {code} > java.lang.NullPointerException > at > parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146) > at > parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112) > at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73) > at > org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:115) > at > org.apache.spark.sql.sources.DefaultWriterContainer.abortTask(commands.scala:385) > at > org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:150) > at > org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:122) > at > org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:122) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) > at org.apache.spark.scheduler.Task.run(Task.scala:70) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7819) Isolated Hive Client Loader appears to cause Native Library libMapRClient.4.0.2-mapr.so already loaded in another classloader error
[ https://issues.apache.org/jira/browse/SPARK-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563973#comment-14563973 ] Yin Huai commented on SPARK-7819: - [~coderfi] I just checked in a bug fix related to the class loader and the spark sql conf set in spark conf (e.g. spark-default). Can you try the latest 1.4 branch and put the following entry in the {{conf/spark-defaults.conf}} {{spark.sql.hive.metastore.sharedPrefixes com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni}} Basically, it contains a few packages for JDBC drivers and a few mapr packages. We will try to figure out a way to let JNI libs work with our two classloaders. > Isolated Hive Client Loader appears to cause Native Library > libMapRClient.4.0.2-mapr.so already loaded in another classloader error > --- > > Key: SPARK-7819 > URL: https://issues.apache.org/jira/browse/SPARK-7819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Fi >Priority: Critical > Attachments: stacktrace.txt, test.py > > > In reference to the pull request: https://github.com/apache/spark/pull/5876 > I have been running the Spark 1.3 branch for some time with no major hiccups, > and recently switched to the Spark 1.4 branch. 
> I build my spark distribution with the following build command: > {noformat} > make-distribution.sh --tgz --skip-java-test --with-tachyon -Phive > -Phive-0.13.1 -Pmapr4 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver > {noformat} > When running a python script containing a series of smoke tests I use to > validate the build, I encountered an error under the following conditions: > * start a spark context > * start a hive context > * run any hive query > * stop the spark context > * start a second spark context > * run any hive query > ** ERROR > From what I can tell, the Isolated Class Loader is hitting a MapR class that > is loading its native library (presumedly as part of a static initializer). > Unfortunately, the JVM prohibits this the second time around. > I would think that shutting down the SparkContext would clear out any > vestigials of the JVM, so I'm surprised that this would even be a problem. > Note: all other smoke tests we are running passes fine. > I will attach the stacktrace and a python script reproducing the issue (at > least for my environment and build). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7853) ClassNotFoundException for SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-7853. - Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6459 [https://github.com/apache/spark/pull/6459] > ClassNotFoundException for SparkSQL > --- > > Key: SPARK-7853 > URL: https://issues.apache.org/jira/browse/SPARK-7853 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Cheng Hao >Assignee: Yin Huai >Priority: Blocker > Fix For: 1.4.0 > > > Reproduce steps: > {code} > bin/spark-sql --jars > ./sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar > CREATE TABLE t1(a string, b string) ROW FORMAT SERDE > 'org.apache.hive.hcatalog.data.JsonSerDe'; > {code} > Throws Exception like: > {noformat} > 15/05/26 00:16:33 ERROR SparkSQLDriver: Failed in [CREATE TABLE t1(a string, > b string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'] > org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution > Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
Cannot > validate serde: org.apache.hive.hcatalog.data.JsonSerDe > at > org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:333) > at > org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:310) > at > org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:139) > at > org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:310) > at > org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:300) > at > org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:457) > at > org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87) > at > org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:922) > at > org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:922) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:147) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:131) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:727) > at > org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:57) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:283) > at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:218) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7198) VectorAssembler should carry ML metadata
[ https://issues.apache.org/jira/browse/SPARK-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-7198. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 6452 [https://github.com/apache/spark/pull/6452] > VectorAssembler should carry ML metadata > > > Key: SPARK-7198 > URL: https://issues.apache.org/jira/browse/SPARK-7198 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > Fix For: 1.4.0 > > > Now it only outputs assembled vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7910) Expose partitioner information in JavaRDD
[ https://issues.apache.org/jira/browse/SPARK-7910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7910: --- Assignee: (was: Apache Spark) > Expose partitioner information in JavaRDD > - > > Key: SPARK-7910 > URL: https://issues.apache.org/jira/browse/SPARK-7910 > Project: Spark > Issue Type: Improvement > Components: Java API >Reporter: holdenk >Priority: Minor > > It would be useful to expose the partitioner info in the Java & Python APIs > for RDDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7910) Expose partitioner information in JavaRDD
[ https://issues.apache.org/jira/browse/SPARK-7910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7910: --- Assignee: Apache Spark > Expose partitioner information in JavaRDD > - > > Key: SPARK-7910 > URL: https://issues.apache.org/jira/browse/SPARK-7910 > Project: Spark > Issue Type: Improvement > Components: Java API >Reporter: holdenk >Assignee: Apache Spark >Priority: Minor > > It would be useful to expose the partitioner info in the Java & Python APIs > for RDDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7890) Document that Scala 2.11 now supports Kafka
[ https://issues.apache.org/jira/browse/SPARK-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7890: --- Assignee: Iulian Dragos (was: Apache Spark) > Document that Scala 2.11 now supports Kafka > --- > > Key: SPARK-7890 > URL: https://issues.apache.org/jira/browse/SPARK-7890 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Patrick Wendell >Assignee: Iulian Dragos >Priority: Critical > > The building-spark.html page needs to be updated. It's a simple fix, just > remove the caveat about Kafka. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-7890) Document that Scala 2.11 now supports Kafka
[ https://issues.apache.org/jira/browse/SPARK-7890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7890: --- Assignee: Apache Spark (was: Iulian Dragos) > Document that Scala 2.11 now supports Kafka > --- > > Key: SPARK-7890 > URL: https://issues.apache.org/jira/browse/SPARK-7890 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Patrick Wendell >Assignee: Apache Spark >Priority: Critical > > The building-spark.html page needs to be updated. It's a simple fix, just > remove the caveat about Kafka. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7708) Incorrect task serialization with Kryo closure serializer
[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563831#comment-14563831 ] Josh Rosen commented on SPARK-7708: --- I think that there might be some state inside of the Kryo {{Output}} which isn't being reset properly after {{clear()}}. I'm investigating now. > Incorrect task serialization with Kryo closure serializer > - > > Key: SPARK-7708 > URL: https://issues.apache.org/jira/browse/SPARK-7708 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.2 >Reporter: Akshat Aranya > > I've been investigating the use of Kryo for closure serialization with Spark > 1.2, and it seems like I've hit upon a bug: > When a task is serialized before scheduling, the following log message is > generated: > [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, > , PROCESS_LOCAL, 302 bytes) > This message comes from TaskSetManager which serializes the task using the > closure serializer. Before the message is sent out, the TaskDescription > (which included the original task as a byte array), is serialized again into > a byte array with the closure serializer. I added a log message for this in > CoarseGrainedSchedulerBackend, which produces the following output: > [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132 > The serialized size of TaskDescription (132 bytes) turns out to be _smaller_ > than serialized task that it contains (302 bytes). This implies that > TaskDescription.buffer is not getting serialized correctly. > On the executor side, the deserialization produces a null value for > TaskDescription.buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
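The diagnostic reasoning in the report above admits a quick sanity model: if an outer serialized object is smaller than the payload it is supposed to embed, the payload field cannot have been written, and it will come back empty on the receiving side. A stdlib-only Python sketch of that reasoning (pickle standing in for Kryo; all class and variable names are hypothetical, not Spark's):

```python
import pickle

# Hypothetical stand-ins: a "task" payload, and a wrapper that should
# carry the payload's serialized bytes, analogous to TaskDescription
# carrying the serialized Task.
task_bytes = pickle.dumps(list(range(100)))  # the "serialized task"

class BuggyWrapper:
    def __init__(self, buffer):
        self.buffer = buffer

    def __getstate__(self):
        # Simulated serializer bug: the buffer field is dropped, like
        # TaskDescription.buffer not being written out correctly.
        return {"buffer": None}

wrapped = pickle.dumps(BuggyWrapper(task_bytes))

# Symptom from the report: the outer serialized form is SMALLER than
# the payload it should contain...
print(len(wrapped) < len(task_bytes))
# ...and the payload field deserializes to nothing on the "executor" side.
print(pickle.loads(wrapped).buffer)
```

The 132-byte TaskDescription versus its 302-byte contained task is exactly this shape of evidence, pointing at the serializer dropping the buffer rather than at the task itself.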
[jira] [Resolved] (SPARK-7928) Yarn App Master Logs are not displayed in the spark historyserver UI
[ https://issues.apache.org/jira/browse/SPARK-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Shreedharan resolved SPARK-7928. - Resolution: Duplicate This issue was fixed by https://github.com/apache/spark/pull/6166 Can you try that patch and see if it works for you? > Yarn App Master Logs are not displayed in the spark historyserver UI > > > Key: SPARK-7928 > URL: https://issues.apache.org/jira/browse/SPARK-7928 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.2.1, 1.3.1 > Environment: yarn hadoop 2.7.0 >Reporter: Aditya Rao > > in hadoop 2.7.0 the link to App Master Log has been disabled as the YARN Job > History Server would show the app master logs in the Resource Manager UI, > but Spark Historyserver doesn't show the app master logs. > So anyone who is running a spark job in yarn-cluster mode has no way to know > the results other than checking the userlogs manually. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6101) Create a SparkSQL DataSource API implementation for DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Fregly updated SPARK-6101: Description: similar to https://github.com/databricks/spark-avro and https://github.com/databricks/spark-csv (was: similar to https://github.com/databricks/spark-avro) > Create a SparkSQL DataSource API implementation for DynamoDB > > > Key: SPARK-6101 > URL: https://issues.apache.org/jira/browse/SPARK-6101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.2.0 >Reporter: Chris Fregly >Assignee: Chris Fregly > Fix For: 1.5.0 > > > similar to https://github.com/databricks/spark-avro and > https://github.com/databricks/spark-csv -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7927) Enforce whitespace for more tokens in style checker
[ https://issues.apache.org/jira/browse/SPARK-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563820#comment-14563820 ] Apache Spark commented on SPARK-7927: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6478 > Enforce whitespace for more tokens in style checker > --- > > Key: SPARK-7927 > URL: https://issues.apache.org/jira/browse/SPARK-7927 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > > Enforce whitespace on comma, colon, if, while, etc ... so we don't need to > keep spending time on this in code reviews. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7928) Yarn App Master Logs are not displayed in the spark historyserver UI
[ https://issues.apache.org/jira/browse/SPARK-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Rao updated SPARK-7928: -- Description: in hadoop 2.7.0 the link to App Master Log has been disabled as the YARN Job History Server would show the app master logs in the Resource Manager UI, but Spark Historyserver doesn't show the app master logs. So anyone who is running a spark job in yarn-cluster mode has no way to know the results other than checking the userlogs manually. was: in hadoop 2.7.0 the link to App Master Log has been disabled as the YARN Job History Server would show the app master logs, but Spark Historyserver doesn't show the app master logs. So anyone who is running a spark job in yarn-cluster mode has no way to know the results other than checking the userlogs manually. > Yarn App Master Logs are not displayed in the spark historyserver UI > > > Key: SPARK-7928 > URL: https://issues.apache.org/jira/browse/SPARK-7928 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.2.1, 1.3.1 > Environment: yarn hadoop 2.7.0 >Reporter: Aditya Rao > > in hadoop 2.7.0 the link to App Master Log has been disabled as the YARN Job > History Server would show the app master logs in the Resource Manager UI, > but Spark Historyserver doesn't show the app master logs. > So anyone who is running a spark job in yarn-cluster mode has no way to know > the results other than checking the userlogs manually. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7928) Yarn App Master Logs are not displayed in the spark historyserver UI
Aditya Rao created SPARK-7928: - Summary: Yarn App Master Logs are not displayed in the spark historyserver UI Key: SPARK-7928 URL: https://issues.apache.org/jira/browse/SPARK-7928 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.3.1, 1.2.1 Environment: yarn hadoop 2.7.0 Reporter: Aditya Rao In Hadoop 2.7.0, the link to the App Master log has been disabled because the YARN Job History Server shows the app master logs, but the Spark History Server doesn't show the app master logs. So anyone running a Spark job in yarn-cluster mode has no way to know the results other than checking the user logs manually.
[jira] [Commented] (SPARK-7927) Enforce whitespace for more tokens in style checker
[ https://issues.apache.org/jira/browse/SPARK-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563812#comment-14563812 ] Apache Spark commented on SPARK-7927: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6477 > Enforce whitespace for more tokens in style checker > --- > > Key: SPARK-7927 > URL: https://issues.apache.org/jira/browse/SPARK-7927 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > > Enforce whitespace on comma, colon, if, while, etc ... so we don't need to > keep spending time on this in code reviews.
[jira] [Assigned] (SPARK-7921) Change includeFirst to dropLast in OneHotEncoder
[ https://issues.apache.org/jira/browse/SPARK-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7921: --- Assignee: Xiangrui Meng (was: Apache Spark) > Change includeFirst to dropLast in OneHotEncoder > > > Key: SPARK-7921 > URL: https://issues.apache.org/jira/browse/SPARK-7921 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > Change includeFirst to dropLast and keep the default as true. There are > a couple of benefits: > a. consistent with other tutorials on one-hot encoding (or dummy coding) > (e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm) > b. the indices stay unmodified in the output vector. If we drop the first, > all indices will be shifted by 1. > c. If users use StringIndexer, the last element is the least frequent one.
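The index-preservation argument in point (b) above can be illustrated with a small plain-Python sketch (this is not the Spark ML API; the function name and signature here are hypothetical):

```python
# Illustrative sketch of dummy coding with drop-last semantics: encode a
# category index i out of n categories as a vector of length n - 1. The
# dropped (last) category becomes the all-zeros vector, and every other
# category keeps its original index as its position in the output vector.
def one_hot(index, num_categories, drop_last=True):
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:  # the dropped last category encodes as all zeros
        vec[index] = 1.0
    return vec

# With drop_last=True and 3 categories: index 0 stays at position 0,
# index 1 stays at position 1, and index 2 (the last) is all zeros.
print(one_hot(0, 3))  # [1.0, 0.0]
print(one_hot(2, 3))  # [0.0, 0.0]
```

Dropping the first category instead would shift every remaining index down by one, which is exactly what the proposal avoids.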
[jira] [Assigned] (SPARK-7921) Change includeFirst to dropLast in OneHotEncoder
[ https://issues.apache.org/jira/browse/SPARK-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-7921: --- Assignee: Apache Spark (was: Xiangrui Meng) > Change includeFirst to dropLast in OneHotEncoder > > > Key: SPARK-7921 > URL: https://issues.apache.org/jira/browse/SPARK-7921 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Assignee: Apache Spark > > Change includeFirst to dropLast and keep the default as true. There are > a couple of benefits: > a. consistent with other tutorials on one-hot encoding (or dummy coding) > (e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm) > b. the indices stay unmodified in the output vector. If we drop the first, > all indices will be shifted by 1. > c. If users use StringIndexer, the last element is the least frequent one.
[jira] [Commented] (SPARK-7925) Address inconsistencies in capturing appName in different Metrics Sources
[ https://issues.apache.org/jira/browse/SPARK-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563808#comment-14563808 ] Tathagata Das commented on SPARK-7925: -- [~jerryshao] I remember you implemented some of these sources for streaming; do you know why the naming has this inconsistency between core and streaming? > Address inconsistencies in capturing appName in different Metrics Sources > - > > Key: SPARK-7925 > URL: https://issues.apache.org/jira/browse/SPARK-7925 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.1 >Reporter: Bharat Venkat > > StreamingSource and ApplicationSource capture the appName, but the rest > of the sources (DAGSchedulerSource, ExecutorSource, etc.) do not. Capturing > the appName allows us to automate monitoring of the metrics for an application. > It would be good if appName were consistently captured across all Spark metrics.
[jira] [Commented] (SPARK-7927) Enforce whitespace for more tokens in style checker
[ https://issues.apache.org/jira/browse/SPARK-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563801#comment-14563801 ] Apache Spark commented on SPARK-7927: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6476 > Enforce whitespace for more tokens in style checker > --- > > Key: SPARK-7927 > URL: https://issues.apache.org/jira/browse/SPARK-7927 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > > Enforce whitespace on comma, colon, if, while, etc ... so we don't need to > keep spending time on this in code reviews.
[jira] [Commented] (SPARK-7927) Enforce whitespace for more tokens in style checker
[ https://issues.apache.org/jira/browse/SPARK-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563787#comment-14563787 ] Apache Spark commented on SPARK-7927: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6475 > Enforce whitespace for more tokens in style checker > --- > > Key: SPARK-7927 > URL: https://issues.apache.org/jira/browse/SPARK-7927 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > > Enforce whitespace on comma, colon, if, while, etc ... so we don't need to > keep spending time on this in code reviews.
[jira] [Commented] (SPARK-7708) Incorrect task serialization with Kryo closure serializer
[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563786#comment-14563786 ] Akshat Aranya commented on SPARK-7708: -- [~joshrosen] I tried the test once again with your new code merged in, and it seems like it's not a problem with resetting the Kryo object. In my test, I serialize the same object twice with the same KryoSerializerInstance, but I end up with two different serialized buffers: {noformat} serialized.limit=369 serialized.limit=366 {noformat} Clearly, there is some state inside the serializer that isn't reset even after calling {{reset()}}. > Incorrect task serialization with Kryo closure serializer > - > > Key: SPARK-7708 > URL: https://issues.apache.org/jira/browse/SPARK-7708 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.2 >Reporter: Akshat Aranya > > I've been investigating the use of Kryo for closure serialization with Spark > 1.2, and it seems like I've hit upon a bug: > When a task is serialized before scheduling, the following log message is > generated: > [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, > , PROCESS_LOCAL, 302 bytes) > This message comes from TaskSetManager, which serializes the task using the > closure serializer. Before the message is sent out, the TaskDescription > (which includes the original task as a byte array) is serialized again into > a byte array with the closure serializer. I added a log message for this in > CoarseGrainedSchedulerBackend, which produces the following output: > [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132 > The serialized size of the TaskDescription (132 bytes) turns out to be _smaller_ > than the serialized task that it contains (302 bytes). This implies that > TaskDescription.buffer is not getting serialized correctly. > On the executor side, deserialization produces a null value for > TaskDescription.buffer.
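The stateful-serializer behavior described in the comment above has a close analogue in Python's pickle module: a reused Pickler memoizes objects it has already written, so serializing the same object twice with one instance yields different byte counts until the memo is explicitly cleared. This is pickle, not Kryo (Kryo's reference tracking is a different mechanism), but it is a rough, runnable illustration of why reusing one serializer instance without a full reset changes the output:

```python
import io
import pickle

# One Pickler instance reused for two dumps of the same object, mirroring
# reuse of a single serializer instance. The memo table persists across
# dump() calls, so the second serialization is mostly a back-reference
# and comes out smaller -- state that only an explicit reset clears.
obj = {"payload": "x" * 100}
buf = io.BytesIO()
pickler = pickle.Pickler(buf)

pickler.dump(obj)
first = buf.tell()            # bytes written by the first dump
pickler.dump(obj)
second = buf.tell() - first   # bytes written by the second dump

pickler.clear_memo()          # the pickle analogue of a serializer reset
pickler.dump(obj)
third = buf.tell() - first - second

print(first, second, third)   # second < first; third matches first again
```

The point of the analogy: identical inputs through the same serializer instance are only guaranteed identical output if the reset actually clears all internal state.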
[jira] [Commented] (SPARK-7927) Enforce whitespace for more tokens in style checker
[ https://issues.apache.org/jira/browse/SPARK-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563780#comment-14563780 ] Apache Spark commented on SPARK-7927: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/6474 > Enforce whitespace for more tokens in style checker > --- > > Key: SPARK-7927 > URL: https://issues.apache.org/jira/browse/SPARK-7927 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Reynold Xin >Assignee: Reynold Xin > > Enforce whitespace on comma, colon, if, while, etc ... so we don't need to > keep spending time on this in code reviews.