[jira] [Commented] (SPARK-12452) Add exception details to TaskCompletionListener/TaskContext
[ https://issues.apache.org/jira/browse/SPARK-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451171#comment-15451171 ] Jagadeesan A S commented on SPARK-12452: [~srowen] Any suggestions from your side? Can I take it up? > Add exception details to TaskCompletionListener/TaskContext > --- > > Key: SPARK-12452 > URL: https://issues.apache.org/jira/browse/SPARK-12452 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 1.5.2 > Reporter: Neelesh Shastry > Priority: Minor > > TaskCompletionListeners are called without success/failure details. > If we change this > {code} > trait TaskCompletionListener extends EventListener { > def onTaskCompletion(context: TaskContext) > } > class TaskContextImpl { > > private[spark] def markTaskCompleted(throwable: Option[Throwable]): Unit > > listener.onTaskCompletion(this, throwable) > } > {code} > to something like > {code} > trait TaskCompletionListener extends EventListener { > def onTaskCompletion(context: TaskContext, throwable: Option[Throwable] = None) > } > {code} > .. and in Task.scala > {code} > var throwable: Option[Throwable] = None > try { > runTask(context) > } catch { > case t: Throwable => throwable = Some(t) > } > finally { > context.markTaskCompleted(throwable) > TaskContext.unset() > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
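The proposed listener change can be sketched as a self-contained mock. These are hypothetical stand-ins, not real Spark classes; only the signatures follow the JIRA proposal, and the try/catch/finally shape mirrors the suggested Task.scala change:

```scala
import java.util.EventListener
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Spark's TaskContext (not the real class).
class TaskContext

trait TaskCompletionListener extends EventListener {
  // Proposed signature: listeners also receive the task's failure, if any.
  def onTaskCompletion(context: TaskContext, throwable: Option[Throwable] = None): Unit
}

class TaskContextImpl extends TaskContext {
  private val listeners = ArrayBuffer.empty[TaskCompletionListener]
  def addTaskCompletionListener(l: TaskCompletionListener): Unit = listeners += l
  // Forwards the optional failure to every registered listener.
  def markTaskCompleted(throwable: Option[Throwable]): Unit =
    listeners.foreach(_.onTaskCompletion(this, throwable))
}

// Mirrors the try/catch/finally structure proposed for Task.scala.
def runWithListeners(context: TaskContextImpl)(body: => Unit): Unit = {
  var throwable: Option[Throwable] = None
  try {
    body
  } catch {
    case t: Throwable => throwable = Some(t)
  } finally {
    context.markTaskCompleted(throwable)
  }
}

val ctx = new TaskContextImpl
var seen: Option[Throwable] = None
ctx.addTaskCompletionListener(new TaskCompletionListener {
  def onTaskCompletion(c: TaskContext, t: Option[Throwable]): Unit = seen = t
})
runWithListeners(ctx) { throw new RuntimeException("boom") }
```

With this shape a listener can distinguish success (`None`) from failure (`Some(t)`) without any change for callers that ignore the second parameter.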
[jira] [Resolved] (SPARK-15985) Reduce runtime overhead of a program that reads a primitive array in Dataset
[ https://issues.apache.org/jira/browse/SPARK-15985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-15985. - Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 13704 [https://github.com/apache/spark/pull/13704] > Reduce runtime overhead of a program that reads a primitive array in Dataset > - > > Key: SPARK-15985 > URL: https://issues.apache.org/jira/browse/SPARK-15985 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Kazuaki Ishizaki > Assignee: Kazuaki Ishizaki > Fix For: 2.1.0 > > > When a program reads an array in a Dataset, the code generator creates some copy > operations. If the array is of a primitive type, there are opportunities > for optimizations in the generated code to reduce runtime overhead. > {code} > val ds = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0)).toDS() > ds.map(p => { > var s = 0.0 > for (i <- 0 to 2) { s += p(i) } > s > }).show > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
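Outside of Spark, the computation in the snippet reduces to a plain-Scala sum over primitive arrays; a minimal equivalent (no Dataset, no code generation involved) is:

```scala
// Plain-Scala equivalent of the Dataset example: sum the first three
// elements of each primitive Double array.
val data = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0))
val sums = data.map { p =>
  var s = 0.0
  for (i <- 0 to 2) s += p(i)
  s
}
// sums is Seq(6.0, 15.0)
```

The optimization in the issue is about making Spark's generated code for the Dataset version approach the cost of this direct loop, avoiding per-row copies of the primitive array.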
[jira] [Updated] (SPARK-15985) Reduce runtime overhead of a program that reads a primitive array in Dataset
[ https://issues.apache.org/jira/browse/SPARK-15985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15985: Assignee: Kazuaki Ishizaki > Reduce runtime overhead of a program that reads a primitive array in Dataset > - > > Key: SPARK-15985 > URL: https://issues.apache.org/jira/browse/SPARK-15985 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Kazuaki Ishizaki > Assignee: Kazuaki Ishizaki > Fix For: 2.1.0 > > > When a program reads an array in a Dataset, the code generator creates some copy > operations. If the array is of a primitive type, there are opportunities > for optimizations in the generated code to reduce runtime overhead. > {code} > val ds = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0)).toDS() > ds.map(p => { > var s = 0.0 > for (i <- 0 to 2) { s += p(i) } > s > }).show > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451091#comment-15451091 ] Joseph K. Bradley commented on SPARK-5992: -- Awesome, thanks! I made some comments, but it looks like a good approach overall. > Locality Sensitive Hashing (LSH) for MLlib > -- > > Key: SPARK-5992 > URL: https://issues.apache.org/jira/browse/SPARK-5992 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Joseph K. Bradley > > Locality Sensitive Hashing (LSH) would be very useful for ML. It would be > great to discuss some possible algorithms here, choose an API, and make a PR > for an initial algorithm. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17325) Inconsistent Spillable threshold and AppendOnlyMap growing threshold may trigger out-of-memory errors
Lijie Xu created SPARK-17325: Summary: Inconsistent Spillable threshold and AppendOnlyMap growing threshold may trigger out-of-memory errors Key: SPARK-17325 URL: https://issues.apache.org/jira/browse/SPARK-17325 Project: Spark Issue Type: Bug Components: Shuffle, Spark Core Affects Versions: 2.0.0, 1.6.2 Reporter: Lijie Xu I have been reading the shuffle source code and suspect that there may be a potential out-of-memory error in ExternalSorter. The problem is that the memory usage of AppendOnlyMap (i.e., PartitionedAppendOnlyMap in ExternalSorter) can greatly exceed its spillable threshold (i.e., `currentMemory` can be 2 times the size of `myMemoryThreshold` in `Spillable.maybeSpill()`). This means that the task's current execution memory usage (AppendOnlyMap) has greatly exceeded its defined execution memory limit ((1 - spark.memory.storageFraction) * 1 / #taskNum), which can lead to out-of-memory errors. Example: the current spillable threshold has grown to 250MB, while the AppendOnlyMap is 200MB. At this point, an incoming key/value record triggers AppendOnlyMap's size expansion (the AppendOnlyMap is full). After expansion, the AppendOnlyMap may become 400MB (or slightly smaller), which is far larger than both the spillable threshold and the execution memory limit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
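The interaction described above can be sketched numerically. The constants are the illustrative ones from the report, and `maybeSpill` here is a drastically simplified stand-in for the real `Spillable.maybeSpill()` logic:

```scala
// Illustrative numbers from the report: spill threshold 250MB, map at 200MB.
val spillThresholdMb = 250L
var mapFootprintMb = 200L

// Simplified stand-in for Spillable.maybeSpill(): spill only once the
// observed footprint reaches the threshold.
def maybeSpill(currentMb: Long): Boolean = currentMb >= spillThresholdMb

// Before the insert: under the threshold, so no spill is triggered.
val spilledBefore = maybeSpill(mapFootprintMb)

// An incoming record finds the map full; the backing array doubles, so the
// footprint jumps well past the threshold before maybeSpill() is consulted.
mapFootprintMb *= 2
val overshoot = mapFootprintMb - spillThresholdMb
```

Since growth happens in one doubling step between spill checks, the overshoot (150MB here) is memory the task consumes beyond its execution-memory budget, which is exactly the OOM risk the issue describes.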
[jira] [Assigned] (SPARK-17324) Remove Direct Usage of HiveClient in InsertIntoHiveTable
[ https://issues.apache.org/jira/browse/SPARK-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17324: Assignee: (was: Apache Spark) > Remove Direct Usage of HiveClient in InsertIntoHiveTable > > > Key: SPARK-17324 > URL: https://issues.apache.org/jira/browse/SPARK-17324 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li > > This is another step to get rid of HiveClient from `HiveSessionState`. All > the metastore interactions should be through `ExternalCatalog` interface. > However, the existing implementation of `InsertIntoHiveTable ` still requires > Hive clients. Thus, we can remove HiveClient by moving the metastore > interactions into `ExternalCatalog`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17324) Remove Direct Usage of HiveClient in InsertIntoHiveTable
[ https://issues.apache.org/jira/browse/SPARK-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17324: Assignee: Apache Spark > Remove Direct Usage of HiveClient in InsertIntoHiveTable > > > Key: SPARK-17324 > URL: https://issues.apache.org/jira/browse/SPARK-17324 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li >Assignee: Apache Spark > > This is another step to get rid of HiveClient from `HiveSessionState`. All > the metastore interactions should be through `ExternalCatalog` interface. > However, the existing implementation of `InsertIntoHiveTable ` still requires > Hive clients. Thus, we can remove HiveClient by moving the metastore > interactions into `ExternalCatalog`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17324) Remove Direct Usage of HiveClient in InsertIntoHiveTable
[ https://issues.apache.org/jira/browse/SPARK-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450913#comment-15450913 ] Apache Spark commented on SPARK-17324: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/14888 > Remove Direct Usage of HiveClient in InsertIntoHiveTable > > > Key: SPARK-17324 > URL: https://issues.apache.org/jira/browse/SPARK-17324 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li > > This is another step to get rid of HiveClient from `HiveSessionState`. All > the metastore interactions should be through `ExternalCatalog` interface. > However, the existing implementation of `InsertIntoHiveTable ` still requires > Hive clients. Thus, we can remove HiveClient by moving the metastore > interactions into `ExternalCatalog`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17324) Remove Direct Usage of HiveClient in InsertIntoHiveTable
Xiao Li created SPARK-17324: --- Summary: Remove Direct Usage of HiveClient in InsertIntoHiveTable Key: SPARK-17324 URL: https://issues.apache.org/jira/browse/SPARK-17324 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.0 Reporter: Xiao Li This is another step to get rid of HiveClient from `HiveSessionState`. All the metastore interactions should go through the `ExternalCatalog` interface. However, the existing implementation of `InsertIntoHiveTable` still requires Hive clients. Thus, we can remove HiveClient by moving the metastore interactions into `ExternalCatalog`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl
[ https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-17318. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.1.0 2.0.1 > Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class > defined in repl > > > Key: SPARK-17318 > URL: https://issues.apache.org/jira/browse/SPARK-17318 > Project: Spark > Issue Type: Test >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.1, 2.1.0 > > > There are a lot of failures recently: > http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?
[ https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450836#comment-15450836 ] Vladimir Feinberg commented on SPARK-15575: --- Some of the biggest issues with Breeze performance I've experienced are that a lot of operations you'd expect to be fast are not, and its pretty syntax and heavy use of implicits make it easy to hit these cases accidentally. For instance: 1. Mixed dense/sparse operations frequently resort to a generic implementation in Breeze that uses its Scala iterators. 2. Creation of vectors, under certain operations, will result in unnecessary boxing of doubles (and integers, for sparse vectors). 3. Slice vectors have no support for efficient operations. They are implemented in Breeze in a way that makes them no better than Array[Double], which again makes us use Scala iterators whenever we want a fast, vectorized dot product, for instance. Usability is tough sometimes. Even though a Vector[Double] interface seems flexible, a lot of implementations require explicit knowledge of the vector type (sparse/dense), else Breeze silently uses the slow Scala interface. Heavy use of implicits is also a problem here, since they're not implemented for all permutations of vector types. It's also easy to write, e.g., `vec1 += vec2 * a * b`, which creates two intermediate vectors. I think the biggest issue is that `ml.linalg.Vector` is Breeze-backed. We should use our own linear algebra (we do have `BLAS`, though to support slicing this interface would have to be expanded) and move around `ArrayView[Double]` inside the vector instead. Breeze as a dependency, as mentioned below, is still very useful for optimization. I think we can keep it around for that, as long as it's only for that. > Remove breeze from dependencies? > > > Key: SPARK-15575 > URL: https://issues.apache.org/jira/browse/SPARK-15575 > Project: Spark > Issue Type: Improvement > Components: ML > Reporter: Joseph K. Bradley > > This JIRA is for discussing whether we should remove Breeze from the > dependencies of MLlib. The main issues with Breeze are Scala 2.12 support > and performance. > There are a few paths: > # Keep dependency. This could be OK, especially if the Scala version issues > are fixed within Breeze. > # Remove dependency > ## Implement our own linear algebra operators as needed > ## Design a way to build Spark using custom linalg libraries of the user's > choice. E.g., you could build MLlib using Breeze, or any other library > supporting the required operations. This might require significant work. > See [SPARK-6442] for related discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
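The generic-fallback point in the comment can be illustrated with a hypothetical micro-sketch (not Breeze code): a dot product written against a generic `Iterable[Double]` goes through boxed iterators, while the specialized `Array[Double]` loop stays on primitives. Both compute the same value; only the execution path differs:

```scala
// Generic path: iterator-based, boxes every Double element.
def genericDot(a: Iterable[Double], b: Iterable[Double]): Double =
  a.iterator.zip(b.iterator).map { case (x, y) => x * y }.sum

// Specialized path: primitive while-loop over the backing arrays,
// no boxing and no iterator allocation.
def primitiveDot(a: Array[Double], b: Array[Double]): Double = {
  var s = 0.0
  var i = 0
  while (i < a.length) { s += a(i) * b(i); i += 1 }
  s
}

val x = Array(1.0, 2.0, 3.0)
val y = Array(4.0, 5.0, 6.0)
```

This is the shape of the argument for owning the linear algebra layer: when the concrete representation is known, the fast path can always be taken, instead of depending on the right implicit or subtype being picked up.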
[jira] [Commented] (SPARK-17323) ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.
[ https://issues.apache.org/jira/browse/SPARK-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450742#comment-15450742 ] Apache Spark commented on SPARK-17323: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/14874 > ALTER VIEW AS should keep the previous table properties, comment, > create_time, etc. > --- > > Key: SPARK-17323 > URL: https://issues.apache.org/jira/browse/SPARK-17323 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17323) ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.
[ https://issues.apache.org/jira/browse/SPARK-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17323: Assignee: Wenchen Fan (was: Apache Spark) > ALTER VIEW AS should keep the previous table properties, comment, > create_time, etc. > --- > > Key: SPARK-17323 > URL: https://issues.apache.org/jira/browse/SPARK-17323 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17323) ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.
[ https://issues.apache.org/jira/browse/SPARK-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17323: Assignee: Apache Spark (was: Wenchen Fan) > ALTER VIEW AS should keep the previous table properties, comment, > create_time, etc. > --- > > Key: SPARK-17323 > URL: https://issues.apache.org/jira/browse/SPARK-17323 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17323) ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.
Wenchen Fan created SPARK-17323: --- Summary: ALTER VIEW AS should keep the previous table properties, comment, create_time, etc. Key: SPARK-17323 URL: https://issues.apache.org/jira/browse/SPARK-17323 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16283) Implement percentile_approx SQL function
[ https://issues.apache.org/jira/browse/SPARK-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450696#comment-15450696 ] Apache Spark commented on SPARK-16283: -- User 'clockfly' has created a pull request for this issue: https://github.com/apache/spark/pull/14868 > Implement percentile_approx SQL function > > > Key: SPARK-16283 > URL: https://issues.apache.org/jira/browse/SPARK-16283 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14742) Redirect spark-ec2 doc to new location
[ https://issues.apache.org/jira/browse/SPARK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450632#comment-15450632 ] Shivaram Venkataraman commented on SPARK-14742: --- To follow up, my guess was right - there is an ec2-scripts.html in https://github.com/apache/spark-website/tree/asf-site/site/docs/2.0.0-preview but the one in https://github.com/apache/spark-website/tree/asf-site/site/docs/2.0.0 is only a markdown file. I don't know if there is a simple way to generate just a single HTML file, though, instead of updating all of the docs. Also, [~srowen], are we doing PRs for the website now? > Redirect spark-ec2 doc to new location > -- > > Key: SPARK-14742 > URL: https://issues.apache.org/jira/browse/SPARK-14742 > Project: Spark > Issue Type: Documentation > Components: Documentation, EC2 > Reporter: Nicholas Chammas > Assignee: Sean Owen > Priority: Trivial > Fix For: 2.0.0 > > > See: https://github.com/amplab/spark-ec2/pull/24#issuecomment-212033453 > We need to redirect this page > http://spark.apache.org/docs/latest/ec2-scripts.html > to this page > https://github.com/amplab/spark-ec2#readme -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14742) Redirect spark-ec2 doc to new location
[ https://issues.apache.org/jira/browse/SPARK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450626#comment-15450626 ] Nicholas Chammas commented on SPARK-14742: -- {quote} Otherwise the only way to get to this link is if you have it bookmarked. {quote} Or any page that has linked to it in the past. All those links are now broken. That's my main concern. > Redirect spark-ec2 doc to new location > -- > > Key: SPARK-14742 > URL: https://issues.apache.org/jira/browse/SPARK-14742 > Project: Spark > Issue Type: Documentation > Components: Documentation, EC2 >Reporter: Nicholas Chammas >Assignee: Sean Owen >Priority: Trivial > Fix For: 2.0.0 > > > See: https://github.com/amplab/spark-ec2/pull/24#issuecomment-212033453 > We need to redirect this page > http://spark.apache.org/docs/latest/ec2-scripts.html > to this page > https://github.com/amplab/spark-ec2#readme -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14742) Redirect spark-ec2 doc to new location
[ https://issues.apache.org/jira/browse/SPARK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450619#comment-15450619 ] Shivaram Venkataraman commented on SPARK-14742: --- A couple of things I noticed: - It seems to work fine with 2.0.0-preview (i.e., http://spark.apache.org/docs/2.0.0-preview/ec2-scripts.html) -- so I'm wondering if it's just an issue of some files not being copied correctly? - Can we bring back 'Amazon EC2' in the drop-down for deploying? Otherwise the only way to get to this link is if you have it bookmarked. > Redirect spark-ec2 doc to new location > -- > > Key: SPARK-14742 > URL: https://issues.apache.org/jira/browse/SPARK-14742 > Project: Spark > Issue Type: Documentation > Components: Documentation, EC2 > Reporter: Nicholas Chammas > Assignee: Sean Owen > Priority: Trivial > Fix For: 2.0.0 > > > See: https://github.com/amplab/spark-ec2/pull/24#issuecomment-212033453 > We need to redirect this page > http://spark.apache.org/docs/latest/ec2-scripts.html > to this page > https://github.com/amplab/spark-ec2#readme -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14155) Hide UserDefinedType in Spark 2.0
[ https://issues.apache.org/jira/browse/SPARK-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450612#comment-15450612 ] Robert Conrad commented on SPARK-14155: --- [~r...@databricks.com] echoing Maciej above - is there any progress on UDTs for datasets or a jira ticket we can follow? > Hide UserDefinedType in Spark 2.0 > - > > Key: SPARK-14155 > URL: https://issues.apache.org/jira/browse/SPARK-14155 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > > UserDefinedType is a developer API in Spark 1.x. > With very high probability we will create a new API for user-defined type > that also works well with column batches as well as encoders (datasets). In > Spark 2.0, let's make UserDefinedType private[spark] first. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14742) Redirect spark-ec2 doc to new location
[ https://issues.apache.org/jira/browse/SPARK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450602#comment-15450602 ] Nicholas Chammas commented on SPARK-14742: -- http://spark.apache.org/docs/latest/ec2-scripts.html I am seeing that this URL is not redirecting to the new spark-ec2 location on GitHub. [~srowen] - Can you fix that? I can see that we have some kind of redirect setup, but I guess it's not working. https://github.com/apache/spark/blob/master/docs/ec2-scripts.md > Redirect spark-ec2 doc to new location > -- > > Key: SPARK-14742 > URL: https://issues.apache.org/jira/browse/SPARK-14742 > Project: Spark > Issue Type: Documentation > Components: Documentation, EC2 >Reporter: Nicholas Chammas >Assignee: Sean Owen >Priority: Trivial > Fix For: 2.0.0 > > > See: https://github.com/amplab/spark-ec2/pull/24#issuecomment-212033453 > We need to redirect this page > http://spark.apache.org/docs/latest/ec2-scripts.html > to this page > https://github.com/amplab/spark-ec2#readme -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17322) 'ANY n' clause for SQL queries to increase the ease of use of WHERE clause predicates
Suman Somasundar created SPARK-17322: Summary: 'ANY n' clause for SQL queries to increase the ease of use of WHERE clause predicates Key: SPARK-17322 URL: https://issues.apache.org/jira/browse/SPARK-17322 Project: Spark Issue Type: Improvement Components: SQL Reporter: Suman Somasundar Priority: Minor If the user wants the results that meet any n of m WHERE-clause predicates (m > n), then an 'ANY n' clause greatly simplifies writing the SQL query. An example is given below: select symbol from stocks where (market_cap > 5.7b, analysts_recommend > 10, moving_avg > 49.2, pe_ratio > 15.4) ANY 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
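Assuming the semantics described (a row qualifies when at least n of the m predicates hold), 'ANY n' can be emulated today by counting satisfied predicates. A plain-Scala sketch of those semantics, with a hypothetical `Stock` type and `anyN` helper standing in for the proposed clause:

```scala
// Hypothetical row type mirroring the columns in the example query.
case class Stock(symbol: String, marketCapB: Double, analystsRecommend: Int,
                 movingAvg: Double, peRatio: Double)

// A row matches when at least n of the given predicates hold.
def anyN[A](n: Int, predicates: Seq[A => Boolean])(row: A): Boolean =
  predicates.count(p => p(row)) >= n

val preds: Seq[Stock => Boolean] = Seq(
  _.marketCapB > 5.7, _.analystsRecommend > 10,
  _.movingAvg > 49.2, _.peRatio > 15.4)

val stocks = Seq(
  Stock("AAA", 6.0, 12, 50.0, 10.0), // satisfies 3 of the 4 predicates
  Stock("BBB", 1.0, 2, 10.0, 5.0))   // satisfies 0 of the 4 predicates

val matches = stocks.filter(anyN(3, preds)).map(_.symbol)
```

In SQL the same effect can be had by summing `CASE WHEN pred THEN 1 ELSE 0 END` terms and comparing the sum against n; the proposed clause is syntactic sugar over that pattern.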
[jira] [Assigned] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
[ https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17321: Assignee: (was: Apache Spark) > YARN shuffle service should use good disk from yarn.nodemanager.local-dirs > -- > > Key: SPARK-17321 > URL: https://issues.apache.org/jira/browse/SPARK-17321 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.2, 2.0.0 >Reporter: yunjiong zhao > > We run spark on yarn, after enabled spark dynamic allocation, we notice some > spark application failed randomly due to YarnShuffleService. > From log I found > {quote} > 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: > Error while initializing Netty pipeline > java.lang.NullPointerException > at > org.apache.spark.network.server.TransportRequestHandler.(TransportRequestHandler.java:77) > at > org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159) > at > org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135) > at > org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123) > at > org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116) > at > io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424) > at > 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > {quote} > This was caused by the first disk in yarn.nodemanager.local-dirs being broken. > If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose > hundreds of nodes, which is unacceptable. > We have 12 disks in yarn.nodemanager.local-dirs, so why not use another good > disk if the first one is broken? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
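The suggestion amounts to scanning the configured directories for the first healthy one rather than unconditionally taking the first entry. A minimal sketch of that selection, where `isUsable` is a hypothetical placeholder for a real disk-health check (the actual shuffle service would need something more robust than a writability test):

```scala
import java.io.File

// Hypothetical health check: the directory exists and is writable.
def isUsable(dir: File): Boolean = dir.isDirectory && dir.canWrite

// Pick the first usable directory from yarn.nodemanager.local-dirs,
// instead of failing when the first entry sits on a broken disk.
def firstGoodDir(localDirs: Seq[File]): Option[File] =
  localDirs.find(isUsable)

// Demo: the first "disk" is broken (nonexistent), the second is fine.
val tmp = new File(System.getProperty("java.io.tmpdir"))
val dirs = Seq(new File("/nonexistent-disk-0"), tmp)
val chosen = firstGoodDir(dirs)
```

Returning `Option` keeps the all-disks-broken case explicit, so the caller can still decide between failing the node and degrading gracefully.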
[jira] [Commented] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
[ https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450424#comment-15450424 ] Apache Spark commented on SPARK-17321: -- User 'zhaoyunjiong' has created a pull request for this issue: https://github.com/apache/spark/pull/14887 > YARN shuffle service should use good disk from yarn.nodemanager.local-dirs > -- > > Key: SPARK-17321 > URL: https://issues.apache.org/jira/browse/SPARK-17321 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.2, 2.0.0 >Reporter: yunjiong zhao > > We run spark on yarn, after enabled spark dynamic allocation, we notice some > spark application failed randomly due to YarnShuffleService. > From log I found > {quote} > 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: > Error while initializing Netty pipeline > java.lang.NullPointerException > at > org.apache.spark.network.server.TransportRequestHandler.(TransportRequestHandler.java:77) > at > org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159) > at > org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135) > at > org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123) > at > org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116) > at > io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378) > at > 
io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > {quote} > This was caused by the first disk in yarn.nodemanager.local-dirs being broken. > If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose > hundreds of nodes, which is unacceptable. > We have 12 disks in yarn.nodemanager.local-dirs, so why not use another good > disk if the first one is broken? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
[ https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17321: Assignee: Apache Spark > YARN shuffle service should use good disk from yarn.nodemanager.local-dirs > -- > > Key: SPARK-17321 > URL: https://issues.apache.org/jira/browse/SPARK-17321 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.2, 2.0.0 >Reporter: yunjiong zhao >Assignee: Apache Spark > > We run spark on yarn, after enabled spark dynamic allocation, we notice some > spark application failed randomly due to YarnShuffleService. > From log I found > {quote} > 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: > Error while initializing Netty pipeline > java.lang.NullPointerException > at > org.apache.spark.network.server.TransportRequestHandler.(TransportRequestHandler.java:77) > at > org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159) > at > org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135) > at > org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123) > at > org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116) > at > io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424) > at > 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > {quote} > This was caused by the first disk in yarn.nodemanager.local-dirs being broken. > If we enable spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose > hundreds of nodes, which is unacceptable. > We have 12 disks in yarn.nodemanager.local-dirs, so why not use another good > disk if the first one is broken?
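The fix the reporter suggests can be sketched as a small helper that scans the configured yarn.nodemanager.local-dirs entries and returns the first directory that is actually usable, instead of unconditionally taking the first entry. This is only an illustration of the idea, not the actual YarnShuffleService code; the class name and the write-probe check are assumptions.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical sketch: pick the first usable directory from a comma-separated
// yarn.nodemanager.local-dirs value instead of blindly using the first entry.
public class LocalDirSelector {
    // Returns the first entry that is (or can be created as) a writable directory.
    public static Optional<String> firstGoodDir(String localDirs) {
        return Arrays.stream(localDirs.split(","))
                .map(String::trim)
                .filter(d -> !d.isEmpty())
                .filter(d -> {
                    File f = new File(d);
                    return (f.isDirectory() || f.mkdirs()) && f.canWrite();
                })
                .findFirst();
    }
}
```

With this shape, a broken first disk is simply skipped rather than producing a NullPointerException deep inside the Netty pipeline setup.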
[jira] [Created] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
yunjiong zhao created SPARK-17321: - Summary: YARN shuffle service should use good disk from yarn.nodemanager.local-dirs Key: SPARK-17321 URL: https://issues.apache.org/jira/browse/SPARK-17321 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 2.0.0, 1.6.2 Reporter: yunjiong zhao We run Spark on YARN. After enabling Spark dynamic allocation, we noticed some Spark applications fail randomly due to the YarnShuffleService. From the log I found {quote} 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: Error while initializing Netty pipeline java.lang.NullPointerException at org.apache.spark.network.server.TransportRequestHandler.<init>(TransportRequestHandler.java:77) at org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159) at org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135) at org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123) at org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116) at io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119) at io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733) at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450) at io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378) at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at 
java.lang.Thread.run(Thread.java:745) {quote} This was caused by the first disk in yarn.nodemanager.local-dirs being broken. If we enable spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose hundreds of nodes, which is unacceptable. We have 12 disks in yarn.nodemanager.local-dirs, so why not use another good disk if the first one is broken?
[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450270#comment-15450270 ] Apache Spark commented on SPARK-17243: -- User 'ajbozarth' has created a pull request for this issue: https://github.com/apache/spark/pull/14886 > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu >Assignee: Alex Bozarth > Fix For: 2.1.0 > > > The summary page of the Spark 2.0 history server web UI keeps displaying > "Loading history summary..." indefinitely and crashes the browser when there > are more than 10K application history event logs on HDFS. > I did some investigation: the "historypage.js" file sends a REST request to > the /api/v1/applications endpoint of the history server and gets back a JSON > response. When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. With only > hundreds or thousands of application histories it runs fine.
[jira] [Resolved] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-17243. --- Resolution: Fixed Assignee: Alex Bozarth Fix Version/s: 2.1.0 > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu >Assignee: Alex Bozarth > Fix For: 2.1.0 > > > The summary page of the Spark 2.0 history server web UI keeps displaying > "Loading history summary..." indefinitely and crashes the browser when there > are more than 10K application history event logs on HDFS. > I did some investigation: the "historypage.js" file sends a REST request to > the /api/v1/applications endpoint of the history server and gets back a JSON > response. When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. With only > hundreds or thousands of application histories it runs fine.
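One generic way to keep such a summary page responsive with 10K+ entries is to page the application list on the server side instead of shipping the whole list in one JSON response. The sketch below shows only the paging arithmetic; the class and method names are invented for illustration and are not the actual fix in the linked pull request.

```java
import java.util.List;

// Hypothetical sketch of server-side paging for an application list: return
// one page at a time so the browser never has to parse and render all 10K+
// summaries at once.
public class AppListPager {
    // Returns the sublist for the given 1-based page, or an empty list when
    // the page is out of range.
    public static <T> List<T> page(List<T> apps, int pageNumber, int pageSize) {
        int from = (pageNumber - 1) * pageSize;
        if (from < 0 || from >= apps.size()) {
            return List.of();
        }
        int to = Math.min(from + pageSize, apps.size());
        return apps.subList(from, to);
    }
}
```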
[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450194#comment-15450194 ] Ewan Leith commented on SPARK-17313: I think Apache Zeppelin and Spark Notebook both cover this requirement better than the Spark shell ever will? The installation requirements for either are fairly minimal, and they give you all sorts of additional benefits over the raw shell. > Support spark-shell on cluster mode > --- > > Key: SPARK-17313 > URL: https://issues.apache.org/jira/browse/SPARK-17313 > Project: Spark > Issue Type: New Feature >Reporter: Mahmoud Elgamal > > The main issue with the current Spark shell is that the driver runs on the > user's machine. If the driver's resource requirements exceed the user > machine's capacity, the Spark shell is useless. If we add cluster mode (YARN > or Mesos) for the Spark shell via some sort of proxy, where the user machine > only hosts a REST client to the driver running on the cluster, the shell will > be much more powerful
[jira] [Assigned] (SPARK-17320) Spark Mesos module not building on PRs
[ https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17320: Assignee: Apache Spark > Spark Mesos module not building on PRs > -- > > Key: SPARK-17320 > URL: https://issues.apache.org/jira/browse/SPARK-17320 > Project: Spark > Issue Type: Task > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Michael Gummelt >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17320) Spark Mesos module not building on PRs
[ https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-17320: --- Fix Version/s: (was: 2.0.1) > Spark Mesos module not building on PRs > -- > > Key: SPARK-17320 > URL: https://issues.apache.org/jira/browse/SPARK-17320 > Project: Spark > Issue Type: Task > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Michael Gummelt > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17320) Spark Mesos module not building on PRs
[ https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-17320: --- Affects Version/s: (was: 2.0.0) 2.1.0 > Spark Mesos module not building on PRs > -- > > Key: SPARK-17320 > URL: https://issues.apache.org/jira/browse/SPARK-17320 > Project: Spark > Issue Type: Task > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Michael Gummelt > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17320) Spark Mesos module not building on PRs
[ https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450113#comment-15450113 ] Apache Spark commented on SPARK-17320: -- User 'mgummelt' has created a pull request for this issue: https://github.com/apache/spark/pull/14885 > Spark Mesos module not building on PRs > -- > > Key: SPARK-17320 > URL: https://issues.apache.org/jira/browse/SPARK-17320 > Project: Spark > Issue Type: Task > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Michael Gummelt > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17320) Spark Mesos module not building on PRs
[ https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17320: Assignee: (was: Apache Spark) > Spark Mesos module not building on PRs > -- > > Key: SPARK-17320 > URL: https://issues.apache.org/jira/browse/SPARK-17320 > Project: Spark > Issue Type: Task > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Michael Gummelt > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17320) Spark Mesos module not building on PRs
Michael Gummelt created SPARK-17320: --- Summary: Spark Mesos module not building on PRs Key: SPARK-17320 URL: https://issues.apache.org/jira/browse/SPARK-17320 Project: Spark Issue Type: Task Components: Mesos Affects Versions: 2.0.0 Reporter: Michael Gummelt Fix For: 2.0.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl
[ https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17318: Assignee: (was: Apache Spark) > Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class > defined in repl > > > Key: SPARK-17318 > URL: https://issues.apache.org/jira/browse/SPARK-17318 > Project: Spark > Issue Type: Test >Reporter: Shixiong Zhu > > There are a lot of failures recently: > http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl
[ https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450101#comment-15450101 ] Apache Spark commented on SPARK-17318: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/14884 > Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class > defined in repl > > > Key: SPARK-17318 > URL: https://issues.apache.org/jira/browse/SPARK-17318 > Project: Spark > Issue Type: Test >Reporter: Shixiong Zhu > > There are a lot of failures recently: > http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl
[ https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17318: Assignee: Apache Spark > Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class > defined in repl > > > Key: SPARK-17318 > URL: https://issues.apache.org/jira/browse/SPARK-17318 > Project: Spark > Issue Type: Test >Reporter: Shixiong Zhu >Assignee: Apache Spark > > There are a lot of failures recently: > http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17319) Move addJar from HiveSessionState to HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450094#comment-15450094 ] Apache Spark commented on SPARK-17319: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/14883 > Move addJar from HiveSessionState to HiveExternalCatalog > > > Key: SPARK-17319 > URL: https://issues.apache.org/jira/browse/SPARK-17319 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li > > This is another step to remove Hive client usage in `HiveSessionState`. > Different sessions are sharing the same class loader, and thus, > `metadataHive.addJar(path)` basically loads the JARs for all the sessions. > Thus, no impact is made if we move `addJar` from `HiveSessionState` to > `HiveExternalCatalog`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17319) Move addJar from HiveSessionState to HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17319: Assignee: (was: Apache Spark) > Move addJar from HiveSessionState to HiveExternalCatalog > > > Key: SPARK-17319 > URL: https://issues.apache.org/jira/browse/SPARK-17319 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li > > This is another step to remove Hive client usage in `HiveSessionState`. > Different sessions are sharing the same class loader, and thus, > `metadataHive.addJar(path)` basically loads the JARs for all the sessions. > Thus, no impact is made if we move `addJar` from `HiveSessionState` to > `HiveExternalCatalog`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17319) Move addJar from HiveSessionState to HiveExternalCatalog
[ https://issues.apache.org/jira/browse/SPARK-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17319: Assignee: Apache Spark > Move addJar from HiveSessionState to HiveExternalCatalog > > > Key: SPARK-17319 > URL: https://issues.apache.org/jira/browse/SPARK-17319 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li >Assignee: Apache Spark > > This is another step to remove Hive client usage in `HiveSessionState`. > Different sessions are sharing the same class loader, and thus, > `metadataHive.addJar(path)` basically loads the JARs for all the sessions. > Thus, no impact is made if we move `addJar` from `HiveSessionState` to > `HiveExternalCatalog`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17319) Move addJar from HiveSessionState to HiveExternalCatalog
Xiao Li created SPARK-17319: --- Summary: Move addJar from HiveSessionState to HiveExternalCatalog Key: SPARK-17319 URL: https://issues.apache.org/jira/browse/SPARK-17319 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.0 Reporter: Xiao Li This is another step toward removing Hive client usage in `HiveSessionState`. Different sessions share the same class loader, so `metadataHive.addJar(path)` loads the JARs for all sessions. Thus, moving `addJar` from `HiveSessionState` to `HiveExternalCatalog` has no impact.
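The class-loader-sharing argument above can be illustrated with a toy model: two "sessions" that hold a reference to the same loader-level JAR list, so an addJar through either one is visible to both. All names here are invented for illustration; this is not the Spark code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration (invented names): two sessions share one loader-level JAR
// list, so addJar through either session is visible to both — which is why
// moving addJar from the per-session state to a shared catalog changes nothing.
public class SharedLoaderDemo {
    static final List<String> sharedJars = new ArrayList<>(); // stands in for the shared class loader

    static class Session {
        void addJar(String path) { sharedJars.add(path); } // affects every session
        List<String> visibleJars() { return sharedJars; }
    }
}
```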
[jira] [Updated] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl
[ https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17318: - Description: There are a lot of failures recently: http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl > Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class > defined in repl > > > Key: SPARK-17318 > URL: https://issues.apache.org/jira/browse/SPARK-17318 > Project: Spark > Issue Type: Test >Reporter: Shixiong Zhu > > There are a lot of failures recently: > http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl
Shixiong Zhu created SPARK-17318: Summary: Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl Key: SPARK-17318 URL: https://issues.apache.org/jira/browse/SPARK-17318 Project: Spark Issue Type: Test Reporter: Shixiong Zhu -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16456) Reuse the uncorrelated scalar subqueries with the same logical plan in a query
[ https://issues.apache.org/jira/browse/SPARK-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-16456. --- Resolution: Duplicate > Reuse the uncorrelated scalar subqueries with the same logical plan in a query > -- > > Key: SPARK-16456 > URL: https://issues.apache.org/jira/browse/SPARK-16456 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Lianhui Wang > > In TPCDS Q14, the same physical plan of uncorrelated scalar subqueries from a > CTE could be executed multiple times, we should re-use the same result to > avoid the duplicated computing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16419) EnsureRequirements adds extra Sort to already sorted cached table
[ https://issues.apache.org/jira/browse/SPARK-16419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-16419. --- Resolution: Duplicate > EnsureRequirements adds extra Sort to already sorted cached table > - > > Key: SPARK-16419 > URL: https://issues.apache.org/jira/browse/SPARK-16419 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2 >Reporter: Mitesh >Priority: Minor > > EnsureRequirements compares the required and given sort orderings, but uses > Scala equals instead of a semantic equals, so column capitalization isn't > considered, and it also fails for a cached table. This results in a > SortMergeJoin over an already-sorted cached table adding an extra sort.
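The strict-versus-semantic distinction in the report above can be shown with a minimal comparison of column-name lists. This is a toy model, not Spark's `SortOrder` comparison; the class and method names are invented, and "semantic" is reduced here to case-insensitive name matching.

```java
import java.util.List;

// Toy illustration: plain equals is case-sensitive, so "col_a" != "COL_A" and
// the planner would think the orderings differ and insert an extra Sort, while
// a semantic comparison treats them as the same column.
public class OrderingCompare {
    // Strict comparison, analogous to using plain Scala/Java equals.
    public static boolean strictMatch(List<String> required, List<String> given) {
        return required.equals(given);
    }

    // "Semantic" comparison for this toy model: names compared
    // case-insensitively, as SQL identifiers usually are.
    public static boolean semanticMatch(List<String> required, List<String> given) {
        if (required.size() != given.size()) return false;
        for (int i = 0; i < required.size(); i++) {
            if (!required.get(i).equalsIgnoreCase(given.get(i))) return false;
        }
        return true;
    }
}
```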
[jira] [Resolved] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl
[ https://issues.apache.org/jira/browse/SPARK-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-17314. -- Resolution: Fixed Fix Version/s: 2.1.0 > Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl > > > Key: SPARK-17314 > URL: https://issues.apache.org/jira/browse/SPARK-17314 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.1.0 > > > Netty will use its fast ThreadLocal implementation when a thread is a > FastThreadLocalThread. This patch just switches to Netty's > DefaultThreadFactory to trigger it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17304) TaskSetManager.abortIfCompletelyBlacklisted is a perf. hotspot in scheduler benchmark
[ https://issues.apache.org/jira/browse/SPARK-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-17304. Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14871 [https://github.com/apache/spark/pull/14871] > TaskSetManager.abortIfCompletelyBlacklisted is a perf. hotspot in scheduler > benchmark > - > > Key: SPARK-17304 > URL: https://issues.apache.org/jira/browse/SPARK-17304 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.1.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Minor > Fix For: 2.1.0 > > > If you run > {code} > sc.parallelize(1 to 10, 10).map(identity).count() > {code} > then {{TaskSetManager.abortIfCompletelyBlacklisted()}} is the number-one > performance hotspot in the scheduler, accounting for over half of the time. > This method was introduced in SPARK-15865, so this is a performance > regression in 2.1.0-SNAPSHOT. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
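A common way to remove a hotspot like this is to guard the expensive per-task scan with a cheap fast path for the usual case where nothing is blacklisted. The sketch below shows only that pattern; the names are invented, and whether the linked pull request takes exactly this approach is not shown here.

```java
import java.util.Set;

// Generic fast-path sketch (invented names): skip the expensive abort check
// entirely when no executor is blacklisted, so the common case costs a single
// emptiness test per scheduling round instead of an O(tasks x executors) scan.
public class BlacklistCheck {
    public static boolean shouldAbort(Set<String> blacklistedExecutors,
                                      Set<String> allExecutors) {
        if (blacklistedExecutors.isEmpty()) {
            return false; // fast path: nothing is blacklisted
        }
        // slow path: abort only when every executor is blacklisted
        return blacklistedExecutors.containsAll(allExecutors);
    }
}
```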
[jira] [Commented] (SPARK-17317) Add package vignette to SparkR
[ https://issues.apache.org/jira/browse/SPARK-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450013#comment-15450013 ] Junyang Qian commented on SPARK-17317: -- WIP > Add package vignette to SparkR > -- > > Key: SPARK-17317 > URL: https://issues.apache.org/jira/browse/SPARK-17317 > Project: Spark > Issue Type: Improvement >Reporter: Junyang Qian > > In publishing SparkR to CRAN, it would be nice to have a vignette as a user > guide that > * describes the big picture > * introduces the use of various methods > This is important for new users because they may not even know which method > to look up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17317) Add package vignette to SparkR
Junyang Qian created SPARK-17317: Summary: Add package vignette to SparkR Key: SPARK-17317 URL: https://issues.apache.org/jira/browse/SPARK-17317 Project: Spark Issue Type: Improvement Reporter: Junyang Qian In publishing SparkR to CRAN, it would be nice to have a vignette as a user guide that * describes the big picture * introduces the use of various methods This is important for new users because they may not even know which method to look up. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17316) Don't block StandaloneSchedulerBackend.executorRemoved
[ https://issues.apache.org/jira/browse/SPARK-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17316: Assignee: Apache Spark > Don't block StandaloneSchedulerBackend.executorRemoved > -- > > Key: SPARK-17316 > URL: https://issues.apache.org/jira/browse/SPARK-17316 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Shixiong Zhu >Assignee: Apache Spark > > StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It > may cause some deadlock since it's called inside > StandaloneAppClient.ClientEndpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17316) Don't block StandaloneSchedulerBackend.executorRemoved
[ https://issues.apache.org/jira/browse/SPARK-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17316: Assignee: (was: Apache Spark) > Don't block StandaloneSchedulerBackend.executorRemoved > -- > > Key: SPARK-17316 > URL: https://issues.apache.org/jira/browse/SPARK-17316 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Shixiong Zhu > > StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It > may cause some deadlock since it's called inside > StandaloneAppClient.ClientEndpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17316) Don't block StandaloneSchedulerBackend.executorRemoved
[ https://issues.apache.org/jira/browse/SPARK-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449985#comment-15449985 ] Apache Spark commented on SPARK-17316: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/14882 > Don't block StandaloneSchedulerBackend.executorRemoved > -- > > Key: SPARK-17316 > URL: https://issues.apache.org/jira/browse/SPARK-17316 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Shixiong Zhu > > StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It > may cause some deadlock since it's called inside > StandaloneAppClient.ClientEndpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17316) Don't block StandaloneSchedulerBackend.executorRemoved
Shixiong Zhu created SPARK-17316: Summary: Don't block StandaloneSchedulerBackend.executorRemoved Key: SPARK-17316 URL: https://issues.apache.org/jira/browse/SPARK-17316 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.0.0 Reporter: Shixiong Zhu StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It may cause some deadlock since it's called inside StandaloneAppClient.ClientEndpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
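The deadlock risk described above comes from doing blocking work on the RPC endpoint's own thread. A standard remedy is to hand the blocking work to a dedicated executor and return immediately; the sketch below shows that pattern with invented names, not the actual StandaloneSchedulerBackend change.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the non-blocking callback pattern (invented names): instead of
// doing blocking work inline on the endpoint's thread — which can deadlock —
// submit it to a dedicated worker thread and return immediately.
public class NonBlockingCallback {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public void executorRemoved(String executorId, Runnable blockingWork) {
        worker.submit(blockingWork); // returns immediately; the endpoint thread is never blocked
    }

    public void stop() {
        worker.shutdown();
    }
}
```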
[jira] [Commented] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR
[ https://issues.apache.org/jira/browse/SPARK-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449889#comment-15449889 ] Apache Spark commented on SPARK-17315: -- User 'junyangq' has created a pull request for this issue: https://github.com/apache/spark/pull/14881 > Add Kolmogorov-Smirnov Test to SparkR > - > > Key: SPARK-17315 > URL: https://issues.apache.org/jira/browse/SPARK-17315 > Project: Spark > Issue Type: New Feature >Reporter: Junyang Qian > > Kolmogorov-Smirnov Test is a popular nonparametric test of equality of > distributions. There is implementation in MLlib. It will be nice if we can > expose that in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR
[ https://issues.apache.org/jira/browse/SPARK-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17315: Assignee: (was: Apache Spark) > Add Kolmogorov-Smirnov Test to SparkR > - > > Key: SPARK-17315 > URL: https://issues.apache.org/jira/browse/SPARK-17315 > Project: Spark > Issue Type: New Feature >Reporter: Junyang Qian > > Kolmogorov-Smirnov Test is a popular nonparametric test of equality of > distributions. There is implementation in MLlib. It will be nice if we can > expose that in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR
[ https://issues.apache.org/jira/browse/SPARK-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17315: Assignee: Apache Spark > Add Kolmogorov-Smirnov Test to SparkR > - > > Key: SPARK-17315 > URL: https://issues.apache.org/jira/browse/SPARK-17315 > Project: Spark > Issue Type: New Feature >Reporter: Junyang Qian >Assignee: Apache Spark > > Kolmogorov-Smirnov Test is a popular nonparametric test of equality of > distributions. There is implementation in MLlib. It will be nice if we can > expose that in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR
Junyang Qian created SPARK-17315: Summary: Add Kolmogorov-Smirnov Test to SparkR Key: SPARK-17315 URL: https://issues.apache.org/jira/browse/SPARK-17315 Project: Spark Issue Type: New Feature Reporter: Junyang Qian The Kolmogorov-Smirnov test is a popular nonparametric test of the equality of distributions. There is an implementation in MLlib. It would be nice if we could expose that in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17307) Document what all access is needed on S3 bucket when trying to save a model
[ https://issues.apache.org/jira/browse/SPARK-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449830#comment-15449830 ] Steve Loughran commented on SPARK-17307: I think this is a subset of SPARK-7481, where I am doing the docs https://github.com/steveloughran/spark/blob/f39018eee40ef463ebfdfb0f6a7ba6384b46c459/docs/cloud-integration.md I haven't done the bit on authentication setup, though; I'm planning to point to the [Hadoop docs there|https://hadoop.apache.org/docs/stable2/hadoop-aws/tools/hadoop-aws/index.html], because as well as the details on how to configure the latest Hadoop s3x clients, it's got a troubleshooting section. Looking at the code: # It's dangerous to put AWS secrets in the source file; it's too easy to leak them. Stick them in your Spark configuration file, prefixed with {{spark.hadoop}} # If you are using Hadoop 2.7+ as the Hadoop version, please use s3a:// paths instead of s3n://. Your life will be better. Anyway, can you have a look at the cloud integration doc I've linked to and comment on the [pull request|https://github.com/apache/spark/pull/12004] where it could be improved? I'll do my best. > Document what all access is needed on S3 bucket when trying to save a model > --- > > Key: SPARK-17307 > URL: https://issues.apache.org/jira/browse/SPARK-17307 > Project: Spark > Issue Type: Documentation >Reporter: Aseem Bansal >Priority: Minor > > I faced this lack of documentation when I was trying to save a model to S3. > Initially I thought it should be only write. Then I found it also needs > delete to delete temporary files. 
Now I requested access for delete and tried > again and I am getting the error > Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: > org.jets3t.service.S3ServiceException: S3 PUT failed for > '/dev-qa_%24folder%24' XML Error Message > To reproduce this error, the code below can be used > {code} > SparkSession sparkSession = SparkSession > .builder() > .appName("my app") > .master("local") > .getOrCreate(); > JavaSparkContext jsc = new > JavaSparkContext(sparkSession.sparkContext()); > jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", <ACCESS KEY>); > jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", <SECRET ACCESS KEY>); > // Create a PipelineModel > > pipelineModel.write().overwrite().save("s3n:///dev-qa/modelTest"); > {code} > This back and forth could be avoided if it was clearly mentioned what all > access Spark needs to write to S3. It would also be great to explain why all of the > access is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
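Following Steve's two points above, a minimal sketch of what the configuration could look like. This is an illustration, not part of the original thread: the property names are the standard Hadoop S3A ones, passed through Spark's {{spark.hadoop}} prefix, and the placeholders in angle brackets stand in for real credentials.

```
# spark-defaults.conf (sketch): keep AWS secrets out of source code by
# passing them to Hadoop via the spark.hadoop.* prefix, and use the
# s3a:// connector on Hadoop 2.7+ instead of s3n://.
spark.hadoop.fs.s3a.access.key   <ACCESS KEY>
spark.hadoop.fs.s3a.secret.key   <SECRET ACCESS KEY>
```

With that in place, the save call would target an s3a:// path rather than the s3n:// path shown in the snippet above.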
[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449790#comment-15449790 ] Alex Bozarth commented on SPARK-17243: -- [~ste...@apache.org] [~tgraves] The issues you mentioned are what I'm hoping to work on next month (what I mentioned above) when I'm given the bandwidth to do so. When that comes I'll file a JIRA and loop you two in to discuss implementation ideas. (Unless some brave soul decides to give it a try before then) > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu > > The summary page of Spark 2.0 history server web UI keep displaying "Loading > history summary..." all the time and crashes the browser when there are more > than 10K application history event logs on HDFS. > I did some investigation, "historypage.js" file sends a REST request to > /api/v1/applications endpoint of history server REST endpoint and gets back > json response. When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. When there are > only hundreds or thousands of application history it is running fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449684#comment-15449684 ] Sean Owen commented on SPARK-17313: --- The problem is: how are you going to interact with a shell on your local machine when the driver is somewhere else? It's not impossible, but it's not clear it's worthwhile. The driver is in general not doing much work, or shouldn't be; the shell is more for exploration than production. > Support spark-shell on cluster mode > --- > > Key: SPARK-17313 > URL: https://issues.apache.org/jira/browse/SPARK-17313 > Project: Spark > Issue Type: New Feature >Reporter: Mahmoud Elgamal > > The main issue with the current spark shell is that the driver is running on > the user machine. If the driver resource requirement is beyond user machine > capacity, then spark shell will be useless. If we are to add the cluster > mode(Yarn or Mesos ) for spark shell via some sort of proxy where user > machine only hosts a rest client to the running driver at the cluster, the > shell will be more powerful -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17312) Support spark-shell on cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17312. --- Resolution: Duplicate > Support spark-shell on cluster mode > --- > > Key: SPARK-17312 > URL: https://issues.apache.org/jira/browse/SPARK-17312 > Project: Spark > Issue Type: New Feature >Reporter: Mahmoud Elgamal > > The main issue with the current spark shell is that the driver is running on > the user machine. If the driver resource requirement is beyond user machine > capacity, then spark shell will be useless. If we are to add the cluster > mode(Yarn or Mesos ) for spark shell via some sort of proxy where user > machine only hosts a rest client to the running driver at the cluster, the > shell will be more powerful -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-17312) Support spark-shell on cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-17312: --- > Support spark-shell on cluster mode > --- > > Key: SPARK-17312 > URL: https://issues.apache.org/jira/browse/SPARK-17312 > Project: Spark > Issue Type: New Feature >Reporter: Mahmoud Elgamal > > The main issue with the current spark shell is that the driver is running on > the user machine. If the driver resource requirement is beyond user machine > capacity, then spark shell will be useless. If we are to add the cluster > mode(Yarn or Mesos ) for spark shell via some sort of proxy where user > machine only hosts a rest client to the running driver at the cluster, the > shell will be more powerful -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl
[ https://issues.apache.org/jira/browse/SPARK-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17314: Assignee: Shixiong Zhu (was: Apache Spark) > Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl > > > Key: SPARK-17314 > URL: https://issues.apache.org/jira/browse/SPARK-17314 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > > Netty will use its fast ThreadLocal implementation when a thread is a > FastThreadLocalThread. This patch just switches to Netty's > DefaultThreadFactory to trigger it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl
[ https://issues.apache.org/jira/browse/SPARK-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17314: Assignee: Apache Spark (was: Shixiong Zhu) > Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl > > > Key: SPARK-17314 > URL: https://issues.apache.org/jira/browse/SPARK-17314 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Shixiong Zhu >Assignee: Apache Spark >Priority: Minor > > Netty will use its fast ThreadLocal implementation when a thread is a > FastThreadLocalThread. This patch just switches to Netty's > DefaultThreadFactory to trigger it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl
[ https://issues.apache.org/jira/browse/SPARK-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449670#comment-15449670 ] Apache Spark commented on SPARK-17314: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/14879 > Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl > > > Key: SPARK-17314 > URL: https://issues.apache.org/jira/browse/SPARK-17314 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > > Netty will use its fast ThreadLocal implementation when a thread is a > FastThreadLocalThread. This patch just switches to Netty's > DefaultThreadFactory to trigger it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl
Shixiong Zhu created SPARK-17314: Summary: Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl Key: SPARK-17314 URL: https://issues.apache.org/jira/browse/SPARK-17314 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Shixiong Zhu Assignee: Shixiong Zhu Priority: Minor Netty will use its fast ThreadLocal implementation when a thread is a FastThreadLocalThread. This patch just switches to Netty's DefaultThreadFactory to trigger it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-17313) Support spark-shell on cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmoud Elgamal reopened SPARK-17313: - > Support spark-shell on cluster mode > --- > > Key: SPARK-17313 > URL: https://issues.apache.org/jira/browse/SPARK-17313 > Project: Spark > Issue Type: New Feature >Reporter: Mahmoud Elgamal > > The main issue with the current spark shell is that the driver is running on > the user machine. If the driver resource requirement is beyond user machine > capacity, then spark shell will be useless. If we are to add the cluster > mode(Yarn or Mesos ) for spark shell via some sort of proxy where user > machine only hosts a rest client to the running driver at the cluster, the > shell will be more powerful -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-17312) Support spark-shell on cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmoud Elgamal closed SPARK-17312. --- Resolution: Fixed > Support spark-shell on cluster mode > --- > > Key: SPARK-17312 > URL: https://issues.apache.org/jira/browse/SPARK-17312 > Project: Spark > Issue Type: New Feature >Reporter: Mahmoud Elgamal > > The main issue with the current spark shell is that the driver is running on > the user machine. If the driver resource requirement is beyond user machine > capacity, then spark shell will be useless. If we are to add the cluster > mode(Yarn or Mesos ) for spark shell via some sort of proxy where user > machine only hosts a rest client to the running driver at the cluster, the > shell will be more powerful -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17313) Support spark-shell on cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-17313. Resolution: Duplicate > Support spark-shell on cluster mode > --- > > Key: SPARK-17313 > URL: https://issues.apache.org/jira/browse/SPARK-17313 > Project: Spark > Issue Type: New Feature >Reporter: Mahmoud Elgamal > > The main issue with the current spark shell is that the driver is running on > the user machine. If the driver resource requirement is beyond user machine > capacity, then spark shell will be useless. If we are to add the cluster > mode(Yarn or Mesos ) for spark shell via some sort of proxy where user > machine only hosts a rest client to the running driver at the cluster, the > shell will be more powerful -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17312) Support spark-shell on cluster mode
[ https://issues.apache.org/jira/browse/SPARK-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahmoud Elgamal updated SPARK-17312: Issue Type: New Feature (was: Bug) > Support spark-shell on cluster mode > --- > > Key: SPARK-17312 > URL: https://issues.apache.org/jira/browse/SPARK-17312 > Project: Spark > Issue Type: New Feature >Reporter: Mahmoud Elgamal > > The main issue with the current spark shell is that the driver is running on > the user machine. If the driver resource requirement is beyond user machine > capacity, then spark shell will be useless. If we are to add the cluster > mode(Yarn or Mesos ) for spark shell via some sort of proxy where user > machine only hosts a rest client to the running driver at the cluster, the > shell will be more powerful -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17312) Support spark-shell on cluster mode
Mahmoud Elgamal created SPARK-17312: --- Summary: Support spark-shell on cluster mode Key: SPARK-17312 URL: https://issues.apache.org/jira/browse/SPARK-17312 Project: Spark Issue Type: Bug Reporter: Mahmoud Elgamal The main issue with the current spark shell is that the driver is running on the user machine. If the driver resource requirement is beyond user machine capacity, then spark shell will be useless. If we are to add the cluster mode(Yarn or Mesos ) for spark shell via some sort of proxy where user machine only hosts a rest client to the running driver at the cluster, the shell will be more powerful -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17313) Support spark-shell on cluster mode
Mahmoud Elgamal created SPARK-17313: --- Summary: Support spark-shell on cluster mode Key: SPARK-17313 URL: https://issues.apache.org/jira/browse/SPARK-17313 Project: Spark Issue Type: New Feature Reporter: Mahmoud Elgamal The main issue with the current spark shell is that the driver is running on the user machine. If the driver resource requirement is beyond user machine capacity, then spark shell will be useless. If we are to add the cluster mode(Yarn or Mesos ) for spark shell via some sort of proxy where user machine only hosts a rest client to the running driver at the cluster, the shell will be more powerful -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17306) Memory leak in QuantileSummaries
[ https://issues.apache.org/jira/browse/SPARK-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449512#comment-15449512 ] Tim Hunter commented on SPARK-17306: [~srowen] yes I had a discussion yesterday with [~clockfly]. The issue is performance, not correctness, by the way. The fix is to add a call to compression: the compression threshold should be checked in insert() after inserting into the head buffer, at this line: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala#L66 Unless someone else wants to step in, I will be happy to fix this issue. > Memory leak in QuantileSummaries > > > Key: SPARK-17306 > URL: https://issues.apache.org/jira/browse/SPARK-17306 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Sean Zhong > > compressThreshold was not referenced anywhere > {code} > class QuantileSummaries( > val compressThreshold: Int, > val relativeError: Double, > val sampled: ArrayBuffer[Stats] = ArrayBuffer.empty, > private[stat] var count: Long = 0L, > val headSampled: ArrayBuffer[Double] = ArrayBuffer.empty) extends > Serializable > {code} > And it causes a memory leak: QuantileSummaries takes unbounded memory > {code} > val summary = new QuantileSummaries(1, relativeError = 0.001) > // Results in creating an array of size 1 !!! > (1 to 1).foreach(summary.insert(_)) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
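To make the proposed fix concrete, here is a toy Python sketch, not the actual Scala patch: the class and field names only mirror QuantileSummaries, and the compression step is a stand-in for the real merge-and-prune. It shows how having insert() trigger compression once the head buffer crosses the threshold keeps memory bounded.

```python
class ToyQuantileSummaries:
    """Toy model of the proposed fix: insert() triggers compression
    once the head buffer crosses compress_threshold, so memory stays
    bounded no matter how many values are inserted."""

    def __init__(self, compress_threshold):
        self.compress_threshold = compress_threshold
        self.sampled = []        # compressed samples
        self.head_sampled = []   # uncompressed head buffer

    def insert(self, value):
        self.head_sampled.append(value)
        # The reported bug: without this check nothing ever calls
        # compression, so the buffers grow without bound.
        if len(self.head_sampled) >= self.compress_threshold:
            self.compress()

    def compress(self):
        # Stand-in for the real merge-and-prune step: fold the head
        # buffer in, then drop every other sample while too large.
        merged = sorted(self.sampled + self.head_sampled)
        while len(merged) > self.compress_threshold:
            merged = merged[::2]
        self.sampled = merged
        self.head_sampled = []
```

With the threshold check in insert(), ten thousand insertions leave at most a couple hundred retained samples instead of ten thousand.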
[jira] [Commented] (SPARK-16402) JDBC source: Implement save API
[ https://issues.apache.org/jira/browse/SPARK-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449509#comment-15449509 ] Dragisa Krsmanovic commented on SPARK-16402: Any progress on this ? > JDBC source: Implement save API > --- > > Key: SPARK-16402 > URL: https://issues.apache.org/jira/browse/SPARK-16402 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > Currently, we are unable to call the `save` API of `DataFrameWriter` when the > source is JDBC. For example, > {noformat} > df.write > .format("jdbc") > .option("url", url1) > .option("dbtable", "TEST.TRUNCATETEST") > .option("user", "testUser") > .option("password", "testPass") > .save() > {noformat} > The error message users will get is like > {noformat} > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not > allow create table as select. > java.lang.RuntimeException: > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not > allow create table as select. > {noformat} > However, the `save` API is very common for all the data sources, like parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6460) Implement OpensslAesCtrCryptoCodec to enable encrypted shuffle algorithms which openssl provides
[ https://issues.apache.org/jira/browse/SPARK-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-6460. --- Resolution: Duplicate This was covered by SPARK-5682. > Implement OpensslAesCtrCryptoCodec to enable encrypted shuffle algorithms > which openssl provides > > > Key: SPARK-6460 > URL: https://issues.apache.org/jira/browse/SPARK-6460 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Reporter: liyunzhang_intel > > SPARK-5682 only implements the encrypted shuffle algorithm provided by JCE. > OpensslAesCtrCryptoCodec need implement algorithm provided by openssl. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10771) Implement the shuffle encryption with AES-CTR crypto using JCE key provider.
[ https://issues.apache.org/jira/browse/SPARK-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-10771. Resolution: Duplicate This was covered by SPARK-5682. > Implement the shuffle encryption with AES-CTR crypto using JCE key provider. > > > Key: SPARK-10771 > URL: https://issues.apache.org/jira/browse/SPARK-10771 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Reporter: Ferdinand Xu >Priority: Minor > > We will use the credentials stored in user group information to encrypt/ > decrypt shuffle data. We will use JCE key provider to implement AES-CTR > crypto. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17299) TRIM/LTRIM/RTRIM strips characters other than spaces
[ https://issues.apache.org/jira/browse/SPARK-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449425#comment-15449425 ] Dongjoon Hyun commented on SPARK-17299: --- Hi, [~jbeard] and [~srowen]. For compatibility, it seems we had better fix this in SQL. Could you make a PR for this, [~jbeard]? > TRIM/LTRIM/RTRIM strips characters other than spaces > > > Key: SPARK-17299 > URL: https://issues.apache.org/jira/browse/SPARK-17299 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 2.0.0 >Reporter: Jeremy Beard >Priority: Minor > > TRIM/LTRIM/RTRIM docs state that they only strip spaces: > http://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/functions.html#trim(org.apache.spark.sql.Column) > But the implementation strips all characters of ASCII value 20 or less: > https://github.com/apache/spark/blob/v2.0.0/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java#L468-L470 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
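The gap between the documented and reported behaviour is easy to demonstrate outside Spark. This Python sketch imitates both; the authoritative code is the UTF8String.java lines linked above, and the 0x20 cutoff below reflects the issue's description of stripping everything at or below the space character.

```python
def trim_documented(s):
    # What the docs promise: strip the space character only.
    return s.strip(" ")

def trim_reported(s):
    # What the issue reports the implementation doing: strip any
    # leading/trailing character at or below 0x20 (space, tab,
    # newline, control characters, ...).
    start, end = 0, len(s)
    while start < end and ord(s[start]) <= 0x20:
        start += 1
    while end > start and ord(s[end - 1]) <= 0x20:
        end -= 1
    return s[start:end]
```

For a value like `"\tspark\n"` the first function returns the string unchanged while the second returns `"spark"`, which is exactly the discrepancy the docs do not mention.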
[jira] [Resolved] (SPARK-12333) Support shuffle spill encryption in Spark
[ https://issues.apache.org/jira/browse/SPARK-12333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-12333. Resolution: Fixed This should be covered by SPARK-5682, reopen if that's not correct. > Support shuffle spill encryption in Spark > - > > Key: SPARK-12333 > URL: https://issues.apache.org/jira/browse/SPARK-12333 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: Ferdinand Xu > > Like shuffle file encryption in SPARK-5682, spills data should also be > encrypted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-5682. --- Resolution: Fixed Assignee: Ferdinand Xu Fix Version/s: 2.1.0 > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel >Assignee: Ferdinand Xu > Fix For: 2.1.0 > > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150402.docx, Design Document of Encrypted Spark > Shuffle_20150506.docx > > > Encrypted shuffle is enabled in Hadoop 2.6, which makes the process of shuffling > data safer. This feature is necessary in Spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES; CTR is > one of them. We use two codecs, JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; they are also used > in Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses encryption algorithms the JDK > provides, while OpensslAesCtrCryptoCodec uses encryption algorithms OpenSSL > provides. > Because UGI credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on the Spark-on-YARN framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7445) StringIndexer should handle binary labels properly
[ https://issues.apache.org/jira/browse/SPARK-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449390#comment-15449390 ] Sean Owen commented on SPARK-7445: -- I think that if you want a specific mapping from labels to integers then you can just transform it directly. It's easier than trying to make a new API for such a simple thing. > StringIndexer should handle binary labels properly > -- > > Key: SPARK-7445 > URL: https://issues.apache.org/jira/browse/SPARK-7445 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Priority: Minor > > StringIndexer orders labels by their counts. However, for binary labels, we > should really map negatives to 0 and positive to 1. So can put special rules > for binary labels: > 1. "+1"/"-1", "1"/"-1", "1"/"0" > 2. "yes"/"no" > 3. "true"/"false" > Another option is to allow users to provide a list or labels and we use the > ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
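A sketch of the direct transformation Sean suggests, in plain Python rather than a Spark API; the mapping table just encodes the special-case pairs listed in the description, and the function name is invented for illustration.

```python
# Fixed binary-label mapping, independent of label frequency
# (unlike StringIndexer, which orders labels by their counts).
NEGATIVE, POSITIVE = 0.0, 1.0
BINARY_LABELS = {
    "+1": POSITIVE, "1": POSITIVE, "yes": POSITIVE, "true": POSITIVE,
    "-1": NEGATIVE, "0": NEGATIVE, "no": NEGATIVE, "false": NEGATIVE,
}

def index_binary_labels(labels):
    """Map raw string labels to 0.0/1.0 using the fixed table;
    a KeyError flags anything outside the known vocabularies."""
    return [BINARY_LABELS[label.strip().lower()] for label in labels]
```

The same one-line transformation can be applied per-record inside a map over a DataFrame column, which is why a dedicated API may not be worth the complexity.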
[jira] [Commented] (SPARK-7445) StringIndexer should handle binary labels properly
[ https://issues.apache.org/jira/browse/SPARK-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449353#comment-15449353 ] Ruben Janssen commented on SPARK-7445: -- [~mengxr] Could you please update on this? > StringIndexer should handle binary labels properly > -- > > Key: SPARK-7445 > URL: https://issues.apache.org/jira/browse/SPARK-7445 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng >Priority: Minor > > StringIndexer orders labels by their counts. However, for binary labels, we > should really map negatives to 0 and positive to 1. So can put special rules > for binary labels: > 1. "+1"/"-1", "1"/"-1", "1"/"0" > 2. "yes"/"no" > 3. "true"/"false" > Another option is to allow users to provide a list or labels and we use the > ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449250#comment-15449250 ] Thomas Graves commented on SPARK-17243: --- I agree, there are a ton of ways to improve the history server. I think these should be separate JIRAs, though. Ideally it would be much faster to load all the apps and get the initial list very quickly, and only load an entire application as a user requests it, or in the background to fill the cache. Like you mention, we could have a summary file written after loading. They could be stored differently so that basic data is in the dir or file path (like the MapReduce history server), etc. I just haven't had time to do this myself. Right now this seems like a good workaround, and as I mention in the PR, spark.history.retainedApplications used to do this limiting of the display, but things have changed and I guess it broke/wasn't updated. > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu > > The summary page of Spark 2.0 history server web UI keep displaying "Loading > history summary..." all the time and crashes the browser when there are more > than 10K application history event logs on HDFS. > I did some investigation, "historypage.js" file sends a REST request to > /api/v1/applications endpoint of history server REST endpoint and gets back > json response. When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. When there are > only hundreds or thousands of application history it is running fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
[ https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448732#comment-15448732 ] Steve Loughran commented on SPARK-17243: One thing to consider here is whether there are any ways to improve incremental loading of histories: start at the most recent and work backwards. There's also the fact that the entire history is loaded just to get the final summary info (success/failure). Once parsed, this could just be saved in a summary file alongside the original. That'd reduce load time from O(files * events) to O(files) > Spark 2.0 history server summary page gets stuck at "loading history summary" > with 10K+ application history > --- > > Key: SPARK-17243 > URL: https://issues.apache.org/jira/browse/SPARK-17243 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Gang Wu > > The summary page of Spark 2.0 history server web UI keep displaying "Loading > history summary..." all the time and crashes the browser when there are more > than 10K application history event logs on HDFS. > I did some investigation, "historypage.js" file sends a REST request to > /api/v1/applications endpoint of history server REST endpoint and gets back > json response. When there are more than 10K applications inside the event log > directory it takes forever to parse them and render the page. When there are > only hundreds or thousands of application history it is running fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
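Steve's summary-file idea can be sketched as a tiny cache keyed by the event-log path. The `.summary.json` suffix and the shape of the parser callback are invented for illustration; the point is only that the expensive per-event pass happens once per log.

```python
import json
import os

def load_app_summary(log_path, parse_full_log):
    """Return an application's summary, paying the O(events) parse
    only the first time a log is seen.

    After the first parse the summary is persisted next to the log
    (here as <log>.summary.json), so a later listing of N histories
    costs O(files) instead of O(files * events).
    """
    summary_path = log_path + ".summary.json"
    if os.path.exists(summary_path):
        with open(summary_path) as f:
            return json.load(f)
    summary = parse_full_log(log_path)  # expensive full-event pass
    with open(summary_path, "w") as f:
        json.dump(summary, f)
    return summary
```

Incremental loading (most recent first) would then only need the cheap summary files to render the listing page.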
[jira] [Assigned] (SPARK-17311) Standardize Python-Java MLlib API to accept optional long seeds in all cases
[ https://issues.apache.org/jira/browse/SPARK-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17311: Assignee: Apache Spark (was: Sean Owen) > Standardize Python-Java MLlib API to accept optional long seeds in all cases > > > Key: SPARK-17311 > URL: https://issues.apache.org/jira/browse/SPARK-17311 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 2.0.0 >Reporter: Sean Owen >Assignee: Apache Spark >Priority: Minor > > (Note this follows on https://issues.apache.org/jira/browse/SPARK-16832 ) > There are a few seed-related issues in the Pyspark-MLLib bridge: > - {{PythonMLlibAPI}} methods that take a seed don't always take a > {{java.lang.Long}} consistently, allowing the Python API to specify "no seed" > - .mllib's {{Word2VecModel}} seems to be an odd man out in .mllib in that it > picks its own random seed. Instead it should default to None, meaning, > letting the Scala implementation pick a seed -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17311) Standardize Python-Java MLlib API to accept optional long seeds in all cases
[ https://issues.apache.org/jira/browse/SPARK-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448700#comment-15448700 ] Apache Spark commented on SPARK-17311: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/14826 > Standardize Python-Java MLlib API to accept optional long seeds in all cases > > > Key: SPARK-17311 > URL: https://issues.apache.org/jira/browse/SPARK-17311 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 2.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > > (Note this follows on https://issues.apache.org/jira/browse/SPARK-16832 ) > There are a few seed-related issues in the Pyspark-MLLib bridge: > - {{PythonMLlibAPI}} methods that take a seed don't always take a > {{java.lang.Long}} consistently, allowing the Python API to specify "no seed" > - .mllib's {{Word2VecModel}} seems to be an odd man out in .mllib in that it > picks its own random seed. Instead it should default to None, meaning, > letting the Scala implementation pick a seed -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17311) Standardize Python-Java MLlib API to accept optional long seeds in all cases
[ https://issues.apache.org/jira/browse/SPARK-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17311: Assignee: Sean Owen (was: Apache Spark) > Standardize Python-Java MLlib API to accept optional long seeds in all cases > > > Key: SPARK-17311 > URL: https://issues.apache.org/jira/browse/SPARK-17311 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 2.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > > (Note this follows on https://issues.apache.org/jira/browse/SPARK-16832 ) > There are a few seed-related issues in the Pyspark-MLLib bridge: > - {{PythonMLlibAPI}} methods that take a seed don't always take a > {{java.lang.Long}} consistently, allowing the Python API to specify "no seed" > - .mllib's {{Word2VecModel}} seems to be an odd man out in .mllib in that it > picks its own random seed. Instead it should default to None, meaning, > letting the Scala implementation pick a seed -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16832) CrossValidator and TrainValidationSplit are not random without seed
[ https://issues.apache.org/jira/browse/SPARK-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16832. --- Resolution: Won't Fix Per [~mengxr] I think this is WontFix, and the least confusing thing to do for my follow-on change is to make a new JIRA: https://issues.apache.org/jira/browse/SPARK-17311
> CrossValidator and TrainValidationSplit are not random without seed
> ---
>
> Key: SPARK-16832
> URL: https://issues.apache.org/jira/browse/SPARK-16832
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 2.0.0
> Reporter: Max Moroz
> Priority: Minor
>
> Repeatedly running CrossValidator or TrainValidationSplit without an explicit
> seed parameter does not change results. It is supposed to be seeded with a
> random seed, but it seems to be seeded with some constant instead. (If a seed
> is explicitly provided, the two classes behave as expected.)
> {code}
> import pyspark.ml.tuning
> import pyspark.ml.regression
> import pyspark.ml.evaluation
> from pyspark.ml.linalg import Vectors
>
> dataset = spark.createDataFrame(
>     [(Vectors.dense([0.0]), 0.0),
>      (Vectors.dense([0.4]), 1.0),
>      (Vectors.dense([0.5]), 0.0),
>      (Vectors.dense([0.6]), 1.0),
>      (Vectors.dense([1.0]), 1.0)] * 1000,
>     ["features", "label"]).cache()
> paramGrid = pyspark.ml.tuning.ParamGridBuilder().build()
> tvs = pyspark.ml.tuning.TrainValidationSplit(
>     estimator=pyspark.ml.regression.LinearRegression(),
>     estimatorParamMaps=paramGrid,
>     evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>     trainRatio=0.8)
> model = tvs.fit(dataset)
> print(model.validationMetrics)
> for folds in (3, 5, 10):
>     cv = pyspark.ml.tuning.CrossValidator(
>         estimator=pyspark.ml.regression.LinearRegression(),
>         estimatorParamMaps=paramGrid,
>         evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>         numFolds=folds)
>     cvModel = cv.fit(dataset)
>     print(folds, cvModel.avgMetrics)
> {code}
> This code produces identical results upon repeated calls.
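The behavior reported above can be modeled minimally: if a tuner falls back to a constant default seed when none is supplied, every unseeded run produces the same split, while drawing a fresh seed at fit time restores randomness. This is an illustration only, not Spark's CrossValidator/TrainValidationSplit code; the class and constant names are hypothetical.

```python
import random

def split_indices(n, seed):
    # Deterministic shuffle of 0..n-1 for a given seed.
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return idx

class BuggyTuner:
    DEFAULT_SEED = 42  # the "some constant" the report suspects

    def split(self, n, seed=None):
        # Unseeded calls all collapse onto the same constant seed.
        return split_indices(n, self.DEFAULT_SEED if seed is None else seed)

class FixedTuner:
    def split(self, n, seed=None):
        if seed is None:
            # Draw a fresh seed per fit; explicit seeds stay reproducible.
            seed = random.SystemRandom().randrange(2**63)
        return split_indices(n, seed)
```

The buggy form reproduces the symptom exactly: repeated unseeded splits are identical, while explicitly seeded calls behave the same in both variants.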
[jira] [Created] (SPARK-17311) Standardize Python-Java MLlib API to accept optional long seeds in all cases
Sean Owen created SPARK-17311: - Summary: Standardize Python-Java MLlib API to accept optional long seeds in all cases Key: SPARK-17311 URL: https://issues.apache.org/jira/browse/SPARK-17311 Project: Spark Issue Type: Bug Components: MLlib, PySpark Affects Versions: 2.0.0 Reporter: Sean Owen Assignee: Sean Owen Priority: Minor (Note this follows on from https://issues.apache.org/jira/browse/SPARK-16832 ) There are a few seed-related issues in the PySpark-MLlib bridge: - {{PythonMLlibAPI}} methods that take a seed don't consistently take a {{java.lang.Long}}, which would allow the Python API to specify "no seed" - .mllib's {{Word2VecModel}} is an odd man out in .mllib in that it picks its own random seed. Instead it should default to None, meaning the Scala implementation picks the seed -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
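The convention the ticket asks for can be sketched as: every Python-side wrapper takes an optional seed and forwards None unchanged, so "no seed" means the backend picks one, while an explicit seed gives reproducible results. The function name and toy "model" below are illustrative, not the real PythonMLlibAPI or Word2VecModel.

```python
import random
from typing import Optional

def train_word2vec(tokens, seed: Optional[int] = None):
    # With seed=None, random.Random seeds itself from OS entropy -- the
    # Python side never substitutes its own constant seed (the Word2VecModel
    # problem described above).
    rng = random.Random(seed)
    # Toy "model": a seed-deterministic assignment of ranks to tokens.
    ranks = list(range(len(tokens)))
    rng.shuffle(ranks)
    return dict(zip(tokens, ranks))
```

The same signature applied uniformly across the bridge gives every method the pair of behaviors the issue wants: reproducible when seeded, genuinely random when not.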
[jira] [Commented] (SPARK-17310) Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side
[ https://issues.apache.org/jira/browse/SPARK-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448672#comment-15448672 ] Andrew Duffy commented on SPARK-17310: -- +1 to this; see the comments on https://github.com/apache/spark/pull/14671, particularly rdblue's. We need to wait for the next release of Parquet to be able to set the {{parquet.filter.record-level.enabled}} config.
> Disable Parquet's record-by-record filter in normal parquet reader and do it
> in Spark-side
> --
>
> Key: SPARK-17310
> URL: https://issues.apache.org/jira/browse/SPARK-17310
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Hyukjin Kwon
>
> Currently, we push filters down to the normal Parquet reader, which also
> filters record-by-record.
> Spark-side codegen row-by-row filtering might be faster than Parquet's in
> general, because of the type boxing and virtual function calls that Spark's
> code generation avoids.
> Maybe we should perform a benchmark and then disable this. This ticket came
> out of https://github.com/apache/spark/pull/14671
> Please refer to the discussion in the PR.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17308) Replace all pattern match on boolean value by if/else block.
[ https://issues.apache.org/jira/browse/SPARK-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17308: Assignee: (was: Apache Spark) > Replace all pattern match on boolean value by if/else block. > > > Key: SPARK-17308 > URL: https://issues.apache.org/jira/browse/SPARK-17308 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Shivansh >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17308) Replace all pattern match on boolean value by if/else block.
[ https://issues.apache.org/jira/browse/SPARK-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17308: Assignee: Apache Spark > Replace all pattern match on boolean value by if/else block. > > > Key: SPARK-17308 > URL: https://issues.apache.org/jira/browse/SPARK-17308 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Shivansh >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17308) Replace all pattern match on boolean value by if/else block.
[ https://issues.apache.org/jira/browse/SPARK-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448671#comment-15448671 ] Apache Spark commented on SPARK-17308: -- User 'shiv4nsh' has created a pull request for this issue: https://github.com/apache/spark/pull/14873 > Replace all pattern match on boolean value by if/else block. > > > Key: SPARK-17308 > URL: https://issues.apache.org/jira/browse/SPARK-17308 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Shivansh >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17264) DataStreamWriter should document that it only supports Parquet for now
[ https://issues.apache.org/jira/browse/SPARK-17264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17264. --- Resolution: Fixed Fix Version/s: 2.1.0, 2.0.1 Issue resolved by pull request 14860 [https://github.com/apache/spark/pull/14860]
> DataStreamWriter should document that it only supports Parquet for now
> --
>
> Key: SPARK-17264
> URL: https://issues.apache.org/jira/browse/SPARK-17264
> Project: Spark
> Issue Type: Bug
> Components: Documentation, Input/Output
> Affects Versions: 2.0.0
> Environment: Mac OSX
> Reporter: Bill Reed
> Assignee: Sean Owen
> Priority: Trivial
> Fix For: 2.0.1, 2.1.0
>
> The API documentation for DataStreamWriter.format states "Specifies the
> underlying output data source. Built-in options include "parquet", "json",
> etc.", but when "json" or "text" is specified as the format, the following
> exception is thrown:
> Exception in thread "main" java.lang.UnsupportedOperationException: Data
> source json does not support streamed writing
> at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:273)
> at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:291)
> The only format that works is .format("parquet")
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
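A minimal model of the check behind the exception quoted above: the stream writer validates the requested format against the set of sinks that support streamed writing before anything runs. The format set and function below are illustrative of the Spark 2.0 behavior the report describes, not Spark's DataSource code.

```python
# Sinks that support streamed writing in this toy model (per the report,
# only parquet worked in Spark 2.0.0).
STREAMABLE_FORMATS = {"parquet"}

def create_streaming_sink(fmt: str) -> str:
    # Fail fast, mirroring the UnsupportedOperationException message quoted
    # in the issue, rather than failing mid-stream.
    if fmt not in STREAMABLE_FORMATS:
        raise RuntimeError("Data source %s does not support streamed writing" % fmt)
    return "%s-sink" % fmt
```

Documenting the supported set next to the check (as the resolved ticket does for the API docs) keeps the error message and the documentation from drifting apart.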
[jira] [Updated] (SPARK-17276) Stop environment parameters flooding Jenkins build output
[ https://issues.apache.org/jira/browse/SPARK-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17276: -- Assignee: Xin Ren > Stop environment parameters flooding Jenkins build output > - > > Key: SPARK-17276 > URL: https://issues.apache.org/jira/browse/SPARK-17276 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Tests >Affects Versions: 2.0.0 >Reporter: Xin Ren >Assignee: Xin Ren >Priority: Minor > Fix For: 2.1.0 > > Attachments: Screen Shot 2016-08-26 at 10.52.07 PM.png > > > When I was trying to find error msg in a failed Jenkins build job, annoyed by > the huge env output. > The env parameter output should be muted. > {code} > [info] PipedRDDSuite: > [info] - basic pipe (51 milliseconds) > 0 0 0 > [info] - basic pipe with tokenization (60 milliseconds) > [info] - failure in iterating over pipe input (49 milliseconds) > [info] - advanced pipe (100 milliseconds) > [info] - pipe with empty partition (117 milliseconds) > PATH=/home/anaconda/envs/py3k/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:/usr/java/jdk1.8.0_60/bin:/home/jenkins/tools/hudson.model.JDK/JDK_7u60/bin:/home/jenkins/.cargo/bin:/home/anaconda/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/:/home/android-sdk/:/usr/local/bin:/bin:/usr/bin:/home/anaconda/envs/py3k/bin > BUILD_CAUSE_GHPRBCAUSE=true > SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl > -Phive-thriftserver > HUDSON_HOME=/var/lib/jenkins > AWS_SECRET_ACCESS_KEY= > JOB_URL=https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/ > HUDSON_COOKIE=638da3d2-d27a-4724-b41a-5ff6e8ce6752 > LINES=24 > CURRENT_BLOCK=18 > ANDROID_HOME=/home/android-sdk/ > ghprbActualCommit=70a751c6959048e65c083ab775b01523da4578a2 > ghprbSourceBranch=codeWalkThroughML > GITHUB_OAUTH_KEY= > MAIL=/var/mail/jenkins > AMPLAB_JENKINS=1 > JENKINS_SERVER_COOKIE=472906e9832aeb79 > ghprbPullTitle=[MINOR][MLlib][SQL] Clean up 
unused variables and unused import > LOGNAME=jenkins > PWD=/home/jenkins/workspace/SparkPullRequestBuilder > JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins/ > SPARK_VERSIONS_SUITE_IVY_PATH=/home/sparkivy/per-executor-caches/9/.ivy2 > ROOT_BUILD_CAUSE_GHPRBCAUSE=true > ghprbActualCommitAuthorEmail=iamsh...@126.com > ghprbTargetBranch=master > BUILD_TAG=jenkins-SparkPullRequestBuilder-64504 > SHELL=/bin/bash > ROOT_BUILD_CAUSE=GHPRBCAUSE > SBT_OPTS=-Duser.home=/home/sparkivy/per-executor-caches/9 > -Dsbt.ivy.home=/home/sparkivy/per-executor-caches/9/.ivy2 > JENKINS_HOME=/var/lib/jenkins > sha1=origin/pr/14836/merge > ghprbPullDescription=GitHub pull request #14836 of commit > 70a751c6959048e65c083ab775b01523da4578a2 automatically merged. > NODE_NAME=amp-jenkins-worker-02 > BUILD_DISPLAY_NAME=#64504 > JAVA_7_HOME=/usr/java/jdk1.7.0_79 > GIT_BRANCH=codeWalkThroughML > SHLVL=3 > AMP_JENKINS_PRB=true > JAVA_HOME=/usr/java/jdk1.8.0_60 > JENKINS_MASTER_HOSTNAME=amp-jenkins-master > BUILD_ID=64504 > XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt > ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/14836 > JOB_NAME=SparkPullRequestBuilder > BUILD_CAUSE=GHPRBCAUSE > SPARK_SCALA_VERSION=2.11 > AWS_ACCESS_KEY_ID= > NODE_LABELS=amp-jenkins-worker-02 centos spark-compile spark-test > HUDSON_URL=https://amplab.cs.berkeley.edu/jenkins/ > SPARK_PREPEND_CLASSES=1 > COLUMNS=80 > WORKSPACE=/home/jenkins/workspace/SparkPullRequestBuilder > SPARK_TESTING=1 > _=/usr/java/jdk1.8.0_60/bin/java > GIT_COMMIT=b31b82bcc9d8767561ee720c9e7192252f4fd3fc > ghprbPullId=14836 > EXECUTOR_NUMBER=9 > SSH_CLIENT=192.168.10.10 44762 22 > HUDSON_SERVER_COOKIE=472906e9832aeb79 > cat: nonexistent_file: No such file or directory > cat: nonexistent_file: No such file or directory >
[jira] [Created] (SPARK-17310) Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side
Hyukjin Kwon created SPARK-17310: Summary: Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side Key: SPARK-17310 URL: https://issues.apache.org/jira/browse/SPARK-17310 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Hyukjin Kwon Currently, we push filters down to the normal Parquet reader, which also filters record-by-record. Spark-side codegen row-by-row filtering might be faster than Parquet's in general, because of the type boxing and virtual function calls that Spark's code generation avoids. Maybe we should perform a benchmark and then disable this. This ticket came out of https://github.com/apache/spark/pull/14671 Please refer to the discussion in the PR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
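The split the ticket proposes can be sketched with toy data: keep Parquet-side pruning at the row-group level (cheap, statistics-based) but leave record-by-record filtering to the engine, so rows are not filtered twice. The data layout and function names here are illustrative, not the Parquet or Spark APIs.

```python
def prune_row_groups(row_groups, min_max_predicate):
    # Row-group pruning: skip whole chunks using their min/max statistics.
    return [g for g in row_groups if min_max_predicate(g["min"], g["max"])]

def engine_filter(rows, predicate):
    # Record-level filtering done once, engine-side (where codegen can
    # specialize it), instead of again inside the reader.
    return [r for r in rows if predicate(r)]

row_groups = [
    {"min": 0, "max": 9, "rows": list(range(0, 10))},
    {"min": 10, "max": 19, "rows": list(range(10, 20))},
]
# Predicate value > 12: the first group can be skipped without reading a row.
kept = prune_row_groups(row_groups, lambda lo, hi: hi > 12)
rows = [r for g in kept for r in g["rows"]]
result = engine_filter(rows, lambda v: v > 12)
```

Pruning still pays for itself because it avoids I/O entirely, while the per-record pass happens exactly once, in whichever layer benchmarks show to be faster.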
[jira] [Assigned] (SPARK-17309) ALTER VIEW should throw exception if view not exist
[ https://issues.apache.org/jira/browse/SPARK-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17309: Assignee: Apache Spark (was: Wenchen Fan) > ALTER VIEW should throw exception if view not exist > --- > > Key: SPARK-17309 > URL: https://issues.apache.org/jira/browse/SPARK-17309 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17309) ALTER VIEW should throw exception if view not exist
[ https://issues.apache.org/jira/browse/SPARK-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17309: Assignee: Wenchen Fan (was: Apache Spark) > ALTER VIEW should throw exception if view not exist > --- > > Key: SPARK-17309 > URL: https://issues.apache.org/jira/browse/SPARK-17309 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17309) ALTER VIEW should throw exception if view not exist
[ https://issues.apache.org/jira/browse/SPARK-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448588#comment-15448588 ] Apache Spark commented on SPARK-17309: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/14874 > ALTER VIEW should throw exception if view not exist > --- > > Key: SPARK-17309 > URL: https://issues.apache.org/jira/browse/SPARK-17309 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17180) Unable to Alter the Temporary View Using ALTER VIEW command
[ https://issues.apache.org/jira/browse/SPARK-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448587#comment-15448587 ] Apache Spark commented on SPARK-17180: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/14874 > Unable to Alter the Temporary View Using ALTER VIEW command > --- > > Key: SPARK-17180 > URL: https://issues.apache.org/jira/browse/SPARK-17180 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > In the current master branch, when users do not specify the database name in > the `ALTER VIEW AS SELECT` command, we always try to alter the permanent view > even if the temporary view exists. > The expected behavior of `ALTER VIEW AS SELECT` should be like: alters the > temporary view if the temp view exists; otherwise, try to alter the permanent > view. This order is consistent with another command `DROP VIEW`, since users > are unable to specify the keyword TEMPORARY. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
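The lookup order the two tickets describe can be sketched with a toy catalog: `ALTER VIEW` on an unqualified name changes the temporary view if one exists, otherwise falls back to the permanent view, and fails loudly if neither exists. This mirrors only the semantics described above; it is not Spark's SessionCatalog.

```python
def alter_view(name, new_definition, temp_views, permanent_views):
    if name in temp_views:                  # temp view shadows the permanent one
        temp_views[name] = new_definition
    elif name in permanent_views:
        permanent_views[name] = new_definition
    else:                                   # SPARK-17309: must not silently pass
        raise ValueError("View '%s' does not exist" % name)
```

The temp-first order matches `DROP VIEW`, which is why the tickets treat it as the expected behavior even though users cannot spell out the keyword TEMPORARY.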