[jira] [Commented] (SPARK-12452) Add exception details to TaskCompletionListener/TaskContext

2016-08-30 Thread Jagadeesan A S (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451171#comment-15451171
 ] 

Jagadeesan A S commented on SPARK-12452:


[~srowen] Any suggestions from your side? Can I take it up?

> Add exception details to TaskCompletionListener/TaskContext
> ---
>
> Key: SPARK-12452
> URL: https://issues.apache.org/jira/browse/SPARK-12452
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.2
>Reporter: Neelesh Shastry
>Priority: Minor
>
> TaskCompletionListeners are called without success/failure details. 
> If we change this
> {code}
> trait TaskCompletionListener extends EventListener {
>   def onTaskCompletion(context: TaskContext)
> }
>
> class TaskContextImpl {
>   private[spark] def markTaskCompleted(throwable: Option[Throwable]): Unit = {
>     // ... for each registered listener:
>     listener.onTaskCompletion(this, throwable)
>   }
> }
> {code}
> to something like
> {code}
> trait TaskCompletionListener extends EventListener {
>   def onTaskCompletion(context: TaskContext, throwable: Option[Throwable] = None)
> }
> {code}
> ... and in Task.scala:
> {code}
> var throwable: Option[Throwable] = None
> try {
>   runTask(context)
> } catch {
>   case t: Throwable =>
>     throwable = Some(t)
>     throw t  // still propagate the failure
> } finally {
>   context.markTaskCompleted(throwable)
>   TaskContext.unset()
> }
> {code}
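
A minimal sketch of a listener written against the proposed signature (hypothetical: the trait change above is a proposal, not current Spark API, and `LoggingCompletionListener` is an illustrative name):

{code}
import org.apache.spark.TaskContext
import org.apache.spark.util.TaskCompletionListener

// Assumes the proposed two-argument onTaskCompletion above has been applied.
class LoggingCompletionListener extends TaskCompletionListener {
  override def onTaskCompletion(context: TaskContext,
                                throwable: Option[Throwable] = None): Unit = {
    throwable match {
      case Some(t) => println(s"Task ${context.taskAttemptId()} failed: ${t.getMessage}")
      case None    => println(s"Task ${context.taskAttemptId()} completed successfully")
    }
  }
}
{code}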






[jira] [Resolved] (SPARK-15985) Reduce runtime overhead of a program that reads a primitive array in Dataset

2016-08-30 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-15985.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 13704
[https://github.com/apache/spark/pull/13704]

> Reduce runtime overhead of a program that reads a primitive array in Dataset
> -
>
> Key: SPARK-15985
> URL: https://issues.apache.org/jira/browse/SPARK-15985
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
> Fix For: 2.1.0
>
>
> When a program reads an array in a Dataset, the code generator creates some
> copy operations. If the array is of a primitive type, there are opportunities
> to optimize the generated code and reduce runtime overhead.
> {code}
> val ds = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0)).toDS()
> ds.map(p => {
>   var s = 0.0
>   for (i <- 0 to 2) { s += p(i) }
>   s
> }).show
> {code}






[jira] [Updated] (SPARK-15985) Reduce runtime overhead of a program that reads a primitive array in Dataset

2016-08-30 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-15985:

Assignee: Kazuaki Ishizaki

> Reduce runtime overhead of a program that reads a primitive array in Dataset
> -
>
> Key: SPARK-15985
> URL: https://issues.apache.org/jira/browse/SPARK-15985
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
> Fix For: 2.1.0
>
>
> When a program reads an array in a Dataset, the code generator creates some
> copy operations. If the array is of a primitive type, there are opportunities
> to optimize the generated code and reduce runtime overhead.
> {code}
> val ds = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0)).toDS()
> ds.map(p => {
>   var s = 0.0
>   for (i <- 0 to 2) { s += p(i) }
>   s
> }).show
> {code}






[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-08-30 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451091#comment-15451091
 ] 

Joseph K. Bradley commented on SPARK-5992:
--

Awesome, thanks!  I made some comments, but it looks like a good approach 
overall.

> Locality Sensitive Hashing (LSH) for MLlib
> --
>
> Key: SPARK-5992
> URL: https://issues.apache.org/jira/browse/SPARK-5992
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>
> Locality Sensitive Hashing (LSH) would be very useful for ML.  It would be 
> great to discuss some possible algorithms here, choose an API, and make a PR 
> for an initial algorithm.






[jira] [Created] (SPARK-17325) Inconsistent Spillable threshold and AppendOnlyMap growing threshold may trigger out-of-memory errors

2016-08-30 Thread Lijie Xu (JIRA)
Lijie Xu created SPARK-17325:


 Summary: Inconsistent Spillable threshold and AppendOnlyMap 
growing threshold may trigger out-of-memory errors
 Key: SPARK-17325
 URL: https://issues.apache.org/jira/browse/SPARK-17325
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, Spark Core
Affects Versions: 2.0.0, 1.6.2
Reporter: Lijie Xu


While reading the shuffle source code, I noticed a potential out-of-memory error 
in ExternalSorter.

The problem is that the memory usage of the AppendOnlyMap (i.e., 
PartitionedAppendOnlyMap in ExternalSorter) can greatly exceed its spillable 
threshold: `currentMemory` can reach twice the size of `myMemoryThreshold` 
in `Spillable.maybeSpill()`. In other words, the task's current execution 
memory usage (the AppendOnlyMap) can greatly exceed its defined execution memory 
limit ((1 - spark.memory.storageFraction) * 1 / #taskNum), which can lead to 
out-of-memory errors.


Example: suppose the current spillable threshold has grown to 250 MB while the 
AppendOnlyMap holds 200 MB. An incoming key/value record finds the map full and 
triggers its size expansion; after doubling, the AppendOnlyMap may occupy about 
400 MB (or slightly less), far above both the spillable threshold and the 
execution memory limit.
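
A minimal, self-contained sketch of the arithmetic in this example (the 250 MB / 
200 MB figures come from the example above; doubling-on-growth is how 
AppendOnlyMap resizes its backing array):

{code}
object SpillThresholdDemo extends App {
  val myMemoryThreshold = 250L << 20 // spillable threshold from the example: 250 MB
  var currentMemory     = 200L << 20 // AppendOnlyMap usage before the incoming record

  // The map is full, so inserting one more record doubles the backing array.
  currentMemory *= 2

  // Only after the growth would Spillable.maybeSpill() observe the new usage,
  // so the map can briefly hold ~2x its threshold before any spill happens.
  println(s"usage = ${currentMemory >> 20} MB, threshold = ${myMemoryThreshold >> 20} MB")
  println(s"over the threshold by ${(currentMemory - myMemoryThreshold) >> 20} MB")
}
{code}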









[jira] [Assigned] (SPARK-17324) Remove Direct Usage of HiveClient in InsertIntoHiveTable

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17324:


Assignee: (was: Apache Spark)

> Remove Direct Usage of HiveClient in InsertIntoHiveTable
> 
>
> Key: SPARK-17324
> URL: https://issues.apache.org/jira/browse/SPARK-17324
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>
> This is another step toward getting rid of HiveClient from `HiveSessionState`.
> All the metastore interactions should go through the `ExternalCatalog`
> interface. However, the existing implementation of `InsertIntoHiveTable` still
> requires Hive clients. Thus, we can remove HiveClient by moving the metastore
> interactions into `ExternalCatalog`.






[jira] [Assigned] (SPARK-17324) Remove Direct Usage of HiveClient in InsertIntoHiveTable

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17324:


Assignee: Apache Spark

> Remove Direct Usage of HiveClient in InsertIntoHiveTable
> 
>
> Key: SPARK-17324
> URL: https://issues.apache.org/jira/browse/SPARK-17324
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> This is another step toward getting rid of HiveClient from `HiveSessionState`.
> All the metastore interactions should go through the `ExternalCatalog`
> interface. However, the existing implementation of `InsertIntoHiveTable` still
> requires Hive clients. Thus, we can remove HiveClient by moving the metastore
> interactions into `ExternalCatalog`.






[jira] [Commented] (SPARK-17324) Remove Direct Usage of HiveClient in InsertIntoHiveTable

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450913#comment-15450913
 ] 

Apache Spark commented on SPARK-17324:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/14888

> Remove Direct Usage of HiveClient in InsertIntoHiveTable
> 
>
> Key: SPARK-17324
> URL: https://issues.apache.org/jira/browse/SPARK-17324
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>
> This is another step toward getting rid of HiveClient from `HiveSessionState`.
> All the metastore interactions should go through the `ExternalCatalog`
> interface. However, the existing implementation of `InsertIntoHiveTable` still
> requires Hive clients. Thus, we can remove HiveClient by moving the metastore
> interactions into `ExternalCatalog`.






[jira] [Created] (SPARK-17324) Remove Direct Usage of HiveClient in InsertIntoHiveTable

2016-08-30 Thread Xiao Li (JIRA)
Xiao Li created SPARK-17324:
---

 Summary: Remove Direct Usage of HiveClient in InsertIntoHiveTable
 Key: SPARK-17324
 URL: https://issues.apache.org/jira/browse/SPARK-17324
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.1.0
Reporter: Xiao Li


This is another step toward getting rid of HiveClient from `HiveSessionState`. 
All the metastore interactions should go through the `ExternalCatalog` 
interface. However, the existing implementation of `InsertIntoHiveTable` still 
requires Hive clients. Thus, we can remove HiveClient by moving the metastore 
interactions into `ExternalCatalog`.







[jira] [Resolved] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl

2016-08-30 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-17318.
--
   Resolution: Fixed
 Assignee: Shixiong Zhu
Fix Version/s: 2.1.0
   2.0.1

> Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class 
> defined in repl
> 
>
> Key: SPARK-17318
> URL: https://issues.apache.org/jira/browse/SPARK-17318
> Project: Spark
>  Issue Type: Test
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.0.1, 2.1.0
>
>
> There are a lot of failures recently: 
> http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl






[jira] [Commented] (SPARK-15575) Remove breeze from dependencies?

2016-08-30 Thread Vladimir Feinberg (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450836#comment-15450836
 ] 

Vladimir Feinberg commented on SPARK-15575:
---

Some of the biggest performance issues with Breeze I've experienced are that 
many operations you'd expect to be fast are not, and that its pretty syntax and 
heavy use of implicits make it easy to hit these slow paths accidentally.

For instance:
1. Mixed dense/sparse operations frequently fall back to a generic 
implementation in Breeze that uses its Scala iterators.
2. Creating vectors, under certain operations, results in unnecessary boxing of 
doubles (and of integers, for sparse vectors).
3. Slice vectors have no support for efficient operations. They are implemented 
in Breeze in a way that makes them no better than Array[Double], which again 
forces us onto Scala iterators whenever we want, for instance, a fast, 
vectorized dot product.

Usability is tough sometimes. Even though the Vector[Double] interface seems 
flexible, many implementations require explicit knowledge of the vector type 
(sparse/dense), else Breeze silently uses the slow Scala implementation. Heavy 
use of implicits is also a problem here, since they're not implemented for all 
permutations of vector types.

It's also easy to write, e.g., `vec1 += vec2 * a * b`, which creates two 
intermediate vectors; a sketch of the allocations is below.
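
A small illustration of those allocations (assuming Breeze on the classpath; the values are arbitrary):

{code}
import breeze.linalg.{axpy, DenseVector}

val a = 2.0
val b = 3.0
val vec1 = DenseVector(1.0, 2.0, 3.0)
val vec2 = DenseVector(4.0, 5.0, 6.0)

vec1 += vec2 * a * b    // allocates (vec2 * a) and then ((vec2 * a) * b): two temporaries
vec1 += vec2 * (a * b)  // one temporary
axpy(a * b, vec2, vec1) // in-place y += alpha * x: no temporaries
{code}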

I think the biggest issue is that `ml.linalg.Vector` is Breeze-backed. We 
should use our own linear algebra (we do have `BLAS`, though to support slicing 
this interface would have to be expanded) and move around `ArrayView[Double]` 
inside the vector instead.

Breeze as a dependency is, as mentioned below, still very useful for 
optimization. I think we can keep it around for that, as long as it's only used 
for that.

> Remove breeze from dependencies?
> 
>
> Key: SPARK-15575
> URL: https://issues.apache.org/jira/browse/SPARK-15575
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>
> This JIRA is for discussing whether we should remove Breeze from the 
> dependencies of MLlib.  The main issues with Breeze are Scala 2.12 support 
> and performance issues.
> There are a few paths:
> # Keep dependency.  This could be OK, especially if the Scala version issues 
> are fixed within Breeze.
> # Remove dependency
> ## Implement our own linear algebra operators as needed
> ## Design a way to build Spark using custom linalg libraries of the user's 
> choice.  E.g., you could build MLlib using Breeze, or any other library 
> supporting the required operations.  This might require significant work.  
> See [SPARK-6442] for related discussion.






[jira] [Commented] (SPARK-17323) ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450742#comment-15450742
 ] 

Apache Spark commented on SPARK-17323:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/14874

> ALTER VIEW AS should keep the previous table properties, comment, 
> create_time, etc.
> ---
>
> Key: SPARK-17323
> URL: https://issues.apache.org/jira/browse/SPARK-17323
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>







[jira] [Assigned] (SPARK-17323) ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17323:


Assignee: Wenchen Fan  (was: Apache Spark)

> ALTER VIEW AS should keep the previous table properties, comment, 
> create_time, etc.
> ---
>
> Key: SPARK-17323
> URL: https://issues.apache.org/jira/browse/SPARK-17323
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>







[jira] [Assigned] (SPARK-17323) ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17323:


Assignee: Apache Spark  (was: Wenchen Fan)

> ALTER VIEW AS should keep the previous table properties, comment, 
> create_time, etc.
> ---
>
> Key: SPARK-17323
> URL: https://issues.apache.org/jira/browse/SPARK-17323
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>







[jira] [Created] (SPARK-17323) ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.

2016-08-30 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-17323:
---

 Summary: ALTER VIEW AS should keep the previous table properties, 
comment, create_time, etc.
 Key: SPARK-17323
 URL: https://issues.apache.org/jira/browse/SPARK-17323
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Wenchen Fan
Assignee: Wenchen Fan









[jira] [Commented] (SPARK-16283) Implement percentile_approx SQL function

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450696#comment-15450696
 ] 

Apache Spark commented on SPARK-16283:
--

User 'clockfly' has created a pull request for this issue:
https://github.com/apache/spark/pull/14868

> Implement percentile_approx SQL function
> 
>
> Key: SPARK-16283
> URL: https://issues.apache.org/jira/browse/SPARK-16283
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>







[jira] [Commented] (SPARK-14742) Redirect spark-ec2 doc to new location

2016-08-30 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450632#comment-15450632
 ] 

Shivaram Venkataraman commented on SPARK-14742:
---

To follow up, my guess was right: there is an ec2-scripts.html in 
https://github.com/apache/spark-website/tree/asf-site/site/docs/2.0.0-preview 
but the one in 
https://github.com/apache/spark-website/tree/asf-site/site/docs/2.0.0 is only a 
Markdown file. 

I don't know if there is a simple way to generate just that single HTML file, 
though, instead of rebuilding all of the docs. Also [~srowen], are we doing PRs 
for the website now?

> Redirect spark-ec2 doc to new location
> --
>
> Key: SPARK-14742
> URL: https://issues.apache.org/jira/browse/SPARK-14742
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, EC2
>Reporter: Nicholas Chammas
>Assignee: Sean Owen
>Priority: Trivial
> Fix For: 2.0.0
>
>
> See: https://github.com/amplab/spark-ec2/pull/24#issuecomment-212033453
> We need to redirect this page
> http://spark.apache.org/docs/latest/ec2-scripts.html
> to this page
> https://github.com/amplab/spark-ec2#readme






[jira] [Commented] (SPARK-14742) Redirect spark-ec2 doc to new location

2016-08-30 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450626#comment-15450626
 ] 

Nicholas Chammas commented on SPARK-14742:
--

{quote}
Otherwise the only way to get to this link is if you have it bookmarked.
{quote}

Or any page that has linked to it in the past. All those links are now broken. 
That's my main concern.

> Redirect spark-ec2 doc to new location
> --
>
> Key: SPARK-14742
> URL: https://issues.apache.org/jira/browse/SPARK-14742
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, EC2
>Reporter: Nicholas Chammas
>Assignee: Sean Owen
>Priority: Trivial
> Fix For: 2.0.0
>
>
> See: https://github.com/amplab/spark-ec2/pull/24#issuecomment-212033453
> We need to redirect this page
> http://spark.apache.org/docs/latest/ec2-scripts.html
> to this page
> https://github.com/amplab/spark-ec2#readme






[jira] [Commented] (SPARK-14742) Redirect spark-ec2 doc to new location

2016-08-30 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450619#comment-15450619
 ] 

Shivaram Venkataraman commented on SPARK-14742:
---

A couple of things I noticed:
- It seems to work fine with 2.0.0-preview (i.e., 
http://spark.apache.org/docs/2.0.0-preview/ec2-scripts.html), so I'm wondering 
if it's just an issue of some files not being copied correctly?
- Can we bring back 'Amazon EC2' in the deploy drop-down? Otherwise the only 
way to get to this link is if you have it bookmarked.

> Redirect spark-ec2 doc to new location
> --
>
> Key: SPARK-14742
> URL: https://issues.apache.org/jira/browse/SPARK-14742
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, EC2
>Reporter: Nicholas Chammas
>Assignee: Sean Owen
>Priority: Trivial
> Fix For: 2.0.0
>
>
> See: https://github.com/amplab/spark-ec2/pull/24#issuecomment-212033453
> We need to redirect this page
> http://spark.apache.org/docs/latest/ec2-scripts.html
> to this page
> https://github.com/amplab/spark-ec2#readme






[jira] [Commented] (SPARK-14155) Hide UserDefinedType in Spark 2.0

2016-08-30 Thread Robert Conrad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450612#comment-15450612
 ] 

Robert Conrad commented on SPARK-14155:
---

[~r...@databricks.com] echoing Maciej above: is there any progress on UDTs for 
Datasets, or a JIRA ticket we can follow?

> Hide UserDefinedType in Spark 2.0
> -
>
> Key: SPARK-14155
> URL: https://issues.apache.org/jira/browse/SPARK-14155
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>
> UserDefinedType is a developer API in Spark 1.x.
> With very high probability we will create a new API for user-defined type 
> that also works well with column batches as well as encoders (datasets). In 
> Spark 2.0, let's make UserDefinedType private[spark] first.






[jira] [Commented] (SPARK-14742) Redirect spark-ec2 doc to new location

2016-08-30 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450602#comment-15450602
 ] 

Nicholas Chammas commented on SPARK-14742:
--

http://spark.apache.org/docs/latest/ec2-scripts.html

I am seeing that this URL is not redirecting to the new spark-ec2 location on 
GitHub. [~srowen] - Can you fix that? 

I can see that we have some kind of redirect set up, but I guess it's not 
working.

https://github.com/apache/spark/blob/master/docs/ec2-scripts.md

> Redirect spark-ec2 doc to new location
> --
>
> Key: SPARK-14742
> URL: https://issues.apache.org/jira/browse/SPARK-14742
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, EC2
>Reporter: Nicholas Chammas
>Assignee: Sean Owen
>Priority: Trivial
> Fix For: 2.0.0
>
>
> See: https://github.com/amplab/spark-ec2/pull/24#issuecomment-212033453
> We need to redirect this page
> http://spark.apache.org/docs/latest/ec2-scripts.html
> to this page
> https://github.com/amplab/spark-ec2#readme






[jira] [Created] (SPARK-17322) 'ANY n' clause for SQL queries to increase the ease of use of WHERE clause predicates

2016-08-30 Thread Suman Somasundar (JIRA)
Suman Somasundar created SPARK-17322:


 Summary: 'ANY n' clause for SQL queries to increase the ease of 
use of WHERE clause predicates
 Key: SPARK-17322
 URL: https://issues.apache.org/jira/browse/SPARK-17322
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Suman Somasundar
Priority: Minor


If the user wants results that satisfy any n out of m WHERE-clause predicates 
(m > n), an 'ANY n' clause greatly simplifies writing the SQL query.

An example is given below:

select symbol from stocks where (market_cap > 5.7b, analysts_recommend > 10, 
moving_avg > 49.2, pe_ratio > 15.4) ANY 3
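
For comparison, a sketch of how the same query can be written in today's SQL by 
counting satisfied predicates (assuming a SparkSession named `spark`, a 
registered `stocks` table, and 5.7b spelled out as 5.7e9):

{code}
val anyThree = spark.sql("""
  SELECT symbol FROM stocks
  WHERE (CAST(market_cap > 5.7e9 AS INT) +
         CAST(analysts_recommend > 10 AS INT) +
         CAST(moving_avg > 49.2 AS INT) +
         CAST(pe_ratio > 15.4 AS INT)) >= 3
""")
{code}

The proposed 'ANY n' clause would make this counting implicit.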






[jira] [Assigned] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17321:


Assignee: (was: Apache Spark)

> YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
> --
>
> Key: SPARK-17321
> URL: https://issues.apache.org/jira/browse/SPARK-17321
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.2, 2.0.0
>Reporter: yunjiong zhao
>
> We run Spark on YARN. After enabling Spark dynamic allocation, we noticed
> some Spark applications failed randomly due to the YarnShuffleService.
> From the log I found:
> {quote}
> 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: 
> Error while initializing Netty pipeline
> java.lang.NullPointerException
> at 
> org.apache.spark.network.server.TransportRequestHandler.<init>(TransportRequestHandler.java:77)
> at 
> org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159)
> at 
> org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116)
> at 
> io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> {quote} 
> This was caused by the first disk in yarn.nodemanager.local-dirs being broken.
> If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose
> hundreds of nodes, which is unacceptable.
> We have 12 disks in yarn.nodemanager.local-dirs, so why not use another good
> disk if the first one is broken?






[jira] [Commented] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450424#comment-15450424
 ] 

Apache Spark commented on SPARK-17321:
--

User 'zhaoyunjiong' has created a pull request for this issue:
https://github.com/apache/spark/pull/14887

> YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
> --
>
> Key: SPARK-17321
> URL: https://issues.apache.org/jira/browse/SPARK-17321
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.2, 2.0.0
>Reporter: yunjiong zhao
>
> We run Spark on YARN. After enabling Spark dynamic allocation, we noticed
> some Spark applications failed randomly due to the YarnShuffleService.
> From the log I found:
> {quote}
> 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: 
> Error while initializing Netty pipeline
> java.lang.NullPointerException
> at 
> org.apache.spark.network.server.TransportRequestHandler.<init>(TransportRequestHandler.java:77)
> at 
> org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159)
> at 
> org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116)
> at 
> io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> {quote} 
> This was caused by the first disk in yarn.nodemanager.local-dirs being broken.
> If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose
> hundreds of nodes, which is unacceptable.
> We have 12 disks in yarn.nodemanager.local-dirs, so why not use another good
> disk if the first one is broken?






[jira] [Assigned] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17321:


Assignee: Apache Spark

> YARN shuffle service should use good disk from yarn.nodemanager.local-dirs
> --
>
> Key: SPARK-17321
> URL: https://issues.apache.org/jira/browse/SPARK-17321
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.2, 2.0.0
>Reporter: yunjiong zhao
>Assignee: Apache Spark
>
> We run Spark on YARN. After enabling Spark dynamic allocation, we noticed
> some Spark applications failed randomly due to the YarnShuffleService.
> From the log I found:
> {quote}
> 2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: 
> Error while initializing Netty pipeline
> java.lang.NullPointerException
> at 
> org.apache.spark.network.server.TransportRequestHandler.<init>(TransportRequestHandler.java:77)
> at 
> org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159)
> at 
> org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123)
> at 
> org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116)
> at 
> io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378)
> at 
> io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> {quote} 
> This was caused by the first disk in yarn.nodemanager.local-dirs being broken.
> If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose
> hundreds of nodes, which is unacceptable.
> We have 12 disks in yarn.nodemanager.local-dirs, so why not use another good
> disk if the first one is broken?






[jira] [Created] (SPARK-17321) YARN shuffle service should use good disk from yarn.nodemanager.local-dirs

2016-08-30 Thread yunjiong zhao (JIRA)
yunjiong zhao created SPARK-17321:
-

 Summary: YARN shuffle service should use good disk from 
yarn.nodemanager.local-dirs
 Key: SPARK-17321
 URL: https://issues.apache.org/jira/browse/SPARK-17321
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 2.0.0, 1.6.2
Reporter: yunjiong zhao


We run Spark on YARN. After enabling Spark dynamic allocation, we noticed some 
Spark applications failed randomly due to the YarnShuffleService.
From the log I found:
{quote}
2016-08-29 11:33:03,450 ERROR org.apache.spark.network.TransportContext: Error 
while initializing Netty pipeline
java.lang.NullPointerException
at 
org.apache.spark.network.server.TransportRequestHandler.<init>(TransportRequestHandler.java:77)
at 
org.apache.spark.network.TransportContext.createChannelHandler(TransportContext.java:159)
at 
org.apache.spark.network.TransportContext.initializePipeline(TransportContext.java:135)
at 
org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:123)
at 
org.apache.spark.network.server.TransportServer$1.initChannel(TransportServer.java:116)
at 
io.netty.channel.ChannelInitializer.channelRegistered(ChannelInitializer.java:69)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRegistered(AbstractChannelHandlerContext.java:133)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRegistered(AbstractChannelHandlerContext.java:119)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRegistered(DefaultChannelPipeline.java:733)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:450)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.access$100(AbstractChannel.java:378)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:424)
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
{quote} 
This was caused by the first disk in yarn.nodemanager.local-dirs being broken.
If we enabled spark.yarn.shuffle.stopOnFailure (SPARK-16505) we might lose 
hundreds of nodes, which is unacceptable.
We have 12 disks in yarn.nodemanager.local-dirs, so why not use another good 
disk if the first one is broken?
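
A minimal sketch of the idea (illustrative only, not the actual 
YarnShuffleService code; `firstGoodLocalDir` is a hypothetical helper): scan the 
configured local dirs and take the first usable one instead of blindly using 
the head of the list.

{code}
import java.io.File

// Hypothetical helper: return the first healthy directory from
// yarn.nodemanager.local-dirs instead of always using the first entry.
def firstGoodLocalDir(localDirs: Seq[String]): Option[File] =
  localDirs.iterator
    .map(new File(_))
    .find(d => d.isDirectory && d.canRead && d.canWrite && d.canExecute)

// Example: skips past a broken first disk.
val dirs = Seq("/data1/yarn/local", "/data2/yarn/local", "/data3/yarn/local")
firstGoodLocalDir(dirs).foreach(d => println(s"using ${d.getAbsolutePath}"))
{code}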






[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450270#comment-15450270
 ] 

Apache Spark commented on SPARK-17243:
--

User 'ajbozarth' has created a pull request for this issue:
https://github.com/apache/spark/pull/14886

> Spark 2.0 history server summary page gets stuck at "loading history summary" 
> with 10K+ application history
> ---
>
> Key: SPARK-17243
> URL: https://issues.apache.org/jira/browse/SPARK-17243
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
> Environment: Linux
>Reporter: Gang Wu
>Assignee: Alex Bozarth
> Fix For: 2.1.0
>
>
> The summary page of the Spark 2.0 history server web UI keeps displaying
> "Loading history summary..." indefinitely and crashes the browser when there
> are more than 10K application history event logs on HDFS.
> I did some investigation: the "historypage.js" file sends a request to the
> /api/v1/applications REST endpoint of the history server and gets back a JSON
> response. When there are more than 10K applications inside the event log
> directory it takes forever to parse them and render the page. With only
> hundreds or thousands of applications it runs fine.






[jira] [Resolved] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history

2016-08-30 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-17243.
---
   Resolution: Fixed
 Assignee: Alex Bozarth
Fix Version/s: 2.1.0

> Spark 2.0 history server summary page gets stuck at "loading history summary" 
> with 10K+ application history
> ---
>
> Key: SPARK-17243
> URL: https://issues.apache.org/jira/browse/SPARK-17243
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
> Environment: Linux
>Reporter: Gang Wu
>Assignee: Alex Bozarth
> Fix For: 2.1.0
>
>
> The summary page of the Spark 2.0 history server web UI keeps displaying
> "Loading history summary..." indefinitely and crashes the browser when there
> are more than 10K application history event logs on HDFS.
> I did some investigation: the "historypage.js" file sends a request to the
> /api/v1/applications REST endpoint of the history server and gets back a JSON
> response. When there are more than 10K applications inside the event log
> directory it takes forever to parse them and render the page. With only
> hundreds or thousands of applications it runs fine.






[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode

2016-08-30 Thread Ewan Leith (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450194#comment-15450194
 ] 

Ewan Leith commented on SPARK-17313:


I think Apache Zeppelin and Spark Notebook both cover this requirement better 
than the Spark shell ever will. The installation requirements for either are 
fairly minimal, and they give you all sorts of additional benefits over the raw 
shell.

> Support spark-shell on cluster mode
> ---
>
> Key: SPARK-17313
> URL: https://issues.apache.org/jira/browse/SPARK-17313
> Project: Spark
>  Issue Type: New Feature
>Reporter: Mahmoud Elgamal
>
> The main issue with the current spark-shell is that the driver runs on the
> user's machine. If the driver's resource requirements exceed the user
> machine's capacity, the spark-shell becomes useless. If we added a cluster
> mode (YARN or Mesos) for the spark-shell via some sort of proxy, where the
> user machine only hosts a REST client to the driver running in the cluster,
> the shell would be more powerful.






[jira] [Assigned] (SPARK-17320) Spark Mesos module not building on PRs

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17320:


Assignee: Apache Spark

> Spark Mesos module not building on PRs
> --
>
> Key: SPARK-17320
> URL: https://issues.apache.org/jira/browse/SPARK-17320
> Project: Spark
>  Issue Type: Task
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Michael Gummelt
>Assignee: Apache Spark
>







[jira] [Updated] (SPARK-17320) Spark Mesos module not building on PRs

2016-08-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-17320:
---
Fix Version/s: (was: 2.0.1)

> Spark Mesos module not building on PRs
> --
>
> Key: SPARK-17320
> URL: https://issues.apache.org/jira/browse/SPARK-17320
> Project: Spark
>  Issue Type: Task
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Michael Gummelt
>







[jira] [Updated] (SPARK-17320) Spark Mesos module not building on PRs

2016-08-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-17320:
---
Affects Version/s: (was: 2.0.0)
   2.1.0

> Spark Mesos module not building on PRs
> --
>
> Key: SPARK-17320
> URL: https://issues.apache.org/jira/browse/SPARK-17320
> Project: Spark
>  Issue Type: Task
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Michael Gummelt
>







[jira] [Commented] (SPARK-17320) Spark Mesos module not building on PRs

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450113#comment-15450113
 ] 

Apache Spark commented on SPARK-17320:
--

User 'mgummelt' has created a pull request for this issue:
https://github.com/apache/spark/pull/14885

> Spark Mesos module not building on PRs
> --
>
> Key: SPARK-17320
> URL: https://issues.apache.org/jira/browse/SPARK-17320
> Project: Spark
>  Issue Type: Task
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Michael Gummelt
>







[jira] [Assigned] (SPARK-17320) Spark Mesos module not building on PRs

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17320:


Assignee: (was: Apache Spark)

> Spark Mesos module not building on PRs
> --
>
> Key: SPARK-17320
> URL: https://issues.apache.org/jira/browse/SPARK-17320
> Project: Spark
>  Issue Type: Task
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Michael Gummelt
>







[jira] [Created] (SPARK-17320) Spark Mesos module not building on PRs

2016-08-30 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-17320:
---

 Summary: Spark Mesos module not building on PRs
 Key: SPARK-17320
 URL: https://issues.apache.org/jira/browse/SPARK-17320
 Project: Spark
  Issue Type: Task
  Components: Mesos
Affects Versions: 2.0.0
Reporter: Michael Gummelt
 Fix For: 2.0.1









[jira] [Assigned] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17318:


Assignee: (was: Apache Spark)

> Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class 
> defined in repl
> 
>
> Key: SPARK-17318
> URL: https://issues.apache.org/jira/browse/SPARK-17318
> Project: Spark
>  Issue Type: Test
>Reporter: Shixiong Zhu
>
> There are a lot of failures recently: 
> http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl






[jira] [Commented] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450101#comment-15450101
 ] 

Apache Spark commented on SPARK-17318:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/14884

> Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class 
> defined in repl
> 
>
> Key: SPARK-17318
> URL: https://issues.apache.org/jira/browse/SPARK-17318
> Project: Spark
>  Issue Type: Test
>Reporter: Shixiong Zhu
>
> There are a lot of failures recently: 
> http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl






[jira] [Assigned] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17318:


Assignee: Apache Spark

> Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class 
> defined in repl
> 
>
> Key: SPARK-17318
> URL: https://issues.apache.org/jira/browse/SPARK-17318
> Project: Spark
>  Issue Type: Test
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>
> There are a lot of failures recently: 
> http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl






[jira] [Commented] (SPARK-17319) Move addJar from HiveSessionState to HiveExternalCatalog

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450094#comment-15450094
 ] 

Apache Spark commented on SPARK-17319:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/14883

> Move addJar from HiveSessionState to HiveExternalCatalog
> 
>
> Key: SPARK-17319
> URL: https://issues.apache.org/jira/browse/SPARK-17319
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>
> This is another step toward removing Hive client usage in `HiveSessionState`.
> Different sessions share the same class loader, so `metadataHive.addJar(path)`
> effectively loads the JARs for all sessions. Thus, nothing changes if we move
> `addJar` from `HiveSessionState` to `HiveExternalCatalog`.






[jira] [Assigned] (SPARK-17319) Move addJar from HiveSessionState to HiveExternalCatalog

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17319:


Assignee: (was: Apache Spark)

> Move addJar from HiveSessionState to HiveExternalCatalog
> 
>
> Key: SPARK-17319
> URL: https://issues.apache.org/jira/browse/SPARK-17319
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>
> This is another step toward removing Hive client usage in `HiveSessionState`.
> Different sessions share the same class loader, so `metadataHive.addJar(path)`
> effectively loads the JARs for all sessions. Thus, nothing changes if we move
> `addJar` from `HiveSessionState` to `HiveExternalCatalog`.






[jira] [Assigned] (SPARK-17319) Move addJar from HiveSessionState to HiveExternalCatalog

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17319:


Assignee: Apache Spark

> Move addJar from HiveSessionState to HiveExternalCatalog
> 
>
> Key: SPARK-17319
> URL: https://issues.apache.org/jira/browse/SPARK-17319
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> This is another step toward removing Hive client usage in `HiveSessionState`.
> Different sessions share the same class loader, so `metadataHive.addJar(path)`
> effectively loads the JARs for all sessions. Thus, nothing changes if we move
> `addJar` from `HiveSessionState` to `HiveExternalCatalog`.






[jira] [Created] (SPARK-17319) Move addJar from HiveSessionState to HiveExternalCatalog

2016-08-30 Thread Xiao Li (JIRA)
Xiao Li created SPARK-17319:
---

 Summary: Move addJar from HiveSessionState to HiveExternalCatalog
 Key: SPARK-17319
 URL: https://issues.apache.org/jira/browse/SPARK-17319
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.1.0
Reporter: Xiao Li


This is another step toward removing Hive client usage in `HiveSessionState`.

Different sessions share the same class loader, so `metadataHive.addJar(path)` 
effectively loads the JARs for all sessions. Thus, nothing changes if we move 
`addJar` from `HiveSessionState` to `HiveExternalCatalog`.






[jira] [Updated] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl

2016-08-30 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-17318:
-
Description: There are a lot of failures recently: 
http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl

> Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class 
> defined in repl
> 
>
> Key: SPARK-17318
> URL: https://issues.apache.org/jira/browse/SPARK-17318
> Project: Spark
>  Issue Type: Test
>Reporter: Shixiong Zhu
>
> There are a lot of failures recently: 
> http://spark-tests.appspot.com/tests/org.apache.spark.repl.ReplSuite/replicating%20blocks%20of%20object%20with%20class%20defined%20in%20repl






[jira] [Created] (SPARK-17318) Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl

2016-08-30 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-17318:


 Summary: Fix flaky test: o.a.s.repl.ReplSuite replicating blocks 
of object with class defined in repl
 Key: SPARK-17318
 URL: https://issues.apache.org/jira/browse/SPARK-17318
 Project: Spark
  Issue Type: Test
Reporter: Shixiong Zhu









[jira] [Resolved] (SPARK-16456) Reuse the uncorrelated scalar subqueries with the same logical plan in a query

2016-08-30 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-16456.
---
Resolution: Duplicate

> Reuse the uncorrelated scalar subqueries with the same logical plan in a query
> --
>
> Key: SPARK-16456
> URL: https://issues.apache.org/jira/browse/SPARK-16456
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Lianhui Wang
>
> In TPCDS Q14, the same physical plan for an uncorrelated scalar subquery from a 
> CTE can be executed multiple times; we should reuse the result to avoid the 
> duplicated computation.
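> A minimal illustration of the query shape (tables {{t}} and {{s}} are 
> hypothetical, not taken from Q14 itself):
> {code}
> // The same uncorrelated scalar subquery appears twice; its result could be
> // computed once and shared by both predicates.
> spark.sql("""
>   SELECT * FROM t
>   WHERE a > (SELECT avg(x) FROM s)
>      OR b > (SELECT avg(x) FROM s)
> """)
> {code}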



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16419) EnsureRequirements adds extra Sort to already sorted cached table

2016-08-30 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-16419.
---
Resolution: Duplicate

> EnsureRequirements adds extra Sort to already sorted cached table
> -
>
> Key: SPARK-16419
> URL: https://issues.apache.org/jira/browse/SPARK-16419
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Mitesh
>Priority: Minor
>
> EnsureRequirements compares the required and given sort orderings, but it uses 
> Scala equals instead of a semantic equals, so column capitalization isn't 
> considered, and the comparison also fails for a cached table. As a result, a 
> SortMergeJoin over an already-sorted cached table adds an extra sort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl

2016-08-30 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-17314.
--
   Resolution: Fixed
Fix Version/s: 2.1.0

> Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl
> 
>
> Key: SPARK-17314
> URL: https://issues.apache.org/jira/browse/SPARK-17314
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Minor
> Fix For: 2.1.0
>
>
> Netty will use its fast ThreadLocal implementation when a thread is a 
> FastThreadLocalThread. This patch just switches to Netty's 
> DefaultThreadFactory to trigger it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17304) TaskSetManager.abortIfCompletelyBlacklisted is a perf. hotspot in scheduler benchmark

2016-08-30 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-17304.

   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 14871
[https://github.com/apache/spark/pull/14871]

> TaskSetManager.abortIfCompletelyBlacklisted is a perf. hotspot in scheduler 
> benchmark
> -
>
> Key: SPARK-17304
> URL: https://issues.apache.org/jira/browse/SPARK-17304
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.1.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Minor
> Fix For: 2.1.0
>
>
> If you run
> {code}
> sc.parallelize(1 to 10, 10).map(identity).count()
> {code}
> then {{TaskSetManager.abortIfCompletelyBlacklisted()}} is the number-one 
> performance hotspot in the scheduler, accounting for over half of the time. 
> This method was introduced in SPARK-15865, so this is a performance 
> regression in 2.1.0-SNAPSHOT.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17317) Add package vignette to SparkR

2016-08-30 Thread Junyang Qian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450013#comment-15450013
 ] 

Junyang Qian commented on SPARK-17317:
--

WIP

> Add package vignette to SparkR
> --
>
> Key: SPARK-17317
> URL: https://issues.apache.org/jira/browse/SPARK-17317
> Project: Spark
>  Issue Type: Improvement
>Reporter: Junyang Qian
>
> In publishing SparkR to CRAN, it would be nice to have a vignette as a user 
> guide that
> * describes the big picture
> * introduces the use of various methods
> This is important for new users because they may not even know which method 
> to look up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17317) Add package vignette to SparkR

2016-08-30 Thread Junyang Qian (JIRA)
Junyang Qian created SPARK-17317:


 Summary: Add package vignette to SparkR
 Key: SPARK-17317
 URL: https://issues.apache.org/jira/browse/SPARK-17317
 Project: Spark
  Issue Type: Improvement
Reporter: Junyang Qian


In publishing SparkR to CRAN, it would be nice to have a vignette as a user 
guide that
* describes the big picture
* introduces the use of various methods

This is important for new users because they may not even know which method to 
look up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17316) Don't block StandaloneSchedulerBackend.executorRemoved

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17316:


Assignee: Apache Spark

> Don't block StandaloneSchedulerBackend.executorRemoved
> --
>
> Key: SPARK-17316
> URL: https://issues.apache.org/jira/browse/SPARK-17316
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>
> StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It 
> may cause some deadlock since it's called inside 
> StandaloneAppClient.ClientEndpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17316) Don't block StandaloneSchedulerBackend.executorRemoved

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17316:


Assignee: (was: Apache Spark)

> Don't block StandaloneSchedulerBackend.executorRemoved
> --
>
> Key: SPARK-17316
> URL: https://issues.apache.org/jira/browse/SPARK-17316
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Shixiong Zhu
>
> StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It 
> may cause some deadlock since it's called inside 
> StandaloneAppClient.ClientEndpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17316) Don't block StandaloneSchedulerBackend.executorRemoved

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449985#comment-15449985
 ] 

Apache Spark commented on SPARK-17316:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/14882

> Don't block StandaloneSchedulerBackend.executorRemoved
> --
>
> Key: SPARK-17316
> URL: https://issues.apache.org/jira/browse/SPARK-17316
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Shixiong Zhu
>
> StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It 
> may cause some deadlock since it's called inside 
> StandaloneAppClient.ClientEndpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17316) Don't block StandaloneSchedulerBackend.executorRemoved

2016-08-30 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-17316:


 Summary: Don't block StandaloneSchedulerBackend.executorRemoved
 Key: SPARK-17316
 URL: https://issues.apache.org/jira/browse/SPARK-17316
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.0.0
Reporter: Shixiong Zhu


StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It may 
cause some deadlock since it's called inside StandaloneAppClient.ClientEndpoint.
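
One illustrative way to avoid the blocking call (a sketch only, not necessarily 
the approach taken in the fix; the listener signature is from 2.0) is to hand 
the work off the RPC thread:
{code}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical: run the removal asynchronously so ClientEndpoint's message
// loop never blocks waiting on the driver endpoint.
override def executorRemoved(
    fullId: String, message: String, exitStatus: Option[Int]): Unit = {
  Future {
    removeExecutor(fullId.split("/")(1), SlaveLost(message))
  }
}
{code}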



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449889#comment-15449889
 ] 

Apache Spark commented on SPARK-17315:
--

User 'junyangq' has created a pull request for this issue:
https://github.com/apache/spark/pull/14881

> Add Kolmogorov-Smirnov Test to SparkR
> -
>
> Key: SPARK-17315
> URL: https://issues.apache.org/jira/browse/SPARK-17315
> Project: Spark
>  Issue Type: New Feature
>Reporter: Junyang Qian
>
> Kolmogorov-Smirnov Test is a popular nonparametric test of the equality of 
> distributions. There is an implementation in MLlib. It would be nice if we 
> could expose that in SparkR. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17315:


Assignee: (was: Apache Spark)

> Add Kolmogorov-Smirnov Test to SparkR
> -
>
> Key: SPARK-17315
> URL: https://issues.apache.org/jira/browse/SPARK-17315
> Project: Spark
>  Issue Type: New Feature
>Reporter: Junyang Qian
>
> Kolmogorov-Smirnov Test is a popular nonparametric test of the equality of 
> distributions. There is an implementation in MLlib. It would be nice if we 
> could expose that in SparkR. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17315:


Assignee: Apache Spark

> Add Kolmogorov-Smirnov Test to SparkR
> -
>
> Key: SPARK-17315
> URL: https://issues.apache.org/jira/browse/SPARK-17315
> Project: Spark
>  Issue Type: New Feature
>Reporter: Junyang Qian
>Assignee: Apache Spark
>
> Kolmogorov-Smirnov Test is a popular nonparametric test of the equality of 
> distributions. There is an implementation in MLlib. It would be nice if we 
> could expose that in SparkR. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17315) Add Kolmogorov-Smirnov Test to SparkR

2016-08-30 Thread Junyang Qian (JIRA)
Junyang Qian created SPARK-17315:


 Summary: Add Kolmogorov-Smirnov Test to SparkR
 Key: SPARK-17315
 URL: https://issues.apache.org/jira/browse/SPARK-17315
 Project: Spark
  Issue Type: New Feature
Reporter: Junyang Qian


Kolmogorov-Smirnov Test is a popular nonparametric test of the equality of 
distributions. There is an implementation in MLlib. It would be nice if we 
could expose that in SparkR. 
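
For reference, the existing .mllib API that a SparkR wrapper would call looks 
roughly like this in Scala (the sample data here is made up):
{code}
import org.apache.spark.mllib.stat.Statistics

val data = sc.parallelize(Seq(0.1, 0.15, -0.2, 0.3, 0.25))
// Test the sample against a standard normal distribution N(0, 1):
val result = Statistics.kolmogorovSmirnovTest(data, "norm", 0.0, 1.0)
println(result)  // prints the test statistic, p-value, and null hypothesis
{code}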



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17307) Document what all access is needed on S3 bucket when trying to save a model

2016-08-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449830#comment-15449830
 ] 

Steve Loughran commented on SPARK-17307:


I think this is a subset of SPARK-7481, where I am doing the docs:

https://github.com/steveloughran/spark/blob/f39018eee40ef463ebfdfb0f6a7ba6384b46c459/docs/cloud-integration.md

I haven't done the bit on authentication setup yet, though; I'm planning to 
point to the [Hadoop docs 
there|https://hadoop.apache.org/docs/stable2/hadoop-aws/tools/hadoop-aws/index.html],
 because as well as the details on how to configure the latest Hadoop s3x 
clients, it's got a troubleshooting section.

Looking at the code,

# It's dangerous to put AWS secrets in the source file; it's too easy to leak 
them. Stick them in your Spark configuration, prefixed with {{spark.hadoop}}.
# If you are using Hadoop 2.7+, please use s3a:// paths instead of s3n://. Your 
life will be better.
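
For example, a minimal sketch of the first point (the s3a property names are 
the standard Hadoop ones; the environment variable names are just placeholders):
{code}
import org.apache.spark.SparkConf

// Anything prefixed with "spark.hadoop." is copied into the Hadoop
// Configuration, so the secrets never appear in application source code.
val conf = new SparkConf()
  .set("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  .set("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
{code}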

Anyway, can you have a look at the cloud integration doc I've linked to and 
comment on the [pull request|https://github.com/apache/spark/pull/12004] where 
it could be improved? I'll do my best.


> Document what all access is needed on S3 bucket when trying to save a model
> ---
>
> Key: SPARK-17307
> URL: https://issues.apache.org/jira/browse/SPARK-17307
> Project: Spark
>  Issue Type: Documentation
>Reporter: Aseem Bansal
>Priority: Minor
>
> I faced this lack of documentation when I was trying to save a model to S3. 
> Initially I thought it should need only write access. Then I found it also 
> needs delete access to delete temporary files. I requested delete access and 
> tried again, and now I get the error
> Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: 
> org.jets3t.service.S3ServiceException: S3 PUT failed for 
> '/dev-qa_%24folder%24' XML Error Message
> To reproduce this error the below can be used
> {code}
> SparkSession sparkSession = SparkSession
>     .builder()
>     .appName("my app")
>     .master("local")
>     .getOrCreate();
> JavaSparkContext jsc = new JavaSparkContext(sparkSession.sparkContext());
> jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", <ACCESS KEY ID>);
> jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", <SECRET ACCESS KEY>);
> // Create a PipelineModel
> pipelineModel.write().overwrite().save("s3n:///dev-qa/modelTest");
> {code}
> This back and forth could be avoided if it was clearly mentioned what access 
> Spark needs in order to write to S3. It would also be great to explain why 
> each kind of access is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history

2016-08-30 Thread Alex Bozarth (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449790#comment-15449790
 ] 

Alex Bozarth commented on SPARK-17243:
--

[~ste...@apache.org] [~tgraves] The issues you mentioned are what I'm hoping to 
work on next month (as I mentioned above), once I'm given the bandwidth to do 
so. When that happens I'll file a JIRA and loop you two in to discuss 
implementation ideas. (Unless some brave soul decides to give it a try before 
then.)

> Spark 2.0 history server summary page gets stuck at "loading history summary" 
> with 10K+ application history
> ---
>
> Key: SPARK-17243
> URL: https://issues.apache.org/jira/browse/SPARK-17243
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
> Environment: Linux
>Reporter: Gang Wu
>
> The summary page of the Spark 2.0 history server web UI keeps displaying 
> "Loading history summary..." indefinitely and crashes the browser when there 
> are more than 10K application history event logs on HDFS. 
> I did some investigation: the "historypage.js" file sends a request to the 
> /api/v1/applications REST endpoint of the history server and gets back a JSON 
> response. When there are more than 10K applications inside the event log 
> directory it takes forever to parse them and render the page. With only 
> hundreds or thousands of application histories it runs fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17313) Support spark-shell on cluster mode

2016-08-30 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449684#comment-15449684
 ] 

Sean Owen commented on SPARK-17313:
---

The problem is: how are you going to interact with a shell on your local 
machine when the driver is somewhere else? It's not impossible, but it's not 
clear it's worthwhile. The driver is in general not doing much work, or 
shouldn't be; the shell is more for exploration than production.

> Support spark-shell on cluster mode
> ---
>
> Key: SPARK-17313
> URL: https://issues.apache.org/jira/browse/SPARK-17313
> Project: Spark
>  Issue Type: New Feature
>Reporter: Mahmoud Elgamal
>
> The main issue with the current spark-shell is that the driver runs on the 
> user's machine. If the driver's resource requirements exceed the user 
> machine's capacity, then spark-shell will be useless. If we add a cluster mode 
> (YARN or Mesos) for spark-shell via some sort of proxy, where the user machine 
> only hosts a REST client to the driver running in the cluster, the shell will 
> be much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17312) Support spark-shell on cluster mode

2016-08-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-17312.
---
Resolution: Duplicate

> Support spark-shell on cluster mode
> ---
>
> Key: SPARK-17312
> URL: https://issues.apache.org/jira/browse/SPARK-17312
> Project: Spark
>  Issue Type: New Feature
>Reporter: Mahmoud Elgamal
>
> The main issue with the current spark-shell is that the driver runs on the 
> user's machine. If the driver's resource requirements exceed the user 
> machine's capacity, then spark-shell will be useless. If we add a cluster mode 
> (YARN or Mesos) for spark-shell via some sort of proxy, where the user machine 
> only hosts a REST client to the driver running in the cluster, the shell will 
> be much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-17312) Support spark-shell on cluster mode

2016-08-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-17312:
---

> Support spark-shell on cluster mode
> ---
>
> Key: SPARK-17312
> URL: https://issues.apache.org/jira/browse/SPARK-17312
> Project: Spark
>  Issue Type: New Feature
>Reporter: Mahmoud Elgamal
>
> The main issue with the current spark-shell is that the driver runs on the 
> user's machine. If the driver's resource requirements exceed the user 
> machine's capacity, then spark-shell will be useless. If we add a cluster mode 
> (YARN or Mesos) for spark-shell via some sort of proxy, where the user machine 
> only hosts a REST client to the driver running in the cluster, the shell will 
> be much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17314:


Assignee: Shixiong Zhu  (was: Apache Spark)

> Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl
> 
>
> Key: SPARK-17314
> URL: https://issues.apache.org/jira/browse/SPARK-17314
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Minor
>
> Netty will use its fast ThreadLocal implementation when a thread is a 
> FastThreadLocalThread. This patch just switches to Netty's 
> DefaultThreadFactory to trigger it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17314:


Assignee: Apache Spark  (was: Shixiong Zhu)

> Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl
> 
>
> Key: SPARK-17314
> URL: https://issues.apache.org/jira/browse/SPARK-17314
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>Priority: Minor
>
> Netty will use its fast ThreadLocal implementation when a thread is a 
> FastThreadLocalThread. This patch just switches to Netty's 
> DefaultThreadFactory to trigger it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449670#comment-15449670
 ] 

Apache Spark commented on SPARK-17314:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/14879

> Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl
> 
>
> Key: SPARK-17314
> URL: https://issues.apache.org/jira/browse/SPARK-17314
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Minor
>
> Netty will use its fast ThreadLocal implementation when a thread is a 
> FastThreadLocalThread. This patch just switches to Netty's 
> DefaultThreadFactory to trigger it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17314) Use Netty's DefaultThreadFactory to enable its fast ThreadLocal impl

2016-08-30 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-17314:


 Summary: Use Netty's DefaultThreadFactory to enable its fast 
ThreadLocal impl
 Key: SPARK-17314
 URL: https://issues.apache.org/jira/browse/SPARK-17314
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu
Priority: Minor


Netty will use its fast ThreadLocal implementation when a thread is a 
FastThreadLocalThread. This patch just switches to Netty's DefaultThreadFactory 
to trigger it.
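
For illustration, the mechanism looks like this (pool name and size are made 
up):
{code}
import java.util.concurrent.Executors
import io.netty.util.concurrent.DefaultThreadFactory

// Threads created by DefaultThreadFactory are FastThreadLocalThread instances,
// which lets Netty's FastThreadLocal use its fast array-indexed lookup.
val factory = new DefaultThreadFactory("shuffle-client", true /* daemon */)
val pool = Executors.newFixedThreadPool(8, factory)
{code}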



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-17313) Support spark-shell on cluster mode

2016-08-30 Thread Mahmoud Elgamal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmoud Elgamal reopened SPARK-17313:
-

> Support spark-shell on cluster mode
> ---
>
> Key: SPARK-17313
> URL: https://issues.apache.org/jira/browse/SPARK-17313
> Project: Spark
>  Issue Type: New Feature
>Reporter: Mahmoud Elgamal
>
> The main issue with the current spark-shell is that the driver runs on the 
> user's machine. If the driver's resource requirements exceed the user 
> machine's capacity, then spark-shell will be useless. If we add a cluster mode 
> (YARN or Mesos) for spark-shell via some sort of proxy, where the user machine 
> only hosts a REST client to the driver running in the cluster, the shell will 
> be much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-17312) Support spark-shell on cluster mode

2016-08-30 Thread Mahmoud Elgamal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmoud Elgamal closed SPARK-17312.
---
Resolution: Fixed

> Support spark-shell on cluster mode
> ---
>
> Key: SPARK-17312
> URL: https://issues.apache.org/jira/browse/SPARK-17312
> Project: Spark
>  Issue Type: New Feature
>Reporter: Mahmoud Elgamal
>
> The main issue with the current spark-shell is that the driver runs on the 
> user's machine. If the driver's resource requirements exceed the user 
> machine's capacity, then spark-shell will be useless. If we add a cluster mode 
> (YARN or Mesos) for spark-shell via some sort of proxy, where the user machine 
> only hosts a REST client to the driver running in the cluster, the shell will 
> be much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17313) Support spark-shell on cluster mode

2016-08-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-17313.

Resolution: Duplicate

> Support spark-shell on cluster mode
> ---
>
> Key: SPARK-17313
> URL: https://issues.apache.org/jira/browse/SPARK-17313
> Project: Spark
>  Issue Type: New Feature
>Reporter: Mahmoud Elgamal
>
> The main issue with the current spark-shell is that the driver runs on the 
> user's machine. If the driver's resource requirements exceed the user 
> machine's capacity, then spark-shell will be useless. If we add a cluster mode 
> (YARN or Mesos) for spark-shell via some sort of proxy, where the user machine 
> only hosts a REST client to the driver running in the cluster, the shell will 
> be much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17312) Support spark-shell on cluster mode

2016-08-30 Thread Mahmoud Elgamal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahmoud Elgamal updated SPARK-17312:

Issue Type: New Feature  (was: Bug)

> Support spark-shell on cluster mode
> ---
>
> Key: SPARK-17312
> URL: https://issues.apache.org/jira/browse/SPARK-17312
> Project: Spark
>  Issue Type: New Feature
>Reporter: Mahmoud Elgamal
>
> The main issue with the current spark-shell is that the driver runs on the 
> user's machine. If the driver's resource requirements exceed the user 
> machine's capacity, then spark-shell will be useless. If we add a cluster mode 
> (YARN or Mesos) for spark-shell via some sort of proxy, where the user machine 
> only hosts a REST client to the driver running in the cluster, the shell will 
> be much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17312) Support spark-shell on cluster mode

2016-08-30 Thread Mahmoud Elgamal (JIRA)
Mahmoud Elgamal created SPARK-17312:
---

 Summary: Support spark-shell on cluster mode
 Key: SPARK-17312
 URL: https://issues.apache.org/jira/browse/SPARK-17312
 Project: Spark
  Issue Type: Bug
Reporter: Mahmoud Elgamal


The main issue with the current spark-shell is that the driver runs on the 
user's machine. If the driver's resource requirements exceed the user machine's 
capacity, then spark-shell will be useless. If we add a cluster mode (YARN or 
Mesos) for spark-shell via some sort of proxy, where the user machine only 
hosts a REST client to the driver running in the cluster, the shell will be 
much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17313) Support spark-shell on cluster mode

2016-08-30 Thread Mahmoud Elgamal (JIRA)
Mahmoud Elgamal created SPARK-17313:
---

 Summary: Support spark-shell on cluster mode
 Key: SPARK-17313
 URL: https://issues.apache.org/jira/browse/SPARK-17313
 Project: Spark
  Issue Type: New Feature
Reporter: Mahmoud Elgamal


The main issue with the current spark-shell is that the driver runs on the 
user's machine. If the driver's resource requirements exceed the user machine's 
capacity, then spark-shell will be useless. If we add a cluster mode (YARN or 
Mesos) for spark-shell via some sort of proxy, where the user machine only 
hosts a REST client to the driver running in the cluster, the shell will be 
much more powerful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17306) Memory leak in QuantileSummaries

2016-08-30 Thread Tim Hunter (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449512#comment-15449512
 ] 

Tim Hunter commented on SPARK-17306:


[~srowen] yes, I had a discussion yesterday with [~clockfly]. The issue is 
performance, not correctness, by the way. The fix is to add a call to the 
compression: the compression threshold should be used in insert() after 
inserting the head buffer, at this line:

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala#L66

Unless someone else wants to step in, I will be happy to fix this issue.
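
A rough sketch of the idea (names like {{withHeadBufferInserted}} and 
{{defaultHeadSize}} are assumptions based on the current class, not the actual 
patch):
{code}
// In QuantileSummaries.insert(): once the head buffer is flushed into
// `sampled`, honor compressThreshold instead of letting it grow unboundedly.
def insert(x: Double): QuantileSummaries = {
  headSampled.append(x)
  if (headSampled.size >= defaultHeadSize) {
    val result = this.withHeadBufferInserted
    if (result.sampled.length >= compressThreshold) result.compress()
    else result
  } else {
    this
  }
}
{code}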

> Memory leak in QuantileSummaries
> 
>
> Key: SPARK-17306
> URL: https://issues.apache.org/jira/browse/SPARK-17306
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>
> compressThreshold was not referenced anywhere
> {code}
> class QuantileSummaries(
> val compressThreshold: Int,
> val relativeError: Double,
> val sampled: ArrayBuffer[Stats] = ArrayBuffer.empty,
> private[stat] var count: Long = 0L,
> val headSampled: ArrayBuffer[Double] = ArrayBuffer.empty) extends 
> Serializable
> {code}
> And it causes a memory leak: QuantileSummaries takes unbounded memory
> {code}
> val summary = new QuantileSummaries(1, relativeError = 0.001)
> // Results in creating an array of size 1 !!! 
> (1 to 1).foreach(summary.insert(_))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16402) JDBC source: Implement save API

2016-08-30 Thread Dragisa Krsmanovic (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449509#comment-15449509
 ] 

Dragisa Krsmanovic commented on SPARK-16402:


Any progress on this?

> JDBC source: Implement save API
> ---
>
> Key: SPARK-16402
> URL: https://issues.apache.org/jira/browse/SPARK-16402
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Currently, we are unable to call the `save` API of `DataFrameWriter` when the 
> source is JDBC. For example, 
> {noformat}
> df.write
>   .format("jdbc")
>   .option("url", url1)
>   .option("dbtable", "TEST.TRUNCATETEST")
>   .option("user", "testUser")
>   .option("password", "testPass")
>   .save() 
> {noformat}
> The error message users will get is like
> {noformat}
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not 
> allow create table as select.
> java.lang.RuntimeException: 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not 
> allow create table as select.
> {noformat}
> However, the `save` API is very common for all the data sources, like parquet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6460) Implement OpensslAesCtrCryptoCodec to enable encrypted shuffle algorithms which openssl provides

2016-08-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-6460.
---
Resolution: Duplicate

This was covered by SPARK-5682.

> Implement OpensslAesCtrCryptoCodec to enable encrypted shuffle algorithms 
> which openssl provides
> 
>
> Key: SPARK-6460
> URL: https://issues.apache.org/jira/browse/SPARK-6460
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Reporter: liyunzhang_intel
>
> SPARK-5682 only implements the encrypted shuffle algorithm provided by JCE. 
> OpensslAesCtrCryptoCodec needs to implement the algorithm provided by OpenSSL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10771) Implement the shuffle encryption with AES-CTR crypto using JCE key provider.

2016-08-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-10771.

Resolution: Duplicate

This was covered by SPARK-5682.

> Implement the shuffle encryption with AES-CTR crypto using JCE key provider.
> 
>
> Key: SPARK-10771
> URL: https://issues.apache.org/jira/browse/SPARK-10771
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Reporter: Ferdinand Xu
>Priority: Minor
>
> We will use the credentials stored in user group information to 
> encrypt/decrypt shuffle data. We will use the JCE key provider to implement 
> AES-CTR crypto.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17299) TRIM/LTRIM/RTRIM strips characters other than spaces

2016-08-30 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449425#comment-15449425
 ] 

Dongjoon Hyun commented on SPARK-17299:
---

Hi, [~jbeard] and [~srowen].
For compatibility, it seems we had better fix this in SQL.
Could you make a PR for this, [~jbeard]?
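
For reference, the behavior is easy to see from spark-shell (a minimal 
illustration, not from the report itself):
{code}
import spark.implicits._
import org.apache.spark.sql.functions.{col, trim}

// The tab and newline are stripped along with the spaces, even though the
// docs only mention spaces:
Seq("\n\t abc ").toDF("s").select(trim(col("s"))).show()  // shows "abc"
{code}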

> TRIM/LTRIM/RTRIM strips characters other than spaces
> 
>
> Key: SPARK-17299
> URL: https://issues.apache.org/jira/browse/SPARK-17299
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, SQL
>Affects Versions: 2.0.0
>Reporter: Jeremy Beard
>Priority: Minor
>
> TRIM/LTRIM/RTRIM docs state that they only strip spaces:
> http://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/functions.html#trim(org.apache.spark.sql.Column)
> But the implementation strips all characters of ASCII value 20 or less:
> https://github.com/apache/spark/blob/v2.0.0/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java#L468-L470



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12333) Support shuffle spill encryption in Spark

2016-08-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-12333.

Resolution: Fixed

This should be covered by SPARK-5682, reopen if that's not correct.

> Support shuffle spill encryption in Spark
> -
>
> Key: SPARK-12333
> URL: https://issues.apache.org/jira/browse/SPARK-12333
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: Ferdinand Xu
>
> Like shuffle file encryption in SPARK-5682, spills data should also be 
> encrypted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5682) Add encrypted shuffle in spark

2016-08-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-5682.
---
   Resolution: Fixed
 Assignee: Ferdinand Xu
Fix Version/s: 2.1.0

> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
>Assignee: Ferdinand Xu
> Fix For: 2.1.0
>
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the process of 
> shuffling data safer. This feature is necessary in Spark. AES is a 
> specification for the encryption of electronic data, with five common modes; 
> CTR is one of them. We use two codecs, JceAesCtrCryptoCodec and 
> OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; these are also 
> used in Hadoop's encrypted shuffle. JceAesCtrCryptoCodec uses the encryption 
> algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses the 
> encryption algorithms OpenSSL provides. 
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7445) StringIndexer should handle binary labels properly

2016-08-30 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449390#comment-15449390
 ] 

Sean Owen commented on SPARK-7445:
--

I think that if you want a specific mapping from labels to integers then you 
can just transform it directly. It's easier than trying to make a new API for 
such a simple thing. 
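
A minimal sketch of that direct transformation (assuming a DataFrame {{df}} 
with a string column "label"; not an API proposal):
{code}
import org.apache.spark.sql.functions.{col, when}

// Map the binary labels explicitly instead of relying on StringIndexer's
// frequency-based ordering:
val indexed = df.withColumn("labelIndex",
  when(col("label") === "yes", 1.0).otherwise(0.0))
{code}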

> StringIndexer should handle binary labels properly
> --
>
> Key: SPARK-7445
> URL: https://issues.apache.org/jira/browse/SPARK-7445
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>Priority: Minor
>
> StringIndexer orders labels by their counts. However, for binary labels, we 
> should really map negatives to 0 and positives to 1. So we can put special 
> rules in place for binary labels:
> 1. "+1"/"-1", "1"/"-1", "1"/"0"
> 2. "yes"/"no"
> 3. "true"/"false"
> Another option is to allow users to provide a list of labels and we use that 
> ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7445) StringIndexer should handle binary labels properly

2016-08-30 Thread Ruben Janssen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449353#comment-15449353
 ] 

Ruben Janssen commented on SPARK-7445:
--

[~mengxr] Could you please update on this?

> StringIndexer should handle binary labels properly
> --
>
> Key: SPARK-7445
> URL: https://issues.apache.org/jira/browse/SPARK-7445
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>Priority: Minor
>
> StringIndexer orders labels by their counts. However, for binary labels, we 
> should really map negatives to 0 and positives to 1. So we can put special 
> rules in place for binary labels:
> 1. "+1"/"-1", "1"/"-1", "1"/"0"
> 2. "yes"/"no"
> 3. "true"/"false"
> Another option is to allow users to provide a list of labels and we use that 
> ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history

2016-08-30 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449250#comment-15449250
 ] 

Thomas Graves commented on SPARK-17243:
---

I agree, there are a ton of ways to improve the history server. I think these 
should be separate JIRAs though. Ideally it should be much faster to load all 
the apps and get the initial list very quickly, and then only load an entire 
application as a user requests it, or in the background to fill the cache. Like 
you mention, a summary file could be written once an application is loaded. 
Histories could also be stored differently so that basic data is in the 
directory or file path (like the MapReduce history server), etc. I just haven't 
had time to do this myself.

Right now this seems like a good workaround, and as I mention in the PR, 
spark.history.retainedApplications used to do this limiting of the display, but 
things have changed and I guess it broke/wasn't updated.

> Spark 2.0 history server summary page gets stuck at "loading history summary" 
> with 10K+ application history
> ---
>
> Key: SPARK-17243
> URL: https://issues.apache.org/jira/browse/SPARK-17243
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
> Environment: Linux
>Reporter: Gang Wu
>
> The summary page of the Spark 2.0 history server web UI keeps displaying 
> "Loading history summary..." indefinitely and crashes the browser when there 
> are more than 10K application history event logs on HDFS. 
> I did some investigation: the "historypage.js" file sends a request to the 
> /api/v1/applications REST endpoint of the history server and gets back a JSON 
> response. When there are more than 10K applications inside the event log 
> directory it takes forever to parse them and render the page. With only 
> hundreds or thousands of application histories it runs fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17243) Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history

2016-08-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448732#comment-15448732
 ] 

Steve Loughran commented on SPARK-17243:


One thing to consider here is whether there are any ways to improve incremental 
loading of histories: start at the most recent and work backwards.

There's also the fact that the entire history is loaded just to get the final 
summary info (success/failure). Once parsed, this could just be saved in a 
summary file alongside the original. That'd reduce load time from O(files * 
events) to O(files).

> Spark 2.0 history server summary page gets stuck at "loading history summary" 
> with 10K+ application history
> ---
>
> Key: SPARK-17243
> URL: https://issues.apache.org/jira/browse/SPARK-17243
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
> Environment: Linux
>Reporter: Gang Wu
>
> The summary page of the Spark 2.0 history server web UI keeps displaying 
> "Loading history summary..." indefinitely and crashes the browser when there 
> are more than 10K application history event logs on HDFS. 
> I did some investigation: the "historypage.js" file sends a request to the 
> /api/v1/applications REST endpoint of the history server and gets back a JSON 
> response. When there are more than 10K applications inside the event log 
> directory it takes forever to parse them and render the page. With only 
> hundreds or thousands of application histories it runs fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17311) Standardize Python-Java MLlib API to accept optional long seeds in all cases

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17311:


Assignee: Apache Spark  (was: Sean Owen)

> Standardize Python-Java MLlib API to accept optional long seeds in all cases
> 
>
> Key: SPARK-17311
> URL: https://issues.apache.org/jira/browse/SPARK-17311
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Assignee: Apache Spark
>Priority: Minor
>
> (Note this follows on https://issues.apache.org/jira/browse/SPARK-16832 )
> There are a few seed-related issues in the Pyspark-MLLib bridge:
> - {{PythonMLlibAPI}} methods that take a seed don't always take a 
> {{java.lang.Long}}; taking a nullable Long consistently would allow the Python 
> API to specify "no seed"
> - .mllib's {{Word2VecModel}} seems to be the odd man out in .mllib in that it 
> picks its own random seed. Instead it should default to None, meaning the 
> Scala implementation picks a seed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17311) Standardize Python-Java MLlib API to accept optional long seeds in all cases

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448700#comment-15448700
 ] 

Apache Spark commented on SPARK-17311:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/14826

> Standardize Python-Java MLlib API to accept optional long seeds in all cases
> 
>
> Key: SPARK-17311
> URL: https://issues.apache.org/jira/browse/SPARK-17311
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> (Note this follows on https://issues.apache.org/jira/browse/SPARK-16832 )
> There are a few seed-related issues in the Pyspark-MLLib bridge:
> - {{PythonMLlibAPI}} methods that take a seed don't always take a 
> {{java.lang.Long}}; taking a nullable Long consistently would allow the Python 
> API to specify "no seed"
> - .mllib's {{Word2VecModel}} seems to be the odd man out in .mllib in that it 
> picks its own random seed. Instead it should default to None, meaning the 
> Scala implementation picks a seed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17311) Standardize Python-Java MLlib API to accept optional long seeds in all cases

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17311:


Assignee: Sean Owen  (was: Apache Spark)

> Standardize Python-Java MLlib API to accept optional long seeds in all cases
> 
>
> Key: SPARK-17311
> URL: https://issues.apache.org/jira/browse/SPARK-17311
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, PySpark
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> (Note this follows on https://issues.apache.org/jira/browse/SPARK-16832 )
> There are a few seed-related issues in the Pyspark-MLLib bridge:
> - {{PythonMLlibAPI}} methods that take a seed don't always take a 
> {{java.lang.Long}}; taking a nullable Long consistently would allow the Python 
> API to specify "no seed"
> - .mllib's {{Word2VecModel}} seems to be the odd man out in .mllib in that it 
> picks its own random seed. Instead it should default to None, meaning the 
> Scala implementation picks a seed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16832) CrossValidator and TrainValidationSplit are not random without seed

2016-08-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16832.
---
Resolution: Won't Fix

Per [~mengxr] I think this is WontFix, and the least confusing thing to do for 
my follow-on change is to make a new JIRA: 
https://issues.apache.org/jira/browse/SPARK-17311


> CrossValidator and TrainValidationSplit are not random without seed
> ---
>
> Key: SPARK-16832
> URL: https://issues.apache.org/jira/browse/SPARK-16832
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Priority: Minor
>
> Repeatedly running CrossValidator or TrainValidationSplit without an explicit 
> seed parameter does not change results. It is supposed to be seeded with a 
> random seed, but it seems to be instead seeded with some constant. (If a seed 
> is explicitly provided, the two classes behave as expected.)
> {code}
> dataset = spark.createDataFrame(
>     [(Vectors.dense([0.0]), 0.0),
>      (Vectors.dense([0.4]), 1.0),
>      (Vectors.dense([0.5]), 0.0),
>      (Vectors.dense([0.6]), 1.0),
>      (Vectors.dense([1.0]), 1.0)] * 1000,
>     ["features", "label"]).cache()
> 
> paramGrid = pyspark.ml.tuning.ParamGridBuilder().build()
> tvs = pyspark.ml.tuning.TrainValidationSplit(
>     estimator=pyspark.ml.regression.LinearRegression(),
>     estimatorParamMaps=paramGrid,
>     evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>     trainRatio=0.8)
> model = tvs.fit(dataset)
> print(model.validationMetrics)
> 
> for folds in (3, 5, 10):
>     cv = pyspark.ml.tuning.CrossValidator(
>         estimator=pyspark.ml.regression.LinearRegression(),
>         estimatorParamMaps=paramGrid,
>         evaluator=pyspark.ml.evaluation.RegressionEvaluator(),
>         numFolds=folds)
>     cvModel = cv.fit(dataset)
>     print(folds, cvModel.avgMetrics)
> {code}
> This code produces identical results upon repeated calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17311) Standardize Python-Java MLlib API to accept optional long seeds in all cases

2016-08-30 Thread Sean Owen (JIRA)
Sean Owen created SPARK-17311:
-

 Summary: Standardize Python-Java MLlib API to accept optional long 
seeds in all cases
 Key: SPARK-17311
 URL: https://issues.apache.org/jira/browse/SPARK-17311
 Project: Spark
  Issue Type: Bug
  Components: MLlib, PySpark
Affects Versions: 2.0.0
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Minor


(Note this follows on https://issues.apache.org/jira/browse/SPARK-16832 )

There are a few seed-related issues in the Pyspark-MLLib bridge:

- {{PythonMLlibAPI}} methods that take a seed don't always take a 
{{java.lang.Long}}; taking a nullable Long consistently would allow the Python 
API to specify "no seed"
- .mllib's {{Word2VecModel}} seems to be the odd man out in .mllib in that it 
picks its own random seed. Instead it should default to None, meaning the Scala 
implementation picks a seed
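
A sketch of the convention being standardized (the helper below is 
hypothetical, not from the actual patch):
{code}
// Take a nullable java.lang.Long so the Python side can pass None to mean
// "no seed", and let the JVM pick a random seed in that case:
def resolveSeed(seed: java.lang.Long): Long =
  if (seed != null) seed.longValue() else scala.util.Random.nextLong()
{code}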



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17310) Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side

2016-08-30 Thread Andrew Duffy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448672#comment-15448672
 ] 

Andrew Duffy commented on SPARK-17310:
--

+1 to this, see the comments on https://github.com/apache/spark/pull/14671, 
particularly rdblue's comment. We need to wait for the next release of Parquet 
to be able to set the {{parquet.filter.record-level.enabled}} config.
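
For context, a Spark-side switch for Parquet push-down already exists, so a 
benchmark could compare the two filtering paths today (illustrative snippet, 
not from the PR):
{code}
// Disable Parquet filter push-down entirely and rely on Spark's own
// codegen'd row-by-row filter:
spark.conf.set("spark.sql.parquet.filterPushdown", "false")
{code}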

> Disable Parquet's record-by-record filter in normal parquet reader and do it 
> in Spark-side
> --
>
> Key: SPARK-17310
> URL: https://issues.apache.org/jira/browse/SPARK-17310
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> Currently, we push filters down to the normal Parquet reader, which then also 
> filters record-by-record.
> It seems Spark-side codegen row-by-row filtering might generally be faster 
> than Parquet's, due to the type-boxing and virtual function calls that Spark's 
> implementation tries to avoid.
> Maybe we should perform a benchmark and disable this. This ticket came from 
> https://github.com/apache/spark/pull/14671
> Please refer to the discussion in the PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17308) Replace all pattern match on boolean value by if/else block.

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17308:


Assignee: (was: Apache Spark)

> Replace all pattern match on boolean value by if/else block.
> 
>
> Key: SPARK-17308
> URL: https://issues.apache.org/jira/browse/SPARK-17308
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Shivansh
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17308) Replace all pattern match on boolean value by if/else block.

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17308:


Assignee: Apache Spark

> Replace all pattern match on boolean value by if/else block.
> 
>
> Key: SPARK-17308
> URL: https://issues.apache.org/jira/browse/SPARK-17308
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Shivansh
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17308) Replace all pattern match on boolean value by if/else block.

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448671#comment-15448671
 ] 

Apache Spark commented on SPARK-17308:
--

User 'shiv4nsh' has created a pull request for this issue:
https://github.com/apache/spark/pull/14873
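
For reference, the kind of refactoring proposed here, as an illustrative 
sketch (not taken from the PR):

{code}
// Before: a pattern match on a Boolean value
def describe(enabled: Boolean): String = enabled match {
  case true  => "enabled"
  case false => "disabled"
}

// After: a plain if/else block, which reads more directly
def describeIfElse(enabled: Boolean): String =
  if (enabled) "enabled" else "disabled"
{code}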

> Replace all pattern match on boolean value by if/else block.
> 
>
> Key: SPARK-17308
> URL: https://issues.apache.org/jira/browse/SPARK-17308
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Shivansh
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17264) DataStreamWriter should document that it only supports Parquet for now

2016-08-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-17264.
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1

Issue resolved by pull request 14860
[https://github.com/apache/spark/pull/14860]

> DataStreamWriter should document that it only supports Parquet for now
> --
>
> Key: SPARK-17264
> URL: https://issues.apache.org/jira/browse/SPARK-17264
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Input/Output
>Affects Versions: 2.0.0
> Environment: Mac OSX
>Reporter: Bill Reed
>Assignee: Sean Owen
>Priority: Trivial
> Fix For: 2.0.1, 2.1.0
>
>
> The API documentation for DataStreamWriter.format states "Specifies the 
> underlying output data source. Built-in options include "parquet", "json", 
> etc.", but when specifying "json" or "text" for the format, the following 
> exception is thrown:
> Exception in thread "main" java.lang.UnsupportedOperationException: Data 
> source json does not support streamed writing
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:273)
>   at 
> org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:291)
> The only format that works is .format("parquet"), as in the sketch below.
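> A minimal repro sketch (assuming {{streamingDF}} is any streaming Dataset):
> {code}
> // In 2.0.0, parquet is the only built-in format that works here;
> // "json" or "text" throw UnsupportedOperationException at start().
> val query = streamingDF.writeStream
>   .format("parquet")
>   .option("path", "/tmp/stream-out")
>   .option("checkpointLocation", "/tmp/stream-ckpt")
>   .start()
> {code}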



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17276) Stop environment parameters flooding Jenkins build output

2016-08-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-17276:
--
Assignee: Xin Ren

> Stop environment parameters flooding Jenkins build output
> -
>
> Key: SPARK-17276
> URL: https://issues.apache.org/jira/browse/SPARK-17276
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Tests
>Affects Versions: 2.0.0
>Reporter: Xin Ren
>Assignee: Xin Ren
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: Screen Shot 2016-08-26 at 10.52.07 PM.png
>
>
> When trying to find the error message in a failed Jenkins build job, I was 
> annoyed by the huge env output. 
> The env parameter output should be muted.
> {code}
> [info] PipedRDDSuite:
> [info] - basic pipe (51 milliseconds)
>   0   0   0
> [info] - basic pipe with tokenization (60 milliseconds)
> [info] - failure in iterating over pipe input (49 milliseconds)
> [info] - advanced pipe (100 milliseconds)
> [info] - pipe with empty partition (117 milliseconds)
> PATH=/home/anaconda/envs/py3k/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.3.9/bin/:/usr/java/jdk1.8.0_60/bin:/home/jenkins/tools/hudson.model.JDK/JDK_7u60/bin:/home/jenkins/.cargo/bin:/home/anaconda/bin:/home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.1.1/bin/:/home/android-sdk/:/usr/local/bin:/bin:/usr/bin:/home/anaconda/envs/py3k/bin
> BUILD_CAUSE_GHPRBCAUSE=true
> SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl 
> -Phive-thriftserver
> HUDSON_HOME=/var/lib/jenkins
> AWS_SECRET_ACCESS_KEY=
> JOB_URL=https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/
> HUDSON_COOKIE=638da3d2-d27a-4724-b41a-5ff6e8ce6752
> LINES=24
> CURRENT_BLOCK=18
> ANDROID_HOME=/home/android-sdk/
> ghprbActualCommit=70a751c6959048e65c083ab775b01523da4578a2
> ghprbSourceBranch=codeWalkThroughML
> GITHUB_OAUTH_KEY=
> MAIL=/var/mail/jenkins
> AMPLAB_JENKINS=1
> JENKINS_SERVER_COOKIE=472906e9832aeb79
> ghprbPullTitle=[MINOR][MLlib][SQL] Clean up unused variables and unused import
> LOGNAME=jenkins
> PWD=/home/jenkins/workspace/SparkPullRequestBuilder
> JENKINS_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_VERSIONS_SUITE_IVY_PATH=/home/sparkivy/per-executor-caches/9/.ivy2
> ROOT_BUILD_CAUSE_GHPRBCAUSE=true
> ghprbActualCommitAuthorEmail=iamsh...@126.com
> ghprbTargetBranch=master
> BUILD_TAG=jenkins-SparkPullRequestBuilder-64504
> SHELL=/bin/bash
> ROOT_BUILD_CAUSE=GHPRBCAUSE
> SBT_OPTS=-Duser.home=/home/sparkivy/per-executor-caches/9 
> -Dsbt.ivy.home=/home/sparkivy/per-executor-caches/9/.ivy2
> JENKINS_HOME=/var/lib/jenkins
> sha1=origin/pr/14836/merge
> ghprbPullDescription=GitHub pull request #14836 of commit 
> 70a751c6959048e65c083ab775b01523da4578a2 automatically merged.
> NODE_NAME=amp-jenkins-worker-02
> BUILD_DISPLAY_NAME=#64504
> JAVA_7_HOME=/usr/java/jdk1.7.0_79
> GIT_BRANCH=codeWalkThroughML
> SHLVL=3
> AMP_JENKINS_PRB=true
> JAVA_HOME=/usr/java/jdk1.8.0_60
> JENKINS_MASTER_HOSTNAME=amp-jenkins-master
> BUILD_ID=64504
> XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
> ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/14836
> JOB_NAME=SparkPullRequestBuilder
> BUILD_CAUSE=GHPRBCAUSE
> SPARK_SCALA_VERSION=2.11
> AWS_ACCESS_KEY_ID=
> NODE_LABELS=amp-jenkins-worker-02 centos spark-compile spark-test
> HUDSON_URL=https://amplab.cs.berkeley.edu/jenkins/
> SPARK_PREPEND_CLASSES=1
> COLUMNS=80
> WORKSPACE=/home/jenkins/workspace/SparkPullRequestBuilder
> SPARK_TESTING=1
> _=/usr/java/jdk1.8.0_60/bin/java
> GIT_COMMIT=b31b82bcc9d8767561ee720c9e7192252f4fd3fc
> ghprbPullId=14836
> EXECUTOR_NUMBER=9
> SSH_CLIENT=192.168.10.10 44762 22
> HUDSON_SERVER_COOKIE=472906e9832aeb79
> cat: nonexistent_file: No such file or directory
> cat: nonexistent_file: No such file or directory
> {code}

[jira] [Created] (SPARK-17310) Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side

2016-08-30 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-17310:


 Summary: Disable Parquet's record-by-record filter in normal 
parquet reader and do it in Spark-side
 Key: SPARK-17310
 URL: https://issues.apache.org/jira/browse/SPARK-17310
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Hyukjin Kwon


Currently, we push filters down to the normal Parquet reader, which also 
filters record-by-record.

It seems Spark-side codegen row-by-row filtering might be faster than 
Parquet's in general, due to the type-boxing and virtual function calls that 
Spark's implementation tries to avoid.

Maybe we should run a benchmark and then disable this. This ticket came from 
https://github.com/apache/spark/pull/14671

Please refer to the discussion in the PR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17309) ALTER VIEW should throw exception if view not exist

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17309:


Assignee: Apache Spark  (was: Wenchen Fan)

> ALTER VIEW should throw exception if view not exist
> ---
>
> Key: SPARK-17309
> URL: https://issues.apache.org/jira/browse/SPARK-17309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17309) ALTER VIEW should throw exception if view not exist

2016-08-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17309:


Assignee: Wenchen Fan  (was: Apache Spark)

> ALTER VIEW should throw exception if view not exist
> ---
>
> Key: SPARK-17309
> URL: https://issues.apache.org/jira/browse/SPARK-17309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17309) ALTER VIEW should throw exception if view not exist

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448588#comment-15448588
 ] 

Apache Spark commented on SPARK-17309:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/14874
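
An illustrative sketch of the expected behavior (hypothetical view name, 
assuming a {{SparkSession}} named {{spark}}):

{code}
// With no view named `v_missing`, this should throw an exception
// rather than silently doing nothing.
spark.sql("ALTER VIEW v_missing AS SELECT 1 AS id")
{code}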

> ALTER VIEW should throw exception if view not exist
> ---
>
> Key: SPARK-17309
> URL: https://issues.apache.org/jira/browse/SPARK-17309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17180) Unable to Alter the Temporary View Using ALTER VIEW command

2016-08-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15448587#comment-15448587
 ] 

Apache Spark commented on SPARK-17180:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/14874

> Unable to Alter the Temporary View Using ALTER VIEW command
> ---
>
> Key: SPARK-17180
> URL: https://issues.apache.org/jira/browse/SPARK-17180
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> In the current master branch, when users do not specify the database name in 
> the `ALTER VIEW AS SELECT` command, we always try to alter the permanent view 
> even if a temporary view of the same name exists. 
> The expected behavior of `ALTER VIEW AS SELECT` should be: alter the 
> temporary view if it exists; otherwise, try to alter the permanent view. This 
> order is consistent with the `DROP VIEW` command, since users are unable to 
> specify the keyword TEMPORARY. A sketch of the expected resolution order 
> follows.
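> A minimal sketch of the expected resolution order (hypothetical names, 
> assuming a {{SparkSession}} named {{spark}}):
> {code}
> spark.sql("CREATE VIEW v AS SELECT 1 AS id")            // permanent view
> spark.sql("CREATE TEMPORARY VIEW v AS SELECT 2 AS id")  // temp view shadows it
> spark.sql("ALTER VIEW v AS SELECT 3 AS id")             // expected: alter the
>                                                         // temporary view first
> {code}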



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


