[jira] [Updated] (SPARK-21031) Add `alterTableStats` to store spark's stats and let `alterTable` keep existing stats

2017-06-09 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21031: - Summary: Add `alterTableStats` to store spark's stats and let `alterTable` keep existing stats

[jira] [Updated] (SPARK-21031) Clearly separate hive stats and spark stats in catalog

2017-06-09 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21031: - Description: Currently, hive's stats are read into `CatalogStatistics`, while spark's stats

[jira] [Assigned] (SPARK-20953) Add hash map metrics to aggregate and join

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20953: Assignee: (was: Apache Spark) > Add hash map metrics to aggregate and join >

[jira] [Commented] (SPARK-20953) Add hash map metrics to aggregate and join

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045382#comment-16045382 ] Apache Spark commented on SPARK-20953: -- User 'viirya' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20953) Add hash map metrics to aggregate and join

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20953: Assignee: Apache Spark > Add hash map metrics to aggregate and join >

[jira] [Commented] (SPARK-12009) Avoid re-allocate yarn container while driver want to stop all Executors

2017-06-09 Thread lida (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045364#comment-16045364 ] lida commented on SPARK-12009: -- AsynchronousListenerBus class def post(event: E) { if (stopped.get) {

[jira] [Commented] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics

2017-06-09 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045356#comment-16045356 ] Wenchen Fan commented on SPARK-17642: - yea let's revisit > support DESC FORMATTED TABLE COLUMN

[jira] [Updated] (SPARK-21044) Add `RemoveInvalidRange` optimizer

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-21044: -- Description: This issue aims to add an optimizer to remove invalid `Range` operators from the

[jira] [Assigned] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21041: Assignee: (was: Apache Spark) > With whole-stage codegen, SparkSession.range()'s

[jira] [Assigned] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21041: Assignee: Apache Spark > With whole-stage codegen, SparkSession.range()'s behavior is

[jira] [Assigned] (SPARK-21044) Add `RemoveInvalidRange` optimizer

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21044: Assignee: Apache Spark > Add `RemoveInvalidRange` optimizer >

[jira] [Commented] (SPARK-21044) Add `RemoveInvalidRange` optimizer

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045322#comment-16045322 ] Apache Spark commented on SPARK-21044: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Assigned] (SPARK-21044) Add `RemoveInvalidRange` optimizer

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21044: Assignee: (was: Apache Spark) > Add `RemoveInvalidRange` optimizer >

[jira] [Commented] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045323#comment-16045323 ] Apache Spark commented on SPARK-21041: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Commented] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045320#comment-16045320 ] Dongjoon Hyun commented on SPARK-21041: --- Although this is a correctness issue, this is not a

[jira] [Updated] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-21041: -- Affects Version/s: 2.0.2 > With whole-stage codegen, SparkSession.range()'s behavior is

[jira] [Updated] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-21041: -- Affects Version/s: 2.1.1 > With whole-stage codegen, SparkSession.range()'s behavior is

[jira] [Updated] (SPARK-21044) Add `RemoveInvalidRange` optimizer

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-21044: -- Description: This issue aims to add an optimizer to remove invalid `Range` operators from the

[jira] [Created] (SPARK-21044) Add `RemoveInvalidRange` optimizer

2017-06-09 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-21044: - Summary: Add `RemoveInvalidRange` optimizer Key: SPARK-21044 URL: https://issues.apache.org/jira/browse/SPARK-21044 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-21043) Add unionByName API to Dataset

2017-06-09 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-21043: Description: It would be useful to add unionByName which resolves columns by name, in addition to

[jira] [Created] (SPARK-21043) Add unionByName API to Dataset

2017-06-09 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-21043: --- Summary: Add unionByName API to Dataset Key: SPARK-21043 URL: https://issues.apache.org/jira/browse/SPARK-21043 Project: Spark Issue Type: New Feature

[jira] [Resolved] (SPARK-21042) Document Dataset.union is resolution by position, not name

2017-06-09 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-21042. - Resolution: Fixed Fix Version/s: 2.3.0 2.2.1 > Document Dataset.union

[jira] [Commented] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045304#comment-16045304 ] Dongjoon Hyun commented on SPARK-21041: --- I'll make a PR for this. > With whole-stage codegen,

[jira] [Commented] (SPARK-19360) Spark 2.X does not support stored by clause

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045299#comment-16045299 ] Dongjoon Hyun commented on SPARK-19360: --- Sorry, I'm not sure for this, too. > Spark 2.X does not

[jira] [Commented] (SPARK-19360) Spark 2.X does not support stored by clause

2017-06-09 Thread Dayou Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045291#comment-16045291 ] Dayou Zhou commented on SPARK-19360: [~dongjoon] is it possible to get a fix for this? Thanks. >

[jira] [Updated] (SPARK-21028) Parallel One vs. Rest Classifier Scala

2017-06-09 Thread Ajay Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Saini updated SPARK-21028: --- Description: Right now the Scala implementation of the one vs. rest algorithm allows for parallelism

[jira] [Commented] (SPARK-21027) Parallel One vs. Rest Classifier

2017-06-09 Thread Ajay Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045245#comment-16045245 ] Ajay Saini commented on SPARK-21027: Yup! Modified the Scala ticket to indicate the necessity of

[jira] [Commented] (SPARK-21027) Parallel One vs. Rest Classifier

2017-06-09 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045228#comment-16045228 ] Joseph K. Bradley commented on SPARK-21027: --- One other comment: The Scala implementation can be

[jira] [Commented] (SPARK-21042) Document Dataset.union is resolution by position, not name

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045202#comment-16045202 ] Apache Spark commented on SPARK-21042: -- User 'rxin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-21042) Document Dataset.union is resolution by position, not name

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21042: Assignee: Reynold Xin (was: Apache Spark) > Document Dataset.union is resolution by

[jira] [Assigned] (SPARK-21042) Document Dataset.union is resolution by position, not name

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21042: Assignee: Apache Spark (was: Reynold Xin) > Document Dataset.union is resolution by

[jira] [Created] (SPARK-21042) Document Dataset.union is resolution by position, not name

2017-06-09 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-21042: --- Summary: Document Dataset.union is resolution by position, not name Key: SPARK-21042 URL: https://issues.apache.org/jira/browse/SPARK-21042 Project: Spark

[jira] [Comment Edited] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045180#comment-16045180 ] Dongjoon Hyun edited comment on SPARK-21041 at 6/9/17 11:02 PM: Yep. It

[jira] [Commented] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045180#comment-16045180 ] Dongjoon Hyun commented on SPARK-21041: --- Yep. It looks like the generated code has a logical bug on

[jira] [Commented] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics

2017-06-09 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045174#comment-16045174 ] Zhenhua Wang commented on SPARK-17642: -- [~smilegator] [~cloud_fan] Shall we revisit the pr and see

[jira] [Commented] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045146#comment-16045146 ] Dongjoon Hyun commented on SPARK-21041: --- Oh, interesting. I'll take a look. [~rednaxelafx]. > With

[jira] [Updated] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Kris Mok (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kris Mok updated SPARK-21041: - Description: When whole-stage codegen is enabled, in face of integer overflow, SparkSession.range()'s

[jira] [Created] (SPARK-21041) With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range()

2017-06-09 Thread Kris Mok (JIRA)
Kris Mok created SPARK-21041: Summary: With whole-stage codegen, SparkSession.range()'s behavior is inconsistent with SparkContext.range() Key: SPARK-21041 URL: https://issues.apache.org/jira/browse/SPARK-21041

[jira] [Commented] (SPARK-21038) Reduce redundant generated init code in Catalyst codegen

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044927#comment-16044927 ] Apache Spark commented on SPARK-21038: -- User 'rednaxelafx' has created a pull request for this

[jira] [Commented] (SPARK-19552) Upgrade Netty version to 4.1.8 final

2017-06-09 Thread Charles Allen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044925#comment-16044925 ] Charles Allen commented on SPARK-19552: --- This is starting to show problems on our side due to

[jira] [Assigned] (SPARK-21038) Reduce redundant generated init code in Catalyst codegen

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21038: Assignee: Apache Spark > Reduce redundant generated init code in Catalyst codegen >

[jira] [Assigned] (SPARK-21038) Reduce redundant generated init code in Catalyst codegen

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21038: Assignee: (was: Apache Spark) > Reduce redundant generated init code in Catalyst

[jira] [Created] (SPARK-21040) On executor/worker decommission consider speculatively re-launching current tasks

2017-06-09 Thread holdenk (JIRA)
holdenk created SPARK-21040: --- Summary: On executor/worker decommission consider speculatively re-launching current tasks Key: SPARK-21040 URL: https://issues.apache.org/jira/browse/SPARK-21040 Project:

[jira] [Created] (SPARK-21039) Use treeAggregate instead of aggregate in DataFrame.stat.bloomFilter

2017-06-09 Thread Lovasoa (JIRA)
Lovasoa created SPARK-21039: --- Summary: Use treeAggregate instead of aggregate in DataFrame.stat.bloomFilter Key: SPARK-21039 URL: https://issues.apache.org/jira/browse/SPARK-21039 Project: Spark

[jira] [Commented] (SPARK-19552) Upgrade Netty version to 4.1.8 final

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044867#comment-16044867 ] Sean Owen commented on SPARK-19552: --- [~rabbitonweb] [~leventov] go ahead and make the changes to make

[jira] [Commented] (SPARK-21027) Parallel One vs. Rest Classifier

2017-06-09 Thread Ajay Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044863#comment-16044863 ] Ajay Saini commented on SPARK-21027: Modified the description to include a description of the

[jira] [Commented] (SPARK-19552) Upgrade Netty version to 4.1.8 final

2017-06-09 Thread Roman Leventov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044862#comment-16044862 ] Roman Leventov commented on SPARK-19552: [~rabbitonweb] +1, this is a very annoying

[jira] [Updated] (SPARK-21027) Parallel One vs. Rest Classifier

2017-06-09 Thread Ajay Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Saini updated SPARK-21027: --- Description: Currently, the Scala implementation of OneVsRest allows the user to run a parallel

[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044827#comment-16044827 ] Apache Spark commented on SPARK-18838: -- User 'bOOm-X' has created a pull request for this issue:

[jira] [Commented] (SPARK-21038) Reduce redundant generated init code in Catalyst codegen

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044810#comment-16044810 ] Sean Owen commented on SPARK-21038: --- I double-checked, and yes you're right that initializing a field

[jira] [Commented] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics

2017-06-09 Thread Maria (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044798#comment-16044798 ] Maria commented on SPARK-17642: --- @gatorsmile, I'm evaluating CBO in Spark and I'm missing this feature.

[jira] [Created] (SPARK-21038) Reduce redundant generated init code in Catalyst codegen

2017-06-09 Thread Kris Mok (JIRA)
Kris Mok created SPARK-21038: Summary: Reduce redundant generated init code in Catalyst codegen Key: SPARK-21038 URL: https://issues.apache.org/jira/browse/SPARK-21038 Project: Spark Issue Type:

[jira] [Created] (SPARK-21037) ignoreNulls does not working properly with window functions

2017-06-09 Thread Stanislav Chernichkin (JIRA)
Stanislav Chernichkin created SPARK-21037: - Summary: ignoreNulls does not working properly with window functions Key: SPARK-21037 URL: https://issues.apache.org/jira/browse/SPARK-21037

[jira] [Resolved] (SPARK-20987) columns with name having dots caused issues with VectorAssemblor

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20987. --- Resolution: Duplicate > columns with name having dots caused issues with VectorAssemblor >

[jira] [Resolved] (SPARK-20662) Block jobs that have greater than a configured number of tasks

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20662. --- Resolution: Won't Fix > Block jobs that have greater than a configured number of tasks >

[jira] [Updated] (SPARK-21029) All StreamingQuery should be stopped when the SparkSession is stopped

2017-06-09 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21029: - Issue Type: Improvement (was: Bug) > All StreamingQuery should be stopped when the SparkSession

[jira] [Resolved] (SPARK-20918) Use FunctionIdentifier as function identifiers in FunctionRegistry

2017-06-09 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20918. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18142

[jira] [Created] (SPARK-21036) Truncate action and writes should be in one transaction.

2017-06-09 Thread Talat UYARER (JIRA)
Talat UYARER created SPARK-21036: Summary: Truncate action and writes should be in one transaction. Key: SPARK-21036 URL: https://issues.apache.org/jira/browse/SPARK-21036 Project: Spark

[jira] [Commented] (SPARK-12606) Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?

2017-06-09 Thread Ilyes Hachani (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044508#comment-16044508 ] Ilyes Hachani commented on SPARK-12606: --- Any update on this? I am using Java and extending the

[jira] [Closed] (SPARK-21011) RDD filter can combine/corrupt columns

2017-06-09 Thread Steven Landes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Landes closed SPARK-21011. - Resolution: Cannot Reproduce My bad, apparently somehow some user managed to get a backspace

[jira] [Resolved] (SPARK-21025) missing data in jsc.union

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21025. --- Resolution: Not A Problem Different bug, but still a bug in your code. Please don't reopen this as a

[jira] [Closed] (SPARK-21025) missing data in jsc.union

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen closed SPARK-21025. - > missing data in jsc.union > - > > Key: SPARK-21025 >

[jira] [Assigned] (SPARK-20997) spark-submit's --driver-cores marked as "YARN-only" but listed under "Spark standalone with cluster deploy mode only"

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-20997: - Assignee: guoxiaolongzte > spark-submit's --driver-cores marked as "YARN-only" but listed under

[jira] [Resolved] (SPARK-20997) spark-submit's --driver-cores marked as "YARN-only" but listed under "Spark standalone with cluster deploy mode only"

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20997. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18241

[jira] [Commented] (SPARK-20696) tf-idf document clustering with K-means in Apache Spark putting points into one cluster

2017-06-09 Thread Nassir (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044356#comment-16044356 ] Nassir commented on SPARK-20696: This appears to be a problem with the implementation of K-means

[jira] [Commented] (SPARK-21032) Support add_years and add_days functions

2017-06-09 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044354#comment-16044354 ] Yuming Wang commented on SPARK-21032: - Do we really need these functions? {code:sql}

[jira] [Commented] (SPARK-17914) Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044350#comment-16044350 ] Apache Spark commented on SPARK-17914: -- User 'aokolnychyi' has created a pull request for this

[jira] [Assigned] (SPARK-17914) Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17914: Assignee: (was: Apache Spark) > Spark SQL casting to TimestampType with nanosecond

[jira] [Assigned] (SPARK-17914) Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17914: Assignee: Apache Spark > Spark SQL casting to TimestampType with nanosecond results in

[jira] [Commented] (SPARK-21035) support spark-sql on yarn cluser

2017-06-09 Thread 吴志龙 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044299#comment-16044299 ] 吴志龙 commented on SPARK-21035: - Can you tell that specific topic? > support spark-sql on yarn cluser >

[jira] [Comment Edited] (SPARK-21035) support spark-sql on yarn cluser

2017-06-09 Thread 吴志龙 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044299#comment-16044299 ] 吴志龙 edited comment on SPARK-21035 at 6/9/17 10:48 AM: -- Can you tell me that specific

[jira] [Comment Edited] (SPARK-20998) BroadcastHashJoin producing wrong results

2017-06-09 Thread Mohit (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044275#comment-16044275 ] Mohit edited comment on SPARK-20998 at 6/9/17 10:30 AM: It was observed that this

[jira] [Closed] (SPARK-20998) BroadcastHashJoin producing wrong results

2017-06-09 Thread Mohit (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit closed SPARK-20998. - Resolution: Fixed This has already been fixed in 2.1.0 onwards > BroadcastHashJoin producing wrong results >

[jira] [Updated] (SPARK-20998) BroadcastHashJoin producing wrong results

2017-06-09 Thread Mohit (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit updated SPARK-20998: -- Fix Version/s: 2.1.0 > BroadcastHashJoin producing wrong results > - >

[jira] [Commented] (SPARK-20998) BroadcastHashJoin producing wrong results

2017-06-09 Thread Mohit (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044274#comment-16044274 ] Mohit commented on SPARK-20998: --- The issue seems to be fixed in 2.1.0. I shall close this with Fix

[jira] [Resolved] (SPARK-21035) support spark-sql on yarn cluser

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21035. --- Resolution: Invalid This sounds like a topic to discuss on a mailing list > support spark-sql on

[jira] [Created] (SPARK-21035) support spark-sql on yarn cluser

2017-06-09 Thread 吴志龙 (JIRA)
吴志龙 created SPARK-21035: --- Summary: support spark-sql on yarn cluser Key: SPARK-21035 URL: https://issues.apache.org/jira/browse/SPARK-21035 Project: Spark Issue Type: Wish Components: SQL

[jira] [Resolved] (SPARK-21026) Document jenkins plug-ins assumed by the spark documentation build

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21026. --- Resolution: Invalid This is a question for the mailing list but docs/README.md probably answers this

[jira] [Assigned] (SPARK-21033) fix the potential OOM in UnsafeExternalSorter

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21033: Assignee: Apache Spark (was: Wenchen Fan) > fix the potential OOM in

[jira] [Commented] (SPARK-21033) fix the potential OOM in UnsafeExternalSorter

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044162#comment-16044162 ] Apache Spark commented on SPARK-21033: -- User 'cloud-fan' has created a pull request for this issue:

[jira] [Assigned] (SPARK-21033) fix the potential OOM in UnsafeExternalSorter

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21033: Assignee: Wenchen Fan (was: Apache Spark) > fix the potential OOM in

[jira] [Created] (SPARK-21034) Filter not getting pushed down the groupBy clause when first() or last() aggregate function is used

2017-06-09 Thread Abhijit Bhole (JIRA)
Abhijit Bhole created SPARK-21034: - Summary: Filter not getting pushed down the groupBy clause when first() or last() aggregate function is used Key: SPARK-21034 URL:

[jira] [Created] (SPARK-21033) fix the potential OOM in UnsafeExternalSorter

2017-06-09 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-21033: --- Summary: fix the potential OOM in UnsafeExternalSorter Key: SPARK-21033 URL: https://issues.apache.org/jira/browse/SPARK-21033 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-20995) 'Spark-env.sh.template' should add 'YARN_CONF_DIR' configuration instructions.

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20995. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18212

[jira] [Assigned] (SPARK-20995) 'Spark-env.sh.template' should add 'YARN_CONF_DIR' configuration instructions.

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-20995: - Assignee: guoxiaolongzte > 'Spark-env.sh.template' should add 'YARN_CONF_DIR' configuration

[jira] [Resolved] (SPARK-14408) Update RDD.treeAggregate not to use reduce

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-14408. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18198

[jira] [Commented] (SPARK-21027) Parallel One vs. Rest Classifier

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044102#comment-16044102 ] Sean Owen commented on SPARK-21027: --- I don't see a reason for multiple JIRAs. Parallel in what sense? I

[jira] [Resolved] (SPARK-21028) Parallel One vs. Rest Classifier Scala

2017-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21028. --- Resolution: Duplicate > Parallel One vs. Rest Classifier Scala >

[jira] [Created] (SPARK-21032) Support add_years and add_days functions

2017-06-09 Thread darion yaphet (JIRA)
darion yaphet created SPARK-21032: - Summary: Support add_years and add_days functions Key: SPARK-21032 URL: https://issues.apache.org/jira/browse/SPARK-21032 Project: Spark Issue Type: New

[jira] [Assigned] (SPARK-21024) CSV parse mode handles Univocity parser exceptions

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21024: Assignee: Apache Spark > CSV parse mode handles Univocity parser exceptions >

[jira] [Assigned] (SPARK-21024) CSV parse mode handles Univocity parser exceptions

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21024: Assignee: (was: Apache Spark) > CSV parse mode handles Univocity parser exceptions >

[jira] [Commented] (SPARK-21024) CSV parse mode handles Univocity parser exceptions

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044077#comment-16044077 ] Apache Spark commented on SPARK-21024: -- User 'maropu' has created a pull request for this issue:

[jira] [Commented] (SPARK-19937) Collect metrics of block sizes when shuffle.

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044075#comment-16044075 ] Apache Spark commented on SPARK-19937: -- User 'jinxing64' has created a pull request for this issue:

[jira] [Commented] (SPARK-21024) CSV parse mode handles Univocity parser exceptions

2017-06-09 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044052#comment-16044052 ] Takeshi Yamamuro commented on SPARK-21024: -- Thanks! I'll open a new pr later. > CSV parse mode

[jira] [Updated] (SPARK-21031) Clearly separate hive stats and spark stats in catalog

2017-06-09 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21031: - Description: Currently, hive's stats are read into `CatalogStatistics`, while spark's stats are

[jira] [Assigned] (SPARK-21031) Clearly separate hive stats and spark stats in catalog

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21031: Assignee: (was: Apache Spark) > Clearly separate hive stats and spark stats in

[jira] [Commented] (SPARK-21031) Clearly separate hive stats and spark stats in catalog

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044009#comment-16044009 ] Apache Spark commented on SPARK-21031: -- User 'wzhfy' has created a pull request for this issue:

[jira] [Assigned] (SPARK-21031) Clearly separate hive stats and spark stats in catalog

2017-06-09 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21031: Assignee: Apache Spark > Clearly separate hive stats and spark stats in catalog >

[jira] [Updated] (SPARK-21031) Clearly separate hive stats and spark stats in catalog

2017-06-09 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21031: - Description: Currently, hive's stats are read into `CatalogStatistics`, while spark's stats

[jira] [Created] (SPARK-21031) Clearly separate hive stats and spark stats in catalog

2017-06-09 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-21031: Summary: Clearly separate hive stats and spark stats in catalog Key: SPARK-21031 URL: https://issues.apache.org/jira/browse/SPARK-21031 Project: Spark Issue
