[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435651#comment-15435651 ] Barry Becker commented on SPARK-17219: -- If the decision is to have an additional nul

[jira] [Created] (SPARK-17229) Postgres JDBC dialect should not widen float and short types during reads

2016-08-24 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-17229: -- Summary: Postgres JDBC dialect should not widen float and short types during reads Key: SPARK-17229 URL: https://issues.apache.org/jira/browse/SPARK-17229 Project: Spark

[jira] [Commented] (SPARK-17228) Not infer/propagate non-deterministic constraints

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435624#comment-15435624 ] Apache Spark commented on SPARK-17228: -- User 'sameeragarwal' has created a pull requ

[jira] [Assigned] (SPARK-17228) Not infer/propagate non-deterministic constraints

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17228: Assignee: (was: Apache Spark) > Not infer/propagate non-deterministic constraints > --

[jira] [Assigned] (SPARK-17228) Not infer/propagate non-deterministic constraints

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17228: Assignee: Apache Spark > Not infer/propagate non-deterministic constraints > -

[jira] [Commented] (SPARK-15083) History Server would OOM due to unlimited TaskUIData in some stages

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435623#comment-15435623 ] Apache Spark commented on SPARK-15083: -- User 'ajbozarth' has created a pull request

[jira] [Commented] (SPARK-17156) Add multiclass logistic regression Scala Example

2016-08-24 Thread Jacek Laskowski (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435611#comment-15435611 ] Jacek Laskowski commented on SPARK-17156: - The code's here - https://github.com/

[jira] [Created] (SPARK-17228) Not infer/propagate non-deterministic constraints

2016-08-24 Thread Sameer Agarwal (JIRA)
Sameer Agarwal created SPARK-17228: -- Summary: Not infer/propagate non-deterministic constraints Key: SPARK-17228 URL: https://issues.apache.org/jira/browse/SPARK-17228 Project: Spark Issue T

[jira] [Updated] (SPARK-16216) CSV data source does not write date and timestamp correctly

2016-08-24 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-16216: -- Assignee: Hyukjin Kwon > CSV data source does not write date and timestamp correctly >

[jira] [Commented] (SPARK-16216) CSV data source does not write date and timestamp correctly

2016-08-24 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435602#comment-15435602 ] Herman van Hovell commented on SPARK-16216: --- I have merged https://github.com/a

[jira] [Resolved] (SPARK-16191) Code-Generated SpecificColumnarIterator fails for wide pivot with caching

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16191. --- Resolution: Duplicate > Code-Generated SpecificColumnarIterator fails for wide pivot with caching > -

[jira] [Reopened] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-16845: --- Reversing myself: very similar error but not the same site exactly. There is reason to believe it's not

[jira] [Commented] (SPARK-15285) Generated SpecificSafeProjection.apply method grows beyond 64 KB

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435594#comment-15435594 ] Sean Owen commented on SPARK-15285: --- I'm going to un-resolve https://issues.apache.org/

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435588#comment-15435588 ] Sean Owen commented on SPARK-17219: --- Yes, those seem like the 3 options. Hm, I'm reluct

[jira] [Commented] (SPARK-15285) Generated SpecificSafeProjection.apply method grows beyond 64 KB

2016-08-24 Thread K (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435586#comment-15435586 ] K commented on SPARK-15285: --- I am not sure if this issue is fixed; I was able to reproduce with

[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435551#comment-15435551 ] Sean Owen commented on SPARK-17227: --- Making some things simply configurable seems uncon

[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435539#comment-15435539 ] Andrew Ash commented on SPARK-17227: Rob and I work together, and we've seen datasets

[jira] [Resolved] (SPARK-15083) History Server would OOM due to unlimited TaskUIData in some stages

2016-08-24 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-15083. --- Resolution: Fixed Assignee: Alex Bozarth Fix Version/s: 2.1.0 > History Serve

[jira] [Updated] (SPARK-17222) Support multline csv records

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17222: -- Issue Type: Improvement (was: Bug) I suspect this one is tough because the CSV will already be parsed

[jira] [Updated] (SPARK-17224) Support skipping multiple header rows in csv

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17224: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) Maybe let's see about triaging

[jira] [Updated] (SPARK-17222) Support multline csv records

2016-08-24 Thread Robert Kruszewski (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kruszewski updated SPARK-17222: -- Description: Below should be read as one record and currently it won't be since files a

[jira] [Updated] (SPARK-17225) Support multiple null values in csv files

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17225: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) > Support multiple null values

[jira] [Updated] (SPARK-17226) Allow defining multiple date formats per column in csv

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17226: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) Read through https://cwiki.apa

[jira] [Updated] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17227: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) [~robert3005] hardly a bug > A

[jira] [Resolved] (SPARK-17223) "grows beyond 64 KB" with data frame with many columns

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17223. --- Resolution: Duplicate > "grows beyond 64 KB" with data frame with many columns >

[jira] [Resolved] (SPARK-17092) DataFrame with large number of columns causing code generation error

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17092. --- Resolution: Duplicate > DataFrame with large number of columns causing code generation error > --

[jira] [Created] (SPARK-17227) Allow configuring record delimiter in csv

2016-08-24 Thread Robert Kruszewski (JIRA)
Robert Kruszewski created SPARK-17227: - Summary: Allow configuring record delimiter in csv Key: SPARK-17227 URL: https://issues.apache.org/jira/browse/SPARK-17227 Project: Spark Issue Typ

[jira] [Created] (SPARK-17226) Allow defining multiple date formats per column in csv

2016-08-24 Thread Robert Kruszewski (JIRA)
Robert Kruszewski created SPARK-17226: - Summary: Allow defining multiple date formats per column in csv Key: SPARK-17226 URL: https://issues.apache.org/jira/browse/SPARK-17226 Project: Spark

[jira] [Updated] (SPARK-17223) "grows beyond 64 KB" with data frame with many columns

2016-08-24 Thread K (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] K updated SPARK-17223: -- Description: Hi everyone, We have a dataset with ~500 columns. If I called a LabelIndexer on it and tried to print ou

[jira] [Commented] (SPARK-17211) Broadcast join produces incorrect results

2016-08-24 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435516#comment-15435516 ] Dongjoon Hyun commented on SPARK-17211: --- Is there any way to meet that situation wi

[jira] [Updated] (SPARK-17223) "grows beyond 64 KB" with data frame with many columns

2016-08-24 Thread K (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] K updated SPARK-17223: -- Description: Hi everyone, We have a dataset with ~500 columns. If I called a LabelIndexer on it and tried to print ou

[jira] [Created] (SPARK-17225) Support multiple null values in csv files

2016-08-24 Thread Robert Kruszewski (JIRA)
Robert Kruszewski created SPARK-17225: - Summary: Support multiple null values in csv files Key: SPARK-17225 URL: https://issues.apache.org/jira/browse/SPARK-17225 Project: Spark Issue Typ

[jira] [Created] (SPARK-17224) Support skipping multiple header rows in csv

2016-08-24 Thread Robert Kruszewski (JIRA)
Robert Kruszewski created SPARK-17224: - Summary: Support skipping multiple header rows in csv Key: SPARK-17224 URL: https://issues.apache.org/jira/browse/SPARK-17224 Project: Spark Issue

[jira] [Created] (SPARK-17223) "grows beyond 64 KB" with data frame with many columns

2016-08-24 Thread K (JIRA)
K created SPARK-17223: - Summary: "grows beyond 64 KB" with data frame with many columns Key: SPARK-17223 URL: https://issues.apache.org/jira/browse/SPARK-17223 Project: Spark Issue Type: Bug Co

[jira] [Commented] (SPARK-17211) Broadcast join produces incorrect results

2016-08-24 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435512#comment-15435512 ] Dongjoon Hyun commented on SPARK-17211: --- Up to now, this seems to happen in EMR 5.0

[jira] [Created] (SPARK-17222) Support multline csv records

2016-08-24 Thread Robert Kruszewski (JIRA)
Robert Kruszewski created SPARK-17222: - Summary: Support multline csv records Key: SPARK-17222 URL: https://issues.apache.org/jira/browse/SPARK-17222 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Seth Hendrickson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435497#comment-15435497 ] Seth Hendrickson commented on SPARK-17163: -- I completely agree about the fact th

[jira] [Resolved] (SPARK-16983) Add `prettyName` to row_number, dense_rank, percent_rank, cume_dist

2016-08-24 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-16983. --- Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 2.1.0 > Add

[jira] [Commented] (SPARK-17211) Broadcast join produces incorrect results

2016-08-24 Thread Himanish Kushary (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435490#comment-15435490 ] Himanish Kushary commented on SPARK-17211: -- [~dongjoon] [~jseppanen] I am also s

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435484#comment-15435484 ] Barry Becker commented on SPARK-17219: -- Nulls were not accepted in the column. I had

[jira] [Updated] (SPARK-16781) java launched by PySpark as gateway may not be the same java used in the spark environment

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-16781: -- Assignee: Sean Owen Updated to 0.10.3 to enable this to pick up JAVA_HOME > java launched by PySpark a

[jira] [Resolved] (SPARK-16781) java launched by PySpark as gateway may not be the same java used in the spark environment

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16781. --- Resolution: Fixed Fix Version/s: 2.1.0 2.0.1 Issue resolved by pull request

[jira] [Assigned] (SPARK-17221) Build File-based Test Cases for Using Join and Left-Semi Join

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17221: Assignee: Apache Spark > Build File-based Test Cases for Using Join and Left-Semi Join > -

[jira] [Comment Edited] (SPARK-17220) Upgrade Py4J to 0.10.3

2016-08-24 Thread Weiqing Yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435470#comment-15435470 ] Weiqing Yang edited comment on SPARK-17220 at 8/24/16 6:56 PM:

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435469#comment-15435469 ] Sean Owen commented on SPARK-17219: --- These aren't null though, but NaN. There's no mean

[jira] [Commented] (SPARK-17220) Upgrade Py4J to 0.10.3

2016-08-24 Thread Weiqing Yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435470#comment-15435470 ] Weiqing Yang commented on SPARK-17220: -- Oh. I see. Thanks for resolving this. > Upgr

[jira] [Resolved] (SPARK-17220) Upgrade Py4J to 0.10.3

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17220. --- Resolution: Duplicate This is the resolution to https://issues.apache.org/jira/browse/SPARK-16781 >

[jira] [Assigned] (SPARK-17221) Build File-based Test Cases for Using Join and Left-Semi Join

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17221: Assignee: (was: Apache Spark) > Build File-based Test Cases for Using Join and Left-Se

[jira] [Commented] (SPARK-17221) Build File-based Test Cases for Using Join and Left-Semi Join

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435465#comment-15435465 ] Apache Spark commented on SPARK-17221: -- User 'gatorsmile' has created a pull request

[jira] [Created] (SPARK-17221) Build File-based Test Cases for Using Join and Left-Semi Join

2016-08-24 Thread Xiao Li (JIRA)
Xiao Li created SPARK-17221: --- Summary: Build File-based Test Cases for Using Join and Left-Semi Join Key: SPARK-17221 URL: https://issues.apache.org/jira/browse/SPARK-17221 Project: Spark Issue Ty

[jira] [Commented] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435449#comment-15435449 ] Joseph K. Bradley commented on SPARK-17163: --- My 2 cents: *Intercept/coefficien

[jira] [Updated] (SPARK-17220) Upgrade Py4J to 0.10.3

2016-08-24 Thread Weiqing Yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiqing Yang updated SPARK-17220: - Description: Py4J 0.10.3 has landed. It includes some important bug fixes. For example: Both side

[jira] [Created] (SPARK-17220) Upgrade Py4J to 0.10.3

2016-08-24 Thread Weiqing Yang (JIRA)
Weiqing Yang created SPARK-17220: Summary: Upgrade Py4J to 0.10.3 Key: SPARK-17220 URL: https://issues.apache.org/jira/browse/SPARK-17220 Project: Spark Issue Type: Improvement Re

[jira] [Resolved] (SPARK-16445) Multilayer Perceptron Classifier wrapper in SparkR

2016-08-24 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-16445. -- Resolution: Fixed Fix Version/s: 2.1.0 > Multilayer Perceptron Classifier wrapper in Spa

[jira] [Commented] (SPARK-17110) Pyspark with locality ANY throw java.io.StreamCorruptedException

2016-08-24 Thread Tomer Kaftan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435418#comment-15435418 ] Tomer Kaftan commented on SPARK-17110: -- [~radost...@gmail.com] Unfortunately we have

[jira] [Comment Edited] (SPARK-17211) Broadcast join produces incorrect results

2016-08-24 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435406#comment-15435406 ] Dongjoon Hyun edited comment on SPARK-17211 at 8/24/16 6:08 PM: ---

[jira] [Commented] (SPARK-17211) Broadcast join produces incorrect results

2016-08-24 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435406#comment-15435406 ] Dongjoon Hyun commented on SPARK-17211: --- Hi, [~jseppanen]. Thank you for the repor

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435403#comment-15435403 ] Barry Becker commented on SPARK-17219: -- There needs to be some way to handle null va

[jira] [Created] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
Barry Becker created SPARK-17219: Summary: QuantileDiscretizer does strange things with NaN values Key: SPARK-17219 URL: https://issues.apache.org/jira/browse/SPARK-17219 Project: Spark Issue

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435391#comment-15435391 ] Sean Owen commented on SPARK-17219: --- Ah this is because NaN != NaN. Where it ends up is

[jira] [Updated] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-17086: - Attachment: titanic.csv > QuantileDiscretizer throws InvalidArgumentException (parameter splits g

[jira] [Commented] (SPARK-3162) Train DecisionTree locally when possible

2016-08-24 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435359#comment-15435359 ] Joseph K. Bradley commented on SPARK-3162: -- The doc looks reasonable to me. [~yu

[jira] [Commented] (SPARK-10297) When save data to a data source table, we should bound the size of a saved file

2016-08-24 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435349#comment-15435349 ] Yin Huai commented on SPARK-10297: -- yea. I guess it may be still good to have relative s

[jira] [Updated] (SPARK-17093) Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled

2016-08-24 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-17093: --- Priority: Blocker (was: Critical) > Roundtrip encoding of array<struct<>> fields is wrong when whole-stage >

[jira] [Resolved] (SPARK-17218) Caching a DataFrame with >200 columns ~nulls the contents

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17218. --- Resolution: Duplicate Have a look at JIRA first, several related issues already. > Caching a DataFra

[jira] [Commented] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435236#comment-15435236 ] Barry Becker commented on SPARK-17086: -- Thanks. BTW, I hope there are some test cas

[jira] [Commented] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-24 Thread Mohit Bansal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435257#comment-15435257 ] Mohit Bansal commented on SPARK-17214: -- If I am not wrong, the following command is used

[jira] [Commented] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435237#comment-15435237 ] Sean Owen commented on SPARK-17214: --- Your error however shows {code} > org.apache.sp

[jira] [Commented] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-24 Thread Mohit Bansal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435232#comment-15435232 ] Mohit Bansal commented on SPARK-17214: -- [~srowen] Sorry, but I don't think so... I

[jira] [Commented] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435173#comment-15435173 ] Sean Owen commented on SPARK-17214: --- Duplicate of things like https://issues.apache.org

[jira] [Commented] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-24 Thread Mohit Bansal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435135#comment-15435135 ] Mohit Bansal commented on SPARK-17214: -- [~sowen] Done > How to deal with dots (.)

[jira] [Updated] (SPARK-17214) How to deal with dots (.) present in column names in SparkR

2016-08-24 Thread Mohit Bansal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohit Bansal updated SPARK-17214: - Description: I am trying to load a local csv file into SparkR, which contains dots in column nam

[jira] [Updated] (SPARK-17218) Caching a DataFrame with >200 columns ~nulls the contents

2016-08-24 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shea Parkes updated SPARK-17218: Description: Caching a DataFrame with >200 columns causes the contents to be ~nulled. This is qui

[jira] [Created] (SPARK-17218) Caching a DataFrame with >200 columns ~nulls the contents

2016-08-24 Thread Shea Parkes (JIRA)
Shea Parkes created SPARK-17218: --- Summary: Caching a DataFrame with >200 columns ~nulls the contents Key: SPARK-17218 URL: https://issues.apache.org/jira/browse/SPARK-17218 Project: Spark Issue

[jira] [Updated] (SPARK-17218) Caching a DataFrame with >200 columns ~nulls the contents

2016-08-24 Thread Shea Parkes (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shea Parkes updated SPARK-17218: Environment: Microsoft Windows 10 Python v3.5.x Standalone Spark Cluster was: Microsoft Windows

[jira] [Updated] (SPARK-17217) Codegeneration fails for describe() on many columns

2016-08-24 Thread Kalle Jepsen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kalle Jepsen updated SPARK-17217: - Description: Consider the following minimal python script: {code:python} import pyspark from pys

[jira] [Created] (SPARK-17217) Codegeneration fails for describe() on many columns

2016-08-24 Thread Kalle Jepsen (JIRA)
Kalle Jepsen created SPARK-17217: Summary: Codegeneration fails for describe() on many columns Key: SPARK-17217 URL: https://issues.apache.org/jira/browse/SPARK-17217 Project: Spark Issue Typ

[jira] [Assigned] (SPARK-17216) Even timeline for a stage doesn't core 100% of the bar timeline bar in chrome

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17216: Assignee: Apache Spark > Even timeline for a stage doesn't core 100% of the bar timeline b

[jira] [Assigned] (SPARK-17216) Even timeline for a stage doesn't core 100% of the bar timeline bar in chrome

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17216: Assignee: (was: Apache Spark) > Even timeline for a stage doesn't core 100% of the bar

[jira] [Commented] (SPARK-17216) Even timeline for a stage doesn't core 100% of the bar timeline bar in chrome

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435046#comment-15435046 ] Apache Spark commented on SPARK-17216: -- User 'robert3005' has created a pull request

[jira] [Created] (SPARK-17216) Even timeline for a stage doesn't core 100% of the bar timeline bar in chrome

2016-08-24 Thread Robert Kruszewski (JIRA)
Robert Kruszewski created SPARK-17216: - Summary: Even timeline for a stage doesn't core 100% of the bar timeline bar in chrome Key: SPARK-17216 URL: https://issues.apache.org/jira/browse/SPARK-17216

[jira] [Assigned] (SPARK-17215) Method `SQLContext.parseDataType(dataTypeString: String)` could be removed.

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17215: Assignee: (was: Apache Spark) > Method `SQLContext.parseDataType(dataTypeString: Strin

[jira] [Commented] (SPARK-17215) Method `SQLContext.parseDataType(dataTypeString: String)` could be removed.

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435034#comment-15435034 ] Apache Spark commented on SPARK-17215: -- User 'jiangxb1987' has created a pull reques

[jira] [Assigned] (SPARK-17215) Method `SQLContext.parseDataType(dataTypeString: String)` could be removed.

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17215: Assignee: Apache Spark > Method `SQLContext.parseDataType(dataTypeString: String)` could b

[jira] [Resolved] (SPARK-16624) Generated SpecificColumnarIterator code can exceed JVM size limit for cached DataFrames

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16624. --- Resolution: Duplicate > Generated SpecificColumnarIterator code can exceed JVM size limit for cached

[jira] [Created] (SPARK-17215) Method `SQLContext.parseDataType(dataTypeString: String)` could be removed.

2016-08-24 Thread Jiang Xingbo (JIRA)
Jiang Xingbo created SPARK-17215: Summary: Method `SQLContext.parseDataType(dataTypeString: String)` could be removed. Key: SPARK-17215 URL: https://issues.apache.org/jira/browse/SPARK-17215 Project:

[jira] [Resolved] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16845. --- Resolution: Duplicate > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" >

[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-08-24 Thread David Jung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435016#comment-15435016 ] David Jung commented on SPARK-16845: In addition to receiving this error when attempt

[jira] [Commented] (SPARK-17214) Even after replacing the column names having dots , still it is referring to previous column names in SparkR ref: http://stackoverflow.com/questions/39125255/how-to-de

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434991#comment-15434991 ] Sean Owen commented on SPARK-17214: --- You'll have to turn this into a proper title / des

[jira] [Created] (SPARK-17214) Even after replacing the column names having dots , still it is referring to previous column names in SparkR ref: http://stackoverflow.com/questions/39125255/how-to-deal

2016-08-24 Thread Mohit Bansal (JIRA)
Mohit Bansal created SPARK-17214: Summary: Even after replacing the column names having dots , still it is referring to previous column names in SparkR ref: http://stackoverflow.com/questions/39125255/how-to-deal-with-dots-in-column-names-in-spar

[jira] [Commented] (SPARK-17209) Support manual credential updating in the run-time for Spark on YARN

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434978#comment-15434978 ] Apache Spark commented on SPARK-17209: -- User 'jerryshao' has created a pull request

[jira] [Assigned] (SPARK-17209) Support manual credential updating in the run-time for Spark on YARN

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17209: Assignee: Apache Spark > Support manual credential updating in the run-time for Spark on Y

[jira] [Assigned] (SPARK-17209) Support manual credential updating in the run-time for Spark on YARN

2016-08-24 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17209: Assignee: (was: Apache Spark) > Support manual credential updating in the run-time for

[jira] [Updated] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17086: -- Affects Version/s: (was: 2.1.0) 2.0.0 Fix Version/s: 2.0.1 OK, seems

[jira] [Comment Edited] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434805#comment-15434805 ] Barry Becker edited comment on SPARK-17086 at 8/24/16 12:18 PM: ---

[jira] [Commented] (SPARK-17086) QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434805#comment-15434805 ] Barry Becker commented on SPARK-17086: -- Is it possible to get this fix into 2.0.1? M

[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434798#comment-15434798 ] Yanbo Liang edited comment on SPARK-17163 at 8/24/16 12:12 PM:

[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434412#comment-15434412 ] Yanbo Liang edited comment on SPARK-17163 at 8/24/16 12:11 PM:

[jira] [Comment Edited] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434412#comment-15434412 ] Yanbo Liang edited comment on SPARK-17163 at 8/24/16 12:10 PM:

[jira] [Commented] (SPARK-17163) Decide on unified multinomial and binary logistic regression interfaces

2016-08-24 Thread Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434798#comment-15434798 ] Yanbo Liang commented on SPARK-17163: - Think more about this problem, I change my min
