[jira] [Created] (SPARK-2378) Implement functionality to read csv files

2014-07-06 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-2378: - Summary: Implement functionality to read csv files Key: SPARK-2378 URL: https://issues.apache.org/jira/browse/SPARK-2378 Project: Spark Issue Type:

[jira] [Commented] (SPARK-2360) CSV import to SchemaRDDs

2014-07-07 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054262#comment-14054262 ] Hossein Falaki commented on SPARK-2360: --- As a point for comparison the interface in

[jira] [Comment Edited] (SPARK-2360) CSV import to SchemaRDDs

2014-07-07 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054262#comment-14054262 ] Hossein Falaki edited comment on SPARK-2360 at 7/7/14 11:03 PM:

[jira] [Comment Edited] (SPARK-2360) CSV import to SchemaRDDs

2014-07-07 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054262#comment-14054262 ] Hossein Falaki edited comment on SPARK-2360 at 7/7/14 11:04 PM:

[jira] [Comment Edited] (SPARK-2360) CSV import to SchemaRDDs

2014-07-07 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054262#comment-14054262 ] Hossein Falaki edited comment on SPARK-2360 at 7/8/14 12:45 AM:

[jira] [Commented] (SPARK-2360) CSV import to SchemaRDDs

2014-07-09 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057015#comment-14057015 ] Hossein Falaki commented on SPARK-2360: --- Hi Matthew, I have not fully read RFC

[jira] [Created] (SPARK-2674) Add date and time types to inferSchema

2014-07-24 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-2674: - Summary: Add date and time types to inferSchema Key: SPARK-2674 URL: https://issues.apache.org/jira/browse/SPARK-2674 Project: Spark Issue Type: New

[jira] [Created] (SPARK-2696) Reduce default spark.serializer.objectStreamReset

2014-07-25 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-2696: - Summary: Reduce default spark.serializer.objectStreamReset Key: SPARK-2696 URL: https://issues.apache.org/jira/browse/SPARK-2696 Project: Spark Issue

[jira] [Created] (SPARK-2698) RDD page Spark Web UI bug

2014-07-25 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-2698: - Summary: RDD page Spark Web UI bug Key: SPARK-2698 URL: https://issues.apache.org/jira/browse/SPARK-2698 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-2360) CSV import to SchemaRDDs

2014-08-22 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107155#comment-14107155 ] Hossein Falaki commented on SPARK-2360: --- There is a pull request for this issue:

[jira] [Created] (SPARK-3827) Very long RDD names are not rendered properly in web UI

2014-10-06 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-3827: - Summary: Very long RDD names are not rendered properly in web UI Key: SPARK-3827 URL: https://issues.apache.org/jira/browse/SPARK-3827 Project: Spark

[jira] [Created] (SPARK-4135) Error reading Parquet file generated with SparkSQL

2014-10-29 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-4135: - Summary: Error reading Parquet file generated with SparkSQL Key: SPARK-4135 URL: https://issues.apache.org/jira/browse/SPARK-4135 Project: Spark Issue

[jira] [Updated] (SPARK-4135) Error reading Parquet file generated with SparkSQL

2014-10-29 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-4135: -- Attachment: _metadata part-r-1.parquet Files generated by SparkSQL that cannot

[jira] [Commented] (SPARK-2360) CSV import to SchemaRDDs

2014-11-07 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202805#comment-14202805 ] Hossein Falaki commented on SPARK-2360: --- Sure. CSV import to SchemaRDDs

[jira] [Closed] (SPARK-2360) CSV import to SchemaRDDs

2014-11-07 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki closed SPARK-2360. - This will be a package using Data Source API CSV import to SchemaRDDs

[jira] [Created] (SPARK-6339) Support creating temporary tables with DDL

2015-03-14 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-6339: - Summary: Support creating temporary tables with DDL Key: SPARK-6339 URL: https://issues.apache.org/jira/browse/SPARK-6339 Project: Spark Issue Type: New

[jira] [Created] (SPARK-8282) Make number of threads used in RBackend configurable

2015-06-09 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-8282: - Summary: Make number of threads used in RBackend configurable Key: SPARK-8282 URL: https://issues.apache.org/jira/browse/SPARK-8282 Project: Spark Issue

[jira] [Updated] (SPARK-8452) expose jobGroup API in SparkR

2015-06-18 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-8452: -- Description: Following job management calls are missing in SparkR: {code} setJobGroup()

[jira] [Created] (SPARK-8452) expose jobGroup API in SparkR

2015-06-18 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-8452: - Summary: expose jobGroup API in SparkR Key: SPARK-8452 URL: https://issues.apache.org/jira/browse/SPARK-8452 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-8742) Improve SparkR error messages for DataFrame API

2015-06-30 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-8742: - Summary: Improve SparkR error messages for DataFrame API Key: SPARK-8742 URL: https://issues.apache.org/jira/browse/SPARK-8742 Project: Spark Issue Type:

[jira] [Commented] (SPARK-9319) Add support for setting column names, types

2015-08-02 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651367#comment-14651367 ] Hossein Falaki commented on SPARK-9319: --- Yes. I will submit a PR. Add support for

[jira] [Commented] (SPARK-9319) Add support for setting column names, types

2015-08-21 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707263#comment-14707263 ] Hossein Falaki commented on SPARK-9319: --- I have a PR half ready. But I got

[jira] [Created] (SPARK-9443) Expose sampleByKey in SparkR

2015-07-29 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-9443: - Summary: Expose sampleByKey in SparkR Key: SPARK-9443 URL: https://issues.apache.org/jira/browse/SPARK-9443 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-9803) Add transform and subset to DataFrame

2015-08-10 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-9803: - Summary: Add transform and subset to DataFrame Key: SPARK-9803 URL: https://issues.apache.org/jira/browse/SPARK-9803 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-11199) Improve R context management story and add getOrCreate

2015-10-20 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965559#comment-14965559 ] Hossein Falaki commented on SPARK-11199: Thanks for filing this [~felixcheung]. I submitted this

[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

2015-10-06 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945835#comment-14945835 ] Hossein Falaki commented on SPARK-9318: --- I agree with the issue being discussed. SparkR should have

[jira] [Commented] (SPARK-10903) Make sqlContext global

2015-10-08 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949657#comment-14949657 ] Hossein Falaki commented on SPARK-10903: I meant the SQL Context:

[jira] [Commented] (SPARK-10903) Make sqlContext global

2015-10-08 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949293#comment-14949293 ] Hossein Falaki commented on SPARK-10903: +1 We have seen a lot of questions from new SparkR users

[jira] [Created] (SPARK-10776) Pass location of SparkR source files from R process to JVM

2015-09-23 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-10776: -- Summary: Pass location of SparkR source files from R process to JVM Key: SPARK-10776 URL: https://issues.apache.org/jira/browse/SPARK-10776 Project: Spark

[jira] [Updated] (SPARK-10711) Do not assume spark.submit.deployMode is always set

2015-09-18 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-10711: --- Affects Version/s: (was: 1.6.0) 1.5.0 Fix Version/s: 1.5.1

[jira] [Created] (SPARK-10711) Do not assume spark.submit.deployMode is always set

2015-09-18 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-10711: -- Summary: Do not assume spark.submit.deployMode is always set Key: SPARK-10711 URL: https://issues.apache.org/jira/browse/SPARK-10711 Project: Spark

[jira] [Created] (SPARK-12104) collect() does not handle multiple columns with same name

2015-12-02 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-12104: -- Summary: collect() does not handle multiple columns with same name Key: SPARK-12104 URL: https://issues.apache.org/jira/browse/SPARK-12104 Project: Spark

[jira] [Updated] (SPARK-12671) Improve tests for better coverage

2016-01-05 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-12671: --- Affects Version/s: 2.0.0 > Improve tests for better coverage >

[jira] [Created] (SPARK-12671) Improve tests for better coverage

2016-01-05 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-12671: -- Summary: Improve tests for better coverage Key: SPARK-12671 URL: https://issues.apache.org/jira/browse/SPARK-12671 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-12670) Use spark internal utilities wherever possible

2016-01-05 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-12670: -- Summary: Use spark internal utilities wherever possible Key: SPARK-12670 URL: https://issues.apache.org/jira/browse/SPARK-12670 Project: Spark Issue

[jira] [Created] (SPARK-12668) Renaming CSV options to be similar to Pandas and R

2016-01-05 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-12668: -- Summary: Renaming CSV options to be similar to Pandas and R Key: SPARK-12668 URL: https://issues.apache.org/jira/browse/SPARK-12668 Project: Spark

[jira] [Created] (SPARK-12669) Organize options for default values

2016-01-05 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-12669: -- Summary: Organize options for default values Key: SPARK-12669 URL: https://issues.apache.org/jira/browse/SPARK-12669 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-12702) Populate statistics for DataFrame when reading CSV

2016-01-07 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-12702: -- Summary: Populate statistics for DataFrame when reading CSV Key: SPARK-12702 URL: https://issues.apache.org/jira/browse/SPARK-12702 Project: Spark Issue

[jira] [Updated] (SPARK-12420) Have a built-in CSV data source implementation

2015-12-23 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-12420: --- Attachment: Built-in CSV datasource in Spark.pdf Design doc for submitting spark-csv as a

[jira] [Comment Edited] (SPARK-12420) Have a built-in CSV data source implementation

2015-12-23 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070147#comment-15070147 ] Hossein Falaki edited comment on SPARK-12420 at 12/23/15 8:28 PM: -- Just

[jira] [Comment Edited] (SPARK-9325) Support `collect` on DataFrame columns

2015-11-24 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025611#comment-15025611 ] Hossein Falaki edited comment on SPARK-9325 at 11/24/15 10:50 PM: -- To

[jira] [Commented] (SPARK-9325) Support `collect` on DataFrame columns

2015-11-24 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025611#comment-15025611 ] Hossein Falaki commented on SPARK-9325: --- To help R users and not open up the API, how about adding

[jira] [Created] (SPARK-15516) Schema merging in driver fails for parquet when merging LongType and IntegerType

2016-05-24 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-15516: -- Summary: Schema merging in driver fails for parquet when merging LongType and IntegerType Key: SPARK-15516 URL: https://issues.apache.org/jira/browse/SPARK-15516

[jira] [Commented] (SPARK-16088) Deprecate setJobGroup, clearJobGroup, cancelJobGroup from SparkR API

2016-06-21 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342400#comment-15342400 ] Hossein Falaki commented on SPARK-16088: +1 for keeping the API in 2.x > Deprecate setJobGroup,

[jira] [Commented] (SPARK-15516) Schema merging in driver fails for parquet when merging LongType and IntegerType

2016-06-23 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347454#comment-15347454 ] Hossein Falaki commented on SPARK-15516: Here is an example: {code} create table mytable using

[jira] [Commented] (SPARK-12669) Organize options for default values

2016-01-17 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103949#comment-15103949 ] Hossein Falaki commented on SPARK-12669: Having a namespace for option names is a good

[jira] [Commented] (SPARK-12669) Organize options for default values

2016-01-16 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103584#comment-15103584 ] Hossein Falaki commented on SPARK-12669: This looks like a good categorization. We have full

[jira] [Created] (SPARK-13260) count(*) does not work with CSV data source

2016-02-09 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-13260: -- Summary: count(*) does not work with CSV data source Key: SPARK-13260 URL: https://issues.apache.org/jira/browse/SPARK-13260 Project: Spark Issue Type:

[jira] [Created] (SPARK-13261) Expose maxCharactersPerColumn as a user configurable option

2016-02-09 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-13261: -- Summary: Expose maxCharactersPerColumn as a user configurable option Key: SPARK-13261 URL: https://issues.apache.org/jira/browse/SPARK-13261 Project: Spark

[jira] [Commented] (SPARK-12669) Organize options for default values

2016-01-19 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107892#comment-15107892 ] Hossein Falaki commented on SPARK-12669: If/when Spark Data sources API supports that, CSV data

[jira] [Commented] (SPARK-13108) Encoding not working with non-ascii compatible encodings (UTF-16/32 etc.)

2016-02-22 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156612#comment-15156612 ] Hossein Falaki commented on SPARK-13108: [~hyukjin.kwon] wouldn't this be an issue for JSON as

[jira] [Created] (SPARK-13866) Handle decimal type

2016-03-14 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-13866: -- Summary: Handle decimal type Key: SPARK-13866 URL: https://issues.apache.org/jira/browse/SPARK-13866 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-13866) Handle decimal type in CSV inference

2016-03-14 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-13866: --- Summary: Handle decimal type in CSV inference (was: Handle decimal type) > Handle decimal

[jira] [Created] (SPARK-13792) Limit logging of bad records

2016-03-09 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-13792: -- Summary: Limit logging of bad records Key: SPARK-13792 URL: https://issues.apache.org/jira/browse/SPARK-13792 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-14223) Cannot project all columns from a parquet files with ~1,100 columns

2016-03-28 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-14223: --- Affects Version/s: 2.0.0 Target Version/s: (was: 2.0.0) > Cannot project all columns

[jira] [Created] (SPARK-14224) Cannot project all columns from a table with ~1,100 columns

2016-03-28 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-14224: -- Summary: Cannot project all columns from a table with ~1,100 columns Key: SPARK-14224 URL: https://issues.apache.org/jira/browse/SPARK-14224 Project: Spark

[jira] [Created] (SPARK-14223) Cannot project all columns from a parquet files with ~1,100 columns

2016-03-28 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-14223: -- Summary: Cannot project all columns from a parquet files with ~1,100 columns Key: SPARK-14223 URL: https://issues.apache.org/jira/browse/SPARK-14223 Project:

[jira] [Updated] (SPARK-14224) Cannot project all columns from a table with ~1,100 columns

2016-03-28 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-14224: --- Description: I created a temporary table from 1000 genomes dataset and cached it. When I try

[jira] [Updated] (SPARK-14223) Cannot project all columns from a parquet files with ~1,100 columns

2016-03-28 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-14223: --- Description: The parquet file is generated by saving first 10 rows of the 1000 genomes

[jira] [Updated] (SPARK-14223) Cannot project all columns from a parquet files with ~1,100 columns

2016-03-28 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-14223: --- Description: The parquet file is generated by saving first 10 rows of the 1000 genomes

[jira] [Updated] (SPARK-14223) Cannot project all columns from a parquet files with ~1,100 columns

2016-03-28 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-14223: --- Description: The parquet file is generated by saving first 10 rows of the 1000 genomes

[jira] [Created] (SPARK-14226) Caching a table with 1,100 columns and a few million rows fails

2016-03-28 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-14226: -- Summary: Caching a table with 1,100 columns and a few million rows fails Key: SPARK-14226 URL: https://issues.apache.org/jira/browse/SPARK-14226 Project: Spark

[jira] [Updated] (SPARK-14223) Cannot project all columns from a parquet files with ~1,100 columns

2016-03-28 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-14223: --- Attachment: 1000genomes.gz.parquet > Cannot project all columns from a parquet files with

[jira] [Updated] (SPARK-14224) Cannot project all columns from a table with ~1,100 columns

2016-03-28 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-14224: --- Target Version/s: 2.0.0 > Cannot project all columns from a table with ~1,100 columns >

[jira] [Commented] (SPARK-14260) Increase default value for maxCharsPerColumn

2016-03-30 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218446#comment-15218446 ] Hossein Falaki commented on SPARK-14260: I think default values should be reasonable for common

[jira] [Created] (SPARK-14143) Options for parsing NaNs, Infinity and nulls for numeric types

2016-03-24 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-14143: -- Summary: Options for parsing NaNs, Infinity and nulls for numeric types Key: SPARK-14143 URL: https://issues.apache.org/jira/browse/SPARK-14143 Project: Spark

[jira] [Created] (SPARK-13754) Keep old data source name for backwards compatibility

2016-03-08 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-13754: -- Summary: Keep old data source name for backwards compatibility Key: SPARK-13754 URL: https://issues.apache.org/jira/browse/SPARK-13754 Project: Spark

[jira] [Commented] (SPARK-12420) Have a built-in CSV data source implementation

2016-04-27 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260677#comment-15260677 ] Hossein Falaki commented on SPARK-12420: Hi [~koert]. There is a pending PR with an extensive set of

[jira] [Commented] (SPARK-16883) SQL decimal type is not properly cast to number when collecting SparkDataFrame

2016-08-04 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408621#comment-15408621 ] Hossein Falaki commented on SPARK-16883: I think that is because we are not converting

[jira] [Created] (SPARK-16903) nullValue in first field is not respected by CSV source when read

2016-08-04 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-16903: -- Summary: nullValue in first field is not respected by CSV source when read Key: SPARK-16903 URL: https://issues.apache.org/jira/browse/SPARK-16903 Project: Spark

[jira] [Commented] (SPARK-16903) nullValue in first field is not respected by CSV source when read

2016-08-04 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408722#comment-15408722 ] Hossein Falaki commented on SPARK-16903: Thanks for the info. That makes me doubt the decision to

[jira] [Commented] (SPARK-16896) Loading csv with duplicate column names

2016-08-04 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408728#comment-15408728 ] Hossein Falaki commented on SPARK-16896: I suggest we generally follow the restrictions of

[jira] [Commented] (SPARK-16883) SQL decimal type is not properly cast to number when collecting SparkDataFrame

2016-08-05 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410105#comment-15410105 ] Hossein Falaki commented on SPARK-16883: Thanks [~shivaram]! This may require changing the

[jira] [Created] (SPARK-16883) SQL decimal type is not properly cast to number when collecting SparkDataFrame

2016-08-03 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-16883: -- Summary: SQL decimal type is not properly cast to number when collecting SparkDataFrame Key: SPARK-16883 URL: https://issues.apache.org/jira/browse/SPARK-16883

[jira] [Updated] (SPARK-17442) Additional arguments in write.df are not passed to data source

2016-09-07 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17442: --- Priority: Critical (was: Major) > Additional arguments in write.df are not passed to data

[jira] [Updated] (SPARK-17442) Additional arguments in write.df are not passed to data source

2016-09-07 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17442: --- Priority: Blocker (was: Critical) > Additional arguments in write.df are not passed to data

[jira] [Updated] (SPARK-17442) Additional arguments in write.df are not passed to data source

2016-09-07 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17442: --- Target Version/s: 2.0.1 > Additional arguments in write.df are not passed to data source >

[jira] [Created] (SPARK-17442) Additional arguments in write.df are not passed to data source

2016-09-07 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-17442: -- Summary: Additional arguments in write.df are not passed to data source Key: SPARK-17442 URL: https://issues.apache.org/jira/browse/SPARK-17442 Project: Spark

[jira] [Commented] (SPARK-17774) Add support for head on DataFrame Column

2016-10-05 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550488#comment-15550488 ] Hossein Falaki commented on SPARK-17774: I agree. I think if we decouple {{head}} from

[jira] [Created] (SPARK-17811) SparkR cannot parallelize data.frame with NA or NULL in Date columns

2016-10-06 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-17811: -- Summary: SparkR cannot parallelize data.frame with NA or NULL in Date columns Key: SPARK-17811 URL: https://issues.apache.org/jira/browse/SPARK-17811 Project:

[jira] [Commented] (SPARK-17774) Add support for head on DataFrame Column

2016-10-04 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15547232#comment-15547232 ] Hossein Falaki commented on SPARK-17774: Putting implementation aside, throwing an error for

[jira] [Created] (SPARK-17781) datetime is serialized as double inside dapply()

2016-10-04 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-17781: -- Summary: datetime is serialized as double inside dapply() Key: SPARK-17781 URL: https://issues.apache.org/jira/browse/SPARK-17781 Project: Spark Issue

[jira] [Updated] (SPARK-17781) datetime is serialized as double inside dapply()

2016-10-04 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17781: --- Description: When we ship a SparkDataFrame to workers for dapply family functions, inside

[jira] [Updated] (SPARK-17781) datetime is serialized as double inside dapply()

2016-10-04 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17781: --- Affects Version/s: (was: 2.0.1) > datetime is serialized as double inside dapply() >

[jira] [Updated] (SPARK-17781) datetime is serialized as double inside dapply()

2016-10-04 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17781: --- Description: When we ship a SparkDataFrame to workers for dapply family functions, inside

[jira] [Updated] (SPARK-17781) datetime is serialized as double inside dapply()

2016-10-04 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17781: --- Affects Version/s: 2.0.0 > datetime is serialized as double inside dapply() >

[jira] [Updated] (SPARK-17781) datetime is serialized as double inside dapply()

2016-10-04 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17781: --- Description: When we ship a SparkDataFrame to workers for dapply family functions, inside

[jira] [Commented] (SPARK-17774) Add support for head on DataFrame Column

2016-10-05 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549913#comment-15549913 ] Hossein Falaki commented on SPARK-17774: I strongly feel {{head}} should work, but I don't have

[jira] [Created] (SPARK-17790) Support for parallelizing/creating DataFrame on data larger than 2GB

2016-10-05 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-17790: -- Summary: Support for parallelizing/creating DataFrame on data larger than 2GB Key: SPARK-17790 URL: https://issues.apache.org/jira/browse/SPARK-17790 Project:

[jira] [Updated] (SPARK-17790) Support for parallelizing data.frame larger than 2GB

2016-10-05 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17790: --- Summary: Support for parallelizing data.frame larger than 2GB (was: Support for

[jira] [Commented] (SPARK-17790) Support for parallelizing/creating DataFrame on data larger than 2GB

2016-10-05 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549932#comment-15549932 ] Hossein Falaki commented on SPARK-17790: [~shivaram] and [~mengxr] just double checking that in

[jira] [Commented] (SPARK-17790) Support for parallelizing data.frame larger than 2GB

2016-10-05 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549996#comment-15549996 ] Hossein Falaki commented on SPARK-17790: Thanks for pointing it out. SPARK-6235 seems to be an

[jira] [Updated] (SPARK-17790) Support for parallelizing R data.frame larger than 2GB

2016-10-05 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17790: --- Summary: Support for parallelizing R data.frame larger than 2GB (was: Support for

[jira] [Updated] (SPARK-17790) Support for parallelizing data.frame larger than 2GB

2016-10-05 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17790: --- Issue Type: Sub-task (was: Story) Parent: SPARK-6235 > Support for parallelizing

[jira] [Created] (SPARK-17774) Add support for head on DataFrame Column

2016-10-03 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-17774: -- Summary: Add support for head on DataFrame Column Key: SPARK-17774 URL: https://issues.apache.org/jira/browse/SPARK-17774 Project: Spark Issue Type:

[jira] [Updated] (SPARK-17774) Add support for head on DataFrame Column

2016-10-03 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17774: --- Description: There was a lot of discussion on SPARK-9325. To summarize the conversation on

[jira] [Created] (SPARK-17762) invokeJava serialized argument list is larger than INT_MAX (2,147,483,647) bytes

2016-10-02 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-17762: -- Summary: invokeJava serialized argument list is larger than INT_MAX (2,147,483,647) bytes Key: SPARK-17762 URL: https://issues.apache.org/jira/browse/SPARK-17762

[jira] [Updated] (SPARK-17762) invokeJava fails when serialized argument list is larger than INT_MAX (2,147,483,647) bytes

2016-10-02 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17762: --- Summary: invokeJava fails when serialized argument list is larger than INT_MAX

[jira] [Updated] (SPARK-17774) Add support for head on DataFrame Column

2016-10-03 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hossein Falaki updated SPARK-17774: --- Description: There was a lot of discussion on SPARK-9325. To summarize the conversation on

[jira] [Commented] (SPARK-17781) datetime is serialized as double inside dapply()

2016-10-10 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563193#comment-15563193 ] Hossein Falaki commented on SPARK-17781: I investigated the issue. The root cause is that Date
