[jira] [Commented] (SPARK-23997) Configurable max number of buckets

2018-06-13 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511800#comment-16511800 ] Fernando Pereira commented on SPARK-23997: -- cc [~cloud_fan] [~tejasp] This is a pretty

[jira] [Commented] (SPARK-19618) Inconsistency wrt max. buckets allowed from Dataframe API vs SQL

2018-05-13 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473584#comment-16473584 ] Fernando Pereira commented on SPARK-19618: -- [~cloud_fan] I have created the Jira and an

[jira] [Commented] (SPARK-19618) Inconsistency wrt max. buckets allowed from Dataframe API vs SQL

2018-04-16 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440085#comment-16440085 ] Fernando Pereira commented on SPARK-19618: -- Opened 

[jira] [Created] (SPARK-23997) Configurable max number of buckets

2018-04-16 Thread Fernando Pereira (JIRA)
Fernando Pereira created SPARK-23997: Summary: Configurable max number of buckets Key: SPARK-23997 URL: https://issues.apache.org/jira/browse/SPARK-23997 Project: Spark Issue Type: Bug
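SPARK-19618 and SPARK-23997 are about the hard-coded upper bound on the number of buckets. Below is a minimal Python sketch of a configurable check followed by bucket assignment. It assumes an inclusive cap that defaults to 100000 (the limit we understand Spark 2.x hard-coded) and uses `zlib.crc32` as a stand-in for Spark's Murmur3 hash; `validate_num_buckets` and `bucket_id` are hypothetical names, not Spark APIs.

```python
import zlib

# Assumed default cap; the point of the issue is that it should be configurable.
DEFAULT_MAX_BUCKETS = 100_000


def validate_num_buckets(num_buckets: int, max_buckets: int = DEFAULT_MAX_BUCKETS) -> int:
    """Reject bucket counts outside 1..max_buckets (inclusive upper bound assumed)."""
    if num_buckets <= 0 or num_buckets > max_buckets:
        raise ValueError(
            f"Number of buckets should be greater than 0 but less than or equal "
            f"to {max_buckets}, got {num_buckets}"
        )
    return num_buckets


def bucket_id(key: str, num_buckets: int) -> int:
    """Assign a key to a bucket as pmod(hash(key), num_buckets).

    zlib.crc32 is only a stand-in for Spark's Murmur3 hash.
    """
    h = zlib.crc32(key.encode("utf-8"))
    # crc32 is unsigned; reinterpret it as a signed 32-bit value like a Java int.
    if h >= 2**31:
        h -= 2**32
    # Python's % with a positive modulus is already non-negative, like pmod.
    return h % num_buckets
```

Raising `max_buckets` only changes the validation; the bucket assignment itself works the same for any positive bucket count.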

[jira] [Commented] (SPARK-19618) Inconsistency wrt max. buckets allowed from Dataframe API vs SQL

2018-04-15 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438822#comment-16438822 ] Fernando Pereira commented on SPARK-19618: -- Is there any technological problem in using more

[jira] [Updated] (SPARK-17859) persist should not impede with spark's ability to perform a broadcast join.

2018-02-27 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fernando Pereira updated SPARK-17859: - Fix Version/s: 2.2.1 > persist should not impede with spark's ability to perform a

[jira] [Reopened] (SPARK-17859) persist should not impede with spark's ability to perform a broadcast join.

2018-02-02 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fernando Pereira reopened SPARK-17859: -- This bug persists {code:java} SPARK version 2.2.1 SparkSession available as 'spark'. In

[jira] [Comment Edited] (SPARK-19256) Hive bucketing support

2018-02-02 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345397#comment-16345397 ] Fernando Pereira edited comment on SPARK-19256 at 2/2/18 8:50 AM: --

[jira] [Comment Edited] (SPARK-19256) Hive bucketing support

2018-01-30 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345397#comment-16345397 ] Fernando Pereira edited comment on SPARK-19256 at 1/30/18 5:16 PM: ---

[jira] [Commented] (SPARK-19256) Hive bucketing support

2018-01-30 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345397#comment-16345397 ] Fernando Pereira commented on SPARK-19256: -- Thanks a lot for this great contribution to Spark.  

[jira] [Comment Edited] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2018-01-15 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326195#comment-16326195 ] Fernando Pereira edited comment on SPARK-17998 at 1/15/18 12:35 PM:

[jira] [Commented] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2018-01-15 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326195#comment-16326195 ] Fernando Pereira commented on SPARK-17998: -- [~sams] Did you have the chance to check

[jira] [Comment Edited] (SPARK-21172) EOFException reached end of stream in UnsafeRowSerializer

2018-01-15 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326122#comment-16326122 ] Fernando Pereira edited comment on SPARK-21172 at 1/15/18 11:28 AM:

[jira] [Comment Edited] (SPARK-21172) EOFException reached end of stream in UnsafeRowSerializer

2018-01-15 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326122#comment-16326122 ] Fernando Pereira edited comment on SPARK-21172 at 1/15/18 11:27 AM:

[jira] [Commented] (SPARK-21172) EOFException reached end of stream in UnsafeRowSerializer

2018-01-15 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326122#comment-16326122 ] Fernando Pereira commented on SPARK-21172: -- With my previous small dataset I was able to make it

[jira] [Commented] (SPARK-23029) Doc spark.shuffle.file.buffer units are kb when no units specified

2018-01-12 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324402#comment-16324402 ] Fernando Pereira commented on SPARK-23029: -- [~jerryshao] Thanks a lot for the info, now I

[jira] [Commented] (SPARK-23029) Setting spark.shuffle.file.buffer will make the shuffle fail

2018-01-12 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323783#comment-16323783 ] Fernando Pereira commented on SPARK-23029: -- There is some bug here anyway. With buffer = 32768

[jira] [Reopened] (SPARK-23029) Setting spark.shuffle.file.buffer will make the shuffle fail

2018-01-12 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fernando Pereira reopened SPARK-23029: -- Eventually update docs > Setting spark.shuffle.file.buffer will make the shuffle fail >

[jira] [Commented] (SPARK-23029) Setting spark.shuffle.file.buffer will make the shuffle fail

2018-01-12 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323777#comment-16323777 ] Fernando Pereira commented on SPARK-23029: -- IMHO it's a misleading syntax to accept 32 = 32k, as
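SPARK-23029 hinges on how a size setting without a suffix is read: per the comments above, `spark.shuffle.file.buffer` treats a bare number as KiB, so `32` means 32 KiB and `32768` means 32 MiB rather than 32768 bytes. A minimal Python sketch of that parsing rule, assuming 1024-based multipliers and a KiB default; `size_to_bytes` is a hypothetical helper, not Spark's own parser.

```python
import re

# Binary (1024-based) multipliers, in bytes.
_MULTIPLIERS = {
    "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
}


def size_to_bytes(value: str, default_unit: str = "k") -> int:
    """Convert a size string such as '32', '32k' or '1m' to bytes.

    A bare number is interpreted in `default_unit` (KiB here, matching the
    behaviour described for spark.shuffle.file.buffer).
    """
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", value)
    if match is None:
        raise ValueError(f"invalid size string: {value!r}")
    number, suffix = match.groups()
    unit = suffix.lower() or default_unit
    if unit not in _MULTIPLIERS:
        raise ValueError(f"unknown size unit: {suffix!r}")
    return int(number) * _MULTIPLIERS[unit]
```

Under this rule `size_to_bytes("32")` and `size_to_bytes("32k")` both give 32768, while `size_to_bytes("32768")` gives 33554432 bytes, which is why writing explicit units avoids the surprise.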

[jira] [Comment Edited] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2018-01-10 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321234#comment-16321234 ] Fernando Pereira edited comment on SPARK-17998 at 1/10/18 10:20 PM:

[jira] [Commented] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2018-01-10 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321234#comment-16321234 ] Fernando Pereira commented on SPARK-17998: -- It says spark.sql.files.maxPartitionBytes in this
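SPARK-17998 is about why reading many Parquet parts can yield fewer in-memory partitions than files. A simplified Python sketch of the file-scan packing heuristic as we understand it from Spark 2.x: the split size is `min(spark.sql.files.maxPartitionBytes, max(spark.sql.files.openCostInBytes, totalBytes / defaultParallelism))`, files are cut into chunks of at most that size, and chunks are packed largest-first into partitions. Defaults of 128 MiB and 4 MiB are assumed, and the function names are ours.

```python
from typing import List, Sequence

MiB = 1024 * 1024


def max_split_bytes(file_sizes: Sequence[int], max_partition_bytes: int = 128 * MiB,
                    open_cost_in_bytes: int = 4 * MiB, default_parallelism: int = 8) -> int:
    """Target split size: capped by maxPartitionBytes, floored by the open cost."""
    # Every file is charged an extra "open cost" on top of its length.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


def pack_files(file_sizes: Sequence[int], split_bytes: int,
               open_cost_in_bytes: int = 4 * MiB) -> List[List[int]]:
    """Cut files into chunks of at most split_bytes, then pack them into partitions."""
    chunks = []
    for size in file_sizes:
        for offset in range(0, size, split_bytes):
            chunks.append(min(split_bytes, size - offset))
    chunks.sort(reverse=True)  # largest chunks first

    partitions: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for chunk in chunks:
        # Close the current partition if this chunk would overflow it.
        if current and current_size + chunk > split_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(chunk)
        current_size += chunk + open_cost_in_bytes
    if current:
        partitions.append(current)
    return partitions
```

With 100 files of 1 MiB and a parallelism of 8, the split size comes out at 65536000 bytes and the 100 files are packed into 8 partitions of up to 13 files each, so the partition count tracks the parallelism rather than the number of files.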

[jira] [Created] (SPARK-23029) Setting spark.shuffle.file.buffer will make the shuffle fail

2018-01-10 Thread Fernando Pereira (JIRA)
Fernando Pereira created SPARK-23029: Summary: Setting spark.shuffle.file.buffer will make the shuffle fail Key: SPARK-23029 URL: https://issues.apache.org/jira/browse/SPARK-23029 Project: Spark

[jira] [Commented] (SPARK-17998) Reading Parquet files coalesces parts into too few in-memory partitions

2018-01-09 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318243#comment-16318243 ] Fernando Pereira commented on SPARK-17998: -- The documentation

[jira] [Commented] (SPARK-21172) EOFException reached end of stream in UnsafeRowSerializer

2017-12-20 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298735#comment-16298735 ] Fernando Pereira commented on SPARK-21172: -- I'm getting the same problem, even though my data is

[jira] [Updated] (SPARK-22771) SQL concat for binary

2017-12-13 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fernando Pereira updated SPARK-22771: - Description: spark.sql {{concat}} function automatically casts arguments to StringType

[jira] [Created] (SPARK-22771) SQL concat for binary

2017-12-13 Thread Fernando Pereira (JIRA)
Fernando Pereira created SPARK-22771: Summary: SQL concat for binary Key: SPARK-22771 URL: https://issues.apache.org/jira/browse/SPARK-22771 Project: Spark Issue Type: Improvement
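The SPARK-22771 description notes that `concat` casts every argument to StringType. A minimal Python sketch of the alternative typing rule the issue asks for: keep the result binary when every argument is binary, fall back to a string otherwise, and return null (None) if any argument is null, as SQL `concat` does. `sql_concat` is a hypothetical function, and decoding bytes as UTF-8 in the mixed case is an assumption.

```python
from typing import Optional, Union

Value = Optional[Union[str, bytes]]


def sql_concat(*args: Value) -> Value:
    """Concatenate values, keeping bytes when every argument is bytes."""
    if any(arg is None for arg in args):
        return None  # SQL concat yields NULL if any input is NULL
    if args and all(isinstance(arg, bytes) for arg in args):
        return b"".join(args)  # binary in, binary out: no cast to string
    # Mixed or string inputs: cast bytes to str (UTF-8 assumed) and join.
    return "".join(
        arg.decode("utf-8") if isinstance(arg, bytes) else arg for arg in args
    )
```

Keeping the all-binary case in bytes means arbitrary byte sequences such as `b"\x00\xff"` pass through without ever being interpreted as text.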

[jira] [Created] (SPARK-22649) localCheckpoint support in Dataset API

2017-11-29 Thread Fernando Pereira (JIRA)
Fernando Pereira created SPARK-22649: Summary: localCheckpoint support in Dataset API Key: SPARK-22649 URL: https://issues.apache.org/jira/browse/SPARK-22649 Project: Spark Issue Type:

[jira] [Commented] (SPARK-22051) Explicit control of number of partitions after dataframe operations (join, order...)

2017-11-27 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267169#comment-16267169 ] Fernando Pereira commented on SPARK-22051: -- Ideas anyone? > Explicit control of number of

[jira] [Commented] (SPARK-22250) Be less restrictive on type checking

2017-11-27 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266642#comment-16266642 ] Fernando Pereira commented on SPARK-22250: -- [~bryanc] It could help, but it doesn't solve the

[jira] [Commented] (SPARK-22250) Be less restrictive on type checking

2017-10-17 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207798#comment-16207798 ] Fernando Pereira commented on SPARK-22250: -- I did some tests and even though verifySchema=False

[jira] [Comment Edited] (SPARK-22276) Unnecessary repartitioning

2017-10-16 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205794#comment-16205794 ] Fernando Pereira edited comment on SPARK-22276 at 10/16/17 12:07 PM: -

[jira] [Commented] (SPARK-22276) Unnecessary repartitioning

2017-10-16 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205794#comment-16205794 ] Fernando Pereira commented on SPARK-22276: -- I made some more tests and this behavior only

[jira] [Commented] (SPARK-22276) Unnecessary repartitioning

2017-10-16 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205655#comment-16205655 ] Fernando Pereira commented on SPARK-22276: -- I added a simple example that shows what I mean. You

[jira] [Updated] (SPARK-22276) Unnecessary repartitioning

2017-10-16 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fernando Pereira updated SPARK-22276: - Description: When a dataframe is sorted it is partitioned with a RangePartitioner. If
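The SPARK-22276 description starts from the fact that sorting a DataFrame range-partitions it. A minimal Python sketch of range partitioning, assuming evenly spaced bounds taken from a sorted sample (Spark's `RangePartitioner` uses a weighted sample, so its bounds will differ); `range_bounds` and `partition_for` are hypothetical names.

```python
import bisect
from typing import List, Sequence


def range_bounds(sample: Sequence[int], num_partitions: int) -> List[int]:
    """Pick num_partitions - 1 upper bounds at evenly spaced ranks of the sample."""
    ordered = sorted(sample)
    if not ordered:
        return []  # no information: everything lands in partition 0
    return [ordered[(i * len(ordered)) // num_partitions] for i in range(1, num_partitions)]


def partition_for(key: int, bounds: Sequence[int]) -> int:
    """Index of the first bound >= key, so partition i holds keys <= bounds[i]."""
    return bisect.bisect_left(bounds, key)
```

Because every key in partition `i` is no greater than every key in partition `i + 1`, concatenating the individually sorted partitions gives a globally sorted result, which is what makes a sort's output range-partitioned.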

[jira] [Created] (SPARK-22276) Unnecessary repartitioning

2017-10-13 Thread Fernando Pereira (JIRA)
Fernando Pereira created SPARK-22276: Summary: Unnecessary repartitioning Key: SPARK-22276 URL: https://issues.apache.org/jira/browse/SPARK-22276 Project: Spark Issue Type: Bug

[jira] [Comment Edited] (SPARK-22250) Be less restrictive on type checking

2017-10-13 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204059#comment-16204059 ] Fernando Pereira edited comment on SPARK-22250 at 10/13/17 7:28 PM: I

[jira] [Commented] (SPARK-22250) Be less restrictive on type checking

2017-10-13 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204059#comment-16204059 ] Fernando Pereira commented on SPARK-22250: -- I have to admit I was not aware of that option.

[jira] [Created] (SPARK-22250) Be less restrictive on type checking

2017-10-11 Thread Fernando Pereira (JIRA)
Fernando Pereira created SPARK-22250: Summary: Be less restrictive on type checking Key: SPARK-22250 URL: https://issues.apache.org/jira/browse/SPARK-22250 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-22051) Explicit control of number of partitions after dataframe operations (join, order...)

2017-09-18 Thread Fernando Pereira (JIRA)
Fernando Pereira created SPARK-22051: Summary: Explicit control of number of partitions after dataframe operations (join, order...) Key: SPARK-22051 URL: https://issues.apache.org/jira/browse/SPARK-22051

[jira] [Commented] (SPARK-20580) Allow RDD cache with unserializable objects

2017-05-08 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001607#comment-16001607 ] Fernando Pereira commented on SPARK-20580: -- I understand that at some point it will be better to

[jira] [Commented] (SPARK-20580) Allow RDD cache with unserializable objects

2017-05-03 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994968#comment-15994968 ] Fernando Pereira commented on SPARK-20580: -- I try to avoid any operation involving

[jira] [Created] (SPARK-20580) Allow RDD cache with unserializable objects

2017-05-03 Thread Fernando Pereira (JIRA)
Fernando Pereira created SPARK-20580: Summary: Allow RDD cache with unserializable objects Key: SPARK-20580 URL: https://issues.apache.org/jira/browse/SPARK-20580 Project: Spark Issue