[jira] [Updated] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-23056: --- Description: When using internationalized Domains in the urls like: {code:java} val url =

[jira] [Comment Edited] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324111#comment-16324111 ] Yash Datta edited comment on SPARK-23056 at 1/12/18 3:26 PM: - Agreed that

[jira] [Comment Edited] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324111#comment-16324111 ] Yash Datta edited comment on SPARK-23056 at 1/12/18 3:25 PM: - Agreed that

[jira] [Commented] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324111#comment-16324111 ] Yash Datta commented on SPARK-23056: Agreed that going strictly by standard, these are IRIs and not

[jira] [Comment Edited] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324056#comment-16324056 ] Yash Datta edited comment on SPARK-23056 at 1/12/18 2:52 PM: - We have

[jira] [Commented] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324056#comment-16324056 ] Yash Datta commented on SPARK-23056: We have a production use case with many different IRIs in

[jira] [Updated] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-23056: --- Labels: regression (was: regresion) > parse_url regression when switched to using java.net.URI

[jira] [Updated] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-23056: --- Description: When using internationalized Domains in the urls like: {code:java} val url =

[jira] [Updated] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-23056: --- Labels: regresion (was: ) > parse_url regression when switched to using java.net.URI instead of >

[jira] [Updated] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-23056: --- Description: When using internationalized Domains in the urls like: {code:java} val url =

[jira] [Updated] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-23056: --- Description: When using internationalized Domains in the urls like: val url =

[jira] [Created] (SPARK-23056) parse_url regression when switched to using java.net.URI instead of java.net.URL

2018-01-12 Thread Yash Datta (JIRA)
Yash Datta created SPARK-23056: -- Summary: parse_url regression when switched to using java.net.URI instead of java.net.URL Key: SPARK-23056 URL: https://issues.apache.org/jira/browse/SPARK-23056

[jira] [Comment Edited] (SPARK-5948) Support writing to partitioned table for the Parquet data source

2015-12-29 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073755#comment-15073755 ] Yash Datta edited comment on SPARK-5948 at 12/29/15 10:18 AM: -- Oh I see. So

[jira] [Commented] (SPARK-5948) Support writing to partitioned table for the Parquet data source

2015-12-29 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073755#comment-15073755 ] Yash Datta commented on SPARK-5948: --- Oh I see. So does it mean that when using Hive commands that use

[jira] [Commented] (SPARK-5948) Support writing to partitioned table for the Parquet data source

2015-12-28 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073551#comment-15073551 ] Yash Datta commented on SPARK-5948: --- Can you please mention which change resolved this one? > Support

[jira] [Created] (SPARK-11878) Eliminate distribute by in case group by is present with exactly the same grouping expressions

2015-11-19 Thread Yash Datta (JIRA)
Yash Datta created SPARK-11878: -- Summary: Eliminate distribute by in case group by is present with exactly the same grouping expressions Key: SPARK-11878 URL: https://issues.apache.org/jira/browse/SPARK-11878

[jira] [Created] (SPARK-10527) evaluate debugString only when log level is debug

2015-09-09 Thread Yash Datta (JIRA)
Yash Datta created SPARK-10527: -- Summary: evaluate debugString only when log level is debug Key: SPARK-10527 URL: https://issues.apache.org/jira/browse/SPARK-10527 Project: Spark Issue Type:

[jira] [Created] (SPARK-10451) Prevent unnecessary serializations in InMemoryColumnarTableScan

2015-09-04 Thread Yash Datta (JIRA)
Yash Datta created SPARK-10451: -- Summary: Prevent unnecessary serializations in InMemoryColumnarTableScan Key: SPARK-10451 URL: https://issues.apache.org/jira/browse/SPARK-10451 Project: Spark

[jira] [Created] (SPARK-7340) Use latest parquet release 1.6.0 in spark

2015-05-04 Thread Yash Datta (JIRA)
Yash Datta created SPARK-7340: - Summary: Use latest parquet release 1.6.0 in spark Key: SPARK-7340 URL: https://issues.apache.org/jira/browse/SPARK-7340 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-7097) Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold

2015-04-27 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-7097: -- Description: Currently, when deciding whether to create HashJoin or ShuffleHashJoin, the size

[jira] [Created] (SPARK-7142) Minor enhancement to BooleanSimplification Optimizer rule

2015-04-25 Thread Yash Datta (JIRA)
Yash Datta created SPARK-7142: - Summary: Minor enhancement to BooleanSimplification Optimizer rule Key: SPARK-7142 URL: https://issues.apache.org/jira/browse/SPARK-7142 Project: Spark Issue

[jira] [Created] (SPARK-7097) Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold

2015-04-23 Thread Yash Datta (JIRA)
Yash Datta created SPARK-7097: - Summary: Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold Key: SPARK-7097 URL:

[jira] [Updated] (SPARK-6742) Spark pushes down filters in old parquet path that reference partitioning columns

2015-04-07 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-6742: -- This is the same as SPARK-6554 for the new parquet path. Spark pushes down filters in old parquet path that

[jira] [Created] (SPARK-6742) Spark pushes down filters in old parquet path that reference partitioning columns

2015-04-07 Thread Yash Datta (JIRA)
Yash Datta created SPARK-6742: - Summary: Spark pushes down filters in old parquet path that reference partitioning columns Key: SPARK-6742 URL: https://issues.apache.org/jira/browse/SPARK-6742 Project:

[jira] [Commented] (SPARK-4258) NPE with new Parquet Filters

2015-04-03 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394670#comment-14394670 ] Yash Datta commented on SPARK-4258: --- [~yhuai] No, it does not. I fixed this in parquet

[jira] [Created] (SPARK-6632) Optimize the parquetSchema to metastore schema reconciliation, so that the process is delegated to each map task itself

2015-03-31 Thread Yash Datta (JIRA)
Yash Datta created SPARK-6632: - Summary: Optimize the parquetSchema to metastore schema reconciliation, so that the process is delegated to each map task itself Key: SPARK-6632 URL:

[jira] [Updated] (SPARK-6471) Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns

2015-03-23 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-6471: -- Summary: Metastore schema should only be a subset of parquet schema to support dropping of columns

[jira] [Created] (SPARK-6471) Metastoreschema should only be a subset of parquetSchema to support dropping of columns using replace columns

2015-03-23 Thread Yash Datta (JIRA)
Yash Datta created SPARK-6471: - Summary: Metastoreschema should only be a subset of parquetSchema to support dropping of columns using replace columns Key: SPARK-6471 URL:

[jira] [Commented] (SPARK-6471) Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns

2015-03-23 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376286#comment-14376286 ] Yash Datta commented on SPARK-6471: --- https://github.com/apache/spark/pull/5141

[jira] [Issue Comment Deleted] (SPARK-6471) Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns

2015-03-23 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-6471: -- Comment: was deleted (was: https://github.com/apache/spark/pull/5141) Metastore schema should only be

[jira] [Created] (SPARK-6006) Optimize count distinct in case high cardinality columns

2015-02-25 Thread Yash Datta (JIRA)
Yash Datta created SPARK-6006: - Summary: Optimize count distinct in case high cardinality columns Key: SPARK-6006 URL: https://issues.apache.org/jira/browse/SPARK-6006 Project: Spark Issue Type:

[jira] [Updated] (SPARK-6006) Optimize count distinct in case of high cardinality columns

2015-02-25 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-6006: -- Summary: Optimize count distinct in case of high cardinality columns (was: Optimize count distinct in

[jira] [Updated] (SPARK-5684) Key not found exception is thrown in case location of added partition to a parquet table is different than a path containing the partition values

2015-02-09 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-5684: -- Priority: Major (was: Critical) Key not found exception is thrown in case location of added partition

[jira] [Created] (SPARK-5453) Use hive-site.xml to set class for adding custom filter for input files

2015-01-28 Thread Yash Datta (JIRA)
Yash Datta created SPARK-5453: - Summary: Use hive-site.xml to set class for adding custom filter for input files Key: SPARK-5453 URL: https://issues.apache.org/jira/browse/SPARK-5453 Project: Spark

[jira] [Commented] (SPARK-4786) Parquet filter pushdown for BYTE and SHORT types

2015-01-21 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287004#comment-14287004 ] Yash Datta commented on SPARK-4786: --- https://github.com/apache/spark/pull/4156 Parquet

[jira] [Issue Comment Deleted] (SPARK-4786) Parquet filter pushdown for BYTE and SHORT types

2015-01-21 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-4786: -- Comment: was deleted (was: https://github.com/apache/spark/pull/4156) Parquet filter pushdown for

[jira] [Created] (SPARK-4762) Add support for tuples in where in clause query

2014-12-05 Thread Yash Datta (JIRA)
Yash Datta created SPARK-4762: - Summary: Add support for tuples in where in clause query Key: SPARK-4762 URL: https://issues.apache.org/jira/browse/SPARK-4762 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4762) Add support for tuples in 'where in' clause query

2014-12-05 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-4762: -- Summary: Add support for tuples in 'where in' clause query (was: Add support for tuples in where in

[jira] [Commented] (SPARK-4762) Add support for tuples in 'where in' clause query

2014-12-05 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235280#comment-14235280 ] Yash Datta commented on SPARK-4762: --- Already created a PR for the hive parser Add

[jira] [Created] (SPARK-4365) Remove unnecessary filter call on records returned from parquet library

2014-11-12 Thread Yash Datta (JIRA)
Yash Datta created SPARK-4365: - Summary: Remove unnecessary filter call on records returned from parquet library Key: SPARK-4365 URL: https://issues.apache.org/jira/browse/SPARK-4365 Project: Spark

[jira] [Updated] (SPARK-3968) Use parquet-mr filter2 api in spark sql

2014-10-20 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-3968: -- Description: The parquet-mr project has introduced a new filter api, along with several fixes (like

[jira] [Updated] (SPARK-3968) Use parquet-mr filter2 api in spark sql

2014-10-18 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-3968: -- Description: The parquet-mr project has introduced a new filter api, along with several fixes. It

[jira] [Created] (SPARK-3968) Using parquet-mr filter2 api in spark sql, add a custom filter for InSet clause

2014-10-16 Thread Yash Datta (JIRA)
Yash Datta created SPARK-3968: - Summary: Using parquet-mr filter2 api in spark sql, add a custom filter for InSet clause Key: SPARK-3968 URL: https://issues.apache.org/jira/browse/SPARK-3968 Project:

[jira] [Updated] (SPARK-3968) Using parquet-mr filter2 api in spark sql, add a custom filter for InSet clause

2014-10-16 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-3968: -- Shepherd: Yash Datta Using parquet-mr filter2 api in spark sql, add a custom filter for InSet clause

[jira] [Commented] (SPARK-3711) Optimize where in clause filter queries

2014-09-30 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152878#comment-14152878 ] Yash Datta commented on SPARK-3711: --- On a 2 node setup each machine config: 24 core

[jira] [Created] (SPARK-3711) Optimize where in clause filter queries

2014-09-27 Thread Yash Datta (JIRA)
Yash Datta created SPARK-3711: - Summary: Optimize where in clause filter queries Key: SPARK-3711 URL: https://issues.apache.org/jira/browse/SPARK-3711 Project: Spark Issue Type: Improvement