[jira] [Commented] (HIVE-1643) support range scans and non-key columns in HBase filter pushdown
[ https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112791#comment-13112791 ]

Vaibhav Aggarwal commented on HIVE-1643:

I have been looking into this for the last 3 days. I would ideally like to break this into:
1. Add support for range queries on the primary key
2. Add support for filter pushdown on non-primary-key columns
I will try to submit a patch for 1. soon.

support range scans and non-key columns in HBase filter pushdown
Key: HIVE-1643
URL: https://issues.apache.org/jira/browse/HIVE-1643
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Vaibhav Aggarwal

HIVE-1226 added support for WHERE rowkey=3. We would like to support WHERE rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus conjunctions etc). Non-rowkey conditions can't be used to filter out entire ranges, but they can be used to push the per-row filter processing as far down as possible.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
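Illustrative note on the HIVE-1643 description above: the split between row-key ranges and per-row filters might look roughly like the sketch below. This is not the HBase handler's code. The predicate tuple shape and the name `split_predicates` are hypothetical, only conjunctions are handled, and BETWEEN is assumed to have been expanded into an inclusive `>=` / `<=` pair.

```python
from typing import Any, List, Optional, Tuple

# A predicate is (column, operator, value). This shape is an assumption
# made for illustration, not Hive's internal representation.
Predicate = Tuple[str, str, Any]


def split_predicates(preds: List[Predicate], key_col: str = "rowkey"):
    """Split a conjunction into a row-key range and residual row filters.

    Returns (lower, upper, residual); lower and upper are inclusive
    bounds on the row key, and None means unbounded on that side.
    """
    lower: Optional[Any] = None
    upper: Optional[Any] = None
    residual: List[Predicate] = []
    for col, op, val in preds:
        if col != key_col:
            # Non-key predicates cannot prune whole ranges; keep them as
            # per-row filters to be evaluated as far down as possible.
            residual.append((col, op, val))
        elif op == "=":
            lower = val if lower is None else max(lower, val)
            upper = val if upper is None else min(upper, val)
        elif op == ">=":
            lower = val if lower is None else max(lower, val)
        elif op == "<=":
            upper = val if upper is None else min(upper, val)
        else:
            # Operators this sketch does not understand stay residual.
            residual.append((col, op, val))
    return lower, upper, residual


# WHERE rowkey BETWEEN 10 AND 20 AND city = 'SF'
print(split_predicates([("rowkey", ">=", 10), ("rowkey", "<=", 20), ("city", "=", "SF")]))
```

The returned bounds would be candidates for a scan's start and stop keys, while the residual list corresponds to filters applied row by row.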
[jira] [Commented] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095701#comment-13095701 ] Vaibhav Aggarwal commented on HIVE-2020: Thanks for looking at this Carl! Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Assignee: Vaibhav Aggarwal Fix For: 0.8.0 Attachments: HIVE-2020-2.patch, HIVE-2020-3.patch, HIVE-2020.patch Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements: * Create a separate namespace for managing Hive variables. * Add support for setting variables on the command line via '-hivevar x=y' * Add support for setting variables through the CLI via 'set hivevar:x=y' * Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}' * Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v' -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2266) Fix compression parameters
[ https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095747#comment-13095747 ]

Vaibhav Aggarwal commented on HIVE-2266:

This patch attempts to fix a bug in the existing functionality in two ways:
1. In HiveFileFormatUtils.java, the wrong jobconf is being passed, which is clear from the context.
2. In other cases the compression parameters are not being set.

The only difference this patch produces from the current behavior is smaller file sizes on the file system. I am not sure how to write a Hive query which can verify the difference in file sizes. Do you have any ideas which can help me add some quick tests for this? The current test executes through the code, checking that it does not result in any Exception or Error. It does not compare file size.

Really? Which platforms are you talking about? Can you tell me how to reproduce this interesting behavior?

Hadoop loads native compression libraries. I believe that they are platform dependent, hence I do not assume that they always have the same compression ratio. Please correct me if I am wrong here. In any case I think this is broken existing functionality in Hive which we should fix.

Fix compression parameters
Key: HIVE-2266
URL: https://issues.apache.org/jira/browse/HIVE-2266
Project: Hive
Issue Type: Bug
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
Attachments: HIVE-2266-2.patch, HIVE-2266.patch

There are a number of places where compression values are not set correctly in FileSinkOperator. This results in uncompressed files.
[jira] [Updated] (HIVE-2266) Fix compression parameters
[ https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2266: --- Status: Patch Available (was: Open) Fix compression parameters -- Key: HIVE-2266 URL: https://issues.apache.org/jira/browse/HIVE-2266 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2266-2.patch, HIVE-2266.patch There are a number of places where compression values are not set correctly in FileSinkOperator. This results in uncompressed files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2178) Log related Check style Comments fixes
[ https://issues.apache.org/jira/browse/HIVE-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089699#comment-13089699 ]

Vaibhav Aggarwal commented on HIVE-2178:

This is a good patch. I have noticed a number of cases where the root cause of the exception is missing.

Log related Check style Comments fixes
Key: HIVE-2178
URL: https://issues.apache.org/jira/browse/HIVE-2178
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
Attachments: HIVE-2178.1.patch, HIVE-2178.patch

Fix Log related Check style Comments
[jira] [Updated] (HIVE-2266) Fix compression parameters
[ https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2266: --- Attachment: HIVE-2266-2.patch Fix compression parameters -- Key: HIVE-2266 URL: https://issues.apache.org/jira/browse/HIVE-2266 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2266-2.patch, HIVE-2266.patch There are a number of places where compression values are not set correctly in FileSinkOperator. This results in uncompressed files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1643) support range scans and non-key columns in HBase filter pushdown
[ https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089927#comment-13089927 ]

Vaibhav Aggarwal commented on HIVE-1643:

I would like to work on this if it is not being actively worked on right now.

support range scans and non-key columns in HBase filter pushdown
Key: HIVE-1643
URL: https://issues.apache.org/jira/browse/HIVE-1643
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: John Sichi

HIVE-1226 added support for WHERE rowkey=3. We would like to support WHERE rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus conjunctions etc). Non-rowkey conditions can't be used to filter out entire ranges, but they can be used to push the per-row filter processing as far down as possible.
[jira] [Updated] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2020: --- Attachment: HIVE-2020-3.patch Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Assignee: Vaibhav Aggarwal Attachments: HIVE-2020-2.patch, HIVE-2020-3.patch, HIVE-2020.patch Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements: * Create a separate namespace for managing Hive variables. * Add support for setting variables on the command line via '-hivevar x=y' * Add support for setting variables through the CLI via 'var x=y' * Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}' * Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v' -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2020: --- Status: Patch Available (was: Open) Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Assignee: Vaibhav Aggarwal Attachments: HIVE-2020-2.patch, HIVE-2020-3.patch, HIVE-2020.patch Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements: * Create a separate namespace for managing Hive variables. * Add support for setting variables on the command line via '-hivevar x=y' * Add support for setting variables through the CLI via 'var x=y' * Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}' * Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v' -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vaibhav Aggarwal updated HIVE-2020:

Description:
Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements:
* Create a separate namespace for managing Hive variables.
* Add support for setting variables on the command line via '-hivevar x=y'
* Add support for setting variables through the CLI via 'set hivevar:x=y'
* Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}'
* Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v'

was:
Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements:
* Create a separate namespace for managing Hive variables.
* Add support for setting variables on the command line via '-hivevar x=y'
* Add support for setting variables through the CLI via 'var x=y'
* Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}'
* Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v'

Create a separate namespace for Hive variables
Key: HIVE-2020
URL: https://issues.apache.org/jira/browse/HIVE-2020
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Carl Steinbach
Assignee: Vaibhav Aggarwal
Attachments: HIVE-2020-2.patch, HIVE-2020-3.patch, HIVE-2020.patch

Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements:
* Create a separate namespace for managing Hive variables.
* Add support for setting variables on the command line via '-hivevar x=y'
* Add support for setting variables through the CLI via 'set hivevar:x=y'
* Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}'
* Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v'
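Illustrative note on the HIVE-2020 description above: the namespaced references it lists ('${hivevar:var_name}' and '${var_name}') could be resolved along the lines of the sketch below. This is not the code in the attached patches. The function name `substitute`, the dictionary layout, and the choice to leave unresolved references untouched are all assumptions made for illustration.

```python
import re
from typing import Dict

# Matches ${name} and ${namespace:name}.
_VAR = re.compile(r"\$\{(?:(\w+):)?([\w.]+)\}")


def substitute(statement: str, namespaces: Dict[str, Dict[str, str]]) -> str:
    """Replace ${ns:name} and ${name} references in a statement.

    A bare ${name} is looked up in the 'hivevar' namespace. References
    that cannot be resolved are left unchanged (an assumption of this
    sketch, not documented Hive behavior).
    """
    def repl(match):
        ns = match.group(1) or "hivevar"
        name = match.group(2)
        return namespaces.get(ns, {}).get(name, match.group(0))

    return _VAR.sub(repl, statement)


# Separate namespaces keep variables apart from configuration properties.
namespaces = {
    "hivevar": {"x": "y"},  # e.g. after: set hivevar:x=y
    "hiveconf": {"hive.exec.compress.output": "true"},
}

print(substitute("SELECT * FROM t WHERE c = '${hivevar:x}' OR c = '${x}'", namespaces))
```

Keeping one dictionary per namespace is also what would let a listing such as 'set -v' label each entry with the namespace it came from.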
[jira] [Created] (HIVE-2357) Support connection timeout in hive JDBC
Support connection timeout in hive JDBC --- Key: HIVE-2357 URL: https://issues.apache.org/jira/browse/HIVE-2357 Project: Hive Issue Type: New Feature Reporter: Vaibhav Aggarwal -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2020: --- Attachment: HIVE-2020-2.patch Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Assignee: Vaibhav Aggarwal Attachments: HIVE-2020-2.patch, HIVE-2020.patch Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements: * Create a separate namespace for managing Hive variables. * Add support for setting variables on the command line via '-hivevar x=y' * Add support for setting variables through the CLI via 'var x=y' * Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}' * Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v' -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081267#comment-13081267 ] Vaibhav Aggarwal commented on HIVE-2020: New review request: https://reviews.apache.org/r/1324/ Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Assignee: Vaibhav Aggarwal Attachments: HIVE-2020-2.patch, HIVE-2020.patch Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements: * Create a separate namespace for managing Hive variables. * Add support for setting variables on the command line via '-hivevar x=y' * Add support for setting variables through the CLI via 'var x=y' * Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}' * Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v' -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080275#comment-13080275 ]

Vaibhav Aggarwal commented on HIVE-2020:

I had a chat with Carl about this issue. The following are the planned next steps:
1. Use VariableSubstitution instead of DefaultPreprocessor.
2. Add support for specifying variables as '${var_name}' only for now. (Already implemented)
3. Support set -v to clearly separate hive variables from hiveconf variables.
4. Support setting variables through command line as '-d x=y' OR '--define x=y' (Already implemented)

Thanks
Vaibhav

Create a separate namespace for Hive variables
Key: HIVE-2020
URL: https://issues.apache.org/jira/browse/HIVE-2020
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Carl Steinbach
Assignee: Vaibhav Aggarwal
Attachments: HIVE-2020.patch

Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements:
* Create a separate namespace for managing Hive variables.
* Add support for setting variables on the command line via '-hivevar x=y'
* Add support for setting variables through the CLI via 'var x=y'
* Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}'
* Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v'
[jira] [Commented] (HIVE-2318) Support multiple file systems
[ https://issues.apache.org/jira/browse/HIVE-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079604#comment-13079604 ]

Vaibhav Aggarwal commented on HIVE-2318:

@Carl You will notice that 70% of the code deals with:
1. Supporting reading from one file system and writing to another in the same query.
2. Writing directly to the result directory if the file system does not support move.
S3FileSystem serves as a specific example in this case, which is why I chose this title.

Support multiple file systems
Key: HIVE-2318
URL: https://issues.apache.org/jira/browse/HIVE-2318
Project: Hive
Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
Attachments: HIVE-2318.patch

Currently some of the Hive tasks like MoveTask, ConditionalMergeResolver assume that the data is being copied or moved on the same file system. These operators fail if the source table is in one file system (like HDFS) and the destination table is in another file system (like s3). This patch aims at:
1. Support moving data between different file systems.
2. Add support for file systems which do not support 'move' operation like s3.
3. Remove redundant operations like moving data from and to the same location.
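Illustrative note on the HIVE-2318 description above: the three aims (cross-file-system transfers, file systems without 'move', and skipping redundant moves) amount to classifying each source/destination pair before acting. The sketch below shows only that classification and is not taken from HIVE-2318.patch. The name `plan_transfer`, the returned labels, and the set of schemes treated as lacking 'move' are hypothetical; the patch itself is described as writing directly to the result directory in that case, which this sketch does not model.

```python
from urllib.parse import urlparse

# Schemes assumed, for illustration only, to lack an efficient 'move'.
NO_MOVE_SCHEMES = {"s3", "s3n"}


def plan_transfer(src: str, dst: str) -> str:
    """Classify a transfer as 'noop', 'rename', or 'copy'."""
    s, d = urlparse(src), urlparse(dst)
    same_fs = (s.scheme, s.netloc) == (d.scheme, d.netloc)
    if same_fs and s.path.rstrip("/") == d.path.rstrip("/"):
        # Source and destination are the same location: nothing to do.
        return "noop"
    if same_fs and s.scheme not in NO_MOVE_SCHEMES:
        # Same file system with a usable move operation.
        return "rename"
    # Different file systems, or no 'move' support: data must be copied.
    return "copy"


print(plan_transfer("hdfs://nn/tmp/out", "hdfs://nn/warehouse/t"))
print(plan_transfer("hdfs://nn/tmp/out", "s3://bucket/t"))
```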
[jira] [Commented] (HIVE-2318) Support multiple file systems
[ https://issues.apache.org/jira/browse/HIVE-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079606#comment-13079606 ]

Vaibhav Aggarwal commented on HIVE-2318:

I am thinking of writing some unit tests for individual methods in order to simplify testing. What do you think?

Support multiple file systems
Key: HIVE-2318
URL: https://issues.apache.org/jira/browse/HIVE-2318
Project: Hive
Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
Attachments: HIVE-2318.patch

Currently some of the Hive tasks like MoveTask, ConditionalMergeResolver assume that the data is being copied or moved on the same file system. These operators fail if the source table is in one file system (like HDFS) and the destination table is in another file system (like s3). This patch aims at:
1. Support moving data between different file systems.
2. Add support for file systems which do not support 'move' operation like s3.
3. Remove redundant operations like moving data from and to the same location.
[jira] [Updated] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2298: --- Status: Patch Available (was: Open) Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.7.0 Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298-2.patch, HIVE-2298-3.patch, HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
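Illustrative note on the HIVE-2298 description above: tolerating a null percentile list means checking for it before any element is read. The sketch below is not UDAFPercentile's code. The function name `percentiles`, the choice to return None for a null list, and the linear interpolation between neighbouring values are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def percentiles(values: Sequence[float],
                ps: Optional[Sequence[float]]) -> Optional[List[float]]:
    """Return one percentile per entry in ps, or None when ps is None."""
    if ps is None or not values:
        # Tolerate a null percentile list (or no input rows) instead of
        # failing when the list is dereferenced.
        return None
    ordered = sorted(values)
    result = []
    for p in ps:
        pos = p * (len(ordered) - 1)  # fractional index into ordered
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        result.append(ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo))
    return result


print(percentiles([1, 2, 3, 4], None))
print(percentiles([1, 2, 3, 4], [0.5]))
```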
[jira] [Updated] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2298: --- Attachment: HIVE-2298-3.patch Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.7.0 Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298-2.patch, HIVE-2298-3.patch, HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079128#comment-13079128 ] Vaibhav Aggarwal commented on HIVE-2298: Hi Amreshwari Thanks for reporting the diff. I generated a new patch. Hopefully this fixes the issue. Vaibhav Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.7.0 Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298-2.patch, HIVE-2298-3.patch, HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078490#comment-13078490 ]

Vaibhav Aggarwal commented on HIVE-2298:

Amreshwari, could you please post the diff or the cause of the failure? The test succeeded on my desktop. I ran the following command:

ant test -Dtestcase=TestCliDriver -Dqfile=udf_percentile.q

Is there a different command I should be using to run just this particular test?

Fix UDAFPercentile to tolerate null percentiles
Key: HIVE-2298
URL: https://issues.apache.org/jira/browse/HIVE-2298
Project: Hive
Issue Type: Bug
Components: UDF
Affects Versions: 0.7.0
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
Attachments: HIVE-2298-2.patch, HIVE-2298.patch

UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that.
[jira] [Updated] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2298: --- Status: Patch Available (was: Open) Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.7.0 Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298-2.patch, HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2298: --- Attachment: HIVE-2298-2.patch Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.7.0 Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298-2.patch, HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13075989#comment-13075989 ] Vaibhav Aggarwal commented on HIVE-2298: Also added a test case. Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.7.0 Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298-2.patch, HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2266) Fix compression parameters
[ https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2266: --- Status: Patch Available (was: Open) Fix compression parameters -- Key: HIVE-2266 URL: https://issues.apache.org/jira/browse/HIVE-2266 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2266.patch There are a number of places where compression values are not set correctly in FileSinkOperator. This results in uncompressed files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2266) Fix compression parameters
[ https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2266: --- Attachment: HIVE-2266.patch Fix compression parameters -- Key: HIVE-2266 URL: https://issues.apache.org/jira/browse/HIVE-2266 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2266.patch There are a number of places where compression values are not set correctly in FileSinkOperator. This results in uncompressed files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2020: --- Attachment: HIVE-2020.patch Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Assignee: Vaibhav Aggarwal Attachments: HIVE-2020.patch Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements: * Create a separate namespace for managing Hive variables. * Add support for setting variables on the command line via '-hivevar x=y' * Add support for setting variables through the CLI via 'var x=y' * Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}' * Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v' -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2318) Support multiple file systems
[ https://issues.apache.org/jira/browse/HIVE-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vaibhav Aggarwal updated HIVE-2318:

Status: Patch Available (was: Open)

Support multiple file systems
Key: HIVE-2318
URL: https://issues.apache.org/jira/browse/HIVE-2318
Project: Hive
Issue Type: New Feature
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
Attachments: HIVE-2318.patch

Currently some of the Hive tasks like MoveTask, ConditionalMergeResolver assume that the data is being copied or moved on the same file system. These operators fail if the source table is in one file system (like HDFS) and the destination table is in another file system (like s3). This patch aims at:
1. Support moving data between different file systems.
2. Add support for file systems which do not support 'move' operation like s3.
3. Remove redundant operations like moving data from and to the same location.
[jira] [Commented] (HIVE-2266) Fix compression parameters
[ https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13072549#comment-13072549 ] Vaibhav Aggarwal commented on HIVE-2266: Carl, I have attached the patch. Fix compression parameters -- Key: HIVE-2266 URL: https://issues.apache.org/jira/browse/HIVE-2266 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2266.patch There are a number of places where compression values are not set correctly in FileSinkOperator. This results in uncompressed files. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2318) Support multiple file systems
[ https://issues.apache.org/jira/browse/HIVE-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2318: --- Attachment: HIVE-2318.patch Support multiple file systems - Key: HIVE-2318 URL: https://issues.apache.org/jira/browse/HIVE-2318 Project: Hive Issue Type: New Feature Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2318.patch Currently some Hive tasks, such as MoveTask and ConditionalMergeResolver, assume that data is copied or moved within a single file system. These operators fail if the source table is in one file system (like HDFS) and the destination table is in another (like s3). This patch aims to: 1. Support moving data between different file systems. 2. Add support for file systems, such as s3, that do not support a 'move' operation. 3. Remove redundant operations like moving data from and to the same location. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2020: --- Status: Patch Available (was: Open) Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Assignee: Vaibhav Aggarwal Attachments: HIVE-2020.patch Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements: * Create a separate namespace for managing Hive variables. * Add support for setting variables on the command line via '-hivevar x=y' * Add support for setting variables through the CLI via 'var x=y' * Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}' * Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v' -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2298: --- Attachment: HIVE-2259-2.patch Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.7.0 Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2259-2.patch, HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal reassigned HIVE-2020: -- Assignee: Vaibhav Aggarwal Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Assignee: Vaibhav Aggarwal Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements: * Create a separate namespace for managing Hive variables. * Add support for setting variables on the command line via '-hivevar x=y' * Add support for setting variables through the CLI via 'var x=y' * Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}' * Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v' -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-2195) Hive queries hangs with first stage job created with zero mappers and 1 reducer,
[ https://issues.apache.org/jira/browse/HIVE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal resolved HIVE-2195. Resolution: Not A Problem Hive queries hangs with first stage job created with zero mappers and 1 reducer, -- Key: HIVE-2195 URL: https://issues.apache.org/jira/browse/HIVE-2195 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: vitthal (Suhas) Gogate Assignee: Vaibhav Aggarwal This happens when a query aggregates data with a predicate that selects non-existent data partitions. E.g., table XXX has five columns, A (int), B (int), C (string), date (int) and hour (int), where date/hour are the partition columns: select cast((100*date+hour) as BIGINT) as datehour, sum(A) as sumA, sum(B) as sumB from XXX where date=20110925 and C='test' group by date, hour order by datehour In the above query, note that the selected date partition does not exist in the Hive table, i.e. there are no partitions for date=20110925. The above query hangs on the first map-reduce job it creates, which has zero mappers and 1 reducer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2318) Support multiple file systems
Support multiple file systems - Key: HIVE-2318 URL: https://issues.apache.org/jira/browse/HIVE-2318 Project: Hive Issue Type: New Feature Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Currently some Hive tasks, such as MoveTask and ConditionalMergeResolver, assume that data is copied or moved within a single file system. These operators fail if the source table is in one file system (like HDFS) and the destination table is in another (like s3). This patch aims to: 1. Support moving data between different file systems. 2. Add support for file systems, such as s3, that do not support a 'move' operation. 3. Remove redundant operations like moving data from and to the same location. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2020) Create a separate namespace for Hive variables
[ https://issues.apache.org/jira/browse/HIVE-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13071318#comment-13071318 ] Vaibhav Aggarwal commented on HIVE-2020: I propose using -d, --define to define Hive variables. Amazon Elastic MapReduce already uses this notation for Hive variables and variable substitution. This approach would also clearly separate -hiveconf from -d or --define, which would be used purely to set Hive variables, and it would maintain consistency for Hive users. Create a separate namespace for Hive variables -- Key: HIVE-2020 URL: https://issues.apache.org/jira/browse/HIVE-2020 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Carl Steinbach Support for variable substitution was added in HIVE-1096. However, variable substitution was implemented by reusing the HiveConf namespace, so there is no separation between Hive configuration properties and Hive variables. This ticket encompasses the following enhancements: * Create a separate namespace for managing Hive variables. * Add support for setting variables on the command line via '-hivevar x=y' * Add support for setting variables through the CLI via 'var x=y' * Add support for referencing variables in statements using either '${hivevar:var_name}' or '${var_name}' * Provide a means for differentiating between hiveconf, hivevar, system, and environment properties in the output of 'set -v' -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070662#comment-13070662 ] Vaibhav Aggarwal commented on HIVE-2297: I generated the patch from 0.7 branch and then rebased it to 0.8. Didn't realize that it was already fixed in 0.8 while generating the patch. I will resolve this. Thanks Vaibhav Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2297.patch, fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2297: --- Resolution: Not A Problem Status: Resolved (was: Patch Available) Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2297.patch, fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2299) Optimize Hive query startup time for multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070800#comment-13070800 ] Vaibhav Aggarwal commented on HIVE-2299: Thanks for looking at this improvement request Carl! Optimize Hive query startup time for multiple partitions Key: HIVE-2299 URL: https://issues.apache.org/jira/browse/HIVE-2299 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Fix For: 0.8.0 Attachments: HIVE-2299.patch Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O n operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070814#comment-13070814 ] Vaibhav Aggarwal commented on HIVE-2298: I will make the style changes and try to add a test case to test this specific case. Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.7.0 Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2297: --- Attachment: fix_npe.patch Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069134#comment-13069134 ] Vaibhav Aggarwal commented on HIVE-2297: Some of the file systems can return null if there are no objects to list. Added a fix for that. Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
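The failure mode in this comment (a listing call that returns null rather than an empty collection) can be sketched in Python as follows. This is not the attached patch. `EmptyDirFs` and `list_children` are hypothetical names, and `None` stands in for Java's `null`.

```python
class EmptyDirFs:
    """Hypothetical file system stub whose listing call returns None
    when there are no objects to list."""

    def list_status(self, path):
        return None


def list_children(fs, path):
    """Return the listing as a list, treating a None result as empty."""
    listing = fs.list_status(path)
    return [] if listing is None else list(listing)
```

Normalizing the result once, at the call site, means later loops over the listing need no null checks of their own.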
[jira] [Created] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2297: --- Status: Patch Available (was: Open) Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2298: --- Attachment: HIVE-2298.patch Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
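A null-tolerant percentile function might look like the Python sketch below. It is not the Hive UDAF code. Returning `None` for a null percentile list is an assumption, since the ticket says only that the exception should no longer be thrown, and linear interpolation is used here only to make the example complete.

```python
import math


def percentiles(values, ps):
    """Return interpolated percentiles of values for each p in ps.

    Returns None when ps is None or values is empty instead of raising
    (an assumed behaviour, not confirmed by the ticket).
    """
    if ps is None or not values:
        return None
    ordered = sorted(values)
    out = []
    for p in ps:
        pos = p * (len(ordered) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        out.append(ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo))
    return out
```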
[jira] [Updated] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2298: --- Status: Patch Available (was: Open) Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2299) Optimize Hive query startup time for multiple partitions
Optimize Hive query startup time for multiple partitions Key: HIVE-2299 URL: https://issues.apache.org/jira/browse/HIVE-2299 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O(n) operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2299) Optimize Hive query startup time for multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2299: --- Description: Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O n operation. was: Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O(n) operation. Optimize Hive query startup time for multiple partitions Key: HIVE-2299 URL: https://issues.apache.org/jira/browse/HIVE-2299 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O n operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2297: --- Attachment: HIVE-2297.patch Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2297.patch, fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2299) Optimize Hive query startup time for multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2299: --- Attachment: HIVE-2299.patch Optimize Hive query startup time for multiple partitions Key: HIVE-2299 URL: https://issues.apache.org/jira/browse/HIVE-2299 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Attachments: HIVE-2299.patch Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O n operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2299) Optimize Hive query startup time for multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2299: --- Assignee: Vaibhav Aggarwal Status: Patch Available (was: Open) Optimize Hive query startup time for multiple partitions Key: HIVE-2299 URL: https://issues.apache.org/jira/browse/HIVE-2299 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2299.patch Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O n operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-1604) Patch to allow variables in Hive
[ https://issues.apache.org/jira/browse/HIVE-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal reassigned HIVE-1604: -- Assignee: Vaibhav Aggarwal Patch to allow variables in Hive Key: HIVE-1604 URL: https://issues.apache.org/jira/browse/HIVE-1604 Project: Hive Issue Type: Improvement Components: CLI Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-1604.patch Patch to Hive which allows command line substitution. The patch modifies the Hive command line driver and options processor to support the following arguments: hive [-d key=value] [-define key=value] -d Substitution to apply to script -define Substitution to apply to script -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2195) Hive queries hangs with first stage job created with zero mappers and 1 reducer,
[ https://issues.apache.org/jira/browse/HIVE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal reassigned HIVE-2195: -- Assignee: Vaibhav Aggarwal Hive queries hangs with first stage job created with zero mappers and 1 reducer, -- Key: HIVE-2195 URL: https://issues.apache.org/jira/browse/HIVE-2195 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: vitthal (Suhas) Gogate Assignee: Vaibhav Aggarwal This happens when a query aggregates data with a predicate that selects non-existent data partitions. E.g., table XXX has five columns, A (int), B (int), C (string), date (int) and hour (int), where date/hour are the partition columns: select cast((100*date+hour) as BIGINT) as datehour, sum(A) as sumA, sum(B) as sumB from XXX where date=20110925 and C='test' group by date, hour order by datehour In the above query, note that the selected date partition does not exist in the Hive table, i.e. there are no partitions for date=20110925. The above query hangs on the first map-reduce job it creates, which has zero mappers and 1 reducer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2258) Honor -S silent floag during hadoop rmr command
Honor -S silent floag during hadoop rmr command --- Key: HIVE-2258 URL: https://issues.apache.org/jira/browse/HIVE-2258 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2258) Honor -S flag during hadoop rmr command
[ https://issues.apache.org/jira/browse/HIVE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2258: --- Description: Currently even if -S flag is specified, the output of hadoop -rmr command is printed to the screen. The reason is that the command writes output to screen instead of log file. I have fixed the problem by temporarily redirecting the output for that command. Summary: Honor -S flag during hadoop rmr command (was: Honor -S silent floag during hadoop rmr command) Honor -S flag during hadoop rmr command --- Key: HIVE-2258 URL: https://issues.apache.org/jira/browse/HIVE-2258 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Currently even if -S flag is specified, the output of hadoop -rmr command is printed to the screen. The reason is that the command writes output to screen instead of log file. I have fixed the problem by temporarily redirecting the output for that command. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
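The technique named in the description, temporarily redirecting output around one command, can be sketched in Python with the standard library. This is not the Hive patch. `run_quietly` and its `silent` parameter are hypothetical stand-ins for the CLI's -S handling.

```python
import contextlib
import io


def run_quietly(command, silent):
    """Run command(); when silent, capture what it prints to stdout
    and return the captured text instead of showing it."""
    if not silent:
        command()
        return ""
    buffer = io.StringIO()
    # stdout is restored automatically when the with-block exits,
    # even if command() raises.
    with contextlib.redirect_stdout(buffer):
        command()
    return buffer.getvalue()
```

Scoping the redirection to a single call keeps the rest of the session's output unaffected, which matches the "temporarily" in the description.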
[jira] [Updated] (HIVE-2259) Skip comments in hive script
[ https://issues.apache.org/jira/browse/HIVE-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2259: --- Attachment: HIVE-2259.patch Skip comments in hive script Key: HIVE-2259 URL: https://issues.apache.org/jira/browse/HIVE-2259 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2259.patch If you specify something like: -- This is a comment add jar jar_path; select * from my_table; This fails. I have created a fix to skip the commented lines. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
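Skipping commented lines before a script is executed can be sketched as below. The example is in Python rather than Hive's Java, and the rule it applies (drop any line whose first non-blank characters are `--`) is an assumption about what "commented lines" means in the ticket. Trailing comments after a statement are not handled.

```python
def strip_comment_lines(script):
    """Drop lines whose first non-blank characters are '--'."""
    kept = [
        line
        for line in script.splitlines()
        if not line.lstrip().startswith("--")
    ]
    return "\n".join(kept)
```

Applied to the script in the description, this leaves only the `add jar` and `select` statements.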
[jira] [Assigned] (HIVE-2150) Sampling fails after dynamic-partition insert into a bucketed s3n table
[ https://issues.apache.org/jira/browse/HIVE-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal reassigned HIVE-2150: -- Assignee: Vaibhav Aggarwal Sampling fails after dynamic-partition insert into a bucketed s3n table --- Key: HIVE-2150 URL: https://issues.apache.org/jira/browse/HIVE-2150 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Steven Wong Assignee: Vaibhav Aggarwal When using dynamic-partition insert and bucketing together on an s3n table, the insert does not create files for empty buckets. This will result in the following exception when running a sampling query that includes the empty buckets. {noformat} FAILED: Hive Internal Error: java.lang.RuntimeException(Cannot get bucket path for bucket 1) java.lang.RuntimeException: Cannot get bucket path for bucket 1 at org.apache.hadoop.hive.ql.metadata.Partition.getBucketPath(Partition.java:367) at org.apache.hadoop.hive.ql.optimizer.SamplePruner.prune(SamplePruner.java:186) at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:603) at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514) at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.processFS(GenMRFileSink1.java:586) at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:145) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:6336) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6615) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:332) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:686) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:149) at org.apache.hadoop.hive.cli.CliDriver.processLineInternal(CliDriver.java:228) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:209) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:355) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.ql.metadata.Partition.getBucketPath(Partition.java:365) ... 27 more {noformat} Here is a repro case: {noformat} CREATE TABLE tab (x string) PARTITIONED BY (p1 string, p2 string) CLUSTERED BY (x) INTO 4 BUCKETS LOCATION 's3n://some/path'; SET hive.exec.dynamic.partition=true; SET hive.enforce.bucketing=true; INSERT OVERWRITE TABLE tab PARTITION (p1='p', p2) SELECT 'v1', 'v2' FROM dual; SELECT * FROM tab TABLESAMPLE (BUCKET 2 OUT OF 4); {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2086) Data loss with external table
[ https://issues.apache.org/jira/browse/HIVE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054550#comment-13054550 ] Vaibhav Aggarwal commented on HIVE-2086: Has this patch been committed or is anyone still working on this particular patch? Data loss with external table - Key: HIVE-2086 URL: https://issues.apache.org/jira/browse/HIVE-2086 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Environment: Amazon elastics mapreduce cluster Reporter: Q Long Assignee: Jonathan Natkins Attachments: HIVE-2086.1.patch, HIVE-2086.2.patch, create_like.q.out Data loss when using create external table like statement. 1) Set up an external table S, point to location L. Populate data in S. 2) Create another external table T, using statement like this: create external table T like S location L Make sure table T point to the same location as the original table S. 3) Query table T, see the same set of data in S. 4) drop table T. 5) Query table S will return nothing, and location L is deleted. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1790) Patch to support HAVING clause in Hive
[ https://issues.apache.org/jira/browse/HIVE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-1790: --- Attachment: HIVE-1790.4.patch.txt Patch to support HAVING clause in Hive -- Key: HIVE-1790 URL: https://issues.apache.org/jira/browse/HIVE-1790 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Fix For: 0.7.0 Attachments: HIVE-1790-1.patch, HIVE-1790.2.patch.txt, HIVE-1790.3.patch.txt, HIVE-1790.4.patch.txt, HIVE-1790.patch Currently Hive users have to do nested queries in order to apply filter on group by expressions. This patch allows users to directly apply filter on group by expressions by using HAVING clause. This patch also helps us integrate Hive with other data analysis tools which rely on HAVING expression. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
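The equivalence this ticket relies on, a nested query versus a direct HAVING clause, can be checked with a short Python sketch. It runs against SQLite from the standard library, not Hive. The `sales` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 5), ("north", 50)],
)

# Workaround without HAVING: aggregate in a subquery, filter outside it.
nested = conn.execute(
    "SELECT region, total FROM "
    "(SELECT region, SUM(amount) AS total FROM sales GROUP BY region) t "
    "WHERE total > 20 ORDER BY region"
).fetchall()

# Same filter expressed directly on the grouped result.
having = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region HAVING SUM(amount) > 20 ORDER BY region"
).fetchall()
```

Both statements return the same rows; the HAVING form avoids the extra level of nesting.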
[jira] Commented: (HIVE-1790) Patch to support HAVING clause in Hive
[ https://issues.apache.org/jira/browse/HIVE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973463#action_12973463 ] Vaibhav Aggarwal commented on HIVE-1790: Please note that I have not created a common method for genFilterPlan and genHavingPlan. genHavingPlan needs to populate the input row resolver with additional information related to column aliases, which does not happen in genFilterPlan. I don't think there is a lot of common code between the two methods. In this case I would prefer separate methods over if statements in the code, since that keeps the control flow much cleaner. Patch to support HAVING clause in Hive -- Key: HIVE-1790 URL: https://issues.apache.org/jira/browse/HIVE-1790 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Fix For: 0.7.0 Attachments: HIVE-1790-1.patch, HIVE-1790.2.patch.txt, HIVE-1790.3.patch.txt, HIVE-1790.4.patch.txt, HIVE-1790.patch Currently Hive users have to do nested queries in order to apply filter on group by expressions. This patch allows users to directly apply filter on group by expressions by using HAVING clause. This patch also helps us integrate Hive with other data analysis tools which rely on HAVING expression. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1790) Patch to support HAVING clause in Hive
[ https://issues.apache.org/jira/browse/HIVE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932211#action_12932211 ] Vaibhav Aggarwal commented on HIVE-1790: Hi, I think the patch is ready for review. This is my first attempt, though. Thanks, Vaibhav Patch to support HAVING clause in Hive -- Key: HIVE-1790 URL: https://issues.apache.org/jira/browse/HIVE-1790 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Fix For: 0.7.0 Attachments: HIVE-1790.patch Currently Hive users have to write nested queries in order to apply a filter on GROUP BY expressions. This patch allows users to apply a filter on GROUP BY expressions directly by using a HAVING clause. This patch also helps us integrate Hive with other data analysis tools that rely on HAVING expressions.
[jira] Commented: (HIVE-1790) Patch to support HAVING clause in Hive
[ https://issues.apache.org/jira/browse/HIVE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932275#action_12932275 ] Vaibhav Aggarwal commented on HIVE-1790: Hey Carl, do you know how to access the schema and data in the staging tables? I am not able to find the table src anywhere in the Hive codebase. Patch to support HAVING clause in Hive -- Key: HIVE-1790 URL: https://issues.apache.org/jira/browse/HIVE-1790 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Fix For: 0.7.0 Attachments: HIVE-1790.patch Currently Hive users have to write nested queries in order to apply a filter on GROUP BY expressions. This patch allows users to apply a filter on GROUP BY expressions directly by using a HAVING clause. This patch also helps us integrate Hive with other data analysis tools that rely on HAVING expressions.
[jira] Updated: (HIVE-1790) Patch to support HAVING clause in Hive
[ https://issues.apache.org/jira/browse/HIVE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-1790: --- Attachment: HIVE-1790-1.patch Patch to support HAVING clause in Hive -- Key: HIVE-1790 URL: https://issues.apache.org/jira/browse/HIVE-1790 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Fix For: 0.7.0 Attachments: HIVE-1790-1.patch, HIVE-1790.patch Currently Hive users have to write nested queries in order to apply a filter on GROUP BY expressions. This patch allows users to apply a filter on GROUP BY expressions directly by using a HAVING clause. This patch also helps us integrate Hive with other data analysis tools that rely on HAVING expressions.
[jira] Commented: (HIVE-1790) Patch to support HAVING clause in Hive
[ https://issues.apache.org/jira/browse/HIVE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932318#action_12932318 ] Vaibhav Aggarwal commented on HIVE-1790: I have added test cases and uploaded a new patch. Patch to support HAVING clause in Hive -- Key: HIVE-1790 URL: https://issues.apache.org/jira/browse/HIVE-1790 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Fix For: 0.7.0 Attachments: HIVE-1790-1.patch, HIVE-1790.patch Currently Hive users have to write nested queries in order to apply a filter on GROUP BY expressions. This patch allows users to apply a filter on GROUP BY expressions directly by using a HAVING clause. This patch also helps us integrate Hive with other data analysis tools that rely on HAVING expressions.
[jira] Created: (HIVE-1790) Patch to support HAVING clause in Hive
Patch to support HAVING clause in Hive -- Key: HIVE-1790 URL: https://issues.apache.org/jira/browse/HIVE-1790 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Currently Hive users have to write nested queries in order to apply a filter on GROUP BY expressions. This patch allows users to apply a filter on GROUP BY expressions directly by using a HAVING clause. This patch also helps us integrate Hive with other data analysis tools that rely on HAVING expressions.
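To illustrate the difference the issue describes, here is a minimal sketch of the nested-query workaround next to the equivalent HAVING form. It is not taken from the patch: it runs against SQLite through Python's standard sqlite3 module rather than Hive, and the table name src, its columns, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical sample table; names and rows are illustrative only.
conn.execute("CREATE TABLE src (key INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

# Without HAVING: aggregate in a subquery, then filter in the outer query.
nested_query = """
    SELECT t.key, t.cnt
    FROM (SELECT key, COUNT(*) AS cnt FROM src GROUP BY key) t
    WHERE t.cnt > 1
    ORDER BY t.key
"""

# With HAVING: filter the groups directly, no subquery needed.
having_query = """
    SELECT key, COUNT(*) AS cnt
    FROM src
    GROUP BY key
    HAVING COUNT(*) > 1
    ORDER BY key
"""

nested_rows = conn.execute(nested_query).fetchall()
having_rows = conn.execute(having_query).fetchall()

# Both forms keep only the keys that occur more than once.
print(nested_rows)
print(having_rows)
```

Both queries return the same groups; the HAVING form simply states the group filter where the aggregation happens, which is also the shape that other SQL-based analysis tools tend to generate.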
[jira] Updated: (HIVE-1790) Patch to support HAVING clause in Hive
[ https://issues.apache.org/jira/browse/HIVE-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-1790: --- Attachment: HIVE-1790.patch Patch to support HAVING clause in Hive -- Key: HIVE-1790 URL: https://issues.apache.org/jira/browse/HIVE-1790 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-1790.patch Currently Hive users have to write nested queries in order to apply a filter on GROUP BY expressions. This patch allows users to apply a filter on GROUP BY expressions directly by using a HAVING clause. This patch also helps us integrate Hive with other data analysis tools that rely on HAVING expressions.