[jira] [Commented] (HIVE-2136) Add get_version() call to Thrift API
[ https://issues.apache.org/jira/browse/HIVE-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026150#comment-13026150 ]

Carl Steinbach commented on HIVE-2136:

We may also want to consider wrapping this in some kind of generic mechanism similar to ODBC's SQLGetInfo call.

Add get_version() call to Thrift API
Key: HIVE-2136
URL: https://issues.apache.org/jira/browse/HIVE-2136
Project: Hive
Issue Type: Improvement
Components: Thrift API
Reporter: Carl Steinbach

Clients need to be able to determine the version of the HiveServer and HiveMetastore.
Open questions:
* Should there be separate methods for determining the HiveServer and HiveMetaStore versions?
* Should the return value be a string, or should we have separate integer valued methods that return the major/minor/patch versions separately (the latter would be easier for clients written in C).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
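To make the two ideas in this issue concrete (a generic SQLGetInfo-style call, and string versus integer return values), here is a minimal sketch in Java. Everything in it is an assumption for illustration: the names InfoType, ServerInfo, getInfo and getInt are hypothetical and do not come from the Hive Thrift API or from any patch on this issue.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical info keys; the issue does not define any of these names.
enum InfoType { VERSION_STRING, VERSION_MAJOR, VERSION_MINOR, VERSION_PATCH }

// Hypothetical holder that answers every version question through one call.
final class ServerInfo {
    private final Map<InfoType, Object> values = new EnumMap<>(InfoType.class);

    ServerInfo(int major, int minor, int patch) {
        values.put(InfoType.VERSION_MAJOR, major);
        values.put(InfoType.VERSION_MINOR, minor);
        values.put(InfoType.VERSION_PATCH, patch);
        values.put(InfoType.VERSION_STRING, major + "." + minor + "." + patch);
    }

    // Single generic entry point, loosely modeled on the SQLGetInfo idea.
    Object getInfo(InfoType type) {
        return values.get(type);
    }

    // Typed wrapper for clients that prefer integers over string parsing.
    int getInt(InfoType type) {
        Object v = values.get(type);
        if (!(v instanceof Integer)) {
            throw new IllegalArgumentException(type + " is not an integer value");
        }
        return (Integer) v;
    }
}
```

With one keyed call, new info items could be added later without new methods, while integer-valued keys would still be available to clients that find string parsing awkward.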
[jira] [Commented] (HIVE-2121) Input Sampling By Splits
[ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026204#comment-13026204 ]

jirapos...@reviews.apache.org commented on HIVE-2121:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/633/
---

(Updated 2011-04-28 08:32:17.534107)

Review request for hive, Ning Zhang and namit jain.

Changes
---
Two changes made according to Namit's comments:
1. EXPLAIN now prints some information about the sampling. (It might not be the best way to print it, but it follows the framework.)
2. The granularity of sampling is down from split level to HDFS block level.

Summary
---
We need better input sampling to serve at least two purposes:
1. Users can test their queries against a smaller data set.
2. Users can understand more about what the data looks like without scanning the whole table.
A simple function that gives a subset of the splits will help in those cases. It doesn't have to be strict sampling.

This diff allows a syntax of .. table TABLESAMPLE(n PERCENT), which samples input splits with size at least n% of the original inputs.

This addresses bug HIVE-2121.
https://issues.apache.org/jira/browse/HIVE-2121

Diffs (updated)
---
  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1096852
  trunk/conf/hive-default.xml 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinFactory.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1096852
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SplitSample.java PRE-CREATION
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1096852
  trunk/ql/src/test/queries/clientnegative/split_sample_out_of_range.q PRE-CREATION
  trunk/ql/src/test/queries/clientnegative/split_sample_wrong_format.q PRE-CREATION
  trunk/ql/src/test/queries/clientpositive/split_sample.q PRE-CREATION
  trunk/ql/src/test/results/clientnegative/split_sample_out_of_range.q.out PRE-CREATION
  trunk/ql/src/test/results/clientnegative/split_sample_wrong_format.q.out PRE-CREATION
  trunk/ql/src/test/results/clientpositive/bucket1.q.out 1096852
  trunk/ql/src/test/results/clientpositive/bucket2.q.out 1096852
  trunk/ql/src/test/results/clientpositive/bucket3.q.out 1096852
  trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample1.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample10.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample2.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample3.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample4.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample5.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample6.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample7.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample8.q.out 1096852
  trunk/ql/src/test/results/clientpositive/sample9.q.out 1096852
  trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java 1096852
  trunk/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 1096852
  trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java 1096852

Diff: https://reviews.apache.org/r/633/diff

Testing
---
TestCliDriver TestNegativeCliDriver, manual tests on real clusters.

Thanks,
Siying

Input Sampling By Splits
Key: HIVE-2121
URL: https://issues.apache.org/jira/browse/HIVE-2121
Project: Hive
Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, HIVE-2121.4.patch, HIVE-2121.5.patch

We need a better input sampling to serve at least two purposes:
1. test their queries against a smaller data set
2. understand more about how the data look like without scanning the whole table.
A simple function that gives a subset splits will help in those cases. It doesn't have to be
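The phrase "samples input splits with size at least n% of the original inputs" in the review summary above can be read as: keep taking splits until their combined size reaches n% of the total. Below is a minimal Java sketch of that reading. It is not the code from the patch; the class SplitSampler, the method sample, and the choice to take splits in their given order are all assumptions, and the sketch ignores the HDFS block-level granularity mentioned under Changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the SplitSample class from the patch.
final class SplitSampler {
    // Returns the shortest prefix of split sizes whose combined size
    // is at least `percent` percent of the total input size.
    static List<Long> sample(List<Long> splitSizes, double percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]: " + percent);
        }
        long total = 0;
        for (long size : splitSizes) {
            total += size;
        }
        double target = total * percent / 100.0;

        List<Long> picked = new ArrayList<>();
        long sum = 0;
        for (long size : splitSizes) {
            if (sum >= target) {
                break; // enough data sampled already
            }
            picked.add(size);
            sum += size;
        }
        return picked;
    }
}
```

Because whole splits are taken, the sampled size can overshoot n%, which matches the "at least" wording and the remark that the sampling does not have to be strict.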
[jira] [Updated] (HIVE-2121) Input Sampling By Splits
[ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2121:
Status: Patch Available (was: Open)

Input Sampling By Splits
Key: HIVE-2121
URL: https://issues.apache.org/jira/browse/HIVE-2121
Project: Hive
Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, HIVE-2121.4.patch, HIVE-2121.5.patch

We need a better input sampling to serve at least two purposes:
1. test their queries against a smaller data set
2. understand more about how the data look like without scanning the whole table.
A simple function that gives a subset splits will help in those cases. It doesn't have to be strict sampling.
Re: ANNOUNCE: New PMC Member Carl Steinbach
Congratulations Carl..

Ashish

On Apr 27, 2011, at 7:09 PM, John Sichi wrote:

Hi all,

The Hive Project Management Committee is happy to announce that Carl Steinbach has been voted in as a new PMC member.

Carl is currently a very active committer and has successfully managed two Hive releases (0.6 and 0.7). His work on running Hive contributor meetups has helped foster an ever-growing development community.

Congratulations, Carl!

JVS
Re: ANNOUNCE: New PMC Member Carl Steinbach
Congrats, Carl !

On Thu, Apr 28, 2011 at 05:39, Ashish Thusoo athu...@fb.com wrote:

Congratulations Carl..

Ashish

On Apr 27, 2011, at 7:09 PM, John Sichi wrote:

Hi all,

The Hive Project Management Committee is happy to announce that Carl Steinbach has been voted in as a new PMC member.

Carl is currently a very active committer and has successfully managed two Hive releases (0.6 and 0.7). His work on running Hive contributor meetups has helped foster an ever-growing development community.

Congratulations, Carl!

JVS
[jira] [Updated] (HIVE-2125) alter table concatenate fails and deletes data
[ https://issues.apache.org/jira/browse/HIVE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-2125:
Resolution: Fixed
Status: Resolved (was: Patch Available)

Just committed. Thanks Yongqiang! Also Yongqiang, can you file a JIRA to fix the comments in .q files (if it is not filed already)?

alter table concatenate fails and deletes data
Key: HIVE-2125
URL: https://issues.apache.org/jira/browse/HIVE-2125
Project: Hive
Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang
Priority: Critical
Attachments: HIVE-2125.1.patch, HIVE-2125.2.patch

The number of reducers is not set by this command (unlike other Hive queries). Since mapred.reduce.tasks=-1 (to let Hive infer this automatically), the jobtracker fails the job (the number of reducers cannot be negative).

hive> alter table ad_imps_2 partition(ds='2009-06-16') concatenate;
alter table ad_imps_2 partition(ds='2009-06-16') concatenate;
Starting Job = job_201103101203_453180, Tracking URL = http://curium.data.facebook.com:50030/jobdetails.jsp?jobid=job_201103101203_453180
Kill Command = /mnt/vol/hive/sites/curium/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=curium.data.facebook.com:50029 -kill job_201103101203_453180
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2011-04-22 10:21:24,046 null map = 100%, reduce = 100%
Ended Job = job_201103101203_453180 with errors
Moved to trash: /user/facebook/warehouse/ad_imps_2/_backup.ds=2009-06-16

After the job fails, the partition is deleted. Thankfully it's still in trash.
Hive's JDBC Driver version
Hello Hive People!

I need to know what version of the Hive JDBC Driver I am working with so that I know what to expect back for column names when I execute a SELECT statement. For example, in 0.5.0 SELECT account_id FROM account yields the column name _col0, whereas in 0.6.0 the same query returns the column name account_id.

My question regards the correct way to store the driver version information. In HiveDatabaseMetaData I see that getVersion() fetches a full string from the manifest file, while getDriverMajorVersion() and getDriverMinorVersion() return a local, static 0. The HiveDriver also provides driver version information through its methods getMajorVersion() and getMinorVersion(), and they both return static int 0 values that are scoped to the class.

I am primarily interested in the HiveDriver.get...Version() methods, as I do not want to create a DB connection first to check the DatabaseMetaData, and my first thought is to update the MAJOR_VERSION/MINOR_VERSION values accordingly. Next I was going to change the ..._VERSION static int values to be package visible and use them in HiveDatabaseMetaData's getDriverMajorVersion()/getDriverMinorVersion(). The cost is that someone must manually update the static int values on HiveDriver for each version revision.

So I was wondering whether it would be preferred that all version information is pulled from the manifest file, or whether I should move forward with my usage of the HiveDriver static final ints. If the manifest version is to be used, should we create two new fields for Major Version / Minor Version, or parse the already existing Implementation-Version?

Thank you for your direction,
-Curtis
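As a rough illustration of the "parse the already existing Implementation-Version" option, here is a minimal Java sketch. The class DriverVersion and its methods are hypothetical and not part of the Hive JDBC driver. It assumes the manifest value starts with "major.minor" (for example "0.7.0" or "0.7.0-SNAPSHOT") and falls back to 0.0 otherwise, which mirrors the static 0 values described above. Package.getImplementationVersion() is the standard JDK call for reading that manifest attribute and returns null when it is absent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the Hive JDBC driver.
final class DriverVersion {
    // Leading "major.minor"; digit runs are capped so parseInt cannot overflow.
    private static final Pattern MAJOR_MINOR = Pattern.compile("(\\d{1,9})\\.(\\d{1,9})");

    // Reads Implementation-Version from the manifest of the jar that holds `c`.
    // Returns null when the class was not loaded from a jar with that attribute.
    static String manifestVersion(Class<?> c) {
        Package p = c.getPackage();
        return p == null ? null : p.getImplementationVersion();
    }

    // Returns {major, minor}, or {0, 0} when the string is missing or malformed.
    static int[] parseMajorMinor(String implVersion) {
        if (implVersion == null) {
            return new int[] {0, 0};
        }
        Matcher m = MAJOR_MINOR.matcher(implVersion.trim());
        if (!m.lookingAt()) {
            return new int[] {0, 0};
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }
}
```

A driver could call parseMajorMinor(manifestVersion(SomeDriverClass.class)) once in a static initializer, so no connection is needed and nobody has to update constants by hand for each release. The trade-off is that the numbers silently become 0.0 when the classes are run outside a packaged jar.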
[jira] [Updated] (HIVE-2121) Input Sampling By Splits
[ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2121:
Attachment: HIVE-2121.6.patch

forgot a file.

Input Sampling By Splits
Key: HIVE-2121
URL: https://issues.apache.org/jira/browse/HIVE-2121
Project: Hive
Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch

We need a better input sampling to serve at least two purposes:
1. test their queries against a smaller data set
2. understand more about how the data look like without scanning the whole table.
A simple function that gives a subset splits will help in those cases. It doesn't have to be strict sampling.
Re: Review Request: HIVE-243 - ^C breaks out of running query, but not whole CLI.
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/626/
---

(Updated 2011-04-28 20:54:48.288055)

Review request for hive.

Changes
---
Added code to kill all running jobs before interrupting the current CLI thread.

Summary
---
Fixed by adding an INT signal handler that interrupts the CLI thread. The CLI thread gets an InterruptedException and stops the current command.

This addresses bug HIVE-243.
https://issues.apache.org/jira/browse/HIVE-243

Diffs (updated)
---
  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1097569
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 1097569

Diff: https://reviews.apache.org/r/626/diff

Testing
---
Manually tested Ctrl+C on a bunch of commands (show tables, select, etc.).

Thanks,
georgedj
Re: Review Request: Input Sampling Splits
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/633/#review605
---

trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
https://reviews.apache.org/r/633/#comment1249

Talked to Siying offline. The check

if (split instanceof Hadoop20Shims.InputSplitShim)

is not needed; it can be replaced by an assert. The same applies in Hadoop20SShims.

Otherwise looks good.

- namit

On 2011-04-28 08:32:17, Siying Dong wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/633/
---

(Updated 2011-04-28 08:32:17)

Review request for hive, Ning Zhang and namit jain.

Summary
---
We need better input sampling to serve at least two purposes:
1. Users can test their queries against a smaller data set.
2. Users can understand more about what the data looks like without scanning the whole table.
A simple function that gives a subset of the splits will help in those cases. It doesn't have to be strict sampling.

This diff allows a syntax of .. table TABLESAMPLE(n PERCENT), which samples input splits with size at least n% of the original inputs.

This addresses bug HIVE-2121.
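For readers unfamiliar with the suggestion in this review (replace the instanceof check with an assert), here is a small self-contained Java sketch of the before and after shapes. The types below are stand-ins invented for the example: InputSplit and InputSplitShim here are not the Hadoop or Hive classes, and the method bodies are not taken from the patch.

```java
// Stand-in types for illustration only; not the Hadoop/Hive classes.
final class SplitShimCheck {
    interface InputSplit {
        long getLength();
    }

    static final class InputSplitShim implements InputSplit {
        private final long length;

        InputSplitShim(long length) {
            this.length = length;
        }

        @Override
        public long getLength() {
            return length;
        }
    }

    // Before: a runtime branch that silently tolerates an unexpected type.
    static long lengthWithCheck(InputSplit split) {
        if (split instanceof InputSplitShim) {
            return ((InputSplitShim) split).getLength();
        }
        return -1;
    }

    // After: the expectation is stated as an invariant. The assert is only
    // evaluated when the JVM runs with -ea; the cast still fails fast otherwise.
    static long lengthWithAssert(InputSplit split) {
        assert split instanceof InputSplitShim : "unexpected split type: " + split.getClass();
        return ((InputSplitShim) split).getLength();
    }
}
```

The assert form documents that any other split type is a programming error rather than a case to handle, and it costs nothing in production runs where assertions are disabled.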
[jira] [Updated] (HIVE-2121) Input Sampling By Splits
[ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2121:
Status: Open (was: Patch Available)

Input Sampling By Splits
Key: HIVE-2121
URL: https://issues.apache.org/jira/browse/HIVE-2121
Project: Hive
Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch

We need a better input sampling to serve at least two purposes:
1. test their queries against a smaller data set
2. understand more about how the data look like without scanning the whole table.
A simple function that gives a subset splits will help in those cases. It doesn't have to be strict sampling.
[jira] [Updated] (HIVE-2121) Input Sampling By Splits
[ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2121:
Attachment: HIVE-2121.7.patch

move instanceof InputSplitShim to assert.

Input Sampling By Splits
Key: HIVE-2121
URL: https://issues.apache.org/jira/browse/HIVE-2121
Project: Hive
Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch, HIVE-2121.7.patch

We need a better input sampling to serve at least two purposes:
1. test their queries against a smaller data set
2. understand more about how the data look like without scanning the whole table.
A simple function that gives a subset splits will help in those cases. It doesn't have to be strict sampling.
Re: Review Request: HIVE-243 - ^C breaks out of running query, but not whole CLI.
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/626/#review609
---

trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java
https://reviews.apache.org/r/626/#comment1252

This change kills the launched MR job. However, the interrupt() that follows is only called on the current thread. This does not behave as expected when there are multiple threads running, for example when we are getting partitions from the metastore (JDO may use its own thread pool) and when getting HDFS summaries (we are also using a thread pool). For these situations we probably need to interrupt all child threads.

- Ning

On 2011-04-28 20:54:48, georgedj wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/626/
---

(Updated 2011-04-28 20:54:48)

Review request for hive.

Summary
---
Fixed by adding an INT signal handler that interrupts the CLI thread. The CLI thread gets an InterruptedException and stops the current command.

This addresses bug HIVE-243.
https://issues.apache.org/jira/browse/HIVE-243

Diffs
---
  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1097569
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 1097569

Diff: https://reviews.apache.org/r/626/diff

Testing
---
Manually tested Ctrl+C on a bunch of commands (show tables, select, etc.).

Thanks,
georgedj
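To illustrate Ning's point that interrupting the current thread does not reach work running on a thread pool, here is a minimal Java sketch using only java.util.concurrent. It is not code from the patch: the class InterruptPoolDemo is hypothetical, and the sleeping task merely stands in for a metastore or HDFS call. ExecutorService.shutdownNow() is the standard JDK way to deliver an interrupt to the pool's running workers.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; the sleeping task stands in for a slow remote call.
final class InterruptPoolDemo {
    // Starts one long-running task on a pool, cancels the pool, and reports
    // whether the worker thread actually observed the interrupt.
    static boolean cancelWorkers() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean workerInterrupted = new AtomicBoolean(false);

        pool.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // placeholder for blocking work
            } catch (InterruptedException e) {
                workerInterrupted.set(true);
            }
        });

        try {
            started.await();   // make sure the task is running
            pool.shutdownNow(); // interrupts the pool's worker threads
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            pool.shutdownNow();
            return false;
        }
        return workerInterrupted.get();
    }
}
```

Calling Thread.currentThread().interrupt() in place of pool.shutdownNow() would leave the worker sleeping, which is the gap the review comment describes. Pools owned by third-party libraries (such as the JDO pool mentioned above) may not expose a handle like this, so this only covers pools the CLI creates itself.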
[jira] [Commented] (HIVE-243) ^C breaks out of running query, but not whole CLI
[ https://issues.apache.org/jira/browse/HIVE-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026743#comment-13026743 ]

jirapos...@reviews.apache.org commented on HIVE-243:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/626/#review609
---

trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java
https://reviews.apache.org/r/626/#comment1252

This change kills the launched MR job. However, the interrupt() that follows is only called on the current thread. This does not behave as expected when there are multiple threads running, for example when we are getting partitions from the metastore (JDO may use its own thread pool) and when getting HDFS summaries (we are also using a thread pool). For these situations we probably need to interrupt all child threads.

- Ning

On 2011-04-28 20:54:48, georgedj wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/626/
---

(Updated 2011-04-28 20:54:48)

Review request for hive.

Summary
---
Fixed by adding an INT signal handler that interrupts the CLI thread. The CLI thread gets an InterruptedException and stops the current command.

This addresses bug HIVE-243.
https://issues.apache.org/jira/browse/HIVE-243

Diffs
---
  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1097569
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 1097569

Diff: https://reviews.apache.org/r/626/diff

Testing
---
Manually tested Ctrl+C on a bunch of commands (show tables, select, etc.).

Thanks,
georgedj

^C breaks out of running query, but not whole CLI
Key: HIVE-243
URL: https://issues.apache.org/jira/browse/HIVE-243
Project: Hive
Issue Type: Wish
Components: Query Processor
Affects Versions: 0.8.0
Reporter: Adam Kramer

It would be lovely if, when I know a query is bad, I could just ^C out of it. I can do that now, but the whole CLI quits. It'd be quite nice if it took an extra ^C to break the CLI, or if there was some control character to break out of a query without breaking out of the CLI.
[jira] [Assigned] (HIVE-895) Add SerDe for Avro serialized data
[ https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach reassigned HIVE-895:
Assignee: Carl Steinbach

Add SerDe for Avro serialized data
Key: HIVE-895
URL: https://issues.apache.org/jira/browse/HIVE-895
Project: Hive
Issue Type: New Feature
Components: Serializers/Deserializers
Reporter: Jeff Hammerbacher
Assignee: Carl Steinbach

As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro data seems like a solid win.
[jira] [Assigned] (HIVE-895) Add SerDe for Avro serialized data
[ https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach reassigned HIVE-895:
Assignee: Jakob Homan (was: Carl Steinbach)

Add SerDe for Avro serialized data
Key: HIVE-895
URL: https://issues.apache.org/jira/browse/HIVE-895
Project: Hive
Issue Type: New Feature
Components: Serializers/Deserializers
Reporter: Jeff Hammerbacher
Assignee: Jakob Homan

As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro data seems like a solid win.
[jira] [Commented] (HIVE-895) Add SerDe for Avro serialized data
[ https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026760#comment-13026760 ]

Carl Steinbach commented on HIVE-895:

@Jakob: There's lots of interest :) Please post the patch, even if it's a WIP.

Add SerDe for Avro serialized data
Key: HIVE-895
URL: https://issues.apache.org/jira/browse/HIVE-895
Project: Hive
Issue Type: New Feature
Components: Serializers/Deserializers
Reporter: Jeff Hammerbacher
Assignee: Jakob Homan

As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro data seems like a solid win.
[jira] [Commented] (HIVE-2121) Input Sampling By Splits
[ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026779#comment-13026779 ]

Namit Jain commented on HIVE-2121:

+1

Input Sampling By Splits
Key: HIVE-2121
URL: https://issues.apache.org/jira/browse/HIVE-2121
Project: Hive
Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch, HIVE-2121.7.patch

We need a better input sampling to serve at least two purposes:
1. test their queries against a smaller data set
2. understand more about how the data look like without scanning the whole table.
A simple function that gives a subset splits will help in those cases. It doesn't have to be strict sampling.
Re: Review Request: HIVE-1644 Use filter pushdown for automatically accessing indexes
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/558/
---

(Updated 2011-04-29 00:01:06.921150)

Review request for hive.

Changes
---
HIVE-1644.17.patch

Summary
---
Review request for HIVE-1644.12.patch

This addresses bug HIVE-1644.
https://issues.apache.org/jira/browse/HIVE-1644

Diffs (updated)
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java f77f46c
  conf/hive-default.xml 6bd615e
  eclipse-templates/.classpath 8d2dc52
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java ca337a8
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 24e16e4
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d90b
  ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java dd0186d
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b78f
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java f90d64f
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 6162676
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java 0ae9fa2
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcCtx.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 374e123
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2207ac4
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9
  ql/src/test/queries/clientpositive/index_auto.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_file_format.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_multiple.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_partitioned.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_test_if_used.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_unused.q PRE-CREATION
  ql/src/test/results/clientpositive/index_auto.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_file_format.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_multiple.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_partitioned.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_test_if_used.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_unused.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/558/diff

Testing
---

Thanks,
Russell
[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026789#comment-13026789 ]

jirapos...@reviews.apache.org commented on HIVE-1644:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/558/
---

(Updated 2011-04-29 00:01:06.921150)

Review request for hive.

Changes
---
HIVE-1644.17.patch

Summary
---
Review request for HIVE-1644.12.patch

This addresses bug HIVE-1644.
https://issues.apache.org/jira/browse/HIVE-1644

Diffs (updated)
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java f77f46c
  conf/hive-default.xml 6bd615e
  eclipse-templates/.classpath 8d2dc52
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java ca337a8
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 24e16e4
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d90b
  ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java dd0186d
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b78f
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java f90d64f
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 6162676
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java 0ae9fa2
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcCtx.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 374e123
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2207ac4
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9
  ql/src/test/queries/clientpositive/index_auto.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_file_format.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_multiple.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_partitioned.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_test_if_used.q PRE-CREATION
  ql/src/test/queries/clientpositive/index_auto_unused.q PRE-CREATION
  ql/src/test/results/clientpositive/index_auto.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_file_format.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_multiple.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_partitioned.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_test_if_used.q.out PRE-CREATION
  ql/src/test/results/clientpositive/index_auto_unused.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/558/diff

Testing
---

Thanks,
Russell

use filter pushdown for automatically accessing indexes
Key: HIVE-1644
URL: https://issues.apache.org/jira/browse/HIVE-1644
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.8.0
Reporter: John Sichi
Assignee: Russell Melick
Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.13.patch, HIVE-1644.14.patch, HIVE-1644.15.patch, HIVE-1644.16.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch

HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes
[ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Russell Melick updated HIVE-1644:
Attachment: HIVE-1644.18.patch

Patch 18.

I moved that logic into a helper method, but I'm not seeing the settings being changed in build/ql/tmp/hive.log.

When I have the unit test use a predicate like key=86 instead of key 45 AND key 55, I see the following error:

{{{
java.lang.IllegalArgumentException: Can not create a Path from an empty string
	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
	at org.apache.hadoop.fs.Path.<init>(Path.java:90)
	at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:224)
	at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:282)
	at org.apache.hadoop.hive.ql.index.HiveIndexedInputFormat.getSplits(HiveIndexedInputFormat.java:123)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
	...
}}}

The failure seems to occur when there are no blocks to return.

use filter pushdown for automatically accessing indexes
Key: HIVE-1644
URL: https://issues.apache.org/jira/browse/HIVE-1644
Project: Hive
Issue Type: Improvement
Components: Indexing
Affects Versions: 0.8.0
Reporter: John Sichi
Assignee: Russell Melick
Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.13.patch, HIVE-1644.14.patch, HIVE-1644.15.patch, HIVE-1644.16.patch, HIVE-1644.17.patch, HIVE-1644.18.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch

HIVE-1226 provides utilities for analyzing filters which have been pushed down to a table scan. The next step is to use these for selecting available indexes and generating access plans for those indexes.
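On the "Can not create a Path from an empty string" error in the HIVE-1644.18 update above: if the cause is indeed that the index lookup matched no blocks, so that an empty path string reaches setInputPaths, one possible guard is to treat an empty path list as "no input" before any Path is built. The Java sketch below only shows that idea with plain strings. The class IndexedInputGuard and its method are hypothetical, it does not use the Hadoop API, it ignores Hadoop's escaping of commas in paths, and it is not the fix adopted in any patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical guard; works on plain strings and does not touch Hadoop classes.
final class IndexedInputGuard {
    // Splits a comma-separated path list. An empty or blank input yields an
    // empty list, so the caller can return zero splits without building a Path.
    static List<String> parseInputPaths(String commaSeparated) {
        List<String> paths = new ArrayList<>();
        if (commaSeparated == null || commaSeparated.trim().isEmpty()) {
            return paths;
        }
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed); // skip empty entries such as "a,,b"
            }
        }
        return paths;
    }
}
```

A caller would check parseInputPaths(...).isEmpty() and short-circuit to an empty split array, which would make a predicate that matches nothing return zero rows instead of failing the job.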