[jira] [Commented] (HIVE-2136) Add get_version() call to Thrift API

2011-04-28 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026150#comment-13026150
 ] 

Carl Steinbach commented on HIVE-2136:
--

We may also want to consider wrapping this in some kind of generic mechanism 
similar to ODBC's SQLGetInfo call.

 Add get_version() call to Thrift API
 

 Key: HIVE-2136
 URL: https://issues.apache.org/jira/browse/HIVE-2136
 Project: Hive
  Issue Type: Improvement
  Components: Thrift API
Reporter: Carl Steinbach

 Clients need to be able to determine the version of the HiveServer and 
 HiveMetastore.
 Open questions:
 * Should there be separate methods for determining the HiveServer and 
 HiveMetaStore versions?
 * Should the return value be a string, or should we have separate 
 integer-valued methods that return the major/minor/patch versions (the 
 latter would be easier for clients written in C)?
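
To make the second open question concrete, here is a minimal Java sketch of what a client would have to do if the call returned a single string. The class name VersionInfo, its parse method, and the assumed "major.minor[.patch][-suffix]" format are hypothetical illustrations; nothing here is part of the Hive Thrift API.

```java
/**
 * Hypothetical client-side helper (not part of Hive): splits a dotted
 * version string such as "0.7.0" into integer components.
 */
final class VersionInfo {
    final int major;
    final int minor;
    final int patch;

    VersionInfo(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    /** Parses "major.minor[.patch]"; anything after a '-' (e.g. "-SNAPSHOT") is ignored. */
    static VersionInfo parse(String version) {
        String core = version.trim();
        int dash = core.indexOf('-');
        if (dash >= 0) {
            core = core.substring(0, dash);
        }
        String[] parts = core.split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Unexpected version format: " + version);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        // A missing patch component is treated as 0 (an assumption of this sketch).
        int patch = parts.length == 3 ? Integer.parseInt(parts[2]) : 0;
        return new VersionInfo(major, minor, patch);
    }
}
```

With separate integer-valued methods, every client would be spared this parsing step, which is the trade-off the question raises.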

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2121) Input Sampling By Splits

2011-04-28 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026204#comment-13026204
 ] 

jirapos...@reviews.apache.org commented on HIVE-2121:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/633/
---

(Updated 2011-04-28 08:32:17.534107)


Review request for hive, Ning Zhang and namit jain.


Changes
---

Two changes made according to Namit's comments:
1. EXPLAIN will print out some information about the sampling. (It might not be 
the best way to print it, but it follows the framework.)
2. The granularity of sampling is reduced from split level to HDFS block level.


Summary
---

We need better input sampling to serve at least two purposes:
1. let users test their queries against a smaller data set
2. understand more about what the data look like without scanning the whole 
table.
A simple function that gives a subset of splits will help in those cases. It 
doesn't have to be strict sampling.

This diff adds the syntax .. table TABLESAMPLE(n PERCENT), which samples 
input splits whose total size is at least n% of the original input.
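
As an illustration only, one plausible reading of "at least n%" is to keep taking splits until their combined size reaches n% of the total. The Java sketch below shows that reading; the class SplitSampleSketch and its sample method are hypothetical and are not taken from the patch, which may select splits differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of size-based split sampling; not the code in the patch. */
final class SplitSampleSketch {

    /**
     * Returns the indexes of the splits that are kept: the shortest prefix of
     * the input whose combined length is at least percent% of the total length.
     */
    static List<Integer> sample(long[] splitLengths, double percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        long total = 0;
        for (long len : splitLengths) {
            total += len;
        }
        double target = total * percent / 100.0;

        List<Integer> chosen = new ArrayList<Integer>();
        long taken = 0;
        // Stop as soon as the accumulated size reaches the target.
        for (int i = 0; i < splitLengths.length && taken < target; i++) {
            chosen.add(i);
            taken += splitLengths[i];
        }
        return chosen;
    }
}
```

Because whole splits are taken, the sampled size can overshoot n%; for four 64 MB splits and n = 30, two splits (50%) are kept. This is consistent with the remark that the sampling does not have to be strict.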


This addresses bug HIVE-2121.
https://issues.apache.org/jira/browse/HIVE-2121


Diffs (updated)
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1096852 
  trunk/conf/hive-default.xml 1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 
1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 
1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java 
1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinFactory.java 
1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
1096852 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SplitSample.java 
PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1096852 
  trunk/ql/src/test/queries/clientnegative/split_sample_out_of_range.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientnegative/split_sample_wrong_format.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/split_sample.q PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/split_sample_out_of_range.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientnegative/split_sample_wrong_format.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/bucket1.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/bucket2.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/bucket3.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample1.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample10.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample2.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample3.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample4.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample5.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample6.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample7.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample8.q.out 1096852 
  trunk/ql/src/test/results/clientpositive/sample9.q.out 1096852 
  trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java 
1096852 
  trunk/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 
1096852 
  trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java 
1096852 

Diff: https://reviews.apache.org/r/633/diff


Testing
---

TestCliDriver, TestNegativeCliDriver, and manual tests on real clusters.


Thanks,

Siying



 Input Sampling By Splits
 

 Key: HIVE-2121
 URL: https://issues.apache.org/jira/browse/HIVE-2121
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, 
 HIVE-2121.4.patch, HIVE-2121.5.patch


 We need better input sampling to serve at least two purposes:
 1. let users test their queries against a smaller data set
 2. understand more about what the data look like without scanning the whole 
 table.
 A simple function that gives a subset of splits will help in those cases. It 
 doesn't have to be 

[jira] [Updated] (HIVE-2121) Input Sampling By Splits

2011-04-28 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2121:
--

Status: Patch Available  (was: Open)

 Input Sampling By Splits
 

 Key: HIVE-2121
 URL: https://issues.apache.org/jira/browse/HIVE-2121
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, 
 HIVE-2121.4.patch, HIVE-2121.5.patch


 We need better input sampling to serve at least two purposes:
 1. let users test their queries against a smaller data set
 2. understand more about what the data look like without scanning the whole 
 table.
 A simple function that gives a subset of splits will help in those cases. It 
 doesn't have to be strict sampling.



Re: ANNOUNCE: New PMC Member Carl Steinbach

2011-04-28 Thread Ashish Thusoo
Congratulations Carl..

Ashish
On Apr 27, 2011, at 7:09 PM, John Sichi wrote:

 Hi all,
 
 The Hive Project Management Committee is happy to announce that Carl 
 Steinbach has been voted in as a new PMC member.  Carl is currently a very 
 active committer and has successfully managed two Hive releases (0.6 and 
 0.7).  His work on running Hive contributor meetups has helped foster an 
 ever-growing development community.
 
 Congratulations, Carl!
 
 JVS
 



Re: ANNOUNCE: New PMC Member Carl Steinbach

2011-04-28 Thread Ashutosh Chauhan
Congrats, Carl !

On Thu, Apr 28, 2011 at 05:39, Ashish Thusoo athu...@fb.com wrote:
 Congratulations Carl..

 Ashish
 On Apr 27, 2011, at 7:09 PM, John Sichi wrote:

 Hi all,

 The Hive Project Management Committee is happy to announce that Carl 
 Steinbach has been voted in as a new PMC member.  Carl is currently a very 
 active committer and has successfully managed two Hive releases (0.6 and 
 0.7).  His work on running Hive contributor meetups has helped foster an 
 ever-growing development community.

 Congratulations, Carl!

 JVS





[jira] [Updated] (HIVE-2125) alter table concatenate fails and deletes data

2011-04-28 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2125:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Just committed. Thanks Yongqiang!

Also Yongqiang, can you file a JIRA to fix the comments in .q files (if it is 
not filed already)?

 alter table concatenate fails and deletes data
 --

 Key: HIVE-2125
 URL: https://issues.apache.org/jira/browse/HIVE-2125
 Project: Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: He Yongqiang
Priority: Critical
 Attachments: HIVE-2125.1.patch, HIVE-2125.2.patch


 the number of reducers is not set by this command (unlike other hive 
 queries). since mapred.reduce.tasks=-1 (to let hive infer this automatically) 
 - jobtracker fails the job (number of reducers cannot be negative)
 hive alter table ad_imps_2 partition(ds='2009-06-16') concatenate;
 alter table ad_imps_2 partition(ds='2009-06-16') concatenate;
 Starting Job = job_201103101203_453180, Tracking URL = 
 http://curium.data.facebook.com:50030/jobdetails.jsp?jobid=job_201103101203_453180
 Kill Command = /mnt/vol/hive/sites/curium/hadoop/bin/../bin/hadoop job  
 -Dmapred.job.tracker=curium.data.facebook.com:50029 -kill 
 job_201103101203_453180
 Hadoop job information for null: number of mappers: 0; number of reducers: 0
 2011-04-22 10:21:24,046 null map = 100%,  reduce = 100%
 Ended Job = job_201103101203_453180 with errors
 Moved to trash: /user/facebook/warehouse/ad_imps_2/_backup.ds=2009-06-16
 after the job fails - the partition is deleted
 thankfully it's still in trash



Hive's JDBC Driver version

2011-04-28 Thread Curtis Boyden
Hello Hive People!

   I need to know what version of the Hive JDBC Driver I am working 
with so that I know what to expect back for column names when I execute a 
Select statement. For example in 0.5.0 SELECT account_id FROM account yields 
the column name _col0 whereas in 0.6.0 the same query returns the column name 
account_id.

   My question regards the correct way to store the driver version 
information. In HiveDatabaseMetaData I see that getVersion() fetches a 
full string from the manifest file while getDriverMajorVersion() and 
getDriverMinorVersion() return a local, static 0. The HiveDriver also provides 
driver version information through its methods getMajorVersion() and 
getMinorVersion(), and they both return static int 0 values that are scoped to 
the class.

   I am primarily interested in the HiveDriver.get...Version() 
methods as I do not want to create a DB connection first to check the 
DatabaseMetaData, and my first thought is to update the 
MAJOR_VERSION/MINOR_VERSION values accordingly. Next I was going to change the 
..._VERSION static int values to be package visible and use them in 
HiveDatabaseMetaData's getDriverMajorVersion()/getDriverMinorVersion().
   The cost is that someone must manually update the static int 
values on HiveDriver for each version revision.

   So I was wondering if it would be preferred that all version 
information is pulled from the manifest file, or if I should move forward with 
my usage of the HiveDriver static final ints.

   If the manifest version is to be used, should we create two new 
fields for Major Version / Minor Version or parse the already existing 
Implementation-Version?


   Thank you for your direction,
   -Curtis
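
For reference, the "parse the existing Implementation-Version" option could look roughly like the Java sketch below. It relies on the standard Package.getImplementationVersion() call, which returns null when the manifest attribute is not available. The class DriverVersionSketch and its methods are hypothetical, and falling back to 0 simply mirrors the value the driver returns today.

```java
/** Hypothetical sketch: derive major/minor numbers from the manifest version string. */
final class DriverVersionSketch {

    /**
     * Reads Implementation-Version from the manifest of the JAR that contains
     * the given class, or returns null when it is not available.
     */
    static String implementationVersion(Class<?> anchor) {
        Package pkg = anchor.getPackage();
        return pkg == null ? null : pkg.getImplementationVersion();
    }

    /**
     * Returns one numeric component of a version string (index 0 = major,
     * 1 = minor), or 0 when the component is missing or not a number.
     */
    static int component(String version, int index) {
        if (version == null) {
            return 0;
        }
        // Split on dots and hyphens so "0.7.0-SNAPSHOT" yields 0, 7, 0, SNAPSHOT.
        String[] parts = version.split("[.\\-]");
        if (index >= parts.length) {
            return 0;
        }
        try {
            return Integer.parseInt(parts[index]);
        } catch (NumberFormatException e) {
            return 0;
        }
    }
}
```

Under this approach getMajorVersion() could return component(implementationVersion(HiveDriver.class), 0) without opening a connection, and nobody would have to update static ints by hand for each release.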


[jira] [Updated] (HIVE-2121) Input Sampling By Splits

2011-04-28 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2121:
--

Attachment: HIVE-2121.6.patch

forgot a file.

 Input Sampling By Splits
 

 Key: HIVE-2121
 URL: https://issues.apache.org/jira/browse/HIVE-2121
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, 
 HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch


 We need better input sampling to serve at least two purposes:
 1. let users test their queries against a smaller data set
 2. understand more about what the data look like without scanning the whole 
 table.
 A simple function that gives a subset of splits will help in those cases. It 
 doesn't have to be strict sampling.



Re: Review Request: HIVE-243 - ^C breaks out of running query, but not whole CLI.

2011-04-28 Thread djabarov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/626/
---

(Updated 2011-04-28 20:54:48.288055)


Review request for hive.


Changes
---

Added code to kill all running jobs before interrupting the current CLI thread.


Summary
---

Fixed by adding an INT signal handler that interrupts the CLI thread. The CLI 
thread gets an InterruptedException and stops the current command.
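
The interrupt mechanics described here can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code in the diff: the class InterruptSketch is hypothetical, Thread.sleep stands in for a running command, and the direct call to interrupt() stands in for whatever the INT signal handler does.

```java
/** Hypothetical sketch of interrupting a command thread; not the code in the diff. */
final class InterruptSketch {

    /** Simulates a long-running command that stops when its thread is interrupted. */
    static String runCommand(long millis) {
        try {
            Thread.sleep(millis);
            return "finished";
        } catch (InterruptedException e) {
            // The command is abandoned, but the caller (the CLI loop) keeps running.
            return "interrupted";
        }
    }

    /** Starts a command on its own thread, interrupts it, and reports how it ended. */
    static String runAndInterrupt() {
        final String[] outcome = new String[1];
        Thread cli = new Thread(new Runnable() {
            public void run() {
                outcome[0] = runCommand(60000L);
            }
        });
        cli.start();
        cli.interrupt(); // stands in for the action taken by the signal handler
        try {
            cli.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException("unexpected interrupt while waiting", e);
        }
        return outcome[0];
    }
}
```

Only code that is blocked in an interruptible call, or that polls the interrupt flag, reacts to interrupt(); a command busy in non-interruptible work would not stop this way.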


This addresses bug HIVE-243.
https://issues.apache.org/jira/browse/HIVE-243


Diffs (updated)
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1097569 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 
1097569 

Diff: https://reviews.apache.org/r/626/diff


Testing
---

Manually tested Ctrl+C on a bunch of commands (show tables, select, etc.)


Thanks,

georgedj



Re: Review Request: Input Sampling Splits

2011-04-28 Thread namit jain

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/633/#review605
---



trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
https://reviews.apache.org/r/633/#comment1249

talked to siying offline -

the check:

if (split instanceof Hadoop20Shims.InputSplitShim)


is not needed - this can be replaced by an assert.

Same in Hadoop20SShims.
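
A minimal sketch of the suggested change, using stand-in types: the nested InputSplit and InputSplitShim below are hypothetical placeholders for the real Hadoop and shim classes, and lengthOf is an invented method used only to show the pattern.

```java
/** Hypothetical sketch of replacing an instanceof check with an assert. */
final class AssertSketch {

    /** Stand-in for the Hadoop InputSplit interface. */
    interface InputSplit {
        long getLength();
    }

    /** Stand-in for Hadoop20Shims.InputSplitShim. */
    static final class InputSplitShim implements InputSplit {
        private final long length;

        InputSplitShim(long length) {
            this.length = length;
        }

        public long getLength() {
            return length;
        }
    }

    static long lengthOf(InputSplit split) {
        // Before: if (split instanceof InputSplitShim) { ... }
        // After: the type is an invariant, so state it as an assertion.
        assert split instanceof InputSplitShim : "unexpected split type: " + split.getClass();
        return ((InputSplitShim) split).getLength();
    }
}
```

Java assertions are only evaluated when the JVM runs with -ea, so in production the check costs nothing and a wrong type would surface as a ClassCastException at the cast.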


Otherwise looks good


- namit


On 2011-04-28 08:32:17, Siying Dong wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/633/
 ---
 
 (Updated 2011-04-28 08:32:17)
 
 
 Review request for hive, Ning Zhang and namit jain.
 
 
 Summary
 ---
 
 We need better input sampling to serve at least two purposes:
 1. let users test their queries against a smaller data set
 2. understand more about what the data look like without scanning the whole 
 table.
 A simple function that gives a subset of splits will help in those cases. It 
 doesn't have to be strict sampling.
 
 This diff adds the syntax .. table TABLESAMPLE(n PERCENT), which samples 
 input splits whose total size is at least n% of the original input.
 
 
 This addresses bug HIVE-2121.
 https://issues.apache.org/jira/browse/HIVE-2121
 
 
 Diffs
 -
 
   trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1096852 
   trunk/conf/hive-default.xml 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java 
 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 
 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 
 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java 
 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinFactory.java 
 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
 1096852 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SplitSample.java 
 PRE-CREATION 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1096852 
   trunk/ql/src/test/queries/clientnegative/split_sample_out_of_range.q 
 PRE-CREATION 
   trunk/ql/src/test/queries/clientnegative/split_sample_wrong_format.q 
 PRE-CREATION 
   trunk/ql/src/test/queries/clientpositive/split_sample.q PRE-CREATION 
   trunk/ql/src/test/results/clientnegative/split_sample_out_of_range.q.out 
 PRE-CREATION 
   trunk/ql/src/test/results/clientnegative/split_sample_wrong_format.q.out 
 PRE-CREATION 
   trunk/ql/src/test/results/clientpositive/bucket1.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/bucket2.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/bucket3.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample1.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample10.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample2.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample3.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample4.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample5.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample6.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample7.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample8.q.out 1096852 
   trunk/ql/src/test/results/clientpositive/sample9.q.out 1096852 
   trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java 
 1096852 
   trunk/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 
 1096852 
   trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java 
 1096852 
 
 Diff: https://reviews.apache.org/r/633/diff
 
 
 Testing
 ---
 
 TestCliDriver, TestNegativeCliDriver, and manual tests on real clusters.
 
 
 Thanks,
 
 Siying
 




[jira] [Updated] (HIVE-2121) Input Sampling By Splits

2011-04-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2121:
-

Status: Open  (was: Patch Available)

 Input Sampling By Splits
 

 Key: HIVE-2121
 URL: https://issues.apache.org/jira/browse/HIVE-2121
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, 
 HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch


 We need better input sampling to serve at least two purposes:
 1. let users test their queries against a smaller data set
 2. understand more about what the data look like without scanning the whole 
 table.
 A simple function that gives a subset of splits will help in those cases. It 
 doesn't have to be strict sampling.



[jira] [Updated] (HIVE-2121) Input Sampling By Splits

2011-04-28 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2121:
--

Attachment: HIVE-2121.7.patch

move instanceof InputSplitShim to assert.

 Input Sampling By Splits
 

 Key: HIVE-2121
 URL: https://issues.apache.org/jira/browse/HIVE-2121
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, 
 HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch, HIVE-2121.7.patch


 We need better input sampling to serve at least two purposes:
 1. let users test their queries against a smaller data set
 2. understand more about what the data look like without scanning the whole 
 table.
 A simple function that gives a subset of splits will help in those cases. It 
 doesn't have to be strict sampling.



Re: Review Request: HIVE-243 - ^C breaks out of running query, but not whole CLI.

2011-04-28 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/626/#review609
---



trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java
https://reviews.apache.org/r/626/#comment1252

This change kills the launched MR job. However, the subsequent interrupt() 
is only called on the current thread. This does not behave as expected when 
there are multiple threads running. 

For example, this happens when we are getting partitions from the metastore 
(JDO may use its own thread pool) and when getting HDFS summaries (we also use 
a thread pool). In these situations we probably need to interrupt all child 
threads. 
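
One way to interrupt pool threads, when the pool is a java.util.concurrent ExecutorService that the caller owns, is shutdownNow(), which makes a best-effort attempt to stop running tasks (typically by interrupting their threads). The sketch below is an assumption-laden illustration: PoolInterruptSketch and cancelAll are hypothetical names, and a pool managed internally by JDO may not be reachable this way.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: interrupt every worker of a thread pool the caller owns. */
final class PoolInterruptSketch {

    /** Starts `workers` blocking tasks, cancels them all, and returns how many saw the interrupt. */
    static int cancelAll(int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        final CountDownLatch started = new CountDownLatch(workers);
        final AtomicInteger interrupted = new AtomicInteger();

        for (int i = 0; i < workers; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    started.countDown();
                    try {
                        Thread.sleep(60000L); // stands in for a slow metastore or HDFS call
                    } catch (InterruptedException e) {
                        interrupted.incrementAndGet();
                    }
                }
            });
        }

        try {
            started.await();      // wait until every task is actually running
            pool.shutdownNow();   // interrupts the worker threads
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return interrupted.get();
    }
}
```

Interrupting only the CLI thread leaves such workers running, which is the gap the comment points out.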


- Ning


On 2011-04-28 20:54:48, georgedj wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/626/
 ---
 
 (Updated 2011-04-28 20:54:48)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Fixed by adding an INT signal handler that interrupts the CLI thread. The CLI 
 thread gets an InterruptedException and stops the current command.
 
 
 This addresses bug HIVE-243.
 https://issues.apache.org/jira/browse/HIVE-243
 
 
 Diffs
 -
 
   trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1097569 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 
 1097569 
 
 Diff: https://reviews.apache.org/r/626/diff
 
 
 Testing
 ---
 
 Manually tested Ctrl+C on a bunch of commands (show tables, select, etc.)
 
 
 Thanks,
 
 georgedj
 




[jira] [Commented] (HIVE-243) ^C breaks out of running query, but not whole CLI

2011-04-28 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026743#comment-13026743
 ] 

jirapos...@reviews.apache.org commented on HIVE-243:



---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/626/#review609
---



trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java
https://reviews.apache.org/r/626/#comment1252

This change kills the launched MR job. However, the subsequent interrupt() 
is only called on the current thread. This does not behave as expected when 
there are multiple threads running. 

For example, this happens when we are getting partitions from the metastore 
(JDO may use its own thread pool) and when getting HDFS summaries (we also use 
a thread pool). In these situations we probably need to interrupt all child 
threads. 


- Ning


On 2011-04-28 20:54:48, georgedj wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/626/
bq.  ---
bq.  
bq.  (Updated 2011-04-28 20:54:48)
bq.  
bq.  
bq.  Review request for hive.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Fixed by adding an INT signal handler that interrupts the CLI thread. The CLI 
bq.  thread gets an InterruptedException and stops the current command.
bq.  
bq.  
bq.  This addresses bug HIVE-243.
bq.  https://issues.apache.org/jira/browse/HIVE-243
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1097569 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 1097569 
bq.  
bq.  Diff: https://reviews.apache.org/r/626/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Manually tested Ctrl+C on a bunch of commands (show tables, select, etc.)
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  georgedj
bq.  
bq.



 ^C breaks out of running query, but not whole CLI
 -

 Key: HIVE-243
 URL: https://issues.apache.org/jira/browse/HIVE-243
 Project: Hive
  Issue Type: Wish
  Components: Query Processor
Affects Versions: 0.8.0
Reporter: Adam Kramer

 It would be lovely if, when I know a query is bad, I could just ^C out of it. 
 I can do that now, but the whole CLI quits.
 It'd be quite nice if it took an extra ^C to break the CLI, or if there was 
 some control character to break out of a query without breaking out of the 
 CLI.



[jira] [Assigned] (HIVE-895) Add SerDe for Avro serialized data

2011-04-28 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-895:
---

Assignee: Carl Steinbach

 Add SerDe for Avro serialized data
 --

 Key: HIVE-895
 URL: https://issues.apache.org/jira/browse/HIVE-895
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Jeff Hammerbacher
Assignee: Carl Steinbach

 As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro 
 data seems like a solid win.



[jira] [Assigned] (HIVE-895) Add SerDe for Avro serialized data

2011-04-28 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-895:
---

Assignee: Jakob Homan  (was: Carl Steinbach)

 Add SerDe for Avro serialized data
 --

 Key: HIVE-895
 URL: https://issues.apache.org/jira/browse/HIVE-895
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Jeff Hammerbacher
Assignee: Jakob Homan

 As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro 
 data seems like a solid win.



[jira] [Commented] (HIVE-895) Add SerDe for Avro serialized data

2011-04-28 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026760#comment-13026760
 ] 

Carl Steinbach commented on HIVE-895:
-

@Jakob: There's lots of interest :) Please post the patch, even if it's a WIP.

 Add SerDe for Avro serialized data
 --

 Key: HIVE-895
 URL: https://issues.apache.org/jira/browse/HIVE-895
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Jeff Hammerbacher
Assignee: Jakob Homan

 As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro 
 data seems like a solid win.



[jira] [Commented] (HIVE-2121) Input Sampling By Splits

2011-04-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026779#comment-13026779
 ] 

Namit Jain commented on HIVE-2121:
--

+1

 Input Sampling By Splits
 

 Key: HIVE-2121
 URL: https://issues.apache.org/jira/browse/HIVE-2121
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, 
 HIVE-2121.4.patch, HIVE-2121.5.patch, HIVE-2121.6.patch, HIVE-2121.7.patch


 We need better input sampling to serve at least two purposes:
 1. let users test their queries against a smaller data set
 2. understand more about what the data look like without scanning the whole 
 table.
 A simple function that gives a subset of splits will help in those cases. It 
 doesn't have to be strict sampling.



Re: Review Request: HIVE-1644 Use filter pushdown for automatically accessing indexes

2011-04-28 Thread Russell Melick

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/558/
---

(Updated 2011-04-29 00:01:06.921150)


Review request for hive.


Changes
---

HIVE-1644.17.patch


Summary
---

Review request for HIVE-1644.12.patch


This addresses bug HIVE-1644.
https://issues.apache.org/jira/browse/HIVE-1644


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java f77f46c 
  conf/hive-default.xml 6bd615e 
  eclipse-templates/.classpath 8d2dc52 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java ca337a8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 24e16e4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d90b 
  ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java dd0186d 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b78f 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 
f90d64f 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 6162676 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java 
0ae9fa2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcCtx.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 374e123 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2207ac4 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9 
  ql/src/test/queries/clientpositive/index_auto.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_file_format.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_multiple.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_partitioned.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_test_if_used.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_unused.q PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_file_format.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_multiple.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_partitioned.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_test_if_used.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_unused.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/558/diff


Testing
---


Thanks,

Russell



[jira] [Commented] (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-04-28 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026789#comment-13026789
 ] 

jirapos...@reviews.apache.org commented on HIVE-1644:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/558/
---

(Updated 2011-04-29 00:01:06.921150)


Review request for hive.


Changes
---

HIVE-1644.17.patch


Summary
---

Review request for HIVE-1644.12.patch


This addresses bug HIVE-1644.
https://issues.apache.org/jira/browse/HIVE-1644


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java f77f46c 
  conf/hive-default.xml 6bd615e 
  eclipse-templates/.classpath 8d2dc52 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java ca337a8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 24e16e4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d90b 
  ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java dd0186d 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b78f 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java f90d64f 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 6162676 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java 0ae9fa2 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcCtx.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 374e123 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2207ac4 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9 
  ql/src/test/queries/clientpositive/index_auto.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_file_format.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_multiple.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_partitioned.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_test_if_used.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_auto_unused.q PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_file_format.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_multiple.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_partitioned.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_test_if_used.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_auto_unused.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/558/diff


Testing
---


Thanks,

Russell



 use filter pushdown for automatically accessing indexes
 ---

 Key: HIVE-1644
 URL: https://issues.apache.org/jira/browse/HIVE-1644
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: John Sichi
Assignee: Russell Melick
 Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, 
 HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.13.patch, 
 HIVE-1644.14.patch, HIVE-1644.15.patch, HIVE-1644.16.patch, 
 HIVE-1644.2.patch, HIVE-1644.3.patch, HIVE-1644.4.patch, HIVE-1644.5.patch, 
 HIVE-1644.6.patch, HIVE-1644.7.patch, HIVE-1644.8.patch, HIVE-1644.9.patch


 HIVE-1226 provides utilities for analyzing filters which have been pushed 
 down to a table scan.  The next step is to use these for selecting available 
 indexes and generating access plans for those indexes.



[jira] [Updated] (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-04-28 Thread Russell Melick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick updated HIVE-1644:
-

Attachment: HIVE-1644.18.patch

patch 18

I moved that logic into a helper method, but I'm not seeing the settings being 
changed in build/ql/tmp/hive.log.

When I have the unit test use a predicate like key=86 instead of key  45 AND 
key  55, I see the following error
{{{
java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
    at org.apache.hadoop.fs.Path.<init>(Path.java:90)
    at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:224)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:282)
    at org.apache.hadoop.hive.ql.index.HiveIndexedInputFormat.getSplits(HiveIndexedInputFormat.java:123)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    ...
}}}

It seems like this is causing a problem when there are no blocks to return.
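
A minimal sketch of the suspected failure mode, assuming the index lookup hands 
an empty comma-separated path string to the input-path setup. It uses plain 
Java stand-ins rather than the Hadoop or Hive classes; EmptyIndexResultGuard, 
setInputPaths, checkPathArg and configureInputs are hypothetical names for 
illustration and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative stand-in only: none of these names come from the HIVE-1644 patch.
class EmptyIndexResultGuard {

    // Mimics the path validation that fails in the trace above.
    static void checkPathArg(String path) {
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException(
                "Can not create a Path from an empty string");
        }
    }

    // Mimics setting input paths from a comma-separated string.
    // "".split(",", -1) yields one empty element, so an empty input
    // still reaches checkPathArg and throws.
    static void setInputPaths(String commaSeparatedPaths) {
        for (String p : commaSeparatedPaths.split(",", -1)) {
            checkPathArg(p);
        }
    }

    // Hypothetical guard: skip path setup when the index matched nothing.
    // Returns true if input paths were set, false if there is nothing to scan.
    static boolean configureInputs(List<String> indexHits) {
        if (indexHits.isEmpty()) {
            return false;
        }
        setInputPaths(String.join(",", indexHits));
        return true;
    }

    public static void main(String[] args) {
        // Guarded calls: an empty hit list is reported, not turned into a path.
        System.out.println(configureInputs(Collections.<String>emptyList()));
        System.out.println(configureInputs(Arrays.asList("/tmp/a", "/tmp/b")));
        // Unguarded call with an empty string reproduces the exception type.
        try {
            setInputPaths("");
        } catch (IllegalArgumentException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
    }
}
```

If this assumption holds, one option is to check for an empty result before 
setting the input paths and to return no splits in that case. Whether that is 
the right behavior for the index handler is a separate question.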

 use filter pushdown for automatically accessing indexes
 ---

 Key: HIVE-1644
 URL: https://issues.apache.org/jira/browse/HIVE-1644
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.8.0
Reporter: John Sichi
Assignee: Russell Melick
 Attachments: HIVE-1644.1.patch, HIVE-1644.10.patch, 
 HIVE-1644.11.patch, HIVE-1644.12.patch, HIVE-1644.13.patch, 
 HIVE-1644.14.patch, HIVE-1644.15.patch, HIVE-1644.16.patch, 
 HIVE-1644.17.patch, HIVE-1644.18.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, 
 HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch, 
 HIVE-1644.8.patch, HIVE-1644.9.patch


 HIVE-1226 provides utilities for analyzing filters which have been pushed 
 down to a table scan.  The next step is to use these for selecting available 
 indexes and generating access plans for those indexes.
