Re: [ANNOUNCE] New Hive Committer - Amareshwari Sriramadasu

2010-11-09 Thread Yongqiang He
Congrats amareshwari!

Yongqiang

On Nov 8, 2010, at 6:00 PM, Namit Jain nj...@facebook.com wrote:

 Hi Folks,
 
 The Hive PMC has passed the vote to make Amareshwari Sriramadasu a
 new committer on the Apache Hive project.
 
 Following is a list of the contributions that Amareshwari has made to the
 project:
 
 http://bit.ly/c3z0ty
 
 Congratulations Amareshwari.
 Please send over your CLA to Apache.
 
 
 Thanks,
 Namit


Confusing in hive-default.xml

2010-11-09 Thread 김영우
Hi,

Maybe it is a typo, but I'm not sure.

Excerpt from hive-default.xml (trunk):
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
  <description>The default input format, if it is not specified, the system
assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
whereas it is set to CombinedHiveInputFormat for hadoop 20. The user can
always overwrite it - if there is a bug in CombinedHiveInputFormat, it can
always be manually set to HiveInputFormat. </description>
</property>

The 'CombinedHiveInputFormat' does not exist. It should be
'CombineHiveInputFormat' so the property's value is
'org.apache.hadoop.hive.ql.io.CombineHiveInputFormat'.

I don't know whether it's intentional or a typo.

Thanks.

- Youngwoo


[jira] Resolved: (HIVE-1766) Dynamic partition is not working as expected.

2010-11-09 Thread Saravanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saravanan resolved HIVE-1766.
-

  Resolution: Not A Problem
Release Note: The data format was wrong. I used CSV format with quotes,
which is not supported in Hive.
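
To illustrate the resolution above: a plain delimiter split, with no quote handling, leaves the quote characters inside the field values. The sketch below is only an assumption-laden illustration (the class NaiveSplit and the sample rows are hypothetical); it is not Hive's SerDe code and makes no claim about how Hive parsed this particular file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: splits a row on a literal
// delimiter and does nothing special about quote characters.
final class NaiveSplit {
  static List<String> fields(String row, String delimiter) {
    // Pattern.quote makes the delimiter literal; -1 keeps trailing empty fields.
    return Arrays.asList(row.split(Pattern.quote(delimiter), -1));
  }
}
```

With such a reader, an unquoted row `a,b` yields the values a and b, while a quoted row `"a","b"` yields values that still carry their surrounding quotes.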

 Dynamic partition is not working as expected. 
 --

 Key: HIVE-1766
 URL: https://issues.apache.org/jira/browse/HIVE-1766
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.6.0, 0.7.0
 Environment: Linux, Got the latest code from hive trunk and also 
 tested in 0.6. hadoop version 0.20
Reporter: Saravanan
 Fix For: 0.7.0, 0.6.0


 Create source table
 --
 CREATE EXTERNAL TABLE testmove (
   a  string,  
   b string
   )
   PARTITIONED BY (cust string, dt string);
 The data is kept in /usr/hive/warehouse/testmove/cust=a/dt=20100102/a.txt
 a.txt has 1 row; the value is a, b
 Create Destination table
 ---
 CREATE EXTERNAL TABLE testmove1 (
   a  string,  
   b string
   )
   PARTITIONED BY (cust string, dt string);
 Run the query for dynamic partition insert
 ---
 set hive.exec.dynamic.partition=true;
 set hive.exec.dynamic.partition.mode=nonstrict;
 FROM testmove t
 INSERT OVERWRITE TABLE testmove1 PARTITION (cust, dt)
   SELECT t.a, t.b, 'a', '20100102';
 output
 ---
 Total MapReduce jobs = 2
 Launching Job 1 out of 2
 Number of reduce tasks is set to 0 since there's no reduce operator
 Execution log at: /tmp/root/root_20101103170404_9e869676-7bb5-4655-b027-5bcb4b7fa2cb.log
 Job running in-process (local Hadoop)
 2010-11-03 17:04:06,818 null map = 100%,  reduce = 0%
 Ended Job = job_local_0001
 Ended Job = -64572, job is filtered out (removed at runtime).
 Moving data to: file:/tmp/hive-root/hive_2010-11-03_17-03-59_979_5901061386316364507/-ext-1
 Loading data to table testmove1 partition (cust=null, dt=null)
 [Warning] could not update stats.
 OK
 If I run it as a static partition insert, the data is inserted into the destination table.
 FROM testmove t
 INSERT OVERWRITE TABLE testmove1 PARTITION (cust='a', dt='20100102')
   SELECT t.a, t.b;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [ANNOUNCE] New Hive Committer - Amareshwari Sriramadasu

2010-11-09 Thread Edward Capriolo
Welcome!




[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Attachment: hive-1754_4.patch

Resolved all the output conflicts in this patch

 Remove JDBM component from Map Join
 ---

 Key: HIVE-1754
 URL: https://issues.apache.org/jira/browse/HIVE-1754
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, 
 hive-1754_4.patch


 Right now, JDBM is the major performance bottleneck.
 With the growth of the small table, the PUT and GET operations take most 
 of the execution time.
 Map Join is designed to load the data of the small table into memory. 
 If the data is too large to hold in memory, then there is no need to use the 
 map join strategy.
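
The idea in the description can be sketched as a hash join: the small table is loaded into an in-memory map, and each row of the big table probes that map. This is a minimal illustration under stated assumptions (hypothetical names MapJoinSketch and join, rows modeled as {key, value} string pairs); it is not Hive's map join operator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a map-side (hash) join; not Hive code.
final class MapJoinSketch {
  // Each row is a two-element array: {joinKey, value}.
  static List<String> join(List<String[]> smallTable, List<String[]> bigTable) {
    // Build phase: hold the whole small table in memory, grouped by key.
    Map<String, List<String>> inMemory = new HashMap<>();
    for (String[] row : smallTable) {
      inMemory.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
    }
    // Probe phase: stream the big table and look up each key.
    List<String> out = new ArrayList<>();
    for (String[] row : bigTable) {
      List<String> matches = inMemory.get(row[0]);
      if (matches == null) {
        continue; // inner join: drop rows without a match
      }
      for (String match : matches) {
        out.add(row[0] + "\t" + row[1] + "\t" + match);
      }
    }
    return out;
  }
}
```

The build phase is the part that only makes sense when the small table fits in memory, which is the point the description makes.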




RE: Confusing in hive-default.xml

2010-11-09 Thread Paul Yang
Yes, good catch, that is just a typo. It should be CombineHiveInputFormat. If 
you want to fix it, could you file a JIRA for this?

Cheers,
Paul



Re: [ANNOUNCE] New Hive Committer - Amareshwari Sriramadasu

2010-11-09 Thread Carl Steinbach
Congratulations Amareshwari!

Carl




[jira] Resolved: (HIVE-1775) Assertation on inputObjInspectors.length in Groupy operator

2010-11-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HIVE-1775.
--

  Resolution: Fixed
Release Note: This bug is fixed in HIVE-1754

 Assertation on inputObjInspectors.length in Groupy operator
 ---

 Key: HIVE-1775
 URL: https://issues.apache.org/jira/browse/HIVE-1775
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0


 In the Groupby Operator:
 Line 188: assert (inputObjInspectors.length == 1); 
 But this assertion may not necessarily be true.




[jira] Commented: (HIVE-1754) Remove JDBM component from Map Join

2010-11-09 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930291#action_12930291
 ] 

He Yongqiang commented on HIVE-1754:


will take a close look.

 Remove JDBM component from Map Join
 ---

 Key: HIVE-1754
 URL: https://issues.apache.org/jira/browse/HIVE-1754
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, 
 hive-1754_4.patch, hive-1754_5.patch


 Right now, JDBM is the major performance bottleneck.
 With the growth of the small table, the PUT and GET operations take most 
 of the execution time.
 Map Join is designed to load the data of the small table into memory. 
 If the data is too large to hold in memory, then there is no need to use the 
 map join strategy.




RE: Confusing in hive-default.xml

2010-11-09 Thread Namit Jain
It is a typo. 
Can you file a JIRA and fix it?


Thanks,
-namit




Re: Review Request: HIVE-1771: ROUND(infinity) chokes

2010-11-09 Thread Paul Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53/#review31
---

Ship it!


+1 Looks good. Will test/commit.

- Paul


On 2010-11-08 18:36:25, John Sichi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/53/
 ---
 
 (Updated 2010-11-08 18:36:25)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 Review request from jvs.
 
 
 This addresses bug HIVE-1771.
 https://issues.apache.org/jira/browse/HIVE-1771
 
 
 Diffs
 -
 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRound.java
  1032795 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/udf_round.q
  1032795 
   
 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/udf_round.q.out
  1032795 
 
 Diff: https://reviews.apache.org/r/53/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 John
 




[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition

2010-11-09 Thread Paul Butler (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930326#action_12930326
 ] 

Paul Butler commented on HIVE-1648:
---

I get a bunch of tests failing when I build the latest trunk, even without 
applying my patch. I'm trying to figure out what's wrong with those first.

 Automatically gathering stats when reading a table/partition
 

 Key: HIVE-1648
 URL: https://issues.apache.org/jira/browse/HIVE-1648
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Paul Butler
 Attachments: HIVE-1648.2.patch, HIVE-1648.patch


 HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to 
 gather stats. This requires an additional scan of the data. Stats gathering 
 can be piggy-backed on TableScanOperator whenever a table/partition is 
 scanned (given no LIMIT operator).




[jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition

2010-11-09 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930321#action_12930321
 ] 

Namit Jain commented on HIVE-1648:
--

Paul, any updates on the unit tests?

 Automatically gathering stats when reading a table/partition
 

 Key: HIVE-1648
 URL: https://issues.apache.org/jira/browse/HIVE-1648
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Paul Butler
 Attachments: HIVE-1648.2.patch, HIVE-1648.patch


 HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to 
 gather stats. This requires an additional scan of the data. Stats gathering 
 can be piggy-backed on TableScanOperator whenever a table/partition is 
 scanned (given no LIMIT operator).




jmx metrics for metastore server

2010-11-09 Thread Sushanth Sowmyan
Hi all,

We were looking at monitoring requirements from within Howl, which
essentially translates to monitoring requirements for the Metastore
server and would like to add in volume and latency jmx counters to
metastore server calls.

I notice an old and somewhat inactive jira :
https://issues.apache.org/jira/browse/HIVE-551 which seems to mention
jmx metrics for hive, but from an overall point of view, where each
query running on hive, and the cli would need tracking. If we started
work on just metastore monitoring for now, is there interest in such a
thing?

Is there any other jira or plan to do anything similar? If not, I can
open up a jira on it and can work on it.

-Sushanth


Re: jmx metrics for metastore server

2010-11-09 Thread Edward Capriolo

You are correct: I have that ticket open, and the metrics are query
centric. I think metastore counters make sense; however, I wonder what
percentage of people are using a Metastore server as opposed to just using a
LOCAL (JDBC) metastore?

Edward


[jira] Updated: (HIVE-1776) parallel execution and auto-local mode combine to place plan file in wrong file system

2010-11-09 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1776:


Attachment: HIVE-1776.1.patch

the problem is that tasks are trying to modify the shared hive configuration 
object and trampling each other. fix is to clone the configuration object 
before modifying it in the Task.
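
The fix described above (copy the shared configuration before a task modifies it) can be sketched as follows. This is an illustration under assumptions, not the attached patch: java.util.Properties stands in for the Hive configuration object, and the class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch; Properties is a stand-in for the shared Hive configuration.
final class TaskConfSketch {
  // Returns a task-private configuration with one overridden setting.
  static Properties withLocalOverride(Properties shared, String key, String value) {
    Properties taskConf = new Properties();
    taskConf.putAll(shared);          // private copy for this task
    taskConf.setProperty(key, value); // mutate only the copy
    return taskConf;                  // discarded when the task completes
  }
}
```

Because each task works on its own copy, a task that switches to local mode cannot change the settings seen by a task that is running remotely at the same time.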

 parallel execution and auto-local mode combine to place plan file in wrong 
 file system
 --

 Key: HIVE-1776
 URL: https://issues.apache.org/jira/browse/HIVE-1776
 Project: Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
 Attachments: HIVE-1776.1.patch


 A query (that i can't reproduce verbatim) submits a job to a MR cluster with 
 a plan file that is resident on  the local file system. This job obviously 
 fails.
 This seems to result from an interaction between parallel execution (which 
 is trying to run one local and one remote job at the same time) and 
 auto-local mode. 
 Turning off either the parallel execution mode or the auto-local mode seems 
 to fix the problem.




Re: [jira] Updated: (HIVE-1776) parallel execution and auto-local mode combine to place plan file in wrong file system

2010-11-09 Thread Ted Yu
This line is removed from MapRedTask.java:

ctx.setOriginalTracker(conf.getVar(HiveConf.ConfVars.HADOOPJT));

I assume this is intentional.





[jira] Commented: (HIVE-1776) parallel execution and auto-local mode combine to place plan file in wrong file system

2010-11-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930359#action_12930359
 ] 

Ted Yu commented on HIVE-1776:
--

This line is removed from MapRedTask.java:

ctx.setOriginalTracker(conf.getVar(HiveConf.ConfVars.HADOOPJT));

I assume this is intentional.




 parallel execution and auto-local mode combine to place plan file in wrong 
 file system
 --

 Key: HIVE-1776
 URL: https://issues.apache.org/jira/browse/HIVE-1776
 Project: Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
 Attachments: HIVE-1776.1.patch


 A query (that i can't reproduce verbatim) submits a job to a MR cluster with 
 a plan file that is resident on  the local file system. This job obviously 
 fails.
 This seems to result from an interaction between parallel execution (which 
 is trying to run one local and one remote job at the same time) and 
 auto-local mode. 
 Turning off either the parallel execution mode or the auto-local mode seems 
 to fix the problem.




[jira] Commented: (HIVE-1776) parallel execution and auto-local mode combine to place plan file in wrong file system

2010-11-09 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930368#action_12930368
 ] 

Joydeep Sen Sarma commented on HIVE-1776:
-

yeah it was - but shoot - i forgot to take out the corresponding call in the 
finally block to restore the tracker. will upload new patch.

these are no longer necessary because we are using a cloned configuration 
object that is discarded once the task completes.

 parallel execution and auto-local mode combine to place plan file in wrong 
 file system
 --

 Key: HIVE-1776
 URL: https://issues.apache.org/jira/browse/HIVE-1776
 Project: Hive
  Issue Type: Bug
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
 Attachments: HIVE-1776.1.patch


 A query (that i can't reproduce verbatim) submits a job to a MR cluster with 
 a plan file that is resident on  the local file system. This job obviously 
 fails.
 This seems to result from an interaction between parallel execution (which 
 is trying to run one local and one remote job at the same time) and 
 auto-local mode. 
 Turning off either the parallel execution mode or the auto-local mode seems 
 to fix the problem.




[jira] Created: (HIVE-1779) Implement GenericUDF str_to_map

2010-11-09 Thread Siying Dong (JIRA)
Implement GenericUDF str_to_map
---

 Key: HIVE-1779
 URL: https://issues.apache.org/jira/browse/HIVE-1779
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor


People need a way to load their data into a map.
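
As a rough sketch of what such a function might do: split the text into pairs on one delimiter, then split each pair into key and value on a second delimiter. The issue does not specify the semantics, so everything below is an assumption (the class name, the three-argument signature, literal rather than regex delimiters, and a null value for a pair with no key/value delimiter).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of string-to-map parsing; not the attached patch.
final class StrToMapSketch {
  static Map<String, String> strToMap(String text, String pairDelim, String kvDelim) {
    Map<String, String> result = new LinkedHashMap<>();
    if (text == null || text.isEmpty()) {
      return result;
    }
    for (String pair : text.split(Pattern.quote(pairDelim))) {
      // Limit 2: only the first key/value delimiter splits the pair.
      String[] kv = pair.split(Pattern.quote(kvDelim), 2);
      result.put(kv[0], kv.length == 2 ? kv[1] : null);
    }
    return result;
  }
}
```

For example, calling strToMap("a:1,b:2", ",", ":") maps a to 1 and b to 2.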




[jira] Updated: (HIVE-1779) Implement GenericUDF str_to_map

2010-11-09 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1779:
--

Attachment: HIVE-1779.1.patch

 Implement GenericUDF str_to_map
 ---

 Key: HIVE-1779
 URL: https://issues.apache.org/jira/browse/HIVE-1779
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-1779.1.patch


 People need a way to load their data into a map.




[jira] Updated: (HIVE-1779) Implement GenericUDF str_to_map

2010-11-09 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1779:
--

Status: Patch Available  (was: Open)

 Implement GenericUDF str_to_map
 ---

 Key: HIVE-1779
 URL: https://issues.apache.org/jira/browse/HIVE-1779
 Project: Hive
  Issue Type: New Feature
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-1779.1.patch


 People need a way to load their data into a map.




[jira] Commented: (HIVE-1754) Remove JDBM component from Map Join

2010-11-09 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930405#action_12930405
 ] 

He Yongqiang commented on HIVE-1754:


1. code style:

A new file always needs an Apache license header.

And for example:

{noformat}
public class PathUtil {
  public static String suffix=".hashtable";
  public static String generatePath(String baseURI,Byte tag,String bigBucketFileName){
    String path = new String(baseURI+Path.SEPARATOR+"-"+tag+"-"+bigBucketFileName+suffix);
    return path;
  }
  public static String generateFileName(Byte tag,String bigBucketFileName){
    String fileName = new String("-"+tag+"-"+bigBucketFileName+suffix);
    return fileName;
  }

  public static String generateTmpURI(String baseURI,String id){
    String tmpFileURI = new String(baseURI+Path.SEPARATOR+"HashTable-"+id);
    return tmpFileURI;
  }
}
{noformat}

Should be formatted as:

{noformat}
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.hive.ql.util;

import org.apache.hadoop.fs.Path;

public class PathUtil {

  public static String suffix = ".hashtable";

  public static String generatePath(String baseURI, Byte tag,
      String bigBucketFileName) {
    String path = new String(baseURI + Path.SEPARATOR + "-" + tag + "-"
        + bigBucketFileName + suffix);
    return path;
  }

  public static String generateFileName(Byte tag, String bigBucketFileName) {
    String fileName = new String("-" + tag + "-" + bigBucketFileName + suffix);
    return fileName;
  }

  public static String generateTmpURI(String baseURI, String id) {
    String tmpFileURI = new String(baseURI + Path.SEPARATOR + "HashTable-" + id);
    return tmpFileURI;
  }
}
{noformat}

2.
Let's put PathUtil.java and TimeUtil.java into a HiveUtil class, like Utilities 
in exec (or create a new one and put in exec package or common package). 

3.
In ExecDriver.java

-//Qualify the path against the filesystem. The user configured path might contain default port which is skipped
-//in the file status. This makes sure that all paths which goes into PathToPartitionInfo are always listed status
-//filepath.
-newPath = fs.makeQualified(newPath);

Why was this code removed? It should stay.

4.
revert the changes in ExecMapper. keep it clean. 

5.
code style in HashTableDummyOperator. add a default serialize id. do not use 2 
blank lines inside a method. keep at least one blank line between 2 method 
definitions.

6.
remove some never read vars from HashTableSinkOperator.
{noformat}
  protected transient
  Map<Byte, List<ObjectInspector>> rowContainerStandardObjectInspectors;
{noformat}
should be in one line.

generateMapMetaData(); can be put into init(). MapJoinRowContainer res = null; 
should be parameterized.
int bucketSize = HiveConf.getIntVar(hconf, 
HiveConf.ConfVars.HIVEMAPJOINBUCKETCACHESIZE); should be put into init().
bucketSize can be a class field.
res.add(value); is duplicate in if () {} else {}. Put it after the if else.

In close(), if the abort is true, do we need to do the dump?

{noformat}
  String bigBucketFileName = this.getExecContext().getCurrentBigBucketFile();
  if(bigBucketFileName == null ||bigBucketFileName.length()==0) {
    bigBucketFileName="-";
  }
{noformat}

I guess if we run it locally, the bigBucketFileName is always null. Is that 
true? If yes, how does this patch handle the bucket map join?

7.
revert changes of MapRedTask

8.
AbstractRowContainer/MapJoinDoubleKeys/MapJoinRowContainer/MapJoinSingleKey 
are missing the Apache header.

Please make sure to clean up the code.
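
The remark in item 6 about the duplicated res.add(value) can be illustrated with a small before/after sketch. The names below are hypothetical and a plain list stands in for the row container; this is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of hoisting a statement duplicated in both branches.
final class HoistSketch {
  // Before: res.add(value) appears in both the if and the else branch.
  static void putBefore(Map<String, List<String>> table, String key, String value) {
    List<String> res = table.get(key);
    if (res == null) {
      res = new ArrayList<>();
      table.put(key, res);
      res.add(value);
    } else {
      res.add(value);
    }
  }

  // After: the shared statement is written once, after the if.
  static void putAfter(Map<String, List<String>> table, String key, String value) {
    List<String> res = table.get(key);
    if (res == null) {
      res = new ArrayList<>();
      table.put(key, res);
    }
    res.add(value);
  }
}
```

Both methods leave the table in the same state; the second simply has one place to change if the add logic ever needs to be modified.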


 Remove JDBM component from Map Join
 ---

 Key: HIVE-1754
 URL: https://issues.apache.org/jira/browse/HIVE-1754
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: Hive-1754.patch, Hive-1754_2.patch, Hive-1754_3.patch, 
 hive-1754_4.patch, hive-1754_5.patch


 Right now, JDBM is the major performance 

[jira] Created: (HIVE-1780) Typo in hive-default.xml

2010-11-09 Thread YoungWoo Kim (JIRA)
Typo in hive-default.xml


 Key: HIVE-1780
 URL: https://issues.apache.org/jira/browse/HIVE-1780
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Reporter: YoungWoo Kim
Priority: Trivial
 Fix For: 0.7.0


'CombineHiveInputFormat' is spelt incorrectly in the hive-default.xml:

It should be 'CombineHiveInputFormat' instead of  'CombinedHiveInputFormat'.




[jira] Updated: (HIVE-1780) Typo in hive-default.xml

2010-11-09 Thread YoungWoo Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YoungWoo Kim updated HIVE-1780:
---

Attachment: HIVE-1780.patch

A patch for fixing typos

 Typo in hive-default.xml
 

 Key: HIVE-1780
 URL: https://issues.apache.org/jira/browse/HIVE-1780
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Reporter: YoungWoo Kim
Priority: Trivial
 Fix For: 0.7.0

 Attachments: HIVE-1780.patch


 'CombineHiveInputFormat' is spelt incorrectly in the hive-default.xml:
 It should be 'CombineHiveInputFormat' instead of  'CombinedHiveInputFormat'.




Re: Confusing in hive-default.xml

2010-11-09 Thread 김영우
Hi,

I filed a jira for this and attached a patch.

https://issues.apache.org/jira/browse/HIVE-1780

- Youngwoo




[jira] Updated: (HIVE-1696) Add delegation token support to metastore

2010-11-09 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1696:
-

Component/s: Server Infrastructure
 Security

 Add delegation token support to metastore
 -

 Key: HIVE-1696
 URL: https://issues.apache.org/jira/browse/HIVE-1696
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore, Security, Server Infrastructure
Reporter: Todd Lipcon

 As discussed in HIVE-842, Kerberos authentication is only sufficient for
 authentication of a Hive user client to the metastore. There are other cases
 where Thrift calls need to be authenticated when the caller is running in an
 environment without Kerberos credentials. For example, an MR task running as
 part of a Hive job may want to report statistics to the metastore, or a job
 may be running within the context of Oozie or Hive Server.
 This JIRA is to implement support of delegation tokens for the metastore. The
 concept of a delegation token is borrowed from the Hadoop security design -
 the quick summary is that a Kerberos-authenticated client may retrieve a
 binary token from the server. This token can then be passed to other clients
 which can use it to achieve authentication as the original user in lieu of a
 Kerberos ticket.
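
 To make the idea concrete, here is a toy sketch of the token flow. It is
 not the Hadoop or Hive metastore API: the class ToyTokenServer and its
 methods are invented for illustration, and the token is simply an
 HMAC-SHA256 signature followed by the user name. Real delegation tokens
 also carry things like expiry and renewal information, which this sketch
 leaves out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Toy illustration only; hypothetical names, not a real Hive/Hadoop class. */
final class ToyTokenServer {
    private static final int SIG_LEN = 32; // HmacSHA256 output size in bytes
    private final byte[] secret = new byte[32];

    ToyTokenServer() {
        new SecureRandom().nextBytes(secret); // server-side secret, never sent out
    }

    /** Assumed to be called only after the caller authenticated, e.g. via Kerberos. */
    byte[] issueToken(String authenticatedUser) {
        byte[] user = authenticatedUser.getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(user);
        byte[] token = new byte[sig.length + user.length];
        System.arraycopy(sig, 0, token, 0, sig.length);
        System.arraycopy(user, 0, token, sig.length, user.length);
        return token;
    }

    /** Returns the user the token was issued for, or null if the token is invalid. */
    String authenticate(byte[] token) {
        if (token == null || token.length <= SIG_LEN) {
            return null;
        }
        byte[] sig = Arrays.copyOfRange(token, 0, SIG_LEN);
        byte[] user = Arrays.copyOfRange(token, SIG_LEN, token.length);
        if (!MessageDigest.isEqual(sig, sign(user))) {
            return null;
        }
        return new String(user, StandardCharsets.UTF_8);
    }

    private byte[] sign(byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

 A task that holds the token bytes (but no Kerberos ticket) would present
 them to authenticate(), and the server would treat the call as coming from
 the original user.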

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1712) Migrating metadata from derby to mysql thrown NullPointerException

2010-11-09 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930443#action_12930443
 ] 

Paul Yang commented on HIVE-1712:
-

+1 looks good, will test/commit

 Migrating metadata from derby to mysql thrown NullPointerException
 --

 Key: HIVE-1712
 URL: https://issues.apache.org/jira/browse/HIVE-1712
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.5.0, 0.6.0
Reporter: Jake Farrell
 Fix For: 0.7.0

 Attachments: hive-1712.patch, hive-1712_rebase.patch


 Exported Derby data to CSV, loaded the data into MySQL, and ran a Hive query
 that had worked with Derby. It failed with the following exception:
 2010-10-16 08:57:29,080 INFO  metastore.ObjectStore (ObjectStore.java:setConf(106)) - Initialized ObjectStore
 2010-10-16 08:57:29,552 INFO  metastore.HiveMetaStore (HiveMetaStore.java:logStartFunction(171)) - 0: get_table : db=default tbl=testimport
 2010-10-16 08:57:30,140 ERROR metadata.Hive (Hive.java:getTable(395)) - java.lang.NullPointerException
     at java.util.Hashtable.put(Hashtable.java:394)
     at java.util.Hashtable.putAll(Hashtable.java:466)
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:520)
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:489)
     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:381)
     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:333)
     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:683)
     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5200)
     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
     at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
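
 The top two frames are Hashtable.putAll calling Hashtable.put, and
 java.util.Hashtable rejects null keys and null values. One plausible
 reading, which this thread does not confirm, is that the CSV round trip
 left a NULL in a column that ends up as a map entry. The small sketch below
 only reproduces the JDK behaviour; the class name and the parameter names
 are made up and are not taken from the metastore schema.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

/** Hypothetical demo class; shows JDK behaviour, not Hive code. */
final class HashtableNullDemo {
    /** Returns true if copying src into a Hashtable throws NullPointerException. */
    static boolean putAllThrowsNpe(Map<String, String> src) {
        Hashtable<String, String> table = new Hashtable<>();
        try {
            table.putAll(src); // putAll calls put for each entry
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("some.param", "1");   // hypothetical parameter name
        System.out.println(putAllThrowsNpe(params)); // prints false

        params.put("other.param", null); // HashMap accepts a null value
        System.out.println(putAllThrowsNpe(params)); // prints true
    }
}
```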




[jira] Updated: (HIVE-1743) Group-by to determine equals of Keys in reverse order

2010-11-09 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1743:
--

Status: Patch Available  (was: Open)

 Group-by to determine equals of Keys in reverse order
 -

 Key: HIVE-1743
 URL: https://issues.apache.org/jira/browse/HIVE-1743
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-1743.1.patch


 When processing a group-by on the reduce side, keys arrive in sorted order.
 Comparing two keys for equality can be more efficient when done in reverse
 order.
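
 One way to read "reverse order" is sketched below, under the assumption
 that a key is a list of column values. With sorted input, two consecutive
 keys that differ will often share their leading columns and differ in a
 trailing one, so starting the comparison from the last column tends to
 find the mismatch sooner. The class and method names are hypothetical and
 this is not the code from the attached patch.

```java
import java.util.Objects;

/** Hypothetical helper; a key is modelled as an array of column values. */
final class ReverseKeyEquality {
    /** Checks equality of two multi-column keys, last column first. */
    static boolean equalsReverse(Object[] a, Object[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = a.length - 1; i >= 0; i--) {
            if (!Objects.equals(a[i], b[i])) {
                return false; // early exit on the first differing column
            }
        }
        return true;
    }
}
```

 The result is the same as a forward comparison; only the number of column
 comparisons needed to detect a difference changes.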




[jira] Updated: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES

2010-11-09 Thread Marquis Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1746:
---

Attachment: HIVE-1746.3.patch

 Support for using ALTER to set IDXPROPERTIES
 

 Key: HIVE-1746
 URL: https://issues.apache.org/jira/browse/HIVE-1746
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: Marquis Wang
Assignee: Marquis Wang
 Fix For: 0.7.0

 Attachments: 1746.prelim.patch, HIVE-1746.2.patch, HIVE-1746.3.patch


 HIVE-1498 has support for IDXPROPERTIES on index creation, so now we want to
 support ALTERing those properties.




[jira] Commented: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES

2010-11-09 Thread Marquis Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930483#action_12930483
 ] 

Marquis Wang commented on HIVE-1746:


New patch.

Eliminates println calls, adds a private updateModifiedParameters method, and
passes the database name into AlterIndexDesc. Otherwise the same.

 Support for using ALTER to set IDXPROPERTIES
 

 Key: HIVE-1746
 URL: https://issues.apache.org/jira/browse/HIVE-1746
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: Marquis Wang
Assignee: Marquis Wang
 Fix For: 0.7.0

 Attachments: 1746.prelim.patch, HIVE-1746.2.patch, HIVE-1746.3.patch


 HIVE-1498 has support for IDXPROPERTIES on index creation, so now we want to
 support ALTERing those properties.




[jira] Assigned: (HIVE-1496) enhance CREATE INDEX to support immediate index build

2010-11-09 Thread Marquis Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang reassigned HIVE-1496:
--

Assignee: Marquis Wang  (was: Russell Melick)

 enhance CREATE INDEX to support immediate index build
 -

 Key: HIVE-1496
 URL: https://issues.apache.org/jira/browse/HIVE-1496
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Marquis Wang
 Fix For: 0.7.0


 Currently we only support WITH DEFERRED REBUILD.




[jira] Updated: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-11-09 Thread Skye Berghel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Skye Berghel updated HIVE-1501:
---

Status: Patch Available  (was: Open)

 when generating reentrant INSERT for index rebuild, quote identifiers using 
 backticks
 -

 Key: HIVE-1501
 URL: https://issues.apache.org/jira/browse/HIVE-1501
 Project: Hive
  Issue Type: Bug
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Skye Berghel
 Fix For: 0.7.0

 Attachments: 1501.patch, 1501_new_tests.patch, 1501_with_tests.patch, 
 HIVE-1501.4.patch, HIVE-1501.5.patch, HIVE-1501.6.patch


 Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
 accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
 to be fixed anyway).
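
 As a rough sketch of what quoting identifiers with backticks in generated
 SQL text can look like (hypothetical helper names, not the code from the
 attached patches), assuming that an identifier containing a backtick is
 simply rejected rather than escaped:

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper for building a quoted column list. */
final class BacktickQuoting {
    /** Wraps one identifier in backticks; rejects embedded backticks. */
    static String quote(String identifier) {
        if (identifier.indexOf('`') >= 0) {
            throw new IllegalArgumentException("backtick in identifier: " + identifier);
        }
        return "`" + identifier + "`";
    }

    /** Joins quoted identifiers with ", " for use in generated query text. */
    static String columnList(List<String> columns) {
        StringJoiner joiner = new StringJoiner(", ");
        for (String column : columns) {
            joiner.add(quote(column));
        }
        return joiner.toString();
    }
}
```

 Quoting this way lets the generated statement refer to columns whose names
 would otherwise clash with keywords.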




[jira] Updated: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-11-09 Thread Skye Berghel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Skye Berghel updated HIVE-1501:
---

Attachment: HIVE-1501.6.patch

Added a new patch with formatting fixes.

 when generating reentrant INSERT for index rebuild, quote identifiers using 
 backticks
 -

 Key: HIVE-1501
 URL: https://issues.apache.org/jira/browse/HIVE-1501
 Project: Hive
  Issue Type: Bug
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Skye Berghel
 Fix For: 0.7.0

 Attachments: 1501.patch, 1501_new_tests.patch, 1501_with_tests.patch, 
 HIVE-1501.4.patch, HIVE-1501.5.patch, HIVE-1501.6.patch


 Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
 accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
 to be fixed anyway).
