[jira] [Updated] (HIVE-4502) NPE - subquery smb joins fails

2013-05-13 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-4502:
-

Attachment: HIVE-4502-1.patch

Hi [~navis]

My attached patch actually retains the SMB join, and I feel it is a better plan 
overall than converting all of the joins to reduce-side joins. It would be 
great if you could take a look and let me know your opinion. All existing unit 
tests pass with this patch.

Thanks
Vikram.

 NPE - subquery smb joins fails
 --

 Key: HIVE-4502
 URL: https://issues.apache.org/jira/browse/HIVE-4502
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Navis
 Attachments: HIVE-4502-1.patch, HIVE-4502.D10695.1.patch, 
 smb_mapjoin_25.q


 Found this issue while running some SMB joins. Attaching test case that 
 causes this error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2564) Set dbname at JDBC URL or properties

2013-05-13 Thread Jin Adachi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655815#comment-13655815
 ] 

Jin Adachi commented on HIVE-2564:
--

I too hope this issue is resolved soon.
If this patch is too big, I'd like to send a new, smaller patch.

 Set dbname at JDBC URL or properties
 

 Key: HIVE-2564
 URL: https://issues.apache.org/jira/browse/HIVE-2564
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.7.1
Reporter: Shinsuke Sugaya
 Attachments: hive-2564.patch


 The current Hive implementation ignores the database name in the JDBC URL, 
 though we can set it by executing a use DBNAME statement.
 I think it would be better to also allow specifying a database name in the JDBC 
 URL or the database properties.
 Therefore, I'll attach the patch.



[jira] [Created] (HIVE-4546) Hive CLI leaves behind the per session resource directory on non-interactive invocation

2013-05-13 Thread Prasad Mujumdar (JIRA)
Prasad Mujumdar created HIVE-4546:
-

 Summary: Hive CLI leaves behind the per session resource directory 
on non-interactive invocation
 Key: HIVE-4546
 URL: https://issues.apache.org/jira/browse/HIVE-4546
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar


As part of HIVE-4505, the resource directory is set to 
/tmp/${hive.session.id}_resources and is supposed to be removed at the end. The CLI 
fails to remove it when invoked with -f or -e (non-interactive mode).



[jira] [Updated] (HIVE-4546) Hive CLI leaves behind the per session resource directory on non-interactive invocation

2013-05-13 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4546:
--

Attachment: HIVE-4546-1.patch

 Hive CLI leaves behind the per session resource directory on non-interactive 
 invocation
 ---

 Key: HIVE-4546
 URL: https://issues.apache.org/jira/browse/HIVE-4546
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-4546-1.patch


 As part of HIVE-4505, the resource directory is set to 
 /tmp/${hive.session.id}_resources and is supposed to be removed at the end. The 
 CLI fails to remove it when invoked with -f or -e (non-interactive mode).



Review Request: HIVE-4546: Hive CLI leaves behind the per session resource directory on non-interactive invocation

2013-05-13 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11083/
---

Review request for hive, Owen O'Malley and Gunther Hagleitner.


Description
---

Hive CLI leaves behind the per-session resource directory on non-interactive 
invocation. The patch executes the session state close() at the end of a 
non-interactive invocation.
It also changes the session id format to a UUID. This avoids possible 
resource directory path conflicts when there are multiple HiveServer2 sessions 
from the same user at the same time.
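The UUID-based session id described above could be sketched as follows (a minimal sketch with hypothetical helper names, not the actual SessionState code):

```java
import java.util.UUID;

public class SessionIdSketch {
    // A random UUID as the session id means two concurrent sessions from
    // the same user cannot collide on the same resource directory name.
    public static String makeSessionId() {
        return UUID.randomUUID().toString();
    }

    // Mirrors the /tmp/${hive.session.id}_resources layout from the issue.
    public static String resourceDir(String sessionId) {
        return "/tmp/" + sessionId + "_resources";
    }
}
```

The directory would then be deleted in the session close() path so that -f/-e invocations clean up after themselves.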


This addresses bug HIVE-4546.
https://issues.apache.org/jira/browse/HIVE-4546


Diffs
-

  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 4239392 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 8e6e24a 

Diff: https://reviews.apache.org/r/11083/diff/


Testing
---


Thanks,

Prasad Mujumdar



[jira] [Commented] (HIVE-4546) Hive CLI leaves behind the per session resource directory on non-interactive invocation

2013-05-13 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13655853#comment-13655853
 ] 

Prasad Mujumdar commented on HIVE-4546:
---

Review request on https://reviews.apache.org/r/11083/

 Hive CLI leaves behind the per session resource directory on non-interactive 
 invocation
 ---

 Key: HIVE-4546
 URL: https://issues.apache.org/jira/browse/HIVE-4546
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-4546-1.patch


 As part of HIVE-4505, the resource directory is set to 
 /tmp/${hive.session.id}_resources and is supposed to be removed at the end. The 
 CLI fails to remove it when invoked with -f or -e (non-interactive mode).



[jira] [Created] (HIVE-4547) A complex create view statement fails with new Antlr 3.4

2013-05-13 Thread Prasad Mujumdar (JIRA)
Prasad Mujumdar created HIVE-4547:
-

 Summary: A complex create view statement fails with new Antlr 3.4
 Key: HIVE-4547
 URL: https://issues.apache.org/jira/browse/HIVE-4547
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0


A complex create view statement with a CAST in the join condition fails with an 
IllegalArgumentException. This is exposed by the Antlr 3.4 upgrade 
(HIVE-2439). The same statement works fine with Hive 0.9.




[jira] [Updated] (HIVE-4547) A complex create view statement fails with new Antlr 3.4

2013-05-13 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4547:
--

Attachment: HIVE-4547-repro.tar

Attached repro script

 A complex create view statement fails with new Antlr 3.4
 

 Key: HIVE-4547
 URL: https://issues.apache.org/jira/browse/HIVE-4547
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-4547-repro.tar


 A complex create view statement with a CAST in the join condition fails with an 
 IllegalArgumentException. This is exposed by the Antlr 3.4 upgrade 
 (HIVE-2439). The same statement works fine with Hive 0.9.



Review Request: HIVE-4547: A complex create view statement fails with new Antlr 3.4

2013-05-13 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11084/
---

Review request for hive and Ashutosh Chauhan.


Description
---

The parser has a translation map where it is possible to replace all the text 
with the appropriate escaped version in the case of a view creation. It holds all 
the individual translations and where they apply in the view definition.
The newer Antlr version seems to be more restrictive and throws an assertion 
error if there are overlaps in these escape positions. The original patch for the 
Antlr upgrade added a check to take care of some of the simpler overlap cases found 
by unit tests. There are a few more scenarios, like the one in the customer case, 
which are not covered.
The patch traverses the list of translations in a loop and looks for all 
the possible overlaps.
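The overlap scan described could be sketched as follows (a simplified, hypothetical model; the real UnparseTranslator tracks translations over character offsets of the view text and handles more cases):

```java
import java.util.List;

public class OverlapCheck {
    // A translation covers a half-open range [start, end) of the original text.
    static class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    // Compare every pair of translation ranges and report whether any two
    // overlap; two half-open ranges overlap iff each starts before the
    // other one ends.
    public static boolean hasOverlap(List<Span> spans) {
        for (int i = 0; i < spans.size(); i++) {
            for (int j = i + 1; j < spans.size(); j++) {
                Span a = spans.get(i), b = spans.get(j);
                if (a.start < b.end && b.start < a.end) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Detecting the overlap up front lets the code merge or drop conflicting translations instead of tripping Antlr 3.4's stricter assertion.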


This addresses bug HIVE-4547.
https://issues.apache.org/jira/browse/HIVE-4547


Diffs
-

  data/files/v1.txt PRE-CREATION 
  data/files/v2.txt PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java ec2c088 
  ql/src/test/queries/clientpositive/view_cast.q PRE-CREATION 
  ql/src/test/results/clientpositive/view_cast.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/11084/diff/


Testing
---

Ran full test suite. 
Added new test.


Thanks,

Prasad Mujumdar



[jira] [Updated] (HIVE-4547) A complex create view statement fails with new Antlr 3.4

2013-05-13 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4547:
--

Attachment: HIVE-4547-1.patch

 A complex create view statement fails with new Antlr 3.4
 

 Key: HIVE-4547
 URL: https://issues.apache.org/jira/browse/HIVE-4547
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-4547-1.patch, HIVE-4547-repro.tar


 A complex create view statement with a CAST in the join condition fails with an 
 IllegalArgumentException. This is exposed by the Antlr 3.4 upgrade 
 (HIVE-2439). The same statement works fine with Hive 0.9.



[jira] [Updated] (HIVE-4547) A complex create view statement fails with new Antlr 3.4

2013-05-13 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4547:
--

Status: Patch Available  (was: Open)

Review request on https://reviews.apache.org/r/11084/

 A complex create view statement fails with new Antlr 3.4
 

 Key: HIVE-4547
 URL: https://issues.apache.org/jira/browse/HIVE-4547
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-4547-1.patch, HIVE-4547-repro.tar


 A complex create view statement with a CAST in the join condition fails with an 
 IllegalArgumentException. This is exposed by the Antlr 3.4 upgrade 
 (HIVE-2439). The same statement works fine with Hive 0.9.



[jira] [Updated] (HIVE-4546) Hive CLI leaves behind the per session resource directory on non-interactive invocation

2013-05-13 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4546:
--

Status: Patch Available  (was: Open)

Review request on https://reviews.apache.org/r/11083/

 Hive CLI leaves behind the per session resource directory on non-interactive 
 invocation
 ---

 Key: HIVE-4546
 URL: https://issues.apache.org/jira/browse/HIVE-4546
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-4546-1.patch


 As part of HIVE-4505, the resource directory is set to 
 /tmp/${hive.session.id}_resources and is supposed to be removed at the end. The 
 CLI fails to remove it when invoked with -f or -e (non-interactive mode).



Request for subscribing to mailing list

2013-05-13 Thread nabhajit ray
Hi,

Please add me to the mailing list

Thanks,
Nabhajit Ray




Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #144

2013-05-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/144/

--
[...truncated 6524 lines...]
ivy-retrieve-hadoop-shim:
 [echo] Project: shims
[javac] Compiling 13 source files to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/shims/classes
[javac] Note: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java
 uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
 [echo] Building shims 0.23

build-shims:
 [echo] Project: shims
 [echo] Compiling 
/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common-secure/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/0.23/java
 against hadoop 2.0.0-alpha 
(/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/hadoopcore/hadoop-2.0.0-alpha)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/ivy/ivysettings.xml
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.0.0-alpha/hadoop-common-2.0.0-alpha-tests.jar ... (1073kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-common;2.0.0-alpha!hadoop-common.jar(tests) (246ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.0.0-alpha/hadoop-common-2.0.0-alpha.jar ... (2051kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-common;2.0.0-alpha!hadoop-common.jar (292ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.0.0-alpha/hadoop-mapreduce-client-core-2.0.0-alpha.jar ... (1314kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-mapreduce-client-core;2.0.0-alpha!hadoop-mapreduce-client-core.jar (218ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-archives/2.0.0-alpha/hadoop-archives-2.0.0-alpha.jar ... (20kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-archives;2.0.0-alpha!hadoop-archives.jar (124ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.0.0-alpha/hadoop-hdfs-2.0.0-alpha.jar ... (3790kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-hdfs;2.0.0-alpha!hadoop-hdfs.jar (260ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.0.0-alpha/hadoop-hdfs-2.0.0-alpha-tests.jar ... (1365kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-hdfs;2.0.0-alpha!hadoop-hdfs.jar(tests) (214ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.0.0-alpha/hadoop-mapreduce-client-jobclient-2.0.0-alpha.jar ... (33kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.0.0-alpha!hadoop-mapreduce-client-jobclient.jar (115ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.0.0-alpha/hadoop-mapreduce-client-jobclient-2.0.0-alpha-tests.jar ...
[ivy:resolve] 

Hive-trunk-h0.21 - Build # 2101 - Still Failing

2013-05-13 Thread Apache Jenkins Server
Changes for Build #2074
[namit] HIVE-4371 some issue with merging join trees
(Navis via namit)

[hashutosh] HIVE-4333 : most windowing tests fail on hadoop 2 (Harish Butani 
via Ashutosh Chauhan)

[namit] HIVE-4342 NPE for query involving UNION ALL with nested JOIN and UNION 
ALL
(Navis via namit)

[hashutosh] HIVE-4364 : beeline always exits with 0 status, should exit with 
non-zero status on error (Rob Weltman via Ashutosh Chauhan)

[hashutosh] HIVE-4130 : Bring the Lead/Lag UDFs interface in line with Lead/Lag 
UDAFs (Harish Butani via Ashutosh Chauhan)


Changes for Build #2075
[hashutosh] HIVE-2379 : Hive/HBase integration could be improved (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-4295 : Lateral view makes invalid result if CP is disabled 
(Navis via Ashutosh Chauhan)

[hashutosh] HIVE-4365 : wrong result in left semi join (Navis via Ashutosh 
Chauhan)

[hashutosh] HIVE-3861 : Upgrade hbase dependency to 0.94 (Gunther Hagleitner 
via Ashutosh Chauhan)


Changes for Build #2076
[hashutosh] HIVE-3891 : physical optimizer changes for auto sort-merge join 
(Namit Jain via Ashutosh Chauhan)

[namit] HIVE-4393 Make the deleteData flag accessable from DropTable/Partition 
events
(Morgan Philips via namit)

[hashutosh] HIVE-4394 : test leadlag.q fails (Ashutosh Chauhan)

[namit] HIVE-4018 MapJoin failing with Distributed Cache error
(Amareshwari Sriramadasu via Namit Jain)


Changes for Build #2077
[namit] HIVE-4300 ant thriftif generated code that is checkedin is not 
up-to-date
(Roshan Naik via namit)


Changes for Build #2078
[namit] HIVE-4409 Prevent incompatible column type changes
(Dilip Joseph via namit)

[namit] HIVE-4095 Add exchange partition in Hive
(Dheeraj Kumar Singh via namit)

[namit] HIVE-4005 Column truncation
(Kevin Wilfong via namit)

[namit] HIVE-3952 merge map-job followed by map-reduce job
(Vinod Kumar Vavilapalli via namit)

[hashutosh] HIVE-4412 : PTFDesc tries serialize transient fields like OIs, etc. 
(Navis via Ashutosh Chauhan)

[khorgath] HIVE-4419 : webhcat - support ${WEBHCAT_PREFIX}/conf/ as config 
directory (Thejas M Nair via Sushanth Sowmyan)

[namit] HIVE-4181 Star argument without table alias for UDTF is not working
(Navis via namit)

[hashutosh] HIVE-4407 : TestHCatStorer.testStoreFuncAllSimpleTypes fails 
because of null case difference (Thejas Nair via Ashutosh Chauhan)

[hashutosh] HIVE-4369 : Many new failures on hadoop 2 (Vikram Dixit via 
Ashutosh Chauhan)


Changes for Build #2079
[namit] HIVE-4424 MetaStoreUtils.java.orig checked in mistakenly by HIVE-4409
(Namit Jain)

[hashutosh] HIVE-4358 : Check for Map side processing in PTFOp is no longer 
valid (Harish Butani via Ashutosh Chauhan)


Changes for Build #2080
[navis] HIVE-4068 Size of aggregation buffer which uses non-primitive type is 
not estimated correctly (Navis)

[khorgath] HIVE-4420 : HCatalog unit tests stop after a failure (Alan Gates via 
Sushanth Sowmyan)

[hashutosh] HIVE-3708 : Add mapreduce workflow information to job configuration 
(Billie Rinaldi via Ashutosh Chauhan)


Changes for Build #2081

Changes for Build #2082
[hashutosh] HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh 
Chauhan)

[hashutosh] HIVE-4398 : HS2 Resource leak: operation handles not cleaned when 
originating session is closed (Ashish Vaidya via Ashutosh Chauhan)

[hashutosh] HIVE-4019 : Ability to create and drop temporary partition function 
(Brock Noland via Ashutosh Chauhan)


Changes for Build #2083
[navis] HIVE-4437 Missing file on HIVE-4068 (Navis)


Changes for Build #2084

Changes for Build #2085

Changes for Build #2086
[hashutosh] HIVE-4350 : support AS keyword for table alias (Matthew Weaver via 
Ashutosh Chauhan)

[hashutosh] HIVE-4439 : Remove unused join configuration parameter: 
hive.mapjoin.cache.numrows (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-4438 : Remove unused join configuration parameter: 
hive.mapjoin.size.key (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-3682 : when output hive table to file,users should could have 
a separator of their own choice (Sushanth Sowmyan via Ashutosh Chauhan)

[hashutosh] HIVE-4373 : Hive Version returned by 
HiveDatabaseMetaData.getDatabaseProductVersion is incorrect (Thejas Nair via 
Ashutosh Chauhan)


Changes for Build #2087

Changes for Build #2088
[gates] HIVE-4465 webhcat e2e tests succeed regardless of exitvalue


Changes for Build #2089
[cws] HIVE-3957. Add pseudo-BNF grammar for RCFile to Javadoc (Mark Grover via 
cws)

[cws] HIVE-4497. beeline module tests don't get run by default (Thejas Nair via 
cws)

[gangtimliu] HIVE-4474: Column access not tracked properly for partitioned 
tables. Samuel Yuan via Gang Tim Liu

[hashutosh] HIVE-4455 : HCatalog build directories get included in tar file 
produced by ant tar (Alan Gates via Ashutosh Chauhan)


Changes for Build #2090

Changes for Build #2091
[hashutosh] HIVE-4392 : Illogical InvalidObjectException throwed when use mulit 
aggregate functions with star 

Re: Review Request: HIVE-4546: Hive CLI leaves behind the per session resource directory on non-interactive invocation

2013-05-13 Thread Owen O'Malley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11083/#review20491
---


I think we should refactor CliDriver.run into the setup part and the part that 
actually runs the commands. If we pull out everything from where the cli object 
is created on down, we can isolate all of the multiple exits to that routine and 
make the ss.close handling more future-proof.

In terms of changing the session id to a UUID, I think that it is better to 
have a human-readable string than a random identifier. Since the current 
session id will be unique per process, maybe we could add a static counter 
that keeps track of how many session ids this process has created and append 
that as a suffix.
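A minimal sketch of the counter-as-suffix idea (hypothetical class and method names, not the actual Hive code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SessionIdCounter {
    // One counter per JVM: each session id gets the next suffix, keeping
    // ids human-readable while still unique within the process.
    private static final AtomicInteger COUNTER = new AtomicInteger(0);

    public static String nextSessionId(String user) {
        return user + "_" + System.currentTimeMillis()
                + "_" + COUNTER.incrementAndGet();
    }
}
```

This keeps the user name and timestamp visible in the resource directory path, at the cost of not protecting against collisions across separate processes the way a random UUID would.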

- Owen O'Malley


On May 13, 2013, 8:50 a.m., Prasad Mujumdar wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/11083/
 ---
 
 (Updated May 13, 2013, 8:50 a.m.)
 
 
 Review request for hive, Owen O'Malley and Gunther Hagleitner.
 
 
 Description
 ---
 
 Hive CLI leaves behind the per-session resource directory on non-interactive 
 invocation. The patch executes the session state close() at the end of a 
 non-interactive invocation.
 It also changes the session id format to a UUID. This avoids possible 
 resource directory path conflicts when there are multiple HiveServer2 sessions 
 from the same user at the same time.
 
 
 This addresses bug HIVE-4546.
 https://issues.apache.org/jira/browse/HIVE-4546
 
 
 Diffs
 -
 
   cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 4239392 
   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 8e6e24a 
 
 Diff: https://reviews.apache.org/r/11083/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prasad Mujumdar
 




[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-13 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656141#comment-13656141
 ] 

Mikhail Bautin commented on HIVE-4525:
--

Correction to the design of this feature (I can't edit comments because of 
permissions, so I'm adding another comment). In case the seconds field needs more 
than 31 bits, the first VInt is {{-1-reversedDecimal}}, regardless of whether 
{{reversedDecimal}} is zero or not.

 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D10755.1.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.



Re: [VOTE] Apache Hive 0.11.0 Release Candidate 2

2013-05-13 Thread Owen O'Malley
On Saturday, I didn't include the Maven staging URLs:

Hive: https://repository.apache.org/content/repositories/orgapachehive-013/
HCatalog:
https://repository.apache.org/content/repositories/orgapachehcatalog-014/

Thanks,
   Owen


On Sat, May 11, 2013 at 10:33 AM, Owen O'Malley omal...@apache.org wrote:

 Based on feedback from everyone, I have respun the release candidate, RC2.
 Please take a look. We've fixed 7 problems with the previous RC:
 * Release notes were incorrect
  * HIVE-4018 - MapJoin failing with Distributed Cache error
  * HIVE-4421 - Improve memory usage by ORC dictionaries
  * HIVE-4500 - Ensure that HiveServer 2 closes log files.
  * HIVE-4494 - ORC map columns get class cast exception in some contexts
  * HIVE-4498 - Fix TestBeeLineWithArgs failure
  * HIVE-4505 - Hive can't load transforms with remote scripts
  * HIVE-4527 - Fix the eclipse template

 Source tag for RC2 is at:

 https://svn.apache.org/repos/asf/hive/tags/release-0.11.0rc2


 Source tar ball and convenience binary artifacts can be found
 at: http://people.apache.org/~omalley/hive-0.11.0rc2/

 This release has many goodies including HiveServer2, integrated
 hcatalog, windowing and analytical functions, decimal data type,
 better query planning, performance enhancements and various bug fixes.
 In total, we resolved more than 350 issues. Full list of fixed issues
 can be found at:  http://s.apache.org/8Fr


 Voting will conclude in 72 hours.

 Hive PMC Members: Please test and vote.

 Thanks,

 Owen



[jira] [Created] (HIVE-4548) Speed up vectorized LIKE filter for special cases abc%, %abc and %abc%

2013-05-13 Thread Eric Hanson (JIRA)
Eric Hanson created HIVE-4548:
-

 Summary: Speed up vectorized LIKE filter for special cases abc%, 
%abc and %abc%
 Key: HIVE-4548
 URL: https://issues.apache.org/jira/browse/HIVE-4548
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Eric Hanson
Assignee: Teddy Choi
Priority: Minor
 Fix For: vectorization-branch


Speed up vectorized LIKE filter evaluation for the abc%, %abc, and %abc% pattern 
special cases (here, abc is just a placeholder for some fixed string).

Problem: The current vectorized LIKE implementation always calls the standard 
LIKE function code in UDFLike.java. But this is pretty expensive. It calls 
multiple functions and allocates at least one new object per call. Probably 80% 
of uses of LIKE are for the simple patterns abc%, %abc, and %abc%. These can 
be implemented much more efficiently.

Start by speeding up the case for 

Column LIKE abc%
  
The goal is to minimize expense in the inner loop. Don't use new() in the 
inner loop; instead, write a static function that checks, as efficiently as 
possible, whether the prefix of the string matches the LIKE pattern, operating 
directly on the byte array holding the UTF-8-encoded string data and avoiding 
unnecessary additional function calls and if/else logic. Call that in the inner 
loop.

If feasible, consider using a template-driven approach, with an instance of the 
template expanded for each of the three cases. Start by doing the abc% (prefix 
match) case by hand, then consider templatizing for the other two cases.

The code is in the vectorization branch of the main Hive repo.
  
Start by checking in the constructor of FilterStringColLikeStringScalar.java 
whether the pattern is one of the simple special cases. If so, record that, and 
have the evaluate() method call a special-case function for each case, i.e. the 
general case and each of the 3 special cases. All the dynamic decision-making 
would be done once per vector, not once per element.
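The allocation-free prefix check described above might look like this (a sketch with hypothetical names; the actual FilterStringColLikeStringScalar code may differ):

```java
public class LikePrefix {
    // Check whether the value at bytes[start .. start+len) begins with the
    // given UTF-8 pattern prefix, comparing byte for byte with no object
    // allocation and no extra function calls: suitable for an inner loop
    // over a column vector.
    public static boolean startsWith(byte[] bytes, int start, int len,
                                     byte[] prefix) {
        if (len < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (bytes[start + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The prefix bytes would be computed once per vector batch (stripping the trailing %), so the per-element work is just this byte comparison; the %abc and %abc% cases would be analogous suffix and substring scans.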




Hive-trunk-hadoop2 - Build # 195 - Still Failing

2013-05-13 Thread Apache Jenkins Server
Changes for Build #169
[hashutosh] HIVE-4333 : most windowing tests fail on hadoop 2 (Harish Butani 
via Ashutosh Chauhan)

[namit] HIVE-4342 NPE for query involving UNION ALL with nested JOIN and UNION 
ALL
(Navis via namit)

[hashutosh] HIVE-4364 : beeline always exits with 0 status, should exit with 
non-zero status on error (Rob Weltman via Ashutosh Chauhan)

[hashutosh] HIVE-4130 : Bring the Lead/Lag UDFs interface in line with Lead/Lag 
UDAFs (Harish Butani via Ashutosh Chauhan)


Changes for Build #170
[hashutosh] HIVE-4295 : Lateral view makes invalid result if CP is disabled 
(Navis via Ashutosh Chauhan)

[hashutosh] HIVE-4365 : wrong result in left semi join (Navis via Ashutosh 
Chauhan)

[hashutosh] HIVE-3861 : Upgrade hbase dependency to 0.94 (Gunther Hagleitner 
via Ashutosh Chauhan)

[namit] HIVE-4371 some issue with merging join trees
(Navis via namit)


Changes for Build #171
[hashutosh] HIVE-2379 : Hive/HBase integration could be improved (Navis via 
Ashutosh Chauhan)


Changes for Build #172
[hashutosh] HIVE-4394 : test leadlag.q fails (Ashutosh Chauhan)

[namit] HIVE-4018 MapJoin failing with Distributed Cache error
(Amareshwari Sriramadasu via Namit Jain)


Changes for Build #173
[namit] HIVE-4300 ant thriftif generated code that is checkedin is not 
up-to-date
(Roshan Naik via namit)

[hashutosh] HIVE-3891 : physical optimizer changes for auto sort-merge join 
(Namit Jain via Ashutosh Chauhan)

[namit] HIVE-4393 Make the deleteData flag accessable from DropTable/Partition 
events
(Morgan Philips via namit)


Changes for Build #174
[khorgath] HIVE-4419 : webhcat - support ${WEBHCAT_PREFIX}/conf/ as config 
directory (Thejas M Nair via Sushanth Sowmyan)

[namit] HIVE-4181 Star argument without table alias for UDTF is not working
(Navis via namit)

[hashutosh] HIVE-4407 : TestHCatStorer.testStoreFuncAllSimpleTypes fails 
because of null case difference (Thejas Nair via Ashutosh Chauhan)

[hashutosh] HIVE-4369 : Many new failures on hadoop 2 (Vikram Dixit via 
Ashutosh Chauhan)


Changes for Build #175
[hashutosh] HIVE-4358 : Check for Map side processing in PTFOp is no longer 
valid (Harish Butani via Ashutosh Chauhan)

[namit] HIVE-4409 Prevent incompatible column type changes
(Dilip Joseph via namit)

[namit] HIVE-4095 Add exchange partition in Hive
(Dheeraj Kumar Singh via namit)

[namit] HIVE-4005 Column truncation
(Kevin Wilfong via namit)

[namit] HIVE-3952 merge map-job followed by map-reduce job
(Vinod Kumar Vavilapalli via namit)

[hashutosh] HIVE-4412 : PTFDesc tries serialize transient fields like OIs, etc. 
(Navis via Ashutosh Chauhan)


Changes for Build #176
[hashutosh] HIVE-3708 : Add mapreduce workflow information to job configuration 
(Billie Rinaldi via Ashutosh Chauhan)

[namit] HIVE-4424 MetaStoreUtils.java.orig checked in mistakenly by HIVE-4409
(Namit Jain)


Changes for Build #177
[navis] HIVE-4068 Size of aggregation buffer which uses non-primitive type is 
not estimated correctly (Navis)

[khorgath] HIVE-4420 : HCatalog unit tests stop after a failure (Alan Gates via 
Sushanth Sowmyan)


Changes for Build #178

Changes for Build #179
[hashutosh] HIVE-4423 : Improve RCFile::sync(long) 10x (Gopal V via Ashutosh 
Chauhan)

[hashutosh] HIVE-4398 : HS2 Resource leak: operation handles not cleaned when 
originating session is closed (Ashish Vaidya via Ashutosh Chauhan)

[hashutosh] HIVE-4019 : Ability to create and drop temporary partition function 
(Brock Noland via Ashutosh Chauhan)


Changes for Build #180
[navis] HIVE-4437 Missing file on HIVE-4068 (Navis)


Changes for Build #181

Changes for Build #182

Changes for Build #183
[hashutosh] HIVE-4350 : support AS keyword for table alias (Matthew Weaver via 
Ashutosh Chauhan)

[hashutosh] HIVE-4439 : Remove unused join configuration parameter: 
hive.mapjoin.cache.numrows (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-4438 : Remove unused join configuration parameter: 
hive.mapjoin.size.key (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-3682 : when output hive table to file,users should could have 
a separator of their own choice (Sushanth Sowmyan via Ashutosh Chauhan)

[hashutosh] HIVE-4373 : Hive Version returned by 
HiveDatabaseMetaData.getDatabaseProductVersion is incorrect (Thejas Nair via 
Ashutosh Chauhan)


Changes for Build #184

Changes for Build #185

Changes for Build #186

Changes for Build #187

Changes for Build #188
[hashutosh] HIVE-4466 : Fix continue.on.failure in unit tests to -well- 
continue on failure in unit tests (Gunther Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-4471 : Build fails with hcatalog checkstyle error (Gunther 
Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-4392 : Illogical InvalidObjectException throwed when use mulit 
aggregate functions with star columns  (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-4421 : Improve memory usage by ORC dictionaries (Owen Omalley 
via Ashutosh Chauhan)

[mithun] HCATALOG-627 - Adding thread-safety to 

[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-13 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656226#comment-13656226
 ] 

Eric Hanson commented on HIVE-4525:
---

For vectorized query execution (HIVE-4160), we are going to represent a 
timestamp value internally as a vector of 64 bit integers representing the 
number of nanos since the epoch (in 1970). Given your proposal to also support 
time values before 1970, I'd propose that for vectorized QE we extend this so a 
negative number of nanos is used to represent a value before 1970. This gives a 
range of 292 years before or after 1970, good enough for practical purposes. 
Data outside that range might initially be unsupported for vectorized QE, and 
later be supported via a slower code path.

We may want to consider having the storage layer (say ORC) store timestamps 
simply as a long, so it is not as expensive to flow this data into vectorized 
query execution. With compression, these long values will compress pretty well, 
so the storage layout becomes less of a concern and query execution speed 
becomes the more pressing issue.
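The 292-year figure can be checked with plain arithmetic; a minimal sketch (illustrative Java, not Hive code):

```java
// Back-of-the-envelope check of the range of a signed 64-bit nanosecond
// counter relative to the 1970 epoch. Illustrative only, not Hive code.
public class NanoRange {
    static final long NANOS_PER_SECOND = 1_000_000_000L;
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 3600;

    // Years representable on each side of the epoch by a signed long of nanos.
    static double representableYears() {
        return Long.MAX_VALUE / (double) NANOS_PER_SECOND / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        // Prints roughly 292 years on each side of 1970.
        System.out.printf("~%.0f years each side of 1970%n", representableYears());
    }
}
```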

 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D10755.1.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.



[jira] [Updated] (HIVE-4160) Vectorized Query Execution in Hive

2013-05-13 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4160:
--

Attachment: Hive-Vectorized-Query-Execution-Design-rev7.docx

Added discussion of timestamp values before the epoch (in 1970) related to 
HIVE-4525.

 Vectorized Query Execution in Hive
 --

 Key: HIVE-4160
 URL: https://issues.apache.org/jira/browse/HIVE-4160
 Project: Hive
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Hive-Vectorized-Query-Execution-Design.docx, 
 Hive-Vectorized-Query-Execution-Design-rev2.docx, 
 Hive-Vectorized-Query-Execution-Design-rev3.docx, 
 Hive-Vectorized-Query-Execution-Design-rev3.docx, 
 Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev4.docx, 
 Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev5.docx, 
 Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev6.docx, 
 Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
 Hive-Vectorized-Query-Execution-Design-rev7.docx


 The Hive query execution engine currently processes one row at a time. A 
 single row of data goes through all the operators before the next row can be 
 processed. This mode of processing is very inefficient in terms of CPU usage. 
 Research has demonstrated that this yields very low instructions per cycle 
 [MonetDB X100]. Also currently Hive heavily relies on lazy deserialization 
 and data columns go through a layer of object inspectors that identify column 
 type, deserialize data and determine appropriate expression routines in the 
 inner loop. These layers of virtual method calls further slow down the 
 processing. 
 This work will add support for vectorized query execution to Hive, where, 
 instead of individual rows, batches of about a thousand rows at a time are 
 processed. Each column in the batch is represented as a vector of a primitive 
 data type. The inner loop of execution scans these vectors very fast, 
 avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
 substantially reduces CPU time used, and gives excellent instructions per 
 cycle (i.e. improved processor pipeline utilization). See the attached design 
 specification for more details.
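The batch-at-a-time inner loop described above can be sketched as follows. This is a minimal illustration of the technique with made-up names, not Hive's actual vectorization classes:

```java
// Minimal sketch of column-vector batch processing: each column is a
// primitive array, and the inner loop runs over the whole batch with no
// per-row virtual calls or object allocation. Class and method names are
// illustrative, not Hive's vectorization API.
public class VectorSketch {
    static final int BATCH_SIZE = 1024;  // roughly "a thousand rows at a time"

    // Adds two long columns element-wise into an output column.
    static void addColumns(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = a[i] + b[i];  // tight loop; easy for the JIT to pipeline
        }
    }

    public static void main(String[] args) {
        long[] a = new long[BATCH_SIZE];
        long[] b = new long[BATCH_SIZE];
        long[] out = new long[BATCH_SIZE];
        for (int i = 0; i < BATCH_SIZE; i++) { a[i] = i; b[i] = 2L * i; }
        addColumns(a, b, out, BATCH_SIZE);
        System.out.println(out[10]); // 30
    }
}
```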



[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-13 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656271#comment-13656271
 ] 

Mikhail Bautin commented on HIVE-4525:
--

[~ehans]: switching to long nanosecond timestamps would definitely be a much 
nicer solution, but don't you think it would break backward-compatibility for 
timestamps serialized using the old format?
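For reference, the "certain point in 2038" in the issue title falls out of a 31-bit seconds count: the largest representable value is 2^31 - 1 seconds after the epoch. A quick check (illustrative, not TimestampWritable's code):

```java
import java.time.Instant;

// Demonstrates the 2038 limit: 2^31 - 1 seconds past the 1970 epoch.
// Illustrative only; not TimestampWritable's implementation.
public class Y2038 {
    static Instant maxSecondsTimestamp() {
        return Instant.ofEpochSecond(Integer.MAX_VALUE); // 2^31 - 1 seconds
    }

    public static void main(String[] args) {
        System.out.println(maxSecondsTimestamp()); // 2038-01-19T03:14:07Z
    }
}
```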

 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D10755.1.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.



[jira] [Commented] (HIVE-4510) HS2 doesn't nest exceptions properly (fun debug times)

2013-05-13 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656302#comment-13656302
 ] 

Thejas M Nair commented on HIVE-4510:
-

I am running the full hive unit test suite on this patch. I will update when it 
is done.


 HS2 doesn't nest exceptions properly (fun debug times)
 --

 Key: HIVE-4510
 URL: https://issues.apache.org/jira/browse/HIVE-4510
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Reporter: Gunther Hagleitner
Assignee: Thejas M Nair
 Attachments: HIVE-4510.1.patch, HIVE-4510.2.patch


 In SQLOperation.java lines 97 + 113 for instance, we catch errors and throw a 
 new HiveSQLException, but we don't wrap the original exception.
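The wrapping being asked for is ordinary cause chaining; a minimal sketch using RuntimeException as a stand-in for HiveSQLException (not the actual SQLOperation code):

```java
// Illustrates cause chaining: passing the original exception as the cause
// preserves its stack trace in the wrapped exception. RuntimeException
// stands in here for HiveSQLException; this is not the SQLOperation code.
public class WrapDemo {
    static RuntimeException wrap(Exception original) {
        // The two-argument constructor keeps `original` reachable via getCause().
        return new RuntimeException("query failed", original);
    }

    public static void main(String[] args) {
        Exception root = new IllegalStateException("driver error");
        RuntimeException wrapped = wrap(root);
        System.out.println(wrapped.getCause().getMessage()); // driver error
    }
}
```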



[jira] [Updated] (HIVE-4540) JOIN-GRP BY-DISTINCT fails with NPE when mapjoin.mapreduce=true

2013-05-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4540:
-

Status: Patch Available  (was: Open)

 JOIN-GRP BY-DISTINCT fails with NPE when mapjoin.mapreduce=true
 ---

 Key: HIVE-4540
 URL: https://issues.apache.org/jira/browse/HIVE-4540
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-4540.1.patch


 If the mapjoin.mapreduce optimization kicks in on a query of this form:
 {noformat}
 select count(distinct a.v) 
 from a join b on (a.k = b.k)
 group by a.g
 {noformat}
 The planner will NPE in the metadataonly optimizer.



Re: Review Request: HIVE-4513 - disable hivehistory logs by default

2013-05-13 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11029/
---

(Updated May 13, 2013, 8:13 p.m.)


Review request for hive.


Summary (updated)
-

HIVE-4513 - disable hivehistory logs by default


Description
---

HIVE-4513


This addresses bug HIVE-4513.
https://issues.apache.org/jira/browse/HIVE-4513


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1672453 
  conf/hive-default.xml.template 3a7d1dc 
  data/conf/hive-site.xml 544ba35 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java e1c1ae3 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryProxyHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryUtil.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java fdd56db 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 3d43451 
  ql/src/test/org/apache/hadoop/hive/ql/history/TestHiveHistory.java a783303 

Diff: https://reviews.apache.org/r/11029/diff/


Testing
---


Thanks,

Thejas Nair



[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs

2013-05-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4531:
-

Attachment: HIVE-4531-4.patch

Adding documentation.

 [WebHCat] Collecting task logs to hdfs
 --

 Key: HIVE-4531
 URL: https://issues.apache.org/jira/browse/HIVE-4531
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, 
 HIVE-4531-4.patch


 It would be nice if we collected task logs after the job finishes. This is 
 similar to what Amazon EMR does.



[jira] [Created] (HIVE-4549) JDBC compliance change

2013-05-13 Thread Johndee Burks (JIRA)
Johndee Burks created HIVE-4549:
---

 Summary: JDBC compliance change
 Key: HIVE-4549
 URL: https://issues.apache.org/jira/browse/HIVE-4549
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.10.0
 Environment: Hive 0.10
Reporter: Johndee Burks
Priority: Trivial


The ResultSet returned by HiveDatabaseMetadata.getTables has the metadata 
columns TABLE_CAT, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, REMARKS. The second 
column name is not compliant with the JDBC standard 
(http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html#getSchemas()):
 the column name should be TABLE_SCHEM instead of TABLE_SCHEMA.

Suggested fix in Hive 
(org.apache.hive.service.cli.operation.GetTablesOperation.java) change from

private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() 
.addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") 
.addStringColumn("TABLE_SCHEMA", "Schema name.") 
.addStringColumn("TABLE_NAME", "Table name.") 
.addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", 
etc.") 
.addStringColumn("REMARKS", "Comments about the table.");

to

private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() 
.addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") 
.addStringColumn("TABLE_SCHEM", "Schema name.") 
.addStringColumn("TABLE_NAME", "Table name.") 
.addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", 
etc.") 
.addStringColumn("REMARKS", "Comments about the table.");



[jira] [Updated] (HIVE-4549) JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM

2013-05-13 Thread Johndee Burks (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johndee Burks updated HIVE-4549:


Summary: JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM  (was: JDBC 
compliance change)

 JDBC compliance change TABLE_SCHEMA to TABLE_SCHEM
 --

 Key: HIVE-4549
 URL: https://issues.apache.org/jira/browse/HIVE-4549
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.10.0
 Environment: Hive 0.10
Reporter: Johndee Burks
Priority: Trivial
  Labels: newbie

 The ResultSet returned by HiveDatabaseMetadata.getTables has the metadata 
 columns TABLE_CAT, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, REMARKS. The second 
 column name is not compliant with the JDBC standard 
 (http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html#getSchemas()):
  the column name should be TABLE_SCHEM instead of TABLE_SCHEMA.
 Suggested fix in Hive 
 (org.apache.hive.service.cli.operation.GetTablesOperation.java) change from
 private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() 
 .addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") 
 .addStringColumn("TABLE_SCHEMA", "Schema name.") 
 .addStringColumn("TABLE_NAME", "Table name.") 
 .addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", 
 etc.") 
 .addStringColumn("REMARKS", "Comments about the table.");
 to
 private static final TableSchema RESULT_SET_SCHEMA = new TableSchema() 
 .addStringColumn("TABLE_CAT", "Catalog name. NULL if not applicable.") 
 .addStringColumn("TABLE_SCHEM", "Schema name.") 
 .addStringColumn("TABLE_NAME", "Table name.") 
 .addStringColumn("TABLE_TYPE", "The table type, e.g. \"TABLE\", \"VIEW\", 
 etc.") 
 .addStringColumn("REMARKS", "Comments about the table.");



[jira] [Created] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions

2013-05-13 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-4550:


 Summary: local_mapred_error_cache fails on some hadoop versions
 Key: HIVE-4550
 URL: https://issues.apache.org/jira/browse/HIVE-4550
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Minor


I've tested it manually on the upcoming 1.3 version (branch 1).

We do mask job_* ids, but not job_local* ids. The fix is to extend this to both.
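The fix can be sketched as a single masking pattern covering both distributed and local job ids. The regex below is an assumption about the shape of the fix, not the actual patch:

```java
import java.util.regex.Pattern;

// Sketch of masking both distributed and local MapReduce job ids in test
// output. The regex is an assumption about the fix, not the actual patch.
public class JobIdMask {
    // Matches job_201305131200_0001 as well as job_local_0003 / job_local123.
    static final Pattern JOB_ID = Pattern.compile("job_(local_?)?\\d+(_\\d+)?");

    static String mask(String line) {
        return JOB_ID.matcher(line).replaceAll("job_#ID#");
    }

    public static void main(String[] args) {
        System.out.println(mask("WARN mapred.LocalJobRunner: job_local_0003"));
        System.out.println(mask("Submitted job_201305131200_0001"));
        // Both lines print with the id replaced by job_#ID#.
    }
}
```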



[jira] [Updated] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions

2013-05-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4550:
-

Attachment: HIVE-4550.1.patch

 local_mapred_error_cache fails on some hadoop versions
 --

 Key: HIVE-4550
 URL: https://issues.apache.org/jira/browse/HIVE-4550
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Minor
 Attachments: HIVE-4550.1.patch


 I've tested it manually on the upcoming 1.3 version (branch 1).
 We do mask job_* ids, but not job_local* ids. The fix is to extend this to 
 both.



[jira] [Created] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int

2013-05-13 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-4551:
--

 Summary: ORC - HCatLoader integration has issues with 
smallint/tinyint promotions to Int
 Key: HIVE-4551
 URL: https://issues.apache.org/jira/browse/HIVE-4551
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


This was initially reported from an e2e test run, with the following E2E test:

{code}
{
'name' => 'Hadoop_ORC_Write',
'tests' => [
{
 'num' => 1
,'hcat_prep' => q\
drop table if exists hadoop_orc;
create table hadoop_orc (
t tinyint,
si smallint,
i int,
b bigint,
f float,
d double,
s string)
stored as orc;\
,'hadoop' => q\
jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars 
:HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
,'result_table' => 'hadoop_orc'
,'sql' => q\select * from all100k;\
,'floatpostprocess' => 1
,'delimiter' => '   '
},
   ],
},
{code}

This fails with the following error:

{code}
2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
converting read value to tuple
at 
org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at 
org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to 
org.apache.hadoop.io.IntWritable
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
at 
org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
at 
org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
at 
org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
at 
org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
... 12 more
2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
for the task
{code}



[jira] [Commented] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int

2013-05-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656384#comment-13656384
 ] 

Sushanth Sowmyan commented on HIVE-4551:


The problem here is that the raw data encapsulated by HCatRecord and HCatSchema 
are out of synch, which was one of my worries back in HCATALOG-425 : 
https://issues.apache.org/jira/browse/HCATALOG-425?focusedCommentId=13439652page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13439652

Basically, the raw data contained in the smallint/tinyint columns consists of 
raw shorts and bytes, and we try to read it as an int. In the case of rcfile, 
the underlying raw data is also stored as an IntWritable in the cases of 
smallint and tinyint, but not so in the case of orc. This leads to the 
following kinds of calls in the rcfile case and in the orc case:

RCFILE:
{noformat}
13/05/11 02:56:10 INFO mapreduce.InternalUtil: Initializing 
org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe with properties 
{transient_lastDdlTime=1368266162, serialization.null.format=\N, 
columns=ti,si,i,bi,f,d,b, serialization.format=1, 
columns.types=int,int,int,bigint,float,double,boolean}
== org.apache.hadoop.hive.serde2.lazy.LazyInteger:-3
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
== org.apache.hadoop.hive.serde2.lazy.LazyInteger:9001
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
== org.apache.hadoop.hive.serde2.lazy.LazyInteger:86400
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
== org.apache.hadoop.hive.serde2.lazy.LazyLong:4294967297
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint
== org.apache.hadoop.hive.serde2.lazy.LazyFloat:34.532
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float
== org.apache.hadoop.hive.serde2.lazy.LazyDouble:2.184239842983489E15
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double
== org.apache.hadoop.hive.serde2.lazy.LazyBoolean:true
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean
== org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
== org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
== org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
== org.apache.hadoop.hive.serde2.lazy.LazyLong:0
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint
== org.apache.hadoop.hive.serde2.lazy.LazyFloat:0.0
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float
== org.apache.hadoop.hive.serde2.lazy.LazyDouble:0.0
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double
== org.apache.hadoop.hive.serde2.lazy.LazyBoolean:false
== 
org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean
{noformat}

ORC:
{noformat}
13/05/11 02:56:16 INFO mapreduce.InternalUtil: Initializing 
org.apache.hadoop.hive.ql.io.orc.OrcSerde with properties 
{transient_lastDdlTime=1368266162, serialization.null.format=\N, 
columns=ti,si,i,bi,f,d,b, serialization.format=1, 
columns.types=int,int,int,bigint,float,double,boolean}
== org.apache.hadoop.hive.serde2.io.ByteWritable:-3
== 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector:int
13/05/11 02:56:16 WARN mapred.LocalJobRunner: job_local_0003
org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
converting read value to tuple
at 
org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at 
org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to 
org.apache.hadoop.io.IntWritable
at 

[jira] [Commented] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int

2013-05-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656388#comment-13656388
 ] 

Sushanth Sowmyan commented on HIVE-4551:


I'm attaching a patch for this, by doing the following:

a) Removing promotion logic from HCatSchema, keeping that pure so it reflects 
the table type.
b) Doing the conversion to the appropriate Pig types inside PigHCatUtil. This 
breaks Travis' original intent of having HCatRecord/HCatSchema do promotions 
for all M/R programs, but given that there was a bug in that conversion anyway, 
this is not a backward-incompatible breakage.
c) If we intend to add back that support, then the correct way to do it, imo, 
is to add the promotion to HCatRecord's accessors, but leave HCatSchema alone.
d) I've also added a new test case to mimic the e2e test that failed, so we 
can build on it from now on. I've also refactored more Loader/Storer tests to 
run against orc as well.
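The promotion in (b) amounts to plain numeric widening from the narrower type to int. A minimal sketch using boxed Java types in place of Hadoop's ByteWritable/ShortWritable (illustrative only, not the actual PigHCatUtil code):

```java
// Sketch of smallint/tinyint -> int promotion at the Pig boundary, using
// boxed Java types in place of Hadoop's ByteWritable/ShortWritable. This
// illustrates the widening, not the actual PigHCatUtil code.
public class Promote {
    static Object promoteToInt(Object value) {
        if (value instanceof Byte)  return ((Byte) value).intValue();   // tinyint
        if (value instanceof Short) return ((Short) value).intValue();  // smallint
        return value;  // ints and everything else pass through unchanged
    }

    public static void main(String[] args) {
        System.out.println(promoteToInt((byte) -3));    // -3
        System.out.println(promoteToInt((short) 9001)); // 9001
    }
}
```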



 ORC - HCatLoader integration has issues with smallint/tinyint promotions to 
 Int
 ---

 Key: HIVE-4551
 URL: https://issues.apache.org/jira/browse/HIVE-4551
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

 This was initially reported from an e2e test run, with the following E2E test:
 {code}
 {
 'name' => 'Hadoop_ORC_Write',
 'tests' => [
 {
  'num' => 1
 ,'hcat_prep' => q\
 drop table if exists hadoop_orc;
 create table hadoop_orc (
 t tinyint,
 si smallint,
 i int,
 b bigint,
 f float,
 d double,
 s string)
 stored as orc;\
 ,'hadoop' => q\
 jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars 
 :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
 ,'result_table' => 'hadoop_orc'
 ,'sql' => q\select * from all100k;\
 ,'floatpostprocess' => 1
 ,'delimiter' => '   '
 },
    ],
 },
 {code}
 This fails with the following error:
 {code}
 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
   at 
 org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
   at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
   at 
 org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to 
 org.apache.hadoop.io.IntWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
   at 
 org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
   at 
 org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
   ... 12 more
 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {code}



[jira] [Updated] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int

2013-05-13 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-4551:
---

Attachment: 4551.patch

(patch attached)

 ORC - HCatLoader integration has issues with smallint/tinyint promotions to 
 Int
 ---

 Key: HIVE-4551
 URL: https://issues.apache.org/jira/browse/HIVE-4551
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: 4551.patch


 This was initially reported from an e2e test run, with the following E2E test:
 {code}
 {
 'name' => 'Hadoop_ORC_Write',
 'tests' => [
 {
  'num' => 1
 ,'hcat_prep' => q\
 drop table if exists hadoop_orc;
 create table hadoop_orc (
 t tinyint,
 si smallint,
 i int,
 b bigint,
 f float,
 d double,
 s string)
 stored as orc;\
 ,'hadoop' => q\
 jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars 
 :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
 ,'result_table' => 'hadoop_orc'
 ,'sql' => q\select * from all100k;\
 ,'floatpostprocess' => 1
 ,'delimiter' => '   '
 },
    ],
 },
 {code}
 This fails with the following error:
 {code}
 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
   at 
 org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
   at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
   at 
 org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to 
 org.apache.hadoop.io.IntWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
   at 
 org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
   at 
 org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
   ... 12 more
 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {code}



[jira] [Commented] (HIVE-4551) ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int

2013-05-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656393#comment-13656393
 ] 

Sushanth Sowmyan commented on HIVE-4551:


[~traviscrawford], could you please have a look at this?

 ORC - HCatLoader integration has issues with smallint/tinyint promotions to 
 Int
 ---

 Key: HIVE-4551
 URL: https://issues.apache.org/jira/browse/HIVE-4551
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: 4551.patch


 This was initially reported from an e2e test run, with the following E2E test:
 {code}
 {
 'name' => 'Hadoop_ORC_Write',
 'tests' => [
 {
  'num' => 1
 ,'hcat_prep' => q\
 drop table if exists hadoop_orc;
 create table hadoop_orc (
 t tinyint,
 si smallint,
 i int,
 b bigint,
 f float,
 d double,
 s string)
 stored as orc;\
 ,'hadoop' => q\
 jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars 
 :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
 ,'result_table' => 'hadoop_orc'
 ,'sql' => q\select * from all100k;\
 ,'floatpostprocess' => 1
 ,'delimiter' => '   '
 },
],
 },
 {code}
 This fails with the following error:
 {code}
 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
   at 
 org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
   at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
   at 
 org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to 
 org.apache.hadoop.io.IntWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
   at 
 org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
   at 
 org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
   ... 12 more
 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {code}



Re: Review Request: HIVE-4513 - disable hivehistory logs by default

2013-05-13 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11029/
---

(Updated May 13, 2013, 9:51 p.m.)


Review request for hive.


Changes
---

Changes in the new patch:
- Add @Override to interface methods implemented in HiveHistoryImpl.
- Remove javadoc duplication in HiveHistoryImpl; it automatically inherits the 
documentation from the interface.
- Log the exception in code unrelated to the patch, to partly address Brock's 
concern. Since that code is not part of the patch, I don't want to widen the 
scope to fully address it.


Description
---

HIVE-4513


This addresses bug HIVE-4513.
https://issues.apache.org/jira/browse/HIVE-4513


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1672453 
  conf/hive-default.xml.template 3a7d1dc 
  data/conf/hive-site.xml 544ba35 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java e1c1ae3 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryProxyHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryUtil.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java fdd56db 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 3d43451 
  ql/src/test/org/apache/hadoop/hive/ql/history/TestHiveHistory.java a783303 

Diff: https://reviews.apache.org/r/11029/diff/


Testing
---


Thanks,

Thejas Nair



[jira] [Updated] (HIVE-4513) disable hivehistory logs by default

2013-05-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4513:


Description: 
HiveHistory log files (hive_job_log_hive_*.txt files) store information about 
Hive queries, such as the query string, plan, counters, and MR job progress.

There is no mechanism to delete these files, and as a result they accumulate 
over time, using up a lot of disk space.
I don't think this feature is used by most people, so it would be better to 
turn it off by default. Jobtracker logs already capture most of this 
information, though they are not as structured as the history logs.



  was:
HiveHistory log files (hive_job_log_hive_*.txt files) store information about 
hive query such as query string, plan , counters and MR job progress 
information.

There is no mechanism to delete these files and as a result they get 
accumulated over time, using up lot of disk space. 
I don't think this is used by most people, so I think it would better to turn 
this off by default. Jobtracker logs already capture most of this information, 
though it is not as structured as history logs.

HIVE-4500 is introducing a new config parameter to turn this off, we should use 
that to turn this off by default.



 disable hivehistory logs by default
 ---

 Key: HIVE-4513
 URL: https://issues.apache.org/jira/browse/HIVE-4513
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Logging
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4513.1.patch, HIVE-4513.2.patch, HIVE-4513.3.patch, 
 HIVE-4513.4.patch


 HiveHistory log files (hive_job_log_hive_*.txt files) store information about 
 hive query such as query string, plan , counters and MR job progress 
 information.
 There is no mechanism to delete these files and as a result they get 
 accumulated over time, using up lot of disk space. 
 I don't think this is used by most people, so I think it would better to turn 
 this off by default. Jobtracker logs already capture most of this 
 information, though it is not as structured as history logs.



Re: Review Request: HIVE-4513 - disable hivehistory logs by default

2013-05-13 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11029/
---

(Updated May 13, 2013, 10:12 p.m.)


Review request for hive.


Changes
---

Updating review with background of the changes.


Description (updated)
---

HiveHistory log files (hive_job_log_hive_*.txt files) store information about 
Hive queries, such as the query string, plan, counters, and MR job progress.

There is no mechanism to delete these files, and as a result they accumulate 
over time, using up a lot of disk space.
I don't think this feature is used by most people, so it would be better to 
turn it off by default. Jobtracker logs already capture most of this 
information, though they are not as structured as the history logs.

The change:
A new config parameter, hive.session.history.enabled, controls whether the 
history log is enabled. By default it is set to false.
SessionState initializes the HiveHistory object. When this config is set to 
false, it creates a Proxy object that does nothing. I did this instead of 
having SessionState return null, because that would add null checks in too 
many places. This keeps the code cleaner and avoids the possibility of an NPE.
As the proxy only works against interfaces, I created a HiveHistory interface, 
moved the implementation to HiveHistoryImpl, and moved the static functions to 
HiveHistoryUtil.
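The do-nothing-Proxy approach described above can be sketched with java.lang.reflect.Proxy. The `History` interface and its method names below are hypothetical stand-ins, not Hive's actual HiveHistory API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for the HiveHistory interface (not the real API).
interface History {
  void logQuery(String query);
  String getHistFileName();
}

public class NoOpProxyDemo {
  // Builds a proxy whose methods all do nothing, so callers never need
  // null checks when history logging is disabled.
  @SuppressWarnings("unchecked")
  static <T> T noOp(Class<T> iface) {
    InvocationHandler handler = (proxy, method, args) -> {
      Class<?> rt = method.getReturnType();
      if (rt == boolean.class) return false;    // safe default for booleans
      if (rt.isPrimitive() && rt != void.class) return 0;
      return null;                              // void and reference types
    };
    return (T) Proxy.newProxyInstance(iface.getClassLoader(),
        new Class<?>[] { iface }, handler);
  }

  public static void main(String[] args) {
    History h = noOp(History.class);
    h.logQuery("select 1");                  // silently ignored, no NPE
    System.out.println(h.getHistFileName()); // prints null
  }
}
```

When the config is enabled, SessionState would instead instantiate the real implementation behind the same interface; callers are unaffected either way.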


This addresses bug HIVE-4513.
https://issues.apache.org/jira/browse/HIVE-4513


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1672453 
  conf/hive-default.xml.template 3a7d1dc 
  data/conf/hive-site.xml 544ba35 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java e1c1ae3 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryProxyHandler.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryUtil.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java fdd56db 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 3d43451 
  ql/src/test/org/apache/hadoop/hive/ql/history/TestHiveHistory.java a783303 

Diff: https://reviews.apache.org/r/11029/diff/


Testing
---


Thanks,

Thejas Nair



Re: Review Request: HIVE-4513 - disable hivehistory logs by default

2013-05-13 Thread Thejas Nair


 On May 9, 2013, 4:37 p.m., Brock Noland wrote:
  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java, lines 
  71-73
  https://reviews.apache.org/r/11029/diff/1/?file=289274#file289274line71
 
  This is bad... I know it's not related to your change but can we fix 
  this?

I have made things slightly better by logging the error. I looked at throwing 
an exception, but that would require changes in other classes to handle the 
exception correctly (such as the Hive web interface classes). Since this code 
is unrelated to the patch, and it is not a 1-2 line change, I think we should 
address it separately.


- Thejas


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11029/#review20380
---


On May 13, 2013, 10:12 p.m., Thejas Nair wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/11029/
 ---
 
 (Updated May 13, 2013, 10:12 p.m.)
 
 
 Review request for hive.
 
 
 Description
 ---
 
 HiveHistory log files (hive_job_log_hive_*.txt files) store information about 
 hive query such as query string, plan , counters and MR job progress 
 information.
 
 There is no mechanism to delete these files and as a result they get 
 accumulated over time, using up lot of disk space. 
 I don't think this is used by most people, so I think it would better to turn 
 this off by default. Jobtracker logs already capture most of this 
 information, though it is not as structured as history logs.
 
 The change :
 A new config parameter hive.session.history.enabled controls if the 
 history-log is enabled. By default it is set to false.
 SessionState initializes the HiveHIstory object. When this config is set to 
 false, it creates a Proxy object that does not do anything. I did this 
 instead of having SessionState return null, because that would add null 
 checks in too many places. This keeps the code cleaner and avoids possibility 
 of NPE.
 As the proxy only works against interfaces, i created a HiveHistory 
 interface, moved the implementation to HiveHistoryImpl. static functions were 
 moved to HiveHistoryUtil .
 
 
 This addresses bug HIVE-4513.
 https://issues.apache.org/jira/browse/HIVE-4513
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1672453 
   conf/hive-default.xml.template 3a7d1dc 
   data/conf/hive-site.xml 544ba35 
   ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java e1c1ae3 
   ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryProxyHandler.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryUtil.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java 
 fdd56db 
   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 3d43451 
   ql/src/test/org/apache/hadoop/hive/ql/history/TestHiveHistory.java a783303 
 
 Diff: https://reviews.apache.org/r/11029/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Thejas Nair
 




[jira] [Updated] (HIVE-4551) HCatLoader smallint/tinyint promotions to Int have issues with ORC integration

2013-05-13 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-4551:
---

Summary: HCatLoader smallint/tinyint promotions to Int have issues with ORC 
integration  (was: ORC - HCatLoader integration has issues with 
smallint/tinyint promotions to Int)

 HCatLoader smallint/tinyint promotions to Int have issues with ORC integration
 --

 Key: HIVE-4551
 URL: https://issues.apache.org/jira/browse/HIVE-4551
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: 4551.patch


 This was initially reported from an e2e test run, with the following E2E test:
 {code}
 {
 'name' => 'Hadoop_ORC_Write',
 'tests' => [
 {
  'num' => 1
 ,'hcat_prep' => q\
 drop table if exists hadoop_orc;
 create table hadoop_orc (
 t tinyint,
 si smallint,
 i int,
 b bigint,
 f float,
 d double,
 s string)
 stored as orc;\
 ,'hadoop' => q\
 jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars 
 :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
 ,'result_table' => 'hadoop_orc'
 ,'sql' => q\select * from all100k;\
 ,'floatpostprocess' => 1
 ,'delimiter' => '   '
 },
],
 },
 {code}
 This fails with the following error:
 {code}
 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
   at 
 org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
   at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
   at 
 org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to 
 org.apache.hadoop.io.IntWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
   at 
 org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
   at 
 org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
   ... 12 more
 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 {code}



[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-05-13 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656479#comment-13656479
 ] 

Eric Hanson commented on HIVE-4525:
---

Yes, so you'd have to support both at least for an extended period of time. It 
would be a performance enhancement and you'd need to maintain backward 
compatibility for older data.

 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D10755.1.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.
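The 2038 bound follows directly from 31 bits of seconds past the epoch; a small sketch (not Hive code) shows where the limit lands:

```java
import java.time.Instant;

public class TimestampRangeDemo {
  public static void main(String[] args) {
    // Largest value representable in 31 bits: 2^31 - 1 seconds past the epoch.
    long maxSeconds = Integer.MAX_VALUE;
    System.out.println(Instant.ofEpochSecond(maxSeconds));
    // prints 2038-01-19T03:14:07Z, the upper bound mentioned above
  }
}
```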



[jira] [Commented] (HIVE-4551) HCatLoader smallint/tinyint promotions to Int have issues with ORC integration

2013-05-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656493#comment-13656493
 ] 

Sushanth Sowmyan commented on HIVE-4551:


Also, a few more notes:

a) With my patch that fixes this bug, HCatRecordSerDe still does the 
promotion, so HCatRecord does have the promoted data when reading off it, and 
promotion is still configurable in the current way. I intend to refactor this 
out in a new patch (details below).
b) Only the HCatSchema has been made pure, in that it reflects the underlying 
data.

--

My eventual goal, post-bugfix, to clean this up is as follows:

a) HCatRecord and HCatSchema reflect the underlying raw data and do no 
promotions.
b) Introduce a ConversionImpl, which defines various datatype conversion 
functions that all default to returning the input, with a config that lets a 
user choose which conversions are applied.
c) Introduce a PromotedHCatRecord and PromotedHCatSchema that wrap 
HCatRecord/HCatSchema and use a ConversionImpl.
d) Implement a PigLoaderConversionImpl/PigStorerConversionImpl in 
hcat-pig-adapter, implementing Byte-to-Int, Short-to-Int, and Boolean-to-Int 
promotion.
e) Have HCatLoader/HCatStorer use the promoted versions of 
HCatRecord/HCatSchema, which use the PigConversionImpl.
f) Remove the current HCatContext promotion parameters and make them 
HCatLoader/HCatStorer parameters.
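The identity-by-default conversion design proposed above can be sketched as follows. ConversionImpl and PigLoaderConversionImpl are names taken from the proposal, not classes that exist in HCatalog today:

```java
// Sketch of the proposed design: conversions default to identity, and a
// Pig-facing subclass overrides only the promotions it wants to apply.
public class ConversionDemo {
  static class ConversionImpl {
    Object convertShort(short v)     { return v; }  // identity by default
    Object convertByte(byte v)       { return v; }
    Object convertBoolean(boolean v) { return v; }
  }

  static class PigLoaderConversionImpl extends ConversionImpl {
    @Override Object convertShort(short v)     { return (int) v; }   // Short -> Int
    @Override Object convertByte(byte v)       { return (int) v; }   // Byte -> Int
    @Override Object convertBoolean(boolean v) { return v ? 1 : 0; } // Boolean -> Int
  }

  public static void main(String[] args) {
    ConversionImpl raw = new ConversionImpl();
    ConversionImpl pig = new PigLoaderConversionImpl();
    System.out.println(raw.convertShort((short) 7).getClass().getSimpleName()); // Short
    System.out.println(pig.convertShort((short) 7).getClass().getSimpleName()); // Integer
  }
}
```

A loader that is handed the raw ConversionImpl sees the underlying types untouched; the Pig adapter plugs in its own subclass to get the promotions.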


 HCatLoader smallint/tinyint promotions to Int have issues with ORC integration
 --

 Key: HIVE-4551
 URL: https://issues.apache.org/jira/browse/HIVE-4551
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: 4551.patch


 This was initially reported from an e2e test run, with the following E2E test:
 {code}
 {
 'name' => 'Hadoop_ORC_Write',
 'tests' => [
 {
  'num' => 1
 ,'hcat_prep' => q\
 drop table if exists hadoop_orc;
 create table hadoop_orc (
 t tinyint,
 si smallint,
 i int,
 b bigint,
 f float,
 d double,
 s string)
 stored as orc;\
 ,'hadoop' => q\
 jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars 
 :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
 ,'result_table' => 'hadoop_orc'
 ,'sql' => q\select * from all100k;\
 ,'floatpostprocess' => 1
 ,'delimiter' => '   '
 },
],
 },
 {code}
 This fails with the following error:
 {code}
 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
   at 
 org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
   at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
   at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
   at 
 org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to 
 org.apache.hadoop.io.IntWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
   at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
   at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
   at 
 

[jira] [Updated] (HIVE-4550) local_mapred_error_cache fails on some hadoop versions

2013-05-13 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4550:
-

Status: Patch Available  (was: Open)

 local_mapred_error_cache fails on some hadoop versions
 --

 Key: HIVE-4550
 URL: https://issues.apache.org/jira/browse/HIVE-4550
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Minor
 Attachments: HIVE-4550.1.patch


 I've tested it manually on the upcoming 1.3 version (branch 1).
 We do mask job_* ids, but not job_local* ids. The fix is to extend the 
 masking to cover both.
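Assuming the harness masks ids with a simple regex replace (the actual test-infrastructure code may differ, and the replacement token here is illustrative), extending the masking to local-mode ids could look like:

```java
public class MaskIdsDemo {
  // Masks both cluster-mode ids (e.g. job_201304260026_0001) and
  // local-mode ids (e.g. job_local1868626198_0001).
  static String maskJobIds(String line) {
    return line.replaceAll("job_(local)?[0-9_]+", "job_#ID#");
  }

  public static void main(String[] args) {
    System.out.println(maskJobIds("Ended Job = job_local1868626198_0001"));
    // prints: Ended Job = job_#ID#
  }
}
```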



[jira] [Commented] (HIVE-4475) Switch RCFile default to LazyBinaryColumnarSerDe

2013-05-13 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656512#comment-13656512
 ] 

Gunther Hagleitner commented on HIVE-4475:
--

review: https://reviews.facebook.net/D10785

 Switch RCFile default to LazyBinaryColumnarSerDe
 

 Key: HIVE-4475
 URL: https://issues.apache.org/jira/browse/HIVE-4475
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-4475.1.patch


 For most workloads it seems LazyBinaryColumnarSerDe (binary) will perform 
 better than ColumnarSerDe (text). Not sure why ColumnarSerDe is the default; 
 my guess is that it's for historical reasons. I suggest switching the 
 default.



[jira] [Updated] (HIVE-4542) TestJdbcDriver2.testMetaDataGetSchemas fails because of unexpected database

2013-05-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4542:


Attachment: HIVE-4542.1.patch

HIVE-4542.1.patch - needs HIVE-4171 (HIVE-4171.4.patch) to be applied first.



 TestJdbcDriver2.testMetaDataGetSchemas fails because of unexpected database
 ---

 Key: HIVE-4542
 URL: https://issues.apache.org/jira/browse/HIVE-4542
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4542.1.patch


 The check for database name in TestJdbcDriver2.testMetaDataGetSchemas fails 
 with the error -
 {code}
 junit.framework.ComparisonFailure: expected:<...efault> but was:<...bname>
 {code}
 i.e., a database called dbname is found, which the test does not expect. 
 Whether this failure happens depends on the order in which the function 
 returns the databases; if the default database is first, the test succeeds.



[jira] [Updated] (HIVE-4535) hive build fails with hadoop 0.20

2013-05-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4535:


Assignee: Thejas M Nair

 hive build fails with hadoop 0.20
 -

 Key: HIVE-4535
 URL: https://issues.apache.org/jira/browse/HIVE-4535
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4535.1.patch, HIVE-4535.2.patch


 ant package -Dhadoop.mr.rev=20
 leads to - 
 {code}
 [javac] 
 /Users/thejas/hive_thejas_git/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java:382:
  cannot find symbol
 [javac] symbol  : method 
 join(java.lang.String,java.util.List<java.lang.String>)
 [javac] location: class org.apache.hadoop.util.StringUtils
 [javac]   StringUtils.join(",", incompatibleCols)
 {code}



Re: Review Request: Add Vectorized Substr

2013-05-13 Thread Eric Hanson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11106/#review20512
---



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
https://reviews.apache.org/r/11106/#comment42276

please add javadoc comment for purpose of class



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
https://reviews.apache.org/r/11106/#comment42278

put comment for why this is here



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
https://reviews.apache.org/r/11106/#comment42280

explain more clearly what this function does



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
https://reviews.apache.org/r/11106/#comment42281

if you use a negative start index -n, the existing Hive code 
(non-vectorized) seems to take the last n characters of the string, e.g. 
substr('foo', -2) is 'oo'. If you use -n and n is greater than the string 
length, the output is the empty string. Please handle this case with the same 
behavior as non-vectorized Hive.
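The negative-index semantics described in this comment can be sketched as a small reference helper. This mirrors the behavior described above (last n characters, empty string on overshoot, 1-based positive starts) and is not Hive's actual substr implementation:

```java
public class SubstrDemo {
  // substr(s, -n): last n characters; empty string when n > length.
  // Positive starts are 1-based, as in Hive SQL.
  static String substr(String s, int start) {
    if (start < 0) {
      if (-start > s.length()) return "";     // overshoot -> empty string
      return s.substring(s.length() + start); // tail of length -start
    }
    int from = Math.max(start - 1, 0);        // 1-based to 0-based
    return from >= s.length() ? "" : s.substring(from);
  }

  public static void main(String[] args) {
    System.out.println(substr("foo", -2));  // prints oo
    System.out.println(substr("foo", -5));  // prints an empty line
  }
}
```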



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
https://reviews.apache.org/r/11106/#comment42282

please run ant checkstyle and follow suggestions, e.g. there is no blank 
after if before (



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
https://reviews.apache.org/r/11106/#comment42283

also set output value to empty string if output is null

outV.noNulls needs to get set for every case and doesn't get set here



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
https://reviews.apache.org/r/11106/#comment42289

len[0] - offset could be negative.

Do you need to use len[0] - (start[0] - offset)?

Make sure you have unit tests for case where start != 0.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
https://reviews.apache.org/r/11106/#comment42296

I've heard that if the common case is the first case, things can run 
faster. You could reverse these and make the test offset > -1.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
https://reviews.apache.org/r/11106/#comment42305

need to handle negative start index case. It appears your code could get 
array out of bounds in that case.

Also, for substrLength = 0, result should be empty string



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
https://reviews.apache.org/r/11106/#comment42308

should set isRepeating to false for the default case



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
https://reviews.apache.org/r/11106/#comment42306

need to check noNulls first. If noNulls is true then you can't look into the 
isNull array, or you could see invalid data
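The contract this comment relies on can be sketched as follows; the field names mirror Hive's ColumnVector, but the class is a simplified stand-in, not Hive code:

```java
public class NullGuard {
    // Count the null rows in a batch column. When noNulls is true the
    // isNull array may hold stale values from an earlier batch, so it
    // must not be consulted at all.
    public static int countNulls(boolean noNulls, boolean[] isNull, int size) {
        if (noNulls) {
            return 0;  // isNull contents are undefined here; never read them
        }
        int nulls = 0;
        for (int i = 0; i < size; i++) {
            if (isNull[i]) {
                nulls++;
            }
        }
        return nulls;
    }
}
```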



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
https://reviews.apache.org/r/11106/#comment42309

Is this output supposed to be null or the empty string?

My ad hoc tests seemed to show it was the empty string. You should double 
check.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
https://reviews.apache.org/r/11106/#comment42310

need to set outV.isNull[i] to true or false always, unless you are setting 
outV.noNulls to true.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
https://reviews.apache.org/r/11106/#comment42311

the second argument to the VectorizedRowBatch constructor should not be used 
(it defaults to the correct value)



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
https://reviews.apache.org/r/11106/#comment42314

the argument is not needed -- use the default constructor with no args



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
https://reviews.apache.org/r/11106/#comment42312

you need to test some data with multi-byte characters. There is an example of 
that elsewhere in the tests.



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
https://reviews.apache.org/r/11106/#comment42315

need to test the case where the data start position is not 0



ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
https://reviews.apache.org/r/11106/#comment42313

need to verify that the rows other than 0 are always set to not-null. The 
isNull entries for them could have been left true by chance from a previous 
use of the batch.


- Eric Hanson


On May 13, 2013, 9:54 p.m., Timothy Chen wrote:
 
 

[jira] [Created] (HIVE-4552) Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly

2013-05-13 Thread Sarvesh Sakalanaga (JIRA)
Sarvesh Sakalanaga created HIVE-4552:


 Summary: Vectorized RecordReader for ORC does not set the 
ColumnVector.IsRepeating correctly
 Key: HIVE-4552
 URL: https://issues.apache.org/jira/browse/HIVE-4552
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga


The IsRepeating flag in ColumnVector is being set incorrectly by the ORC 
RecordReader (RecordReaderImpl.java), and as a result wrong results are being 
written by VectorFileSinkOperator. 
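For context, the isRepeating contract can be sketched like this (a simplified stand-in, not the RecordReaderImpl code): when the flag is set, only element 0 of the vector is valid and applies to every row, so a mis-set flag makes downstream operators read the wrong values.

```java
public class RepeatingRead {
    // Read the value for a given row, honoring isRepeating: a repeating
    // vector stores the single shared value at index 0.
    public static long valueAt(boolean isRepeating, long[] vector, int row) {
        return vector[isRepeating ? 0 : row];
    }
}
```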


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4495) Implement vectorized string substr

2013-05-13 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13656578#comment-13656578
 ] 

Eric Hanson commented on HIVE-4495:
---

See my comments on the first version of the patch at 
https://reviews.apache.org/r/11106/

 Implement vectorized string substr
 --

 Key: HIVE-4495
 URL: https://issues.apache.org/jira/browse/HIVE-4495
 Project: Hive
  Issue Type: Sub-task
Reporter: Timothy Chen
Assignee: Timothy Chen





Review Request: Column Column, and Column Scalar vectorized execution tests

2013-05-13 Thread tony murphy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11133/
---

Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and 
Remus Rusanu.


Description
---

This patch adds Column Column and Column Scalar vectorized execution tests. 
These tests are generated in parallel with the vectorized expressions. Their 
focus is on validating the column vector and the vectorized row batch 
metadata regarding nulls, repeating, and selection.

Overview of Changes:

CodeGen.java:
+ joinPath, getCamelCaseType, readFile and writeFile made static for use in 
TestCodeGen.java.
+ filter types now specify null as their output type rather than "doesn't 
matter", to make detection for test generation easier.
+ support for test generation added.

TestCodeGen.java & Templates: 
 TestClass.txt
 TestColumnColumnFilterVectorExpressionEvaluation.txt,
 TestColumnColumnOperationVectorExpressionEvaluation.txt,
 TestColumnScalarFilterVectorExpressionEvaluation.txt,
 TestColumnScalarOperationVectorExpressionEvaluation.txt
+ This class is mutable and maintains a hashmap of TestSuiteClassName to test 
cases. The test cases are added over the course of vectorized expression class 
generation, with the test classes written out at the end. For each column 
vector (input and/or output), a matrix of pairwise-covering Booleans is used 
to generate test cases across the nulls and repeating dimensions. Based on the 
input column vectors' nulls and repeating states, the state of the output 
column vector (if there is one) is validated, along with its null vector. For 
filter operations the selection vector is validated against the generated 
data. Each template corresponds to a class representing a test suite.

VectorizedRowGroupGenUtil.java
+ added methods generateLongColumnVector and generateDoubleColumnVector for 
generating the respective column vectors with optional nulls and/or repeating 
values.
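The nulls/repeating test matrix described above can be sketched as follows. For the two boolean dimensions per vector, a full cross product is small enough to enumerate; a true pairwise-covering array would prune rows at higher dimension counts. The names here are illustrative, not the patch's actual helpers:

```java
import java.util.ArrayList;
import java.util.List;

public class FlagMatrix {
    // Enumerate every combination of numFlags boolean test dimensions
    // (e.g. {hasNulls, isRepeating} for each input column vector).
    public static List<boolean[]> crossProduct(int numFlags) {
        List<boolean[]> rows = new ArrayList<>();
        for (int mask = 0; mask < (1 << numFlags); mask++) {
            boolean[] row = new boolean[numFlags];
            for (int bit = 0; bit < numFlags; bit++) {
                row[bit] = (mask & (1 << bit)) != 0;
            }
            rows.add(row);
        }
        return rows;
    }
}
```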


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
 53d9a7a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java
 8a07567 

Diff: https://reviews.apache.org/r/11133/diff/


Testing
---

generated tests, and ran them.


Thanks,

tony murphy



Re: Review Request: Column Column, and Column Scalar vectorized execution tests

2013-05-13 Thread tony murphy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11133/
---

(Updated May 14, 2013, 12:27 a.m.)


Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and 
Remus Rusanu.


Description
---

This patch adds Column Column and Column Scalar vectorized execution tests. 
These tests are generated in parallel with the vectorized expressions. Their 
focus is on validating the column vector and the vectorized row batch 
metadata regarding nulls, repeating, and selection.

Overview of Changes:

CodeGen.java:
+ joinPath, getCamelCaseType, readFile and writeFile made static for use in 
TestCodeGen.java.
+ filter types now specify null as their output type rather than "doesn't 
matter", to make detection for test generation easier.
+ support for test generation added.

TestCodeGen.java & Templates: 
 TestClass.txt
 TestColumnColumnFilterVectorExpressionEvaluation.txt,
 TestColumnColumnOperationVectorExpressionEvaluation.txt,
 TestColumnScalarFilterVectorExpressionEvaluation.txt,
 TestColumnScalarOperationVectorExpressionEvaluation.txt
+ This class is mutable and maintains a hashmap of TestSuiteClassName to test 
cases. The test cases are added over the course of vectorized expression class 
generation, with the test classes written out at the end. For each column 
vector (input and/or output), a matrix of pairwise-covering Booleans is used 
to generate test cases across the nulls and repeating dimensions. Based on the 
input column vectors' nulls and repeating states, the state of the output 
column vector (if there is one) is validated, along with its null vector. For 
filter operations the selection vector is validated against the generated 
data. Each template corresponds to a class representing a test suite.

VectorizedRowGroupGenUtil.java
+ added methods generateLongColumnVector and generateDoubleColumnVector for 
generating the respective column vectors with optional nulls and/or repeating 
values.


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
 53d9a7a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java
 8a07567 

Diff: https://reviews.apache.org/r/11133/diff/


Testing
---

generated tests, and ran them.


Thanks,

tony murphy



[jira] [Created] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests

2013-05-13 Thread Tony Murphy (JIRA)
Tony Murphy created HIVE-4553:
-

 Summary: Column Column, and Column Scalar vectorized execution 
tests
 Key: HIVE-4553
 URL: https://issues.apache.org/jira/browse/HIVE-4553
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch


review board review: https://reviews.apache.org/r/11133/

This patch adds Column Column and Column Scalar vectorized execution tests. 
These tests are generated in parallel with the vectorized expressions. Their 
focus is on validating the column vector and the vectorized row batch 
metadata regarding nulls, repeating, and selection.

Overview of Changes:

CodeGen.java:
+ joinPath, getCamelCaseType, readFile and writeFile made static for use in 
TestCodeGen.java.
+ filter types now specify null as their output type rather than "doesn't 
matter", to make detection for test generation easier.
+ support for test generation added.

TestCodeGen.java & Templates: 
 TestClass.txt
 TestColumnColumnFilterVectorExpressionEvaluation.txt,
 TestColumnColumnOperationVectorExpressionEvaluation.txt,
 TestColumnScalarFilterVectorExpressionEvaluation.txt,
 TestColumnScalarOperationVectorExpressionEvaluation.txt
+ This class is mutable and maintains a hashmap of TestSuiteClassName to test 
cases. The test cases are added over the course of vectorized expression class 
generation, with the test classes written out at the end. For each column 
vector (input and/or output), a matrix of pairwise-covering Booleans is used 
to generate test cases across the nulls and repeating dimensions. Based on the 
input column vectors' nulls and repeating states, the state of the output 
column vector (if there is one) is validated, along with its null vector. For 
filter operations the selection vector is validated against the generated 
data. Each template corresponds to a class representing a test suite.

VectorizedRowGroupGenUtil.java
+ added methods generateLongColumnVector and generateDoubleColumnVector for 
generating the respective column vectors with optional nulls and/or repeating 
values.



Re: Review Request: Column Column, and Column Scalar vectorized execution tests

2013-05-13 Thread tony murphy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11133/
---

(Updated May 14, 2013, 12:34 a.m.)


Review request for hive, Jitendra Pandey, Eric Hanson, Sarvesh Sakalanaga, and 
Remus Rusanu.


Description
---

This patch adds Column Column and Column Scalar vectorized execution tests. 
These tests are generated in parallel with the vectorized expressions. Their 
focus is on validating the column vector and the vectorized row batch 
metadata regarding nulls, repeating, and selection.

Overview of Changes:

CodeGen.java:
+ joinPath, getCamelCaseType, readFile and writeFile made static for use in 
TestCodeGen.java.
+ filter types now specify null as their output type rather than "doesn't 
matter", to make detection for test generation easier.
+ support for test generation added.

TestCodeGen.java & Templates: 
 TestClass.txt
 TestColumnColumnFilterVectorExpressionEvaluation.txt,
 TestColumnColumnOperationVectorExpressionEvaluation.txt,
 TestColumnScalarFilterVectorExpressionEvaluation.txt,
 TestColumnScalarOperationVectorExpressionEvaluation.txt
+ This class is mutable and maintains a hashmap of TestSuiteClassName to test 
cases. The test cases are added over the course of vectorized expression class 
generation, with the test classes written out at the end. For each column 
vector (input and/or output), a matrix of pairwise-covering Booleans is used 
to generate test cases across the nulls and repeating dimensions. Based on the 
input column vectors' nulls and repeating states, the state of the output 
column vector (if there is one) is validated, along with its null vector. For 
filter operations the selection vector is validated against the generated 
data. Each template corresponds to a class representing a test suite.

VectorizedRowGroupGenUtil.java
+ added methods generateLongColumnVector and generateDoubleColumnVector for 
generating the respective column vectors with optional nulls and/or repeating 
values.


This addresses bug HIVE-4553.
https://issues.apache.org/jira/browse/HIVE-4553


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
 53d9a7a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestClass.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestCodeGen.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnFilterVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnColumnOperationVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarFilterVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/TestColumnScalarOperationVectorExpressionEvaluation.txt
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnFilterVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnColumnOperationVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarFilterVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/TestColumnScalarOperationVectorExpressionEvaluation.java
 PRE-CREATION 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/VectorizedRowGroupGenUtil.java
 8a07567 

Diff: https://reviews.apache.org/r/11133/diff/


Testing
---

generated tests, and ran them.


Thanks,

tony murphy



[jira] [Updated] (HIVE-4553) Column Column, and Column Scalar vectorized execution tests

2013-05-13 Thread Tony Murphy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy updated HIVE-4553:
--

Attachment: HIVE-4553.patch

 Column Column, and Column Scalar vectorized execution tests
 ---

 Key: HIVE-4553
 URL: https://issues.apache.org/jira/browse/HIVE-4553
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4553.patch


 review board review: https://reviews.apache.org/r/11133/
 This patch adds Column Column and Column Scalar vectorized execution tests. 
 These tests are generated in parallel with the vectorized expressions. Their 
 focus is on validating the column vector and the vectorized row batch 
 metadata regarding nulls, repeating, and selection.
 Overview of Changes:
 CodeGen.java:
 + joinPath, getCamelCaseType, readFile and writeFile made static for use in 
 TestCodeGen.java.
 + filter types now specify null as their output type rather than "doesn't 
 matter", to make detection for test generation easier.
 + support for test generation added.
 TestCodeGen.java & Templates: 
  TestClass.txt
  TestColumnColumnFilterVectorExpressionEvaluation.txt,
  TestColumnColumnOperationVectorExpressionEvaluation.txt,
  TestColumnScalarFilterVectorExpressionEvaluation.txt,
  TestColumnScalarOperationVectorExpressionEvaluation.txt
 + This class is mutable and maintains a hashmap of TestSuiteClassName to test 
 cases. The test cases are added over the course of vectorized expression 
 class generation, with the test classes written out at the end. For each 
 column vector (input and/or output), a matrix of pairwise-covering Booleans 
 is used to generate test cases across the nulls and repeating dimensions. 
 Based on the input column vectors' nulls and repeating states, the state of 
 the output column vector (if there is one) is validated, along with its null 
 vector. For filter operations the selection vector is validated against the 
 generated data. Each template corresponds to a class representing a test 
 suite.
 VectorizedRowGroupGenUtil.java
 + added methods generateLongColumnVector and generateDoubleColumnVector for 
 generating the respective column vectors with optional nulls and/or repeating 
 values.



[jira] [Updated] (HIVE-4552) Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly

2013-05-13 Thread Sarvesh Sakalanaga (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4552:
-

Attachment: Hive.4552.0.patch

Patch uploaded. 

 Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating 
 correctly
 ---

 Key: HIVE-4552
 URL: https://issues.apache.org/jira/browse/HIVE-4552
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Attachments: Hive.4552.0.patch


 The IsRepeating flag in ColumnVector is being set incorrectly by the ORC 
 RecordReader (RecordReaderImpl.java), and as a result wrong results are 
 being written by VectorFileSinkOperator. 



[jira] [Updated] (HIVE-4552) Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating correctly

2013-05-13 Thread Sarvesh Sakalanaga (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4552:
-

Status: Patch Available  (was: Open)

 Vectorized RecordReader for ORC does not set the ColumnVector.IsRepeating 
 correctly
 ---

 Key: HIVE-4552
 URL: https://issues.apache.org/jira/browse/HIVE-4552
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Attachments: Hive.4552.0.patch


 The IsRepeating flag in ColumnVector is being set incorrectly by the ORC 
 RecordReader (RecordReaderImpl.java), and as a result wrong results are 
 being written by VectorFileSinkOperator. 



Re: [VOTE] Apache Hive 0.11.0 Release Candidate 2

2013-05-13 Thread Thejas Nair
Owen,
Where do I find the public keys you used to sign the files? Putting them in
http://apache.org/dist/hive/KEYS seems to be the convention so far. (I
found that location via the similar location in the Pig HowToRelease doc,
https://cwiki.apache.org/confluence/display/PIG/HowToRelease.)
Thanks,
Thejas





On Mon, May 13, 2013 at 10:19 AM, Owen O'Malley omal...@apache.org wrote:

 On Saturday, I didn't include the Maven staging URLs:

 Hive:
 https://repository.apache.org/content/repositories/orgapachehive-013/
 HCatalog:
 https://repository.apache.org/content/repositories/orgapachehcatalog-014/

 Thanks,
Owen


 On Sat, May 11, 2013 at 10:33 AM, Owen O'Malley omal...@apache.org
 wrote:

  Based on feedback from everyone, I have respun release candidate, RC2.
  Please take a look. We've fixed 7 problems with the previous RC:
  * Release notes were incorrect
   * HIVE-4018 - MapJoin failing with Distributed Cache error
   * HIVE-4421 - Improve memory usage by ORC dictionaries
   * HIVE-4500 - Ensure that HiveServer 2 closes log files.
   * HIVE-4494 - ORC map columns get class cast exception in some contexts
   * HIVE-4498 - Fix TestBeeLineWithArgs failure
   * HIVE-4505 - Hive can't load transforms with remote scripts
   * HIVE-4527 - Fix the eclipse template
 
  Source tag for RC2 is at:
 
  https://svn.apache.org/repos/asf/hive/tags/release-0.11.0rc2
 
 
  Source tar ball and convenience binary artifacts can be found
  at: http://people.apache.org/~omalley/hive-0.11.0rc2/
 
  This release has many goodies including HiveServer2, integrated
  hcatalog, windowing and analytical functions, decimal data type,
  better query planning, performance enhancements and various bug fixes.
  In total, we resolved more than 350 issues. Full list of fixed issues
  can be found at:  http://s.apache.org/8Fr
 
 
  Voting will conclude in 72 hours.
 
  Hive PMC Members: Please test and vote.
 
  Thanks,
 
  Owen
 



Re: [VOTE] Apache Hive 0.11.0 Release Candidate 2

2013-05-13 Thread Owen O'Malley
On Mon, May 13, 2013 at 6:14 PM, Thejas Nair the...@hortonworks.com wrote:

 Owen,
 Where do I find the public keys you used to sign the files ?


You can get them from:

https://people.apache.org/keys/group/hive.asc


 putting it in
 http://apache.org/dist/hive/KEYS seems to be the convention so far.


Having KEYS files was the way it was done before you could put your public
key into id.apache.org. Once a committer has their key uploaded, it is
automatically added to each of the groups they are in.


 (I
 found that location via the similar location in the Pig HowToRelease doc,
 https://cwiki.apache.org/confluence/display/PIG/HowToRelease.)


We should update the KEYS file to automatically redirect to the dynamic
list of keys.

-- Owen


[jira] [Created] (HIVE-4554) Failed to create a table from existing file if file path has spaces

2013-05-13 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-4554:
-

 Summary: Failed to create a table from existing file if file path 
has spaces
 Key: HIVE-4554
 URL: https://issues.apache.org/jira/browse/HIVE-4554
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang


To reproduce the problem:

1. Create a table, say, person_age (name STRING, age INT).
2. Create a file whose name has a space in it, say, "data set.txt".
3. Try to load the data in the file into the table.

The following error can be seen in the console:

hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age;
Loading data to table default.person_age
Failed with exception Wrong file format. Please check the file's format.
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask

Note: the error message is confusing.
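The likely mechanism behind the confusing error is the unescaped space: a raw path containing a space is not a valid URI. A minimal sketch of percent-encoding such a path (illustrative only; this is not the actual HIVE-4554 fix):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathWithSpace {
    // Build a valid file URI from a raw local path; the multi-argument
    // URI constructor percent-encodes illegal characters such as spaces.
    public static String encode(String rawPath) {
        try {
            return new URI("file", null, rawPath, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad path: " + rawPath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("/home/xzhang/temp/data set.txt"));
        // file:/home/xzhang/temp/data%20set.txt
    }
}
```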




[jira] [Updated] (HIVE-4554) Failed to create a table from existing file if file path has spaces

2013-05-13 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-4554:
--

Attachment: HIVE-4554.patch

Patch attempting to fix the issue.

 Failed to create a table from existing file if file path has spaces
 ---

 Key: HIVE-4554
 URL: https://issues.apache.org/jira/browse/HIVE-4554
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
 Attachments: HIVE-4554.patch


 To reproduce the problem:
 1. Create a table, say, person_age (name STRING, age INT).
 2. Create a file whose name has a space in it, say, "data set.txt".
 3. Try to load the data in the file into the table.
 The following error can be seen in the console:
 hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age;
 Loading data to table default.person_age
 Failed with exception Wrong file format. Please check the file's format.
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask
 Note: the error message is confusing.



[jira] [Updated] (HIVE-4554) Failed to create a table from existing file if file path has spaces

2013-05-13 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-4554:
--

Fix Version/s: 0.11.0
   Status: Patch Available  (was: Open)

 Failed to create a table from existing file if file path has spaces
 ---

 Key: HIVE-4554
 URL: https://issues.apache.org/jira/browse/HIVE-4554
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
 Fix For: 0.11.0

 Attachments: HIVE-4554.patch


 To reproduce the problem:
 1. Create a table, say, person_age (name STRING, age INT).
 2. Create a file whose name has a space in it, say, "data set.txt".
 3. Try to load the data in the file into the table.
 The following error can be seen in the console:
 hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age;
 Loading data to table default.person_age
 Failed with exception Wrong file format. Please check the file's format.
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask
 Note: the error message is confusing.



[jira] [Updated] (HIVE-4554) Failed to create a table from existing file if file path has spaces

2013-05-13 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-4554:
--

Status: Open  (was: Patch Available)

 Failed to create a table from existing file if file path has spaces
 ---

 Key: HIVE-4554
 URL: https://issues.apache.org/jira/browse/HIVE-4554
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
 Fix For: 0.11.0

 Attachments: HIVE-4554.patch


 To reproduce the problem:
 1. Create a table, say, person_age (name STRING, age INT).
 2. Create a file whose name has a space in it, say, "data set.txt".
 3. Try to load the data in the file into the table.
 The following error can be seen in the console:
 hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age;
 Loading data to table default.person_age
 Failed with exception Wrong file format. Please check the file's format.
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask
 Note: the error message is confusing.
