[jira] [Commented] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449461#comment-13449461 ]

Navis commented on HIVE-3427:
-----------------------------

@Ashutosh, you are right. The build/ql/test/data/exports directory is used by many tests (exim~, etc.). How about changing the test directory from build/ql/test/data/exports to build/ql/test/data/exports/HIVE-3428 or something?

Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
------------------------------------------------------------------------------------

Key: HIVE-3427
URL: https://issues.apache.org/jira/browse/HIVE-3427
Project: Hive
Issue Type: Test
Affects Versions: 0.10.0
Reporter: Ashutosh Chauhan
Assignee: Navis
Attachments: HIVE-3427.1.patch.txt

I think it's a new test which was added via HIVE-3068.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3438) Add tests for 'm' big tables sortmerge join with 'n' small tables where both m,n > 1
Namit Jain created HIVE-3438:
-----------------------------

Summary: Add tests for 'm' big tables sortmerge join with 'n' small tables where both m,n > 1
Key: HIVE-3438
URL: https://issues.apache.org/jira/browse/HIVE-3438
Project: Hive
Issue Type: Test
Components: Tests
Reporter: Namit Jain
Assignee: Namit Jain

Once https://issues.apache.org/jira/browse/HIVE-3171 is in, it would be good to add more tests which test the above condition.
[jira] [Commented] (HIVE-3438) Add tests for 'm' big tables sortmerge join with 'n' small tables where both m,n > 1
[ https://issues.apache.org/jira/browse/HIVE-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449484#comment-13449484 ]

Namit Jain commented on HIVE-3438:
----------------------------------

I have verified that the above scenarios work - it would be good to add those tests.
[jira] [Updated] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3427:
------------------------

Attachment: HIVE-3427.2.patch.txt

To reproduce:

    ant package test -Dtestcase=TestCliDriver -Dqfile=exim_00_nonpart_empty.q,metadata_export_drop.q

Changed the directory and the test passed.
[jira] [Commented] (HIVE-3283) bucket information should be used from the partition instead of the table
[ https://issues.apache.org/jira/browse/HIVE-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449502#comment-13449502 ]

Namit Jain commented on HIVE-3283:
----------------------------------

Once https://issues.apache.org/jira/browse/HIVE-3171 is in, it would be useful to have the partition metadata be used for bucketing information.

bucket information should be used from the partition instead of the table
-------------------------------------------------------------------------

Key: HIVE-3283
URL: https://issues.apache.org/jira/browse/HIVE-3283
Project: Hive
Issue Type: Bug
Reporter: Namit Jain

Currently Hive uses the number of buckets from the table object. Ideally, the number of buckets from the partition should be used.
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 3.0.1
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449514#comment-13449514 ]

Sushanth Sowmyan commented on HIVE-2084:
----------------------------------------

@Carl: As an update, I discovered that with the newer DataNucleus, Map types with null values cannot be persisted. This is a problem because we stamp a comment field in the parametersMap irrespective of whether a comment was provided or not, and this causes a failure during index creation. This is the same issue I refer to in HIVE-2800, where thrift has similar issues and the fix is the same.

Upgrade datanucleus from 2.0.3 to 3.0.1
---------------------------------------

Key: HIVE-2084
URL: https://issues.apache.org/jira/browse/HIVE-2084
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Ning Zhang
Assignee: Sushanth Sowmyan
Labels: datanucleus
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2084.D2397.1.patch, HIVE-2084.1.patch.txt, HIVE-2084.2.patch.txt, HIVE-2084.patch

It seems datanucleus 2.2.3 does a better job in caching: getting the same set of partition objects takes about 1/4 of the time it took for the first execution, while with 2.0.3 the second execution took almost the same amount of time. We should retest the test cases mentioned in HIVE-1853 and HIVE-1862.
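The diagnosis above suggests an obvious workaround pattern: strip null values from the parameters map before it ever reaches the persistence layer. A minimal sketch of that pattern, assuming nothing about Hive's actual metastore classes (the class and helper names here are illustrative, not Hive's API):

```java
import java.util.HashMap;
import java.util.Map;

public class ParamsMapExample {
    // Hypothetical helper: copy a parameters map, dropping entries whose
    // value is null, so a persistence layer that rejects null map values
    // never sees them.
    static Map<String, String> withoutNulls(Map<String, String> params) {
        Map<String, String> cleaned = new HashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getValue() != null) {
                cleaned.put(e.getKey(), e.getValue());
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("comment", null);   // stamped even when no comment was given
        params.put("owner", "hive");
        Map<String, String> safe = withoutNulls(params);
        System.out.println(safe.keySet());
    }
}
```

The alternative fix implied by the comment is to not stamp the comment key at all when no comment was provided, which avoids the null value at the source.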
[jira] [Commented] (HIVE-2084) Upgrade datanucleus from 2.0.3 to 3.0.1
[ https://issues.apache.org/jira/browse/HIVE-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449528#comment-13449528 ]

Andy Jefferson commented on HIVE-2084:
--------------------------------------

Obviously DataNucleus has test cases that persist Maps with null values, and they work (since all tests pass with every release), so clearly this is down to your map and how you're doing things.
[jira] [Commented] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449636#comment-13449636 ]

Edward Capriolo commented on HIVE-3427:
---------------------------------------

As a follow-up, the economic tests should clean themselves up, since that is the real issue here.
[jira] [Commented] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449637#comment-13449637 ]

Edward Capriolo commented on HIVE-3427:
---------------------------------------

*exim tests
[jira] [Resolved] (HIVE-3436) Difference in exception string from native method causes script_pipe.q to fail on windows
[ https://issues.apache.org/jira/browse/HIVE-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-3436.
------------------------------------

Resolution: Fixed
Fix Version/s: 0.10.0
Assignee: Thejas M Nair

Committed to trunk. Thanks, Thejas!

Difference in exception string from native method causes script_pipe.q to fail on windows
-----------------------------------------------------------------------------------------

Key: HIVE-3436
URL: https://issues.apache.org/jira/browse/HIVE-3436
Project: Hive
Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.10.0
Attachments: HIVE-3436.1.patch
[jira] [Resolved] (HIVE-2999) Offline build is not working
[ https://issues.apache.org/jira/browse/HIVE-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-2999.
------------------------------------

Resolution: Fixed
Fix Version/s: 0.10.0

Committed to trunk. Thanks, Navis!

Offline build is not working
----------------------------

Key: HIVE-2999
URL: https://issues.apache.org/jira/browse/HIVE-2999
Project: Hive
Issue Type: Bug
Components: Build Infrastructure
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Fix For: 0.10.0
Attachments: HIVE-2999.1.patch.txt, HIVE-2999.2.patch.txt

It's fine without the -Doffline=true option. But with the offline option (ant -Doffline=true clean package), it fails with an error message like this:

{noformat}
ivy-retrieve:
[echo] Project: common
[ivy:retrieve] :: loading settings :: file = /home/navis/apache/oss-hive/ivy/ivysettings.xml
[ivy:retrieve]
[ivy:retrieve] :: problems summary ::
[ivy:retrieve] WARNINGS
[ivy:retrieve] module not found: org.apache.hadoop#hadoop-common;0.20.2
[ivy:retrieve]   local: tried
[ivy:retrieve]     /home/navis/.ivy2/local/org.apache.hadoop/hadoop-common/0.20.2/ivys/ivy.xml
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     /home/navis/.ivy2/local/org.apache.hadoop/hadoop-common/0.20.2/jars/hadoop-common.jar
[ivy:retrieve]   apache-snapshot: tried
[ivy:retrieve]     https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.pom
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve]   maven2: tried
[ivy:retrieve]     http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.pom
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve]   datanucleus-repo: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     http://www.datanucleus.org/downloads/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve]   hadoop-source: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     http://mirror.facebook.net/facebook/hive-deps/hadoop/core/hadoop-common-0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve]   hadoop-source2: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar:
[ivy:retrieve]     http://archive.cloudera.com/hive-deps/hadoop/core/hadoop-common-0.20.2/hadoop-common-0.20.2.jar
[ivy:retrieve] module not found: org.apache.hadoop#hadoop-auth;0.20.2
[ivy:retrieve]   local: tried
[ivy:retrieve]     /home/navis/.ivy2/local/org.apache.hadoop/hadoop-auth/0.20.2/ivys/ivy.xml
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]     /home/navis/.ivy2/local/org.apache.hadoop/hadoop-auth/0.20.2/jars/hadoop-auth.jar
[ivy:retrieve]   apache-snapshot: tried
[ivy:retrieve]     https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.pom
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]     https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar
[ivy:retrieve]   maven2: tried
[ivy:retrieve]     http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.pom
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]     http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar
[ivy:retrieve]   datanucleus-repo: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]     http://www.datanucleus.org/downloads/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar
[ivy:retrieve]   hadoop-source: tried
[ivy:retrieve]     -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[ivy:retrieve]
[jira] [Commented] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449751#comment-13449751 ]

Ashutosh Chauhan commented on HIVE-3427:
----------------------------------------

Navis, the current patch fixes the problem. +1, will commit if tests pass. Thanks for your time on this one. Ed, totally agree. Mind creating a new jira for it?
Re: Review Request: HIVE-3323 ThriftSerde: Enable enum to string conversions
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6915/#review11101
-----------------------------------------------------------

Status update: the CI job timed out (after 8 hours!!) so I'm looking into increasing the global job runtime limit and rerunning the tests. When I've verified the tests pass, I'll post this patch in the jira.

- Travis Crawford

On Sept. 6, 2012, 12:12 a.m., Travis Crawford wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6915/
-----------------------------------------------------------

(Updated Sept. 6, 2012, 12:12 a.m.)

Review request for hive and Ashutosh Chauhan.

Description
-----------

ThriftSerde: Enable enum to string conversions

This addresses bug HIVE-3323.
https://issues.apache.org/jira/browse/HIVE-3323

Diffs
-----

ql/src/test/queries/clientpositive/convert_enum_to_string.q PRE-CREATION
ql/src/test/results/clientpositive/convert_enum_to_string.q.out PRE-CREATION
serde/if/test/megastruct.thrift PRE-CREATION
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java PRE-CREATION
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MiniStruct.java PRE-CREATION
serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MyEnum.java PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java b21755e
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaStringObjectInspector.java 921ce2b

Diff: https://reviews.apache.org/r/6915/diff/

Testing
-------

Running CI now after rebasing to master and changing the default to enabled. Some preliminary feedback would be great though: https://travis.ci.cloudbees.com/job/HIVE-3323_enum_to_string/10/

To test, I added a new struct that contains an enum field; we check that its schema is correctly described, and that this property can be enabled/disabled at runtime. Something I'm not clear on with Hive is how to write more comprehensive tests that involve more than just ql commands.

For example, take a look at: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatHiveThriftCompatibility.java?view=markup

Here we see an example junit test I wrote that creates a file containing thrift structs, creates the table, checks its schema, and ensures the query returns expected output. With the Hive test suite, all I add here are ql commands that check the schema, since I'm not sure how to do the test setup. I'm more than happy to add a more comprehensive test but would appreciate some guidance to do that correctly.

Thanks,
Travis Crawford
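The conversion under review can be pictured with a small sketch. The enum, method, and flag below are illustrative stand-ins, not Hive's actual ObjectInspector API or configuration property:

```java
public class EnumToStringSketch {
    // Illustrative enum standing in for a thrift-generated enum type.
    enum Status { ACTIVE, DELETED }

    // When conversion is enabled, expose the enum field as its string name;
    // otherwise fall back to its numeric value (thrift enums are integers on
    // the wire). The boolean flag is a stand-in for a serde property.
    static Object inspect(Status value, boolean convertEnumToString) {
        return convertEnumToString ? value.name() : value.ordinal();
    }

    public static void main(String[] args) {
        System.out.println(inspect(Status.DELETED, true));
        System.out.println(inspect(Status.DELETED, false));
    }
}
```

The runtime toggle in the patch presumably works along these lines: the same underlying field is described either with a string schema or a numeric one depending on the property.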
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #128
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/

[...truncated 10256 lines...]
[echo] Project: odbc
[copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/odbc/src/conf does not exist.
ivy-resolve-test:
[echo] Project: odbc
ivy-retrieve-test:
[echo] Project: odbc
compile-test:
[echo] Project: odbc
create-dirs:
[echo] Project: serde
[copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/serde/src/test/resources does not exist.
init:
[echo] Project: serde
ivy-init-settings:
[echo] Project: serde
ivy-resolve:
[echo] Project: serde
[ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-serde-default.xml to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/report/org.apache.hive-hive-serde-default.html
ivy-retrieve:
[echo] Project: serde
dynamic-serde:
compile:
[echo] Project: serde
ivy-resolve-test:
[echo] Project: serde
ivy-retrieve-test:
[echo] Project: serde
compile-test:
[echo] Project: serde
[javac] Compiling 26 source files to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/serde/test/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
create-dirs:
[echo] Project: service
[copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/service/src/test/resources does not exist.
init:
[echo] Project: service
ivy-init-settings:
[echo] Project: service
ivy-resolve:
[echo] Project: service
[ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-service-default.xml to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/report/org.apache.hive-hive-service-default.html
ivy-retrieve:
[echo] Project: service
compile:
[echo] Project: service
ivy-resolve-test:
[echo] Project: service
ivy-retrieve-test:
[echo] Project: service
compile-test:
[echo] Project: service
[javac] Compiling 2 source files to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/service/test/classes
test:
[echo] Project: hive
test-shims:
[echo] Project: hive
test-conditions:
[echo] Project: shims
gen-test:
[echo] Project: shims
create-dirs:
[echo] Project: shims
[copy] Warning: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/test/resources does not exist.
init:
[echo] Project: shims
ivy-init-settings:
[echo] Project: shims
ivy-resolve:
[echo] Project: shims
[ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml to https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/ivy/report/org.apache.hive-hive-shims-default.html
ivy-retrieve:
[echo] Project: shims
compile:
[echo] Project: shims
[echo] Building shims 0.20
build_shims:
[echo] Project: shims
[echo] Compiling https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20/java against hadoop 0.20.2 (https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/128/artifact/hive/build/hadoopcore/hadoop-0.20.2)
ivy-init-settings:
[echo] Project: shims
ivy-resolve-hadoop-shim:
[echo] Project: shims
[ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
ivy-retrieve-hadoop-shim:
[echo] Project: shims
[echo] Building shims 0.20S
build_shims:
[echo] Project: shims
[echo] Compiling
[jira] [Updated] (HIVE-3306) SMBJoin/BucketMapJoin should be allowed only when join key expression is exactly matches with sort/cluster key
[ https://issues.apache.org/jira/browse/HIVE-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-3306:
-----------------------------

Resolution: Fixed
Fix Version/s: 0.10.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed. Thanks Navis

SMBJoin/BucketMapJoin should be allowed only when join key expression is exactly matches with sort/cluster key
--------------------------------------------------------------------------------------------------------------

Key: HIVE-3306
URL: https://issues.apache.org/jira/browse/HIVE-3306
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Minor
Fix For: 0.10.0
Attachments: HIVE-3306.1.patch.txt

CREATE TABLE bucket_small (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_small;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_small;

CREATE TABLE bucket_big (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket3outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket4outof4.txt' INTO TABLE bucket_big;

select count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;
select /* + MAPJOIN(a) */ count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;

Both return 116 (the same). But with BucketMapJoin or SMBJoin, the query returns 61. This should not be allowed because hash(a.key) != hash(a.key + a.key). The bucket context should be utilized only when the join expression exactly matches the sort/cluster key.
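The hash inequality at the core of this bug is easy to see concretely. A minimal sketch (illustrative only, not Hive's actual bucketing code) of how a row's bucket is derived from the bucketing column, and why a join keyed on the expression a.key + a.key cannot rely on bucket co-location:

```java
public class BucketMismatch {
    // Bucket assignment in the usual style: non-negative hash modulo the
    // bucket count. This mirrors the common convention, not a specific
    // Hive method.
    static int bucketFor(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        int key = 3;
        // Rows were physically bucketed by hash(key), but the join compares
        // on key + key, which hashes to a different bucket.
        int bucketOfKey = bucketFor(Integer.hashCode(key), numBuckets);
        int bucketOfExpr = bucketFor(Integer.hashCode(key + key), numBuckets);
        System.out.println(bucketOfKey + " vs " + bucketOfExpr);
    }
}
```

Since the two bucket numbers differ, a bucketed join that matches bucket i of one table only against bucket i of the other silently drops rows, which is why the SMB/BucketMap variants returned 61 instead of 116.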
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Lu updated HIVE-3421:
--------------------------

Attachment: HIVE-3421.patch.4.txt

Column Level Top K Values Statistics
------------------------------------

Key: HIVE-3421
URL: https://issues.apache.org/jira/browse/HIVE-3421
Project: Hive
Issue Type: New Feature
Reporter: Feng Lu
Assignee: Feng Lu
Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt

Compute (estimate) the top k values for each column, and put the most skewed column into skewed info if the user hasn't specified skew. This feature depends on ListBucketing (create table skewed on): https://cwiki.apache.org/Hive/listbucketing.html. All column top-k values can be added to skewed info if, in the future, skewed info supports multiple independent columns. The top-k algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
[jira] [Commented] (HIVE-3422) Support partial partition specifications when enabling/disabling protections in Hive
[ https://issues.apache.org/jira/browse/HIVE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449923#comment-13449923 ]

Jean Xu commented on HIVE-3422:
-------------------------------

Phabricator diff: https://reviews.facebook.net/D5241

Support partial partition specifications when enabling/disabling protections in Hive
------------------------------------------------------------------------------------

Key: HIVE-3422
URL: https://issues.apache.org/jira/browse/HIVE-3422
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Jean Xu
Priority: Minor

Currently, if you have a table t with partition columns c1 and c2, the following command works:

ALTER TABLE t PARTITION (c1 = 'x', c2 = 'y') ENABLE NO_DROP;

The following does not:

ALTER TABLE t PARTITION (c1 = 'x') ENABLE NO_DROP;

We would like all existing partitions for which c1 = 'x' to have NO_DROP enabled when a user runs the above command.
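The requested behavior amounts to a subset match over partition key/value pairs: a spec that fixes only some partition columns should select every partition whose values agree on those columns. A minimal sketch of that matching rule, under the assumption that partitions are represented as column-to-value maps (illustrative types, not the metastore's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartialSpecMatch {
    // Return the partitions whose values agree with every (column, value)
    // pair in the partial spec; columns absent from the spec are wildcards.
    static List<Map<String, String>> matching(
            List<Map<String, String>> partitions, Map<String, String> spec) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> p : partitions) {
            boolean ok = true;
            for (Map.Entry<String, String> e : spec.entrySet()) {
                if (!e.getValue().equals(p.get(e.getKey()))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                out.add(p);
            }
        }
        return out;
    }
}
```

With partitions (c1=x, c2=y), (c1=x, c2=z), and (c1=w, c2=y), the spec {c1=x} selects the first two, which is exactly the set the ALTER TABLE command above should act on.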
[jira] [Updated] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated HIVE-3098:
---------------------------------------

Status: Open (was: Patch Available)

Posting updated patch for unsecure-Hadoop.

Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
---------------------------------------------------------------------------------------------

Key: HIVE-3098
URL: https://issues.apache.org/jira/browse/HIVE-3098
Project: Hive
Issue Type: Bug
Components: Shims
Affects Versions: 0.9.0
Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security turned on.
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
Attachments: Hive-3098_(FS_closeAllForUGI()).patch, Hive_3098.patch

The problem manifested while stress-testing HCatalog 0.4.1 (as part of testing the Oracle backend). The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60 threads, in under 24 hours. The heap dump indicates that hadoop::FileSystem.CACHE had 100 instances of FileSystem, whose combined retained memory consumed the entire heap.

It boiled down to hadoop::UserGroupInformation::equals() being implemented such that the Subject member is compared for equality (==), not equivalence (.equals()). This causes equivalent UGI instances to compare as unequal, and causes a new FileSystem instance to be created and cached. UGI.equals() is so implemented, incidentally, as a fix for yet another problem (HADOOP-6670), so it is unlikely that that implementation can be modified.

The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive metastore), using a cache for UGI instances in the shims. I have a patch to fix this; I'll upload it shortly. I just ran an overnight test to confirm that the memory leak has been arrested.
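The caching failure mode described here generalizes: whenever a cache key type compares by identity (==) rather than equivalence, every logically equal key creates a fresh cached instance. A minimal sketch of that mechanism (class names are illustrative, not Hadoop's; the real UGI case arises from its Subject being compared with ==):

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityKeyLeak {
    // A key with no equals()/hashCode() override: Object identity semantics,
    // analogous to UGI instances that are equivalent but never compare equal.
    static final class IdentityKey {
        final String user;
        IdentityKey(String user) {
            this.user = user;
        }
    }

    public static void main(String[] args) {
        Map<IdentityKey, Object> cache = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            // Same logical user each time, but a new key object each time,
            // so computeIfAbsent never finds a hit and the cache grows.
            cache.computeIfAbsent(new IdentityKey("hcat"), k -> new Object());
        }
        System.out.println(cache.size());
    }
}
```

With identity keys the cache ends up with 100 entries for one logical user, which is exactly the FileSystem.CACHE growth seen in the heap dump; caching and reusing a single equivalent key (here, one UGI per logical user) collapses that to one entry.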
[jira] [Commented] (HIVE-2095) auto convert map join bug
[ https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449933#comment-13449933 ]

Matt Kleiderman commented on HIVE-2095:
---------------------------------------

I think I'm hitting this issue with an 0.7.1 installation. Can you provide information about how big the tables need to be in order to trigger the NullPointerException?

auto convert map join bug
-------------------------

Key: HIVE-2095
URL: https://issues.apache.org/jira/browse/HIVE-2095
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.7.0
Reporter: He Yongqiang
Assignee: He Yongqiang
Fix For: 0.8.0
Attachments: HIVE-2095.1.patch, HIVE-2095.2.patch

1) When considering a table as the big-table candidate for a map join: if, at compile time, Hive can determine that the total known size of all other tables (excluding the big table under consideration) is bigger than a configured value, this big-table candidate is a bad one and should not be put into the plan. Otherwise, filtering it out at runtime may cost more time.

2) Added a null check for backup tasks; otherwise a NullPointerException will occur.

3) CommonJoinResolver needs to know the full mapping of pathToAliases; otherwise it will make a wrong decision.

4) Changes made to ConditionalResolverCommonJoin: added pathToAliases, aliasToSize (the alias's input size known at compile time, via inputSummary), and the intermediate dir path. The logic is: go over all the pathToAliases, and for each path, if it is from the intermediate dir path, add this path's size to all its aliases. Finally, choose the big table based on the size information and other data such as aliasToTask.

5) The conditional task's children contained wrong options, which may cause the join to fail or produce incorrect results. When getting all possible children for the conditional task, a whitelist of big tables should be used; only tables in this whitelist can be considered as the big table. Here is the logic:

Get a list of big-table candidates; only the tables in the returned set can be used as the big table in the join operation. Scan the join condition array from left to right. If we see an inner join and bigTableCandidates is empty, add both sides of the inner join to the candidates. If we see a left outer join and bigTableCandidates is empty, add the left side; if it is not empty, do nothing (the candidates already come from the left side). If we see a right outer join, clear bigTableCandidates and add the right side (the right side of a right outer join always wins). If we see a full outer join, return null immediately (no table can be the big table; a map join is not possible).
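The candidate-selection rules in item 5 can be sketched directly from the description above. The types here are illustrative stand-ins; Hive's actual implementation lives in its join resolvers:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class BigTableCandidates {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // One entry of the join condition array: the table positions joined and
    // the join type (illustrative stand-in for Hive's join descriptor).
    static final class JoinCond {
        final int left, right;
        final JoinType type;
        JoinCond(int left, int right, JoinType type) {
            this.left = left;
            this.right = right;
            this.type = type;
        }
    }

    // Scan conditions left to right, applying the rules quoted above.
    // Returns null when a full outer join rules out any map join.
    static Set<Integer> candidates(JoinCond[] conds) {
        Set<Integer> result = new LinkedHashSet<>();
        for (JoinCond c : conds) {
            switch (c.type) {
                case INNER:
                    if (result.isEmpty()) {
                        result.add(c.left);
                        result.add(c.right);
                    }
                    break;
                case LEFT_OUTER:
                    if (result.isEmpty()) {
                        result.add(c.left);
                    }
                    // otherwise the candidates already come from the left side
                    break;
                case RIGHT_OUTER:
                    result.clear();
                    result.add(c.right); // right side of a right outer join always wins
                    break;
                case FULL_OUTER:
                    return null; // no table can be the big table; no map join
            }
        }
        return result;
    }
}
```

For example, an inner join of tables 0 and 1 yields candidates {0, 1}; a left outer join followed by a right outer join leaves only the right side of the right outer join; any full outer join yields null.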
[jira] [Updated] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-3098: --- Attachment: hive-3098.patch Updated patch that fixes the leak in TUGIBasedProcessor (alongside the fix in HadoopThriftAuthBridge20S.) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.) - Key: HIVE-3098 URL: https://issues.apache.org/jira/browse/HIVE-3098 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.9.0 Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security turned on. Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: Hive-3098_(FS_closeAllForUGI()).patch, hive-3098.patch, Hive_3098.patch The problem manifested from stress-testing HCatalog 0.4.1 (as part of testing the Oracle backend). The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60 threads, in under 24 hours. The heap dump indicates that hadoop::FileSystem.CACHE had 100 instances of FileSystem, whose combined retained memory consumed the entire heap. It boiled down to hadoop::UserGroupInformation::equals() being implemented such that the Subject member is compared for identity (==), and not equivalence (.equals()). This causes equivalent UGI instances to compare as unequal, and causes a new FileSystem instance to be created and cached. UGI.equals() is implemented this way, incidentally, as a fix for yet another problem (HADOOP-6670), so it is unlikely that that implementation can be modified. The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive metastore), using a cache for UGI instances in the shims. I have a patch to fix this. I'll upload it shortly. I just ran an overnight test to confirm that the memory leak has been arrested.
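The identity-vs-equivalence problem behind this leak can be shown with a self-contained Java sketch. These are stand-in classes, not Hadoop's UGI or FileSystem.CACHE: a key whose equals() compares by reference defeats the cache, while interning keys by equivalence (the essence of the UGI-cache fix described above) keeps it bounded:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in demo of the leak mechanism described above; IdentityKey mimics
// UGI's reference-based equals(), and the maps mimic FileSystem.CACHE.
public class CacheLeakDemo {
    static final class IdentityKey {
        final String user;
        IdentityKey(String user) { this.user = user; }
        // identity, not equivalence: two equal users never compare equal
        @Override public boolean equals(Object o) { return this == o; }
        @Override public int hashCode() { return System.identityHashCode(this); }
    }

    // Each request builds a fresh, logically equivalent key: n cache entries.
    static int leakyCount(int n) {
        Map<IdentityKey, Object> cache = new HashMap<>();
        for (int i = 0; i < n; i++) {
            cache.computeIfAbsent(new IdentityKey("hcat"), k -> new Object());
        }
        return cache.size();
    }

    // Intern keys by equivalence before the cache lookup: one entry total.
    static int internedCount(int n) {
        Map<String, IdentityKey> interned = new HashMap<>();
        Map<IdentityKey, Object> cache = new HashMap<>();
        for (int i = 0; i < n; i++) {
            IdentityKey key = interned.computeIfAbsent("hcat", IdentityKey::new);
            cache.computeIfAbsent(key, k -> new Object());
        }
        return cache.size();
    }
}
```

The interning map plays the role of the UGI cache in the shims: lookups go through an equivalence-based key first, so the identity-based cache underneath only ever sees one instance per logical user.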
[jira] [Commented] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449937#comment-13449937 ] Mithun Radhakrishnan commented on HIVE-3098: Thanks, Ashutosh and Alan. The new patch looks good.
[jira] [Updated] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-3098: --- Status: Patch Available (was: Open)
[jira] [Created] (HIVE-3439) PARTITIONED BY clause in CREATE TABLE is order-dependent
Jonathan Natkins created HIVE-3439: -- Summary: PARTITIONED BY clause in CREATE TABLE is order-dependent Key: HIVE-3439 URL: https://issues.apache.org/jira/browse/HIVE-3439 Project: Hive Issue Type: Bug Reporter: Jonathan Natkins
hive> create external table foo (a int) location '/user/natty/foo' partitioned by (b int);
FAILED: Parse Error: line 1:61 mismatched input 'partitioned' expecting EOF near ''/user/natty/foo''
hive> create external table foo (a int) partitioned by (b int) location '/user/natty/foo';
OK
Time taken: 0.051 seconds
[jira] [Commented] (HIVE-1898) The ESCAPED BY clause does not seem to pick up newlines in columns and the line terminator cannot be changed
[ https://issues.apache.org/jira/browse/HIVE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449978#comment-13449978 ] Brian Bloniarz commented on HIVE-1898: -- I think Luke is right -- maybe the bug title should be changed to simply say data with newlines won't work in Text/LazySimpleSerDe tables? I haven't tested it, but would STORED AS SEQUENCEFILE tables be immune to this problem? The ESCAPED BY clause does not seem to pick up newlines in columns and the line terminator cannot be changed --- Key: HIVE-1898 URL: https://issues.apache.org/jira/browse/HIVE-1898 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.5.0 Reporter: Josh Patterson Priority: Minor If I want to preserve data in columns which contain a newline (webcrawling, for instance), I cannot set the ESCAPED BY clause to escape these out (other characters such as commas escape fine, however). This may be because the line terminators, which are locked to be newlines, are picked up first, and the fields are processed afterwards. This seems to be related to: SerDe should escape some special characters https://issues.apache.org/jira/browse/HIVE-136 and Implement LINES TERMINATED BY https://issues.apache.org/jira/browse/HIVE-302 where at comment: https://issues.apache.org/jira/browse/HIVE-302?focusedCommentId=12793435page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793435 This is not fixable currently because the line terminator is determined by LineRecordReader.LineReader, which is in Hadoop land.
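The point about why escaping cannot rescue embedded newlines in text tables can be shown with a tiny Java sketch. This is illustrative only, not Hadoop's LineRecordReader: a line-oriented reader splits records on raw '\n' before any SerDe-level unescaping runs, so an escaped newline has already broken the row by the time the SerDe sees it:

```java
// Illustrative sketch, not Hadoop's LineRecordReader: a line-oriented
// reader splits records on raw newlines before SerDe-level unescaping.
public class NewlineSplitDemo {
    // Number of "lines" a line-based reader would hand to the SerDe.
    static int lineCount(String record) {
        return record.split("\n", -1).length;
    }

    public static void main(String[] args) {
        // One logical row; second column holds a backslash-escaped newline.
        // \u0001 is the default Hive field delimiter.
        String record = "id1\u0001col with\\\nembedded newline";
        // The reader still sees two lines - the escape never gets a chance.
        System.out.println(lineCount(record)); // prints 2
    }
}
```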
[jira] [Commented] (HIVE-3436) Difference in exception string from native method causes script_pipe.q to fail on windows
[ https://issues.apache.org/jira/browse/HIVE-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450010#comment-13450010 ] Hudson commented on HIVE-3436: -- Integrated in Hive-trunk-h0.21 #1651 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1651/]) HIVE-3436 : Difference in exception string from native method causes script_pipe.q to fail on windows (Thejas Nair via Ashutosh Chauhan) (Revision 1381597) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1381597 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java Difference in exception string from native method causes script_pipe.q to fail on windows -- Key: HIVE-3436 URL: https://issues.apache.org/jira/browse/HIVE-3436 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.10.0 Attachments: HIVE-3436.1.patch
Hive-trunk-h0.21 - Build # 1651 - Still Failing
Changes for Build #1638 [namit] HIVE-3393 get_json_object and json_tuple should use Jackson library (Kevin Wilfong via namit) Changes for Build #1639 Changes for Build #1640 [ecapriolo] HIVE-3068 Export table metadata as JSON on table drop (Andrew Chalfant via egc) Changes for Build #1641 Changes for Build #1642 [hashutosh] HIVE-3338 : Archives broken for hadoop 1.0 (Vikram Dixit via Ashutosh Chauhan) Changes for Build #1643 Changes for Build #1644 Changes for Build #1645 [cws] HIVE-3413. Fix pdk.PluginTest on hadoop23 (Zhenxiao Luo via cws) Changes for Build #1646 [cws] HIVE-3056. Ability to bulk update location field in Db/Table/Partition records (Shreepadma Venugopalan via cws) [cws] HIVE-3416 [jira] Fix TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS when running Hive on hadoop23 (Zhenxiao Luo via Carl Steinbach) Summary: HIVE-3416: Fix TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS when running Hive on hadoop23 TestAvroSerdeUtils determinSchemaCanReadSchemaFromHDFS is failing when running hive on hadoop23: $ant very-clean package -Dhadoop.version=0.23.1 -Dhadoop-0.23.version=0.23.1 -Dhadoop.mr.rev=23 $ant test -Dhadoop.version=0.23.1 -Dhadoop-0.23.version=0.23.1 -Dhadoop.mr.rev=23 -Dtestcase=TestAvroSerdeUtils testcase classname=org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils name=determineSchemaCanReadSchemaFromHDFS time=0.21 error message=org/apache/hadoop/net/StaticMapping type=java.lang.NoClassDefFoundErrorjava.lang.NoClassDefFoundError: org/apache/hadoop/net/StaticMapping at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:534) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:489) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:360) at org.apache.hadoop.hive.serde2.avro.TestAvroSerdeUtils.determineSchemaCanReadSchemaFromHDFS(TestAvroSerdeUtils.java:187) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.net.StaticMapping at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at 
java.lang.ClassLoader.loadClass(ClassLoader.java:266) ... 25 more /error /testcase Test Plan: EMPTY Reviewers: JIRA Differential Revision: https://reviews.facebook.net/D5025 [cws] HIVE-3424. Error by upgrading a Hive 0.7.0 database to 0.8.0 (008-HIVE-2246.mysql.sql) (Alexander Alten-Lorenz via cws) [cws] HIVE-3412. Fix TestCliDriver.repair on Hadoop 0.23.3, 3.0.0, and 2.2.0-alpha (Zhenxiao Luo via cws) Changes for Build #1647 Changes for Build #1648 [namit] HIVE-3429 Bucket map join involving table with more than 1 partition column causes FileNotFoundException (Kevin Wilfong via namit) Changes for Build #1649 [hashutosh] HIVE-3075 : Improve HiveMetaStore logging (Travis Crawford via Ashutosh Chauhan) Changes for Build #1650 [hashutosh] HIVE-3340 : shims unit test failures fails further test progress (Giridharan Kesavan via Ashutosh Chauhan) Changes for Build #1651 [hashutosh]
[jira] [Created] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
Zhenxiao Luo created HIVE-3440: -- Summary: Fix pdk PluginTest failing on trunk-h0.21 Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Get the failure when running on hadoop21, triggered directly from pdk(when triggered from builtin, pdk test is passed). Here is the execution log: 2012-09-06 13:46:05,646 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001 java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 5 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 
10 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 13 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 18 more Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.clinit(GenericUDTFJSONTuple.java:54) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:532) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:545) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:539) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.clinit(FunctionRegistry.java:472) at org.apache.hadoop.hive.ql.exec.DefaultUDFMethodResolver.getEvalMethod(DefaultUDFMethodResolver.java:59) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:154) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:98) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:137) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:898) at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:924) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:358) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:434) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:390) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:166) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:358) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:441) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:358) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98) ... 18 more Caused by:
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Lu updated HIVE-3421: -- Attachment: HIVE-967.5.patch.txt Column Level Top K Values Statistics Key: HIVE-3421 URL: https://issues.apache.org/jira/browse/HIVE-3421 Project: Hive Issue Type: New Feature Reporter: Feng Lu Assignee: Feng Lu Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt, HIVE-967.5.patch.txt Compute (estimate) top k values for each column, and put the most skewed column into skewed info, if the user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
[jira] [Commented] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450041#comment-13450041 ] Zhenxiao Luo commented on HIVE-3440: This is NOT specific to hadoop23; even on the current trunk, building with: $ant test -Dtest.continue.on.failure=false reproduces the error. Also found that it happens here: https://builds.apache.org/job/Hive-trunk-h0.21/1651/consoleFull Seems like the jackson mapper library is missing.
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #128
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/ -- [...truncated 36447 lines...] [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2012-09-06_14-08-39_828_6118424854506059128/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/artifact/hive/build/service/tmp/hive_job_log_jenkins_201209061408_1954180276.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] Copying file: https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath 'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath 
'https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/ws/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/jenkins/hive_2012-09-06_14-08-44_161_4340349784355273135/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2012-09-06_14-08-44_161_4340349784355273135/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/artifact/hive/build/service/tmp/hive_job_log_jenkins_201209061408_1344012563.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table 
testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/artifact/hive/build/service/tmp/hive_job_log_jenkins_201209061408_505766291.txt [junit] Hive history file=https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/128/artifact/hive/build/service/tmp/hive_job_log_jenkins_201209061408_321993572.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] Copying
[jira] [Commented] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450056#comment-13450056 ] Zhenxiao Luo commented on HIVE-3440: Found that this is due to missing jackson mapper library. From the build log, we could see it start failing after HIVE-3393 is commited: Build#1637 is passed: https://builds.apache.org/job/Hive-trunk-h0.21/1637/consoleFull Build#1638 is failing: https://builds.apache.org/job/Hive-trunk-h0.21/1638/consoleFull I think to fix it. We need to put jackson-mapper dependency into ql, so that when pdk is running GenericUDTFJSONTuple.java, ObjectMapper initialization, no such NoClassDefFoundError. Fix pdk PluginTest failing on trunk-h0.21 - Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Get the failure when running on hadoop21, triggered directly from pdk(when triggered from builtin, pdk test is passed). 
Here is the execution log: 2012-09-06 13:46:05,646 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001 java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 5 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 10 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 
13 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 18 more Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.clinit(GenericUDTFJSONTuple.java:54) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:532) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:545) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:539) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.clinit(FunctionRegistry.java:472) at org.apache.hadoop.hive.ql.exec.DefaultUDFMethodResolver.getEvalMethod(DefaultUDFMethodResolver.java:59) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:154) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:98) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:137) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:898) at
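The fix described in the comment above amounts to declaring the Jackson mapper artifact in ql's ivy.xml. A minimal sketch of such a dependency entry follows; the revision number and conf mapping shown here are illustrative assumptions, not necessarily what the actual patch uses:

```xml
<!-- Hypothetical ivy.xml fragment for the ql module: pulls in the
     org.codehaus.jackson mapper so GenericUDTFJSONTuple can load
     ObjectMapper at runtime. rev and conf are illustrative. -->
<dependency org="org.codehaus.jackson" name="jackson-mapper-asl"
            rev="1.8.8" conf="default->master"/>
```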
[jira] [Commented] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450061#comment-13450061 ] Zhenxiao Luo commented on HIVE-3440: Review Request submitted at: https://reviews.facebook.net/D5265 With this patch, the pdk PluginTest passes when triggered from both builtin and pdk: $ant test -Dtest.continue.on.failure=false Fix pdk PluginTest failing on trunk-h0.21 - Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Attachments: HIVE-3440.1.patch.txt
[jira] [Updated] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo updated HIVE-3440: --- Attachment: HIVE-3440.1.patch.txt Fix pdk PluginTest failing on trunk-h0.21 - Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Attachments: HIVE-3440.1.patch.txt
[jira] [Updated] (HIVE-3440) Fix pdk PluginTest failing on trunk-h0.21
[ https://issues.apache.org/jira/browse/HIVE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo updated HIVE-3440: --- Status: Patch Available (was: Open) Fix pdk PluginTest failing on trunk-h0.21 - Key: HIVE-3440 URL: https://issues.apache.org/jira/browse/HIVE-3440 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 Attachments: HIVE-3440.1.patch.txt
[jira] [Created] (HIVE-3441) testcases escape1,escape2 fail on windows
Thejas M Nair created HIVE-3441: --- Summary: testcases escape1,escape2 fail on windows Key: HIVE-3441 URL: https://issues.apache.org/jira/browse/HIVE-3441 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.10.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3441) testcases escape1,escape2 fail on windows
[ https://issues.apache.org/jira/browse/HIVE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450121#comment-13450121 ] Thejas M Nair commented on HIVE-3441: - The tests fail because the inserted partitions have partition column values containing characters that are not accepted in file names on Windows. testcases escape1,escape2 fail on windows - Key: HIVE-3441 URL: https://issues.apache.org/jira/browse/HIVE-3441 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.10.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
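Partition values become directory names on disk, which is why these tests pass on Linux but fail on Windows: Windows reserves a set of characters that are illegal in file names, while Linux only forbids '/' and NUL. A minimal sketch of the check involved (class and method names are illustrative, not Hive's actual code):

```java
// Illustrative check of which partition column values can be
// materialized as directory names on Windows, which forbids
// <>:"/\|?* and control characters in file names.
public class WindowsNameCheck {
    private static final String RESERVED = "<>:\"/\\|?*";

    static boolean isValidWindowsFileName(String name) {
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 32 || RESERVED.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidWindowsFileName("part=2012-09-06")); // true
        System.out.println(isValidWindowsFileName("part=a|b"));        // false
    }
}
```

Values like the escaped strings exercised by escape1/escape2 trip this restriction, so the partitions cannot be written at all on Windows.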
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Lu updated HIVE-3421: -- Description: Compute (estimate) top k values statistics for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf was: Compute (estimate) top k values for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf Column Level Top K Values Statistics Key: HIVE-3421 URL: https://issues.apache.org/jira/browse/HIVE-3421 Project: Hive Issue Type: New Feature Reporter: Feng Lu Assignee: Feng Lu Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt, HIVE-967.5.patch.txt -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Lu updated HIVE-3421: -- Description: Compute (estimate) top k values statistic for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf was: Compute (estimate) top k values statistics for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf Column Level Top K Values Statistics Key: HIVE-3421 URL: https://issues.apache.org/jira/browse/HIVE-3421 Project: Hive Issue Type: New Feature Reporter: Feng Lu Assignee: Feng Lu Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt, HIVE-967.5.patch.txt -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Lu updated HIVE-3421: -- Description: Compute (estimate) top k values statistics for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf was: Compute (estimate) top k values statistic for each column, and put the most skewed column into skewed info, if user hasn't specified skew. This feature depends on ListBucketing (create table skewed on) https://cwiki.apache.org/Hive/listbucketing.html. All column topk can be added to skewed info, if in the future skewed info supports multiple independent columns. The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf Column Level Top K Values Statistics Key: HIVE-3421 URL: https://issues.apache.org/jira/browse/HIVE-3421 Project: Hive Issue Type: New Feature Reporter: Feng Lu Assignee: Feng Lu Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.txt, HIVE-967.5.patch.txt -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
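The UCSB tech report cited in the HIVE-3421 description describes the Space-Saving algorithm for estimating frequent and top-k elements in a stream with a fixed number of counters. A minimal sketch of that idea follows; the class and method names are illustrative, and Hive's actual implementation in the attached patches may differ:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Space-Saving top-k idea from the cited paper: keep at
// most 'capacity' counters; when a new item arrives and all counters
// are taken, evict the minimum counter and let the newcomer inherit
// its count + 1 (counts are overestimates, never underestimates).
public class SpaceSavingSketch {
    private final int capacity;
    private final Map<String, Integer> counts = new HashMap<>();

    public SpaceSavingSketch(int capacity) {
        this.capacity = capacity;
    }

    public void offer(String item) {
        Integer current = counts.get(item);
        if (current != null) {
            counts.put(item, current + 1);
        } else if (counts.size() < capacity) {
            counts.put(item, 1);
        } else {
            // Evict the item with the minimum count.
            String minItem = null;
            int minCount = Integer.MAX_VALUE;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() < minCount) {
                    minCount = e.getValue();
                    minItem = e.getKey();
                }
            }
            counts.remove(minItem);
            counts.put(item, minCount + 1); // inherited, overestimated count
        }
    }

    // Estimated counts for the currently tracked (candidate top-k) items.
    public Map<String, Integer> estimates() {
        return new HashMap<>(counts);
    }

    public static void main(String[] args) {
        SpaceSavingSketch sketch = new SpaceSavingSketch(2);
        // A skewed column: "a" dominates, so it survives all evictions.
        for (String v : new String[] {"a", "a", "a", "b", "c", "a", "d", "a"}) {
            sketch.offer(v);
        }
        System.out.println(sketch.estimates().containsKey("a")); // true
    }
}
```

The skew detection sketched in the issue would then compare the top estimate against the total row count and, if a column's most frequent value dominates, record that column in the table's skewed info.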
[jira] [Created] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
Zhenxiao Luo created HIVE-3442: -- Summary: AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table Key: HIVE-3442 URL: https://issues.apache.org/jira/browse/HIVE-3442 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 After creating a table and loading data into it, I could check that the table was created successfully and the data is inside: DROP TABLE IF EXISTS ml_items; CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING, imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT, children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT, fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT, romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items; select * from ml_items ORDER BY id ASC; However, the following CREATE EXTERNAL TABLE with AvroSerDe is not working: DROP TABLE IF EXISTS ml_items_as_avro; CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:${system:test.tmp.dir}/hive-ml-items'; describe ml_items_as_avro; INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items; ml_items_as_avro is not created with the expected schema, as shown in the describe 
ml_items_as_avro output. The output is below: PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro PREHOOK: type: DROPTABLE POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro POSTHOOK: type: DROPTABLE PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' PREHOOK: type: CREATETABLE POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' POSTHOOK: type: CREATETABLE POSTHOOK: Output: default@ml_items_as_avro PREHOOK: query: describe ml_items_as_avro PREHOOK: type: DESCTABLE POSTHOOK: query: describe ml_items_as_avro POSTHOOK: type: DESCTABLE error_error_error_error_error_error_error string from deserializer cannot_determine_schema string from deserializer check string from deserializer schema string from deserializer url string from deserializer and string from deserializer literal string from deserializer FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
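The placeholder columns in the DESCRIBE output (error_error_error... cannot_determine_schema check schema url and literal) read as a sentence: AvroSerDe is signalling that it could not determine a schema from the schema.url or schema.literal properties. For orientation, a schema file such as the avro_items_schema.avsc referenced above would be an Avro record declaration along these lines; the field list here is abbreviated and illustrative, since the actual file is not shown in the report:

```json
{
  "type": "record",
  "name": "ml_items",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "title", "type": "string"},
    {"name": "imdb_url", "type": "string"}
  ]
}
```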
[jira] [Commented] (HIVE-2999) Offline build is not working
[ https://issues.apache.org/jira/browse/HIVE-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450183#comment-13450183 ] Hudson commented on HIVE-2999: -- Integrated in Hive-trunk-h0.21 #1652 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1652/]) HIVE-2999 : Offline build is not working (Navis via Ashutosh Chauhan) (Revision 1381643) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381643 Files : * /hive/trunk/builtins/ivy.xml * /hive/trunk/common/ivy.xml * /hive/trunk/ql/ivy.xml * /hive/trunk/serde/ivy.xml Offline build is not working Key: HIVE-2999 URL: https://issues.apache.org/jira/browse/HIVE-2999 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Fix For: 0.10.0 Attachments: HIVE-2999.1.patch.txt, HIVE-2999.2.patch.txt It's fine without the -Doffline=true option, but with the offline option (ant -Doffline=true clean package) it fails with an error message like this. 
{noformat} ivy-retrieve: [echo] Project: common [ivy:retrieve] :: loading settings :: file = /home/navis/apache/oss-hive/ivy/ivysettings.xml [ivy:retrieve] [ivy:retrieve] :: problems summary :: [ivy:retrieve] WARNINGS [ivy:retrieve]module not found: org.apache.hadoop#hadoop-common;0.20.2 [ivy:retrieve] local: tried [ivy:retrieve] /home/navis/.ivy2/local/org.apache.hadoop/hadoop-common/0.20.2/ivys/ivy.xml [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] /home/navis/.ivy2/local/org.apache.hadoop/hadoop-common/0.20.2/jars/hadoop-common.jar [ivy:retrieve] apache-snapshot: tried [ivy:retrieve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.pom [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve] maven2: tried [ivy:retrieve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.pom [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve] datanucleus-repo: tried [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] http://www.datanucleus.org/downloads/maven2/org/apache/hadoop/hadoop-common/0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve] hadoop-source: tried [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] http://mirror.facebook.net/facebook/hive-deps/hadoop/core/hadoop-common-0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve] hadoop-source2: tried [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-common;0.20.2!hadoop-common.jar: [ivy:retrieve] 
http://archive.cloudera.com/hive-deps/hadoop/core/hadoop-common-0.20.2/hadoop-common-0.20.2.jar [ivy:retrieve]module not found: org.apache.hadoop#hadoop-auth;0.20.2 [ivy:retrieve] local: tried [ivy:retrieve] /home/navis/.ivy2/local/org.apache.hadoop/hadoop-auth/0.20.2/ivys/ivy.xml [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar: [ivy:retrieve] /home/navis/.ivy2/local/org.apache.hadoop/hadoop-auth/0.20.2/jars/hadoop-auth.jar [ivy:retrieve] apache-snapshot: tried [ivy:retrieve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.pom [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar: [ivy:retrieve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar [ivy:retrieve] maven2: tried [ivy:retrieve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.pom [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar: [ivy:retrieve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/0.20.2/hadoop-auth-0.20.2.jar [ivy:retrieve] datanucleus-repo: tried [ivy:retrieve] -- artifact org.apache.hadoop#hadoop-auth;0.20.2!hadoop-auth.jar:
[jira] [Commented] (HIVE-3306) SMBJoin/BucketMapJoin should be allowed only when join key expression is exactly matches with sort/cluster key
[ https://issues.apache.org/jira/browse/HIVE-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450184#comment-13450184 ] Hudson commented on HIVE-3306:
--
Integrated in Hive-trunk-h0.21 #1652 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1652/])
HIVE-3306 SMBJoin/BucketMapJoin should be allowed only when the join key expression exactly matches the sort/cluster key (Navis via namit) (Revision 1381669)

Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381669
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketMapJoinOptimizer.java
* /hive/trunk/ql/src/test/queries/clientpositive/bucket_map_join_1.q
* /hive/trunk/ql/src/test/queries/clientpositive/bucket_map_join_2.q
* /hive/trunk/ql/src/test/queries/clientpositive/bucketmapjoin_negative3.q
* /hive/trunk/ql/src/test/results/clientpositive/bucket_map_join_1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucket_map_join_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out

SMBJoin/BucketMapJoin should be allowed only when the join key expression exactly matches the sort/cluster key
--
Key: HIVE-3306
URL: https://issues.apache.org/jira/browse/HIVE-3306
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Minor
Fix For: 0.10.0
Attachments: HIVE-3306.1.patch.txt

CREATE TABLE bucket_small (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_small;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_small;

CREATE TABLE bucket_big (key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket3outof4.txt' INTO TABLE bucket_big;
load data local inpath '/home/navis/apache/oss-hive/data/files/srcsortbucket4outof4.txt' INTO TABLE bucket_big;

select count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;
select /* + MAPJOIN(a) */ count(*) FROM bucket_small a JOIN bucket_big b ON a.key + a.key = b.key;

Both return 116 (the same result). But with BucketMapJoin or SMBJoin, the query returns 61. This should not be allowed, because hash(a.key) != hash(a.key + a.key); the bucket context should be utilized only when the join expression exactly matches the sort/cluster key.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
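Navis's point that hash(a.key) != hash(a.key + a.key) can be sketched outside Hive. The following is a minimal Python model assuming a generic hash-mod bucketing scheme; `bucket_of` is an illustration, not Hive's actual bucketing hash.

```python
# Why a bucketed join must use the exact sort/cluster key: rows are
# assigned to buckets by hashing that key, so a derived join expression
# such as a.key + a.key can point at a different bucket than the one
# the matching row actually lives in. Toy model only.

def bucket_of(key, num_buckets):
    return hash(key) % num_buckets

NUM_BUCKETS = 4
keys = [0, 1, 2, 3, 5, 8]

# Joining on "a.key = b.key" is safe: equal keys always land in the
# same bucket, so pairing bucket i of a with bucket i of b is exact.
# Joining on "a.key + a.key = b.key" is not: the b row matching key k
# lives in bucket_of(2 * k), which usually differs from bucket_of(k),
# so a bucket-wise join silently drops those matches.
mismatched = [k for k in keys
              if bucket_of(k, NUM_BUCKETS) != bucket_of(2 * k, NUM_BUCKETS)]
print(mismatched)  # keys whose matching rows sit in a different bucket
```

This is exactly why the repro above returns 61 instead of 116 once the bucket context is (incorrectly) applied: some buckets that hold matches are never probed.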
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450191#comment-13450191 ] Zhenxiao Luo commented on HIVE-3442:
--
CC'ing Jakob, so that if there is any AvroSerDe usage error, Jakob's comments and suggestions are always welcome.

AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
---
Key: HIVE-3442
URL: https://issues.apache.org/jira/browse/HIVE-3442
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
Fix For: 0.10.0

After creating a table and loading data into it, I can verify that the table is created successfully and the data is inside:

DROP TABLE IF EXISTS ml_items;
CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING, imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT, children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT, fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT, romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
select * from ml_items ORDER BY id ASC;

However, the following CREATE EXTERNAL TABLE with AvroSerDe is not working:

DROP TABLE IF EXISTS ml_items_as_avro;
CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
describe ml_items_as_avro;
INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items;

ml_items_as_avro is not created with the expected schema, as shown in the describe ml_items_as_avro output below:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error   string   from deserializer
cannot_determine_schema   string   from deserializer
check   string   from deserializer
schema   string   from deserializer
url   string   from deserializer
and   string   from deserializer
literal   string   from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.
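A side note on the error message itself: the "7 columns" in the SemanticException match the placeholder columns in the describe output above, which spell out an error message ("cannot determine schema: check schema url and literal"). Counting them in plain Python, just restating what the output already shows:

```python
# Columns reported by "describe ml_items_as_avro" in the output above.
# AvroSerDe could not determine a usable schema, so it exposed this
# sentinel schema instead; the INSERT then fails because the target
# has 7 columns while the SELECT produces 22.
error_schema_columns = [
    "error_error_error_error_error_error_error",
    "cannot_determine_schema",
    "check",
    "schema",
    "url",
    "and",
    "literal",
]
selected_columns = 22  # id, title, imdb_url, plus the 19 genre flags
print(len(error_schema_columns), selected_columns)  # 7 22
```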
[jira] [Commented] (HIVE-3388) Improve Performance of UDF PERCENTILE_APPROX()
[ https://issues.apache.org/jira/browse/HIVE-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450192#comment-13450192 ] Siying Dong commented on HIVE-3388:
---
+1

Improve Performance of UDF PERCENTILE_APPROX()
--
Key: HIVE-3388
URL: https://issues.apache.org/jira/browse/HIVE-3388
Project: Hive
Issue Type: Task
Reporter: Rongrong Zhong
Assignee: Rongrong Zhong
Priority: Minor
Attachments: HIVE-3388.1.patch.txt
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450198#comment-13450198 ] Jakob Homan commented on HIVE-3442:
---
The docs are out of date (my fault). schema.url and schema.literal got changed to avro.schema.url and avro.schema.literal during the move to Apache, to be more specific to Avro. Try with those. I'll update the wiki.
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450203#comment-13450203 ] Zhenxiao Luo commented on HIVE-3442:
--
@Jakob: Thanks a lot. I tried avro.schema.url; it seems to still not be working:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'avro.schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'avro.schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error   string   from deserializer
cannot_determine_schema   string   from deserializer
check   string   from deserializer
schema   string   from deserializer
url   string   from deserializer
and   string   from deserializer
literal   string   from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450202#comment-13450202 ] Jakob Homan commented on HIVE-3442:
---
Updated the wiki.
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450206#comment-13450206 ] Zhenxiao Luo commented on HIVE-3442:
--
Also tried avro.schema.literal; it seems not to be working either:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'avro.schema.literal'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'avro.schema.literal'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error   string   from deserializer
cannot_determine_schema   string   from deserializer
check   string   from deserializer
schema   string   from deserializer
url   string   from deserializer
and   string   from deserializer
literal   string   from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.

@Jakob: I will trace the code to see what is wrong. Any comments are appreciated.
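One likely reason the avro.schema.literal attempt above fails: if the SerDe treats the literal property as the Avro schema text itself (Avro schemas are JSON documents) rather than as a location, then a filesystem path can never parse as a schema. A sketch in plain Python, with the json module standing in for the schema parser; the toy schema below is hypothetical, not from this ticket:

```python
import json

# A bare filesystem path handed to a "literal" property is not JSON,
# so schema parsing fails before any file is ever read.
path_as_literal = "/home/cloudera/Code/hive/data/files/avro_items_schema.avsc"
try:
    json.loads(path_as_literal)
    parsed = True
except ValueError:
    parsed = False
print(parsed)  # False: a path is not a JSON schema

# An actual (toy) schema document parses fine as JSON.
toy_schema = '{"type": "record", "name": "item", "fields": [{"name": "id", "type": "int"}]}'
print(json.loads(toy_schema)["type"])  # record
```

This is consistent with Jakob's follow-up question: the url variant needs a valid, reachable URL, while the literal variant needs the schema inline.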
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450209#comment-13450209 ] Jakob Homan commented on HIVE-3442:
---
bq. 'avro.schema.literal'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc'
Is this a valid URL? Is it accessible from the metastore?
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450215#comment-13450215 ] Zhenxiao Luo commented on HIVE-3442:
--
@Jakob: Thanks a lot. Got it working with the following valid URL:

DROP TABLE IF EXISTS ml_items_as_avro;
CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url'='file:${system:test.src.data.dir}/files/avro_items_schema.avsc')
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
describe ml_items_as_avro;
INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items;

How about I resolve this as Not A Bug?
[jira] [Updated] (HIVE-3411) Filter predicates on outer join overlapped on single alias is not handled properly
[ https://issues.apache.org/jira/browse/HIVE-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3411:
Issue Type: Bug (was: Sub-task)
Parent: (was: HIVE-3381)

Filter predicates on outer join overlapped on single alias is not handled properly
--
Key: HIVE-3411
URL: https://issues.apache.org/jira/browse/HIVE-3411
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Environment: ubuntu 10.10
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-3411.1.patch.txt

Currently, join predicates on an outer join are evaluated in the join operator (or in the HashSink for a MapJoin), and the result is tagged onto the end of each value (as a boolean), which is used when joining values. But when predicates overlap on a single alias, all of them are evaluated with an AND conjunction, which produces an invalid result. For example, with table a containing the values:

{noformat}
100	40
100	50
100	60
{noformat}

the query below has overlapped predicates on alias b, so every value of b is tagged with true (filtered):

{noformat}
select * from a right outer join a b on (a.key=b.key AND a.value=50 AND b.value=50)
         left outer join a c on (b.key=c.key AND b.value=60 AND c.value=60);

NULL	NULL	100	40	NULL	NULL
NULL	NULL	100	50	NULL	NULL
NULL	NULL	100	60	NULL	NULL

-- Join predicate
Join Operator
  condition map:
    Right Outer Join0 to 1
    Left Outer Join1 to 2
  condition expressions:
    0 {VALUE._col0} {VALUE._col1}
    1 {VALUE._col0} {VALUE._col1}
    2 {VALUE._col0} {VALUE._col1}
  filter predicates:
    0
    1 {(VALUE._col1 = 50)} {(VALUE._col1 = 60)}
    2
{noformat}

but the result should be:

{noformat}
NULL	NULL	100	40	NULL	NULL
100	50	100	50	NULL	NULL
NULL	NULL	100	60	100	60
{noformat}
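The behavior Navis describes can be modeled outside Hive: if all residual predicates on one alias are folded into a single AND-conjoined tag, no row of b can satisfy both b.value=50 and b.value=60 at once, so every b row is treated as filtered and both outer joins pad with NULLs. A minimal Python sketch of the tagging logic only (an illustration, not Hive's join operator):

```python
# Rows of table a, used as alias b in the query. A tag of True means
# "this b row passes the filter for that join and may match"; False
# means the row is treated as filtered (the other side gets NULLs).
rows_b = [(100, 40), (100, 50), (100, 60)]

def pred_join0(key, value):  # b.value = 50, from the right outer join with a
    return value == 50

def pred_join1(key, value):  # b.value = 60, from the left outer join with c
    return value == 60

# Buggy behavior from the report: overlapped predicates on one alias
# are AND-conjoined into a single tag. No row has value 50 and 60 at
# once, so every b row is tagged filtered and only NULL-padded rows
# come out of both joins.
buggy_tags = [pred_join0(*r) and pred_join1(*r) for r in rows_b]
print(buggy_tags)   # [False, False, False]

# Correct behavior: one independent tag per join condition, so
# (100, 50) matches in the first join and (100, 60) in the second.
correct_tags = [(pred_join0(*r), pred_join1(*r)) for r in rows_b]
print(correct_tags)  # [(False, False), (True, False), (False, True)]
```

The per-join tags line up with the expected output: row (100, 50) survives the first join's filter and row (100, 60) survives the second's.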
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450217#comment-13450217 ] Jakob Homan commented on HIVE-3442: --- Sounds good.

AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
--

Key: HIVE-3442
URL: https://issues.apache.org/jira/browse/HIVE-3442
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
Fix For: 0.10.0

After creating a table and loading data into it, I could check that the table was created successfully and that the data is inside:

DROP TABLE IF EXISTS ml_items;

CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING,
  imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT,
  children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT,
  fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT,
  romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;

select * from ml_items ORDER BY id ASC;

However, the following CREATE EXTERNAL TABLE with AvroSerDe is not working:

DROP TABLE IF EXISTS ml_items_as_avro;

CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';

describe ml_items_as_avro;

INSERT OVERWRITE TABLE ml_items_as_avro
SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children,
  comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery,
  romance, sci_fi, thriller, war, western FROM ml_items;

ml_items_as_avro is not created with the expected schema, as shown in the describe ml_items_as_avro output below:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error	string	from deserializer
cannot_determine_schema	string	from deserializer
check	string	from deserializer
schema	string	from deserializer
url	string	from deserializer
and	string	from deserializer
literal	string	from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3437) 0.23 compatibility: fix unit tests when building against 0.23
[ https://issues.apache.org/jira/browse/HIVE-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450216#comment-13450216 ] Chris Drome commented on HIVE-3437: --- I'm actively working on this JIRA, but was not able to assign it to myself. 0.23 compatibility: fix unit tests when building against 0.23 - Key: HIVE-3437 URL: https://issues.apache.org/jira/browse/HIVE-3437 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.9.1 Reporter: Chris Drome Many unit tests fail as a result of building the code against hadoop 0.23. Initial focus will be to fix 0.9. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenxiao Luo resolved HIVE-3442. Resolution: Not A Problem. Got help from Jakob: it is actually an invalid use of AvroSerDe, not a bug.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-3443) Hive Metatool should take serde_param_key from the user to allow for changes to avro serde's schema url key
Shreepadma Venugopalan created HIVE-3443: Summary: Hive Metatool should take serde_param_key from the user to allow for changes to avro serde's schema url key Key: HIVE-3443 URL: https://issues.apache.org/jira/browse/HIVE-3443 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Critical Hive Metatool should take serde_param_key from the user to allow for changes to avro serde's schema url key. In the past, the avro.schema.url key was called schema.url. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3443) Hive Metatool should take serde_param_key from the user to allow for changes to avro serde's schema url key
[ https://issues.apache.org/jira/browse/HIVE-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450242#comment-13450242 ] Shreepadma Venugopalan commented on HIVE-3443: -- Support for Hive MetaTool was added in HIVE-3056. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3437) 0.23 compatibility: fix unit tests when building against 0.23
[ https://issues.apache.org/jira/browse/HIVE-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3437: --- Assignee: Chris Drome -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3437) 0.23 compatibility: fix unit tests when building against 0.23
[ https://issues.apache.org/jira/browse/HIVE-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450295#comment-13450295 ] Ashutosh Chauhan commented on HIVE-3437: Assigned to Chris. Chris, I also added you to the contributors list, so you can assign yourself any other jiras. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3098) Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.)
[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450302#comment-13450302 ] Ashutosh Chauhan commented on HIVE-3098: +1 will commit if tests pass. Memory leak from large number of FileSystem instances in FileSystem.CACHE. (Must cache UGIs.) - Key: HIVE-3098 URL: https://issues.apache.org/jira/browse/HIVE-3098 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.9.0 Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security turned on. Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: Hive-3098_(FS_closeAllForUGI()).patch, hive-3098.patch, Hive_3098.patch The problem manifested from stress-testing HCatalog 0.4.1 (as part of testing the Oracle backend). The HCatalog server ran out of memory (-Xmx2048m) when pounded by 60 threads in under 24 hours. The heap dump indicates that hadoop::FileSystem.CACHE had 100 instances of FileSystem, whose combined retained memory consumed the entire heap. It boiled down to hadoop::UserGroupInformation::equals() being implemented such that the Subject member is compared for identity (==), not equivalence (.equals()). This causes equivalent UGI instances to compare as unequal, so a new FileSystem instance is created and cached for each one. UGI.equals() is implemented that way, incidentally, as a fix for yet another problem (HADOOP-6670), so it is unlikely that that implementation can be modified. The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive metastore), using a cache for UGI instances in the shims. I have a patch to fix this; I'll upload it shortly. I just ran an overnight test to confirm that the memory leak has been arrested. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
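The HIVE-3098 failure mode and the proposed fix can be sketched generically (plain Python with hypothetical names; the real patch lives in Hive's Java shims and uses Hadoop's actual UGI and FileSystem classes):

```python
class UGI:
    """Stand-in for Hadoop's UserGroupInformation: default (identity) equality,
    so two UGIs for the same user never compare equal."""
    def __init__(self, user):
        self.user = user

fs_cache = {}   # mimics FileSystem.CACHE: one FileSystem per distinct key

def filesystem_for(ugi):
    # Each UGI that is not already a key creates and caches a new "FileSystem".
    return fs_cache.setdefault(ugi, object())

# Without the fix: 100 equivalent UGIs -> 100 cached FileSystem instances.
for _ in range(100):
    filesystem_for(UGI("alice"))
leaked = len(fs_cache)

# The fix's idea: intern UGIs by an equivalence key before the lookup,
# so equivalent UGIs map to one canonical instance.
fs_cache.clear()
ugi_cache = {}

def interned(ugi):
    return ugi_cache.setdefault(ugi.user, ugi)

for _ in range(100):
    filesystem_for(interned(UGI("alice")))
bounded = len(fs_cache)
```

The design point is that the identity-based UGI.equals() cannot be changed (HADOOP-6670), so the cache key is canonicalized on the caller's side instead.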
[jira] [Updated] (HIVE-3427) Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3427: --- Resolution: Fixed Fix Version/s: 0.10.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Newly added test testCliDriver_metadata_export_drop is consistently failing on trunk Key: HIVE-3427 URL: https://issues.apache.org/jira/browse/HIVE-3427 Project: Hive Issue Type: Test Affects Versions: 0.10.0 Reporter: Ashutosh Chauhan Assignee: Navis Fix For: 0.10.0 Attachments: HIVE-3427.1.patch.txt, HIVE-3427.2.patch.txt I think it's a new test which was added via HIVE-3068 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3086) Skewed Join Optimization
[ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3086: - Status: Open (was: Patch Available) comments from Kevin Skewed Join Optimization Key: HIVE-3086 URL: https://issues.apache.org/jira/browse/HIVE-3086 Project: Hive Issue Type: New Feature Reporter: Nadeem Moidu Assignee: Namit Jain Attachments: hive.3086.1.patch During a join operation, if one of the columns has a skewed key, it can cause that particular reducer to become the bottleneck. The following feature will address it: https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
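As background for why a skewed key makes one reducer the bottleneck, a toy sketch (plain Python with a made-up deterministic hash; not Hive's implementation, which is described at the wiki link in the issue above):

```python
from collections import Counter

def bucket(key, n):
    # Deterministic stand-in for the join key's hash partitioner.
    return sum(map(ord, key)) % n

def reducer_loads(rows, num_reducers, skewed_keys=()):
    """Count rows per reducer; rows for declared skewed keys are diverted
    to a separate (parallelizable) path, as the optimization proposes."""
    loads, diverted = Counter(), 0
    for key, _ in rows:
        if key in skewed_keys:
            diverted += 1
        else:
            loads[bucket(key, num_reducers)] += 1
    return loads, diverted

# 98 of 100 rows share one join key: a heavily skewed distribution.
rows = [("hot", i) for i in range(98)] + [("k1", 0), ("k2", 0)]
```

Without special handling, all "hot" rows hash to the same reducer; diverting them to a separate path leaves the remaining reducers evenly loaded.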
[jira] [Created] (HIVE-3444) Support reading columns containing line separator
Navis created HIVE-3444: --- Summary: Support reading columns containing line separator Key: HIVE-3444 URL: https://issues.apache.org/jira/browse/HIVE-3444 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Trivial Currently, LazySimpleSerde cannot handle columns containing the newline character, because Hadoop splits rows on newlines. If the overhead of counting fields by a full scan and merging partial lines is tolerable, a multi-lined column can be reconstructed at runtime. But with this method, a multi-lined column must not be the last column of the row. This is just an idea for HIVE-1898. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3444) Support reading columns containing line separator
[ https://issues.apache.org/jira/browse/HIVE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3444: Status: Patch Available (was: Open) https://reviews.facebook.net/D5277 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3444) Support reading columns containing line separator
[ https://issues.apache.org/jira/browse/HIVE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3444: Attachment: HIVE-3444.1.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
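The field-counting idea described in HIVE-3444 above can be sketched as follows (plain Python with a hypothetical helper, not the actual patch; assumes column values never contain the field delimiter itself):

```python
def merge_rows(lines, delim, num_fields):
    """Merge physical lines until a logical row has num_fields fields,
    reconstructing column values that contain embedded newlines."""
    rows, buf = [], None
    for line in lines:
        buf = line if buf is None else buf + "\n" + line
        # A complete row has num_fields - 1 delimiters.
        if buf.count(delim) >= num_fields - 1:
            rows.append(buf.split(delim, num_fields - 1))
            buf = None
    # Caveat from the issue: if the multi-lined column were the LAST column,
    # its continuation lines add no delimiters and the row boundary is
    # ambiguous, so such a column must not be last.
    return rows
```

For example, a 3-field row whose second column spans two physical lines is reassembled, while an ordinary single-line row passes through unchanged.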
[jira] [Updated] (HIVE-3086) Skewed Join Optimization
[ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3086: - Status: Patch Available (was: Open) addressed comments -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3086) Skewed Join Optimization
[ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3086: - Attachment: hive.3086.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2604) Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
[ https://issues.apache.org/jira/browse/HIVE-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450345#comment-13450345 ] Uma Maheswara Rao G commented on HIVE-2604: --- Hi Yongqiang, Any reason for holding this off from commit? Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies Key: HIVE-2604 URL: https://issues.apache.org/jira/browse/HIVE-2604 Project: Hive Issue Type: Sub-task Components: Contrib Reporter: Krishna Kumar Assignee: Krishna Kumar Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2604.D1011.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2604.D1011.2.patch, HIVE-2604.v0.patch, HIVE-2604.v1.patch, HIVE-2604.v2.patch The strategies supported are 1. using a specified codec on the column 2. using a specific codec on the column which is serialized via a specific serde 3. using a specific TypeSpecificCompressor instance -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
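Strategy 1 from the HIVE-2604 description (a specified codec per column) can be sketched minimally in plain Python with stdlib codecs; this is an assumed illustration, not the contrib UberCompressor's actual API, which the issue does not show:

```python
import bz2
import zlib

# Hypothetical per-column compression strategy: map each column to its codec.
column_codecs = {"id": zlib, "title": bz2}

def compress_row(row):
    # Compress each column's value with that column's own codec.
    return {col: column_codecs[col].compress(val.encode("utf-8"))
            for col, val in row.items()}

def decompress_row(blobs):
    # Reverse the mapping on read.
    return {col: column_codecs[col].decompress(blob).decode("utf-8")
            for col, blob in blobs.items()}
```

Strategies 2 and 3 from the description would extend the same mapping to carry a serde or a TypeSpecificCompressor instance per column instead of a plain codec.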