[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546661#comment-13546661
 ] 

Phabricator commented on HIVE-3562:
---

njain has commented on the revision "HIVE-3562 [jira] Some limit can be pushed 
down to map stage".

  Sorry, my earlier comments assumed that the threshold was the number of rows.

INLINE COMMENTS
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:483 Coming back to 
an earlier comment from Sivaramakrishnan Narayanan: would it be simpler if this 
were the number of rows?
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:414 Define 
40 as a constant somewhere.

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, navis
Cc: njain


> Some limit can be pushed down to map stage
> --
>
> Key: HIVE-3562
> URL: https://issues.apache.org/jira/browse/HIVE-3562
> Project: Hive
>  Issue Type: Bug
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
> HIVE-3562.D5967.3.patch
>
>
> Queries with a LIMIT clause (with a reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> produce the operator tree
> TS-SEL-RS-EXT-LIMIT-FS
> but LIMIT can be partially computed in the RS, reducing the size of the 
> shuffle:
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
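The RS(TOP-N) idea above can be sketched outside Hive. The following is a hedged Python illustration of map-side top-N, not the actual ReduceSinkOperator code; the row data and function names are made up:

```python
import heapq

def map_side_topn(rows, limit):
    # Each mapper forwards only its local top-N rows (smallest keys first),
    # so the shuffle carries at most `limit` rows per mapper instead of all rows.
    return heapq.nsmallest(limit, rows, key=lambda r: r[0])

def reduce_side_limit(shuffled, limit):
    # The reducer still sorts and applies the final LIMIT, so the result is
    # identical to sorting the full input and then limiting it.
    return sorted(shuffled, key=lambda r: r[0])[:limit]

mappers = [
    [(k, "mapper1") for k in range(100)],        # keys 0-99
    [(k, "mapper2") for k in range(50, 150)],    # keys 50-149
]
limit = 10
shuffled = [row for rows in mappers for row in map_side_topn(rows, limit)]
result = reduce_side_limit(shuffled, limit)
```

Only 2 × limit rows cross the shuffle instead of 200, and the reducer's output still equals a full sort followed by LIMIT.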

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546649#comment-13546649
 ] 

Phabricator commented on HIVE-3562:
---

njain has commented on the revision "HIVE-3562 [jira] Some limit can be pushed 
down to map stage".

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:75 
remove the TODO
  ql/src/test/queries/clientpositive/limit_pushdown.q:51 There is no test where 
the limit is > hive.limit.pushdown.heap.threshold.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:87 
Do you want to compare the threshold with the actual limit here?


REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, navis
Cc: njain





[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3562:
-

Status: Open  (was: Patch Available)

comments




[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546643#comment-13546643
 ] 

Phabricator commented on HIVE-3562:
---

njain has commented on the revision "HIVE-3562 [jira] Some limit can be pushed 
down to map stage".

INLINE COMMENTS
  conf/hive-default.xml.template:1434 Can you add more details here? An example 
query would really help.
  ql/src/test/queries/clientpositive/limit_pushdown.q:16 What is so special 
about 40?

  Set hive.limit.pushdown.heap.threshold explicitly at the beginning of the 
test; that makes the test easier to maintain in the long run.

  ql/src/test/queries/clientpositive/limit_pushdown.q:34 What is the difference 
between this and line 3?

  ql/src/test/queries/clientpositive/limit_pushdown.q:10 I think this plan is 
not correct.

  Let us say the values are v1, v2, ..., v10, v11, v12, ..., v20.

  The first mapper does not have v8-v10, so it emits v1-v7 and v11-v13.
  The second mapper contains data for all values, but it only emits v1-v10.

  Since there is no ORDER BY involved, it is possible that the data for v11 
will get picked up, which does not contain data from the second mapper. If you 
are pushing the limit down, you should create an additional MR job that orders 
the rows; in the above example, making sure that only v1-v10 are picked up.

  Am I missing something here?
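The scenario described above can be reproduced with a small simulation. This is a hedged sketch of a naive pushdown applied to a GROUP BY without ORDER BY; the key sets and function names are hypothetical:

```python
from collections import Counter

LIMIT = 10

def naive_map_side_limit(keys, limit):
    # A naive pushdown: each mapper emits rows for only its first `limit`
    # distinct keys, in its own local order -- the flaw pointed out above.
    kept = sorted(set(keys))[:limit]
    return [k for k in keys if k in kept]

mapper1 = [1, 2, 3, 4, 5, 6, 7, 11, 12, 13]  # has no rows for keys 8-10
mapper2 = list(range(1, 21))                 # one row for every key 1-20

shuffled = (naive_map_side_limit(mapper1, LIMIT)
            + naive_map_side_limit(mapper2, LIMIT))
observed = Counter(shuffled)
truth = Counter(mapper1 + mapper2)
# Key 11 reaches the reducer (mapper 1 emitted it), but mapper 2's row for
# key 11 was dropped, so any aggregate over key 11 is silently wrong.
```

Without a global ordering step, the reducer cannot tell that key 11's group is incomplete.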

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, navis
Cc: njain





[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546631#comment-13546631
 ] 

Phabricator commented on HIVE-3853:
---

njain has accepted the revision "HIVE-3853 [jira] UDF unix_timestamp is 
deterministic if an argument is given, but it treated as non-deterministic 
preventing PPD".

REVISION DETAIL
  https://reviews.facebook.net/D7767

BRANCH
  DPAL-1956

To: JIRA, njain, navis
Cc: njain


> UDF unix_timestamp is deterministic if an argument is given, but it treated 
> as non-deterministic preventing PPD
> ---
>
> Key: HIVE-3853
> URL: https://issues.apache.org/jira/browse/HIVE-3853
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
>  Labels: udf
> Attachments: HIVE-3853.D7767.1.patch, HIVE-3853.D7767.2.patch
>
>
> unix_timestamp is declared as a non-deterministic function. But if the user 
> provides an argument, it produces a deterministic result and is eligible for 
> PPD.
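The arity-based determinism rule described above can be sketched as follows. This is a hypothetical Python illustration, not Hive's actual FunctionRegistry logic; the timestamp format is an assumption:

```python
import calendar
import time

def unix_timestamp(arg=None):
    # No argument: the result depends on the current clock, so the call is
    # non-deterministic and must not be constant-folded or pushed down.
    if arg is None:
        return int(time.time())
    # With an argument, the result is a pure function of the input
    # (format assumed here to be 'yyyy-MM-dd HH:mm:ss' in UTC).
    return calendar.timegm(time.strptime(arg, "%Y-%m-%d %H:%M:%S"))

def eligible_for_ppd(num_args):
    # The rule from the issue description: the call is deterministic
    # (and therefore PPD-eligible) only when an argument is supplied.
    return num_args > 0
```

A predicate such as `unix_timestamp(ds) < 100` could then be pushed below a join or filter, while bare `unix_timestamp()` could not.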



[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546629#comment-13546629
 ] 

Namit Jain commented on HIVE-3853:
--

+1




[jira] [Commented] (HIVE-3825) Add Operator level Hooks

2013-01-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546616#comment-13546616
 ] 

Namit Jain commented on HIVE-3825:
--

Look at optrstat_groupby.q for an example.

> Add Operator level Hooks
> 
>
> Key: HIVE-3825
> URL: https://issues.apache.org/jira/browse/HIVE-3825
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3825.txt
>
>




[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.9.patch

> explain dependency should show the dependencies hierarchically in presence of 
> views
> ---
>
> Key: HIVE-3803
> URL: https://issues.apache.org/jira/browse/HIVE-3803
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
> hive.3803.4.patch, hive.3803.5.patch, hive.3803.6.patch, hive.3803.7.patch, 
> hive.3803.8.patch, hive.3803.9.patch
>
>
> It should also include tables whose partitions are being accessed



[jira] [Created] (HIVE-3871) show number of mappers/reducers as part of explain extended

2013-01-07 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3871:


 Summary: show number of mappers/reducers as part of explain 
extended
 Key: HIVE-3871
 URL: https://issues.apache.org/jira/browse/HIVE-3871
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain


It would be useful to show the number of mappers/reducers as part of explain 
extended.
For the MR jobs referencing intermediate data, the number can be approximate.



[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.8.patch

> explain dependency should show the dependencies hierarchically in presence of 
> views
> ---
>
> Key: HIVE-3803
> URL: https://issues.apache.org/jira/browse/HIVE-3803
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
> hive.3803.4.patch, hive.3803.5.patch, hive.3803.6.patch, hive.3803.7.patch, 
> hive.3803.8.patch
>
>
> It should also include tables whose partitions are being accessed



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546609#comment-13546609
 ] 

Namit Jain commented on HIVE-3585:
--

The main reason contrib exists is to add new features/projects which are being 
tested, may take some time to mature, and are reasonably stand-alone, so that 
they don't need many changes in existing code. New serdes/fileformats/udfs are 
good use cases for it.

I don't see why testing/development in contrib is so difficult or different 
compared to development in any other component. This is the reason contrib was 
added: so new stand-alone components can bake. We can definitely move it out of 
contrib once it is mature/safe.

Why is development in contrib such a bad idea?

> Integrate Trevni as another columnar oriented file format
> -
>
> Key: HIVE-3585
> URL: https://issues.apache.org/jira/browse/HIVE-3585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: alex gemini
>Assignee: Mark Wagner
>Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar 
> format needs a columnar SerDe; fastutil seems a good choice. The Shark 
> project uses the fastutil library as its columnar SerDe library, but it 
> seems too large (almost 15 MB) for just a few primitive array collections.



[jira] [Updated] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3853:


Status: Patch Available  (was: Open)




[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546578#comment-13546578
 ] 

Phabricator commented on HIVE-3853:
---

navis has commented on the revision "HIVE-3853 [jira] UDF unix_timestamp is 
deterministic if an argument is given, but it treated as non-deterministic 
preventing PPD".

  I've heard that annotation information is part of the class definition, which 
cannot be overwritten at runtime.

REVISION DETAIL
  https://reviews.facebook.net/D7767

To: JIRA, navis
Cc: njain





[jira] [Updated] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3853:
--

Attachment: HIVE-3853.D7767.2.patch

navis updated the revision "HIVE-3853 [jira] UDF unix_timestamp is 
deterministic if an argument is given, but it treated as non-deterministic 
preventing PPD".
Reviewers: JIRA

  Addressed comments


REVISION DETAIL
  https://reviews.facebook.net/D7767

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToUnixTimeStamp.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUnixTimeStamp.java
  ql/src/test/queries/clientpositive/udf_to_unix_timestamp.q
  ql/src/test/queries/clientpositive/udf_unix_timestamp.q
  ql/src/test/results/clientpositive/show_functions.q.out
  ql/src/test/results/clientpositive/udf5.q.out
  ql/src/test/results/clientpositive/udf_to_unix_timestamp.q.out
  ql/src/test/results/clientpositive/udf_unix_timestamp.q.out

To: JIRA, navis
Cc: njain





[jira] [Commented] (HIVE-3789) Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9

2013-01-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546535#comment-13546535
 ] 

Ashutosh Chauhan commented on HIVE-3789:


+1

> Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9
> 
>
> Key: HIVE-3789
> URL: https://issues.apache.org/jira/browse/HIVE-3789
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Tests
>Affects Versions: 0.9.0, 0.10.0
> Environment: Hadoop 0.23.5, JDK 1.6.0_31
>Reporter: Chris Drome
>Assignee: Arup Malakar
> Attachments: HIVE-3789.branch-0.9_1.patch, 
> HIVE-3789.branch-0.9_2.patch, HIVE-3789.trunk.1.patch, HIVE-3789.trunk.2.patch
>
>
> Rolling back to before this patch shows that the unit tests pass; after the 
> patch, the majority of the unit tests fail.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546517#comment-13546517
 ] 

Russell Jurney commented on HIVE-3585:
--

This ticket now has 5 votes and 22 watchers. Support for a Trevni builtin is 
overwhelming.




[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546516#comment-13546516
 ] 

Russell Jurney commented on HIVE-3585:
--

He, HCatalog uses the Hive SerDe. By adding the Trevni builtin to Apache Hive, 
Apache Hive, Shark, Apache HCatalog, and Apache Pig will all get Trevni 
support. Synergy, baby!

Apache Trevni is part of an actual Apache top-level project, Apache Avro, so it 
is nothing like Zebra, which I notice you reported yourself for addition in 
HIVE-781. Avro and Trevni are specifically designed for Hadoop workloads, and 
other tools like Pig are including Trevni immediately.




Blue tables in Hive xdocs

2013-01-07 Thread Lefty Leverenz
Tables in Hive xdocs have a default background color that's rather
overpowering (see "Hive interactive shell commands" in
http://hive.apache.org/docs/r0.9.0/language_manual/cli.html).  I'm working
on a new doc that has lots of tables, so I tried to change the color to
white (or any quieter color) but had no luck.

Is this an Anakia issue, or Velocity?  Does anyone know how to set the
color either cell-by-cell or for the whole table?

Thanks for any help or pointers to help.


– Lefty Leverenz


[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546400#comment-13546400
 ] 

Carl Steinbach commented on HIVE-3585:
--

The only concrete difference between core and contrib that I'm aware of is that 
the latter doesn't appear on Hive's classpath by default. As such I can only 
see two advantages to putting code in contrib: 1) it makes it harder for folks 
to use, and 2) it makes it harder for us to test. Did I miss anything?




[jira] [Comment Edited] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546348#comment-13546348
 ] 

He Yongqiang edited comment on HIVE-3585 at 1/7/13 10:40 PM:
-

contrib is a good place for any project that is not mature. There are so many 
custom data formats out there that it does not make sense to support all of 
them in the core Hive code base. contrib is a good place for them to grow.

From http://incubator.apache.org/hcatalog/docs/r0.4.0/, another good place I 
can think of is the HCatalog project. But I don't know whether HCatalog itself 
includes custom data format support or not.

  was (Author: he yongqiang):
contrib is a good place for any projects that is not mature. There are so 
many custom data formats out there, it does not make sense to support all of 
them in core hive code base. contrib is a good place for them to grow. 

Another good place i can think of is the hcatalog project.  

  



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546348#comment-13546348
 ] 

He Yongqiang commented on HIVE-3585:


contrib is a good place for any project that is not mature. There are so many 
custom data formats out there that it does not make sense to support all of 
them in the core Hive code base. contrib is a good place for them to grow.

Another good place I can think of is the HCatalog project.





[jira] [Updated] (HIVE-3789) Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9

2013-01-07 Thread Arup Malakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated HIVE-3789:
---

Attachment: HIVE-3789.branch-0.9_2.patch
HIVE-3789.trunk.2.patch

Patch with reverted checkPath()




[jira] [Commented] (HIVE-3789) Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9

2013-01-07 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546285#comment-13546285
 ] 

Arup Malakar commented on HIVE-3789:


Hi Ashutosh, you are right. My concern was that checkPath() should look for the 
pfile:// scheme in the path that is passed.

For the test cases to pass, adding resolvePath() is sufficient. I will submit a 
patch without the modification in checkPath().
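The scheme check being discussed might look like the following. This is a hypothetical Python sketch; `pfile` is the proxy scheme Hive's tests use, but the actual checkPath() logic lives in Hadoop's FileSystem classes:

```python
from urllib.parse import urlparse

EXPECTED_SCHEME = "pfile"  # the scheme this (hypothetical) file system owns

def check_path(path):
    # Reject paths that carry an explicit scheme other than ours;
    # scheme-less paths are accepted and treated as belonging to this
    # file system. (Illustrative only, not Hive/Hadoop code.)
    scheme = urlparse(path).scheme
    if scheme and scheme != EXPECTED_SCHEME:
        raise ValueError(
            f"wrong FS scheme {scheme!r}, expected {EXPECTED_SCHEME!r}")
    return path
```

Under this rule, `pfile:///tmp/t1` and a bare `/tmp/t1` pass, while an `hdfs://` path is rejected.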

> Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9
> 
>
> Key: HIVE-3789
> URL: https://issues.apache.org/jira/browse/HIVE-3789
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Tests
>Affects Versions: 0.9.0, 0.10.0
> Environment: Hadoop 0.23.5, JDK 1.6.0_31
>Reporter: Chris Drome
>Assignee: Arup Malakar
> Attachments: HIVE-3789.branch-0.9_1.patch, HIVE-3789.trunk.1.patch
>
>
> Rolling back to before this patch shows that the unit tests pass; after the 
> patch, the majority of the unit tests fail.



[jira] [Commented] (HIVE-2693) Add DECIMAL data type

2013-01-07 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546256#comment-13546256
 ] 

Mark Grover commented on HIVE-2693:
---

Non-committer +1

Namit, any thoughts on the UDF method selection logic?

> Add DECIMAL data type
> -
>
> Key: HIVE-2693
> URL: https://issues.apache.org/jira/browse/HIVE-2693
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor, Types
>Affects Versions: 0.10.0
>Reporter: Carl Steinbach
>Assignee: Prasad Mujumdar
> Attachments: 2693_7.patch, 2693_8.patch, 2693_fix_all_tests1.patch, 
> HIVE-2693-10.patch, HIVE-2693-11.patch, HIVE-2693-12-SortableSerDe.patch, 
> HIVE-2693-13.patch, HIVE-2693-14.patch, HIVE-2693-15.patch, 
> HIVE-2693-16.patch, HIVE-2693-17.patch, HIVE-2693-18.patch, 
> HIVE-2693-19.patch, HIVE-2693-1.patch.txt, HIVE-2693-all.patch, 
> HIVE-2693.D7683.1.patch, HIVE-2693-fix.patch, HIVE-2693.patch, 
> HIVE-2693-take3.patch, HIVE-2693-take4.patch
>
>
> Add support for the DECIMAL data type. HIVE-2272 (TIMESTAMP) provides a nice 
> template for how to do this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3773) Share input scan by unions across multiple queries

2013-01-07 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546230#comment-13546230
 ] 

Gang Tim Liu commented on HIVE-3773:


Thank you for the great point.

Yes, it can. In addition, it can solve much more complex queries, like joins, 
and will bring other benefits.

This issue is targeted at solving the simple use case in a simple way. It will 
benefit the general case, including the use case where the HIVE-2206 
configuration is not turned on.

> Share input scan by unions across multiple queries
> --
>
> Key: HIVE-3773
> URL: https://issues.apache.org/jira/browse/HIVE-3773
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
>
> Consider a query like:
> select * from
> (
>   select key, 1 as value, count(1) from src group by key
> union all
>   select 1 as key, value, count(1) from src group by value
> union all
>   select key, value, count(1) from src group by key, value
> ) s;
> src is scanned multiple times currently (one per sub-query).
> This should be treated like a multi-table insert by the optimizer.
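The multi-table-insert rewrite the description alludes to can be sketched as 
follows (a hypothetical illustration, not part of the issue; the table names 
out1/out2/out3 are invented). In this form src is scanned only once:

{code}
FROM src
INSERT OVERWRITE TABLE out1 SELECT key, 1, count(1) GROUP BY key
INSERT OVERWRITE TABLE out2 SELECT 1, value, count(1) GROUP BY value
INSERT OVERWRITE TABLE out3 SELECT key, value, count(1) GROUP BY key, value;
{code}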

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546210#comment-13546210
 ] 

Carl Steinbach commented on HIVE-3585:
--

bq. HBaseSerde is first added to contrib and then moved to core later.

And what did this accomplish? Wouldn't it have been better to put it in core to 
begin with? In fact, can anyone tell me why we shouldn't abolish contrib 
altogether?

> Integrate Trevni as another columnar oriented file format
> -
>
> Key: HIVE-3585
> URL: https://issues.apache.org/jira/browse/HIVE-3585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: alex gemini
>Assignee: Mark Wagner
>Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar 
> format needs a columnar SerDe; fastutil seems a good choice. The Shark 
> project uses the fastutil library as its columnar SerDe library, but it seems 
> too large (almost 15 MB) for just a few primitive array collections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546170#comment-13546170
 ] 

Sean Busbey commented on HIVE-3585:
---

[~namita] Trevni defines a columnar format that can be used with different 
serialization systems. I believe initial efforts across different components 
are planning to use Avro for serialization.

Eventually, Trevni support should also work for Thrift and Protobufs.

> Integrate Trevni as another columnar oriented file format
> -
>
> Key: HIVE-3585
> URL: https://issues.apache.org/jira/browse/HIVE-3585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: alex gemini
>Assignee: Mark Wagner
>Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar 
> format needs a columnar SerDe; fastutil seems a good choice. The Shark 
> project uses the fastutil library as its columnar SerDe library, but it seems 
> too large (almost 15 MB) for just a few primitive array collections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546152#comment-13546152
 ] 

He Yongqiang commented on HIVE-3585:


HBaseSerde is first added to contrib and then moved to core later.
  
bq. Pig is adding TrevniStorage as a builtin, and interoperability is desired.
I think interoperability is not a problem no matter where the code resides.

> Integrate Trevni as another columnar oriented file format
> -
>
> Key: HIVE-3585
> URL: https://issues.apache.org/jira/browse/HIVE-3585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: alex gemini
>Assignee: Mark Wagner
>Priority: Minor
>
> Add the new Avro module Trevni as another columnar format. A new columnar 
> format needs a columnar SerDe; fastutil seems a good choice. The Shark 
> project uses the fastutil library as its columnar SerDe library, but it seems 
> too large (almost 15 MB) for just a few primitive array collections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-07 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546150#comment-13546150
 ] 

Yin Huai commented on HIVE-2206:


[~liuzongquan] The latest patch was developed based on hive trunk revision 
1410581.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
> HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
> HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
> HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> This issue proposes a new logical optimizer called the Correlation Optimizer, 
> which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
> job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
> paper and slides of YSmart are linked at the bottom.
> Since Hive translates queries in a sentence-by-sentence fashion, for every 
> operation which may need to shuffle the data (e.g. join and aggregation 
> operations), Hive will generate a MapReduce job for that operation. However, 
> those operations may involve the correlations explained below and thus could 
> be executed in a single MR job.
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
> input relation sets are not disjoint;
> # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
> have not only input correlation but also the same partition key;
> # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
> its child nodes if it has the same partition key as that child node.
> The current implementation of the correlation optimizer only detects 
> correlations among MR jobs for reduce-side join operators and reduce-side 
> aggregation operators (not map-only aggregation). A query will be optimized 
> if it satisfies the following conditions.
> # There exists an MR job for a reduce-side join operator or reduce-side 
> aggregation operator which has JFC with all of its parent MR jobs (TCs will 
> also be exploited if JFC exists);
> # All input tables of those correlated MR jobs are original input tables (not 
> intermediate tables generated by sub-queries); and 
> # No self join is involved in those correlated MR jobs.
> The correlation optimizer is implemented as a logical optimizer. The main 
> reasons are that it only needs to manipulate the query plan tree and it can 
> leverage the existing component for generating MR jobs.
> The current implementation can serve as a framework for correlation-related 
> optimizations, which I think is better than adding individual optimizers. 
> There is further work that can be done in the future to improve this 
> optimizer. Here are three examples.
> # Support queries that involve only TC;
> # Support queries in which the input tables of correlated MR jobs involve 
> intermediate tables; and 
> # Optimize queries involving self joins. 
> References:
> Paper and presentation of YSmart.
> Paper: 
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Slides: http://sdrv.ms/UpwJJc
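As a hedged illustration of job flow correlation (an invented query, not taken 
from the issue): in the query below, the GROUP BY aggregation and the join that 
consumes it both shuffle on the same key, so their two MR jobs are candidates 
for merging by this optimizer:

{code}
-- Hypothetical example: the GROUP BY job and the JOIN job both
-- partition on src.key, giving them job flow correlation (JFC).
SELECT t1.key, t1.cnt, t2.value
FROM (SELECT key, count(1) AS cnt FROM src GROUP BY key) t1
JOIN src t2 ON (t1.key = t2.key);
{code}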

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3773) Share input scan by unions across multiple queries

2013-01-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546139#comment-13546139
 ] 

Ashutosh Chauhan commented on HIVE-3773:


Isn't this already implemented in HIVE-2206 ?

> Share input scan by unions across multiple queries
> --
>
> Key: HIVE-3773
> URL: https://issues.apache.org/jira/browse/HIVE-3773
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
>
> Consider a query like:
> select * from
> (
>   select key, 1 as value, count(1) from src group by key
> union all
>   select 1 as key, value, count(1) from src group by value
> union all
>   select key, value, count(1) from src group by key, value
> ) s;
> src is scanned multiple times currently (one per sub-query).
> This should be treated like a multi-table insert by the optimizer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work started] (HIVE-3773) Share input scan by unions across multiple queries

2013-01-07 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3773 started by Gang Tim Liu.

> Share input scan by unions across multiple queries
> --
>
> Key: HIVE-3773
> URL: https://issues.apache.org/jira/browse/HIVE-3773
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Gang Tim Liu
>
> Consider a query like:
> select * from
> (
>   select key, 1 as value, count(1) from src group by key
> union all
>   select 1 as key, value, count(1) from src group by value
> union all
>   select key, value, count(1) from src group by key, value
> ) s;
> src is scanned multiple times currently (one per sub-query).
> This should be treated like a multi-table insert by the optimizer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3431) Avoid race conditions while downloading resources from non-local filesystem

2013-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546111#comment-13546111
 ] 

Hudson commented on HIVE-3431:
--

Integrated in Hive-trunk-h0.21 #1900 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1900/])
HIVE-3431 : Avoid race conditions while downloading resources from 
non-local filesystem (Navis via Ashutosh Chauhan) (Revision 1429916)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429916
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java


> Avoid race conditions while downloading resources from non-local filesystem
> ---
>
> Key: HIVE-3431
> URL: https://issues.apache.org/jira/browse/HIVE-3431
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Affects Versions: 0.10.0
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Fix For: 0.11.0
>
> Attachments: HIVE-3431.1.patch.txt, HIVE-3431.D5199.2.patch, 
> HIVE-3431.D5199.3.patch, HIVE-3431.D5199.4.patch
>
>
> The "add resource " command downloads the resource file to the location 
> specified by the conf "hive.downloaded.resources.dir" on the local file 
> system. But when the command above is executed concurrently against 
> hive-server for the same file, some clients fail with a VM crash, caused by 
> the file being overwritten by other requests.
> So there should be a configuration to provide a per-request location for the 
> add resource command, something like "set 
> hiveconf:hive.downloaded.resources.dir=temporary"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hive-trunk-h0.21 - Build # 1900 - Still Failing

2013-01-07 Thread Apache Jenkins Server
Changes for Build #1899

Changes for Build #1900
[hashutosh] HIVE-3431 : Avoid race conditions while downloading resources from 
non-local filesystem (Navis via Ashutosh Chauhan)




No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1900)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1900/ to 
view the results.

[jira] [Commented] (HIVE-2935) Implement HiveServer2

2013-01-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546100#comment-13546100
 ] 

Edward Capriolo commented on HIVE-2935:
---

Yes, you should not have two versions of Thrift in your classpath, or stuff 
like this can happen: even though the data may be wire compatible, the libs 
usually are not.



> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt, 
> HS2-changed-files-only.patch, HS2-with-thrift-patch-rebased.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3870) SELECT foo, NULL UNION ALL SELECT bar, baz fails

2013-01-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3870.


Resolution: Duplicate

Dupe of HIVE-3869

> SELECT foo, NULL UNION ALL SELECT bar, baz fails
> 
>
> Key: HIVE-3870
> URL: https://issues.apache.org/jira/browse/HIVE-3870
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: David Morel
>
> In order to avoid the curse of the last reducer caused by a left outer join 
> where most joined rows would be NULLs, I rewrote the query as:
> {code}
> SELECT * FROM (
> SELECT
> A.user_id id,
> B.created
> FROM (
> SELECT DISTINCT user_id
> FROM users
> ) A
> JOIN
> buyhist B
> ON
> A.user_id = B.user_id
> AND B.created >= '2013-01-01'
> UNION ALL
> SELECT
> DISTINCT(user_id) id,
> NULL created
> FROM users
> ) foo;
> {code}
> The exception thrown is this:
> {code}
> 2013-01-07 17:00:01,081 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.lang.RuntimeException: Error in configuring object
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
>   ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
>   at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>   ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
>   ... 17 more
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
>   ... 22 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
>   at java.lang.String.valueOf(String.java:2826)
>   at java.lang.StringBuilder.append(StringBuilder.java:115)
>   at 
> org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>   at 
> org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
>   ... 22 more
> {code}
> The 
> "org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)"
>  caught my attention, so I replaced NULL by an empty string:
> {code}
> ...
> UNION ALL
> SELECT
> DISTINCT(user_id) id,
> '' created
> {code}
> Shouldn't the query parser accept the form using NULL, or at least output a 
> message before the job is sent to the jobtracker?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information 

[jira] [Resolved] (HIVE-3697) External JAR files on HDFS can lead to race condition with hive.downloaded.resources.dir

2013-01-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3697.


   Resolution: Fixed
Fix Version/s: 0.11.0

HIVE-3431 should fix this issue. Please reopen if you find otherwise.

> External JAR files on HDFS can lead to race condition with 
> hive.downloaded.resources.dir
> 
>
> Key: HIVE-3697
> URL: https://issues.apache.org/jira/browse/HIVE-3697
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Chris McConnell
> Fix For: 0.11.0
>
>
> I've seen situations where utilizing JAR files on HDFS can cause job failures 
> via CNFE or JVM crashes. 
> This is difficult to replicate and seems to be related to JAR size and 
> latency between the client and the HDFS cluster; some example stack traces 
> are below. It seems that the calls made to FileSystem (copyToLocal), which 
> are static and delete the current local copy, can cause the file(s) to be 
> removed during job processing.
> We should consider changing the default for hive.downloaded.resources.dir to 
> include some level of uniqueness per job. We should not consider 
> hive.session.id however, as execution of multiple statements via the same 
> user/session which might access the same JAR files will utilize the same 
> session.
> A proposal might be to utilize System.nanoTime() -- which might be enough to 
> avoid the issue, although it's not perfect (depends on JVM and system for 
> level of precision) as part of the default 
> (/tmp/${user.name}/resources/System.nanoTime()/). 
> If anyone else has hit this, would like to capture environment information as 
> well. Perhaps there is something else at play here. 
> Here are some examples of the errors:
> for i in {0..2}; do hive -S -f query.q& done
> [2] 48405
> [3] 48406
> [4] 48407
> % #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0x7) at pc=0x7fb10bd931f0, pid=48407, tid=140398456698624
> #
> # JRE version: 6.0_31-b04
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # C  [libzip.so+0xb1f0]  __int128+0x60
> #
> # An error report file with more information is saved as:
> # /home/.../hs_err_pid48407.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> java.lang.NoClassDefFoundError: com/example/udf/Lower
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:247)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:439)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:449)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processInitFiles(CliDriver.java:485)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:692)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.lang.ClassNotFoundException: com.example.udf.Lower
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClas

[jira] [Updated] (HIVE-3431) Avoid race conditions while downloading resources from non-local filesystem

2013-01-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3431:
---

   Resolution: Fixed
Fix Version/s: 0.11.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

> Avoid race conditions while downloading resources from non-local filesystem
> ---
>
> Key: HIVE-3431
> URL: https://issues.apache.org/jira/browse/HIVE-3431
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Affects Versions: 0.10.0
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Fix For: 0.11.0
>
> Attachments: HIVE-3431.1.patch.txt, HIVE-3431.D5199.2.patch, 
> HIVE-3431.D5199.3.patch, HIVE-3431.D5199.4.patch
>
>
> The "add resource " command downloads the resource file to the location 
> specified by the conf "hive.downloaded.resources.dir" on the local file 
> system. But when the command above is executed concurrently against 
> hive-server for the same file, some clients fail with a VM crash, caused by 
> the file being overwritten by other requests.
> So there should be a configuration to provide a per-request location for the 
> add resource command, something like "set 
> hiveconf:hive.downloaded.resources.dir=temporary"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3431) Avoid race conditions while downloading resources from non-local filesystem

2013-01-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3431:
---

Summary: Avoid race conditions while downloading resources from non-local 
filesystem  (was: Resources on non-local file system should be downloaded to 
temporary directory sometimes)

> Avoid race conditions while downloading resources from non-local filesystem
> ---
>
> Key: HIVE-3431
> URL: https://issues.apache.org/jira/browse/HIVE-3431
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Affects Versions: 0.10.0
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-3431.1.patch.txt, HIVE-3431.D5199.2.patch, 
> HIVE-3431.D5199.3.patch, HIVE-3431.D5199.4.patch
>
>
> The "add resource " command downloads the resource file to the location 
> specified by the conf "hive.downloaded.resources.dir" on the local file 
> system. But when the command above is executed concurrently against 
> hive-server for the same file, some clients fail with a VM crash, caused by 
> the file being overwritten by other requests.
> So there should be a configuration to provide a per-request location for the 
> add resource command, something like "set 
> hiveconf:hive.downloaded.resources.dir=temporary"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3869) SELECT foo, NULL UNION ALL SELECT bar, baz fails

2013-01-07 Thread David Morel (JIRA)
David Morel created HIVE-3869:
-

 Summary: SELECT foo, NULL UNION ALL SELECT bar, baz fails
 Key: HIVE-3869
 URL: https://issues.apache.org/jira/browse/HIVE-3869
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: David Morel


In order to avoid the curse of the last reducer caused by a left outer join 
where most joined rows would be NULLs, I rewrote the query as:
{code}

SELECT * FROM (
SELECT
A.user_id id,
B.created
FROM (
SELECT DISTINCT user_id
FROM users
) A
JOIN
buyhist B
ON
A.user_id = B.user_id
AND B.created >= '2013-01-01'
UNION ALL
SELECT
DISTINCT(user_id) id,
NULL created
FROM users
) foo;
{code}

The exception thrown is this:

{code}
2013-01-07 17:00:01,081 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
... 22 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at 
org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
... 22 more
{code}

The 
"org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)"
 caught my attention, so I replaced NULL with an empty string:

{code}
...
UNION ALL
SELECT
DISTINCT(user_id) id,
'' created
{code}

Shouldn't the query parser accept the form using NULL, or at least output a 
message before the job is sent to the jobtracker?
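
A workaround that keeps the NULL semantics (instead of substituting an empty 
string) is to give the literal an explicit type, so both branches of the UNION 
ALL expose the same column type. This is an untested sketch, assuming created 
should be a string:

{code}
...
UNION ALL
SELECT
DISTINCT(user_id) id,
CAST(NULL AS STRING) created
FROM users
{code}

The NPE appears to come from the bare NULL literal carrying the void type, 
which leaves the two union branches with mismatched ObjectInspectors when the 
UnionOperator initializes.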

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3870) SELECT foo, NULL UNION ALL SELECT bar, baz fails

2013-01-07 Thread David Morel (JIRA)
David Morel created HIVE-3870:
-

 Summary: SELECT foo, NULL UNION ALL SELECT bar, baz fails
 Key: HIVE-3870
 URL: https://issues.apache.org/jira/browse/HIVE-3870
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: David Morel


In order to avoid the curse of the last reducer caused by a left outer join 
where most joined rows would be NULLs, I rewrote the query as:
{code}

SELECT * FROM (
SELECT
A.user_id id,
B.created
FROM (
SELECT DISTINCT user_id
FROM users
) A
JOIN
buyhist B
ON
A.user_id = B.user_id
AND B.created >= '2013-01-01'
UNION ALL
SELECT
DISTINCT(user_id) id,
NULL created
FROM users
) foo;
{code}

The exception thrown is this:

{code}
2013-01-07 17:00:01,081 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
... 22 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at 
org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
... 22 more
{code}

The 
"org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)"
 caught my attention, so I replaced NULL with an empty string:

{code}
...
UNION ALL
SELECT
DISTINCT(user_id) id,
'' created
{code}

Shouldn't the query parser accept the form using NULL, or at least output a 
message before the job is sent to the jobtracker?



Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #253

2013-01-07 Thread Apache Jenkins Server
See 


--
[...truncated 9916 lines...]

compile-test:
 [echo] Project: serde
[javac] Compiling 26 source files to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/serde/test/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

create-dirs:
 [echo] Project: service
 [copy] Warning: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/service/src/test/resources
 does not exist.

init:
 [echo] Project: service

ivy-init-settings:
 [echo] Project: service

ivy-resolve:
 [echo] Project: service
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml
[ivy:report] Processing 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/resolution-cache/org.apache.hive-hive-service-default.xml
 to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/report/org.apache.hive-hive-service-default.html

ivy-retrieve:
 [echo] Project: service

compile:
 [echo] Project: service

ivy-resolve-test:
 [echo] Project: service

ivy-retrieve-test:
 [echo] Project: service

compile-test:
 [echo] Project: service
[javac] Compiling 2 source files to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/service/test/classes

test:
 [echo] Project: hive

test-shims:
 [echo] Project: hive

test-conditions:
 [echo] Project: shims

gen-test:
 [echo] Project: shims

create-dirs:
 [echo] Project: shims
 [copy] Warning: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/test/resources
 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml
[ivy:report] Processing 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml
 to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/report/org.apache.hive-hive-shims-default.html

ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build_shims:
 [echo] Project: shims
 [echo] Compiling 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20/java
 against hadoop 0.20.2 
(/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/hadoopcore/hadoop-0.20.2)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.20S

build_shims:
 [echo] Project: shims
 [echo] Compiling 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20S/java
 against hadoop 1.0.0 
(/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/hadoopcore/hadoop-1.0.0)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.23

build_shims:
 [echo] Project: shims
 [echo] Compiling 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.23/java
 against hadoop 0.23.3 
(/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/hadoopcore/hadoop-0.23.3)

ivy-init-settings:
 [echo] Project: shim

Hive-trunk-h0.21 - Build # 1899 - Failure

2013-01-07 Thread Apache Jenkins Server
Changes for Build #1899



No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1899)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1899/ to 
view the results.

Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #27

2013-01-07 Thread Apache Jenkins Server
See 

--
[...truncated 8145 lines...]
 [echo] Project: common

create-dirs:
 [echo] Project: serde
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: serde

create-dirs:
 [echo] Project: metastore
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: metastore

create-dirs:
 [echo] Project: ql

init:
 [echo] Project: ql

create-dirs:
 [echo] Project: contrib
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: contrib

create-dirs:
 [echo] Project: service
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: service

create-dirs:
 [echo] Project: cli
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: cli

create-dirs:
 [echo] Project: jdbc
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: jdbc

create-dirs:
 [echo] Project: hwi
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: hwi

create-dirs:
 [echo] Project: hbase-handler
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: hbase-handler

create-dirs:
 [echo] Project: pdk
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: pdk

create-dirs:
 [echo] Project: builtins
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: builtins

jar:
 [echo] Project: hive

create-dirs:
 [echo] Project: shims
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 

[ivy:report] Processing 

 to 


ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build-shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 0.20.2 
(

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 


ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.20S

build-shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 1.0.0 
(

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 


ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.23

build-shims:
 [echo] Project: shims
 [echo] Compiling 


[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-07 Thread Liu Zongquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545984#comment-13545984
 ] 

Liu Zongquan commented on HIVE-2206:


If I plan to merge HIVE-2206 into the Hive source code, which branch should I 
use? Can someone tell me?

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
> HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
> HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
> HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> This issue proposes a new logical optimizer called Correlation Optimizer, 
> which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
> job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/).The 
> paper and slides of YSmart are linked at the bottom.
> Since Hive translates queries in a sentence by sentence fashion, for every 
> operation which may need to shuffle the data (e.g. join and aggregation 
> operations), Hive will generate a MapReduce job for that operation. However, 
> for those operations which may need to shuffle the data, they may involve 
> correlations explained below and thus can be executed in a single MR job.
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
> input relation sets are not disjoint;
> # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
> have not only input correlation, but also the same partition key;
> # Job Flow Correlation: An MR has job flow correlation (JFC) with one of its 
> child nodes if it has the same partition key as that child node.
> The current implementation of correlation optimizer only detect correlations 
> among MR jobs for reduce-side join operators and reduce-side aggregation 
> operators (not map only aggregation). A query will be optimized if it 
> satisfies following conditions.
> # There exists a MR job for reduce-side join operator or reduce side 
> aggregation operator which have JFC with all of its parents MR jobs (TCs will 
> be also exploited if JFC exists);
> # All input tables of those correlated MR job are original input tables (not 
> intermediate tables generated by sub-queries); and 
> # No self join is involved in those correlated MR jobs.
> Correlation optimizer is implemented as a logical optimizer. The main reasons 
> are that it only needs to manipulate the query plan tree and it can leverage 
> the existing component on generating MR jobs.
> Current implementation can serve as a framework for correlation related 
> optimizations. I think that it is better than adding individual optimizers. 
> There are several work that can be done in future to improve this optimizer. 
> Here are three examples.
> # Support queries only involve TC;
> # Support queries in which input tables of correlated MR jobs involves 
> intermediate tables; and 
> # Optimize queries involving self join. 
> References:
> Paper and presentation of YSmart.
> Paper: 
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Slides: http://sdrv.ms/UpwJJc
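
As an illustration (hypothetical tables t1 and t2, not from the JIRA itself), 
here is a query where job flow correlation applies, since the reduce-side join 
and the reduce-side aggregation shuffle on the same key:

{code}
SELECT t1.key, COUNT(*)
FROM t1 JOIN t2 ON (t1.key = t2.key)
GROUP BY t1.key;
{code}

Without the optimizer this compiles to two MR jobs (one for the join, one for 
the aggregation); with JFC detected, a single MR job can evaluate both.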



[jira] [Commented] (HIVE-2935) Implement HiveServer2

2013-01-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545973#comment-13545973
 ] 

Nicolas Fouché commented on HIVE-2935:
--

Using CDH 4.1.2, which includes this patch. I think there's a problem with 
hive-jdbc, which includes a JDBC driver for both versions of hiveserver.

For the first version of hiveserver, hive-jdbc-0.9.0-cdh4.1.2 depends on 
libthrift-1.5.0, which defines org.apache.thrift.TServiceClient as an Interface.

For hiveserver2, hive-jdbc-0.9.0-cdh4.1.2 depends on 
hive-service-0.9.0-cdh4.1.2. The latter seems to include code from libthrift, 
and defines org.apache.thrift.TServiceClient as an abstract class.

Thus this happens:

java.lang.IncompatibleClassChangeError: class 
org.apache.hive.service.cli.thrift.TCLIService$Client has interface 
org.apache.thrift.TServiceClient as super class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(Unknown Source)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.access$000(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:157)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:96)

Of course, I can just remove libthrift from my libpath, but I wanted to let 
Carl Steinbach know. (I used maven-dependency-plugin to get all dependent 
JARs, without thinking about which would be useless or incompatible.)

> Implement HiveServer2
> -
>
> Key: HIVE-2935
> URL: https://issues.apache.org/jira/browse/HIVE-2935
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>  Labels: HiveServer2
> Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
> HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt, 
> HS2-changed-files-only.patch, HS2-with-thrift-patch-rebased.patch
>
>




[jira] [Commented] (HIVE-3842) Remove redundant test codes

2013-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545875#comment-13545875
 ] 

Hudson commented on HIVE-3842:
--

Integrated in Hive-trunk-h0.21 #1898 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1898/])
HIVE-3842 Remove redundant test codes
(Navis via namit) (Revision 1429682)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429682
Files : 
* /hive/trunk/hbase-handler/src/test/templates/TestHBaseCliDriver.vm
* /hive/trunk/hbase-handler/src/test/templates/TestHBaseNegativeCliDriver.vm
* /hive/trunk/ql/src/test/templates/TestCliDriver.vm
* /hive/trunk/ql/src/test/templates/TestNegativeCliDriver.vm
* /hive/trunk/ql/src/test/templates/TestParse.vm
* /hive/trunk/ql/src/test/templates/TestParseNegative.vm


> Remove redundant test codes
> ---
>
> Key: HIVE-3842
> URL: https://issues.apache.org/jira/browse/HIVE-3842
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Fix For: 0.11.0
>
> Attachments: HIVE-3842.D7773.1.patch
>
>
> Currently hive writes the same test code again and again for each test, making 
> the test classes huge (50k lines for ql).



[jira] [Commented] (HIVE-3300) LOAD DATA INPATH fails if a hdfs file with same name is added to table

2013-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545874#comment-13545874
 ] 

Hudson commented on HIVE-3300:
--

Integrated in Hive-trunk-h0.21 #1898 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1898/])
HIVE-3300 LOAD DATA INPATH fails if a hdfs file with same name is added to 
table
(Navis via namit) (Revision 1429686)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429686
Files : 
* /hive/trunk/build-common.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/test/queries/clientpositive/load_fs2.q
* /hive/trunk/ql/src/test/results/clientpositive/load_fs2.q.out


> LOAD DATA INPATH fails if a hdfs file with same name is added to table
> --
>
> Key: HIVE-3300
> URL: https://issues.apache.org/jira/browse/HIVE-3300
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 0.10.0
> Environment: ubuntu linux, hadoop 1.0.3, hive 0.9
>Reporter: Bejoy KS
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: HIVE-3300.1.patch.txt, HIVE-3300.D4383.3.patch, 
> HIVE-3300.D4383.4.patch
>
>
> If we are loading data from local fs to hive tables using 'LOAD DATA LOCAL 
> INPATH' and a file with the same name exists in the table's location, then 
> the new file will be suffixed by *_copy_1.
> But if we do 'LOAD DATA INPATH' for a file in hdfs, then no rename happens; 
> just a move task is triggered. Since a file with the same name exists in the 
> same hdfs location, the hadoop fs move operation throws an error.
> hive> LOAD DATA INPATH '/userdata/bejoy/site.txt' INTO TABLE test.site;
> Loading data to table test.site
> Failed with exception null
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
> hive> 



Hive-trunk-h0.21 - Build # 1898 - Fixed

2013-01-07 Thread Apache Jenkins Server
Changes for Build #1896

Changes for Build #1897

Changes for Build #1898
[namit] HIVE-3300 LOAD DATA INPATH fails if a hdfs file with same name is added 
to table
(Navis via namit)

[namit] HIVE-3842 Remove redundant test codes
(Navis via namit)




All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1898)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1898/ to 
view the results.

[jira] [Commented] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow

2013-01-07 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545860#comment-13545860
 ] 

binlijin commented on HIVE-3868:


The reason is:
We use HBase's Bytes utility to convert longs and other data types to byte 
data stored in HBase, and then use Hive to analyze that data.

> Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow
> ---
>
> Key: HIVE-3868
> URL: https://issues.apache.org/jira/browse/HIVE-3868
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.9.0
>Reporter: binlijin
>
> In LazyHBaseRow,
> {code}
>   private Object uncheckedGetField(int fieldID) {
>   // it is a column i.e. a column-family with column-qualifier
>   byte [] res = result.getValue(colMap.familyNameBytes, 
> colMap.qualifierNameBytes);
>   if (res == null) {
> return null;
>   } else {
> ref = new ByteArrayRef();
> ref.setData(res);
>   }
>   if (ref != null) {
> fields[fieldID].init(ref, 0, ref.getData().length);
>   }
>   }
>   For example, if the fields[fieldID] is Bigint, and ref stores HBase byte 
> data (Long), 
>   it will use LazyLong to parse this data and will return NULL value, 
>   it should use Bytes.toLong(res.getData()) to parse this byte data
> {code}
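
If the goal is to read HBase values written with Bytes.toBytes(long), one 
possible workaround (assuming the binary-storage support from HIVE-1634 is 
available in the Hive build in use) is to flag the column as binary in the 
column mapping with the #b suffix, so the handler decodes the raw bytes 
instead of parsing them as text:

{code}
CREATE EXTERNAL TABLE hb_table (key STRING, val BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val#b");
{code}

Table and column names here are hypothetical.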



[jira] [Updated] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow

2013-01-07 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HIVE-3868:
---

Description: 
In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
  // it is a column i.e. a column-family with column-qualifier
  byte [] res = result.getValue(colMap.familyNameBytes, 
colMap.qualifierNameBytes);

  if (res == null) {
return null;
  } else {
ref = new ByteArrayRef();
ref.setData(res);
  }
  if (ref != null) {
fields[fieldID].init(ref, 0, ref.getData().length);
  }
  }
  For example, if the fields[fieldID] is Bigint, and ref stores HBase byte data 
(Long), 
  it will use LazyLong to parse this data and will return NULL value, 
  it should use Bytes.toLong(res.getData()) to parse this byte data
{code}

  was:
In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
  // it is a column i.e. a column-family with column-qualifier
  byte [] res = result.getValue(colMap.familyNameBytes, 
colMap.qualifierNameBytes);

  if (res == null) {
return null;
  } else {
ref = new ByteArrayRef();
ref.setData(res);
  }
  if (ref != null) {
fields[fieldID].init(ref, 0, ref.getData().length);
  }
  }
  For example, if the fields[fieldID] is Bigint, and ref stores HBase byte data 
(Long), it will use LazyLong to parse this data and will return NULL value, it 
should use Bytes.toLong(res.getData()) to parse this byte data
{code}


> Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow
> ---
>
> Key: HIVE-3868
> URL: https://issues.apache.org/jira/browse/HIVE-3868
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.9.0
>Reporter: binlijin
>
> In LazyHBaseRow,
> {code}
>   private Object uncheckedGetField(int fieldID) {
>   // it is a column i.e. a column-family with column-qualifier
>   byte [] res = result.getValue(colMap.familyNameBytes, 
> colMap.qualifierNameBytes);
>   if (res == null) {
> return null;
>   } else {
> ref = new ByteArrayRef();
> ref.setData(res);
>   }
>   if (ref != null) {
> fields[fieldID].init(ref, 0, ref.getData().length);
>   }
>   }
>   For example, if the fields[fieldID] is Bigint, and ref stores HBase byte 
> data (Long), 
>   it will use LazyLong to parse this data and will return NULL value, 
>   it should use Bytes.toLong(res.getData()) to parse this byte data
> {code}



[jira] [Updated] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow

2013-01-07 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HIVE-3868:
---

Description: 
In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
  // it is a column i.e. a column-family with column-qualifier
  byte [] res = result.getValue(colMap.familyNameBytes, 
colMap.qualifierNameBytes);

  if (res == null) {
return null;
  } else {
ref = new ByteArrayRef();
ref.setData(res);
  }
  if (ref != null) {
fields[fieldID].init(ref, 0, ref.getData().length);
  }
  }
  For example, if the fields[fieldID] is Bigint, and ref stores HBase byte data 
(Long), it will use LazyLong to parse this data and will return NULL value, it 
should use Bytes.toLong(res.getData()) to parse this byte data
{code}

  was:
In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
  // it is a column i.e. a column-family with column-qualifier
  byte [] res = result.getValue(colMap.familyNameBytes, 
colMap.qualifierNameBytes);

  if (res == null) {
return null;
  } else {
ref = new ByteArrayRef();
ref.setData(res);
  }
  if (ref != null) {
fields[fieldID].init(ref, 0, ref.getData().length);
  }
  }

{code}


> Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow
> ---
>
> Key: HIVE-3868
> URL: https://issues.apache.org/jira/browse/HIVE-3868
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.9.0
>Reporter: binlijin
>
> In LazyHBaseRow,
> {code}
>   private Object uncheckedGetField(int fieldID) {
>   // it is a column i.e. a column-family with column-qualifier
>   byte [] res = result.getValue(colMap.familyNameBytes, 
> colMap.qualifierNameBytes);
>   if (res == null) {
> return null;
>   } else {
> ref = new ByteArrayRef();
> ref.setData(res);
>   }
>   if (ref != null) {
> fields[fieldID].init(ref, 0, ref.getData().length);
>   }
>   }
>   For example, if the fields[fieldID] is Bigint, and ref stores HBase byte 
> data (Long), it will use LazyLong to parse this data and will return NULL 
> value, it should use Bytes.toLong(res.getData()) to parse this byte data
> {code}



[jira] [Created] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow and

2013-01-07 Thread binlijin (JIRA)
binlijin created HIVE-3868:
--

 Summary: Use Hive‘s serde to parse HBase’s byte Data in 
LazyHBaseRow and
 Key: HIVE-3868
 URL: https://issues.apache.org/jira/browse/HIVE-3868
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.9.0
Reporter: binlijin


In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
    // it is a column, i.e. a column-family with column-qualifier
    byte[] res = result.getValue(colMap.familyNameBytes,
        colMap.qualifierNameBytes);

    if (res == null) {
      return null;
    } else {
      ref = new ByteArrayRef();
      ref.setData(res);
    }
    if (ref != null) {
      fields[fieldID].init(ref, 0, ref.getData().length);
    }
  }

{code}



[jira] [Updated] (HIVE-3868) Use Hive's serde to parse HBase's byte Data in LazyHBaseRow

2013-01-07 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HIVE-3868:
---

Summary: Use Hive's serde to parse HBase's byte Data in LazyHBaseRow  (was: 
Use Hive's serde to parse HBase's byte Data in LazyHBaseRow and)

> Use Hive's serde to parse HBase's byte Data in LazyHBaseRow
> ---
>
> Key: HIVE-3868
> URL: https://issues.apache.org/jira/browse/HIVE-3868
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.9.0
>Reporter: binlijin
>
> In LazyHBaseRow,
> {code}
>   private Object uncheckedGetField(int fieldID) {
>     // it is a column, i.e. a column-family with column-qualifier
>     byte[] res = result.getValue(colMap.familyNameBytes,
>         colMap.qualifierNameBytes);
>     if (res == null) {
>       return null;
>     } else {
>       ref = new ByteArrayRef();
>       ref.setData(res);
>     }
>     if (ref != null) {
>       fields[fieldID].init(ref, 0, ref.getData().length);
>     }
>   }
> {code}



[jira] [Updated] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3852:
-

Status: Open  (was: Patch Available)

> Multi-groupby optimization fails when same distinct column is used twice or 
> more
> 
>
> Key: HIVE-3852
> URL: https://issues.apache.org/jira/browse/HIVE-3852
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-3852.D7737.1.patch
>
>
> {code}
> FROM INPUT
> INSERT OVERWRITE TABLE dest1 
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct 
> substr(INPUT.value,5)) GROUP BY INPUT.key
> INSERT OVERWRITE TABLE dest2 
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct 
> substr(INPUT.value,5)) GROUP BY INPUT.key;
> {code}
> fails with exception FAILED: IndexOutOfBoundsException Index: 0,Size: 0
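For illustration, the three aggregates in the failing query (sum, count, and avg of the same DISTINCT expression) can all be derived from a single per-key set of distinct values, which is the sharing the multi-group-by optimization attempts. A minimal stand-alone sketch (invented names, not Hive code):

```java
import java.util.*;

public class DistinctAggregates {
    // For each key, collect the DISTINCT values once, then derive sum, count,
    // and avg from that single set (illustrative sketch, not Hive's plan).
    // Each row is {key, value}; returns {sum, count, avg} per key.
    static Map<String, double[]> aggregate(List<String[]> rows) {
        Map<String, Set<Double>> distinct = new TreeMap<>();
        for (String[] row : rows) {
            distinct.computeIfAbsent(row[0], k -> new TreeSet<>())
                    .add(Double.parseDouble(row[1]));
        }
        Map<String, double[]> result = new TreeMap<>();
        for (Map.Entry<String, Set<Double>> e : distinct.entrySet()) {
            double sum = 0;
            for (double v : e.getValue()) sum += v;
            int count = e.getValue().size();
            result.put(e.getKey(), new double[] { sum, count, sum / count });
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"a", "1"}, new String[] {"a", "1"},  // duplicate: counted once
            new String[] {"a", "3"}, new String[] {"b", "2"});
        double[] a = aggregate(rows).get("a");
        System.out.println(a[0] + " " + a[1] + " " + a[2]);  // 4.0 2.0 2.0
    }
}
```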



[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545819#comment-13545819
 ] 

Namit Jain commented on HIVE-3852:
--

[~navis], I had a higher-level question.
Should we have this optimization now?
I mean, is this really needed with map-side aggregates, or can we remove this 
code completely?

> Multi-groupby optimization fails when same distinct column is used twice or 
> more
> 
>
> Key: HIVE-3852
> URL: https://issues.apache.org/jira/browse/HIVE-3852
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-3852.D7737.1.patch
>
>
> {code}
> FROM INPUT
> INSERT OVERWRITE TABLE dest1 
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct 
> substr(INPUT.value,5)) GROUP BY INPUT.key
> INSERT OVERWRITE TABLE dest2 
> SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct 
> substr(INPUT.value,5)) GROUP BY INPUT.key;
> {code}
> fails with exception FAILED: IndexOutOfBoundsException Index: 0,Size: 0



[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.7.patch

> explain dependency should show the dependencies hierarchically in presence of 
> views
> ---
>
> Key: HIVE-3803
> URL: https://issues.apache.org/jira/browse/HIVE-3803
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
> hive.3803.4.patch, hive.3803.5.patch, hive.3803.6.patch, hive.3803.7.patch
>
>
> It should also include tables whose partitions are being accessed



[jira] [Updated] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it is treated as non-deterministic, preventing PPD

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3853:
-

Status: Open  (was: Patch Available)

comments

> UDF unix_timestamp is deterministic if an argument is given, but it is treated 
> as non-deterministic, preventing PPD
> ---
>
> Key: HIVE-3853
> URL: https://issues.apache.org/jira/browse/HIVE-3853
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
>  Labels: udf
> Attachments: HIVE-3853.D7767.1.patch
>
>
> unix_timestamp is declared as a non-deterministic function. But if the user 
> provides an argument, the result is deterministic and eligible for PPD.



[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it is treated as non-deterministic, preventing PPD

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545816#comment-13545816
 ] 

Phabricator commented on HIVE-3853:
---

njain has commented on the revision "HIVE-3853 [jira] UDF unix_timestamp is 
deterministic if an argument is given, but it is treated as non-deterministic, 
preventing PPD".

  This suggests that deterministic should not be an annotation -
  by any chance, do you know if the annotation can be overwritten dynamically?
  Otherwise, a duplicate function is OK.

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToUnixTimestamp.java:52
 Can you share the code between this and unix_timestamp?

  I mean, create a common class, and both functions can extend it.

REVISION DETAIL
  https://reviews.facebook.net/D7767

To: JIRA, navis
Cc: njain
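On the question of whether the annotation can be overwritten dynamically: a Java class-level annotation is fixed at compile time, so one class cannot be deterministic with an argument and non-deterministic without one, which is what motivates the duplicate-function suggestion. A minimal stand-alone sketch (the `@Deterministic` annotation here is a hypothetical stand-in for Hive's `@UDFType`, and the two classes are invented for illustration):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationCheck {
    // Hypothetical stand-in for Hive's @UDFType(deterministic = ...) annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Deterministic { boolean value(); }

    // A class-level annotation cannot change per instance or per call, so the
    // zero-argument and with-argument behaviors need two separate classes.
    @Deterministic(false)
    static class NoArgTimestamp {}   // like unix_timestamp(): changes per call

    @Deterministic(true)
    static class WithArgTimestamp {} // like unix_timestamp(col): pure function of input

    // Reads the annotation via reflection, as a query planner might.
    static boolean isDeterministic(Class<?> c) {
        return c.getAnnotation(Deterministic.class).value();
    }

    public static void main(String[] args) {
        System.out.println(isDeterministic(NoArgTimestamp.class));   // false
        System.out.println(isDeterministic(WithArgTimestamp.class)); // true
    }
}
```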


> UDF unix_timestamp is deterministic if an argument is given, but it is treated 
> as non-deterministic, preventing PPD
> ---
>
> Key: HIVE-3853
> URL: https://issues.apache.org/jira/browse/HIVE-3853
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
>  Labels: udf
> Attachments: HIVE-3853.D7767.1.patch
>
>
> unix_timestamp is declared as a non-deterministic function. But if the user 
> provides an argument, the result is deterministic and eligible for PPD.



[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3699:
-

Status: Open  (was: Patch Available)

A lot of tests are failing - can you debug?

> Multiple insert overwrite into multiple tables query stores same results in 
> all tables
> --
>
> Key: HIVE-3699
> URL: https://issues.apache.org/jira/browse/HIVE-3699
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.10.0
> Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
> hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
>Reporter: Alexandre Fouché
>Assignee: Navis
> Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, 
> HIVE-3699_hive-0.9.1.patch.txt
>
>
> (Note: This might be related to HIVE-2750)
> I am doing a query with multiple INSERT OVERWRITE clauses into multiple tables 
> in order to scan the dataset only once, and I end up with all these tables 
> having the same content! It seems the GROUP BY query that returns results is 
> overwriting all the temp tables.
> Oddly enough, if I add further GROUP BY queries into additional temp tables, 
> grouped by a different field, then all the temp tables, even the ones that 
> would otherwise have had wrong content, are correctly populated.
> This is the misbehaving query:
> FROM nikon
> INSERT OVERWRITE TABLE e1
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
> INSERT OVERWRITE TABLE e2
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
> ;
> It launches only one MR job, and here are the results. Why does table 'e1' 
> contain results from table 'e2'?! Table 'e1' should have been empty (see the 
> individual SELECTs further below)
> hive> SELECT * from e1;
> OK
> NULL2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.229 seconds
> hive> SELECT * from e2;
> OK
> NULL2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.11 seconds
> Here are the results of the individual queries (only the second query 
> returns a result set):
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM 
> nikon
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
> (...)
> OK
>   <- There are no results, this is normal
> Time taken: 41.471 seconds
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
> (...)
> OK
> NULL  2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 39.607 seconds
> 
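For reference, the intended semantics of the multi-insert query are a single scan in which each INSERT branch filters and aggregates into its own buffer. A minimal stand-alone simulation (invented names, not Hive code) showing that with no PRINT rows the PRINT branch must stay empty:

```java
import java.util.*;

public class MultiInsertScan {
    // Illustrative simulation of FROM src INSERT ... WHERE cat='PRINT' ...
    // INSERT ... WHERE cat='VIEW': one pass over the rows, each requested
    // category keeping its own count map. Each row is {aid, category}.
    static Map<String, Map<String, Integer>> scanOnce(List<String[]> rows,
                                                      String... categories) {
        Map<String, Map<String, Integer>> out = new TreeMap<>();
        for (String c : categories) out.put(c, new TreeMap<>());
        for (String[] row : rows) {
            Map<String, Integer> branch = out.get(row[1]);
            if (branch != null) branch.merge(row[0], 1, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1627575", "VIEW"},
            new String[] {"1627575", "VIEW"},
            new String[] {"1627576", "VIEW"});
        Map<String, Map<String, Integer>> out = scanOnce(rows, "PRINT", "VIEW");
        // With no PRINT rows the PRINT branch stays empty; the reported bug
        // was that it received the VIEW branch's results instead.
        System.out.println(out.get("PRINT").isEmpty());     // true
        System.out.println(out.get("VIEW").get("1627575")); // 2
    }
}
```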
