[jira] [Commented] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository

2015-02-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317581#comment-14317581
 ] 

Edward Capriolo commented on HIVE-9664:
---

Just so you know, you can use Groovy for writing UDFs currently, and Groovy 
has @Grab support for chasing down dependencies. Check Hive's COMPILE syntax.

 Hive add jar command should be able to download and add jars from a 
 repository
 

 Key: HIVE-9664
 URL: https://issues.apache.org/jira/browse/HIVE-9664
 Project: Hive
  Issue Type: Improvement
Reporter: Anant Nag
  Labels: hive

 Currently Hive's add jar command takes a local path to the dependency jar. 
 This clutters the local file-system as users may forget to remove this jar 
 later.
 It would be nice if Hive supported a Gradle like notation to download the jar 
 from a repository.
 Example:  add jar org:module:version
 
 It should also be backward compatible and should take jar from the local 
 file-system as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8614) Upgrade hive to use tez version 0.5.2-SNAPSHOT

2015-02-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312431#comment-14312431
 ] 

Edward Capriolo commented on HIVE-8614:
---

This is bad. Hive should not depend on SNAPSHOT releases.  Why do we keep doing 
this?

 Upgrade hive to use tez version 0.5.2-SNAPSHOT
 --

 Key: HIVE-8614
 URL: https://issues.apache.org/jira/browse/HIVE-8614
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-8614.1.patch, HIVE-8614.2.patch, HIVE-8614.3.patch, 
 HIVE-8614.4.patch








[jira] [Commented] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly

2014-11-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218906#comment-14218906
 ] 

Edward Capriolo commented on HIVE-8848:
---

Nulls are supposed to be stored as a literal \N.
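
A minimal sketch of that convention (illustrative Python, not Hive code; the helper name is made up):

```python
# Sketch (not Hive code) of the text-table convention described above:
# LazySimpleSerDe's default field delimiter is \x01 (Ctrl-A) and SQL NULL
# is written as the two characters backslash-N. Both defaults are
# configurable; the helper name here is invented for illustration.
def to_hive_text_row(values, null_marker="\\N", field_sep="\x01"):
    """Encode one tuple as a line of Hive text-format data."""
    return field_sep.join(null_marker if v is None else str(v) for v in values)

row = to_hive_text_row((1, None, "hello"))
# A literal "null" or "NULL" string in the file would be loaded as text,
# not as SQL NULL -- which is the distinction at issue here.
```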

 data loading from text files or text file processing doesn't handle nulls 
 correctly
 ---

 Key: HIVE-8848
 URL: https://issues.apache.org/jira/browse/HIVE-8848
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8848.patch


 I am not sure how nulls are supposed to be stored in text tables, but after 
 loading some data with "null" or "NULL" strings, or \x00 characters, we get a 
 bunch of annoying logging from LazyPrimitive that data is not in INT format 
 and was converted to null, with data being "null" (a string saying null, I 
 assume from the code).
 Either load should load them as nulls, or there should be some defined way to 
 load nulls.





[jira] [Updated] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly

2014-11-19 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-8848:
--
Status: Open  (was: Patch Available)

You can re-open if I am wrong, but in TextInputFormats null is '\N'. I think 
this is defined in LazySimpleSerDe.

 data loading from text files or text file processing doesn't handle nulls 
 correctly
 ---

 Key: HIVE-8848
 URL: https://issues.apache.org/jira/browse/HIVE-8848
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8848.patch


 I am not sure how nulls are supposed to be stored in text tables, but after 
 loading some data with "null" or "NULL" strings, or \x00 characters, we get a 
 bunch of annoying logging from LazyPrimitive that data is not in INT format 
 and was converted to null, with data being "null" (a string saying null, I 
 assume from the code).
 Either load should load them as nulls, or there should be some defined way to 
 load nulls.





[jira] [Comment Edited] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly

2014-11-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218908#comment-14218908
 ] 

Edward Capriolo edited comment on HIVE-8848 at 11/20/14 2:27 AM:
-

You can re-submit and merge if I am wrong, but in TextInputFormats null is '\N'. 
I think this is defined in LazySimpleSerDe.


was (Author: appodictic):
You can re-open if I am wrong but in TextInputFormats null is '\N' I think this 
is defined in LazySimpleSerde

 data loading from text files or text file processing doesn't handle nulls 
 correctly
 ---

 Key: HIVE-8848
 URL: https://issues.apache.org/jira/browse/HIVE-8848
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8848.patch


 I am not sure how nulls are supposed to be stored in text tables, but after 
 loading some data with "null" or "NULL" strings, or \x00 characters, we get a 
 bunch of annoying logging from LazyPrimitive that data is not in INT format 
 and was converted to null, with data being "null" (a string saying null, I 
 assume from the code).
 Either load should load them as nulls, or there should be some defined way to 
 load nulls.





[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-11-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207314#comment-14207314
 ] 

Edward Capriolo commented on HIVE-5538:
---

I do not like the idea of turning on vectorization by default until we have a 
way to test both code paths, and am -1 until this is addressed.

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Matt McCline
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
 HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch, 
 HIVE-5538.61.patch, HIVE-5538.62.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on vectorized code path. 





[jira] [Commented] (HIVE-1434) Cassandra Storage Handler

2014-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098623#comment-14098623
 ] 

Edward Capriolo commented on HIVE-1434:
---

There is going to be no JIRA. I am doing the code here: 
https://github.com/edwardcapriolo/hive-cassandra-ng/blob/master/src/main/java/io/teknek/hive/cassandra/CassandraSerde.java

Please do not share this link. I have not had time to commit the license file 
yet, and I would not want it to end up in 50 other people's GitHub forks again.

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-1434-r1182878.patch, cas-handle.tar.gz, 
 cass_handler.diff, hive-1434-1.txt, hive-1434-2-patch.txt, 
 hive-1434-2011-02-26.patch.txt, hive-1434-2011-03-07.patch.txt, 
 hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-14.patch.txt, 
 hive-1434-3-patch.txt, hive-1434-4-patch.txt, hive-1434-5.patch.txt, 
 hive-1434.2011-02-27.diff.txt, hive-cassandra.2011-02-25.txt, hive.diff


 Add a cassandra storage handler.





[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-08-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088074#comment-14088074
 ] 

Edward Capriolo commented on HIVE-5538:
---

{quote}If so, one possibility is to turn it on only for unit tests{quote}
I would not suggest this. We would be saying Hive 0.15 is tested and ready 
for release! A user would download and use Hive 0.15, and if they found a bug, 
the reason would be that we are not actually testing the code we shipped.

Unless we plan on removing the non-vectorized code path, we have to test it. To 
do that we need the answer to some important questions:
* Is vectorized execution ALWAYS better/faster?
* Is vectorized execution capable of EVERYTHING the non-vectorized path can do?

Until we can answer yes to both of the above points, we cannot remove the 
non-vectorized code paths. Until we remove the non-vectorized code paths, we 
have to test them.

As I said above, I think we need a stanza at the top of the Q files that 
defines permutations of testing parameters:

--testwith vectorized+mr, vectorized+tez, !vectorized+mr

--testwith (hive.local.mode=true hive.localmode=false)  etc. I think that is 
the only way to keep the project sane.
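
A hypothetical sketch of how such a stanza could be consumed by a test driver (nothing like this exists in Hive; the function name and stanza format are assumptions):

```python
# Hypothetical sketch: read "--testwith" stanzas from the top of a .q file
# and expand them into the configuration permutations the test must run
# under. Both the stanza syntax and this parser are invented for illustration.
def parse_testwith(qfile_text):
    perms = []
    for line in qfile_text.splitlines():
        line = line.strip()
        if line.startswith("--testwith"):
            spec = line[len("--testwith"):].strip()
            perms.extend(p.strip() for p in spec.split(","))
    return perms

perms = parse_testwith(
    "--testwith vectorized+mr, vectorized+tez, !vectorized+mr\n"
    "select count(*) from src;"
)
# perms -> ['vectorized+mr', 'vectorized+tez', '!vectorized+mr']
```

The test harness would then run the same query file once per permutation, rather than under whichever single configuration happens to be the default.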

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
 HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on vectorized code path. 





[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-07-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062176#comment-14062176
 ] 

Edward Capriolo commented on HIVE-5538:
---

I would suggest that we handle this by putting lines at the top of the .q files 
that specify the permutations in which each test needs to be run, maybe like:

--testwith vectorized+mr, vectorized+tez, !vectorized+mr

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
 HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on vectorized code path. 





[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-07-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062456#comment-14062456
 ] 

Edward Capriolo commented on HIVE-5538:
---

This is especially relevant since we are also developing Spark support, giving 
us another testing permutation :(

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
 HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on vectorized code path. 





[jira] [Commented] (HIVE-7025) Support retention on hive tables

2014-07-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054910#comment-14054910
 ] 

Edward Capriolo commented on HIVE-7025:
---

I like this. I coded something like this by hand at my last job. I did run into 
an issue where tables with a huge number of partitions caused memory problems, 
and I had to page/limit the number of objects I would access in one go.

Anyway, your test case does not include a partitioned table. I am assuming the 
retention is set on the table but you are using the create time of the 
partition? It might be good to include that in your unit test.

 Support retention on hive tables
 

 Key: HIVE-7025
 URL: https://issues.apache.org/jira/browse/HIVE-7025
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7025.1.patch.txt, HIVE-7025.2.patch.txt, 
 HIVE-7025.3.patch.txt


 Add self destruction properties for temporary tables.





[jira] [Commented] (HIVE-3392) Hive unnecessarily validates table SerDes when dropping a table

2014-06-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040802#comment-14040802
 ] 

Edward Capriolo commented on HIVE-3392:
---

Please feel free to take over the review. I will not have any time at the 
moment. Thanks!

 Hive unnecessarily validates table SerDes when dropping a table
 ---

 Key: HIVE-3392
 URL: https://issues.apache.org/jira/browse/HIVE-3392
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Jonathan Natkins
Assignee: Navis
  Labels: patch
 Attachments: HIVE-3392.2.patch.txt, HIVE-3392.3.patch.txt, 
 HIVE-3392.Test Case - with_trunk_version.txt


 natty@hadoop1:~$ hive
 hive> add jar 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
 Added 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
  to class path
 Added resource: 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
 hive> create table test (a int) row format serde 'hive.serde.JSONSerDe';
   
 OK
 Time taken: 2.399 seconds
 natty@hadoop1:~$ hive
 hive> drop table test;

 FAILED: Hive Internal Error: 
 java.lang.RuntimeException(MetaException(message:org.apache.hadoop.hive.serde2.SerDeException
  SerDe hive.serde.JSONSerDe does not exist))
 java.lang.RuntimeException: 
 MetaException(message:org.apache.hadoop.hive.serde2.SerDeException SerDe 
 hive.serde.JSONSerDe does not exist)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:262)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253)
   at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:162)
   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:943)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropTable(DDLSemanticAnalyzer.java:700)
   at 
 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:210)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 Caused by: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
 SerDe com.cloudera.hive.serde.JSONSerDe does not exist)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:211)
   at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260)
   ... 20 more
 hive> add jar 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar;
 Added 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
  to class path
 Added resource: 
 /home/natty/source/sample-code/custom-serdes/target/custom-serdes-1.0-SNAPSHOT.jar
 hive> drop table test;
 OK
 Time taken: 0.658 seconds
 hive> 





[jira] [Commented] (HIVE-5857) Reduce tasks do not work in uber mode in YARN

2014-06-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029325#comment-14029325
 ] 

Edward Capriolo commented on HIVE-5857:
---

{code}
 } catch (FileNotFoundException fnf) {
   // happens. e.g.: no reduce work.
   LOG.debug("No plan file found: " + path);
   return null;
 } ...
{code}

Can we remove this code? This bothers me. It is not self-documenting at all. Can 
we use if statements to determine when the file should be there and when it 
should not?

Something like:
if (job.hasNoReduceWork()) {
  return null;
} else {
  throw new RuntimeException("work should be found but was not: " + expectedPathToFile);
}
 Reduce tasks do not work in uber mode in YARN
 -

 Key: HIVE-5857
 URL: https://issues.apache.org/jira/browse/HIVE-5857
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0, 0.13.0, 0.13.1
Reporter: Adam Kawa
Assignee: Adam Kawa
Priority: Critical
  Labels: plan, uber-jar, uberization, yarn
 Fix For: 0.13.0

 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch, 
 HIVE-5857.3.patch


 A Hive query fails when it tries to run a reduce task in uber mode in YARN.
 The NullPointerException is thrown in the ExecReducer.configure method, 
 because the plan file (reduce.xml) for a reduce task is not found.
 The Utilities.getBaseWork method is expected to return BaseWork object, but 
 it returns NULL due to FileNotFoundException. 
 {code}
 // org.apache.hadoop.hive.ql.exec.Utilities
 public static BaseWork getBaseWork(Configuration conf, String name) {
   ...
 try {
 ...
   if (gWork == null) {
 Path localPath;
 if (ShimLoader.getHadoopShims().isLocalMode(conf)) {
   localPath = path;
 } else {
   localPath = new Path(name);
 }
 InputStream in = new FileInputStream(localPath.toUri().getPath());
 BaseWork ret = deserializePlan(in);
 
   }
   return gWork;
 } catch (FileNotFoundException fnf) {
   // happens. e.g.: no reduce work.
   LOG.debug("No plan file found: " + path);
   return null;
 } ...
 }
 {code}
 It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method 
 returns true, because immediately before running a reduce task, 
 org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to 
 local mode (mapreduce.framework.name is changed from "yarn" to "local"). 
 On the other hand, map tasks run successfully, because their configuration is 
 not changed and still remains "yarn".
 {code}
 // org.apache.hadoop.mapred.LocalContainerLauncher
 private void runSubtask(..) {
   ...
   conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME);
   conf.set(MRConfig.MASTER_ADDRESS, "local");  // bypass shuffle
   ReduceTask reduce = (ReduceTask)task;
   reduce.setConf(conf);  
   reduce.run(conf, umbilical);
 }
 {code}
 A super quick fix could be just an additional if-branch, where we check if we 
 run a reduce task in uber mode, and then look for the plan file in a different 
 location.
 *Java stacktrace*
 {code}
 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] 
 org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: 
 hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml
 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] 
 org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local 
 (uberized) 'child' : java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
   ... 7 more
 {code}
 

[jira] [Resolved] (HIVE-1434) Cassandra Storage Handler

2014-06-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-1434.
---

Resolution: Won't Fix

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-1434-r1182878.patch, cas-handle.tar.gz, 
 cass_handler.diff, hive-1434-1.txt, hive-1434-2-patch.txt, 
 hive-1434-2011-02-26.patch.txt, hive-1434-2011-03-07.patch.txt, 
 hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-14.patch.txt, 
 hive-1434-3-patch.txt, hive-1434-4-patch.txt, hive-1434-5.patch.txt, 
 hive-1434.2011-02-27.diff.txt, hive-cassandra.2011-02-25.txt, hive.diff


 Add a cassandra storage handler.





[jira] [Commented] (HIVE-1434) Cassandra Storage Handler

2014-06-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025539#comment-14025539
 ] 

Edward Capriolo commented on HIVE-1434:
---

This feature is a complete and utter failure. It was never committed to Hive. It 
was never committed to Cassandra. I find ~40 forks of the code that are likely 
derivative works that make no reference to me or Hive, and all types of people 
are now asserting copyright over it. I am closing this issue and making a 
clean-room implementation of a new handler.

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-1434-r1182878.patch, cas-handle.tar.gz, 
 cass_handler.diff, hive-1434-1.txt, hive-1434-2-patch.txt, 
 hive-1434-2011-02-26.patch.txt, hive-1434-2011-03-07.patch.txt, 
 hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-14.patch.txt, 
 hive-1434-3-patch.txt, hive-1434-4-patch.txt, hive-1434-5.patch.txt, 
 hive-1434.2011-02-27.diff.txt, hive-cassandra.2011-02-25.txt, hive.diff


 Add a cassandra storage handler.





[jira] [Commented] (HIVE-7115) Support a mechanism for running hive locally that doesnt require having a hadoop executable.

2014-06-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025767#comment-14025767
 ] 

Edward Capriolo commented on HIVE-7115:
---

That would be really nice, especially if it could be extended to dependent 
projects. 
https://github.com/edwardcapriolo/hive_test requires lots of trickery to launch 
a Hive process.

 Support a mechanism for running hive locally that doesnt require having a 
 hadoop executable.
 

 Key: HIVE-7115
 URL: https://issues.apache.org/jira/browse/HIVE-7115
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure, Tests
Reporter: jay vyas

 Mapreduce has a local mode by default, and likewise, tools such as pig and 
 SOLR do as well, maybe we can have a first class local mode for hive 
 also. 
 For local integration testing of a hadoop app, it would be nice if we could 
 fire up a local hive instance which didn't require bin/hadoop for running 
 local jobs.  This would allow us to maintain polyglot hadoop applications 
 much more easily by incorporating hive into the integration tests.  For example:
 {noformat}
 LocalHiveInstance hive = new LocalHiveInstance();
 hive.set("course", "crochet");
 hive.runScript("hive_flow.ql");
 {noformat} 
 Would essentially run a local hive query which mirrors
 {noformat}
 hive -f hive_flow.ql -hiveconf course=crochet
 {noformat}
 It seems like there might be a simple way to do this, at least for small data 
 sets, by putting some kind of alternative (i.e. in-memory) execution 
 environment under hive, if one is not already underway?





[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-06-02 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016115#comment-14016115
 ] 

Edward Capriolo commented on HIVE-5538:
---

To be clear, we need a long-term solution to rigorously test both code paths. 
Defaulting vectorization on could lead to rot in the non-vectorized code paths.

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
 HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on vectorized code path. 





[jira] [Commented] (HIVE-5538) Turn on vectorization by default.

2014-06-02 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016114#comment-14016114
 ] 

Edward Capriolo commented on HIVE-5538:
---

Do we think we are rushing this? Besides these test errors, a vectorized UDF 
bug was reported on the mailing list this week. Is it prudent to switch this? 
If we switch this, how will the original code path be tested?

 Turn on vectorization by default.
 -

 Key: HIVE-5538
 URL: https://issues.apache.org/jira/browse/HIVE-5538
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5538.1.patch, HIVE-5538.2.patch, HIVE-5538.3.patch, 
 HIVE-5538.4.patch, HIVE-5538.5.patch, HIVE-5538.5.patch, HIVE-5538.6.patch


   Vectorization should be turned on by default, so that users don't have to 
 specifically enable vectorization. 
   Vectorization code validates and ensures that a query falls back to row 
 mode if it is not supported on vectorized code path. 





[jira] [Commented] (HIVE-7121) Use murmur hash to distribute HiveKey

2014-05-25 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008377#comment-14008377
 ] 

Edward Capriolo commented on HIVE-7121:
---

Does this affect bucketed tables? I think it does, and then we can not just 
change the hash code, because that would break assumptions about what is in the 
bucket. I.e., I create a bucket in Hive 12, and in Hive 13 different data would 
be in the bucket.

I think this is why:

org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_quotedid_smb

These tests are failing. If this is the case, we need a way of recording the 
hash code in the metadata for the table.
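
A toy sketch of the concern (illustrative Python, not Hive's implementation; both hash functions are stand-ins):

```python
# Illustrative sketch: why swapping the hash function silently breaks
# existing bucketed tables. A row is placed in bucket hash(key) % n at
# write time; if a later release hashes differently, readers compute a
# different bucket for the same key and look in the wrong file.

def old_hash(s):
    # Java-String-style 31-multiplier hash, as a stand-in for the old code
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

def new_hash(s):
    # Any different mixing function; FNV-1a here, purely as a stand-in
    # for murmur
    h = 2166136261
    for ch in s:
        h = ((h ^ ord(ch)) * 16777619) & 0xFFFFFFFF
    return h

num_buckets = 16
key = "some_join_key"
bucket_written = old_hash(key) % num_buckets  # where the old release wrote
bucket_probed = new_hash(key) % num_buckets   # where the new release reads
# The two bucket numbers generally differ, so bucket-map joins and SMB
# joins read the wrong files unless the hash used at write time is
# recorded in the table metadata.
```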

 Use murmur hash to distribute HiveKey
 -

 Key: HIVE-7121
 URL: https://issues.apache.org/jira/browse/HIVE-7121
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-7121.1.patch, HIVE-7121.WIP.patch


 The current hashCode implementation produces poor parallelism when dealing 
 with single integers or doubles.
 And for partitioned inserts into a 1 bucket table, there is a significant 
 hotspot on Reducer #31.
 Removing the magic number 31 and using a more normal hash algorithm would 
 help fix these hotspots.
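
The poor-parallelism claim above can be sketched as follows (illustrative Python; the reducer count and key stride are made up for the demonstration):

```python
from collections import Counter

# Illustrative sketch: a Java-style hashCode for a single int is the int
# itself, so any stride pattern in the keys becomes a stride pattern in
# reducer assignment instead of being mixed away.
def naive_hash(i):
    return i  # Integer.hashCode() is the identity function

num_reducers = 31           # made-up reducer count
keys = range(0, 10000, 31)  # made-up keys arriving with stride 31
load = Counter(naive_hash(k) % num_reducers for k in keys)
# Every key lands on reducer 0: a single hotspot and zero parallelism.
# A hash with real avalanche behavior (e.g. murmur) spreads the same
# keys across all reducers.
```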





[jira] [Commented] (HIVE-7025) TTL on hive tables

2014-05-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992808#comment-13992808
 ] 

Edward Capriolo commented on HIVE-7025:
---

We do something similar; however, we also have the ability to delete partitions 
over a certain age. Hive already has a property inside every table called 
retention that we could consider using.

This code is a good first step, but I have one question. Isn't this code rather 
racy? If we have multiple CLIs running threads, they could all be 
simultaneously deleting tables, and a CLI on a system with a misconfigured 
clock could potentially delete all the tables. I think if we do this it should 
be a stand-alone piece.

 TTL on hive tables
 --

 Key: HIVE-7025
 URL: https://issues.apache.org/jira/browse/HIVE-7025
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7025.1.patch.txt


 Add self destruction properties for temporary tables.





[jira] [Commented] (HIVE-6469) skipTrash option in hive command line

2014-04-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977685#comment-13977685
 ] 

Edward Capriolo commented on HIVE-6469:
---

If a user is willing to contribute an optional syntax that does not cause a 
language ambiguity, I think we should allow the user to add the feature.

Rationale: currently dfs -rm allows an optional -skipTrash. Normal users are 
able to control whether a delete skips trash or not, regardless of how admins 
set the trash feature.

A natural extension is to extend this functionality to drop table.



 skipTrash option in hive command line
 -

 Key: HIVE-6469
 URL: https://issues.apache.org/jira/browse/HIVE-6469
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.12.0
Reporter: Jayesh
 Fix For: 0.12.1

 Attachments: HIVE-6469.patch


 hive drop table command deletes the data from HDFS warehouse and puts it into 
 Trash.
 Currently there is no way to provide flag to tell warehouse to skip trash 
 while deleting table data.
 This ticket is to add skipTrash feature in hive command-line, that looks as 
 following. 
 hive -e "drop table skipTrash testTable"
 This would be good feature to add, so that user can specify when not to put 
 data into trash directory and thus not to fill hdfs space instead of relying 
 on trash interval and policy configuration to take care of disk filling issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973210#comment-13973210
 ] 

Edward Capriolo commented on HIVE-1608:
---

It is not much. SequenceFile + none (codec) only adds some block information 
around text. I still think sequence by default is a good idea. It makes it 
easier to add compression later without sacrificing splittability. 
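A back-of-envelope sketch of the overhead being discussed. The framing numbers below are assumptions chosen for illustration, not SequenceFile's actual layout; the point is only that an uncompressed container adds per-record framing and periodic sync markers on top of the raw text:

```python
records = [f"user{i}\t{i}\n".encode() for i in range(1000)]
text_size = sum(len(r) for r in records)   # raw text bytes

RECORD_LEN_FIELD = 8   # assumed: length fields written per record
SYNC_MARKER = 16       # assumed: sync marker bytes
SYNC_INTERVAL = 100    # assumed: a sync marker every N records

container_size = (
    text_size
    + RECORD_LEN_FIELD * len(records)
    + SYNC_MARKER * (len(records) // SYNC_INTERVAL)
)
overhead = container_size - text_size  # framing cost over plain text
```

The framing overhead is small and fixed, while the sync markers are what preserve splittability once a non-splittable codec is layered on.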

 use sequencefile as the default for storing intermediate results
 

 Key: HIVE-1608
 URL: https://issues.apache.org/jira/browse/HIVE-1608
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Namit Jain
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-1608.patch


 The only argument for having a text file for storing intermediate results 
 seems to be better debuggability.
 But, tailing a sequence file is possible, and it should be more space 
 efficient



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-6212) Using Presto-0.56 for sql query,but HiveServer the console print java.lang.OutOfMemoryError: Java heap space

2014-04-13 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-6212.
---

Resolution: Won't Fix

Contact Presto developers. 

 Using Presto-0.56 for sql query,but HiveServer the console print 
 java.lang.OutOfMemoryError: Java heap space
 

 Key: HIVE-6212
 URL: https://issues.apache.org/jira/browse/HIVE-6212
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
 Environment: HADOOP ENVIRONMENT IS CDH5+CDH5-HIVE-0.11+PRESTO-0.56
Reporter: apachehadoop
 Fix For: 0.11.0


 Hi friends:
 Now I can't open the page 
 https://groups.google.com/forum/#!forum/presto-users ,so show my question 
 here.
 I have started hiveserver and started presto-server on a machine with 
 commands below:
 hive --service hiveserver -p 9083
 ./launcher run
 When I use the presto-client-cli command ./presto --server localhost:9083 
 --catalog hive --schema default ,the console shows presto:default,input the 
 command as show tables the console prints Error running command: 
 java.nio.channels.ClosedChannelException,
 and the hiveserver console print as below:
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 Exception in thread pool-1-thread-1 java.lang.OutOfMemoryError: Java heap 
 space
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:662)
 my configuration file below:
 node.properties
 node.environment=production
 node.id=cc4a1bbf-5b98-4935-9fde-2cf1c98e8774
 node.data-dir=/home/hadoop/cloudera-5.0.0/presto-0.56/presto/data
 config.properties
 coordinator=true
 datasources=jmx
 http-server.http.port=8080
 presto-metastore.db.type=h2
 presto-metastore.db.filename=/home/hadoop/cloudera-5.0.0/presto-0.56/presto/db/MetaStore
 task.max-memory=1GB
 discovery-server.enabled=true
 discovery.uri=http://slave4:8080
 jvm.config
 -server
 -Xmx16G
 -XX:+UseConcMarkSweepGC
 -XX:+ExplicitGCInvokesConcurrent
 -XX:+CMSClassUnloadingEnabled
 -XX:+AggressiveOpts
 -XX:+HeapDumpOnOutOfMemoryError
 -XX:OnOutOfMemoryError=kill -9 %p
 -XX:PermSize=150M
 -XX:MaxPermSize=150M
 -XX:ReservedCodeCacheSize=150M
 -Xbootclasspath/p:/home/hadoop/cloudera-5.0.0/presto-0.56/presto-server-0.56/lib/floatingdecimal-0.1.jar
 log.properties
 com.facebook.presto=DEBUG
 catalog/hive.properties
 connector.name=hive-cdh4
 hive.metastore.uri=thrift://slave4:9083
 HADOOP ENVIRONMENT IS CDH5+CDH5-HIVE-0.11+PRESTO-0.56
 Last I had increased the Java heap size for the Hive metastore,but it still 
 given me the same error informations ,please help me to check if that is a 
 bug of CDH5.Now I have no idea,god !
 please help me ,thanks.
 **
 
 **
 Add some informations as below:
 Help,help,help!
 I have test prest-server-0.55 and 0.56 and 0.57 on CDH4 +hive-0.10 or 
 hive-0.11,but it still shown error informations above.
 ON coordinator machine the directory etc and configuration files as below:
 =coordinator 
  config.properties:
 coordinator=true
 datasources=jmx
 http-server.http.port=8080
 presto-metastore.db.type=h2
 presto-metastore.db.filename=/home/hadoop/cloudera-5.0.0/presto-0.55/presto/db/MetaStore
 task.max-memory=1GB
 discovery-server.enabled=true
 discovery.uri=http://name:8080
 --jvm.config:
 -server
 -Xmx4G
 -XX:+UseConcMarkSweepGC
 -XX:+ExplicitGCInvokesConcurrent
 -XX:+CMSClassUnloadingEnabled
 -XX:+AggressiveOpts
 -XX:+HeapDumpOnOutOfMemoryError
 -XX:OnOutOfMemoryError=kill -9 %p
 -XX:PermSize=150M
 -XX:MaxPermSize=150M
 -XX:ReservedCodeCacheSize=150M
 -Xbootclasspath/p:/home/hadoop/cloudera-5.0.0/presto-0.55/presto-server-0.55/lib/floatingdecimal-0.1.jar
 

[jira] [Commented] (HIVE-6212) Using Presto-0.56 for sql query,but HiveServer the console print java.lang.OutOfMemoryError: Java heap space

2014-04-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967874#comment-13967874
 ] 

Edward Capriolo commented on HIVE-6212:
---

We don't support Presto.

 Using Presto-0.56 for sql query,but HiveServer the console print 
 java.lang.OutOfMemoryError: Java heap space
 

 Key: HIVE-6212
 URL: https://issues.apache.org/jira/browse/HIVE-6212
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
 Environment: HADOOP ENVIRONMENT IS CDH5+CDH5-HIVE-0.11+PRESTO-0.56
Reporter: apachehadoop
 Fix For: 0.11.0


 Hi friends:
 Now I can't open the page 
 https://groups.google.com/forum/#!forum/presto-users ,so show my question 
 here.
 I have started hiveserver and started presto-server on a machine with 
 commands below:
 hive --service hiveserver -p 9083
 ./launcher run
 When I use the presto-client-cli command ./presto --server localhost:9083 
 --catalog hive --schema default ,the console shows presto:default,input the 
 command as show tables the console prints Error running command: 
 java.nio.channels.ClosedChannelException,
 and the hiveserver console print as below:
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 Exception in thread pool-1-thread-1 java.lang.OutOfMemoryError: Java heap 
 space
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:353)
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:215)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:662)
 my configuration file below:
 node.properties
 node.environment=production
 node.id=cc4a1bbf-5b98-4935-9fde-2cf1c98e8774
 node.data-dir=/home/hadoop/cloudera-5.0.0/presto-0.56/presto/data
 config.properties
 coordinator=true
 datasources=jmx
 http-server.http.port=8080
 presto-metastore.db.type=h2
 presto-metastore.db.filename=/home/hadoop/cloudera-5.0.0/presto-0.56/presto/db/MetaStore
 task.max-memory=1GB
 discovery-server.enabled=true
 discovery.uri=http://slave4:8080
 jvm.config
 -server
 -Xmx16G
 -XX:+UseConcMarkSweepGC
 -XX:+ExplicitGCInvokesConcurrent
 -XX:+CMSClassUnloadingEnabled
 -XX:+AggressiveOpts
 -XX:+HeapDumpOnOutOfMemoryError
 -XX:OnOutOfMemoryError=kill -9 %p
 -XX:PermSize=150M
 -XX:MaxPermSize=150M
 -XX:ReservedCodeCacheSize=150M
 -Xbootclasspath/p:/home/hadoop/cloudera-5.0.0/presto-0.56/presto-server-0.56/lib/floatingdecimal-0.1.jar
 log.properties
 com.facebook.presto=DEBUG
 catalog/hive.properties
 connector.name=hive-cdh4
 hive.metastore.uri=thrift://slave4:9083
 HADOOP ENVIRONMENT IS CDH5+CDH5-HIVE-0.11+PRESTO-0.56
 Last I had increased the Java heap size for the Hive metastore,but it still 
 given me the same error informations ,please help me to check if that is a 
 bug of CDH5.Now I have no idea,god !
 please help me ,thanks.
 **
 
 **
 Add some informations as below:
 Help,help,help!
 I have test prest-server-0.55 and 0.56 and 0.57 on CDH4 +hive-0.10 or 
 hive-0.11,but it still shown error informations above.
 ON coordinator machine the directory etc and configuration files as below:
 =coordinator 
  config.properties:
 coordinator=true
 datasources=jmx
 http-server.http.port=8080
 presto-metastore.db.type=h2
 presto-metastore.db.filename=/home/hadoop/cloudera-5.0.0/presto-0.55/presto/db/MetaStore
 task.max-memory=1GB
 discovery-server.enabled=true
 discovery.uri=http://name:8080
 --jvm.config:
 -server
 -Xmx4G
 -XX:+UseConcMarkSweepGC
 -XX:+ExplicitGCInvokesConcurrent
 -XX:+CMSClassUnloadingEnabled
 -XX:+AggressiveOpts
 -XX:+HeapDumpOnOutOfMemoryError
 -XX:OnOutOfMemoryError=kill -9 %p
 -XX:PermSize=150M
 -XX:MaxPermSize=150M
 -XX:ReservedCodeCacheSize=150M
 

[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13961224#comment-13961224
 ] 

Edward Capriolo commented on HIVE-1608:
---

If the sequence file is not compressed it is actually larger than the text 
file...

 use sequencefile as the default for storing intermediate results
 

 Key: HIVE-1608
 URL: https://issues.apache.org/jira/browse/HIVE-1608
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Namit Jain
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-1608.patch


 The only argument for having a text file for storing intermediate results 
 seems to be better debuggability.
 But, tailing a sequence file is possible, and it should be more space 
 efficient



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6570) Hive variable substitution does not work with the source command

2014-03-28 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13951675#comment-13951675
 ] 

Edward Capriolo commented on HIVE-6570:
---

No major concern; the release note is enough information. Sorry, I was not 
paying attention to this thread. Please proceed.

 Hive variable substitution does not work with the source command
 --

 Key: HIVE-6570
 URL: https://issues.apache.org/jira/browse/HIVE-6570
 Project: Hive
  Issue Type: Bug
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6570.1.patch


 The following does not work:
 {code}
 source ${hivevar:test-dir}/test.q;
 {code}
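A hypothetical, simplified version of the substitution step that the `source` path was skipping (this is a sketch, not Hive's actual substitution code; names are assumptions): expand `${hivevar:name}` references in a command before executing it.

```python
import re

def substitute(cmd, hivevars):
    # replace ${hivevar:name} with its value; leave unknown variables intact
    def repl(match):
        name = match.group(1)
        return hivevars.get(name, match.group(0))
    return re.sub(r"\$\{hivevar:([^}]+)\}", repl, cmd)

expanded = substitute("source ${hivevar:test-dir}/test.q;",
                      {"test-dir": "/tmp/queries"})
```

If `source` runs the command string without first passing it through this step, the literal `${hivevar:test-dir}` reaches the file resolver and fails.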



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6570) Hive variable substitution does not work with the source command

2014-03-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932506#comment-13932506
 ] 

Edward Capriolo commented on HIVE-6570:
---

We should add a release note: if someone has a $ in their file, Hive might now 
try to interpret it.

 Hive variable substitution does not work with the source command
 --

 Key: HIVE-6570
 URL: https://issues.apache.org/jira/browse/HIVE-6570
 Project: Hive
  Issue Type: Bug
Reporter: Anthony Hsu
Assignee: Anthony Hsu
 Attachments: HIVE-6570.1.patch


 The following does not work:
 {code}
 source ${hivevar:test-dir}/test.q;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6311) Design a new logo?

2014-01-26 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882432#comment-13882432
 ] 

Edward Capriolo commented on HIVE-6311:
---

I really like the Hive logo. It surely can be re-drawn at higher resolution, 
etc., but fundamentally I like the elephant/bee hybrid. 

 Design a new logo?
 --

 Key: HIVE-6311
 URL: https://issues.apache.org/jira/browse/HIVE-6311
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland

 I have heard some folks saying we should create a new logo so I am creating a 
 jira for their comment,



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6167) Allow user-defined functions to be qualified with database name

2014-01-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868485#comment-13868485
 ] 

Edward Capriolo commented on HIVE-6167:
---

In my opinion we must keep the current syntax working as is. Current users of 
Hive do not want their scripts to break just to match a standard. If we wish to 
add new syntax that matches a given standard, that makes sense. I do not think 
the current standard forbids keeping our current syntax and functionality. Also, 
realistically, we have to be practical. Users have sessions, and most users are 
not going to care what database/schema a function is associated with. Most are 
going to want global functions. Most people are not going to have so many 
functions that a conflict would ever arise. Let's not make and solve problems we 
really don't have. 

 Allow user-defined functions to be qualified with database name
 ---

 Key: HIVE-6167
 URL: https://issues.apache.org/jira/browse/HIVE-6167
 Project: Hive
  Issue Type: Sub-task
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere

 Function names in Hive are currently unqualified and there is a single 
 namespace for all function names. This task would allow users to define 
 temporary UDFs (and eventually permanent UDFs) with a database name, such as:
 CREATE TEMPORARY FUNCTION userdb.myfunc 'myudfclass';
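One way to picture a backward-compatible lookup order (a hypothetical sketch; the names and data structures are assumptions, not Hive's actual function registry): qualified names resolve against per-database functions, while bare names keep hitting the single global namespace, so existing scripts are unaffected.

```python
def resolve_function(name, global_fns, db_fns):
    # "userdb.myfunc" -> per-database lookup; "myfunc" -> legacy global lookup
    if "." in name:
        db, fn = name.split(".", 1)
        return db_fns.get((db, fn))
    return global_fns.get(name)

global_fns = {"myfunc": "LegacyUdf"}
db_fns = {("userdb", "myfunc"): "UserDbUdf"}
```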



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6171) Use Paths consistently - V

2014-01-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866153#comment-13866153
 ] 

Edward Capriolo commented on HIVE-6171:
---

If I had to be picky, I see methods named somethingURI( in the patch. Convention 
now is somethingUri. Not critical or required by any stretch.

 Use Paths consistently - V
 --

 Key: HIVE-6171
 URL: https://issues.apache.org/jira/browse/HIVE-6171
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-6171.patch


 Next in series for consistent usage of Paths in Hive.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6047) Permanent UDFs in Hive

2014-01-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864890#comment-13864890
 ] 

Edward Capriolo commented on HIVE-6047:
---

Theoretically you could compile anything, even input formats or serdes, but I 
do not imagine anyone using it that way.

 Permanent UDFs in Hive
 --

 Key: HIVE-6047
 URL: https://issues.apache.org/jira/browse/HIVE-6047
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: PermanentFunctionsinHive.pdf


 Currently Hive only supports temporary UDFs which must be re-registered when 
 starting up a Hive session. Provide some support to register permanent UDFs 
 with Hive. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6100) Introduce basic set operations as UDFs

2014-01-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863147#comment-13863147
 ] 

Edward Capriolo commented on HIVE-6100:
---

Having UDFs would still be useful. I use a lot of nested structures. We end up 
doing really complicated and kinda slow lateral view / join queries to do set 
operations sometimes. Having UDFs that did things on complex types could help 
in many situations.

 Introduce basic set operations as UDFs
 --

 Key: HIVE-6100
 URL: https://issues.apache.org/jira/browse/HIVE-6100
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Kostiantyn Kudriavtsev
Priority: Minor
 Fix For: 0.13.0


 Introduce basic set operations:
 1. Intersection: The intersection of A and B, denoted by A ∩ B, is the set of 
 all things that are members of both A and B.
 select set_intersection(arr_a, arr_b) from dual
 2. Union: The union of A and B, denoted by A ∪ B, is the set of all things 
 that are members of either A or B.
 select set_union(arr_a, arr_b) from dual
 3. Symmetric difference: the symmetric difference of two sets is the set of 
 elements which are in either of the sets and not in their intersection.
 select set_symdiff(arr_a, arr_b) from dual
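The three operations from the description, sketched over Hive-style arrays (plain Python lists standing in; returning sorted lists is an assumption made here for deterministic output, not something the proposal specifies):

```python
def set_intersection(a, b):
    # elements present in both arrays
    return sorted(set(a) & set(b))

def set_union(a, b):
    # elements present in either array
    return sorted(set(a) | set(b))

def set_symdiff(a, b):
    # elements in either array but not in both
    return sorted(set(a) ^ set(b))
```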



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6100) Introduce basic set operations as UDFs

2014-01-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863402#comment-13863402
 ] 

Edward Capriolo commented on HIVE-6100:
---

I think Alan and I are speaking of two different things, both of which are valid.

From the title of the Jira I was assuming the user meant this.
{pre}
create table a (list<int> x, list<int> y)
select union(x, y)
{pre}

But what Alan is discussing is perfectly valid as well.

 Introduce basic set operations as UDFs
 --

 Key: HIVE-6100
 URL: https://issues.apache.org/jira/browse/HIVE-6100
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Kostiantyn Kudriavtsev
Priority: Minor
 Fix For: 0.13.0


 Introduce basic set operations:
 1. Intersection: The intersection of A and B, denoted by A ∩ B, is the set of 
 all things that are members of both A and B.
 select set_intersection(arr_a, arr_b) from dual
 2. Union: The union of A and B, denoted by A ∪ B, is the set of all things 
 that are members of either A or B.
 select set_union(arr_a, arr_b) from dual
 3. Symmetric difference: the symmetric difference of two sets is the set of 
 elements which are in either of the sets and not in their intersection.
 select set_symdiff(arr_a, arr_b) from dual



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6047) Permanent UDFs in Hive

2014-01-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863870#comment-13863870
 ] 

Edward Capriolo commented on HIVE-6047:
---

We just added the ability to write UDFs in Groovy; can those be persisted as 
well? It would be easier to save the Groovy string rather than the compiled 
classes.

 Permanent UDFs in Hive
 --

 Key: HIVE-6047
 URL: https://issues.apache.org/jira/browse/HIVE-6047
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: PermanentFunctionsinHive.pdf


 Currently Hive only supports temporary UDFs which must be re-registered when 
 starting up a Hive session. Provide some support to register permanent UDFs 
 with Hive. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841601#comment-13841601
 ] 

Edward Capriolo commented on HIVE-5783:
---

Why does support need to be built directly into the semantic analyzer? I think 
input formats/serdes should be decoupled from the Hive code as much as 
possible. Hard-coding like this makes it hard to evolve support. I *think* you 
should only be adding the libs as a dependency to the pom files and building 
some tests. 

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Fix For: 0.11.0

 Attachments: hive-0.11-parquet.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841674#comment-13841674
 ] 

Edward Capriolo commented on HIVE-5783:
---

{quote}
regarding the support being built into the semantic analyzer, I mimicked what 
was done for ORC support{quote}
I think that was done before maven. I am sure there is a reason why RCFILE, 
ORCFILE, and this add their own syntax, but this is something we might not want 
to copy-and-paste repeat just because the last person did it that way. 


 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Fix For: 0.11.0

 Attachments: hive-0.11-parquet.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841693#comment-13841693
 ] 

Edward Capriolo edited comment on HIVE-5783 at 12/6/13 8:55 PM:


{quote}
I would normally agree with this, but I suppose I was trying to make as minor a 
change as possible.
{quote}
Right, I am not demanding that we do it one way or the other, just pointing out 
that we should not build tech debt. Hive does not have a dedicated cleanup crew 
to handle all the non-sexy features :)


was (Author: appodictic):
{quote}
I would normally agree with this, but I suppose I was trying to make as minor a 
change as possible.
{quote}
Right, I am not demanding that we do it one way or the other, just pointing out 
that we should not build tech debt.

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Fix For: 0.11.0

 Attachments: hive-0.11-parquet.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841693#comment-13841693
 ] 

Edward Capriolo commented on HIVE-5783:
---

{quote}
I would normally agree with this, but I suppose I was trying to make as minor a 
change as possible.
{quote}
Right, I am not demanding that we do it one way or the other, just pointing out 
that we should not build tech debt.

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Fix For: 0.11.0

 Attachments: hive-0.11-parquet.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5875) task : collect list of hive configuration params whose default should change

2013-11-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830393#comment-13830393
 ] 

Edward Capriolo commented on HIVE-5875:
---

hive.mapred.mode=strict
hive.cli.print.header=true
autocreate.schema=false


 task : collect list of hive configuration params whose default should change
 

 Key: HIVE-5875
 URL: https://issues.apache.org/jira/browse/HIVE-5875
 Project: Hive
  Issue Type: Task
Reporter: Thejas M Nair
Assignee: Thejas M Nair

 The immediate motivation for this was the ticket HIVE-4485 . Beeline prints 
 NULLs as empty strings. This is not a desirable behavior. But if we fix it, 
 it breaks backward compatibility. 
 But we should not be burdening all users with mistakes of the past, specially 
 the users who are new to hive. As hadoop and hive adoption increases 
 proportion of 'new' users will continue to increase.
 We need a way to let users choose between backward compatible behavior and 
 more sensible behavior.  How this is implemented can be discussed in a 
 separate jira. 
 The purpose of this *Task* jira is just to collect list of config flags whose 
 current default is not the desirable one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-11-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825611#comment-13825611
 ] 

Edward Capriolo commented on HIVE-5317:
---

I have two fundamental problems with this concept.
{quote}
The only requirement is that the file format must be able to support a rowid. 
With things like text and sequence file this can be done via a byte offset.
{quote}

This is a good reason not to do this. Things that only work for some formats 
create fragmentation. What about formats that do not have a row id? What if 
the user is already using the key for something else, like data?
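The "rowid via byte offset" idea quoted above can be sketched for a text format (illustrative only; the function below is an assumption, not the proposal's implementation): the synthetic row id is the byte offset where each row starts.

```python
import io

def rows_with_offsets(stream):
    # yield (start_offset, row) pairs; the offset serves as a synthetic row id
    while True:
        off = stream.tell()
        line = stream.readline()
        if not line:
            break
        yield off, line.rstrip("\n")

data = io.StringIO("alice\nbob\ncarol\n")
rows = list(rows_with_offsets(data))
```

Note the fragility Ed points out: the offsets are only stable as long as the file is never rewritten, and a format with no natural offset or key has nothing to anchor them to.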

{quote}
Once an hour a log of transactions is exported from a RDBS and the fact tables 
need to be updated (up to 1m rows) to reflect the new data. The transactions 
are a combination of inserts, updates, and deletes. The table is partitioned 
and bucketed.
{quote}

What this ticket describes seems like a bad use case for Hive. Why would the 
user not simply create a new table partitioned by hour? What is the need to 
transactionally update a table in place? 

It seems like the better solution would be for the user to log these updates 
themselves and then export the table with a tool like Sqoop periodically.  

I see this as a really complicated piece of work for a narrow use case, and I 
have a very difficult time believing that adding transactions to Hive to 
support this is the right answer.

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (eg. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day a small set (up to 100k rows) of records need to be deleted for 
 regulatory compliance.
 * Once an hour a log of transactions is exported from a RDBS and the fact 
 tables need to be updated (up to 1m rows)  to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.





[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-11-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825619#comment-13825619
 ] 

Edward Capriolo commented on HIVE-5317:
---

By the way, I do work like this very often, and having tables that update 
periodically causes a lot of problems. The first is when you have to re-compute 
a result 4 days later.

You do not want a fresh up-to-date table, you want the table as it existed 4 
days ago. When you want to troubleshoot a result you do not want your 
intermediate tables trampled over. When you want to rebuild a month's worth of 
results you want to launch 31 jobs in parallel, not 31 jobs in series. 

In fact, in Programming Hive I suggest ALWAYS partitioning these dimension tables 
by time and NOT doing what this ticket describes, for the reasons above (and 
more).

 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (eg. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day a small set (up to 100k rows) of records need to be deleted for 
 regulatory compliance.
 * Once an hour a log of transactions is exported from a RDBS and the fact 
 tables need to be updated (up to 1m rows)  to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.





[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-11-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826191#comment-13826191
 ] 

Edward Capriolo commented on HIVE-5317:
---

{quote}
Ed,
If you don't use the insert, update, and delete commands, they won't impact 
your use of Hive. On the other hand, there are a wide number of users who need 
ACID and updates.
{quote}

Why don't those users just use an ACID database?

{quote}
The dimension tables have primary keys and are typically bucketed and sorted on 
those keys.
{quote}

All the use cases defined seem to be exactly what Hive is not built for:
1) Hive does not do much/any optimization of a table when it is sorted.
2) Hive tables do not have primary keys.
3) Hive is not made to play with tables of only a few rows.

It seems like the idea is to turn Hive and the Hive metastore into a one-shot 
database for processes that can easily be done differently. 

{quote}
Once a day a small set (up to 100k rows) of records need to be deleted for 
regulatory compliance.
{quote}
1. Sqoop export to the RDBMS
2. run the query on the RDBMS
3. write back to Hive.

I am not ready to vote -1, but I am struggling to understand why anyone would 
want to use hive to solve the use cases described. This seems like a square peg 
in a round hole solution. It feels like something that belongs outside of hive.

It feels a lot like this:
http://db.cs.yale.edu/hadoopdb/hadoopdb.html


 





 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (eg. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day a small set (up to 100k rows) of records need to be deleted for 
 regulatory compliance.
 * Once an hour a log of transactions is exported from a RDBS and the fact 
 tables need to be updated (up to 1m rows)  to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.





[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-11-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826207#comment-13826207
 ] 

Edward Capriolo commented on HIVE-5317:
---

{quote}
In theory the base can be in any format, but ORC will be required for v1
{quote}
This is exactly what I talk about when I talk about fragmentation. Hive cannot 
be a system where features only work when using a specific input format. The 
feature must be applicable to more than just the single file format. Tagging 
other file formats as LATER bothers me. Wouldn't the community get 
more utility if something that worked against TextFormat were written first, 
then later against other formats? I know about the Stinger initiative; 
developing features that only work with specific input formats does not seem 
like the correct course of action. It goes against our core design principles:

https://cwiki.apache.org/confluence/display/Hive/Home

Hive does not mandate read or written data be in the Hive format---there is 
no such thing. Hive works equally well on Thrift, control delimited, or your 
specialized data formats. Please see File Format and SerDe in the Developer 
Guide for details.


 Implement insert, update, and delete in Hive with full ACID support
 ---

 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: InsertUpdatesinHive.pdf


 Many customers want to be able to insert, update and delete rows from Hive 
 tables with full ACID support. The use cases are varied, but the form of the 
 queries that should be supported are:
 * INSERT INTO tbl SELECT …
 * INSERT INTO tbl VALUES ...
 * UPDATE tbl SET … WHERE …
 * DELETE FROM tbl WHERE …
 * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
 ...
 * SET TRANSACTION LEVEL …
 * BEGIN/END TRANSACTION
 Use Cases
 * Once an hour, a set of inserts and updates (up to 500k rows) for various 
 dimension tables (eg. customer, inventory, stores) needs to be processed. The 
 dimension tables have primary keys and are typically bucketed and sorted on 
 those keys.
 * Once a day a small set (up to 100k rows) of records need to be deleted for 
 regulatory compliance.
 * Once an hour a log of transactions is exported from a RDBS and the fact 
 tables need to be updated (up to 1m rows)  to reflect the new data. The 
 transactions are a combination of inserts, updates, and deletes. The table is 
 partitioned and bucketed.





[jira] [Commented] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13818289#comment-13818289
 ] 

Edward Capriolo commented on HIVE-5731:
---

{quote}
GenericUDF class is the latest and recommended base class for any UDFs.
This JIRA is to change the current UDFDate* classes extended from GenericUDF.
{quote}

Has anyone done a performance evaluation of the speed of a UDF vs a GenericUDF? 
I understand the motivation in the vectorized case, but are users of the 
non-vectorized case getting less performance? If I knew the difference was 
negligible I would not care, but I have not seen any numbers and I am wondering 
if we have considered the implications of this.
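One source of the difference being asked about is dispatch: plain UDFs are invoked reflectively, while GenericUDFs are called directly through ObjectInspectors. A rough, hypothetical way to gauge reflective-call overhead in isolation (a Python analogy, not a Hive benchmark):

```python
import timeit

class Adder:
    """Stand-in for a UDF with a single evaluate method."""
    def evaluate(self, a, b):
        return a + b

udf = Adder()

# direct dispatch: the method is bound at call-site, like a GenericUDF call
direct = timeit.timeit(lambda: udf.evaluate(1, 2), number=100_000)

# reflective dispatch: the method is looked up by name on every call,
# analogous to the reflection path used for plain UDFs
reflective = timeit.timeit(lambda: getattr(udf, "evaluate")(1, 2), number=100_000)

print(direct, reflective)
```

Absolute numbers vary by machine; the point is only that the lookup-per-call path does measurable extra work.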

 Use new GenericUDF instead of basic UDF for UDFDate* classes 
 -

 Key: HIVE-5731
 URL: https://issues.apache.org/jira/browse/HIVE-5731
 Project: Hive
  Issue Type: Improvement
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, 
 HIVE-5731.4.patch


 GenericUDF class is the latest and recommended base class for any UDFs.
 This JIRA is to change the current UDFDate* classes extended from GenericUDF.
 The general benefit of GenericUDF is described in comments as
 * The GenericUDF are superior to normal UDFs in the following ways: 1. It can
 accept arguments of complex types, and return complex types. 2. It can 
 accept
 variable length of arguments. 3. It can accept an infinite number of 
 function
 signature - for example, it's easy to write a GenericUDF that accepts
 array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
 can do short-circuit evaluations using DeferedObject.





[jira] [Commented] (HIVE-5107) Change hive's build to maven

2013-11-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13817813#comment-13817813
 ] 

Edward Capriolo commented on HIVE-5107:
---

Generally it is better to put unit tests closest to the code they are testing. 
This makes it easier to determine test coverage.

Integration tests usually involve testing across modules.

Ideally we want tests to be localized. Someone working in hive-avro should not 
have to run tests unrelated to avro to add a feature, I think that is what we 
are aiming for, clean separation and easier testing without a full run.

 Change hive's build to maven
 

 Key: HIVE-5107
 URL: https://issues.apache.org/jira/browse/HIVE-5107
 Project: Hive
  Issue Type: Task
Reporter: Edward Capriolo
Assignee: Edward Capriolo

 I can not cope with hive's build infrastructure any more. I have started 
 working on porting the project to maven. When I have some solid progess i 
 will github the entire thing for review. Then we can talk about switching the 
 project somehow.





[jira] [Commented] (HIVE-5602) Micro optimize select operator

2013-10-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808331#comment-13808331
 ] 

Edward Capriolo commented on HIVE-5602:
---

Thanks for looking.

 Micro optimize select operator
 --

 Key: HIVE-5602
 URL: https://issues.apache.org/jira/browse/HIVE-5602
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-5602.2.patch.txt, HIVE-5602.patch.1.txt








[jira] [Updated] (HIVE-5602) Micro optimize select operator

2013-10-28 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5602:
--

Attachment: HIVE-5602.2.patch.txt

 Micro optimize select operator
 --

 Key: HIVE-5602
 URL: https://issues.apache.org/jira/browse/HIVE-5602
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5602.2.patch.txt, HIVE-5602.patch.1.txt








[jira] [Commented] (HIVE-5643) ZooKeeperHiveLockManager.getQuorumServers incorrectly appends the custom zk port to quorum hosts

2013-10-27 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806347#comment-13806347
 ] 

Edward Capriolo commented on HIVE-5643:
---

+1

 ZooKeeperHiveLockManager.getQuorumServers incorrectly appends the custom zk 
 port to quorum hosts
 

 Key: HIVE-5643
 URL: https://issues.apache.org/jira/browse/HIVE-5643
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.12.0
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: 0.13.0

 Attachments: HIVE-5643.1.patch.txt


 ZooKeeperHiveLockManager calls the below method to construct the connection 
 string for ZooKeeper connection.
 {code}
   private static String getQuorumServers(HiveConf conf) {
 String hosts = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_QUORUM);
 String port = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_CLIENT_PORT);
 return hosts + ":" + port;
   }
 {code}
 For example:
 HIVE_ZOOKEEPER_QUORUM=node1, node2, node3
 HIVE_ZOOKEEPER_CLIENT_PORT=
 Connection string given to the ZooKeeper object is node1, node2, node3:. 
 ZooKeeper considers the default port to be 2181 for hostnames that don't have any 
 port. 
 This works fine as long as HIVE_ZOOKEEPER_CLIENT_PORT is 2181. If it is 
 different, then the ZooKeeper client object tries to connect to node1 and node2 on 
 port 2181, which always fails. So it has only one choice, the last host, which 
 receives all the load from Hive.
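The fix this issue describes is to append the client port to every host, not just the last one. A minimal sketch of the corrected logic (illustrative Python with hypothetical names, not the actual patch):

```python
def get_quorum_servers(hosts, port):
    """Build a ZooKeeper connect string, appending the port to each host."""
    # append the client port to every entry of the comma-separated quorum list
    return ",".join(h.strip() + ":" + port for h in hosts.split(","))

print(get_quorum_servers("node1, node2, node3", "2222"))
# node1:2222,node2:2222,node3:2222
```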





[jira] [Commented] (HIVE-5610) Merge maven branch into trunk

2013-10-26 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806176#comment-13806176
 ] 

Edward Capriolo commented on HIVE-5610:
---

[~brocknoland] All looks good to me. +1. Let's prepare a wiki doc on Maven and 
document the simple changes to building, testing, etc. Then we can pull the 
trigger on this change.

 Merge maven branch into trunk
 -

 Key: HIVE-5610
 URL: https://issues.apache.org/jira/browse/HIVE-5610
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Brock Noland

 With HIVE-5566 nearing completion we will be nearly ready to merge the maven 
 branch to trunk. The following tasks will be done post-merge:
 * HIVE-5611 - Add assembly (i.e.) tar creation to pom
 * HIVE-5612 - Add ability to re-generate generated code stored in source 
 control
 The merge process will be as follows:
 1) svn merge ^/hive/branches/maven
 2) Commit result
 3) Modify the following line in maven-rollforward.sh:
 {noformat}
   mv $source $target
 {noformat}
 to
 {noformat}
   svn mv $source $target
 {noformat}
 4) Execute maven-rollfward.sh
 5) Commit result 
 6) Update trunk-mr1.properties and trunk-mr2.properties on the ptesting host, 
 adding the following:
 {noformat}
 mavenEnvOpts = -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128 
 testCasePropertyName = test
 buildTool = maven
 unitTests.directories = ./
 {noformat}
 Notes:
 * To build everything you must:
 {noformat}
 $ mvn clean install -DskipTests
 $ cd itests
 $ mvn clean install -DskipTests
 {noformat}
 because itests (any tests that has cyclical dependencies or requires that the 
 packages be built) is not part of the root reactor build.





[jira] [Commented] (HIVE-5655) Hive incorrectly handles divide-by-zero case

2013-10-26 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806197#comment-13806197
 ] 

Edward Capriolo commented on HIVE-5655:
---

+1. [~xuefuz] We (I) recently committed a new system that runs UDF tests 
through the operator chain. Maybe you want to base your JUnit test on that.

see  ./ql/src/test/org/apache/hadoop/hive/ql/testutil/BaseScalarUdfTest.java

 Hive incorrectly handles divide-by-zero case
 ---

 Key: HIVE-5655
 URL: https://issues.apache.org/jira/browse/HIVE-5655
 Project: Hive
  Issue Type: Improvement
  Components: Types
Affects Versions: 0.10.0, 0.11.0, 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-5655.1.patch, HIVE-5655.patch


 Unlike other databases, Hive currently has only one mode (default mode) 
 regarding error handling, in which NULL value is returned. However, in case 
 of divide-by-zero, Hive demonstrated a different behavior.
 {code}
 hive> select 5/0 from tmp2 limit 1;
 Total MapReduce jobs = 1
 ...
 Total MapReduce CPU Time Spent: 860 msec
 OK
 Infinity
 {code}
 The correct behaviour should be Hive returning NULL instead, in order to be 
 consistent w.r.t. error handling. (BTW, the same situation is handled 
 correctly for the decimal type.)
 MySQL has server modes that control the behaviour. By default, NULL is returned. 
 For instance,
 {code}
 mysql> select 3/0 from dual;
 +--+
 | 3/0  |
 +--+
 | NULL |
 +--+
 1 row in set (0.00 sec)
 {code}
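The NULL-on-divide-by-zero semantics being proposed amount to the following (an illustrative Python sketch, not the Hive patch itself):

```python
def sql_div(a, b):
    """Divide like SQL: return None (SQL NULL) on divide-by-zero."""
    if b == 0:
        # instead of propagating Infinity (or raising), surface NULL
        return None
    return a / b

print(sql_div(5, 0))  # None
print(sql_div(6, 2))  # 3.0
```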





[jira] [Commented] (HIVE-5613) Subquery support: disallow nesting of SubQueries

2013-10-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802252#comment-13802252
 ] 

Edward Capriolo commented on HIVE-5613:
---

I do not understand this issue from the description. Are we discussing 
disallowing subqueries that already work? Or are we discussing more stringent 
syntax checking?

 Subquery support: disallow nesting of SubQueries
 

 Key: HIVE-5613
 URL: https://issues.apache.org/jira/browse/HIVE-5613
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani







[jira] [Commented] (HIVE-4965) Add support so that PTFs can stream their output; Windowing PTF should do this

2013-10-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802544#comment-13802544
 ] 

Edward Capriolo commented on HIVE-4965:
---


HIVE-4965.D12615.1.patch . There are several lint errors in this patch
+while(pItr.hasNext())
+{

int i=0;

i=0;
+for(i=0; i < iPart.getOutputOI().getAllStructFieldRefs().size(); i++) {

int i =0;

 Add support so that PTFs can stream their output; Windowing PTF should do this
 --

 Key: HIVE-4965
 URL: https://issues.apache.org/jira/browse/HIVE-4965
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4965.D12033.1.patch, HIVE-4965.D12615.1.patch


 There is no need to create an output PTF Partition for the last PTF in a 
 chain. For the Windowing PTF this should give a perf. boost; we avoid 
 creating temporary results for each UDAF; avoid populating an output 
 Partition.





[jira] [Created] (HIVE-5602) Micro optimize select operator

2013-10-21 Thread Edward Capriolo (JIRA)
Edward Capriolo created HIVE-5602:
-

 Summary: Micro optimize select operator
 Key: HIVE-5602
 URL: https://issues.apache.org/jira/browse/HIVE-5602
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Priority: Minor








[jira] [Updated] (HIVE-5602) Micro optimize select operator

2013-10-21 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5602:
--

Attachment: HIVE-5602.patch.1.txt

 Micro optimize select operator
 --

 Key: HIVE-5602
 URL: https://issues.apache.org/jira/browse/HIVE-5602
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5602.patch.1.txt








[jira] [Updated] (HIVE-5602) Micro optimize select operator

2013-10-21 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5602:
--

Assignee: Edward Capriolo
  Status: Patch Available  (was: Open)

 Micro optimize select operator
 --

 Key: HIVE-5602
 URL: https://issues.apache.org/jira/browse/HIVE-5602
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5602.patch.1.txt








[jira] [Commented] (HIVE-5602) Micro optimize select operator

2013-10-21 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801368#comment-13801368
 ] 

Edward Capriolo commented on HIVE-5602:
---

The SELECT operator is doing a try/catch inside a for loop for each column when it 
does not need to. Additionally, we are making a function call for each row to check 
conf.isSelectComputeNoStart().

I micro-benchmarked before and after the change and it showed a minimal gain; 
please review.
{pre}
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 1 rows
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 10 rows
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 100 rows
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 1000 rows
13/10/21 20:29:29 INFO exec.FilterOperator: 0 forwarding 1 rows
13/10/21 20:29:30 INFO exec.FilterOperator: 0 forwarding 10 rows
13/10/21 20:29:31 INFO exec.FilterOperator: 0 forwarding 100 rows
13/10/21 20:29:33 INFO exec.FilterOperator: 0 forwarding 200 rows
13/10/21 20:29:34 INFO exec.FilterOperator: 0 forwarding 300 rows
13/10/21 20:29:36 INFO exec.FilterOperator: 0 forwarding 400 rows
13/10/21 20:29:38 INFO exec.FilterOperator: 0 forwarding 500 rows
13/10/21 20:29:40 INFO exec.FilterOperator: 0 forwarding 600 rows
13/10/21 20:29:41 INFO exec.FilterOperator: 0 forwarding 700 rows
13/10/21 20:29:43 INFO exec.FilterOperator: 0 forwarding 800 rows
13/10/21 20:29:45 INFO exec.FilterOperator: 0 forwarding 900 rows
13/10/21 20:29:46 INFO exec.FilterOperator: 0 forwarding 1000 rows

13/10/21 20:31:36 INFO exec.FilterOperator: Initialization Done 0 FIL
13/10/21 20:31:36 INFO exec.FilterOperator: 0 forwarding 1 rows
13/10/21 20:31:36 INFO exec.FilterOperator: 0 forwarding 10 rows
13/10/21 20:31:36 INFO exec.FilterOperator: 0 forwarding 100 rows
13/10/21 20:31:36 INFO exec.FilterOperator: 0 forwarding 1000 rows
13/10/21 20:31:37 INFO exec.FilterOperator: 0 forwarding 1 rows
13/10/21 20:31:37 INFO exec.FilterOperator: 0 forwarding 10 rows
13/10/21 20:31:38 INFO exec.FilterOperator: 0 forwarding 100 rows
13/10/21 20:31:40 INFO exec.FilterOperator: 0 forwarding 200 rows
13/10/21 20:31:41 INFO exec.FilterOperator: 0 forwarding 300 rows
13/10/21 20:31:43 INFO exec.FilterOperator: 0 forwarding 400 rows
13/10/21 20:31:45 INFO exec.FilterOperator: 0 forwarding 500 rows
13/10/21 20:31:46 INFO exec.FilterOperator: 0 forwarding 600 rows
13/10/21 20:31:48 INFO exec.FilterOperator: 0 forwarding 700 rows
13/10/21 20:31:49 INFO exec.FilterOperator: 0 forwarding 800 rows
13/10/21 20:31:51 INFO exec.FilterOperator: 0 forwarding 900 rows
13/10/21 20:31:53 INFO exec.FilterOperator: 0 forwarding 1000 rows
{pre}
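The micro-optimization described above, moving the try/catch and the per-row flag check out of the hot loop, follows this general pattern (a hypothetical Python sketch; the real change is in the Java SelectOperator):

```python
def select_naive(rows, compute_disabled):
    """Per-row flag check and per-iteration try/except, as in the old code."""
    out = []
    for row in rows:
        if compute_disabled():      # invariant flag re-evaluated every row
            continue
        try:                        # exception scope re-entered every row
            out.append(row * 2)
        except TypeError:
            out.append(None)
    return out

def select_hoisted(rows, compute_disabled):
    """Flag checked once and a single try around the whole loop."""
    if compute_disabled():          # invariant hoisted out of the loop
        return []
    try:                            # one exception scope for the whole pass
        return [row * 2 for row in rows]
    except TypeError:
        return []
```

Both versions produce the same output for well-formed input; the hoisted form simply does less bookkeeping per row, which is where the small measured gain comes from.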

 Micro optimize select operator
 --

 Key: HIVE-5602
 URL: https://issues.apache.org/jira/browse/HIVE-5602
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5602.patch.1.txt








[jira] [Commented] (HIVE-5592) Add an option to convert enum as struct<value:int> as of Hive 0.8

2013-10-21 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801415#comment-13801415
 ] 

Edward Capriolo commented on HIVE-5592:
---

If this is true we need to fix this asap.

 Add an option to convert enum as struct<value:int> as of Hive 0.8
 -

 Key: HIVE-5592
 URL: https://issues.apache.org/jira/browse/HIVE-5592
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0, 0.12.0
Reporter: Jie Li

 HIVE-3323 introduced the incompatible change: Hive handling of enum types has 
 been changed to always return the string value rather than struct<value:int>. 
 But it didn't add the option hive.data.convert.enum.to.string as planned, 
 and thus broke all Enum usage prior to 0.10.
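The two shapes at issue, the pre-0.10 struct<value:int> rendering versus the post-HIVE-3323 string rendering, can be illustrated as follows (hypothetical Python, using a stand-in enum, not Hive's Thrift handling):

```python
from enum import Enum

class Status(Enum):
    ACTIVE = 1
    DELETED = 2

def enum_as_struct(e):
    # pre-HIVE-3323 shape: struct<value:int>
    return {"value": e.value}

def enum_as_string(e):
    # post-HIVE-3323 shape: the plain string name
    return e.name

print(enum_as_struct(Status.ACTIVE))   # {'value': 1}
print(enum_as_string(Status.DELETED))  # DELETED
```

Queries written against one shape (e.g. `col.value`) break silently under the other, which is why an opt-in flag was planned.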





[jira] [Commented] (HIVE-5600) Fix PTest2 Maven support

2013-10-21 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13801487#comment-13801487
 ] 

Edward Capriolo commented on HIVE-5600:
---

+1

 Fix PTest2 Maven support
 

 Key: HIVE-5600
 URL: https://issues.apache.org/jira/browse/HIVE-5600
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-5600.patch


 At present we don't download all the dependencies required in the source prep 
 phase therefore tests fail when the maven repo has been cleared.





[jira] [Commented] (HIVE-5563) Skip reading columns in ORC for count(*)

2013-10-16 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797028#comment-13797028
 ] 

Edward Capriolo commented on HIVE-5563:
---

Just a note/question: how is RCFile affected by these changes? Do we have API 
fragmentation going on, or are both formats affected? I am not seeing any 
end-to-end test in HIVE-4113. What are we doing to prevent code rot and to 
ensure this mistake does not happen again?

 Skip reading columns in ORC for count(*)
 

 Key: HIVE-5563
 URL: https://issues.apache.org/jira/browse/HIVE-5563
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley

 With HIVE-4113, the semantics of ColumnProjectionUtils.getReadColumnIds was 
 fixed so that an empty list means no columns instead of all columns. (Except 
 the caveat of the override of ColumnProjectionUtils.isReadAllColumns.)
 However, ORC's reader wasn't updated so it still reads all columns.





[jira] [Commented] (HIVE-4175) Injection of emptyFile into input splits for empty partitions causes Deserializer to fail

2013-10-16 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797033#comment-13797033
 ] 

Edward Capriolo commented on HIVE-4175:
---

I bet this is something Hive's combine input format is doing. I have 
noticed random issues around empty partitions before that were recently fixed. 
Also note that protobuf was recently updated from 2.4 to 2.5.

 Injection of emptyFile into input splits for empty partitions causes 
 Deserializer to fail
 -

 Key: HIVE-4175
 URL: https://issues.apache.org/jira/browse/HIVE-4175
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
 Environment: CDH4.2, using MR1
Reporter: James Kebinger
Priority: Minor

 My deserializer is expecting to receive one of 2 different subclasses of 
 Writable, but in certain circumstances it receives an empty instance of 
 org.apache.hadoop.io.Text. This only happens for task attempts where I 
 observe the file called emptyFile in the list of input splits. 
 I'm doing queries over an external year/month/day partitioned table that have 
 eagerly created partitions for, so as of today for example, I may do a query 
 where year = 2013 and month = 3 which includes empty partitions.
 In the course of investigation I downloaded the sequence files to confirm 
 they were ok. Once I realized that processing of empty partitions was to 
 blame, I am able to work around the issue by bounding my queries to populated 
 partitions.
 Can the need for the emptyFile be eliminated in the case where there's 
 already a bunch of splits being processed? Failing that, can the mapper 
 detect the current input is from emptyFile and not call the deserializer.





[jira] [Commented] (HIVE-5567) Add better protection code for SARGs

2013-10-16 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797194#comment-13797194
 ] 

Edward Capriolo commented on HIVE-5567:
---

Is there a reason that decimal cannot be supported, or is the support for 
decimal incomplete?

If SARG can support decimal we might be better off not adding protection, 
instead we should ensure that our unit tests cover all types.
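The protection being discussed amounts to failing fast on unsupported predicate types instead of hitting an NPE deep in the parser. A generic sketch (illustrative Python; the type names and structure here are assumptions, not Hive's SARG API):

```python
# assumed set of pushdown-capable types for this sketch
SUPPORTED_SARG_TYPES = {"int", "bigint", "string", "float", "double"}

def build_sarg_leaf(column, col_type, value):
    """Build one predicate leaf, rejecting unsupported types up front."""
    if col_type not in SUPPORTED_SARG_TYPES:
        # explicit, catchable failure: the caller can skip pushdown
        # for this predicate instead of crashing with an NPE
        raise TypeError("SARG does not support column type: " + col_type)
    return {"column": column, "type": col_type, "value": value}
```

A caller would catch the TypeError and simply evaluate that predicate row-by-row, so the query still runs, just without pushdown for the unsupported column.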



 Add better protection code for SARGs
 

 Key: HIVE-5567
 URL: https://issues.apache.org/jira/browse/HIVE-5567
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 Currently, the SARG parser gets a NPE when the push down predicate uses a 
 type like decimal that isn't supported.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5567) Add better protection code for SARGs

2013-10-16 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797201#comment-13797201
 ] 

Edward Capriolo commented on HIVE-5567:
---

In other words, we do not want to create fragmentation. If certain types 
cannot work with predicate pushdown, that is a problem we should address. 

 Add better protection code for SARGs
 

 Key: HIVE-5567
 URL: https://issues.apache.org/jira/browse/HIVE-5567
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 Currently, the SARG parser gets a NPE when the push down predicate uses a 
 type like decimal that isn't supported.





[jira] [Commented] (HIVE-2419) CREATE TABLE AS SELECT should create warehouse directory

2013-10-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795383#comment-13795383
 ] 

Edward Capriolo commented on HIVE-2419:
---

What if /user/hive does not exist?
What if /user does not exist?

Maybe it is better to let people make the directories themselves. Or simply 
have a pre-flight startup check in init scripts or a java main.

 CREATE TABLE AS SELECT should create warehouse directory
 

 Key: HIVE-2419
 URL: https://issues.apache.org/jira/browse/HIVE-2419
 Project: Hive
  Issue Type: Bug
Reporter: David Phillips
 Attachments: HIVE-2419.1.patch


 If you run a CTAS statement on a fresh Hive install without a warehouse 
 directory (as is the case with Amazon EMR), it runs the query but errors out 
 at the end:
 {quote}
 hive create table foo as select * from t_message limit 1;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 ...
 Ended Job = job_201108301753_0001
 Moving data to: 
 hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/hive_07_1/warehouse/foo
 Failed with exception Unable to rename: 
 hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/var/lib/hive_07_1/tmp/scratch/hive_2011-08-30_18-04-36_809_6130923980133666976/-ext-10001
  to: hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/hive_07_1/warehouse/foo
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask
 {quote}
 This is different behavior from a simple CREATE TABLE, which creates the 
 warehouse directory.





[jira] [Updated] (HIVE-4943) An explode function that includes the item's position in the array

2013-10-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4943:
--

Fix Version/s: 0.13.0

 An explode function that includes the item's position in the array
 --

 Key: HIVE-4943
 URL: https://issues.apache.org/jira/browse/HIVE-4943
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Niko Stahl
  Labels: patch
 Fix For: 0.13.0

 Attachments: HIVE-4943.1.patch, HIVE-4943.2.patch, HIVE-4943.3.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 A function that explodes an array and includes an output column with the 
 position of each item in the original array.
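The behavior requested here — the function that eventually shipped in Hive as posexplode — can be sketched outside Hive. The Python fragment below is only an illustration of the intended semantics, not Hive's implementation:

```python
def posexplode(arr):
    """Yield (position, item) rows for each element of the array,
    mirroring an explode that also emits the item's original index."""
    for pos, item in enumerate(arr):
        yield pos, item

# Each array element becomes a row carrying its original index.
rows = list(posexplode(["a", "b", "c"]))
print(rows)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```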





[jira] [Updated] (HIVE-4943) An explode function that includes the item's position in the array

2013-10-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4943:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolved. Thanks Niko. Next time, tag me as a watcher or make more noise if a 
patch sits for this long.

 An explode function that includes the item's position in the array
 --

 Key: HIVE-4943
 URL: https://issues.apache.org/jira/browse/HIVE-4943
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Niko Stahl
  Labels: patch
 Fix For: 0.13.0

 Attachments: HIVE-4943.1.patch, HIVE-4943.2.patch, HIVE-4943.3.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 A function that explodes an array and includes an output column with the 
 position of each item in the original array.





[jira] [Commented] (HIVE-4943) An explode function that includes the item's position in the array

2013-10-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793375#comment-13793375
 ] 

Edward Capriolo commented on HIVE-4943:
---

+1. Let me re-upload the patch; after it retests, I will commit.

 An explode function that includes the item's position in the array
 --

 Key: HIVE-4943
 URL: https://issues.apache.org/jira/browse/HIVE-4943
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Niko Stahl
  Labels: patch
 Attachments: HIVE-4943.1.patch, HIVE-4943.2.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 A function that explodes an array and includes an output column with the 
 position of each item in the original array.





[jira] [Updated] (HIVE-4943) An explode function that includes the item's position in the array

2013-10-12 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4943:
--

Attachment: HIVE-4943.3.patch

 An explode function that includes the item's position in the array
 --

 Key: HIVE-4943
 URL: https://issues.apache.org/jira/browse/HIVE-4943
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Niko Stahl
  Labels: patch
 Attachments: HIVE-4943.1.patch, HIVE-4943.2.patch, HIVE-4943.3.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 A function that explodes an array and includes an output column with the 
 position of each item in the original array.





[jira] [Commented] (HIVE-5252) Add ql syntax for inline java code creation

2013-10-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792672#comment-13792672
 ] 

Edward Capriolo commented on HIVE-5252:
---

NP it can wait a day.

 Add ql syntax for inline java code creation
 ---

 Key: HIVE-5252
 URL: https://issues.apache.org/jira/browse/HIVE-5252
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5252.1.patch.txt, HIVE-5252.2.patch.txt


 Something to the effect of compile 'my code here' using 'groovycompiler'.





[jira] [Commented] (HIVE-5518) ADD JAR should add entries to local classpath

2013-10-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792844#comment-13792844
 ] 

Edward Capriolo commented on HIVE-5518:
---

Anecdotally, anything required as part of an input format needs to be on the 
aux_path: those jars are needed to read the data, whereas UDF jars need not be 
on the aux_path since they are used inside operators. It would be great if we 
could unify these concepts without making the classpath needed to launch every 
job very large. 

 ADD JAR should add entries to local classpath
 -

 Key: HIVE-5518
 URL: https://issues.apache.org/jira/browse/HIVE-5518
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.12.0
Reporter: Nick Dimiduk

 Jars referenced in {{ADD JAR}} statements are not made available on the 
 immediate classpath. That means they're useless for scripts which need to 
 initialize external output formats for job submission (ie, hbase storage 
 handler). Is this expected behavior?
 For example, the table 'pagecounts_hbase' is an hbase table defined using the 
 HBaseStorageHandler
 {noformat}
 $ cat foo.hql
 ADD FILE /etc/hbase/conf/hbase-site.xml;
 ADD JAR /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar;
 ADD JAR /usr/lib/hbase/lib/hbase-server-0.96.0.2.0.6.0-68-hadoop2.jar;
 ADD JAR /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-68-hadoop2.jar;
 ADD JAR /usr/lib/hbase/lib/hbase-protocol-0.96.0.2.0.6.0-68-hadoop2.jar;
 FROM pgc INSERT INTO TABLE pagecounts_hbase SELECT pgc.* WHERE rowkey LIKE 
 'en/q%' LIMIT 10;
 $ hive -f foo.hql
 ...
 Added resource: /etc/hbase/conf/hbase-site.xml
 Added /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar to class 
 path
 Added resource: /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar
 ...
 Exception in thread main java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
 [29/1858]
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
 at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:266)
 at 
 org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:305)
 at org.apache.hadoop.hive.ql.metadata.Table.init(Table.java:98)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:989)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:892)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.init(BaseSemanticAnalyzer.java:730)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.init(BaseSemanticAnalyzer.java:707)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1196)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1053)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8342)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:441)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at 
 

[jira] [Commented] (HIVE-5494) Vectorization throws exception with nested UDF.

2013-10-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792856#comment-13792856
 ] 

Edward Capriolo commented on HIVE-5494:
---

Looks good. Thank you for adding the end-to-end test.

 Vectorization throws exception with nested UDF.
 ---

 Key: HIVE-5494
 URL: https://issues.apache.org/jira/browse/HIVE-5494
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5494.1.patch, HIVE-5494.2.patch


 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Udf: 
 GenericUDFAbs, is not supported
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:465)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:274)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getAggregatorExpression(VectorizationContext.java:1512)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.init(VectorGroupByOperator.java:133)
 ... 41 more
 FAILED: RuntimeException java.lang.reflect.InvocationTargetException
 {code}





[jira] [Commented] (HIVE-5430) NOT expression doesn't handle nulls correctly.

2013-10-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792970#comment-13792970
 ] 

Edward Capriolo commented on HIVE-5430:
---

I agree with that. I thought we had a similar issue open that would use 
standard UDFs inside the vectorized ones. I do not agree with calling them 
legacy, though. We should pick a better nomenclature, possibly 
non-vectorized or something.

 NOT expression doesn't handle nulls correctly.
 --

 Key: HIVE-5430
 URL: https://issues.apache.org/jira/browse/HIVE-5430
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5430.1.patch, HIVE-5430.2.patch, HIVE-5430.3.patch, 
 HIVE-5430.4.patch


 NOT expression doesn't handle nulls correctly.





[jira] [Commented] (HIVE-5423) Speed up testing of scalar UDFS

2013-10-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793064#comment-13793064
 ] 

Edward Capriolo commented on HIVE-5423:
---

It would be really good to get a +1 on this. Then I can begin the process of 
removing many rather slow .q tests.

 Speed up testing of scalar UDFS
 ---

 Key: HIVE-5423
 URL: https://issues.apache.org/jira/browse/HIVE-5423
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, 
 HIVE-5423.6.patch.txt, HIVE-5423.patch.txt








[jira] [Commented] (HIVE-5494) Vectorization throws exception with nested UDF.

2013-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791617#comment-13791617
 ] 

Edward Capriolo commented on HIVE-5494:
---

Q1. Should we be testing with null values?
Q2. Should we be testing results? This test only shows that we are no longer 
throwing an exception at this point; it does not show that the feature works 
in any meaningful way. After this test, can't we just end up with another 
exception later in the code?

 Vectorization throws exception with nested UDF.
 ---

 Key: HIVE-5494
 URL: https://issues.apache.org/jira/browse/HIVE-5494
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5494.1.patch


 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Udf: 
 GenericUDFAbs, is not supported
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:465)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:274)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getAggregatorExpression(VectorizationContext.java:1512)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.init(VectorGroupByOperator.java:133)
 ... 41 more
 FAILED: RuntimeException java.lang.reflect.InvocationTargetException
 {code}





[jira] [Comment Edited] (HIVE-5494) Vectorization throws exception with nested UDF.

2013-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791617#comment-13791617
 ] 

Edward Capriolo edited comment on HIVE-5494 at 10/10/13 4:06 PM:
-

Q1. Should we be testing with null values?
Q2. Should we be testing results? This test only shows that we are no longer 
throwing an exception at this point; it does not show that the feature works 
in any meaningful way. After this test, can't we just end up with another 
exception later in the code?

I think we need an end-to-end test here. Given a column a containing the 
values 5, null, and 1:
select sum ( abs(a) ) from table
should return 6.
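The end-to-end behavior asked for above — the aggregate skipping nulls — can be sketched in Python. This is an illustration of the expected SQL semantics, not Hive code:

```python
def sum_abs(values):
    """SQL-style SUM(ABS(a)): NULLs (None here) are skipped by the aggregate."""
    return sum(abs(v) for v in values if v is not None)

# Column a holds 5, NULL, 1 -> SUM(ABS(a)) should be 6.
print(sum_abs([5, None, 1]))  # 6
```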



was (Author: appodictic):
Q. Should we be testing with null values?  
1. Should we be testing results. This test only shows that we are no longer 
throwing an exception at this point, but we are not showing the feature works 
in any meaningful way. After this test can't we just end up with another 
exception later in the code?

 Vectorization throws exception with nested UDF.
 ---

 Key: HIVE-5494
 URL: https://issues.apache.org/jira/browse/HIVE-5494
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5494.1.patch


 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Udf: 
 GenericUDFAbs, is not supported
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:465)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:274)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getAggregatorExpression(VectorizationContext.java:1512)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.init(VectorGroupByOperator.java:133)
 ... 41 more
 FAILED: RuntimeException java.lang.reflect.InvocationTargetException
 {code}





[jira] [Commented] (HIVE-5518) ADD JAR should add entries to local classpath

2013-10-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792331#comment-13792331
 ] 

Edward Capriolo commented on HIVE-5518:
---

Let's look into this. I do not see a reason why the auxpath and the add jar 
list cannot be combined. It sure would make many things easier.

 ADD JAR should add entries to local classpath
 -

 Key: HIVE-5518
 URL: https://issues.apache.org/jira/browse/HIVE-5518
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.12.0
Reporter: Nick Dimiduk

 Jars referenced in {{ADD JAR}} statements are not made available on the 
 immediate classpath. That means they're useless for scripts which need to 
 initialize external output formats for job submission (ie, hbase storage 
 handler). Is this expected behavior?
 For example, the table 'pagecounts_hbase' is an hbase table defined using the 
 HBaseStorageHandler
 {noformat}
 $ cat foo.hql
 ADD FILE /etc/hbase/conf/hbase-site.xml;
 ADD JAR /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar;
 ADD JAR /usr/lib/hbase/lib/hbase-server-0.96.0.2.0.6.0-68-hadoop2.jar;
 ADD JAR /usr/lib/hbase/lib/hbase-client-0.96.0.2.0.6.0-68-hadoop2.jar;
 ADD JAR /usr/lib/hbase/lib/hbase-protocol-0.96.0.2.0.6.0-68-hadoop2.jar;
 FROM pgc INSERT INTO TABLE pagecounts_hbase SELECT pgc.* WHERE rowkey LIKE 
 'en/q%' LIMIT 10;
 $ hive -f foo.hql
 ...
 Added resource: /etc/hbase/conf/hbase-site.xml
 Added /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar to class 
 path
 Added resource: /usr/lib/hbase/lib/hbase-common-0.96.0.2.0.6.0-68-hadoop2.jar
 ...
 Exception in thread main java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
 [29/1858]
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
 at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:266)
 at 
 org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:305)
 at org.apache.hadoop.hive.ql.metadata.Table.init(Table.java:98)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:989)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:892)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.init(BaseSemanticAnalyzer.java:730)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.init(BaseSemanticAnalyzer.java:707)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1196)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1053)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8342)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:441)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
 at 
 

[jira] [Commented] (HIVE-5252) Add ql syntax for inline java code creation

2013-10-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790531#comment-13790531
 ] 

Edward Capriolo commented on HIVE-5252:
---

Groovyc (the Groovy compiler) requires Ant. Ant is on our classpath for 
development, but we need to add it as a ql dependency because otherwise it 
does not get added to hive/lib in the package.

 Add ql syntax for inline java code creation
 ---

 Key: HIVE-5252
 URL: https://issues.apache.org/jira/browse/HIVE-5252
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5252.1.patch.txt


 Something to the effect of compile 'my code here' using 'groovycompiler'.





[jira] [Updated] (HIVE-5252) Add ql syntax for inline java code creation

2013-10-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5252:
--

Attachment: HIVE-5252.2.patch.txt

 Add ql syntax for inline java code creation
 ---

 Key: HIVE-5252
 URL: https://issues.apache.org/jira/browse/HIVE-5252
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5252.1.patch.txt, HIVE-5252.2.patch.txt


 Something to the effect of compile 'my code here' using 'groovycompiler'.





[jira] [Created] (HIVE-5491) Some lazy DeferredObjects inspectors are fat

2013-10-08 Thread Edward Capriolo (JIRA)
Edward Capriolo created HIVE-5491:
-

 Summary: Some lazy DeferredObjects inspectors are fat
 Key: HIVE-5491
 URL: https://issues.apache.org/jira/browse/HIVE-5491
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Priority: Minor


I was looking at some of the implementations of DeferredObject. I found that 
some carry two extra properties:
boolean eager;
boolean eval;

where eval is used to track whether the object is initialized. My thinking is 
that these extra properties make the objects fat, and removing them would let 
us fit more lazy objects in the same memory. 
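One way to drop the extra boolean flags is to track initialization with a sentinel value instead. The Python sketch below is hypothetical (the class and field names are not Hive's DeferredObject API); it just shows the slimming idea:

```python
_UNEVALUATED = object()  # sentinel replaces a separate 'eval' boolean field


class LazyDeferred:
    """Deferred value that tracks initialization via a sentinel,
    so no extra per-object boolean fields are needed."""

    __slots__ = ("_supplier", "_value")  # keep instances small

    def __init__(self, supplier):
        self._supplier = supplier
        self._value = _UNEVALUATED

    def get(self):
        # Evaluate at most once, caching the result over the sentinel.
        if self._value is _UNEVALUATED:
            self._value = self._supplier()
        return self._value


calls = []
d = LazyDeferred(lambda: calls.append(1) or 42)
print(d.get(), d.get(), len(calls))  # supplier runs only once
```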






[jira] [Updated] (HIVE-5491) Some lazy DeferredObjects inspectors are fat

2013-10-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5491:
--

Attachment: HIVE-5491.1.patch.txt

Hive will not tolerate fat lazy code! jk

 Some lazy DeferredObjects inspectors are fat
 --

 Key: HIVE-5491
 URL: https://issues.apache.org/jira/browse/HIVE-5491
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5491.1.patch.txt


 I was looking at some of the implementations of DeferredObject. I found that 
 some carry two extra-properties:
 boolean eager;
 boolean eval;
 Where eval is used to track if the obj is initiated. My thinking is that 
 these extra properties make the objects fat and if removed it allows us to 
 fit more lazy objects in the same memory. 





[jira] [Updated] (HIVE-5491) Some lazy DeferredObjects inspectors are fat

2013-10-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5491:
--

Assignee: Edward Capriolo
  Status: Patch Available  (was: Open)

 Some lazy DeferredObjects inspectors are fat
 --

 Key: HIVE-5491
 URL: https://issues.apache.org/jira/browse/HIVE-5491
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5491.1.patch.txt


 I was looking at some of the implementations of DeferredObject. I found that 
 some carry two extra-properties:
 boolean eager;
 boolean eval;
 Where eval is used to track if the obj is initiated. My thinking is that 
 these extra properties make the objects fat and if removed it allows us to 
 fit more lazy objects in the same memory. 





[jira] [Created] (HIVE-5497) Hive trunk broken against hadoop 0.20.2

2013-10-08 Thread Edward Capriolo (JIRA)
Edward Capriolo created HIVE-5497:
-

 Summary: Hive trunk broken against hadoop 0.20.2
 Key: HIVE-5497
 URL: https://issues.apache.org/jira/browse/HIVE-5497
 Project: Hive
  Issue Type: Bug
Reporter: Edward Capriolo
Priority: Blocker


ommon-0.13.0-SNAPSHOT.jar!/hive-log4j.properties
hive compile `import org.apache.hadoop.hive.ql.exec.UDF \;
 public class Pyth extends UDF {
   public double evaluate(double a, double b){
 return Math.sqrt((a*a) + (b*b)) \;
   }
 } ` AS GROOVY NAMED Pyth.groovy;
Added /tmp/0_1381290655403.jar to class path
Added resource: /tmp/0_1381290655403.jar
hive create temporary function Pyth as 'Pyth';
OK
Time taken: 0.445 seconds
hive select Pyth(a,b) from a;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Exception in thread main java.lang.UnsupportedOperationException: Kerberos 
not supported in current hadoop version
at 
org.apache.hadoop.hive.shims.Hadoop20Shims.getTokenFileLocEnvName(Hadoop20Shims.java:775)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:653)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Execution failed with exit status: 1
Obtaining error information

Task failed!
Task ID:
  Stage-1

Logs:

/tmp/edward/hive.log
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
hive 






[jira] [Updated] (HIVE-5252) Add ql syntax for inline java code creation

2013-10-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5252:
--

Attachment: HIVE-5252.1.patch.txt

 Add ql syntax for inline java code creation
 ---

 Key: HIVE-5252
 URL: https://issues.apache.org/jira/browse/HIVE-5252
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5252.1.patch.txt


 Something to the effect of compile 'my code here' using 'groovycompiler'.





[jira] [Updated] (HIVE-5252) Add ql syntax for inline java code creation

2013-10-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5252:
--

Status: Patch Available  (was: Open)

 Add ql syntax for inline java code creation
 ---

 Key: HIVE-5252
 URL: https://issues.apache.org/jira/browse/HIVE-5252
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5252.1.patch.txt


 Something to the effect of compile 'my code here' using 'groovycompiler'.





[jira] [Commented] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)

2013-10-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788196#comment-13788196
 ] 

Edward Capriolo commented on HIVE-5460:
---

This should be ready for review.

 invalid offsets in lag lead should return an exception (per ISO-SQL) 
 -

 Key: HIVE-5460
 URL: https://issues.apache.org/jira/browse/HIVE-5460
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: N Campbell
Assignee: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5460.1.patch.txt


 ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are 
 provided to the functions.
 i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 )  over ( order by 
 tint.rnum) from tint tint 
 Instead of a meaningful error (as other vendors will emit) you get 
 Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, 
 return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
 SQLState:  08S01
 ErrorCode: 2
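 Per ISO-SQL, a negative LAG offset should be rejected with a meaningful error up front rather than failing deep inside execution. A Python sketch of that validation (illustrative only; the names here are hypothetical, not Hive's windowing code):

 ```python
 def lag(ordered_values, index, offset=1, default=None):
     """SQL-style LAG: the value `offset` rows before `index`, or `default`.
     Negative offsets are rejected up front with a meaningful error."""
     if offset < 0:
         raise ValueError("LAG offset must be non-negative: %d" % offset)
     j = index - offset
     return ordered_values[j] if j >= 0 else default

 vals = [10, 20, 30]
 print(lag(vals, 2))  # 20: the row one position earlier
 try:
     lag(vals, 2, -100)
 except ValueError as e:
     print("rejected:", e)
 ```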





[jira] [Commented] (HIVE-5253) Create component to compile and jar dynamic code

2013-10-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788208#comment-13788208
 ] 

Edward Capriolo commented on HIVE-5253:
---

With no -1's registered, there is no blocker. This issue is already a month 
old; if someone wanted to have a debate over it, the time was a month ago. We 
should not block features for a month over random security debates. There is 
already a Pandora's box of 'public static HashMaps', ThreadLocal variables, 
and other things that people can fix if they REALLY want to talk about 
security.

 Create component to compile and jar dynamic code
 

 Key: HIVE-5253
 URL: https://issues.apache.org/jira/browse/HIVE-5253
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5253.10.patch.txt, HIVE-5253.11.patch.txt, 
 HIVE-5253.1.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, 
 HIVE-5253.3.patch.txt, HIVE-5253.8.patch.txt, HIVE-5253.9.patch.txt, 
 HIVE-5253.patch.txt






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)

2013-10-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788211#comment-13788211
 ] 

Edward Capriolo commented on HIVE-5460:
---

99% of the windowing merge violated our code conventions. I am slowly fixing 
these issues as we find bugs and make other tweaks in the code. You know, next 
time someone does a code pie chart about a release, I want to have the most 
lines of code :)

 invalid offsets in lag lead should return an exception (per ISO-SQL) 
 -

 Key: HIVE-5460
 URL: https://issues.apache.org/jira/browse/HIVE-5460
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: N Campbell
Assignee: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5460.1.patch.txt


 ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are 
 provided to the functions.
 i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 )  over ( order by 
 tint.rnum) from tint tint 
 Instead of a meaningful error (as other vendors will emit) you get 
 Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, 
 return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
 SQLState:  08S01
 ErrorCode: 2



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5423) Speed up testing of scalar UDFS

2013-10-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787634#comment-13787634
 ] 

Edward Capriolo commented on HIVE-5423:
---

It seems like our tests/JUnit setup can't understand not to run the base class. 
I will rename it from TestBase to BaseTest and we will see if Jenkins is happier.

 Speed up testing of scalar UDFS
 ---

 Key: HIVE-5423
 URL: https://issues.apache.org/jira/browse/HIVE-5423
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, 
 HIVE-5423.patch.txt






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5423) Speed up testing of scalar UDFS

2013-10-06 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5423:
--

Attachment: HIVE-5423.6.patch.txt

Renamed base class so hopefully we can keep it abstract.

 Speed up testing of scalar UDFS
 ---

 Key: HIVE-5423
 URL: https://issues.apache.org/jira/browse/HIVE-5423
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, 
 HIVE-5423.6.patch.txt, HIVE-5423.patch.txt






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5464) allow OR conditions in table join

2013-10-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787652#comment-13787652
 ] 

Edward Capriolo commented on HIVE-5464:
---

Hive only supports equi-joins like this. You can still accomplish this query 
using a Cartesian product. MapReduce cannot easily make this query efficient; 
are you planning to work on this? We have to think carefully about whether we 
want this. There is a big danger in adding things to Hive that cannot be done 
efficiently in map/reduce.
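
As a sketch of the workaround mentioned above (table names taken from the reporter's query), the OR condition can be moved out of the join and into a WHERE clause over a cross product. Note this is only practical when the inputs are small enough for a Cartesian product, and may require relaxing `hive.mapred.mode`:

```sql
-- Equivalent rewrite: cross join plus a filter, since Hive's join
-- operator only handles equality conditions.
SELECT tjoin1.c1, tjoin1.c2, tjoin2.c2 AS c2j2
FROM tjoin1
CROSS JOIN tjoin2
WHERE tjoin1.c1 = 10 OR tjoin1.c1 = 20;
```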

 allow OR conditions in table join
 -

 Key: HIVE-5464
 URL: https://issues.apache.org/jira/browse/HIVE-5464
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.11.0
Reporter: N Campbell

 select tjoin1.c1, tjoin1.c2, tjoin2.c2 as c2j2 from tjoin1 tjoin1 inner join 
 tjoin2 tjoin2 on ( tjoin1.c1 = 10 or tjoin1.c1=20 )
 Query returned non-zero code: 10019, cause: FAILED: SemanticException [Error 
 10019]: Line 1:96 OR not supported in JOIN currently '20'



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)

2013-10-06 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo reassigned HIVE-5460:
-

Assignee: Edward Capriolo

 invalid offsets in lag lead should return an exception (per ISO-SQL) 
 -

 Key: HIVE-5460
 URL: https://issues.apache.org/jira/browse/HIVE-5460
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: N Campbell
Assignee: Edward Capriolo
Priority: Minor

 ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are 
 provided to the functions.
 i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 )  over ( order by 
 tint.rnum) from tint tint 
 Instead of a meaningful error (as other vendors will emit) you get 
 Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, 
 return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
 SQLState:  08S01
 ErrorCode: 2



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)

2013-10-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787665#comment-13787665
 ] 

Edward Capriolo commented on HIVE-5460:
---

Can you provide a link to the definition of what it should do?

 invalid offsets in lag lead should return an exception (per ISO-SQL) 
 -

 Key: HIVE-5460
 URL: https://issues.apache.org/jira/browse/HIVE-5460
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: N Campbell
Assignee: Edward Capriolo
Priority: Minor

 ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are 
 provided to the functions.
 i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 )  over ( order by 
 tint.rnum) from tint tint 
 Instead of a meaningful error (as other vendors will emit) you get 
 Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, 
 return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
 SQLState:  08S01
 ErrorCode: 2



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5423) Speed up testing of scalar UDFS

2013-10-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787737#comment-13787737
 ] 

Edward Capriolo commented on HIVE-5423:
---

Ok looks good! yay!

 Speed up testing of scalar UDFS
 ---

 Key: HIVE-5423
 URL: https://issues.apache.org/jira/browse/HIVE-5423
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, 
 HIVE-5423.6.patch.txt, HIVE-5423.patch.txt






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)

2013-10-06 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5460:
--

Status: Patch Available  (was: Open)

Should test.

 invalid offsets in lag lead should return an exception (per ISO-SQL) 
 -

 Key: HIVE-5460
 URL: https://issues.apache.org/jira/browse/HIVE-5460
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: N Campbell
Assignee: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5460.1.patch.txt


 ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are 
 provided to the functions.
 i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 )  over ( order by 
 tint.rnum) from tint tint 
 Instead of a meaningful error (as other vendors will emit) you get 
 Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, 
 return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
 SQLState:  08S01
 ErrorCode: 2



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5460) invalid offsets in lag lead should return an exception (per ISO-SQL)

2013-10-06 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5460:
--

Attachment: HIVE-5460.1.patch.txt

 invalid offsets in lag lead should return an exception (per ISO-SQL) 
 -

 Key: HIVE-5460
 URL: https://issues.apache.org/jira/browse/HIVE-5460
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: N Campbell
Assignee: Edward Capriolo
Priority: Minor
 Attachments: HIVE-5460.1.patch.txt


 ISO-SQL 2011 defines how lag and lead should behave when invalid offsets are 
 provided to the functions.
 i.e. select tint.rnum,tint.cint, lag( tint.cint, -100 )  over ( order by 
 tint.rnum) from tint tint 
 Instead of a meaningful error (as other vendors will emit) you get 
 Error: Query returned non-zero code: 2, cause: FAILED: Execution Error, 
 return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
 SQLState:  08S01
 ErrorCode: 2



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5400) Allow admins to disable compile and other commands

2013-10-05 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5400:
--

Fix Version/s: 0.13.0
 Assignee: Brock Noland  (was: Edward Capriolo)

Committed. Thanks Brock

 Allow admins to disable compile and other commands
 --

 Key: HIVE-5400
 URL: https://issues.apache.org/jira/browse/HIVE-5400
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.13.0

 Attachments: HIVE-5400.patch, HIVE-5400.patch, HIVE-5400.patch


 From here: 
 https://issues.apache.org/jira/browse/HIVE-5253?focusedCommentId=13782220&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13782220
  I think we should afford admins who want to disable this functionality the 
 ability to do so. Since such admins might want to disable other commands such 
 as add or dfs, it wouldn't be much trouble to allow them to do this as well. 
 For example we could have a configuration option hive.available.commands 
 (or similar) which specified add,set,delete,reset, etc by default. Then check 
 this value in CommandProcessorFactory. It would probably make sense to add 
 this property to the restrict list.
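
A rough sketch of what that proposal might look like in hive-site.xml; the property name `hive.available.commands` and its value list come from this comment's suggestion, not from any shipped default, so treat both as hypothetical:

```xml
<!-- Restrict which CLI commands the CommandProcessorFactory will accept.
     Leaving "compile" (or "dfs", "add", ...) out of the list disables it. -->
<property>
  <name>hive.available.commands</name>
  <value>set,reset,add,delete,list</value>
</property>
```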



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5400) Allow admins to disable compile and other commands

2013-10-05 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5400:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks again Brock.

 Allow admins to disable compile and other commands
 --

 Key: HIVE-5400
 URL: https://issues.apache.org/jira/browse/HIVE-5400
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.13.0

 Attachments: HIVE-5400.patch, HIVE-5400.patch, HIVE-5400.patch


 From here: 
 https://issues.apache.org/jira/browse/HIVE-5253?focusedCommentId=13782220&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13782220
  I think we should afford admins who want to disable this functionality the 
 ability to do so. Since such admins might want to disable other commands such 
 as add or dfs, it wouldn't be much trouble to allow them to do this as well. 
 For example we could have a configuration option hive.available.commands 
 (or similar) which specified add,set,delete,reset, etc by default. Then check 
 this value in CommandProcessorFactory. It would probably make sense to add 
 this property to the restrict list.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5423) Speed up testing of scalar UDFS

2013-10-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787445#comment-13787445
 ] 

Edward Capriolo commented on HIVE-5423:
---

This version is ready for review. Removed excess files, renamed as Brock 
suggested, and moved files as Mark suggested.

 Speed up testing of scalar UDFS
 ---

 Key: HIVE-5423
 URL: https://issues.apache.org/jira/browse/HIVE-5423
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, 
 HIVE-5423.patch.txt






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5423) Speed up testing of scalar UDFS

2013-10-05 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5423:
--

Attachment: HIVE-5423.5.patch.txt

 Speed up testing of scalar UDFS
 ---

 Key: HIVE-5423
 URL: https://issues.apache.org/jira/browse/HIVE-5423
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5423.1.patch.txt, HIVE-5423.5.patch.txt, 
 HIVE-5423.patch.txt






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5334) Milestone 3: Some tests pass under maven

2013-10-04 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786758#comment-13786758
 ] 

Edward Capriolo commented on HIVE-5334:
---

Looks fine

 Milestone 3: Some tests pass under maven
 

 Key: HIVE-5334
 URL: https://issues.apache.org/jira/browse/HIVE-5334
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-5334.patch, HIVE-5334.patch


 This milestone is that some tests pass and therefore we have the basic unit 
 test environment setup. We'll hunt down the rest of the failing tests in 
 future jiras.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5087) Rename npath UDF to matchpath

2013-10-03 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785282#comment-13785282
 ] 

Edward Capriolo commented on HIVE-5087:
---

I am back to the opinion that we should just remove this UDF. You could make a 
sequel to 'Office Space' based on the story behind this UDF:

'Yea... I'm going to need you to come in on Saturday and rename this UDF.'
'Yea... I'm going to need you to come in on Sunday, because it's Saturday and I 
don't know the name yet.'
'Yea... I'm going to need you to come in next Saturday, because we are not sure 
if we should rename it yet.'

It would be a blockbuster for sure.




 Rename npath UDF to matchpath
 -

 Key: HIVE-5087
 URL: https://issues.apache.org/jira/browse/HIVE-5087
 Project: Hive
  Issue Type: Bug
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Blocker
 Fix For: 0.12.0

 Attachments: HIVE-5087.1.patch.txt, HIVE-5087.99.patch.txt, 
 HIVE-5087-matchpath.1.patch.txt, HIVE-5087.patch.txt, HIVE-5087.patch.txt, 
 regex_path.diff






--
This message was sent by Atlassian JIRA
(v6.1#6144)

