Re: mapreduce against hive raw data ?

2010-10-01 Thread Edward Capriolo
On Fri, Oct 1, 2010 at 3:08 PM, Jinsong Hu  wrote:
> Hi, There:
>  I wonder if it is possible to run map-reduce against hive's raw data.
> Hive supports HQL, but sometimes I want to run map-reduce to do more
> sophisticated processing than simple HQL can handle. In that case, I need
> to run my own custom map-reduce job against hive's raw data.
>  I wonder if that is possible. The key issues are where to find those files
> and how to deserialize them.
> Can anybody point me to the right location to find the API?
>
> Jimmy.
>

Jimmy,

The files are typically found in /user/hive/warehouse/

By default they would be TextFiles delimited with ^A (Control-A, \001). But
depending on how you defined the table, they could use other delimiters, be
sequence files, or be in a different format entirely.
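As an illustration (not from the original thread), a custom job's map side would deserialize rows in the default format by splitting on the \x01 control character. A minimal sketch in Python, with invented column names:

```python
# Hypothetical sketch: parsing Hive's default TextFile format outside Hive.
# Assumes a table stored as TextFile with the default ^A (Ctrl-A, \x01)
# field delimiter, as described above; the column names here are made up.

HIVE_FIELD_DELIM = "\x01"  # Hive's default field terminator

def parse_hive_text_row(line, columns):
    """Split one raw warehouse line into a dict of column -> string value."""
    values = line.rstrip("\n").split(HIVE_FIELD_DELIM)
    return dict(zip(columns, values))

# Example: one row from a file under /user/hive/warehouse/<table>/
raw = "42\x01alice\x01NY\n"
row = parse_hive_text_row(raw, ["id", "name", "state"])
print(row)  # {'id': '42', 'name': 'alice', 'state': 'NY'}
```

The same split would live inside the mapper of a custom MapReduce job; tables declared with other delimiters or stored as SequenceFiles would need the corresponding reader instead.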


[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917016#action_12917016
 ] 

Edward Capriolo commented on HIVE-1611:
---

Now that Hive is a TLP, we likely have to get the ball rolling and cut the cord 
with Hadoop. I will contact infra and see what our options are. We have a few 
issues: 
- We need to move the SVN from a Hadoop subproject to a top-level repository. 
- After we do that, we need to move the Forrest docs into Hive; then we 
can change the search box.

If we want to see the skinconf change done first, we should open/transfer this 
ticket to core, I believe.


> Add alternative search-provider to Hive site
> 
>
> Key: HIVE-1611
> URL: https://issues.apache.org/jira/browse/HIVE-1611
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
> Attachments: HIVE-1611.patch
>
>
> Use search-hadoop.com service to make available search in Hive sources, MLs, 
> wiki, etc.
> This was initially proposed on user mailing list. The search service was 
> already added in site's skin (common for all Hadoop related projects) before 
> so this issue is about enabling it for Hive. The ultimate goal is to use it 
> at all Hadoop's sub-projects' sites.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1668) Move HWI out to Github

2010-09-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914741#action_12914741
 ] 

Edward Capriolo commented on HIVE-1668:
---

{quote}It should also help mature the product for eventual inclusion in 
trunk.{quote}
Why would we move something from hive out to github, just to move it back to 
hive?

{quote}Empirically, they don't. The value of the web interface to users is not 
nearly as high as the pain it causes the developers for maintenance.{quote}
Who are these developers who maintain it? Has anyone ever added a feature 
besides me? I'm not complaining.

http://blog.milford.io/2010/06/getting-the-hive-web-interface-hwi-to-work-on-centos/
{quote}The Hive Web Interface is a pretty sweet deal.{quote} 
Sounds like people like it. 

Why are we debating the past state of hwi? It works now. If someone reports a 
bug I typically investigate and patch that same day.

I challenge anyone to open a ticket on core user called "remove name node web 
interface to github" and try to say " now offers a better name node 
interface using python." The ticket would instantly get a "RESOLVED: WILL NOT 
FIX". Why is this any different? 

> Move HWI out to Github
> --
>
> Key: HIVE-1668
> URL: https://issues.apache.org/jira/browse/HIVE-1668
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Jeff Hammerbacher
>
> I have seen HWI cause a number of build and test errors, and it's now going 
> to cost us some extra work for integration with security. We've worked on 
> hundreds of clusters at Cloudera and I've never seen anyone use HWI. With the 
> Beeswax UI available in Hue, it's unlikely that anyone would prefer to stick 
> with HWI. I think it's time to move it out to Github.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1668) Move HWI out to Github

2010-09-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914669#action_12914669
 ] 

Edward Capriolo commented on HIVE-1668:
---

{quote}That's not a great argument for keeping code that's onerous to maintain 
in trunk.{quote}
It's not onerous to maintain. As you can see from the tickets I pointed out, it 
broke because it was not tested. 

For example, in https://issues.apache.org/jira/browse/HIVE-752: when designing 
shim classes that specify a class name in a string, one has to make sure the 
class name is correct. I know it was an oversight, but I am sure someone fired 
up the CLI and made sure the class name was correct.

As for https://issues.apache.org/jira/browse/HIVE-978, I specifically mentioned 
in the patch how to test this and why it should be tested, and it still turned 
out not to work right. 

Pragmatic is the perfect word. HWI was never meant to be fancy. Anyone who has 
Hive can build and run the web interface with no extra dependencies. To use 
Beeswax, it looks like you need Hue, which means you need to go somewhere else 
to get it and install it. It also seems you need to patch or load extra plugins 
into your namenode and datanode, such as 
org.apache.hadoop.thriftfs.NamenodePlugin. It looks like 
(http://archive.cloudera.com/cdh/3/hue/manual.html#_install_hue) you need: 

(RHEL/CentOS)       (Debian/Ubuntu)
gcc                 gcc
libxml2-devel       libxml2-dev
libxslt-devel       libxslt-dev
mysql-devel         libmysqlclient-dev
python-devel        python-dev
python-setuptools   python-setuptools
sqlite-devel        libsqlite3-dev 

The pragmatic approach is to use the web interface provided by Hive. Users do 
not need anything external like Python, nor do they have to make any changes 
to their environment. That is why I think we should stay part of the Hive 
distribution. 
 
I'm -1 on taking it out.  

> Move HWI out to Github
> --
>
> Key: HIVE-1668
> URL: https://issues.apache.org/jira/browse/HIVE-1668
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Jeff Hammerbacher
>
> I have seen HWI cause a number of build and test errors, and it's now going 
> to cost us some extra work for integration with security. We've worked on 
> hundreds of clusters at Cloudera and I've never seen anyone use HWI. With the 
> Beeswax UI available in Hue, it's unlikely that anyone would prefer to stick 
> with HWI. I think it's time to move it out to Github.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1668) Move HWI out to Github

2010-09-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914605#action_12914605
 ] 

Edward Capriolo commented on HIVE-1668:
---

Plus, not to get too far off topic, but there is a huge portion of the Hadoop 
community that thinks "Security? So what? Who cares?" I am not going to run 
Active Directory or Kerberos just so I can say "My Hadoop is secure". It adds 
latency to many processes and complexity to the overall design of Hadoop, and 
it does not even encrypt data in transit. Many people are going to elect not to 
use "hadoop security" for those reasons. Is "extra work" a reason not to do 
something? Are we going to move the Hive Thrift server out to GitHub too 
because of the burden of "extra work"? It is a lot of extra work for me when 
Hadoop renames all its JMX counters or tells me all my code is deprecated 
because of the new slick mapreduce.* API. I have learned to roll with the 
punches.

> Move HWI out to Github
> --
>
> Key: HIVE-1668
> URL: https://issues.apache.org/jira/browse/HIVE-1668
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Jeff Hammerbacher
>
> I have seen HWI cause a number of build and test errors, and it's now going 
> to cost us some extra work for integration with security. We've worked on 
> hundreds of clusters at Cloudera and I've never seen anyone use HWI. With the 
> Beeswax UI available in Hue, it's unlikely that anyone would prefer to stick 
> with HWI. I think it's time to move it out to Github.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1668) Move HWI out to Github

2010-09-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914584#action_12914584
 ] 

Edward Capriolo commented on HIVE-1668:
---

Jeff,
I disagree. The build and test errors are not insurmountable. In fact, some if 
not most of the "errors" were cascading changes that were not tested properly. 
For example:

https://issues.apache.org/jira/browse/HIVE-1183 was a fix I had to do because 
someone broke it. https://issues.apache.org/jira/browse/HIVE-978 happened 
because someone wanted all jars to be named whatever.${version} and did not 
bother to look across all the shell script files that start up Hive. 

In https://issues.apache.org/jira/browse/HIVE-1294, again, someone changed some 
shell scripts and only tested the CLI.

In https://issues.apache.org/jira/browse/HIVE-752, again, someone broke HWI 
without testing it.

https://issues.apache.org/jira/browse/HIVE-1615 is not really anyone's fault, 
but there is no API stability across Hive. I do not see why one method went 
away and another similar method took its place.

I have of course been talking about moving HWI to Wicket for a while; moving 
from JSP to Servlet/Java code will fix errors, but the little time I do have I 
usually spend detecting and cleaning up other breakages.

Hue and Beeswax I honestly do not know, but it sounds like you need extra 
magical stuff to make them work, while HWI works with Hive on its own (unless 
people break it).

> Move HWI out to Github
> --
>
> Key: HIVE-1668
> URL: https://issues.apache.org/jira/browse/HIVE-1668
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Jeff Hammerbacher
>
> I have seen HWI cause a number of build and test errors, and it's now going 
> to cost us some extra work for integration with security. We've worked on 
> hundreds of clusters at Cloudera and I've never seen anyone use HWI. With the 
> Beeswax UI available in Hue, it's unlikely that anyone would prefer to stick 
> with HWI. I think it's time to move it out to Github.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913741#action_12913741
 ] 

Edward Capriolo commented on HIVE-842:
--

What is meant by "attack the Web UI separately"? Will it be broken or 
non-functional at any phase here? That is what I find happens often. Some of it 
is really the WUI's fault for using JSP and not servlets, but there is no 
simple way to code-cover the WUI and all the different ways it gets broken. 

> Authentication Infrastructure for Hive
> --
>
> Key: HIVE-842
> URL: https://issues.apache.org/jira/browse/HIVE-842
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>    Reporter: Edward Capriolo
>Assignee: Todd Lipcon
> Attachments: HiveSecurityThoughts.pdf
>
>
> This issue deals with the authentication (user name,password) infrastructure. 
> Not the authorization components that specify what a user should be able to 
> do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-268) "Insert Overwrite Directory" to accept configurable table row format

2010-09-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910656#action_12910656
 ] 

Edward Capriolo commented on HIVE-268:
--

Still not exactly what you want, but with CTAS you can essentially get a folder 
in /user/hive/warehouse/ with the format you want.

> "Insert Overwrite Directory" to accept configurable table row format
> 
>
> Key: HIVE-268
> URL: https://issues.apache.org/jira/browse/HIVE-268
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Zheng Shao
>Assignee: Paul Yang
>
> There is no way for the users to control the file format when they are 
> outputting the result into a directory.
> We should allow:
> {code}
> INSERT OVERWRITE DIRECTORY "/user/zshao/result"
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '9'
> SELECT tablea.* from tablea;
> {code}
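As a side note on the example above: '9' is the ASCII code for the tab character, so the exported directory would contain tab-delimited text files. A minimal, hypothetical sketch of consuming such output:

```python
# Illustration (not from the thread): FIELDS TERMINATED BY '9' means the
# delimiter is the character with ASCII code 9, i.e. a tab. Files written
# by such an INSERT OVERWRITE DIRECTORY can therefore be split on "\t".
assert chr(9) == "\t"

def split_exported_row(line):
    """Split one line of the exported result into its field values."""
    return line.rstrip("\n").split(chr(9))

print(split_exported_row("1\tfoo\tbar\n"))  # ['1', 'foo', 'bar']
```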

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods

2010-09-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

Fix Version/s: 0.7.0
   (was: 0.6.0)
Affects Version/s: 0.6.0
   (was: 0.5.1)

> Web Interface JSP needs Refactoring for removed meta store methods
> --
>
> Key: HIVE-1615
> URL: https://issues.apache.org/jira/browse/HIVE-1615
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.6.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Blocker
> Fix For: 0.7.0
>
> Attachments: hive-1615.patch.2.txt, hive-1615.patch.txt
>
>
> Some meta store methods being called from JSP have been removed. Really 
> should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1613) hive --service jar looks for hadoop version but was not defined

2010-09-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1613:
--

Fix Version/s: 0.6.0
   (was: 0.7.0)
Affects Version/s: 0.5.1
   (was: 0.6.0)
 Priority: Blocker  (was: Major)

I think we should patch this as well; functionality was broken.

> hive --service jar looks for hadoop version but was not defined
> ---
>
> Key: HIVE-1613
> URL: https://issues.apache.org/jira/browse/HIVE-1613
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 0.5.1
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: hive-1613.patch.txt
>
>
> hive --service jar fails. I have to open another ticket to clean up the 
> scripts and unify functions like version detection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods

2010-09-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

Fix Version/s: 0.6.0
   (was: 0.7.0)
Affects Version/s: 0.5.1
   (was: 0.7.0)
 Priority: Blocker  (was: Major)

Moving this up; the meta store changes recently broke the web interface.

> Web Interface JSP needs Refactoring for removed meta store methods
> --
>
> Key: HIVE-1615
> URL: https://issues.apache.org/jira/browse/HIVE-1615
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.5.1
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: hive-1615.patch.2.txt, hive-1615.patch.txt
>
>
> Some meta store methods being called from JSP have been removed. Really 
> should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

Attachment: hive-1615.patch.2.txt

> Web Interface JSP needs Refactoring for removed meta store methods
> --
>
> Key: HIVE-1615
> URL: https://issues.apache.org/jira/browse/HIVE-1615
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.7.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: hive-1615.patch.2.txt, hive-1615.patch.txt
>
>
> Some meta store methods being called from JSP have been removed. Really 
> should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

   Status: Patch Available  (was: Open)
Affects Version/s: 0.7.0
Fix Version/s: 0.7.0

> Web Interface JSP needs Refactoring for removed meta store methods
> --
>
> Key: HIVE-1615
> URL: https://issues.apache.org/jira/browse/HIVE-1615
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.7.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: hive-1615.patch.txt
>
>
> Some meta store methods being called from JSP have been removed. Really 
> should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

Summary: Web Interface JSP needs Refactoring for removed meta store methods 
 (was: Web Interface JSP needs Refactoring for deprecated meta store methods)

> Web Interface JSP needs Refactoring for removed meta store methods
> --
>
> Key: HIVE-1615
> URL: https://issues.apache.org/jira/browse/HIVE-1615
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: hive-1615.patch.txt
>
>
> Some meta store methods being called from JSP have been removed. Really 
> should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for deprecated meta store methods

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

Attachment: hive-1615.patch.txt

> Web Interface JSP needs Refactoring for deprecated meta store methods
> -
>
> Key: HIVE-1615
> URL: https://issues.apache.org/jira/browse/HIVE-1615
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: hive-1615.patch.txt
>
>
> Some meta store methods being called from JSP have been removed. Really 
> should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1615) Web Interface JSP needs Refactoring for deprecated meta store methods

2010-09-03 Thread Edward Capriolo (JIRA)
Web Interface JSP needs Refactoring for deprecated meta store methods
-

 Key: HIVE-1615
 URL: https://issues.apache.org/jira/browse/HIVE-1615
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Reporter: Edward Capriolo
Assignee: Edward Capriolo


Some meta store methods being called from JSP have been removed. Really should 
prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1613) hive --service jar looks for hadoop version but was not defined

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1613:
--

Status: Patch Available  (was: Open)

> hive --service jar looks for hadoop version but was not defined
> ---
>
> Key: HIVE-1613
> URL: https://issues.apache.org/jira/browse/HIVE-1613
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 0.6.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: hive-1613.patch.txt
>
>
> hive --service jar fails. I have to open another ticket to clean up the 
> scripts and unify functions like version detection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1613) hive --service jar looks for hadoop version but was not defined

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1613:
--

Attachment: hive-1613.patch.txt

> hive --service jar looks for hadoop version but was not defined
> ---
>
> Key: HIVE-1613
> URL: https://issues.apache.org/jira/browse/HIVE-1613
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 0.6.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: hive-1613.patch.txt
>
>
> hive --service jar fails. I have to open another ticket to clean up the 
> scripts and unify functions like version detection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1613) hive --service jar looks for hadoop version but was not defined

2010-09-03 Thread Edward Capriolo (JIRA)
hive --service jar looks for hadoop version but was not defined
---

 Key: HIVE-1613
 URL: https://issues.apache.org/jira/browse/HIVE-1613
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.6.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.7.0


hive --service jar fails. I have to open another ticket to clean up the scripts 
and unify functions like version detection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Build Crashing on Hive 0.5 Release

2010-09-02 Thread Edward Capriolo
On Thu, Sep 2, 2010 at 5:12 PM, Stephen Watt  wrote:
> Hi Folks
>
> I'm a Hadoop contributor and am presently working to get both Hadoop and
> Hive running on alternate JREs such as Apache Harmony and IBM Java.
>
> I noticed when building and running the functional tests ("clean test
> tar") for the Hive 0.5 release (i.e. not nightly build) , the build
> crashes right after running
> org.apache.hadoop.hive.ql.tool.TestLineageInfo. In addition, the
> TestCLIDriver Test Case fails as well. This is all using SUN JDK 1.60_14.
> I'm running on a SLES 10 system.
>
> This is a little odd, given that this is a release and not a nightly
> build. Although, its not uncommon for me to see Hudson pass tests that
> fail when running locally. Can someone confirm the build works for them?
>
> This is my build script:
>
> #!/bin/sh
>
> # Set Build Dependencies
> set PATH=$PATH:/home/hive/Java-Versions/jdk1.6.0_14/bin/
> export ANT_HOME=/home/hive/Test-Dependencies/apache-ant-1.7.1
> export JAVA_HOME=/home/hive/Java-Versions/jdk1.6.0_14
> export BUILD_DIR=/home/hive/hive-0.5.0-build
> export HIVE_BUILD=$BUILD_DIR/build
> export HIVE_INSTALL=$BUILD_DIR/hive-0.5.0-dev/
> export HIVE_SRC=$HIVE_INSTALL/src
> export PATH=$PATH:$ANT_HOME/bin
>
> # Define Hadoop Version to Use
> HADOOP_VER=0.20.2
>
> # Run Build and Unit Test
> cd $HIVE_SRC
> ant -Dtarget.dir=$HIVE_BUILD -Dhadoop.version=$HADOOP_VER clean test tar >
> $BUILD_DIR/hiveSUN32Build.out
>
>
> Regards
> Steve Watt

I seem to remember there were some older bugs when specifying the
minor versions of the 0.20 branch.
Can you try:

HADOOP_VER=0.20.0

rather than:

HADOOP_VER=0.20.2


[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-08-28 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Status: Patch Available  (was: Open)

> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
> HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, HIVE-471.6.patch.txt, 
> hive-471.diff
>
>
> There are many methods in java that are static and have no arguments or can 
> be invoked with one simple parameter. More complicated functions will require 
> a UDF but one generic one can work as a poor-mans UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
> "isEmpty")
> FROM src LIMIT 1;
> {noformat}
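The reflect() UDF described above invokes a Java method chosen by name at runtime. As a rough illustration of the same reflection idea (in Python, not the UDF's actual Java implementation):

```python
# Illustration only: mimics the spirit of Hive's reflect() UDF, which
# invokes a Java method given a class name and method name as strings.
# Here we resolve a Python module attribute by name instead.
import importlib

def reflect(module_name, func_name, *args):
    """Look up module.func by name and invoke it with the given args."""
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    return func(*args)

print(reflect("math", "sqrt", 16.0))     # 4.0
print(reflect("json", "dumps", [1, 2]))  # [1, 2]
```

As in the UDF, any resolution failure (unknown name, wrong arguments) surfaces at call time rather than at compile time.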

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-08-28 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Attachment: HIVE-471.6.patch.txt

> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
> HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, HIVE-471.6.patch.txt, 
> hive-471.diff
>
>
> There are many methods in java that are static and have no arguments or can 
> be invoked with one simple parameter. More complicated functions will require 
> a UDF but one generic one can work as a poor-mans UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
> "isEmpty")
> FROM src LIMIT 1;
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-471) A UDF for simple reflection

2010-08-26 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902883#action_12902883
 ] 

Edward Capriolo commented on HIVE-471:
--

@Namit,
I thought we were getting rid of negative tests. Am I misinformed?


> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
> HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, hive-471.diff
>
>
> There are many methods in java that are static and have no arguments or can 
> be invoked with one simple parameter. More complicated functions will require 
> a UDF but one generic one can work as a poor-mans UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
> "isEmpty")
> FROM src LIMIT 1;
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-25 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902616#action_12902616
 ] 

Edward Capriolo commented on HIVE-1434:
---

As for Maven, I am on the fence about it. We actually do not need all the libs 
I included. Having them in a tarball sounds good, but making a Maven repo for 
only this purpose seems like a lot of work.

{quote}
Should we attempt to factor out the HBase commonality immediately, or commit 
the overlapping code and then do refactoring as a followup? I'm fine either 
way; I can give suggestions on how to create the reusable abstract bases and 
where to package+name them.{quote}
If you can point to specific instances, then sure. The code may be 99% the 
same, but that one nuance is going to make the abstractions confusing and 
useless. 

I await further review.

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt
>
>
> Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-08-25 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

   Status: Patch Available  (was: Open)
Affects Version/s: 0.6.0
   (was: 0.5.1)
Fix Version/s: 0.7.0

> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.6.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
> HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, hive-471.diff
>
>
> There are many methods in java that are static and have no arguments or can 
> be invoked with one simple parameter. More complicated functions will require 
> a UDF but one generic one can work as a poor-mans UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
> "isEmpty")
> FROM src LIMIT 1;
> {noformat}
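The mechanism behind such a generic UDF is plain Java reflection. A minimal, self-contained sketch of the idea follows (this is not Hive's actual implementation; the helper name `callStatic` is made up for illustration, and real overload/boxing resolution is more involved):

```java
import java.lang.reflect.Method;

public class ReflectDemo {
    // Resolve a method by class name, method name, and parameter type,
    // then invoke it with a null receiver (i.e. as a static method) --
    // roughly what reflect("java.lang.String", "valueOf", 1) asks for.
    static Object callStatic(String className, String method,
                             Class<?> argType, Object arg) throws Exception {
        Method m = Class.forName(className).getMethod(method, argType);
        return m.invoke(null, arg); // null receiver: static invocation
    }

    public static void main(String[] args) throws Exception {
        // String.valueOf(int) resolved and called reflectively.
        System.out.println(callStatic("java.lang.String", "valueOf",
                                      int.class, 1)); // prints 1
    }
}
```

Note that `Class.getMethod` matches parameter types exactly, so a real generic UDF must also map boxed argument types (Integer, Long, ...) back to primitives when searching for overloads.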

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-08-25 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Attachment: HIVE-471.5.patch

Almost two years in the making!

> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.1
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
> HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, hive-471.diff
>
>
> There are many methods in Java that are static and take no arguments, or can 
> be invoked with one simple parameter. More complicated functions will require 
> a UDF, but a single generic one can serve as a poor man's UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
> "isEmpty")
> FROM src LIMIT 1;
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [VOTE] Draft Resolution to make Hive a TLP

2010-08-25 Thread Edward Capriolo
+1

On Wed, Aug 25, 2010 at 1:43 AM, Namit Jain  wrote:
> +1
>
> 
> From: Ning Zhang [nzh...@facebook.com]
> Sent: Tuesday, August 24, 2010 9:18 PM
> To: 
> Subject: Re: [VOTE] Draft Resolution to make Hive a TLP
>
> +1
>
> On Aug 24, 2010, at 8:50 PM, Carl Steinbach wrote:
>
>> +1
>>
>> On Tue, Aug 24, 2010 at 6:56 PM, Ashish Thusoo  wrote:
>>
>>> Folks,
>>>
>>> I am going to make the following proposal at gene...@hadoop.apache.org
>>>
>>> In summary this proposal does the following things:
>>>
>>> 1. Establishes the PMC as comprising of the current committers of Hive (as
>>> of today - 8/24/2010).
>>>
>>> 2. Proposes Namit Jain to the chair of the project (PMC chairs have no more
>>> power than other PMC members, but they are responsible for writing regular
>>> reports for the Apache board, assigning rights to new committers, etc.)
>>>
>>> 3. Tasks the PMC to come up with the bylaws for governance of the project.
>>>
>>> Please vote on this as soon as possible(yes I should have done this as part
>>> of the earlier vote, but please bear with me), so that we can get the ball
>>> rolling on this...
>>>
>>> Thanks,
>>> Ashish
>>>
>>> Draft Resolution to be sent to the Apache Board
>>> ---
>>>
>>> Establish the Apache Hive Project
>>>
>>>        WHEREAS, the Board of Directors deems it to be in the best
>>>        interests of the Foundation and consistent with the
>>>        Foundation's purpose to establish a Project Management
>>>        Committee charged with the creation and maintenance of
>>>        open-source software related to parallel analysis of large
>>>        data sets for distribution at no charge to the public.
>>>
>>>        NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>>        Committee (PMC), to be known as the "Apache Hive Project",
>>>        be and hereby is established pursuant to Bylaws of the
>>>        Foundation; and be it further
>>>
>>>        RESOLVED, that the Apache Hive Project be and hereby is
>>>        responsible for the creation and maintenance of software
>>>        related to parallel analysis of large data sets; and be
>>>        it further
>>>
>>>        RESOLVED, that the office of "Vice President, Apache Hive" be
>>>        and hereby is created, the person holding such office to
>>>        serve at the direction of the Board of Directors as the chair
>>>        of the Apache Hive Project, and to have primary responsibility
>>>        for management of the projects within the scope of
>>>        responsibility of the Apache Hive Project; and be it further
>>>
>>>        RESOLVED, that the persons listed immediately below be and
>>>        hereby are appointed to serve as the initial members of the
>>>        Apache Hive Project:
>>>            * Namit Jain (na...@apache.org)
>>>            * John Sichi (j...@apache.org)
>>>            * Zheng Shao (zs...@apache.org)
>>>            * Edward Capriolo (appodic...@apache.org)
>>>            * Raghotham Murthy (r...@apache.org)
>>>            * Ning Zhang (nzh...@apache.org)
>>>            * Paul Yang (pa...@apache.org)
>>>            * He Yongqiang (he yongqi...@apache.org)
>>>            * Prasad Chakka (pras...@apache.org)
>>>            * Joydeep Sen Sarma (jsensa...@apache.org)
>>>            * Ashish Thusoo (athu...@apache.org)
>>>
>>>        NOW, THEREFORE, BE IT FURTHER RESOLVED, that Namit Jain
>>>        be appointed to the office of Vice President, Apache Hive, to
>>>        serve in accordance with and subject to the direction of the
>>>        Board of Directors and the Bylaws of the Foundation until
>>>        death, resignation, retirement, removal or disqualification,
>>>        or until a successor is appointed; and be it further
>>>
>>>        RESOLVED, that the initial Apache Hive PMC be and hereby is
>>>        tasked with the creation of a set of bylaws intended to
>>>        encourage open development and increased participation in the
>>>        Apache Hive Project; and be it further
>>>
>>>        RESOLVED, that the Apache Hive Project be and hereby
>>>        is tasked with the migration and rationalization of the Apache
>>>        Hadoop Hive sub-project; and be it further
>>>
>>>        RESOLVED, that all responsibilities pertaining to the Apache
>>>        Hive sub-project encumbered upon the
>>>        Apache Hadoop Project are hereafter discharged.
>>>
>>>
>
>


Re: [DISCUSSION] Move to become a TLP

2010-08-20 Thread Edward Capriolo
I am +1 as well.



On Fri, Aug 20, 2010 at 1:29 PM, Ashish Thusoo  wrote:
> Thanks everyone who voted. Looks like this is unanimous at this point. I will 
> start the proceedings in the Hadoop PMC to make Hive a TLP.
>
> Ashish
>
> -Original Message-
> From: Paul Yang [mailto:py...@facebook.com]
> Sent: Thursday, August 19, 2010 4:05 PM
> To: hive-dev@hadoop.apache.org
> Subject: RE: [DISCUSSION] Move to become a TLP
>
> +1
>
> -Original Message-
> From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
> Sent: Thursday, August 19, 2010 3:30 PM
> To: hive-dev@hadoop.apache.org
> Subject: RE: [DISCUSSION] Move to become a TLP
>
> +1
>
> -Original Message-
> From: Carl Steinbach [mailto:c...@cloudera.com]
> Sent: Thursday, August 19, 2010 3:18 PM
> To: hive-dev@hadoop.apache.org
> Subject: Re: [DISCUSSION] Move to become a TLP
>
> +1
>
> On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang  wrote:
>
>> +1 as well.
>>
>> On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote:
>>
>> > +1.
>> >
>> > Zheng
>> >
>> > On Mon, Aug 16, 2010 at 11:58 AM, John Sichi 
>> wrote:
>> >> +1 from me.  The momentum on cross-company collaboration we're
>> >> seeing
>> now, plus big integration contributions such as the new storage
>> handlers (HyperTable and Cassandra), are all signs that Hive is growing up 
>> fast.
>> >>
>> >> HBase recently took the same route, so I'm going to have a chat
>> >> with
>> Jonathan Gray to find out what that involved for them.
>> >>
>> >> JVS
>> >>
>> >> On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote:
>> >>
>> >>> Yes, I think Hive is ready to become a TLP.
>> >>>
>> >>> On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo
>> >>> 
>> wrote:
>> >>>
>> >>>> Nice one Ed...
>> >>>>
>> >>>> Folks,
>> >>>>
>> >>>> Please chime in. I think we should close this out next week one
>> >>>> way or
>> the
>> >>>> other. We can consider this a vote at this point, so please vote
>> >>>> on
>> this
>> >>>> issue.
>> >>>>
>> >>>> Thanks,
>> >>>> Ashish
>> >>>>
>> >>>> -Original Message-
>> >>>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>> >>>> Sent: Thursday, August 12, 2010 8:05 AM
>> >>>> To: hive-dev@hadoop.apache.org
>> >>>> Subject: Re: [DISCUSSION] Move to become a TLP
>> >>>>
>> >>>> On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo
>> >>>> 
>> >>>> wrote:
>> >>>>> Folks,
>> >>>>>
>> >>>>> This question has come up in the PMC once again and would be
>> >>>>> great to
>> >>>> hear once more on this topic. What do people think? Are we ready
>> >>>> to
>> become a
>> >>>> TLP?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Ashish
>> >>>>
>> >>>> I thought of one more benefit. We can rename our packages from
>> >>>>
>> >>>> org.apache.hadoop.hive.*
>> >>>> to
>> >>>> org.apache.hive.*
>> >>>>
>> >>>> :)
>> >>>>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Yours,
>> > Zheng
>> > http://www.linkedin.com/in/zshao
>>
>>
>


[jira] Commented: (HIVE-1505) Support non-UTF8 data

2010-08-20 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900697#action_12900697
 ] 

Edward Capriolo commented on HIVE-1505:
---

Maybe you should fork Hive and call it Chive. 

On a serious note: great job. Would you consider editing the cli.xml in the 
xdocs to explain this feature? I think it would be very helpful; look in 
docs/xdocs/.

> Support non-UTF8 data
> -
>
> Key: HIVE-1505
> URL: https://issues.apache.org/jira/browse/HIVE-1505
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: bc Wong
>Assignee: Ted Xu
> Attachments: trunk-encoding.patch
>
>
> I'd like to work with non-UTF8 data easily.
> Suppose I have data in Latin-1. Currently, doing a "select *" will return the 
> upper-ASCII characters as '\xef\xbf\xbd', which is the replacement character 
> '\ufffd' encoded in UTF-8. It would be nice for Hive to understand different 
> encodings, or to have a concept of a byte string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1555) JDBC Storage Handler

2010-08-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900039#action_12900039
 ] 

Edward Capriolo commented on HIVE-1555:
---

I wonder if this could end up being a very effective way to query shared data 
stores. 

I think I saw something like this in Futurama... Don't worry about querying 
blank; let me worry about querying blank.
 
http://www.google.com/url?sa=t&source=web&cd=2&ved=0CBcQFjAB&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DB5cAwTEEGNE&ei=Qk9sTLAThIqXB__DzDw&usg=AFQjCNH_TOUS1cl6t0gZXefRURw0a_feZg

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hadoop Hive
>  Issue Type: New Feature
>Affects Versions: 0.5.0
>Reporter: Bob Robertson
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc against tables in other systems etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: hive-1434-4-patch.txt

Refactored the code, added xdoc, more extensive testing.

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt
>
>
> Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898266#action_12898266
 ] 

Edward Capriolo commented on HIVE-1530:
---

@Joydeep +1 

> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898030#action_12898030
 ] 

Edward Capriolo commented on HIVE-1530:
---

I like the default xml. Hive has many undocumented options, and new ones are 
being added often. Are end users going to know which jar hive-default.xml is 
in? Will users want to extract a jar just to get the conf out of it in order 
to read the descriptions of the settings?

As for what Hadoop does... I personally find it annoying to have to navigate to 
hadoop/src/mapred/mapred-default.xml or to hadoop/src/hdfs/hdfs-default.xml to 
figure out what options I have for settings. So I do not really think we should 
do it just to be like Hadoop if it makes people's lives harder.

If anything, please keep it as hive-site.xml.sample.


> Include hive-default.xml and hive-log4j.properties in hive-common JAR
> -
>
> Key: HIVE-1530
> URL: https://issues.apache.org/jira/browse/HIVE-1530
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Carl Steinbach
>
> hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
> and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
> hive-default.xml file that currently sits in the conf/ directory should be 
> removed.
> Motivations for this change:
> * We explicitly tell users that they should never modify hive-default.xml yet 
> give them the opportunity to do so by placing the file in the conf dir.
> * Many users are familiar with the Hadoop configuration mechanism that does 
> not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
> assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: hive version compatibility

2010-08-12 Thread Edward Capriolo
Jaydeep,

Currently one build of Hive works with Hadoop 0.17, 0.18, 0.19, and 0.20.
However, there is talk about dropping support for older versions and
moving completely to the new MapReduce API.

Edward

On Thu, Aug 12, 2010 at 8:29 AM, jaydeep vishwakarma
 wrote:
> Hi,
>
> I found a very interesting feature in Hive version 0.6.0. Is there any
> compatibility constraint with Hadoop? If yes, then which Hadoop versions
> does it support?
>
> Regards,
> Jaydeep
>
> The information contained in this communication is intended solely for the
> use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally privileged
> information. If you are not the intended recipient you are hereby notified
> that any disclosure, copying, distribution or taking any action in reliance
> on the contents of this information is strictly prohibited and may be
> unlawful. If you have received this communication in error, please notify us
> immediately by responding to this email and then delete it from your system.
> The firm is neither liable for the proper and complete transmission of the
> information contained in this communication nor for any delay in its
> receipt.
>


Re: [DISCUSSION] Move to become a TLP

2010-08-12 Thread Edward Capriolo
On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo  wrote:
> Folks,
>
> This question has come up in the PMC once again and would be great to hear 
> once more on this topic. What do people think? Are we ready to become a TLP?
>
> Thanks,
> Ashish

I thought of one more benefit. We can rename our packages from

org.apache.hadoop.hive.*
to
org.apache.hive.*

:)


Re: How HIVE manages a join

2010-08-10 Thread Edward Capriolo
Sorry.
$hive_root/docs/xdocs/language_manual/joins.xml

On Tue, Aug 10, 2010 at 5:57 PM, Edward Capriolo  wrote:
> This page is already in version control:
>
> /home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml
>
> Edward
>
> On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach  wrote:
>> Hi Yongqiang,
>> Please go ahead and update the wiki page. I will copy it over to version
>> control when you are done.
>> Thanks.
>> Carl
>>
>> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he 
>> wrote:
>>>
>>> In the Hive Join wiki page, it says
>>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>>
>>> Where should i do the update?
>>>
>>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he 
>>> wrote:
>>> > Yeah. The sort-merge bucket mapjoin has been finished for some time,
>>> > and seems stable now. I did one skew join but haven't got a chance to
>>> > look at the other skew join Namit mentioned to me. I definitely should
>>> > have updated the wiki earlier. My bad.
>>> >
>>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher 
>>> > wrote:
>>> >> Yongqiang mentioned he was going to update the wiki with this
>>> >> information in
>>> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>>> >>
>>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket
>>> >> map
>>> >> join and the other skew join you mention in the above thread?
>>> >>
>>> >> Thanks,
>>> >> Jeff
>>> >>
>>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>>> >>  wrote:
>>> >>>
>>> >>> Roberto ..
>>> >>>
>>> >>> You can find these links useful ..
>>> >>>
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>>> >>> - Simple joins and optimizations..
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  
>>> >>> -
>>> >>> New kind of joins / features of hive ..
>>> >>>
>>> >>> Thanks
>>> >>>
>>> >>> Bharath.V
>>> >>> 4th year Undergraduate..
>>> >>> IIIT Hyderabad
>>> >>>
>>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>>> >>>  wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I cannot find any documentation about what algorithm Hive uses to
>>> >>>> translate JOIN clauses into Map-Reduce tasks.
>>> >>>>
>>> >>>> In particular, if I have two tables A and B, each table is written to
>>> >>>> a separate file and each file is split across Hadoop nodes. When I
>>> >>>> perform a JOIN with A.column = B.column, the framework has to compare
>>> >>>> the full data from the first file with the full data from the second
>>> >>>> file. In order to scan all possible combinations of values, how can
>>> >>>> Hadoop do it? If each node contains only a portion of each file, a
>>> >>>> complete comparison seems impossible. Is one of the two files
>>> >>>> entirely replicated on each node? Or does Hive use another kind of
>>> >>>> strategy/optimization?
>>> >>>>
>>> >>>> Thanks.
>>> >>
>>> >>
>>> >
>>
>>
>
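For the equality join in the question above, the usual answer is a reduce-side ("common") join: neither file is replicated. Both inputs are mapped to (join key, tagged row) pairs, the shuffle routes all pairs with the same key to the same reducer, and that reducer emits the cross product of the matching rows from each side. A single-process sketch of the reducer-side logic (illustrative only, not Hive's operator code; for small tables Hive can instead ship one side whole as a map-side join):

```java
import java.util.*;

public class ReduceSideJoin {
    // Emulate shuffle + reduce for an equi-join in one process: rows of
    // each input arrive already grouped by join key (that grouping is what
    // the shuffle provides), and the "reducer" emits the cross product of
    // matching rows per key.
    static List<String> join(Map<String, List<String>> left,
                             Map<String, List<String>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : left.entrySet()) {
            List<String> matches = right.get(e.getKey());
            if (matches == null) continue; // inner join: key needed on both sides
            for (String l : e.getValue())
                for (String r : matches)
                    out.add(e.getKey() + "\t" + l + "\t" + r);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = new LinkedHashMap<>();
        a.put("1", Arrays.asList("alice"));
        Map<String, List<String>> b = new LinkedHashMap<>();
        b.put("1", Arrays.asList("eng"));
        System.out.println(join(a, b)); // one joined row for key "1"
    }
}
```

Because each reducer sees every row for its keys, no node ever needs a full copy of either table in this strategy.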


Re: How HIVE manages a join

2010-08-10 Thread Edward Capriolo
This page is already in version control:

/home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml

Edward

On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach  wrote:
> Hi Yongqiang,
> Please go ahead and update the wiki page. I will copy it over to version
> control when you are done.
> Thanks.
> Carl
>
> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he 
> wrote:
>>
>> In the Hive Join wiki page, it says
>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>
>> Where should i do the update?
>>
>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he 
>> wrote:
>> > Yeah. The sort-merge bucket mapjoin has been finished for some time,
>> > and seems stable now. I did one skew join but haven't got a chance to
>> > look at the other skew join Namit mentioned to me. I definitely should
>> > have updated the wiki earlier. My bad.
>> >
>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher 
>> > wrote:
>> >> Yongqiang mentioned he was going to update the wiki with this
>> >> information in
>> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>> >>
>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket
>> >> map
>> >> join and the other skew join you mention in the above thread?
>> >>
>> >> Thanks,
>> >> Jeff
>> >>
>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>> >>  wrote:
>> >>>
>> >>> Roberto ..
>> >>>
>> >>> You can find these links useful ..
>> >>>
>> >>>
>> >>>
>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>> >>> - Simple joins and optimizations..
>> >>>
>> >>>
>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
>> >>> New kind of joins / features of hive ..
>> >>>
>> >>> Thanks
>> >>>
>> >>> Bharath.V
>> >>> 4th year Undergraduate..
>> >>> IIIT Hyderabad
>> >>>
>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>> >>>  wrote:
>> 
>>  Hi,
>> 
>>  I cannot find any documentation about what algorithm Hive uses to
>>  translate JOIN clauses into Map-Reduce tasks.
>> 
>>  In particular, if I have two tables A and B, each table is written to a
>>  separate file and each file is split across Hadoop nodes. When I
>>  perform a JOIN with A.column = B.column, the framework has to compare
>>  the full data from the first file with the full data from the second
>>  file. In order to scan all possible combinations of values, how can
>>  Hadoop do it? If each node contains only a portion of each file, a
>>  complete comparison seems impossible. Is one of the two files entirely
>>  replicated on each node? Or does Hive use another kind of
>>  strategy/optimization?
>> 
>>  Thanks.
>> >>
>> >>
>> >
>
>


Re: Hive local mode on by default for 0.6.0

2010-08-09 Thread Edward Capriolo
On Mon, Aug 9, 2010 at 9:28 PM, Joydeep Sen Sarma  wrote:
> We enabled a feature called 'auto-local mode' (hive-1408). The query 
> processor looks at the size of the input and decides dynamically whether 
> local mode execution can be done. The determination is done on a per job 
> level for a multi-job query.
>
> We enabled it by default in trunk so it can get some coverage. Local mode 
> support in 0.6 has some bugs (in fact a big part of this jira was a 
> comprehensive test for local mode and small fixes for the bugs that this 
> uncovered). The relevant option is:
>
> set hive.exec.mode.local.auto=
>
>
> I have been a little worried about enabling this by default - we can turn it 
> off if required. The case that worries me the most is if a lot of users refer 
> to scripts (via transform clauses) that are only available on the cluster 
> nodes and not on the client node. Another assumption is that mapred.local.dir 
> is set to a value valid on the client side (which may not be the case if the 
> same hadoop config is being shared across client and server side).
>
> Promise to add some documentation on the wiki about this ASAP.
>
> -Original Message-
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Monday, August 09, 2010 2:22 PM
> To: 
> Subject: Hive local mode on by default for 0.6.0
>
> I already caught someone on IRC who was very surprised by the local
> mode in hive trunk. Is local mode on by default?
>
> Do you think the release 0.6.0 should have this on by default? There
> have been a few issues like HIVE-1520, and it seems like letting this
> out in the wild without actively turning it on might find edge cases
> and complications.
>
> Regards,
> Edward
>

Another thing: some people launch jobs from machines without multiple
disks and cores, sometimes even from the NameNode. I think on these
machines local performance will be poor or even dangerous, e.g. an OOM
crash takes down the NameNode and corrupts the FSImage.

Someone came into the IRC channel and wondered why their jobs were not
showing up in the JobTracker. I personally feel it would be better to turn
this off. People that know about the feature and want it on can set it to
true.

It is a super cool feature though.
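The decision Joydeep describes, the query processor looking at a job's input and choosing local execution dynamically, reduces to a small per-job check. A sketch of its rough shape (the parameter names and the two criteria here are assumptions for illustration; the real HIVE-1408 heuristic is per job within a multi-job query and may weigh more factors):

```java
public class LocalModeSketch {
    // Illustrative shape of the auto-local-mode check: execute a job
    // locally only when its input is small. Threshold names and the
    // exact criteria are made up for this sketch.
    static boolean runLocally(long totalInputBytes, int numInputFiles,
                              long maxBytes, int maxFiles) {
        return totalInputBytes <= maxBytes && numInputFiles <= maxFiles;
    }

    public static void main(String[] args) {
        // A 10 MB, 2-file job stays local under a 128 MB / 4-file limit.
        System.out.println(runLocally(10L << 20, 2, 128L << 20, 4)); // true
    }
}
```

The concerns in the thread, transform scripts present only on cluster nodes, or an invalid client-side mapred.local.dir, are exactly the conditions such a size-only check cannot see.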


Hive local mode on by default for 0.6.0

2010-08-09 Thread Edward Capriolo
I already caught someone on IRC who was very surprised by the local
mode in hive trunk. Is local mode on by default?

Do you think the release 0.6.0 should have this on by default? There
have been a few issues like HIVE-1520, and it seems like letting this
out in the wild without actively turning it on might find edge cases
and complications.

Regards,
Edward


[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-05 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Status: Patch Available  (was: Open)

This patch has full read/write functionality. I am going to do another patch 
later today with xdocs, but do not expect any code changes.

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-3-patch.txt
>
>
> Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-05 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: hive-1434-3-patch.txt

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-3-patch.txt
>
>
> Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1513) hive starter scripts should load admin/user supplied script for configurability

2010-08-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895698#action_12895698
 ] 

Edward Capriolo commented on HIVE-1513:
---

Anything you put in bin/ext is sourced as part of the bootstrap process. 
Could you do something like bin/ext/mystuff.sh?

> hive starter scripts should load admin/user supplied script for 
> configurability
> ---
>
> Key: HIVE-1513
> URL: https://issues.apache.org/jira/browse/HIVE-1513
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Joydeep Sen Sarma
>
> it's difficult to add environment variables to Hive starter scripts except by 
> modifying the scripts directly. this is undesirable (since they are source 
> code). Hive starter scripts should load a admin supplied shell script for 
> configurability. This would be similar to what hadoop does with hadoop-env.sh

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1511) Hive plan serialization is slow

2010-08-04 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895352#action_12895352
 ] 

Edward Capriolo commented on HIVE-1511:
---

Also, possibly a clever way to remove duplicate expressions that evaluate to 
the same result, such as multiple key=0 predicates.

> Hive plan serialization is slow
> ---
>
> Key: HIVE-1511
> URL: https://issues.apache.org/jira/browse/HIVE-1511
> Project: Hadoop Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>
> As reported by Edward Capriolo:
> For reference I did this as a test case
> SELECT * FROM src where
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> ...(100 more of these)
> No OOM but I gave up after the test case did not go anywhere for about
> 2 minutes.
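The deduplication suggested above works because OR is idempotent: repeated disjuncts like `key=0 OR key=0 OR ...` can be collapsed before the plan is built and serialized. A toy sketch of that idea (disjuncts are modeled as plain strings here; a real optimizer would compare canonicalized expression trees, not text):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupDisjuncts {
    // Collapse textually identical OR-operands while preserving the order
    // of first appearance (LinkedHashSet keeps insertion order).
    static List<String> dedup(List<String> disjuncts) {
        return new ArrayList<>(new LinkedHashSet<>(disjuncts));
    }

    public static void main(String[] args) {
        System.out.println(dedup(Arrays.asList("key=0", "key=0", "key=1")));
        // [key=0, key=1]
    }
}
```

Shrinking the predicate this way attacks the reported slowness at its source: the serialized plan no longer carries a hundred copies of the same expression.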

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: hive-1434-2-patch.txt

Closing in on this one. This patch sets up the build environment correctly, with 
proper test infrastructure, and is much cleaner. Still working on 
serializing/deserializing correctly, so it is not very functional yet. 80%, I think.

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, hive-1434-1.txt, hive-1434-2-patch.txt
>
>
> Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1441) Extend ivy offline mode to cover metastore downloads

2010-07-30 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894133#action_12894133
 ] 

Edward Capriolo commented on HIVE-1441:
---

Fresh checkout, both before and after the patch. Still looking into it.
{noformat}
 
  
MetaException(message:Could not connect to meta store using any of the URIs 
provided)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:160)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:128)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:71)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote.setUp(TestHiveMetaStoreRemote.java:64)
at junit.framework.TestCase.runBare(TestCase.java:125)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)

  
  
  



From eclipse:
Running metastore!
MetaException(message:hive.metastore.warehouse.dir is not set in the config or 
blank)
at org.apache.hadoop.hive.metastore.Warehouse.<init>(Warehouse.java:58)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:155)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:125)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:1965)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote$RunMS.run(TestHiveMetaStoreRemote.java:39)
at java.lang.Thread.run(Thread.java:619)
10/07/30 16:03:22 ERROR metastore.HiveMetaStore: Metastore Thrift Server threw 
an exception. Exiting...
10/07/30 16:03:22 ERROR metastore.HiveMetaStore: 
MetaException(message:hive.metastore.warehouse.dir is not set in the config or 
blank)
at org.apache.hadoop.hive.metastore.Warehouse.<init>(Warehouse.java:58)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:155)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:125)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:1965)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote$RunMS.run(TestHiveMetaStoreRemote.java:39)
at java.lang.Thread.run(Thread.java:619)
{noformat}

> Extend ivy offline mode to cover metastore downloads
> 
>
> Key: HIVE-1441
> URL: https://issues.apache.org/jira/browse/HIVE-1441
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1441.1.patch
>
>
> We recently started downloading datanucleus jars via ivy, and the existing 
> ivy offline mode doesn't cover this, so we still end up trying to contact the 
> ivy repository even with offline mode enabled.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1441) Extend ivy offline mode to cover metastore downloads

2010-07-30 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894028#action_12894028
 ] 

Edward Capriolo commented on HIVE-1441:
---

Testing.

test:
[junit] Running org.apache.hadoop.hive.metastore.TestHiveMetaStore
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 24.539 sec
[junit] Running org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote
[junit] Running metastore!
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 8.238 sec
[junit] Test org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote FAILED

Can this be skipped?

> Extend ivy offline mode to cover metastore downloads
> 
>
> Key: HIVE-1441
> URL: https://issues.apache.org/jira/browse/HIVE-1441
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1441.1.patch
>
>
> We recently started downloading datanucleus jars via ivy, and the existing 
> ivy offline mode doesn't cover this, so we still end up trying to contact the 
> ivy repository even with offline mode enabled.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes

2010-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893772#action_12893772
 ] 

Edward Capriolo commented on HIVE-1492:
---

"the largest file is the correct file" 
Is that generally true or an absolute fact?
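To make the question concrete, the retention rule being proposed ("keep the largest file per task") amounts to something like the following. This is a hypothetical, self-contained sketch, not Hive's actual code: TaskFile is a stand-in for an HDFS FileStatus, and the method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy stand-in for an HDFS file-listing entry (hypothetical, not Hive's API).
class TaskFile {
    final String path;
    final long length;
    TaskFile(String path, long length) { this.path = path; this.length = length; }
}

public class LargestFilePicker {
    // Among duplicate outputs of the same task (failed or speculative
    // attempts), retain the largest one -- the heuristic HIVE-1492 proposes.
    static TaskFile pickLargest(List<TaskFile> attempts) {
        return attempts.stream()
                .max(Comparator.comparingLong(f -> f.length))
                .orElseThrow(() -> new IllegalArgumentException("no attempts"));
    }

    public static void main(String[] args) {
        List<TaskFile> attempts = Arrays.asList(
                new TaskFile("attempt_0", 100L),    // truncated by a failed attempt
                new TaskFile("attempt_1", 4096L));  // completed attempt
        System.out.println(pickLargest(attempts).path); // attempt_1
    }
}
```

The heuristic assumes a partial or truncated attempt is never longer than the completed one, which is exactly the premise the question above is probing.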

> FileSinkOperator should remove duplicated files from the same task based on 
> file sizes
> --
>
> Key: HIVE-1492
> URL: https://issues.apache.org/jira/browse/HIVE-1492
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1492.patch, HIVE-1492_branch-0.6.patch
>
>
> FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to 
> retain only one file for each task. A task could produce multiple files due 
> to failed attempts or speculative runs. The largest file should be retained 
> rather than the first file for each task. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Hive should start moving to the new hadoop mapreduce api.

2010-07-29 Thread Edward Capriolo
Aren't these things mutually exclusive?
The new MapReduce API appeared in 0.20.
Deprecating 0.17 seems reasonable, but we still have to support the old
API for 0.18 and 0.19, correct?

On Thu, Jul 29, 2010 at 2:11 PM, Ashish Thusoo  wrote:
> +1 to this
>
> Ashish
>
> -Original Message-
> From: yongqiang he [mailto:heyongqiang...@gmail.com]
> Sent: Thursday, July 29, 2010 10:54 AM
> To: hive-dev@hadoop.apache.org
> Subject: Hive should start moving to the new hadoop mapreduce api.
>
> Hi all,
>
> In offline discussions while fixing HIVE-1492, we think it may be good now 
> to start thinking about moving Hive to use the new MapReduce context API, and also 
> to start deprecating Hadoop-0.17.0 support in Hive.
> Basically the new MapReduce API gives Hive more control at runtime.
>
> Any thoughts on this?
>
>
> Thanks
>


[jira] Updated: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface

2010-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1294:
--

   Status: Patch Available  (was: Open)
Fix Version/s: 0.6.0

HWI does not start correctly without this patch.

> HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
> 
>
> Key: HIVE-1294
> URL: https://issues.apache.org/jira/browse/HIVE-1294
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Edward Capriolo
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: hive-1294.patch.txt
>
>
> The Hive Webserver fails to startup with the following error message, if 
> HIVE_AUX_JARS_PATH environment variable is set (works fine if unset).   
> $ build/dist/bin/hive --service hwi
> Exception in thread "main" java.io.IOException: Error opening job jar: 
> -libjars
>at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.util.zip.ZipException: error in opening zip file
>at java.util.zip.ZipFile.open(Native Method)
>at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>at java.util.jar.JarFile.<init>(JarFile.java:133)
>at java.util.jar.JarFile.<init>(JarFile.java:70)
>at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> Slightly modifying the command line to launch hadoop in hwi.sh solves the 
> problem:
> $ diff bin/ext/hwi.sh  /tmp/new-hwi.sh
> 28c28
> <   exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS "$@"
> ---
> >   exec $HADOOP jar ${HWI_JAR_FILE}  $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS 
> > "$@"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface

2010-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1294:
--

Attachment: hive-1294.patch.txt

> HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
> 
>
> Key: HIVE-1294
> URL: https://issues.apache.org/jira/browse/HIVE-1294
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Edward Capriolo
>Priority: Blocker
> Attachments: hive-1294.patch.txt
>
>
> The Hive Webserver fails to startup with the following error message, if 
> HIVE_AUX_JARS_PATH environment variable is set (works fine if unset).   
> $ build/dist/bin/hive --service hwi
> Exception in thread "main" java.io.IOException: Error opening job jar: 
> -libjars
>at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.util.zip.ZipException: error in opening zip file
>at java.util.zip.ZipFile.open(Native Method)
>at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>at java.util.jar.JarFile.<init>(JarFile.java:133)
>at java.util.jar.JarFile.<init>(JarFile.java:70)
>at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> Slightly modifying the command line to launch hadoop in hwi.sh solves the 
> problem:
> $ diff bin/ext/hwi.sh  /tmp/new-hwi.sh
> 28c28
> <   exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS "$@"
> ---
> >   exec $HADOOP jar ${HWI_JAR_FILE}  $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS 
> > "$@"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface

2010-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo reassigned HIVE-1294:
-

Assignee: Edward Capriolo

> HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
> 
>
> Key: HIVE-1294
> URL: https://issues.apache.org/jira/browse/HIVE-1294
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Edward Capriolo
>Priority: Minor
>
> The Hive Webserver fails to startup with the following error message, if 
> HIVE_AUX_JARS_PATH environment variable is set (works fine if unset).   
> $ build/dist/bin/hive --service hwi
> Exception in thread "main" java.io.IOException: Error opening job jar: 
> -libjars
>at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.util.zip.ZipException: error in opening zip file
>at java.util.zip.ZipFile.open(Native Method)
>at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>at java.util.jar.JarFile.<init>(JarFile.java:133)
>at java.util.jar.JarFile.<init>(JarFile.java:70)
>at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> Slightly modifying the command line to launch hadoop in hwi.sh solves the 
> problem:
> $ diff bin/ext/hwi.sh  /tmp/new-hwi.sh
> 28c28
> <   exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS "$@"
> ---
> >   exec $HADOOP jar ${HWI_JAR_FILE}  $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS 
> > "$@"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface

2010-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1294:
--

Priority: Blocker  (was: Minor)

> HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
> 
>
> Key: HIVE-1294
> URL: https://issues.apache.org/jira/browse/HIVE-1294
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 0.5.0
>Reporter: Dilip Joseph
>Assignee: Edward Capriolo
>Priority: Blocker
>
> The Hive Webserver fails to startup with the following error message, if 
> HIVE_AUX_JARS_PATH environment variable is set (works fine if unset).   
> $ build/dist/bin/hive --service hwi
> Exception in thread "main" java.io.IOException: Error opening job jar: 
> -libjars
>at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.util.zip.ZipException: error in opening zip file
>at java.util.zip.ZipFile.open(Native Method)
>at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>at java.util.jar.JarFile.<init>(JarFile.java:133)
>at java.util.jar.JarFile.<init>(JarFile.java:70)
>at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> Slightly modifying the command line to launch hadoop in hwi.sh solves the 
> problem:
> $ diff bin/ext/hwi.sh  /tmp/new-hwi.sh
> 28c28
> <   exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS "$@"
> ---
> >   exec $HADOOP jar ${HWI_JAR_FILE}  $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS 
> > "$@"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Hive Web Interface Broken YET AGAIN!

2010-07-29 Thread Edward Capriolo
All,

While the web interface is not as widely used as the CLI, people do
use it. Its init process has been broken three times that I can remember: once
by the shims, once by adding version numbers to the jars, and now it
is affected by the -libjars change.

[r...@etl02 ~]# hive --service hwi
Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

I notice someone patched the CLI to deal with this. There is no test
coverage for the shell scripts.

But it seems like only some of the scripts were repaired:

bin/ext/cli.sh
bin/ext/lineage.sh
bin/ext/metastore.sh

I wonder why only half of the scripts were repaired. In general, if
something changes in Hive or Hadoop that causes the CLI to break, we
should fix it across the board. I feel like every time a release is
coming up, I test-drive the web interface only to find that a simple script
problem stops it from running.

Edward


[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-07-28 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: cas-handle.tar.gz

This is not a quality patch yet. I am still experimenting with some ideas. 
Everything is free-form and will likely change before the final patch. There are 
a few junk files (HiveIColumn, etc.) which will not be part of the release.
Thus far:
CassandraSplit.java
HiveCassandraTableInputFormat.java
CassandraSerDe.java
TestColumnFamilyInputFormat.java
TestCassandraPut.java

Are working and can give you an idea of where the code is going.

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, hive-1434-1.txt
>
>
> Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Next Hive Contributors Meeting: August 9th @ Facebook

2010-07-26 Thread Edward Capriolo
On Wed, Jul 21, 2010 at 6:29 PM, John Sichi  wrote:
> Hi Ed,
>
> To keep the meetup format lightweight, we'll go without the teleconference, 
> but if you'd like, I can separately set up a periodic conf call for 
> committers outside the Bay Area to give you a chance to ask questions and air 
> any concerns.  How does that sound?
>
> JVS
>
> On Jul 21, 2010, at 1:48 PM, Edward Capriolo wrote:
>
>> Hey,
>>
>> Sorry that I suggested/asked for the teleconference and then
>> shamefully missed it.  I would ask you to reconsider, hopefully we can
>> work out the kinks, or do you have some summer interns floating around
>> that can handle it :)
>>
>> Edward
>>
>>
>>
>>
>>
>> On Wed, Jul 21, 2010 at 3:24 PM, John Sichi  wrote:
>>> +hive-user
>>>
>>> For the last contributor meeting, we had a teleconference setup, but it
>>> burned a bit of time on technical difficulties and ended up not even being
>>> used.  So going forward we will skip this part, but we'll still make sure to
>>> publish the slides for any presentations in the meeting summary, as well as
>>> videos in cases where we have facilities for recording the presenters.
>>> JVS
>>> Begin forwarded message:
>>>
>>> From: Carl Steinbach 
>>> Date: July 20, 2010 6:08:18 PM PDT
>>> To: 
>>> Subject: Next Hive Contributors Meeting: August 9th @ Facebook
>>> Reply-To: 
>>>
>>> *What*: Hive Contributors Meeting: August 9th @
>>> Facebook<http://www.meetup.com/Hive-Contributors-Group/calendar/14164112/>
>>>
>>> *When*: Monday, August 9, 2010 4:00 PM
>>>
>>> *Where*: Facebook HQ
>>> 1601 South California Avenue
>>> Palo Alto, CA 94304
>>>
>>> The next Hive Contributors Meeting will occur on August 9th from 4-6pm at
>>> Facebook's offices in Palo Alto.
>>>
>>> You must RSVP if you plan to attend this event.
>>>
>>> RSVP to this Meetup:
>>> http://www.meetup.com/Hive-Contributors-Group/calendar/14164112/
>>>
>>>
>
>

That would be nice. As a suggestion/reminder, we do have ##hive on
freenode; we seem to be under-represented there. I think we could get
good synergy and move processes along quicker over IRC.


[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script

2010-07-21 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1414:
--

Status: Patch Available  (was: Open)

> automatically invoke .hiverc init script
> 
>
> Key: HIVE-1414
> URL: https://issues.apache.org/jira/browse/HIVE-1414
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: John Sichi
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: hive-1414-2.txt, hive-1414-patch-1.txt
>
>
> Similar to .bashrc but run Hive SQL commands.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script

2010-07-21 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1414:
--

Attachment: hive-1414-2.txt

The new version only reads .hiverc if the -i option is not specified. Includes xdocs.

> automatically invoke .hiverc init script
> 
>
> Key: HIVE-1414
> URL: https://issues.apache.org/jira/browse/HIVE-1414
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: John Sichi
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: hive-1414-2.txt, hive-1414-patch-1.txt
>
>
> Similar to .bashrc but run Hive SQL commands.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Next Hive Contributors Meeting: August 9th @ Facebook

2010-07-21 Thread Edward Capriolo
Hey,

Sorry that I suggested/asked for the teleconference and then
shamefully missed it.  I would ask you to reconsider, hopefully we can
work out the kinks, or do you have some summer interns floating around
that can handle it :)

Edward





On Wed, Jul 21, 2010 at 3:24 PM, John Sichi  wrote:
> +hive-user
>
> For the last contributor meeting, we had a teleconference setup, but it
> burned a bit of time on technical difficulties and ended up not even being
> used.  So going forward we will skip this part, but we'll still make sure to
> publish the slides for any presentations in the meeting summary, as well as
> videos in cases where we have facilities for recording the presenters.
> JVS
> Begin forwarded message:
>
> From: Carl Steinbach 
> Date: July 20, 2010 6:08:18 PM PDT
> To: 
> Subject: Next Hive Contributors Meeting: August 9th @ Facebook
> Reply-To: 
>
> *What*: Hive Contributors Meeting: August 9th @
> Facebook
>
> *When*: Monday, August 9, 2010 4:00 PM
>
> *Where*: Facebook HQ
> 1601 South California Avenue
> Palo Alto, CA 94304
>
> The next Hive Contributors Meeting will occur on August 9th from 4-6pm at
> Facebook's offices in Palo Alto.
>
> You must RSVP if you plan to attend this event.
>
> RSVP to this Meetup:
> http://www.meetup.com/Hive-Contributors-Group/calendar/14164112/
>
>


Alternate to java beans xml serializer for query plan

2010-07-20 Thread Edward Capriolo
I was unable to locate a ticket on this, but I know a few people have
brought this up. A query with a large number of OR clauses (>100)
causes Hive to throw an OOM. Has anyone looked into an XML serializer that
will not OOM regardless of how large the query is?
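For background, the serializer in question here is java.beans.XMLEncoder, which Hive used for query plans around this time. The encoder materializes the whole object graph (plus bookkeeping for shared references) before writing, which is one plausible reason a very wide predicate tree can exhaust the heap. A minimal, self-contained demonstration of the mechanism follows; the list of strings is a stand-in for Hive's actual plan classes, not real Hive code.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;

public class PlanSerDemo {
    // Serialize an object graph with java.beans.XMLEncoder. The encoder walks
    // the full graph in memory, so very large graphs (e.g. a plan holding
    // hundreds of OR branches) can blow the heap before any bytes are written.
    static String toXml(Object obj) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder enc = new XMLEncoder(out)) {
            enc.writeObject(obj);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ArrayList<String> clauses = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            clauses.add("col = " + i);  // stand-in for 100 OR branches
        }
        System.out.println(toXml(clauses).contains("col = 42")); // true
    }
}
```

A streaming serializer (writing each subtree as it is visited, rather than encoding the whole graph at once) is the kind of alternative the question above is asking about.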

Edward


Re: Notes from the last Hive Contributors Meeting

2010-07-15 Thread Edward Capriolo
On Thu, Jul 15, 2010 at 2:02 PM, John Sichi  wrote:
> Create/Drop View
>
> Note: View support is only available starting in Hive 0.6.
> I added this caveat when I added the CREATE/DROP view section to the DDL
> page.  For the most part, we've been following this convention, and I think
> we should keep it up whatever the final decision on docs is (copy from wiki
> to xdocs during release, or maintain only in xdocs).
> Personally, I find wiki much friendlier in general, but the wiki software
> (MoinMoin) we are using leaves a lot to be desired compared to mediawiki or
> Confluence.
> JVS
> On Jul 15, 2010, at 7:30 AM, Edward Capriolo wrote:
>
> So a user is running Hive and reads the wiki, and says "Wow, we have
> view support, let me try this." This fails because views are only in
> trunk. This gives people a generally bad impression of Hive because
> they expect trunk features, because they have no authoritative
> documentation for THEIR VERSION. Users can be fickle, and if they hit
> incorrect documentation they start to get the impression the software
> is "buggy"; suddenly they start questioning everything and bringing
> every problem to the Hive administrator because, even though they wrote
> a query wrong, their first instinct is to "blame Hive".
>
>
>
>
> I find editing xdocs EASIER than working with wiki. Wiki is great and
> all but in my travels I have to work on 5 different wiki's they all
> are slightly different in what they support and their mark up. We
> should be able to commit xdoc patches without full unit tests. Keeping
> the xdoc up to date should not be an issue because we should simply
> not accept a patch that changes/adds functionality without some xdoc.
>
> Another issue right now is there are features that are NOT documented
> anywhere. When a user asks about those features I have to send them to
> Jira tickets; oftentimes the ticket will have a long back-and-forth
> where the feature is debated, or sometimes just a patch. You never see
> the full syntax, and it can be very confusing; I often end up telling them
> to dig through a .q file inside a patch to figure out what this
> feature is and how to use it. While most people are good about
> updating the wiki, we know that things tend to fall through the cracks.
>
> I think there is still a place for wiki: free-form, multi-person
> planning, etc. But I do not think a mature software product can ever
> have authoritative documentation in a wiki.
>
>




Re: Notes from the last Hive Contributors Meeting

2010-07-15 Thread Edward Capriolo
On Thu, Jul 15, 2010 at 5:15 AM, Carl Steinbach  wrote:
> Hi,
>
> Notes from the last Hive Contributors Meeting are now available on the
> wiki: http://wiki.apache.org/hadoop/HiveContributorsMinutes100706
>
> Thanks.
>
> Carl
>

Sorry I did not get to listen to the event. So one topic of interest for me is:

"Several people voiced concerns that developers/users are less likely
to update the documentation if doing so requires them to submit a
patch."

I think this is a valid concern; however, I want to point out a few
bigger-picture things. First, I want to point out what I think is a
great shining example of documentation.

http://hornetq.sourceforge.net/docs/hornetq-2.1.1.Final/user-manual/en/html/index.html

hbase does a nice job as well.
http://hbase.apache.org/docs/r0.20.5/metrics.html

While I think the Hive documentation on the wiki is better than most
wikis, it has some issues. Here is an example: I am running Hive 0.5.

So a user is running Hive and reads the wiki, and says "Wow, we have
view support, let me try this." This fails because views are only in
trunk. This gives people a generally bad impression of Hive because
they expect trunk features, because they have no authoritative
documentation for THEIR VERSION. Users can be fickle, and if they hit
incorrect documentation they start to get the impression the software
is "buggy"; suddenly they start questioning everything and bringing
every problem to the Hive administrator because, even though they wrote
a query wrong, their first instinct is to "blame Hive".

I find editing xdocs EASIER than working with wiki. Wiki is great and
all but in my travels I have to work on 5 different wiki's they all
are slightly different in what they support and their mark up. We
should be able to commit xdoc patches without full unit tests. Keeping
the xdoc up to date should not be an issue because we should simply
not accept a patch that changes/adds functionality without some xdoc.

Another issue right now is there are features that are NOT documented
anywhere. When a user asks about those features I have to send them to
Jira tickets; oftentimes the ticket will have a long back-and-forth
where the feature is debated, or sometimes just a patch. You never see
the full syntax, and it can be very confusing; I often end up telling them
to dig through a .q file inside a patch to figure out what this
feature is and how to use it. While most people are good about
updating the wiki, we know that things tend to fall through the cracks.

I think there is still a place for wiki: free-form, multi-person
planning, etc. But I do not think a mature software product can ever
have authoritative documentation in a wiki.


[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-07-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Attachment: HIVE-471.4.patch

> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.1
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
> HIVE-471.3.patch, HIVE-471.4.patch, hive-471.diff
>
>
> There are many methods in java that are static and have no arguments or can 
> be invoked with one simple parameter. More complicated functions will require 
> a UDF but one generic one can work as a poor-mans UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
> "isEmpty")
> FROM src LIMIT 1;
> {noformat}
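What a call like reflect("java.lang.String", "valueOf", 1) does can be sketched with plain java.lang.reflect: resolve the class, look up a matching method, invoke it. This is a simplified, self-contained stand-in (Hive's actual UDF does ObjectInspector-based argument matching and also handles instance methods like isEmpty; this sketch covers static methods only, and the int-unboxing helper is an illustration).

```java
import java.lang.reflect.Method;

public class ReflectSketch {
    // Roughly what reflect("java.lang.String", "valueOf", 1) does under the
    // hood. Static methods only in this sketch (invoke with a null receiver).
    static Object reflect(String className, String methodName, Object... args) {
        try {
            Class<?> clazz = Class.forName(className);
            Class<?>[] types = new Class<?>[args.length];
            for (int i = 0; i < args.length; i++) {
                Class<?> c = args[i].getClass();
                // getMethod needs int.class to find valueOf(int), not Integer.class
                types[i] = (c == Integer.class) ? int.class : c;
            }
            Method m = clazz.getMethod(methodName, types);
            return m.invoke(null, args); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reflect("java.lang.String", "valueOf", 1)); // 1
    }
}
```

The "poor-man's UDF" framing above follows from this: one generic function covers any static method that takes simple parameter types, without writing a dedicated UDF per method.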

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-07-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Status: Patch Available  (was: Open)

> A UDF for simple reflection
> ---
>
> Key: HIVE-471
> URL: https://issues.apache.org/jira/browse/HIVE-471
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.1
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
> HIVE-471.3.patch, HIVE-471.4.patch, hive-471.diff
>
>
> There are many methods in java that are static and have no arguments or can 
> be invoked with one simple parameter. More complicated functions will require 
> a UDF but one generic one can work as a poor-mans UDF.
> {noformat}
> SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
> "isEmpty")
> FROM src LIMIT 1;
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Hive documentation

2010-07-13 Thread Edward Capriolo
All,

We are starting to move documentation into Hive trunk in the form of
xdocs. If you are thinking of adding to or modifying the wiki, you may
want to consider writing an xdoc instead.

Trunk already has support for xdoc and the only xdoc related ticket
open is this one.

https://issues.apache.org/jira/browse/HIVE-1446

Edward


[jira] Updated: (HIVE-1446) Move Hive Documentation from the wiki to version control

2010-07-13 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1446:
--

Attachment: hive-1446-part-1.diff

Got most of the language manual

> Move Hive Documentation from the wiki to version control
> 
>
> Key: HIVE-1446
> URL: https://issues.apache.org/jira/browse/HIVE-1446
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1446-part-1.diff, hive-1446.diff, hive-logo-wide.png
>
>
> Move the Hive Language Manual (and possibly some other documents) from the 
> Hive wiki to version control. This work needs to be coordinated with the 
> hive-dev and hive-user community in order to avoid missing any edits as well 
> as to avoid or limit unavailability of the docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1096) Hive Variables

2010-07-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886912#action_12886912
 ] 

Edward Capriolo commented on HIVE-1096:
---

I am having trouble uploading with the update-diff function of the review 
board. As I mentioned several times, I really had one simple requirement:

{noformat}
hive -hiveconf DAY=5 -e "LOAD DATA INFILE '/tmp/${DAY}' into logs 
partition=${DAY}"
{noformat}

I am all for doing things 100% correctly, but this is such a simple thing, and 
I am really getting worn out by the endless revisions and fancy additions 
just because someone might want to do ${x${y}bla}. 
Really, I would like to make this ticket go +1 and get on with something more 
interesting.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0, 0.7.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
> hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run we can do string substitutions at that level, 
> and further downstream need not be effected. 
> There could be some benefits to doing this further downstream, parser,plan. 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
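The core requirement above — substituting ${VAR} into a query string before compilation — can be sketched with a single regex pass. This is a minimal sketch, not Hive's actual SetProcessor/Driver logic: it ignores namespaces such as hiveconf: and system:, and deliberately does not handle nesting like ${x${y}bla}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VarSubstitution {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Single-pass substitution: ${name} is replaced from the supplied map;
    // unknown variables are left in place rather than erased.
    public static String substitute(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String val = vars.get(m.group(1));
            m.appendReplacement(sb,
                    Matcher.quoteReplacement(val != null ? val : m.group(0)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("DAY", "5");
        System.out.println(substitute(
                "LOAD DATA INFILE '/tmp/${DAY}' INTO logs PARTITION=${DAY}", vars));
        // prints: LOAD DATA INFILE '/tmp/5' INTO logs PARTITION=5
    }
}
```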



[jira] Updated: (HIVE-1096) Hive Variables

2010-07-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Attachment: hive-1096-12.patch.txt

Changed interpolate to substitute. Added the substitution logic to file, dfs, 
set, and the query processor.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Fix For: 0.6.0, 0.7.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
> hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run we can do string substitutions at that level, 
> and further downstream need not be effected. 
> There could be some benefits to doing this further downstream, parser,plan. 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1096) Hive Variables

2010-07-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Status: Patch Available  (was: Open)

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Fix For: 0.6.0, 0.7.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
> hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run we can do string substitutions at that level, 
> and further downstream need not be effected. 
> There could be some benefits to doing this further downstream, parser,plan. 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Reminder: Hive Contributors Meeting is Tomorrow

2010-07-05 Thread Edward Capriolo
On Mon, Jul 5, 2010 at 4:21 PM, Carl Steinbach  wrote:
> Hi Everyone,
>
> The next installment of the monthly Hive Contributors Meeting is convening
> tomorrow from 3-5pm at Cloudera's offices in Palo Alto. If you are planning
> to attend and have not already done so, please officially sign up at
>
> http://www.meetup.com/Hive-Contributors-Group
>
> (If you're not already a member of the Hive Contributors Group you'll have
> to join.)
>
> The proposed agenda is:
>
> 3-3:15 Introductions
>
> 3:15 - 4:15 Share what we're working on
>        - HOwl overview and update from Olga Natkovich
>        - Beeswax demo from bc Wong
>
> 4:15 - 5:00 0.6 Release Management Discussion
>
> Meeting Location:
> Cloudera
> 820 Portage Ave.
> Palo Alto, CA 94306
> http://www.google.com/maps?q=210+Portage+Ave,+Palo+Alto,+CA+94306
>
> Thanks.
>
> Carl
>

Please find a way to do an audio, video, or go-to-meeting session for those on
the east coast.

Thank you,
Edward


[jira] Commented: (HIVE-1447) Speed up reflection method calls

2010-07-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884488#action_12884488
 ] 

Edward Capriolo commented on HIVE-1447:
---

Does this have an effect on all generic UDFs? I had someone tell me that 
they added one of my generic UDFs to a query on a large table and the time to 
complete the job went up dramatically.

> Speed up reflection method calls
> 
>
> Key: HIVE-1447
> URL: https://issues.apache.org/jira/browse/HIVE-1447
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: A.java, HIVE-1447.1.patch
>
>
> See http://www.cowtowncoder.com/blog/archives/2010/04/entry_396.html and 
> http://www.jguru.com/faq/view.jsp?EID=246569
> There is a huge drop of overhead (more than half) if we do 
> "field.setAccessible(true)" for the field that we want to access.
> I did a simple experiment and that worked well with method as well.
> The results are (note that the method just add 1 to an integer):
> {code}
> 1 regular method calls:26 milliseconds.
> 1 reflective method calls without lookup:4029 milliseconds.
> 1 accessible reflective method calls without lookup:1810 milliseconds.
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
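The measurement referenced in this ticket is easy to reproduce. A rough sketch of the micro-benchmark follows: a direct call, a plain reflective call, and a reflective call after setAccessible(true), which skips the per-invocation access check. Absolute timings depend on the JVM; only the relative ordering is meaningful, so no expected numbers are claimed.

```java
import java.lang.reflect.Method;

public class ReflectBench {
    public static int addOne(int x) { return x + 1; }

    public static void main(String[] args) throws Exception {
        final int N = 1_000_000;
        Method m = ReflectBench.class.getMethod("addOne", int.class);

        // 1) Direct method calls.
        long t0 = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < N; i++) acc = addOne(acc);
        long direct = System.nanoTime() - t0;

        // 2) Plain reflective calls (access check on every invoke).
        t0 = System.nanoTime();
        acc = 0;
        for (int i = 0; i < N; i++) acc = (Integer) m.invoke(null, acc);
        long plain = System.nanoTime() - t0;

        // 3) Reflective calls with the access check disabled.
        m.setAccessible(true);
        t0 = System.nanoTime();
        acc = 0;
        for (int i = 0; i < N; i++) acc = (Integer) m.invoke(null, acc);
        long accessible = System.nanoTime() - t0;

        System.out.printf("direct=%dms plain-reflective=%dms accessible=%dms%n",
                direct / 1_000_000, plain / 1_000_000, accessible / 1_000_000);
    }
}
```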



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-07-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884384#action_12884384
 ] 

Edward Capriolo commented on HIVE-1434:
---

I actually got pretty far with this by simply duplicating the logic in the HBase 
Storage Handler. Unfortunately I hit a snafu. Cassandra is not using the 
deprecated mapred.* API; its input format uses mapreduce.*. I have seen a few 
tickets for this, and as far as I know Hive is 100% mapred. So to get this done 
we either have to wait until Hive is converted to mapreduce, or I have to make 
an "old school" mapred-based input format for Cassandra.

@John, am I wrong? Is there a way to work with mapreduce input formats that I am 
not understanding?



> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: hive-1434-1.txt
>
>
> Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1446) Move Hive Documentation from the wiki to version control

2010-06-30 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1446:
--

Attachment: hive-1446.diff

Includes the image in the vsl to fix the alignment

> Move Hive Documentation from the wiki to version control
> 
>
> Key: HIVE-1446
> URL: https://issues.apache.org/jira/browse/HIVE-1446
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1446.diff, hive-logo-wide.png
>
>
> Move the Hive Language Manual (and possibly some other documents) from the 
> Hive wiki to version control. This work needs to be coordinated with the 
> hive-dev and hive-user community in order to avoid missing any edits as well 
> as to avoid or limit unavailability of the docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1446) Move Hive Documentation from the wiki to version control

2010-06-30 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1446:
--

Attachment: hive-logo-wide.png

We need this wide logo to fix the alignment of the generated docs

> Move Hive Documentation from the wiki to version control
> 
>
> Key: HIVE-1446
> URL: https://issues.apache.org/jira/browse/HIVE-1446
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-logo-wide.png
>
>
> Move the Hive Language Manual (and possibly some other documents) from the 
> Hive wiki to version control. This work needs to be coordinated with the 
> hive-dev and hive-user community in order to avoid missing any edits as well 
> as to avoid or limit unavailability of the docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1446) Move Hive Documentation from the wiki to version control

2010-06-30 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884142#action_12884142
 ] 

Edward Capriolo commented on HIVE-1446:
---

I will make an xdoc of the CLI page

> Move Hive Documentation from the wiki to version control
> 
>
> Key: HIVE-1446
> URL: https://issues.apache.org/jira/browse/HIVE-1446
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
>
> Move the Hive Language Manual (and possibly some other documents) from the 
> Hive wiki to version control. This work needs to be coordinated with the 
> hive-dev and hive-user community in order to avoid missing any edits as well 
> as to avoid or limit unavailability of the docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-30 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884141#action_12884141
 ] 

Edward Capriolo commented on HIVE-1135:
---

Carl, thank you for the assist!

> Use Anakia for version controlled documentation
> ---
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, 
> hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation in the way hadoop & hbase does, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-06-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: hive-1434-1.txt

Just a start. (To prove that I am doing something with this ticket)

> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hadoop Hive
>  Issue Type: New Feature
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: hive-1434-1.txt
>
>
> Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Review Request: HIVE-1096: Hive Variables

2010-06-29 Thread Edward Capriolo
0:22, Carl Steinbach wrote:
> > trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java, 
> > line 93
> > <http://review.hbase.org/r/229/diff/1/?file=1604#file1604line93>
> >
> > I don't think you want to perform variable substitution at this point. 
> > It makes it impossible to create nested variables.

I will check it out.


- Edward


---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/229/#review288
---


On 2010-06-23 20:02:57, Edward Capriolo wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/229/
> ---
> 
> (Updated 2010-06-23 20:02:57)
> 
> 
> Review request for Hive Developers.
> 
> 
> Summary
> ---
> 
> Hive Variables
> 
> 
> This addresses bug HIVE-1096.
> http://issues.apache.org/jira/browse/HIVE-1096
> 
> 
> Diffs
> -
> 
>   trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 955109 
>   trunk/conf/hive-default.xml 955109 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 955109 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 
> 955109 
>   trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q 
> PRE-CREATION 
>   trunk/ql/src/test/results/clientpositive/set_processor_namespaces.q.out 
> PRE-CREATION 
> 
> Diff: http://review.hbase.org/r/229/diff
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Edward
> 
>



Re: 6.0 and trunk look broken to me

2010-06-25 Thread Edward Capriolo
On Wed, Jun 23, 2010 at 11:52 PM, Edward Capriolo  wrote:
> On Wed, Jun 23, 2010 at 10:48 PM, John Sichi  wrote:
>> Did you get past this?  It looks like some kind of bad build.
>>
>> JVS
>>
>> On Jun 23, 2010, at 2:38 PM, Ashish Thusoo wrote:
>>
>>> Not sure if this is just my env but on 0.6.0 when I run the unit tests I 
>>> get a bunch of errors of the following form:
>>>
>>>    [junit] Begin query: alter3.q
>>>    [junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
>>>    [junit]     at 
>>> org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
>>>    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
>>> Method)
>>>    [junit]     at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>    [junit]     at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>    [junit]     at java.lang.reflect.Method.invoke(Method.java:597)
>>>    [junit]     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>>    [junit]     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
>>>    [junit]     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>    [junit]     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>    [junit]     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
>>>    [junit]
>>>
>>> -Original Message-
>>> From: John Sichi [mailto:jsi...@facebook.com]
>>> Sent: Wednesday, June 23, 2010 2:15 PM
>>> To: 
>>> Subject: Re: 6.0 and trunk look broken to me
>>>
>>> (You mean 0.6, right?)
>>>
>>> I'm not able to reproduce this (just tested with latest trunk on Linux and 
>>> Mac).  Is anyone else seeing it?
>>>
>>> JVS
>>>
>>> On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:
>>>
>>>> Trunk and 6.0 both show this in hadoop local mode and hadoop distributed 
>>>> mode.
>>>>
>>>> export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_loca
>>>> edw...@ec dist]$ export
>>>> HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local[edw...@ec dist]$
>>>> bin/hive Hive history
>>>> file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
>>>> hive> show tables;
>>>> FAILED: Parse Error: line 0:-1 cannot recognize input ''
>>>>
>>>> [edw...@ec dist]$ more /tmp/edward/hive.log
>>>> 2010-06-23 16:41:00,749 ERROR ql.Driver
>>>> (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1
>>>> cannot recognize input ''
>>>>
>>>> org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot
>>>> recognize input ''
>>>>
>>>>      at 
>>>> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
>>>>      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
>>>>      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
>>>>      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>>>>      at 
>>>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>>>>      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>      at 
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>      at 
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>>      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>
>>
>>
>
> I do not know what is up. I cleaned up my .ivy2, checked out, and ran the
> build again. I guess if no one else is seeing it, it must be something
> on my system.
>
> Total time: 3 minutes 7 seconds
> [edw...@ec hive_6_pre]$ cd build/dist/
> [edw...@ec dist]$ cd ../hive-trunk/^C
> [edw...@ec dist]$ ls
> bin  conf  examples  lib  README.txt
> [edw...@ec dist]$ bin/hive
> Hive history file=/tmp/edward/hive_job_log_edward_201006232341_41029014.txt
> hive> show tables;
> FAILED: Parse Error: line 0:-1 cannot recognize input ''
>
> hive> exit;
> [edw...@ec dist]$ ant -v
> Apache Ant version 1.8.0 compiled on February 1 2010
> Trying the default build file: build.xml
> Buildfile: build.xml does not exist!
> Build failed
> [edw...@ec dist]$ java -v
> Unrecognized option: -v
> Could not create the Java virtual machine.
> [edw...@ec dist]$ java -version
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>

The JDO upgrade issue addressed in HIVE-1176
fixed this issue for me.


[jira] Created: (HIVE-1434) Cassandra Storage Handler

2010-06-24 Thread Edward Capriolo (JIRA)
Cassandra Storage Handler
-

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo


Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882349#action_12882349
 ] 

Edward Capriolo commented on HIVE-1135:
---

Bump: I will fix the formatting later. Can we commit this? We do not really 
need any unit tests here.

> Use Anakia for version controlled documentation
> ---
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, 
> hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed, and 
> users running older/newer branches cannot locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation in the way hadoop & hbase does, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: 6.0 and trunk look broken to me

2010-06-23 Thread Edward Capriolo
On Wed, Jun 23, 2010 at 10:48 PM, John Sichi  wrote:
> Did you get past this?  It looks like some kind of bad build.
>
> JVS
>
> On Jun 23, 2010, at 2:38 PM, Ashish Thusoo wrote:
>
>> Not sure if this is just my env but on 0.6.0 when I run the unit tests I get 
>> a bunch of errors of the following form:
>>
>>    [junit] Begin query: alter3.q
>>    [junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
>>    [junit]     at 
>> org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
>>    [junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>    [junit]     at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>    [junit]     at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>    [junit]     at java.lang.reflect.Method.invoke(Method.java:597)
>>    [junit]     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>    [junit]     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
>>    [junit]     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>    [junit]     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>    [junit]     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
>>    [junit]
>>
>> -Original Message-
>> From: John Sichi [mailto:jsi...@facebook.com]
>> Sent: Wednesday, June 23, 2010 2:15 PM
>> To: 
>> Subject: Re: 6.0 and trunk look broken to me
>>
>> (You mean 0.6, right?)
>>
>> I'm not able to reproduce this (just tested with latest trunk on Linux and 
>> Mac).  Is anyone else seeing it?
>>
>> JVS
>>
>> On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:
>>
>>> Trunk and 6.0 both show this in hadoop local mode and hadoop distributed 
>>> mode.
>>>
>>> export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_loca
>>> edw...@ec dist]$ export
>>> HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local[edw...@ec dist]$
>>> bin/hive Hive history
>>> file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
>>> hive> show tables;
>>> FAILED: Parse Error: line 0:-1 cannot recognize input ''
>>>
>>> [edw...@ec dist]$ more /tmp/edward/hive.log
>>> 2010-06-23 16:41:00,749 ERROR ql.Driver
>>> (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1
>>> cannot recognize input ''
>>>
>>> org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot
>>> recognize input ''
>>>
>>>      at 
>>> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
>>>      at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
>>>      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
>>>      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
>>>      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
>>>      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>      at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>      at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>
>

I do not know what is up. I cleaned up my .ivy2, checked out, and ran the
build again. I guess if no one else is seeing it, it must be something
on my system.

Total time: 3 minutes 7 seconds
[edw...@ec hive_6_pre]$ cd build/dist/
[edw...@ec dist]$ cd ../hive-trunk/^C
[edw...@ec dist]$ ls
bin  conf  examples  lib  README.txt
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006232341_41029014.txt
hive> show tables;
FAILED: Parse Error: line 0:-1 cannot recognize input ''

hive> exit;
[edw...@ec dist]$ ant -v
Apache Ant version 1.8.0 compiled on February 1 2010
Trying the default build file: build.xml
Buildfile: build.xml does not exist!
Build failed
[edw...@ec dist]$ java -v
Unrecognized option: -v
Could not create the Java virtual machine.
[edw...@ec dist]$ java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)


[jira] Commented: (HIVE-1431) Hive CLI can't handle query files that begin with comments

2010-06-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882024#action_12882024
 ] 

Edward Capriolo commented on HIVE-1431:
---

We have a few tickets open; we really need to move all this stuff to a real 
parser so we can properly deal with things like ';', comments like this, and so 
on. It is painfully hard to work around all these types of things, and we 
never get to the root of the problem.

> Hive CLI can't handle query files that begin with comments
> --
>
> Key: HIVE-1431
> URL: https://issues.apache.org/jira/browse/HIVE-1431
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: CLI
>Reporter: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
>
> {code}
> % cat test.q
> -- This is a comment, followed by a command
> set -v;
> -- 
> -- Another comment
> --
> show tables;
> -- Last comment
> (master) [ ~/Projects/hive ]
> % hive < test.q
> Hive history file=/tmp/carl/hive_job_log_carl_201006231606_1140875653.txt
> hive> -- This is a comment, followed by a command
> > set -v;
> FAILED: Parse Error: line 2:0 cannot recognize input 'set'
> hive> -- 
> > -- Another comment
> > --
> > show tables;
> OK
> rawchunks
> Time taken: 5.334 seconds
> hive> -- Last comment
> > (master) [ ~/Projects/hive ]
> % 
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Review Request: Hive Variables

2010-06-23 Thread Edward Capriolo

---
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/229/
---

Review request for Hive Developers.


Summary
---

Hive Variables


Diffs
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 955109 
  trunk/conf/hive-default.xml 955109 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 955109 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 
955109 
  trunk/ql/src/test/queries/clientpositive/set_processor_namespaces.q 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/set_processor_namespaces.q.out 
PRE-CREATION 

Diff: http://review.hbase.org/r/229/diff


Testing
---


Thanks,

Edward



[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Attachment: hive-1096-11-patch.txt

It was not interpolating system: variables. Fixed, with a better test case.

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Fix For: 0.6.0, 0.7.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
> hive-1096-11-patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, 
> hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run we can do string substitutions at that level, 
> and further downstream need not be effected. 
> There could be some benefits to doing this further downstream, parser,plan. 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.
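
The string-substitution approach sketched in the quoted description (interpolating `${VAR}` at the Driver.compile level) could look roughly like the following. This is an illustrative sketch, not the committed SetProcessor/Driver code; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hedged sketch of ${var} interpolation as it could be applied to a
// query string before parsing. Unknown variables are left untouched so
// downstream parsing can surface them.
public class VarSubstitution {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    public static String substitute(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String val = vars.get(m.group(1));
            // Keep the literal ${name} when the variable is undefined.
            m.appendReplacement(out,
                Matcher.quoteReplacement(val != null ? val : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```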




6.0 and trunk look broken to me

2010-06-23 Thread Edward Capriolo
Trunk and 6.0 both show this in hadoop local mode and hadoop distributed mode.

[edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
hive> show tables;
FAILED: Parse Error: line 0:-1 cannot recognize input ''

[edw...@ec dist]$ more /tmp/edward/hive.log
2010-06-23 16:41:00,749 ERROR ql.Driver
(SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1
cannot recognize input ''

org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot
recognize input ''

at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Status: Patch Available  (was: Open)

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run we can do string substitutions at that level, 
> and further downstream need not be effected. 
> There could be some benefits to doing this further downstream, parser,plan. 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.




[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Attachment: hive-1096-10-patch.txt

Patch adds variable interpretation. 

> Hive Variables
> --
>
> Key: HIVE-1096
> URL: https://issues.apache.org/jira/browse/HIVE-1096
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.5.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, 
> hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff
>
>
> From mailing list:
> --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
> called "Variables." Basically you can define a variable via command-line 
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
> ${DT} within the hive queries. This could be extremely useful. I can't seem 
> to find this feature even on trunk. Is this feature currently anywhere in the 
> roadmap?--
> This could be implemented in many places.
> A simple place to put this is 
> in Driver.compile or Driver.run we can do string substitutions at that level, 
> and further downstream need not be effected. 
> There could be some benefits to doing this further downstream, parser,plan. 
> but based on the simple needs we may not need to overthink this.
> I will get started on implementing in compile unless someone wants to discuss 
> this more.




Re: real time query option

2010-06-23 Thread Edward Capriolo
On Wed, Jun 23, 2010 at 2:12 AM, Amr Awadallah  wrote:
> For low-latency queries you should either use HBase instead, or consider
> Hive over HBase, see:
>
> http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/
>
> -- amr
>
> On 6/22/2010 11:05 PM, jaydeep vishwakarma wrote:
>>
>> Hi,
>>
>> I want to avoid delta time to execute the queries. Every time even when
>> we fetch single row from hive tables it goes to typical map and reduce
>> process. Is there any platform which built on top of HDFS or hive table
>> which help me to get real time query data, I want to avoid filling data
>> to DB.
>>
>> Regards,
>> Jaydeep
>>
>> The information contained in this communication is intended solely for the
>> use of the individual or entity to whom it is addressed and others
>> authorized to receive it. It may contain confidential or legally privileged
>> information. If you are not the intended recipient you are hereby notified
>> that any disclosure, copying, distribution or taking any action in reliance
>> on the contents of this information is strictly prohibited and may be
>> unlawful. If you have received this communication in error, please notify us
>> immediately by responding to this email and then delete it from your system.
>> The firm is neither liable for the proper and complete transmission of the
>> information contained in this communication nor for any delay in its
>> receipt.
>

Hive by its nature is not real time, but there are some "REAL TIME"
options in hive that you might be able to take advantage of.

If your dataset is small:

set mapred.job.tracker=local;

This will give you a local 1-mapper, 1-reducer job. There is no
jobtracker start-up overhead; everything happens in-thread.

Option: pre-compute the result sets you want in real time.

select * from tablea where part=x

is NOT a map-reduce job. So if you have precomputed tablea, selecting
from it will be as fast as hadoop can stream it to your client.


[jira] Commented: (HIVE-1414) automatically invoke .hiverc init script

2010-06-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881206#action_12881206
 ] 

Edward Capriolo commented on HIVE-1414:
---

1 & 3 ok. 

As for #2, is there ever a condition where getenv() or home will be null? I 
doubt hive would work in the face of either of these conditions. I will add the 
null checks anyway.

> automatically invoke .hiverc init script
> 
>
> Key: HIVE-1414
> URL: https://issues.apache.org/jira/browse/HIVE-1414
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: John Sichi
>Assignee: Edward Capriolo
> Fix For: 0.7.0
>
> Attachments: hive-1414-patch-1.txt
>
>
> Similar to .bashrc but run Hive SQL commands.




[jira] Commented: (HIVE-1419) Policy on deserialization errors

2010-06-21 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880908#action_12880908
 ] 

Edward Capriolo commented on HIVE-1419:
---

I am looking through this and trying to wrap my head around it. Offhand, do you 
know what happens in this situation?

We have a table that we have added columns to over time:

create table tab (a int, b int);

Over time we have added more columns:

alter table tab add columns (c int);

This works fine for us, as selecting column c on older data returns null for 
that column. Will this behaviour be preserved?
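
The behaviour described here (older rows with fewer fields than the current schema yielding null for the missing trailing columns) follows from positional deserialization. A minimal sketch, purely illustrative and much simpler than Hive's actual SerDe code:

```java
import java.util.regex.Pattern;

// Hedged sketch: deserialize a delimited row against a schema of
// numColumns columns. Rows written before a column was added have
// fewer fields, and the missing trailing columns come back as null.
public class PositionalDeserialize {
    public static String[] deserialize(String row, char delim, int numColumns) {
        String[] fields = row.split(Pattern.quote(String.valueOf(delim)), -1);
        String[] out = new String[numColumns];
        for (int i = 0; i < numColumns; i++) {
            out[i] = i < fields.length ? fields[i] : null; // missing -> null
        }
        return out;
    }
}
```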

> Policy on deserialization errors
> 
>
> Key: HIVE-1419
> URL: https://issues.apache.org/jira/browse/HIVE-1419
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Vladimir Klimontovich
>Assignee: Vladimir Klimontovich
>Priority: Minor
> Fix For: 0.5.1, 0.6.0
>
> Attachments: corrupted_records_0.5.patch, 
> corrupted_records_0.5_ver2.patch, corrupted_records_trunk.patch, 
> corrupted_records_trunk_ver2.patch
>
>
> When deserializer throws an exception the whole map tasks fails (see 
> MapOperator.java file). It's not always an convenient behavior especially on 
> huge datasets where several corrupted lines could be a normal practice. 
> Proposed solution:
> 1) Have a counter of corrupted records
> 2) When a counter exceeds a limit (configurable via 
> hive.max.deserializer.errors property, 0 by default) throw an exception. 
> Otherwise just log and exception with WARN level.
> Patches for 0.5 branch and trunk are attached




Re: Is Hive/HBase reviewboard down ?

2010-06-19 Thread Edward Capriolo
You could just upload the patch to issues.apache.org. As I understand
it, the two things work in concert.

Edward

On Sat, Jun 19, 2010 at 9:27 AM, Prafulla Tekawade <
prafull...@users.sourceforge.net> wrote:

> http://review.hbase.org is in-accessible
> I wanted to put one patch for review.
> Can someone let me know if reviewboard is hosted at
> some other place ?
>
> --
> Best Regards,
> Prafulla V Tekawade
>


[jira] Commented: (HIVE-1414) automatically invoke .hiverc init script

2010-06-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880414#action_12880414
 ] 

Edward Capriolo commented on HIVE-1414:
---

Files automatically sourced: env[HIVE_HOME]/bin/.hiverc and 
property(user.home)/.hiverc. I think only the CLI needs these features. Users 
of the hive service are accessing the session through code, so repetition is 
not a problem; the same is true with JDBC. CLI users get the most benefit from 
the .hiverc. What do you think?

> automatically invoke .hiverc init script
> 
>
> Key: HIVE-1414
> URL: https://issues.apache.org/jira/browse/HIVE-1414
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: John Sichi
>Assignee: Edward Capriolo
> Attachments: hive-1414-patch-1.txt
>
>
> Similar to .bashrc but run Hive SQL commands.




[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script

2010-06-18 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1414:
--

Attachment: hive-1414-patch-1.txt

First attempt at patch.

> automatically invoke .hiverc init script
> 
>
> Key: HIVE-1414
> URL: https://issues.apache.org/jira/browse/HIVE-1414
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: John Sichi
>Assignee: Edward Capriolo
> Attachments: hive-1414-patch-1.txt
>
>
> Similar to .bashrc but run Hive SQL commands.




[jira] Assigned: (HIVE-1414) automatically invoke .hiverc init script

2010-06-18 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo reassigned HIVE-1414:
-

Assignee: Edward Capriolo

> automatically invoke .hiverc init script
> 
>
> Key: HIVE-1414
> URL: https://issues.apache.org/jira/browse/HIVE-1414
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: 0.5.0
>Reporter: John Sichi
>Assignee: Edward Capriolo
>
> Similar to .bashrc but run Hive SQL commands.




[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880303#action_12880303
 ] 

Edward Capriolo commented on HIVE-1135:
---

Great on ivy. 
As for the wiki, I think we should just put a note at the top of the pages we 
have migrated that says "Do not edit me. Edit xdocs instead." I want to do 
about a page every other day, so it should be done soon enough. I actually have 
commit access, but I usually leave the commits up to the experts. Also, since I 
worked on this ticket, I really should not be the commit person. Anyone else?

> Use Anakia for version controlled documentation
> ---
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, 
> hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed. Or 
> using running older/newer branches can not locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation in the way hadoop & hbase does, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880207#action_12880207
 ] 

Edward Capriolo commented on HIVE-1405:
---

I was thinking we just look for hive_rc in the user's home directory and/or in 
hive_home/bin. If we find that file, we have to read it line by line and 
process each line just like other hive commands. We could restrict this to just 
set or add commands, but there is no reason it could not have a full query.
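
Reading the file line by line and handing each statement to the normal command path could be sketched as follows. This is illustrative, not the committed patch; `processLine` here is a stand-in for the real CliDriver.processLine, and the comment syntax handled is an assumption.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hedged sketch of sourcing a .hiverc at CLI startup: each non-empty,
// non-comment line is fed to the same processing path as an interactive
// command, and the number of processed statements is returned.
public class HiveRcLoader {
    public static int sourceFile(String path,
            java.util.function.Consumer<String> processLine) throws IOException {
        int processed = 0;
        try (BufferedReader r = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = r.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("--")) {
                    continue; // skip blank lines and comments
                }
                processLine.accept(line);
                processed++;
            }
        }
        return processed;
    }
}
```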

> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880053#action_12880053
 ] 

Edward Capriolo commented on HIVE-1405:
---

{noformat}
[edw...@ec dist]$ echo show tables > a.sql
[edw...@ec dist]$ bin/hive
[edw...@ec dist]$ chmod a+x a.sql 
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006172223_1189860304.txt
[edw...@ec dist]$ pwd
/mnt/data/hive/hive/build/dist
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006172223_310534855.txt
hive> ! /mnt/data/hive/hive/build/dist/a.sql;
/mnt/data/hive/hive/build/dist/a.sql: line 1: show: command not found
Command failed with exit code = 127
{noformat}

! seems to execute bash commands.

Don't we want to execute hive commands inside hive, like add jar?


> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




[jira] Updated: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-17 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: hive-1135-5-patch.txt

Added the join page as well.

> Use Anakia for version controlled documentation
> ---
>
> Key: HIVE-1135
> URL: https://issues.apache.org/jira/browse/HIVE-1135
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.5.0
>    Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Fix For: 0.6.0
>
> Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
> hive-1135-5-patch.txt, hive-1335-1.patch.txt, hive-1335-2.patch.txt, 
> jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png
>
>
> Currently the Hive Language Manual and many other critical pieces of 
> documentation are on the Hive wiki. 
> Right now we count on the author of a patch to follow up and add wiki 
> entries. While we do a decent job with this, new features can be missed. Or 
> using running older/newer branches can not locate relevant documentation for 
> their branch. 
> ..example of a perception I do not think we want to give off...
> http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
> We should generate our documentation in the way hadoop & hbase does, inline 
> using forest. I would like to take the lead on this, but we need a lot of 
> consensus on doing this properly. 




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880028#action_12880028
 ] 

Edward Capriolo commented on HIVE-1405:
---

I like Carl's approach. The entire point of the hiverc is not to have to invoke 
anything explicit to add jars.

> Implement a .hiverc startup file
> 
>
> Key: HIVE-1405
> URL: https://issues.apache.org/jira/browse/HIVE-1405
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Jonathan Chang
>Assignee: John Sichi
>
> When deploying hive, it would be nice to have a .hiverc file containing 
> statements that would be automatically run whenever hive is launched.  This 
> way, we can automatically add JARs, create temporary functions, set flags, 
> etc. for all users quickly. 
> This should ideally be set up like .bashrc and the like with a global version 
> and a user-local version.




Re: Vertical partitioning

2010-06-17 Thread Edward Capriolo
On Thu, Jun 17, 2010 at 3:00 AM, jaydeep vishwakarma <
jaydeep.vishwaka...@mkhoj.com> wrote:

> Just looking opportunity and feasibility for it. In one of my table have
> more than 20 fields where most of the time I need only 10 main fields. We
> rarely need other fields for day to day analysis.
>
> Regards,
> Jaydeep
>
>
> Ning Zhang wrote:
>
> Hive support columnar storage (RCFile) but not vertical partitioning. Is
> there any use case for vertical partitioning?
>
> On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:
>
>
>
> Hi,
>
> Does hive support Vertical partitioning?
>
> Regards,
> Jaydeep
>
>
>

Vertical partitioning is just as practical in a traditional RDBMS as it
would be in hive. Normally you would do it for a few reasons:
1) You have some rarely used columns and you want to reduce the table/row
size.
2) Your DBMS has terrible blob/clob/text support and the only way to get
large objects out of the way is to put them in other tables.

If you go the route of vertical partitioning in hive, you may have to join
to select the columns you need. I do not consider row serialization and
deserialization to be the majority of a hive job, and in most cases hadoop
handles one large file better than two smaller ones. Then again, we have some
tables with 140+ columns, so I can see vertical partitioning helping with those
tables, but it doubles the management.

