Re: does anyone care about list bucketing stored as directories?

2017-10-08 Thread Xuefu Zhang
Lack of a response doesn't necessarily mean "don't care". Maybe you can provide a good description of the problem and the proposed solution. Frankly, I cannot make much sense of the previous email. Thanks, Xuefu On Fri, Oct 6, 2017 at 5:05 PM, Sergey Shelukhin wrote: > Looks like nobody does… I’ll fi

Re: Aug. 2017 Hive User Group Meeting

2017-08-21 Thread Xuefu Zhang
Dear Hive users and developers, As a reminder, the next Hive User Group Meeting will occur this Thursday, Aug. 24. The agenda is available on the event page ( https://www.meetup.com/Hive-User-Group-Meeting/events/242210487/). See you all there! Thanks, Xuefu On Tue, Aug 1, 2017 at 7:18 PM, Xuefu

Aug. 2017 Hive User Group Meeting

2017-08-01 Thread Xuefu Zhang
Hi all, It's an honor to announce that the Hive community is launching a Hive user group meeting in the Bay Area this month. The details can be found at https://www.meetup.com/Hive-User-Group-Meeting/events/242210487/. We are inviting talk proposals from Hive users as well as developers at this time.

Welcome Rui Li to Hive PMC

2017-05-24 Thread Xuefu Zhang
Hi all, It's an honor to announce that the Apache Hive PMC has recently voted to invite Rui Li as a new Hive PMC member. Rui is a long-time Hive contributor and committer, and has made significant contributions to Hive, especially to Hive on Spark. Please join me in congratulating him and looking forwar

Jimmy Xiang now a Hive PMC member

2017-05-24 Thread Xuefu Zhang
Hi all, It's an honor to announce that the Apache Hive PMC has recently voted to invite Jimmy Xiang as a new Hive PMC member. Please join me in congratulating him and looking forward to a bigger role that he will play in the Apache Hive project. Thanks, Xuefu

Welcome new Hive committer, Zhihai Xu

2017-05-05 Thread Xuefu Zhang
Hi all, I'm very pleased to announce that the Hive PMC has recently voted to offer Zhihai a committership, which he accepted. Please join me in congratulating him on this recognition and thanking him for his contributions to Hive. Regards, Xuefu

Re: [ANNOUNCE] Apache Hive 2.0.0 Released

2016-02-16 Thread Xuefu Zhang
Congratulations, guys! --Xuefu On Tue, Feb 16, 2016 at 11:54 AM, Prasanth Jayachandran < pjayachand...@hortonworks.com> wrote: > Great news! Thanks Sergey for the effort. > > Thanks > Prasanth > > > On Feb 16, 2016, at 1:44 PM, Sergey Shelukhin wrote: > > > > The Apache Hive team is proud to a

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-03 Thread Xuefu Zhang
uld destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email is virus > free, t

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-02 Thread Xuefu Zhang
> employees accept any responsibility. > > > > *From:* Koert Kuipers [mailto:ko...@tresata.com] > *Sent:* 03 February 2016 00:09 > *To:* user@hive.apache.org > *Subject:* Re: Hive on Spark Engine versus Spark using Hive metastore > > > > uuuhm with spark using Hiv

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-02 Thread Xuefu Zhang
-9563693-0-7*. > > co-author *"Sybase Transact SQL Guidelines Best Practices", ISBN > 978-0-9759693-0-4* > > *Publications due shortly:* > > *Complex Event Processing in Heterogeneous Environments*, ISBN: > 978-0-9563693-3-8 > > *Oracle and Sybase, Concepts and

Re: Hive on Spark Engine versus Spark using Hive metastore

2016-02-02 Thread Xuefu Zhang
I think the difference is not only about which engine does the optimization but more about feature parity. Hive on Spark offers all the functional features that Hive offers, and these features run faster. However, Spark SQL is far from offering this parity as far as I know. On Tue, Feb 2, 2016 at 2:38 PM, Mich Talebz

Re: Running Spark-sql on Hive metastore

2016-01-31 Thread Xuefu Zhang
For Hive on Spark, there is a startup cost. The second run should be faster. More importantly, it looks like you have 18 map tasks but your cluster only runs two of them at a time. Thus, your cluster basically has only two-way parallelism. If you configure your cluster to give more capaci

Re: Two results are inconsistent when i use Hive on Spark

2016-01-27 Thread Xuefu Zhang
Hi Jone, Did you mean that you get different results from time to time? If so, could you run "explain query" multiple times to see if there is any difference? Also, could you try without the map-join hint? Without the dataset, it's hard to reproduce the problem. Thus, it's great if you can provide DML, and t

Re: January Hive User Group Meeting

2016-01-21 Thread Xuefu Zhang
For those who cannot attend in person, here is the webex info: https://cloudera.webex.com/meet/xzhang 1-650-479-3208 Call-in toll number (US/Canada) 623 810 662 (access code) Thanks, Xuefu On Wed, Jan 20, 2016 at 9:45 AM, Xuefu Zhang wrote: > Hi all, > > As a reminder, the meetin

Re: January Hive User Group Meeting

2016-01-20 Thread Xuefu Zhang
Hi all, As a reminder, the meeting will be held tomorrow as scheduled. Please refer to the meetup page[1] for details. Looking forward to meeting you all! Thanks, Xuefu [1] http://www.meetup.com/Hive-User-Group-Meeting/events/227463783/ On Wed, Dec 16, 2015 at 3:38 PM, Xuefu Zhang wrote

Re: Hive on Spark task running time is too long

2016-01-11 Thread Xuefu Zhang
You should check the executor log to find out why it failed. There might be more explanation there. --Xuefu On Sun, Jan 10, 2016 at 11:21 PM, Jone Zhang wrote: > *I have submited a application many times.* > *Most of applications running correctly.See attach 1.* > *But one of the them breaks as expecte

Re: How to ensure that the record value of Hive on MapReduce and Hive on Spark are completely consistent?

2016-01-07 Thread Xuefu Zhang
If the number of records is in sync, then the chance of any value disagreement is very low because Hive on Spark and Hive on MR are basically running the same byte code. If there were anything wrong specific to Spark, the disparity would be much bigger than that. I suggest you test your produ

Re: It seems that result of Hive on Spark is mistake And result of Hive and Hive on Spark are not the same

2015-12-22 Thread Xuefu Zhang
It seems that the plan isn't quite right, possibly due to union all optimization in Spark. Could you create a JIRA for this? CC Chengxiang as he might have some insight. Thanks, Xuefu On Tue, Dec 22, 2015 at 3:39 AM, Jone Zhang wrote: > Hive 1.2.1 on Spark1.4.1 > > 2015-12-22 19:31 GMT+08:00 J

Re: Hive on Spark throw java.lang.NullPointerException

2015-12-18 Thread Xuefu Zhang
Could you create a JIRA with repro case? Thanks, Xuefu On Thu, Dec 17, 2015 at 9:21 PM, Jone Zhang wrote: > *My query is * > set hive.execution.engine=spark; > select > > t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxs

Re: Hive on Spark - Error: Child process exited before connecting back

2015-12-17 Thread Xuefu Zhang
:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl: at > java.lang.ClassLoader.loadClass(ClassLoader.java:358) > 2015-12-17 17:34:02,435 INFO org.apache.hive.spark.client.SparkClientImpl: ... > 2 more > 2015-12-17 17:34:02,438 WARN org.apache.hive.spark.client.SparkClientImpl: &g

Re: January Hive User Group Meeting

2015-12-16 Thread Xuefu Zhang
ndorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email is virus > free, therefore neither Peridale Ltd, its subsidiaries nor their employees > accept any responsibilit

January Hive User Group Meeting

2015-12-16 Thread Xuefu Zhang
Dear Hive users and developers, The Hive community is considering a user group meeting[1] on January 21, 2016 at the Cloudera facility in Palo Alto, CA. This will be a great opportunity for users and developers to find out what's happening in the community and share each other's experience with Hive. Th

Re: making session setting "set spark.master=yarn-client" for Hive on Spark

2015-12-16 Thread Xuefu Zhang
Mich, By switching the values for spark.master, you're basically asking Hive to use your YARN cluster rather than your Spark standalone cluster. Both modes are supported, besides local, local-cluster, and yarn-cluster. And yarn-cluster is the recommended mode. Thanks, Xuefu On Wed, Dec 16, 2015 a

Re: Hive on Spark - Error: Child process exited before connecting back

2015-12-15 Thread Xuefu Zhang
As to the Spark versions that are supported: Spark made incompatible API changes in 1.5, and that's the reason why Hive 1.1.0 doesn't work with Spark 1.5. However, the latest Hive in master or branch-1 should work with Spark 1.5. Also, later CDH 5.4.x versions have already supported Spark 1.

Re: Hive on Spark - Error: Child process exited before connecting back

2015-12-15 Thread Xuefu Zhang
Ophir, Can you provide your hive.log here? Also, have you checked your Spark application log? When this happens, it usually means that Hive is not able to launch a Spark application. In the case of Spark on YARN, this application is the application master. If Hive fails to launch it, or the applicat

Re: Pros and cons -Saving spark data in hive

2015-12-15 Thread Xuefu Zhang
You might want to consider Hive on Spark where you can work directly with Hive and your query execution is powered by Spark as an engine. --Xuefu On Tue, Dec 15, 2015 at 6:04 PM, Divya Gehlot wrote: > Hi, > I am new bee to Spark and I am exploring option and pros and cons which > one will work

Re: Hive on Spark application will be submited more times when the queue resources is not enough.

2015-12-09 Thread Xuefu Zhang
Hi Jone, Thanks for reporting the problem. When you say there is not enough resource, do you mean that you cannot launch YARN application masters? I feel that we should error out right away if the application cannot be submitted. Any attempt to resubmit seems problematic. I'm not sure if there i

Re: Getting error when trying to start master node after building spark 1.3

2015-12-06 Thread Xuefu Zhang
f you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensur

Re: Why there are two different stages on the same query when i use hive on spark.

2015-12-04 Thread Xuefu Zhang
TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > name: u_wsd.t_sd_ucm_cominfo_incremental > &g

Re: Quick Question

2015-12-04 Thread Xuefu Zhang
Create a table with the file and query the table. Parquet is fully supported in Hive. --Xuefu On Fri, Dec 4, 2015 at 10:58 AM, Siva Kanth Sattiraju (ssattira) < ssatt...@cisco.com> wrote: > Hi All, > > Is there a way to read “parquet” file through Hive? > > Regards, > Siva > >
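The advice above can be sketched as an external table over an existing Parquet file; the path and columns are hypothetical placeholders:

```sql
-- point an external table at a directory containing Parquet files
create external table clicks (id bigint, ts string)
stored as parquet
location '/data/clicks_parquet';

-- then query it like any other Hive table
select count(*) from clicks;
```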

Re: FW: Getting error when trying to start master node after building spark 1.3

2015-12-04 Thread Xuefu Zhang
; make it work. > > > > Thanks > > > > Mich > > > > *From:* Xuefu Zhang [mailto:xzh...@cloudera.com] > *Sent:* 04 December 2015 17:03 > *To:* user@hive.apache.org > *Subject:* Re: FW: Getting error when trying to start master node after > building spar

Re: FW: Getting error when trying to start master node after building spark 1.3

2015-12-04 Thread Xuefu Zhang
My last attempt: 1. Make sure the spark-assembly.jar from your own build doesn't contain hive classes, using "jar -tf spark-assembly.jar | grep hive" command. Copy it to Hive's /lib directory. After this, you can forget everything about this build. 2. Download prebuilt tarball from Spark download

Re: Why there are two different stages on the same query when i use hive on spark.

2015-12-03 Thread Xuefu Zhang
Can you also attach explain query result? What's your data format? --Xuefu On Thu, Dec 3, 2015 at 12:09 AM, Jone Zhang wrote: > Hive1.2.1 on Spark1.4.1 > > *The first query is:* > set mapred.reduce.tasks=100; > use u_wsd; > insert overwrite table t_sd_ucm_cominfo_incremental partition (ds=20151

Re: Building spark 1.3 from source code to work with Hive 1.2.1

2015-12-03 Thread Xuefu Zhang
Mich, To start your Spark standalone cluster, you can just download the tarball from the Spark repo site. In other words, you don't need to start your cluster using your own build. You only need to copy spark-assembly.jar to Hive's /lib directory and that's it. I guess you have been confused by this, which I

Re: Hive on spark table caching

2015-12-02 Thread Xuefu Zhang
Depending on the query, Hive on Spark does implicitly cache datasets (not necessarily the input tables) for performance benefits. Such queries include multi-insert, self-join, self-union, etc. However, no caching happens across queries at this time, which may be improved in the future. Thanks, Xue

Re: Problem with getting start of Hive on Spark

2015-12-01 Thread Xuefu Zhang
Link, It seems that you're using Hive 1.2.1, which doesn't support Spark 1.5.2, or at least is not tested with it. Please try the Hive master branch if you want to use Spark 1.5.2. If the problem remains, please provide all the commands you ran in your Hive session that lead to the failure. Thanks, Xuefu On M

Re: Problem with getting start of Hive on Spark

2015-12-01 Thread Xuefu Zhang
Mich, As I understand, you have a problem with Hive on Spark due to dual network interfaces. I agree that this is something that should be fixed in Hive. However, saying Hive on Spark doesn't work seems unfair. At Cloudera, we have many customers that successfully deployed Hive on Spark on their c

Re: Hive version with Spark

2015-11-29 Thread Xuefu Zhang
Sofia, What specific problem did you encounter when trying spark.master other than local? Thanks, Xuefu On Sat, Nov 28, 2015 at 1:14 AM, Sofia Panagiotidi < sofia.panagiot...@taiger.com> wrote: > Hi Mich, > > > I never managed to run Hive on Spark with a spark master other than local > so I am

Re: Answers to recent questions on Hive on Spark

2015-11-28 Thread Xuefu Zhang
ployees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email is virus > free, therefore neither Peridale Ltd, its subsidiaries nor their employees > accept any responsibility. > > > > *From:* Xuefu Zhang [mailto:xzh...@cloudera.c

Re: Answers to recent questions on Hive on Spark

2015-11-28 Thread Xuefu Zhang
gt; parameter hive.spark.client.server.address > > > > Now I don’t seem to be able to set it up in hive-site.xml or as a set > parameter in hive prompt itself! > > > > Any hint would be appreciated or any work around? > > > > Regards, > > > >

Re: Java heap space occured when the amount of data is very large with the same key on join sql

2015-11-28 Thread Xuefu Zhang
How much data are you dealing with, and how skewed is it? The code comes from Spark as far as I can see. To overcome the problem, you have a few things to try: 1. Increase executor memory. 2. Try Hive's skew join. 3. Rewrite your query. Thanks, Xuefu On Sat, Nov 28, 2015 at 12:37 AM, Jone Zhang wr
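The first two suggestions above can be sketched as Hive session settings. This is illustrative only; the property names are real Hive/Spark options, but the values are placeholders to tune for your own cluster:

```sql
set hive.execution.engine=spark;
set spark.executor.memory=8g;        -- 1. give executors more heap
set hive.optimize.skewjoin=true;     -- 2. enable Hive's runtime skew join
set hive.skewjoin.key=100000;        -- rows per key before a key is treated as skewed
```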

Re: 答复: Answers to recent questions on Hive on Spark

2015-11-27 Thread Xuefu Zhang
, Wangwenli wrote: > Hi xuefu , > > > > thanks for the information. > > One simple question, *any plan when the hive on spark can be used in > production environment?* > > > > Regards > > wenli > > > > *发件人:* Xuefu Zhang [mailto:xzh...@

Re: Answers to recent questions on Hive on Spark

2015-11-27 Thread Xuefu Zhang
*Sybase ASE 15 Gold Medal Award 2008* > > A Winning Strategy: Running the most Critical Financial Data on ASE 15 > > > http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf > > Author of the books* "A Practitioner’s Guide to Upgrading to Sybase ASE

Answers to recent questions on Hive on Spark

2015-11-27 Thread Xuefu Zhang
Hi there, There seems to be increasing interest in Hive on Spark from Hive users. I understand that there have been a few questions or problems reported, and I can see some frustration sometimes. It's impossible for the Hive on Spark team to respond to every inquiry even though we wish we could. Howeve

Re: hive1.2.1 on spark connection time out

2015-11-25 Thread Xuefu Zhang
There are usually a few more messages before this, after "spark-submit", in hive.log. Do you have spark.home set? On Sun, Nov 22, 2015 at 10:17 PM, zhangjp wrote: > > I'm using hive1.2.1 . I want to run hive on spark model,but there is some > issues. > have been set spark.master=yarn-client; > spa
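A minimal sketch of the settings discussed in this thread, assuming a yarn-client setup; the install path is a hypothetical placeholder:

```sql
set spark.master=yarn-client;
set spark.home=/opt/spark;   -- hypothetical path to the Spark installation
```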

Re: Write access request to the Hive wiki

2015-11-25 Thread Xuefu Zhang
Hi Aihua, I just granted you the write access to Hive wiki. Let me know if problem remains. Thanks, Xuefu On Wed, Nov 25, 2015 at 10:50 AM, Aihua Xu wrote: > I'd like to request write access to the Hive wiki to update some of the > docs. > > My Confluence user name is aihuaxu. > > Thanks! > Ai

Re: [ANNOUNCE] New PMC Member : John Pullokkaran

2015-11-24 Thread Xuefu Zhang
Congratulations, John! --Xuefu On Tue, Nov 24, 2015 at 3:01 PM, Prasanth J wrote: > Congratulations and Welcome John! > > Thanks > Prasanth > > On Nov 24, 2015, at 4:59 PM, Ashutosh Chauhan > wrote: > > On behalf of the Hive PMC I am delighted to announce John Pullokkaran is > joining Hive PMC

Re: Upgrading from Hive 0.14.0 to Hive 1.2.1

2015-11-24 Thread Xuefu Zhang
This upgrade should be no different from any other upgrade. You can use Hive's schematool to upgrade your existing metadata. Thanks, Xuefu On Tue, Nov 24, 2015 at 10:05 AM, Mich Talebzadeh wrote: > Hi, > > > > I would like to upgrade to Hive 1.2.1 as I understand one cannot deploy > Spark executio

Re: Building Spark to use for Hive on Spark

2015-11-22 Thread Xuefu Zhang
Hive on Spark is supposed to work with any version of Hive (1.1+) and a version of Spark built without Hive. Thus, to make HoS work reliably and also simplify matters, I think it still makes sense to require that the spark-assembly jar shouldn't contain Hive jars. Otherwise, you have to make sure that your Hive version match

Re: starting spark-shell throws /tmp/hive on HDFS should be writable error

2015-11-20 Thread Xuefu Zhang
This seems to belong on the Spark user list. I don't see any relevance to Hive except that the directory contains the word "hive". --Xuefu On Fri, Nov 20, 2015 at 1:13 PM, Mich Talebzadeh wrote: > Hi, > > > > Has this been resolved. I don’t think this has anything to do with > /tmp/hive directory permissi

Re: troubleshooting: "unread block data' error

2015-11-19 Thread Xuefu Zhang
Are you able to run queries that don't touch HBase? This problem was seen before but fixed. On Tue, Nov 17, 2015 at 3:37 AM, Sofia wrote: > Hello, > > I have configured Hive to work Spark. > > I have been trying to run a query on a Hive table managing an HBase table > (created via HBaseSto

Re: Do you have more suggestions on when to use Hive on MapReduce or Hive on Spark?

2015-11-04 Thread Xuefu Zhang
Hi Jone, Thanks for trying Hive on Spark. I don't know about your cluster, so I cannot comment too much on your configurations. We do have a "Getting Started" guide [1] which you may refer to. (We are currently updating the document.) Your executor size (cores/memory) seems rather small and not al

Re: Hive on Spark NPE at org.apache.hadoop.hive.ql.io.HiveInputFormat

2015-11-03 Thread Xuefu Zhang
Singh wrote: > This is the virtual machine from Hortonworks. > > The query is this > > select count(*) from sample_07; > > It should run fine with MR. > > I am trying to run on Spark. > > > > > > > On Tue, Nov 3, 2015 at 4:39 PM, Xuefu Zhang wrote:

Re: Hive on Spark NPE at org.apache.hadoop.hive.ql.io.HiveInputFormat

2015-11-02 Thread Xuefu Zhang
That msg could be just noise. On the other hand, there is an NPE, which might be the problem you're having. Have you tried your query with MapReduce? On Sun, Nov 1, 2015 at 5:32 PM, Jagat Singh wrote: > One interesting message here , *No plan file found: * > > 15/11/01 23:55:36 INFO exec.Utilities:

Re: Hive on Spark

2015-10-23 Thread Xuefu Zhang
er the limited > resources. > "15/10/23 17:37:13 Reporter WARN > org.apache.spark.deploy.yarn.YarnAllocator>> Container killed by YARN for > exceeding memory limits. 7.6 GB of 7.5 GB physical memory used. Consider > boosting spark.yarn.executor.memoryOverhead." >

Re: Hive on Spark

2015-10-23 Thread Xuefu Zhang
operties file in spark, > Spark provided "def persist(newLevel: StorageLevel)" > api only... > > 2015-10-23 19:03 GMT+08:00 Xuefu Zhang : > >> quick answers: >> 1. you can pretty much set any spark configuration at hive using set >> command. >> 2.

Re: Hive on Spark

2015-10-23 Thread Xuefu Zhang
quick answers: 1. you can pretty much set any spark configuration at hive using set command. 2. no. you have to make the call. On Thu, Oct 22, 2015 at 10:32 PM, Jone Zhang wrote: > 1.How can i set Storage Level when i use Hive on Spark? > 2.Do Spark have any intention of dynamically determine
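To illustrate answer 1 above, any Spark property can be set in a Hive session; these property names are real, and the values are only examples:

```sql
-- Spark configurations passed through to the Hive on Spark session
set spark.executor.memory=4g;
set spark.executor.cores=2;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
```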

Re: Hive and Spark on Windows

2015-10-20 Thread Xuefu Zhang
at 11:46 AM, Xuefu Zhang wrote: > >> Yes. You need HADOOP_HOME, which tells Hive how to connect to HDFS and >> get its dependent libraries there. >> >> On Tue, Oct 20, 2015 at 7:36 AM, Andrés Ivaldi >> wrote: >> >>> I've already installed cyg

Re: Hive and Spark on Windows

2015-10-20 Thread Xuefu Zhang
> Does Hive needs hadoop always? or there are some configuration missing? > > Thanks > > On Mon, Oct 19, 2015 at 11:31 PM, Xuefu Zhang wrote: > >> Hi Andres, >> >> We haven't tested Hive on Spark on Windows. However, if you can get Hive >> and Spark t

Re: Hive and Spark on Windows

2015-10-19 Thread Xuefu Zhang
Hi Andres, We haven't tested Hive on Spark on Windows. However, if you can get Hive and Spark to work on Windows, I'd assume that the configuration is no different from on Linux. Let's know if you encounter any specific problems. Thanks, Xuefu On Mon, Oct 19, 2015 at 5:13 PM, Andrés Ivaldi wrot

Re: regarding hiveserver2 DeRegisterWatcher

2015-10-12 Thread Xuefu Zhang
Can you articulate further why HiveServer2 does not work in such an event? What's the current behavior, and what's expected from an end user's standpoint? Thanks, Xuefu On Mon, Oct 12, 2015 at 6:52 AM, Wangwenli wrote: > > now hiveserver2 has multiple instance register to zookeeper, if zookeeper >

Re: Alias vs Assignment

2015-10-08 Thread Xuefu Zhang
It looks to me that this adds only syntactic sugar which doesn't provide much additional value. On the contrary, it might even confuse non-SQL-Server users. As you have already noted, it's not an ISO standard. Writing queries this way actually makes them less portable. Personally I'd discour

Re: hive on spark query error

2015-09-25 Thread Xuefu Zhang
What's the value of spark.master in your case? The error specifically says something is wrong with it. --Xuefu On Fri, Sep 25, 2015 at 9:18 AM, Garry Chen wrote: > Hi All, > > I am following > https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started? > To s

Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

2015-09-16 Thread Xuefu Zhang
Congratulations, Ashutosh! Well-deserved. Thanks to Carl also for the hard work over the past few years! --Xuefu On Wed, Sep 16, 2015 at 12:39 PM, Carl Steinbach wrote: > I am very happy to announce that Ashutosh Chauhan is taking over as the > new VP of the Apache Hive project. Ashutosh has be

Re: Hive on Spark on Mesos

2015-09-09 Thread Xuefu Zhang
Mesos isn't supported for Hive on Spark. We have never attempted to run against it. --Xuefu On Wed, Sep 9, 2015 at 6:12 AM, John Omernik wrote: > In the docs for Hive on Spark, it appears to have instructions only for > Yarn. Will there be instructions or the ability to run hive on spark with

Hive User Group Meeting Singapore

2015-08-31 Thread Xuefu Zhang
Dear Hive users, The Hive community is considering a user group meeting during Hadoop World, which will be held in Singapore [1] Dec 1-3, 2015. As I understand, this will be the first time that this meeting has ever happened in Asia Pacific even though there is a large user base in that region. As another g

Re: Hive on Spark

2015-08-31 Thread Xuefu Zhang
What you described isn't part of the functionality of Hive on Spark. Rather, Spark is used here as a general-purpose engine similar to MR but without intermediate stages. It's batch oriented. Keeping 100T of data in memory is hardly beneficial unless you know that that dataset is going to be used

Re: HIVE:1.2, Query taking huge time

2015-08-20 Thread Xuefu Zhang
Please check out HIVE-11502. For your POC, you can simply work around it by using other data types instead of double. On Thu, Aug 20, 2015 at 2:08 AM, Nishant Aggarwal wrote: > Thanks for the reply Noam. I have already tried the later point of > dividing the query. But the challenge comes during the jo
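A sketch of the suggested workaround, casting DOUBLE to DECIMAL to sidestep the issue tracked in HIVE-11502; the table and column names here are hypothetical:

```sql
-- sales_poc and price are placeholder names for the POC dataset
select cast(price as decimal(18,4)) as price_dec
from sales_poc
where cast(price as decimal(18,4)) > 100;
```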

Re: Request write access to the Hive wiki

2015-08-10 Thread Xuefu Zhang
s access > too? :) > > Thanks, > > On Mon, Aug 10, 2015 at 2:37 PM, Xuefu Zhang wrote: > >> Done! >> >> On Mon, Aug 10, 2015 at 1:05 AM, Xu, Cheng A >> wrote: >> >>> Hi, >>> >>> I’d like to have write access to the Hive wiki.

Re: Request write access to the Hive wiki

2015-08-10 Thread Xuefu Zhang
Done! On Mon, Aug 10, 2015 at 1:05 AM, Xu, Cheng A wrote: > Hi, > > I’d like to have write access to the Hive wiki. My Confluence username is > cheng.a...@intel.com with Full Name “Ferdinand Xu”. Please help me deal > with it. Thank you! > > > > Regards, > > Ferdinand Xu > > >

Re: Computation timeout

2015-07-29 Thread Xuefu Zhang
your solution :) >> >> Thanks, >> >> >> Loïc >> >> Loïc CHANEL >> Engineering student at TELECOM Nancy >> Trainee at Worldline - Villeurbanne >> >> 2015-07-29 16:14 GMT+02:00 Xuefu Zhang : >> >>> Okay. To confirm, you set it to

Re: Computation timeout

2015-07-29 Thread Xuefu Zhang
I thought the idea of infinite operation was not very >> compatible with the "idle" word (as the operation will not stop running), >> but I'll try :-) >> Thanks for the idea, >> >> >> Loïc >> >> Loïc CHANEL >> Engineering

Re: Computation timeout

2015-07-29 Thread Xuefu Zhang
Have you tried hive.server2.idle.operation.timeout? --Xuefu On Wed, Jul 29, 2015 at 5:52 AM, Loïc Chanel wrote: > Hi all, > > As I'm trying to build a secured and multi-tenant Hadoop cluster with > Hive, I am desperately trying to set a timeout to Hive requests. > My idea is that some users can
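A sketch of the setting mentioned above; it normally belongs in the server's hive-site.xml, and the value (in milliseconds, here 30 minutes) is only an example:

```sql
-- Operations idle longer than the timeout can be closed by HiveServer2;
-- shown in "set" form for illustration, typically configured in hive-site.xml.
set hive.server2.idle.operation.timeout=1800000;
```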

Re: Obtain user identity in UDF

2015-07-27 Thread Xuefu Zhang
There is a UDF, current_user, which returns a value that can be passed to your UDF as an input, right? On Mon, Jul 27, 2015 at 1:13 PM, Adeel Qureshi wrote: > Is there a way to obtain user authentication information in a UDF like > kerberos username that they have logged in with to execute a hive q
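The pattern suggested above can be sketched as follows; current_user() is a real Hive built-in, while my_audit_udf and the table are hypothetical placeholders:

```sql
-- returns the authenticated user for the session
select current_user();

-- feed the user name into a custom UDF as an ordinary argument
select my_audit_udf(current_user(), action_col) from audit_table;
```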

Re: Error: java.lang.RuntimeException: org.apache.hive.com/esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 380

2015-07-16 Thread Xuefu Zhang
Same as https://issues.apache.org/jira/browse/HIVE-11269? On Thu, Jul 16, 2015 at 7:25 AM, Anupam sinha wrote: > Hi Guys, > > I am writing the simple hive query,Receiving the following error > intermittently. This error > presents itself for 30min-2hr then goes away. > > Appreciate your help to

Re: EXPORTing multiple partitions

2015-06-25 Thread Xuefu Zhang
Hi Brian, If you think that is useful, please feel free to create a JIRA requesting for it. Thanks, Xuefu On Thu, Jun 25, 2015 at 10:36 AM, Brian Jeltema < brian.jelt...@digitalenvoy.net> wrote: > Answering my own question: > > create table foo_copy like foo; > insert into foo_copy partitio

Re: Error using UNION ALL operator on tables of different storage format !!!

2015-06-18 Thread Xuefu Zhang
Sounds like a bug. However, could you reproduce it with the latest Hive code? --Xuefu On Thu, Jun 18, 2015 at 8:56 PM, @Sanjiv Singh wrote: > Hi All > > I was trying to combine records of two tables using UNION ALL. > One table testTableText is on TEXT format and another table testTableORC > is on

Hosting Hive User Group Meeting During Hadoop World NY

2015-06-10 Thread Xuefu Zhang
Dear Hive users, The Hive community is considering a user group meeting during Hadoop World, which will be held in New York at the end of September. To make this happen, your support is essential. First, I'm wondering if any user in the New York area would be willing to host the meetup. Secondly, I'm solici

Re: Pointing SparkSQL to existing Hive Metadata with data file locations in HDFS

2015-05-27 Thread Xuefu Zhang
I'm afraid you're at the wrong community. You might have a better chance to get an answer in Spark community. Thanks, Xuefu On Wed, May 27, 2015 at 5:44 PM, Sanjay Subramanian < sanjaysubraman...@yahoo.com> wrote: > hey guys > > On the Hive/Hadoop ecosystem we have using Cloudera distribution CD

Re: Hive on Spark VS Spark SQL

2015-05-22 Thread Xuefu Zhang
're not as mature as Hive. > What it depends on Hive for is Metastore, CliDriver, DDL parser, etc. > > Cheolsoo > > On Wed, May 20, 2015 at 10:45 AM, Xuefu Zhang wrote: > >> I have been working on HIve on Spark, and knows a little about SparkSQL. >> Here ar

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread Xuefu Zhang
I have been working on Hive on Spark, and know a little about SparkSQL. Here are a few factors to be considered: 1. SparkSQL is similar to Shark (discontinued) in that it clones Hive's front end (parser and semantic analyzer) and metastore, and injects in between a layer where Hive's operator tre

Re: Repeated Hive start-up issues

2015-05-15 Thread Xuefu Zhang
Your namenode is in safe mode, as the exception shows. You need to verify/fix that before trying Hive. Secondly, "!=" may not work as expected. Try "<>" or another simpler query first. --Xuefu On Fri, May 15, 2015 at 6:17 AM, Anand Murali wrote: > Hi All: > > I have installed Hadoop-2.6, Hive 1.
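The inequality suggestion above can be illustrated with a minimal query; the table and column names are hypothetical:

```sql
-- "<>" is the ISO SQL inequality operator and is safe across Hive versions
select * from employees where dept <> 'HR';
```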

Re: Too many connections from hive to zookeeper

2015-04-29 Thread Xuefu Zhang
This is a known issue and has been fixed in later releases. --Xuefu On Wed, Apr 29, 2015 at 7:44 PM, Shady Xu wrote: > Recently I found in the zookeeper log that there were too many client > connections and it was hive that was establishing more and more connections. > > I modified the max clie

Re: Table Lock Manager: ZooKeeper cluster

2015-04-20 Thread Xuefu Zhang
I'm not a ZooKeeper expert, but ZooKeeper is supposed to be characterized by light weight, high performance, and fast response. Unless your ZooKeeper is already overloaded, I don't see why you would need a separate ZooKeeper cluster just for Hive. There are a few ZooKeeper usages in Hive; the add

Re: merge small orc files

2015-04-20 Thread Xuefu Zhang
Also check hive.merge.size.per.task and hive.merge.smallfiles.avgsize. On Mon, Apr 20, 2015 at 8:29 AM, patcharee wrote: > Hi, > > How to set the configuration hive-site.xml to automatically merge small > orc file (output from mapreduce job) in hive 0.14 ? > > This is my current configuration> >
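The properties mentioned in this thread can be combined as below; the property names are real Hive options, and the size values are illustrative defaults to tune:

```sql
set hive.merge.mapfiles=true;                 -- merge small files from map-only jobs
set hive.merge.mapredfiles=true;              -- merge small files from map-reduce jobs
set hive.merge.smallfiles.avgsize=16000000;   -- trigger a merge when avg output file < ~16 MB
set hive.merge.size.per.task=256000000;       -- target merged file size, ~256 MB
```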

Re: [ANNOUNCE] New Hive Committers - Jimmy Xiang, Matt McCline, and Sergio Pena

2015-03-23 Thread Xuefu Zhang
Congratulations to all! --Xuefu On Mon, Mar 23, 2015 at 11:08 AM, Carl Steinbach wrote: > The Apache Hive PMC has voted to make Jimmy Xiang, Matt McCline, and > Sergio Pena committers on the Apache Hive Project. > > Please join me in congratulating Jimmy, Matt, and Sergio. > > Thanks. > > - Car

Re: Hive on Spark

2015-03-16 Thread Xuefu Zhang
482638193 end=1426482732205 duration=94012 > from=org.apache.hadoop.hive.ql.Driver> > 2015-03-16 10:42:12,205 INFO [main]: log.PerfLogger > (PerfLogger.java:PerfLogBegin(121)) - from=org.apache.hadoop.hive.ql.Driver> > 2015-03-16 10:42:12,544 INFO [main]: log.PerfLogger > (Perf

Re: Hive on Spark

2015-03-13 Thread Xuefu Zhang
You need to copy the spark-assembly.jar to your hive/lib. Also, you can check hive.log to get more messages. On Fri, Mar 13, 2015 at 4:51 AM, Amith sha wrote: > Hi all, > > > Recently i have configured Spark 1.2.0 and my environment is hadoop > 2.6.0 hive 1.1.0 Here i have tried hive on Spark w
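The copy step above looks roughly like this (illustrative only: the exact jar name depends on your Spark build, and `SPARK_HOME`/`HIVE_HOME` are assumed to point at your installations):

```
cp $SPARK_HOME/lib/spark-assembly-*.jar $HIVE_HOME/lib/
```

After copying, restart HiveServer2 or start a fresh Hive CLI session so the new jar is picked up on the classpath.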

Re: [hive building error] can't download pentaho-aggdesigner-algorithm-5.1.5-jhyde.pom

2015-03-10 Thread Xuefu Zhang
The aws instance is down. We are working to restore it. Thanks, Xuefu On Tue, Mar 10, 2015 at 12:17 AM, wangzhenhua (G) wrote: > Hi, all, > > When I build hive source using Maven, it gets stuck in: > "Downloading: > http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.3-r

Re: Does any one know how to deploy a custom UDAF jar file in SparkSQL

2015-03-10 Thread Xuefu Zhang
This question seems more suitable to Spark community. FYI, this is Hive user list. On Tue, Mar 10, 2015 at 5:46 AM, shahab wrote: > Hi, > > Does any one know how to deploy a custom UDAF jar file in SparkSQL? Where > should i put the jar file so SparkSQL can pick it up and make it accessible > fo

Re: [ANNOUNCE] Apache Hive 1.1.0 Released

2015-03-09 Thread Xuefu Zhang
Great job, guys! This is a major release with significant new features and improvements. Thanks to everyone who contributed to make this happen. Thanks, Xuefu On Sun, Mar 8, 2015 at 10:40 PM, Brock Noland wrote: > The Apache Hive team is proud to announce the release of Apache > Hive v

Re: error: Failed to create spark client. for hive on spark

2015-03-02 Thread Xuefu Zhang
pl.( > SparkClientImpl.java:96) > ... 26 more > Caused by: java.util.concurrent.TimeoutException: Timed out waiting for > client connection. > at org.apache.hive.spark.client.rpc.RpcServer$2.run(RpcServer. > java:134) > at io.netty.util.concurrent.PromiseTask$RunnableAdapter. > c

Re: Where does hive do sampling in order by ?

2015-03-02 Thread Xuefu Zhang
There is no sampling for order by in Hive. Hive uses a single reducer for order by (if you're talking about the MR execution engine). Hive on Spark is different in this regard, though. Thanks, Xuefu On Mon, Mar 2, 2015 at 2:17 AM, Jeff Zhang wrote: > Order by usually invoke 2 steps (sampling job and re
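For context, the usual workaround when a single reducer is too slow is to trade away total ordering; a sketch with made-up table and column names:

```sql
-- ORDER BY: total ordering, funneled through one reducer on MR
SELECT * FROM sales ORDER BY amount DESC;

-- DISTRIBUTE BY + SORT BY: many reducers, each reducer's output sorted,
-- but no global ordering across output files
SELECT * FROM sales DISTRIBUTE BY region SORT BY amount DESC;
```

This is only equivalent to ORDER BY within each partition of the distribute key, which is often enough for downstream per-group processing.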

Re: error: Failed to create spark client. for hive on spark

2015-03-02 Thread Xuefu Zhang
Could you check your hive.log and spark.log for more detailed error message? Quick check though, do you have spark-assembly.jar in your hive lib folder? Thanks, Xuefu On Mon, Mar 2, 2015 at 5:14 AM, scwf wrote: > Hi all, > anyone met this error: HiveException(Failed to create spark client.) >

Re: Bucket map join - reducers role

2015-02-27 Thread Xuefu Zhang
Could you post your query and "explain your_query" result? On Fri, Feb 27, 2015 at 5:32 AM, murali parimi < muralikrishna.par...@icloud.com> wrote: > Hello team, > > I have two tables A and B. A has 360Million rows with one column K. B has > around two billion rows with multiple columns includin
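For reference, a bucket map join is typically enabled along these lines (table names hypothetical; both tables must be bucketed on the join key, with one bucket count a multiple of the other):

```sql
SET hive.optimize.bucketmapjoin = true;

-- join on the bucketing column of both tables
SELECT a.k, b.v
FROM a JOIN b ON a.k = b.k;
```

When the optimization applies, each mapper loads only the matching bucket of the smaller table instead of the whole table, and no reduce phase is needed for the join itself.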

Re: Union all with a field 'hard coded'

2015-02-21 Thread Xuefu Zhang
> it. (Tech writing by successive approximation.) > > Thanks again. > > -- Lefty > > On Sat, Feb 21, 2015 at 6:27 AM, Xuefu Zhang wrote: > >> I haven't tried union distinct, but I assume the same rule applies. >> >> Thanks for putting it together. It look

Re: Union all with a field 'hard coded'

2015-02-21 Thread Xuefu Zhang
anualUnion-ColumnAliasesforUNIONALL> > . > > Please review it one more time. > > -- Lefty > > On Fri, Feb 20, 2015 at 7:06 AM, Xuefu Zhang wrote: > >> Hi Lefty, >> >> The description seems good to me. I just slightly modified it so that it >> sounds mor

Re: Union all with a field 'hard coded'

2015-02-20 Thread Xuefu Zhang
> >> >> This is a part of standard SQL syntax, isn't it? >> >> On Mon, Feb 2, 2015 at 2:22 PM, Xuefu Zhang wrote: > >>> Yes, I think it would be great if this can be documented. >>> >>> --Xuefu >>> >>> On Sun, Feb 1,
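The pattern under discussion in this thread, sketched with made-up tables: a literal tags each branch of the union, and each branch should alias the constant to the same column name so the schemas line up.

```sql
SELECT id, name, 'online' AS source FROM online_orders
UNION ALL
SELECT id, name, 'retail' AS source FROM retail_orders;
```

Hive requires the branches of a UNION ALL to have matching column names and compatible types, which is why the alias on the hard-coded field matters.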

Re: Does Hive 1.0.0 still support commandline

2015-02-09 Thread Xuefu Zhang
There should be no confusion. While in 1.0 you can still use HiveCLI, you don't have the HiveCLI + HiveServer1 option. You will not be able to connect to HiveServer2 with HiveCLI. Thus, the clarification is: you can only use HiveCLI as a standalone application in 1.0. --Xuefu On Mon, Feb 9, 2015 at 9:17 AM
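For completeness, connecting to HiveServer2 is done with Beeline rather than the old CLI; a sketch (host, port, database, and user are placeholders for your deployment):

```
beeline -u jdbc:hive2://hs2-host:10000/default -n your_user
```

Beeline is a JDBC client, so authentication and connection parameters go into the JDBC URL rather than local Hive configuration.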
