RE: Loading a Hive table simultaneously from 2 different sources

2013-01-24 Thread Bennie Schut
The benefit of using the partitioned approach is really nicely described in the O'Reilly book Programming Hive. (Thanks for writing it, Edward.) For me, the ability to drop a single partition if there's any doubt about the quality of just one job's data is a large benefit. From: Edward
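
Schut's point can be sketched in HiveQL. A minimal sketch, assuming a hypothetical table `events` partitioned by load date and job (names are illustrative, not from the thread):

```sql
-- Each loading job writes its own partition:
INSERT OVERWRITE TABLE events PARTITION (load_date='2013-01-24', source='job_a')
SELECT * FROM staging_a;

-- If job_a's data turns out to be suspect, discard only that partition,
-- leaving every other job's data untouched:
ALTER TABLE events DROP PARTITION (load_date='2013-01-24', source='job_a');
```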

Re: Loading a Hive table simultaneously from 2 different sources

2013-01-24 Thread Krishnan K
Hi Edward, All, Thanks for the quick reply! We are using dynamic partitions, so we are unable to say which partition each record goes to. We don't have much control here. Are there any properties that can be set? I'm a bit doubtful here: is it because of the lock acquired on the table? Regards,
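
For reference, these are the standard Hive properties the thread is circling around; the property names are real Hive settings, but the values are illustrative, and whether they resolve the locking question depends on the setup:

```sql
-- Dynamic partitioning:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=1000;

-- Table-level locking (when enabled, Hive acquires locks via ZooKeeper):
SET hive.support.concurrency=true;
```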

Re: Loading a Hive table simultaneously from 2 different sources

2013-01-24 Thread Dean Wampler
You'll face all the usual concurrency synchronization risks if you're updating the same place concurrently. One thing to keep in mind: it's all just HDFS under the hood. That pretty much tells you everything you need to know. Yes, there's also the metadata. So, one way to update a partition

View metadata

2013-01-24 Thread Todd Wilson
Hello all: I'm working on adding Hadoop as a data source to our query tool (written in .NET, using an ODBC connection against a Cloudera virtual machine). Hive 0.9 is installed. I've got a question about views from the Hive documentation.

An explanation of LEFT OUTER JOIN and NULL values

2013-01-24 Thread David Morel
Hi! After hitting the curse of the last reducer many times on LEFT OUTER JOIN queries, and trying to think about it, I came to the conclusion that there's something I am missing regarding how keys are handled in mapred jobs. The problem shows up when I have table A containing billions of rows with

Re: An explanation of LEFT OUTER JOIN and NULL values

2013-01-24 Thread bejoy_ks
Hi David, An EXPLAIN EXTENDED would give you the exact pointer. From my understanding, this is how it could work. You have two tables, so two different map-reduce jobs would be processing them. Based on the join keys, a combination of the corresponding columns would be chosen as the key from mapper1

Re: An explanation of LEFT OUTER JOIN and NULL values

2013-01-24 Thread David Morel
On 24 Jan 2013, at 18:16, bejoy...@yahoo.com wrote: Hi David, An EXPLAIN EXTENDED would give you the exact pointer. From my understanding, this is how it could work. You have two tables, so two different map-reduce jobs would be processing them. Based on the join keys, a combination of

Re: An explanation of LEFT OUTER JOIN and NULL values

2013-01-24 Thread bejoy_ks
Hi David, The default partitioner used in MapReduce is the hash partitioner. So, based on your keys, they are sent to a particular reducer. Maybe in your current data set, the keys that have no values in the table are all falling in the same hash bucket and hence being processed by the same
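
bejoy_ks's explanation can be illustrated with a toy model of the default hash partitioner (a sketch, not Hadoop's actual Java code): each key always maps to the same reducer, so one dominant key value starves every other reducer of work while overloading its own.

```python
# Toy model of MapReduce's default HashPartitioner:
#   partition = hash(key) mod num_reducers
# The same key always lands on the same reducer, so a skewed key
# distribution becomes a skewed reducer workload ("last reducer" curse).

def partition(key, num_reducers):
    """Mimic HashPartitioner: identical keys always go to one reducer."""
    return hash(key) % num_reducers

# Skewed data: one join-key value dominates (e.g. the key shared by all
# unmatched rows in a LEFT OUTER JOIN).
keys = ["user1", "user2", "user3"] + ["hot_key"] * 1000

counts = {}
for k in keys:
    r = partition(k, 4)
    counts[r] = counts.get(r, 0) + 1

# The reducer that owns "hot_key" receives almost all of the records,
# while the other three sit nearly idle.
```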

about hive limit optimization settings

2013-01-24 Thread Wu, James C.
Hi, does anyone know the meaning of these Hive settings? Their descriptions are not clear to me. If someone could give me an example of how they should be used, that would be great! <property><name>hive.limit.row.max.size</name><value>10</value><description>When trying a smaller subset of data

Re: about hive limit optimization settings

2013-01-24 Thread Abdelrhman Shettia
Hi James. Basically, if we have a table A which is mapped in Hive to a directory /data/a, and n is the number of files under /data/a, each with row size s, then for hive -e "select * from a limit 10" to show the result very fast, hive.limit.optimize.limit.file = n in this case will be
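
For reference, the settings under discussion can be declared in hive-site.xml. The property names below are real Hive settings; the values are illustrative only:

```xml
<!-- Enable the LIMIT optimization: sample input files instead of scanning
     the whole table when a query ends in LIMIT. -->
<property>
  <name>hive.limit.optimize.enable</name>
  <value>true</value>
</property>
<!-- Upper bound, in bytes, on the size guaranteed per row when sampling. -->
<property>
  <name>hive.limit.row.max.size</name>
  <value>100000</value>
</property>
<!-- Maximum number of files the sampler may touch. -->
<property>
  <name>hive.limit.optimize.limit.file</name>
  <value>10</value>
</property>
```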

substr() index out of range exception in hive 0.8.1

2013-01-24 Thread Yu Yang
Hi All, I'm working on Hive 0.8.1 and hit the following problem. I use the function substr(item, -4, 1) to process one item in a Hive table, and there is one row in which the content of the item is ba_s0一朝忽觉京梦醒,半世浮沉雨打萍--衣俊卿小n实录010; the job then failed. I checked the task log, and it appeared
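
The failure pattern is consistent with byte-versus-character indexing on multibyte UTF-8 text. A sketch (using a shortened sample string, not the exact row) of how an index that is valid in characters can land mid-codepoint when counted in bytes:

```python
# Each CJK character below occupies 3 bytes in UTF-8, so byte offsets and
# character offsets diverge; substring logic that mixes the two can point
# into the middle of a codepoint and fail.

s = "ba_s0一朝忽觉010"        # shortened mixed ASCII/CJK sample
utf8 = s.encode("utf-8")

chars = len(s)                # character count: 12
nbytes = len(utf8)            # byte count: 8 ASCII bytes + 4 * 3 CJK bytes = 20

# Character-based substr(item, -4, 1): well defined.
char_sub = s[-4:-3]           # the 4th character from the end

# Byte-based slicing at a "character" offset can split a codepoint:
try:
    utf8[:15].decode("utf-8")  # cut lands inside the 3-byte sequence for 觉
    clean = True
except UnicodeDecodeError:
    clean = False
```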

Hive ODBC driver

2013-01-24 Thread Chhaya Vishwakarma
Hi, I want to call Hive in C#; how can I do that? I found the Hive ODBC driver but am not getting any downloadables. Can anyone give a proper link to download the Hive ODBC driver? Regards, Chhaya Vishwakarma

Re: substr() index out of range exception in hive 0.8.1

2013-01-24 Thread 曹坤
Hi Yu Yang, have a look at this issue: https://issues.apache.org/jira/browse/HIVE-2722 2013/1/25 Yu Yang clouder...@gmail.com Hi All, I'm working on Hive 0.8.1 and hit the following problem. I use the function substr(item, -4, 1) to process one item in a Hive table, and there is one row in which the

Re: about hive limit optimization settings

2013-01-24 Thread Nitin Pawar
Hive has a feature for data sampling where you don't actually read the entire table but a sample of it. I suppose these parameters belong to those queries. You can read more at https://cwiki.apache.org/Hive/languagemanual-sampling.html On Fri, Jan 25, 2013 at 4:42 AM, Wu, James C.
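
The sampling feature Nitin refers to is TABLESAMPLE. A minimal sketch, assuming a hypothetical table named source (sampling is most efficient when the table is already bucketed on the sampled column):

```sql
-- Read roughly 1/32 of the table: bucket 3 out of 32, bucketed on rand():
SELECT * FROM source TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) s;
```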

Re: Hive ODBC driver

2013-01-24 Thread Nitin Pawar
See if any of the below drivers help you: https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll http://nuget.org/List/Packages/Hive.Sharp.Lib On Fri, Jan 25, 2013 at 9:47 AM, Chhaya Vishwakarma chhaya.vishwaka...@lntinfotech.com wrote: Hi, I want to call Hive in C#; how

Re: An explanation of LEFT OUTER JOIN and NULL values

2013-01-24 Thread David Morel
On 24 Jan 2013, at 20:39, bejoy...@yahoo.com wrote: Hi David, The default partitioner used in MapReduce is the hash partitioner. So, based on your keys, they are sent to a particular reducer. Maybe in your current data set, the keys that have no values in the table are all falling in the same

Re: Real-life experience of forcing smaller input splits?

2013-01-24 Thread Nitin Pawar
set mapred.min.split.size=1024000; set mapred.max.split.size=4096000; set hive.merge.mapfiles=false; I had set the above values, and setting max split size to a lower value did increase my number of maps. My block size was 128MB. The only thing was that my files on HDFS were not heavily compressed, and I was
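
The effect Nitin describes follows from how Hadoop's FileInputFormat computes split sizes: splitSize = max(minSize, min(maxSize, blockSize)). A toy model (simplified; the real implementation adds a slop factor and per-file handling) of why lowering the max split size multiplies the map count:

```python
# Simplified model of FileInputFormat's split computation:
#   splitSize = max(minSize, min(maxSize, blockSize))
# so capping mapred.max.split.size below the block size yields more,
# smaller splits, and therefore more map tasks.

def split_size(min_size, max_size, block_size):
    return max(min_size, min(max_size, block_size))

def num_splits(file_size, min_size, max_size, block_size):
    sz = split_size(min_size, max_size, block_size)
    return -(-file_size // sz)  # ceiling division

BLOCK = 128 * 1024 * 1024       # 128 MB block size, as in the thread
FILE = 1024 * 1024 * 1024       # a 1 GB file

default = num_splits(FILE, 1, 1 << 63, BLOCK)       # ~one split per block: 8
tuned = num_splits(FILE, 1024000, 4096000, BLOCK)   # the settings above: 263
```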

Re: Real-life experience of forcing smaller input splits?

2013-01-24 Thread Mathieu Despriee
Hi David, What file format and compression type are you using? Mathieu On 25 Jan 2013, at 07:16, David Morel dmore...@gmail.com wrote: Hello, I have seen many posts on various sites and MLs, but didn't find a firm answer anywhere: is it possible, yes or no, to force a smaller split size

Re: Real-life experience of forcing smaller input splits?

2013-01-24 Thread Edward Capriolo
In most cases you want bigger splits, because having lots of small tasks plays havoc with the JobTracker. I have found that jobs with thousands of short-lived map tasks tend to monopolize the slots. In other versions of Hive the default was not CombineHiveInputFormat. I think in most cases you want

Re: Real-life experience of forcing smaller input splits?

2013-01-24 Thread Edward Capriolo
Not all files are splittable: sequence files are; raw gzip files are not. On Fri, Jan 25, 2013 at 1:47 AM, Nitin Pawar nitinpawar...@gmail.com wrote: set mapred.min.split.size=1024000; set mapred.max.split.size=4096000; set hive.merge.mapfiles=false; I had set the above values, and setting max
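
Edward's distinction can be demonstrated directly: a gzip stream only decodes from its header, which is why a mapper cannot be handed an arbitrary byte range from the middle of a .gz file. A small sketch:

```python
# Why raw gzip is not splittable: decompression must start at the gzip
# header; there is no sync marker that lets a reader begin mid-stream.
import gzip
import zlib

data = b"some,record\n" * 10000
gz = gzip.compress(data)

# Decompressing from the start of the stream works fine.
assert gzip.decompress(gz) == data

# Starting from the middle fails outright: no gzip header there, so a
# mapper given only the second half of the file could not decode it.
try:
    zlib.decompress(gz[len(gz) // 2:], wbits=31)  # wbits=31 expects gzip framing
    mid_stream_ok = True
except zlib.error:
    mid_stream_ok = False
```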