The benefit of the partitioned approach is really nicely described in the
O'Reilly book Programming Hive. (Thanks for writing it, Edward!)
For me, the ability to drop a single partition whenever there's any doubt about
the quality of the data from just one job is a big benefit.
From: Edward
Hi Edward, All,
Thanks for the quick reply!
We are using dynamic partitions, so we cannot say which partition each
record goes to. We don't have much control here.
Are there any properties that can be set?
I'm a bit doubtful here - is it because of the lock acquired on the table?
Regards,
You'll face all the usual concurrency and synchronization risks if you're
updating the same place concurrently. One thing to keep in mind: it's all
just HDFS under the hood. That pretty much tells you everything you need to
know. Yes, there's also the metadata. So, one way to update a partition
Hello all:
I'm working on adding Hadoop as a data source to our query tool (an ODBC
connection against a Cloudera virtual machine; the tool is written in .NET).
Hive 0.9 is installed.
I've got a question about views from the Hive documentation.
Hi!
After hitting the curse of the last reducer many times on LEFT OUTER
JOIN queries, and trying to think it through, I came to the conclusion
that there's something I am missing regarding how keys are handled in mapred
jobs.
The problem shows when I have table A containing billions of rows with
Hi David
An 'explain extended' would give you the exact pointer.
From my understanding, this is how it could work:
You have two tables, so two different map-reduce jobs would be processing
those. Based on the join keys, a combination of the corresponding columns would
be chosen as the key from mapper 1
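The reduce-side join described above can be sketched in plain Python (the table names, rows, and function names are all hypothetical, just to show the mechanics): each map phase tags its rows and emits the join-key as the shuffle key, so the reducer sees matching rows from both tables together.

```python
from collections import defaultdict

# Hypothetical rows from two tables, joined on the first column.
table_a = [("k1", "a1"), ("k2", "a2")]
table_b = [("k1", "b1"), ("k3", "b3")]

def mapper(rows, tag):
    # Each map job emits (join_key, (table_tag, row)).
    for row in rows:
        yield row[0], (tag, row)

def shuffle(pairs):
    # The framework groups all values sharing a key onto one reducer.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Combine rows from A with rows from B for the same join key.
    a_rows = [row for tag, row in values if tag == "A"]
    b_rows = [row for tag, row in values if tag == "B"]
    for a in a_rows:
        for b in b_rows:
            yield key, a[1], b[1]

pairs = list(mapper(table_a, "A")) + list(mapper(table_b, "B"))
joined = [out for k, vs in shuffle(pairs).items() for out in reducer(k, vs)]
# Only "k1" appears in both tables, so only "k1" produces joined output.
```

In a real job the shuffle step is what routes every record with the same join key to the same reducer, which is also why one hot key overloads one reducer.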
On 24 Jan 2013, at 18:16, bejoy...@yahoo.com wrote:
Hi David,
The default partitioner used in MapReduce is the hash partitioner, so based on
your keys they are sent to a particular reducer.
Maybe in your current data set, the keys that have no values in the table are
all falling into the same hash bucket and hence are being processed by the same
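The skew described here can be sketched in plain Python (the reducer count and the data set are hypothetical): the hash partitioner sends every record with the same key to the same reducer, so one hot key means one overloaded "last reducer".

```python
from collections import Counter

NUM_REDUCERS = 4  # hypothetical reducer count

def partition(key, num_reducers=NUM_REDUCERS):
    # Default MapReduce behaviour: hash the key, modulo the reducer count.
    return hash(key) % num_reducers

# A skewed data set: most rows share one join key.
keys = ["hot"] * 1000 + ["k1", "k2", "k3"]
load = Counter(partition(k) for k in keys)

# All 1000 "hot" rows land on a single reducer -- the last-reducer curse.
assert load[partition("hot")] >= 1000
```

The partitioner is doing exactly its job; it is the key distribution in the data that causes the imbalance.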
Hi,
Does anyone know the meaning of these Hive settings? Their descriptions are
not clear to me. If someone could give me an example of how they should be
used, that would be great!
<property>
  <name>hive.limit.row.max.size</name>
  <value>10</value>
  <description>When trying a smaller subset of data
Hi James,
Basically, say we have a table A which is mapped in Hive to a directory
/data/a, n is the number of files under /data/a, and each row has size s.
For
hive -e "select * from a limit 10"
to show the result very fast, hive.limit.optimize.limit.file
in this case will be
Hi All,
I'm working on Hive 0.8.1 and hit the following problem.
I use the function substr(item, -4, 1) to process one item in a Hive table, and
there is one row in which the content of the item is
ba_s0一朝忽觉京梦醒,半世浮沉雨打萍--衣俊卿小n实录010, and then the job failed.
I checked the task log, and it appeared
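As a guess at the failure mode (an assumption on my part, not a confirmed diagnosis of the Hive bug), substring arithmetic done on UTF-8 bytes rather than on characters can split a multi-byte character in half. A plain-Python sketch of the difference:

```python
# Sketch of the suspected failure mode (assumption, not confirmed):
# byte-oriented substring offsets can land inside a multi-byte character,
# while character-oriented indexing cannot.
text = "ba_s0一朝忽觉京梦醒010"  # shortened variant of the row content
raw = text.encode("utf-8")

# Character-oriented substr(item, -4, 1): 4th character from the end.
char_sub = text[-4]

# Byte-oriented slicing at the same offsets lands mid-character,
# on a lone UTF-8 continuation byte, and decoding fails.
try:
    raw[-4:-3].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False
```

Here `char_sub` is the Chinese character 醒, while the byte slice cannot be decoded at all, which is the kind of error a byte-based substr would surface.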
Hi,
I want to call Hive from C#. How can I do that? I found the Hive ODBC driver
but am not getting any downloadables. Can anyone give me a proper link to
download the Hive ODBC driver?
Regards,
Chhaya Vishwakarma
Hi Yu Yang
have a look at this issue: https://issues.apache.org/jira/browse/HIVE-2722
2013/1/25 Yu Yang clouder...@gmail.com
Hive has a feature for data sampling where you don't actually read the
entire table but only a sample of it.
I suppose these parameters belong to those queries.
You can read more at
https://cwiki.apache.org/Hive/languagemanual-sampling.html
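The idea behind bucket sampling can be sketched in plain Python (the hash function, bucket count, and row shape here are illustrative assumptions, not Hive's actual implementation): rows are assigned to buckets by hashing a column, and a sampled query reads only one bucket instead of the whole table.

```python
# Illustrative sketch of bucket sampling; Hive uses its own hashing,
# this only shows the idea.
NUM_BUCKETS = 4

rows = [{"id": i, "payload": f"row-{i}"} for i in range(100)]

def bucket_of(row, num_buckets=NUM_BUCKETS):
    # Assign each row to a bucket by hashing the sampling column.
    return hash(row["id"]) % num_buckets

# Reading "bucket 1 out of 4" touches roughly a quarter of the rows.
sample = [r for r in rows if bucket_of(r) == 0]
```

The saving comes from skipping the other buckets entirely, which is why sampling on a bucketed table avoids a full scan.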
On Fri, Jan 25, 2013 at 4:42 AM, Wu, James C.
See if any of the drivers below help you:
https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll
http://nuget.org/List/Packages/Hive.Sharp.Lib
On Fri, Jan 25, 2013 at 9:47 AM, Chhaya Vishwakarma
chhaya.vishwaka...@lntinfotech.com wrote:
On 24 Jan 2013, at 20:39, bejoy...@yahoo.com wrote:
set mapred.min.split.size=1024000;
set mapred.max.split.size=4096000;
set hive.merge.mapfiles=false;
I had set the above values, and setting the max split size to a lower value did
increase my number of maps. My block size was 128MB.
The only thing was, my files on HDFS were not heavily compressed, and I was
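The effect of the settings above can be sketched with simple arithmetic (the file size is hypothetical; this ignores per-file and per-node details of real split planning): each split is at most mapred.max.split.size bytes, so lowering that value raises the map count for the same input.

```python
import math

# Hypothetical input: one 128 MB file (the block size mentioned above).
file_size = 128 * 1024 * 1024

def num_splits(size, max_split):
    # Each split covers at most max_split bytes of the input.
    return math.ceil(size / max_split)

maps_default = num_splits(file_size, 128 * 1024 * 1024)  # one split per block
maps_smaller = num_splits(file_size, 4096000)            # the max.split.size set above
# maps_smaller is much larger than maps_default for the same file.
```

With the 4096000-byte limit from the settings, the single 128 MB file yields 33 splits instead of 1, matching the observed increase in map count.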
Hi David,
What file format and compression type are you using ?
Mathieu
On 25 Jan 2013, at 07:16, David Morel dmore...@gmail.com wrote:
Hello,
I have seen many posts on various sites and mailing lists, but didn't find a
firm answer anywhere: is it possible, yes or no, to force a smaller split size
In most cases you want bigger splits, because having lots of small tasks
plays havoc with the JobTracker. I have found that jobs with thousands of
short-lived map tasks tend to monopolize the slots. In other versions of
Hive the default was not CombineHiveInputFormat. I think in most cases you
want
Not all file formats are splittable: sequence files are, raw gzip files are not.
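Why a raw gzip file is not splittable can be sketched in Python: a reader cannot start decoding at an arbitrary mid-file offset, so a split boundary inside the file is useless and a single mapper has to read the whole thing from the start.

```python
import gzip
import zlib

# One gzip stream; decoder state builds up from the beginning of the file.
data = gzip.compress(b"some repetitive input " * 1000)

# Reading from the start works fine.
head = gzip.decompress(data)

# Starting from an arbitrary mid-file offset fails: a second mapper could
# not begin decoding at its split boundary.
try:
    zlib.decompress(data[len(data) // 2:])
    mid_start_ok = True
except zlib.error:
    mid_start_ok = False
```

Sequence files, by contrast, embed sync markers that let a reader resynchronize at a split boundary, which is what makes them splittable even when block-compressed.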
On Fri, Jan 25, 2013 at 1:47 AM, Nitin Pawar nitinpawar...@gmail.comwrote:
set mapred.min.split.size=1024000;
set mapred.max.split.size=4096000;
set hive.merge.mapfiles=false;
I had set above value and setting max