Re: doubts reg Hive

2012-10-01 Thread sudha sadhasivam
Sir
I want to retrieve the names of the fields that hold a particular value.
How can this be done on a single table and across multiple tables?

G Sudha
--- On Mon, 10/1/12, Harsh J ha...@cloudera.com wrote:

From: Harsh J ha...@cloudera.com
Subject: Re: doubts reg Hive
To: common-user@hadoop.apache.org
Date: Monday, October 1, 2012, 11:13 AM

Sudha,

On Mon, Oct 1, 2012 at 9:31 AM, sudha sadhasivam
sudhasadhasi...@yahoo.com wrote:
 We are doing a project in Hive.
 Given a field value, is it possible to find the corresponding headers
 (metadata)?
 For example, if we have a table with id, user, work_place and
 residence_place, then given the value New York we need to display the
 headers where New York appears (e.g. work_place, residence_place).

 Kindly let us know whether this is possible. If so, what is the command
 for it?

It is certainly possible and is a trivial requirement. All you're
doing here is filtering across multiple columns (OR-wise). Perhaps
something like this (pardon me if it is naive/wrong):

  SELECT * FROM people
  WHERE residence_place LIKE '%New York%'
     OR work_place LIKE '%New York%';
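
Note that the query above returns the matching rows, not the names of the
columns that matched. A minimal sketch of one way to report the column names
themselves is shown here. It uses Python's built-in sqlite3 module purely as
a stand-in for Hive, so the sample rows and the helper names
(demo_connection, columns_containing) are assumptions made for illustration,
and the exact HiveQL syntax may differ.

```python
import sqlite3


def demo_connection():
    """Build a hypothetical in-memory `people` table with the columns from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people "
        "(id INTEGER, user TEXT, work_place TEXT, residence_place TEXT)"
    )
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?, ?)",
        [
            (1, "alice", "New York", "Boston"),
            (2, "bob", "Chicago", "New York"),
            (3, "carol", "Chicago", "Boston"),
        ],
    )
    return conn


def columns_containing(conn, table, columns, value):
    """Return the names of the columns in which `value` appears at least once."""
    # One SELECT per column, each emitting the column name as a literal.
    # DISTINCT collapses the result to a single row per matching column.
    # Identifiers cannot be bound as parameters, so `table` and `columns`
    # must come from trusted input (assumption of this sketch).
    sql = " UNION ALL ".join(
        f"SELECT DISTINCT '{col}' AS column_name FROM {table} WHERE {col} LIKE ?"
        for col in columns
    )
    pattern = f"%{value}%"
    rows = conn.execute(sql, [pattern] * len(columns)).fetchall()
    return [name for (name,) in rows]
```

The same shape (one SELECT per column, joined with UNION ALL) should carry
over to Hive, although some Hive releases may require the UNION ALL to sit
inside a subquery. For several tables, the helper could simply be called once
per table.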

In any case, for Hive questions, you are better off asking on the
u...@hive.apache.org list than on the Hadoop user lists here.

-- 
Harsh J


AUTO: Yuan Jin is out of the office. (returning 10/08/2012)

2012-10-01 Thread Yuan Jin


I am out of the office until 10/08/2012.

I am out of the office. I will reply to you after the holiday.


Note: This is an automated response to your message
"java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING"
sent on 01/10/2012 21:32:09.

This is the only notification you will receive while this person is away.

Add file to distributed cache

2012-10-01 Thread Abhishek
Hi all 

How do you add a small file to the distributed cache in an MR program?

Regards
Abhi

Sent from my iPhone


Re: Add file to distributed cache

2012-10-01 Thread Bejoy KS
Hi Abhishek

You can find a simple example of using the Distributed Cache here:
http://kickstarthadoop.blogspot.co.uk/2011/05/hadoop-for-dependent-data-splits-using.html
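
As one alternative to the example linked above: for a job run through Hadoop
Streaming, a small file shipped with the generic -files option is normally
made available in each task's working directory under its base name. A
minimal mapper sketch under that assumption follows. The file name
lookup.txt, its tab-separated layout, and the helper names are all
hypothetical.

```python
import sys


def load_lookup(path):
    """Read a small tab-separated key/value file into a dict."""
    table = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, _, value = line.partition("\t")
            table[key] = value
    return table


def map_line(line, lookup):
    """Replace the first tab-separated field with its lookup value, if any."""
    key, sep, rest = line.rstrip("\n").partition("\t")
    return lookup.get(key, key) + sep + rest


def main(lookup_path="lookup.txt"):
    # Assumption: the cached file appears in the task's working directory.
    lookup = load_lookup(lookup_path)
    for line in sys.stdin:
        print(map_line(line, lookup))

# In a real mapper script, call main() at the bottom of the file.
```

The lookup file is loaded once per task and then used for every input line,
which is the usual reason for putting a small file in the distributed cache.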
--Original Message--
From: Abhishek
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Add file to distributed cache
Sent: Oct 2, 2012 05:44

Hi all 

How do you add a small file to the distributed cache in an MR program?

Regards
Abhi

Sent from my iPhone


Regards
Bejoy KS

Sent from handheld, please excuse typos.


Re: Which hardware to choose

2012-10-01 Thread Alexander Pivovarov
Hi Oleg,

Cloudera and Dell set up the following cluster for my company.
The company receives 1.5 TB of raw data per day.

38 data nodes + 2 Name Nodes

Data Node:
Dell PowerEdge C2100 series
2 x Xeon X5670
48 GB RAM ECC  (12x4GB 1333MHz)
12 x 2 TB  7200 RPM SATA HDD (with hot swap)  JBOD
Intel Gigabit ET Dual port PCIe x4
Redundant Power Supply
Hadoop CDH3
max map tasks 24
max reduce tasks 8

The Name Node and Secondary Name Node are similar, but with:
96 GB RAM  (not sure why)
6 x 600 GB 15K RPM SAS (Serial Attached SCSI)
RAID10


Another config is here (page 298):
http://books.google.com/books?id=Wu_xeGdU4G8C&pg=PA298&lpg=PA298&dq=hadoop+jbod&source=bl&ots=i7xVQBPb_w&sig=8mhq-MtpkRcTiRB1ioKciMxIasg&hl=en&sa=X&ei=AGtqUMK6D8T10gHD4ICQAQ&ved=0CEMQ6AEwAg#v=onepage&q=hadoop%20jbod&f=false


You probably need just 1 computer with 10 x 2 TB SATA HDDs.
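
As a rough check of the numbers in this thread, the arithmetic can be
sketched as follows. The replication factor of 3 is the HDFS default; the
helper names are hypothetical, and the calculation ignores compression,
intermediate job output, and space reserved for the OS.

```python
def raw_capacity_tb(nodes, disks_per_node, tb_per_disk):
    """Total raw disk space across the cluster, in TB."""
    return nodes * disks_per_node * tb_per_disk


def usable_capacity_tb(raw_tb, replication=3):
    """Space left for unique data once every block is stored `replication` times."""
    return raw_tb / replication


def days_of_retention(usable_tb, ingest_tb_per_day):
    """How many days of incoming data fit into the usable space."""
    return usable_tb / ingest_tb_per_day


# The 38-node cluster described above: 12 x 2 TB disks per data node.
cluster_raw = raw_capacity_tb(38, 12, 2)
cluster_usable = usable_capacity_tb(cluster_raw)
cluster_days = days_of_retention(cluster_usable, 1.5)

# The single machine suggested for the POC: 10 x 2 TB disks, ~6 TB of data.
single_raw = raw_capacity_tb(1, 10, 2)
poc_data_tb = 6
fits_unreplicated = poc_data_tb <= single_raw
fits_with_three_copies = poc_data_tb * 3 <= single_raw

print(cluster_raw, cluster_usable, round(cluster_days))
print(single_raw, fits_unreplicated, fits_with_three_copies)
```

Under these assumptions the 6 TB data set fits on a single 20 TB machine even
if three copies were kept, which is consistent with the suggestion above.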



On Mon, Oct 1, 2012 at 6:02 PM, Oleg Ruchovets oruchov...@gmail.com wrote:

 Hi,
   We are at a very early stage of our Hadoop project and want to do a POC.

 We have ~5-6 terabytes of raw data and we are going to run some
 aggregations.

 We plan to use 8-10 machines.

 Questions:

   1) Which hardware should we use?
      a) How many disks, and which disks are better to use?
      b) How much RAM?
      c) How many CPUs?

   2) Please share best practices and tips/tricks for making good use of
      hardware in Hadoop projects.

 Thanks in advance
 Oleg.