Re: doubts reg Hive
Sir,

I want to retrieve the names of the fields that contain a particular value. How can this be done on a single table and across multiple tables?

G Sudha

--- On Mon, 10/1/12, Harsh J ha...@cloudera.com wrote:

From: Harsh J ha...@cloudera.com
Subject: Re: doubts reg Hive
To: common-user@hadoop.apache.org
Date: Monday, October 1, 2012, 11:13 AM

Sudha,

On Mon, Oct 1, 2012 at 9:31 AM, sudha sadhasivam sudhasadhasi...@yahoo.com wrote:
> We are doing a project in Hive. Given a field value, is it possible to find the corresponding headers (metadata)? For example, if we have a table with id, user, work_place, residence_place, then given the value New York we need to display the headers where New York appears (e.g. work_place, residence_place, etc.).
> Kindly let us know whether this is possible and, if so, what the command for it is.

It is certainly possible and is a trivial requirement. All you're doing here is filtering across multiple columns (OR-wise). Perhaps something like (pardon if naive/wrong):

    SELECT * FROM people WHERE residence_place LIKE '%New York%' OR work_place LIKE '%New York%';

In any case, for Hive questions, you are better off asking the u...@hive.apache.org lists than the Hadoop user lists here.

--
Harsh J
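The query quoted above returns matching rows, while the follow-up asks for the names of the columns that contain the value. One possible approach, sketched here rather than taken from the thread, is to count matches per column in a single scan and read off the columns whose count is non-zero. The helper below only builds the HiveQL text. The function name `build_column_match_query` and the `_matches` alias suffix are made up for illustration, and the column list is assumed to be supplied by the caller (for example, copied from the output of `DESCRIBE people`). For several tables, the helper would be called once per table.

```python
import re

# Conservative pattern for table and column names (an assumption of this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_column_match_query(table, columns, value):
    """Return HiveQL text that counts, per column, the rows containing `value`."""
    if not columns:
        raise ValueError("at least one column is required")
    for name in [table] + list(columns):
        if not _IDENTIFIER.match(name):
            raise ValueError("unexpected identifier: %r" % name)
    # Refuse quotes and backslashes instead of guessing at escaping rules.
    if "'" in value or "\\" in value:
        raise ValueError("value must not contain quotes or backslashes")

    # One SUM(CASE ...) per column, so the table is scanned only once.
    counters = [
        "SUM(CASE WHEN {col} LIKE '%{val}%' THEN 1 ELSE 0 END) AS {col}_matches".format(
            col=col, val=value
        )
        for col in columns
    ]
    return "SELECT\n  " + ",\n  ".join(counters) + "\nFROM " + table + ";"


query = build_column_match_query(
    "people", ["work_place", "residence_place"], "New York"
)
print(query)
```

The result is a single row with one counter per column; any column whose counter is greater than zero is a header under which the value appears. Note that `%` and `_` inside the search value still act as LIKE wildcards in this sketch.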
Add file to distributed cache
Hi all,

How do you add a small file to the distributed cache in an MR program?

Regards,
Abhi
Re: Add file to distributed cache
Hi Abhishek,

You can find a simple example of using the Distributed Cache here:
http://kickstarthadoop.blogspot.co.uk/2011/05/hadoop-for-dependent-data-splits-using.html

Regards,
Bejoy KS

--Original Message--
From: Abhishek
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Add file to distributed cache
Sent: Oct 2, 2012 05:44

> Hi all,
> How do you add a small file to the distributed cache in an MR program?
> Regards,
> Abhi
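For readers who want something inline as well: in a Java job of that era, the usual route is `DistributedCache.addCacheFile(uri, conf)` in the driver and `DistributedCache.getLocalCacheFiles(conf)` in the task. The sketch below shows a different but related route, Hadoop Streaming, where a small file shipped with the `-files` generic option is made available in each task's working directory and can be opened by name. The file name `lookup.txt`, the tab-separated layout, and all function names are assumptions made for this illustration, not something taken from the thread or the linked post.

```python
def load_lookup(path):
    """Read 'key<TAB>value' lines from the small side file into a dict."""
    lookup = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, _, value = line.partition("\t")
            lookup[key] = value
    return lookup


def map_lines(lines, lookup):
    """Yield each input record extended with the value looked up for its key.

    The key is taken to be the first tab-separated field of the record;
    records whose key is missing from the lookup are dropped.
    """
    for line in lines:
        record = line.rstrip("\n")
        key = record.split("\t", 1)[0]
        if key in lookup:
            yield "%s\t%s" % (record, lookup[key])


def run(stdin, stdout, lookup_path="lookup.txt"):
    """Mapper body: load the side file once, then stream the records through."""
    lookup = load_lookup(lookup_path)
    for out in map_lines(stdin, lookup):
        stdout.write(out + "\n")
```

A streaming mapper script would end by calling `run(sys.stdin, sys.stdout)`. The side file is loaded once per task and held in memory, which is why this pattern suits small files only.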
Re: Which hardware to choose
Hi Oleg,

Cloudera and Dell set up the following cluster for my company. The company receives 1.5 TB of raw data per day.

38 data nodes + 2 name nodes

Data node: Dell PowerEdge C2100 series
    2 x Xeon X5670
    48 GB ECC RAM (12 x 4 GB, 1333 MHz)
    12 x 2 TB 7200 RPM SATA HDD (hot swap), JBOD
    Intel Gigabit ET dual-port PCIe x4
    Redundant power supply

Hadoop CDH3
    max map tasks: 24
    max reduce tasks: 8

The Name Node and Secondary Name Node are similar, but with 96 GB RAM (not sure why) and 6 x 600 GB 15K RPM Serial Attached SCSI drives in RAID 10.

Another config is here, on page 298:
http://books.google.com/books?id=Wu_xeGdU4G8Cpg=PA298lpg=PA298dq=hadoop+jbodsource=blots=i7xVQBPb_wsig=8mhq-MtpkRcTiRB1ioKciMxIasghl=ensa=Xei=AGtqUMK6D8T10gHD4ICQAQved=0CEMQ6AEwAg#v=onepageq=hadoop%20jbodf=false

You probably need just 1 computer with 10 x 2 TB SATA HDD.

On Mon, Oct 1, 2012 at 6:02 PM, Oleg Ruchovets oruchov...@gmail.com wrote:
> Hi,
> We are at a very early stage of our Hadoop project and want to do a POC. We have ~5-6 terabytes of raw data and we are going to execute some aggregations. We plan to use 8-10 machines.
> Questions:
> 1) Which hardware should we use:
>    a) How many disks, and which disks are better to use?
>    b) How much RAM?
>    c) How many CPUs?
> 2) Please share best practices and tips/tricks for making good use of hardware in Hadoop projects.
> Thanks in advance,
> Oleg.
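The disk counts above can be sanity-checked with some rough arithmetic. The sketch below is only back-of-the-envelope: it assumes the HDFS default replication factor of 3 and reserves a 25% headroom for intermediate and temporary data. The headroom figure and the function names are assumptions of this sketch, not numbers from the thread, and compression is ignored.

```python
def cluster_raw_tb(nodes, disks_per_node, disk_tb):
    """Total raw disk capacity of the data nodes, in TB."""
    return nodes * disks_per_node * disk_tb


def usable_tb(raw_tb, replication=3, headroom=0.25):
    """TB of user data that fit once replication and headroom are accounted for."""
    return raw_tb * (1.0 - headroom) / replication


def raw_disk_needed_tb(data_tb, replication=3, headroom=0.25):
    """Raw disk (TB) to provision in order to store `data_tb` of user data."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return data_tb * replication / (1.0 - headroom)


# The 38-node cluster described above: 12 x 2 TB disks per data node.
raw = cluster_raw_tb(nodes=38, disks_per_node=12, disk_tb=2)
print(raw)                    # 912 TB raw
print(usable_tb(raw) / 1.5)   # about 152 days of intake at 1.5 TB/day

# The proof of concept from the question: up to 6 TB of raw data.
print(raw_disk_needed_tb(6))  # 24.0 TB under these assumptions
```

Under these assumptions the 6 TB proof of concept needs roughly 24 TB of raw disk with three replicas. A single machine with 10 x 2 TB disks (20 TB) covers the replicated data only if the headroom is cut or the replication factor is lowered, which is the usual setting on a one-node setup anyway.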