Re: Question on DFS Balancing

2014-03-05 Thread divye sheth
I won't be in a position to fix that by depending on HDFS-1804, as we are upgrading to CDH4 in the coming month. I just wanted a short-term solution. I have read somewhere that manual movement of the blocks would help. Could someone guide me to the exact steps or precautions I should take while doing

Streaming data access in HDFS: Design Feature

2014-03-05 Thread Radhe Radhe
Hello All, Can anyone please explain what we mean by streaming data access in HDFS? Data is usually copied to HDFS, and in HDFS the data is split across DataNodes in blocks. Say, for example, I have an input file of 10240 MB (10 GB) in size and a block size of 64 MB. Then there will be 160
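
The block count in that example follows directly from the file size and block size:

    10240 MB / 64 MB per block = 160 blocks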

Re: Streaming data access in HDFS: Design Feature

2014-03-05 Thread shashwat shriparv
Streaming means processing the data as it is coming into HDFS. In Hadoop, Hadoop Streaming enables Hadoop to receive data using executables of different types. I hope you have already read this: http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming Warm Regards, Shashwat

[no subject]

2014-03-05 Thread Avinash Kujur
I am getting this error while cloning the Hadoop trunk code from git.apache.org using the terminal. The error is: [cloudera@localhost ~]$ git clone git://git.apache.org/hadoop-common.git hadoop Initialized empty Git repository in /home/cloudera/hadoop/.git/ fatal: Unable to look up git.apache.org (port

How to solve it ? java.io.IOException: Failed on local exception

2014-03-05 Thread 张超
Hi all, Here is a problem that confuses me. When I use Java code to manipulate pseudo-distributed Hadoop, it throws an exception: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: localhost/127.0.0.1; destination host is: localhost:9000; I have

Re:

2014-03-05 Thread Nitin Pawar
Try this: git clone https://github.com/apache/hadoop-common.git hadoop On Wed, Mar 5, 2014 at 1:58 PM, Avinash Kujur avin...@gmail.com wrote: I am getting this error while cloning the Hadoop trunk code from git.apache.org using the terminal. The error is: [cloudera@localhost ~]$ git clone

RE: Streaming data access in HDFS: Design Feature

2014-03-05 Thread Radhe Radhe
Hi Shashwat, This is an excerpt from Hadoop: The Definitive Guide by Tom White: Hadoop Streaming Hadoop provides an API to MapReduce that allows you to write your map and reduce functions in languages other than Java. Hadoop Streaming uses Unix standard streams as the interface between Hadoop and

Re: How to solve it ? java.io.IOException: Failed on local exception

2014-03-05 Thread Stanley Shi
Which version of Hadoop are you using? This is something similar to your error log: http://stackoverflow.com/questions/19895969/can-access-hadoop-fs-through-shell-but-not-through-java-main Regards, Stanley Shi On Wed, Mar 5, 2014 at 4:29 PM, 张超 chao.zh...@dianping.com wrote: Hi all,

Re: Streaming data access in HDFS: Design Feature

2014-03-05 Thread Nitin Pawar
Are you asking why data reads/writes from/to HDFS blocks via the MapReduce framework are done in a streaming manner? On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe radhe.krishna.ra...@live.com wrote: Hi Shashwat, This is an excerpt from Hadoop: The Definitive Guide by Tom White: Hadoop Streaming Hadoop

Re: Question on DFS Balancing

2014-03-05 Thread Azuryy Yu
You can write a simple tool to move blocks peer to peer. I had such a tool before, but I cannot find it now. Background: our cluster was not balanced and the load balancer was very slow, so I wrote this tool to move blocks from one node to another node. On Wed, Mar 5, 2014 at 4:06 PM, divye sheth

RE: Streaming data access in HDFS: Design Feature

2014-03-05 Thread Radhe Radhe
Hi Nitin, I believe Hadoop Streaming is different from streaming data access in HDFS. We usually copy the data into HDFS and then the MR application reads the data through Map and Reduce tasks. I need to be clear about WHAT is done and HOW in streaming data access in HDFS. Thanks, RR Date: Wed,

Re: Streaming data access in HDFS: Design Feature

2014-03-05 Thread Nitin Pawar
Hadoop Streaming allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. In other words, you do not need to learn Java programming to write a simple MapReduce program. Whereas streaming data access in HDFS is totally different. When
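
A minimal Hadoop Streaming invocation along those lines (the streaming jar location and the input/output paths are illustrative, not taken from the thread):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input /user/me/input \
        -output /user/me/output \
        -mapper /bin/cat \
        -reducer /usr/bin/wc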

Re: Question on DFS Balancing

2014-03-05 Thread divye sheth
Does this require any downtime? I guess it should. Are there any other precautions that I should take? Thanks, Azuryy. On Wed, Mar 5, 2014 at 2:19 PM, Azuryy Yu azury...@gmail.com wrote: you can write a simple tool to move blocks peer to peer. I had such a tool before, but I cannot find it now.

Re:

2014-03-05 Thread Avinash Kujur
After downloading 150 MB it gave this error: error: RPC failed; result=18, HTTP code = 200 | 47 MiB | 24 KiB/s. I did not get what it means. On Wed, Mar 5, 2014 at 12:30 AM, Nitin Pawar nitinpawar...@gmail.com wrote: try this git clone https://github.com/apache/hadoop-common.git hadoop On Wed, Mar

Re:

2014-03-05 Thread Nitin Pawar
I think you are having issues because of a slow network. I would say you check out the source code from Apache SVN, e.g.: svn co http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4 You can check out the branch you want to work on. On Wed, Mar 5, 2014 at 2:48 PM, Avinash Kujur

Re:

2014-03-05 Thread Mingjiang Shi
It looks like your network is unstable. You may consider downloading it as a zip from GitHub if you just want a copy of the source code. Try this link: https://github.com/apache/hadoop-common/archive/trunk.zip On Wed, Mar 5, 2014 at 5:18 PM, Avinash Kujur avin...@gmail.com wrote: after

Re: Question on DFS Balancing

2014-03-05 Thread Azuryy Yu
It doesn't need any downtime. It is just like the Balancer, but this tool moves blocks peer to peer: you specify the source node and destination node, then start it. On Wed, Mar 5, 2014 at 5:12 PM, divye sheth divs.sh...@gmail.com wrote: Does this require any downtime? I guess it should and any other

Re: How to solve it ? java.io.IOException: Failed on local exception

2014-03-05 Thread shashwat shriparv
What is the code that you are trying to run? Warm Regards, Shashwat Shriparv http://www.linkedin.com/pub/shashwat-shriparv/19/214/2a9 https://twitter.com/shriparv

[no subject]

2014-03-05 Thread Avinash Kujur
When I use the command mvn clean install -DskipTests -Pdist, it gives this error: [cloudera@localhost ~]$ mvn clean install -DskipTests -Pdist [INFO] Scanning for projects... [INFO] [INFO] BUILD FAILURE [INFO]

Re:

2014-03-05 Thread Mingjiang Shi
Did you execute the command from /home/cloudera? Does it contain the Hadoop source code? You need to execute the command from the source code directory. On Wed, Mar 5, 2014 at 6:28 PM, Avinash Kujur avin...@gmail.com wrote: when i am using this command mvn clean install -DskipTests -Pdist
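
A sketch of what is being suggested, assuming the trunk checkout lives under the home directory (the directory name below is illustrative):

    cd ~/hadoop-common-trunk               # the checkout containing the top-level pom.xml
    mvn clean install -DskipTests -Pdist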

Re:

2014-03-05 Thread Avinash Kujur
[cloudera@localhost hadoop-common-trunk]$ mvn clean install -DskipTests -Pdist [INFO] Scanning for projects... Downloading: http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom [ERROR] The build could not read 1 project - [Help 1] [ERROR]

Re:

2014-03-05 Thread Avinash Kujur
/home/cloudera/ contains the Hadoop files. On Wed, Mar 5, 2014 at 2:40 AM, Avinash Kujur avin...@gmail.com wrote: [cloudera@localhost hadoop-common-trunk]$ mvn clean install -DskipTests -Pdist [INFO] Scanning for projects... Downloading:

Re:

2014-03-05 Thread Avinash Kujur
Yes, it has internet access. On Wed, Mar 5, 2014 at 2:47 AM, Mingjiang Shi m...@gopivotal.com wrote: See the error message: Unknown host repo.maven.apache.org - [Help 2] Does your machine have internet access? On Wed, Mar 5, 2014 at 6:42 PM, Avinash Kujur avin...@gmail.com wrote:

Re:

2014-03-05 Thread Mingjiang Shi
It looks more like a connection problem, as it complains it cannot access repo.maven.apache.org. On Wed, Mar 5, 2014 at 6:49 PM, Avinash Kujur avin...@gmail.com wrote: yes. it has internet access. On Wed, Mar 5, 2014 at 2:47 AM, Mingjiang Shi m...@gopivotal.com wrote: see the error

Re:

2014-03-05 Thread Mingjiang Shi
See the error message: Unknown host repo.maven.apache.org - [Help 2] Does your machine have internet access? On Wed, Mar 5, 2014 at 6:42 PM, Avinash Kujur avin...@gmail.com wrote: home/cloudera/ contains hadoop files. On Wed, Mar 5, 2014 at 2:40 AM, Avinash Kujur avin...@gmail.com wrote:

Re:

2014-03-05 Thread Avinash Kujur
If I open the repo.maven.apache.org link in my browser, it shows this message: Browsing for this directory has been disabled. View this directory's contents on http://search.maven.org instead. So how can I change the link from

Re:

2014-03-05 Thread Mingjiang Shi
Can you access this link? http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom On Wed, Mar 5, 2014 at 6:54 PM, Avinash Kujur avin...@gmail.com wrote: If I open the repo.maven.apache.org link in my browser, it shows this message:
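
A quick way to test that connectivity from the build box, assuming curl is available there:

    # HEAD request; an HTTP 200 response means the Maven repo is reachable
    curl -I http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom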

RE: Benchmarking Hive Changes

2014-03-05 Thread java8964
Are you doing this on a single standalone box? How large are your test files, and how long did the jobs of each type take? Yong From: anth...@mattas.net Subject: Benchmarking Hive Changes Date: Tue, 4 Mar 2014 21:31:42 -0500 To: user@hadoop.apache.org I've been trying to benchmark some of the Hive

Re: Benchmarking Hive Changes

2014-03-05 Thread Anthony Mattas
Yes, I'm using the HortonWorks Data Platform 2.0 Sandbox, which is a standalone box. But shame on me, it looks like the files are both very tiny (46K). I'm seeing about 23 seconds per query, which appears mostly to be MR startup. So I'm going to find a new data set and try again. Is there any

Re: Node manager or Resource Manager crash

2014-03-05 Thread Krishna Kishore Bonagiri
Vinod, One more observation I can share: every time the NM or RM gets killed, I see the following kind of messages in the NM's log: 2014-03-05 05:33:23,824 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true, 2014-03-05 05:33:23,824

Fw: Hadoop at ApacheCon Denver

2014-03-05 Thread Melissa Warnkin
Hello Hadoop enthusiasts,  As you are no doubt aware, ApacheCon North America will be held in Denver, Colorado starting on April 7th.  Hadoop has 25 talks and two tutorials!! Check it out here:  http://apacheconnorthamerica2014.sched.org/?s=hadoop. We would love to see you in Denver next

Re: Unable to export hadoop trunk into eclipse

2014-03-05 Thread nagarjuna kanamarlapudi
Can anyone help me here? On Tue, Mar 4, 2014 at 3:23 PM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: Yes, I installed it. mvn clean install -DskipTests was successful. Only the import into Eclipse is failing. On Tue, Mar 4, 2014 at 12:51 PM, Azuryy Yu azury...@gmail.com

Re: Question on DFS Balancing

2014-03-05 Thread Harsh J
You can safely move block files between disks. Follow the instructions here: http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F On Tue, Mar 4, 2014 at 11:47 PM, divye sheth divs.sh...@gmail.com wrote: Thanks Harsh. The jira is fixed in
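
A rough outline of the procedure described in that FAQ entry (the paths below are examples only; the real ones are whatever dfs.data.dir points to on that DataNode):

    # 1. Stop the DataNode process on the node being rebalanced.
    # 2. Move block files together with their matching .meta files from a
    #    directory on the full disk to the same relative location on the
    #    emptier disk, e.g.:
    mv /data1/dfs/data/current/blk_1073741825* /data2/dfs/data/current/
    # 3. Restart the DataNode.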

Re: Benchmarking Hive Changes

2014-03-05 Thread Olivier Renault
The last iteration of Stinger is coming with Tez. The HDP 2 sandbox that you're using does not include Tez. You can add it manually if you would like (a doc is available on Hortonworks.com/labs), or it will be available in the HDP 2.1 sandbox. Kind regards, Olivier On 5 Mar 2014 17:15, Anthony Mattas

Re: Using a specific local path in fs.defaultFS (e.g. file:///local/)

2014-03-05 Thread Chris Mildebrandt
Hi, I'm going to bump this question up, but it's looking like I may have to write my own implementation to make this work. Any ways around that using the existing technology? Is this something that would be useful enough to modify the existing LocalFileSystem class and contribute back? Thanks,

Re: unsubscribe

2014-03-05 Thread Ted Yu
See second bullet under https://hadoop.apache.org/mailing_lists.html#User On Wed, Mar 5, 2014 at 11:11 AM, Dibyendu Karmakar dibyendu.d...@gmail.comwrote: unsubscribe -- Dibyendu Karmakar, dibyendu.d...@gmail.com

Re: App Master issue.

2014-03-05 Thread Xuan Gong
The message means it cannot connect to the ResourceManager. Could you share your configuration? It might be easier to figure out the real issue. Thanks, Xuan Gong On Wed, Mar 5, 2014 at 11:29 AM, Sai Prasanna ansaiprasa...@gmail.com wrote: Hi, I have a five node cluster. One master and 4

Re: Impact of Tez/Spark to MapReduce

2014-03-05 Thread Edward Capriolo
The thing about YARN is that you choose what is right for the workload. For example, Spark may not be the right choice if the join tables do not fit in memory. On Wednesday, March 5, 2014, Anthony Mattas anth...@mattas.net wrote: With Tez and Spark becoming mainstream, what does Map Reduce

Re: App Master issue.

2014-03-05 Thread Mingjiang Shi
Hi Sai, A few questions: 1. Which version of Hadoop are you using? yarn.resourcemanager.hostname is a new configuration that is not available in old versions. 2. Does your yarn-site.xml contain yarn.resourcemanager.scheduler.address? If yes, what's the value? 3. Or you could access

Re: Impact of Tez/Spark to MapReduce

2014-03-05 Thread Jeff Zhang
I believe that in the future the Spark functional-style API will dominate the big data world. Very few people will use the native MapReduce API. Even now, users usually use a third-party MapReduce library such as Cascading, Scalding, or Scoobi, or a scripting language like Hive or Pig, rather than the native MapReduce

PseudoAuthenticationHandler (HADOOP-10193)

2014-03-05 Thread Eugene Koifman
Hi, it used to be possible to submit a request to a servlet using org.apache.hadoop.hdfs.web.AuthFilter as a POST, specifying user.name (simple authentication) as a form parameter. For example: curl -X POST -d 'user.name=foo' 'http://' After https://issues.apache.org/jira/browse/HADOOP-10193,
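
For illustration, the kind of request being described; the host, port, and path below are placeholders, not the endpoint from the original mail:

    # user.name sent as a POST form parameter, as described above
    curl -X POST -d 'user.name=foo' 'http://host.example.com:50070/some/protected/servlet'
    # the same request with user.name moved onto the query string
    curl -X POST 'http://host.example.com:50070/some/protected/servlet?user.name=foo'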

Re: App Master issue.

2014-03-05 Thread Mingjiang Shi
Sorry, it should be accessing http://node_manager_ip:8042/conf to check the value of yarn.resourcemanager.scheduler.address on the node manager. On Thu, Mar 6, 2014 at 9:36 AM, Mingjiang Shi m...@gopivotal.com wrote: Hi Sai, A few questions: 1. Which version of Hadoop are you using?

Re: The best practice of migrating hadoop 1.0.1 to hadoop 2.2.3

2014-03-05 Thread Azuryy Yu
Hi, 1) Is it possible to do an in-place migration while keeping all data in HDFS safe? Yes: stop HDFS first, then run start-dfs.sh -upgrade. 2) If yes, is there any doc/guidance for doing this? You just want an HDFS upgrade, so I don't think there is a dedicated doc. 3)
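
A minimal sketch of that upgrade sequence (script and command names vary slightly between Hadoop 1.x and 2.x; treat these as illustrative):

    stop-dfs.sh                      # stop HDFS on the old version
    # switch the installation over to the new Hadoop version, then:
    start-dfs.sh -upgrade            # bring HDFS back up in upgrade mode
    # once the data has been verified on the new version:
    hdfs dfsadmin -finalizeUpgrade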

[no subject]

2014-03-05 Thread Avinash Kujur
Hi, I am getting an error partway through downloading all the jars using the Maven command mvn clean install -DskipTests -Pdist. The error is: [INFO] --- hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) @ hadoop-common --- [WARNING] [protoc, --version] failed with error code 1 Help me

Re: The best practice of migrating hadoop 1.0.1 to hadoop 2.2.3

2014-03-05 Thread Mingjiang Shi
Hi Jerry, Refer to the following links for reference: http://www.michael-noll.com/blog/2011/08/23/performing-an-hdfs-upgrade-of-an-hadoop-cluster/ http://wiki.apache.org/hadoop/Hadoop_Upgrade Notes: 1. The Hadoop version used in the docs may be different from yours, but they are good references

Re:

2014-03-05 Thread Gordon Wang
Do you have protobuf installed on your build box? You can use which protoc to check. It looks like protobuf is missing. On Thu, Mar 6, 2014 at 2:55 PM, Avinash Kujur avin...@gmail.com wrote: Hi, I am getting an error partway through downloading all the jars using the Maven command mvn clean install
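
A quick check along those lines (the version shown in the comment is just an example of the output format):

    which protoc        # is a protoc binary on the PATH at all?
    protoc --version    # prints the installed version, e.g. "libprotoc 2.4.1"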

Re:

2014-03-05 Thread Avinash Kujur
Yes, protobuf is installed: libprotoc 2.4.1. I checked. On Wed, Mar 5, 2014 at 11:04 PM, Gordon Wang gw...@gopivotal.com wrote: Do you have protobuf installed on your build box? you can use which protoc to check. Looks like protobuf is missing. On Thu, Mar 6, 2014 at 2:55 PM, Avinash Kujur