Re: JAVA_HOME Cygwin problem (solution doesn't work)

2008-05-24 Thread vikas
Hi,

As far as I understand Hadoop (from following its scripts, such as
start-all.sh, and from my own experience), it does not care about the rest
of your environment variables as long as you provide the right values in
its environment file, i.e. conf/hadoop-env.sh.

Set something like the following to get around your problem:

# The java implementation to use.  Required.
export JAVA_HOME="/cygdrive/c/Program Files/Java/jre1.5.0_11"
#/usr/lib/j2sdk1.5-sun
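If you would rather not hard-code the /cygdrive form, Cygwin's cygpath can
derive it from the Windows path. A minimal sketch, assuming the jre1.6.0_06
path from the original question:

  # conf/hadoop-env.sh -- let cygpath translate the Windows path;
  # single quotes keep the backslashes and the space intact
  export JAVA_HOME="$(cygpath -u 'C:\Program Files\Java\jre1.6.0_06')"
  # yields: /cygdrive/c/Program Files/Java/jre1.6.0_06
  # if the space still trips a script, the 8.3 short form avoids it:
  # export JAVA_HOME="$(cygpath -u 'C:\Progra~1\Java\jre1.6.0_06')"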

Regards,
-Vikas.


On Sat, May 24, 2008 at 6:11 AM, vatsan <[EMAIL PROTECTED]> wrote:

>
> I have installed Hadoop on Cygwin; I am running Windows XP.
>
> My Java directory is C:\Program Files\Java\jre1.6.0_06
>
> I am not able to run Hadoop, as it complains with a "no such file or
> directory" error.
>
> I did some searching and found out someone had proposed a solution of doing
>
> SET JAVA_HOME=C:\Program Files\Java\jre1.6.0_06
>
> in the Cygwin.bat file,
>
> but that doesn't work for me.
>
> Neither does using the absolute path name "\cygwin\c\Program Files\Java" OR
> using  \cygwin\c\"Program Files"\Java
>
> Can someone guide me here?
>
> (I understand that the problem is because of the path-convention
> conflicts between Windows and Cygwin. I found some material on fixes for
> the path issues that use cygpath.exe, for example when running a Java
> program under Cygwin, but could not find anything that addressed my
> problem.)
>
>
>


Re: Where are the files?

2008-05-07 Thread vikas
It will be mapped under /tmp: by default hadoop.tmp.dir is
/tmp/hadoop-${user.name}, and the HDFS name and data directories are
created beneath it. Under Cygwin on Windows, /tmp is the tmp directory
inside your Cygwin installation.
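A quick way to see this for yourself, assuming the default configuration
(the layout differs if hadoop.tmp.dir or the dfs.*.dir properties are
overridden in conf/hadoop-site.xml):

  # NameNode metadata and DataNode blocks under the default tmp layout
  ls /tmp/hadoop-$USER/dfs/name
  ls /tmp/hadoop-$USER/dfs/data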

Regards,
-Vikas.

On Wed, May 7, 2008 at 8:06 PM, hong <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I started Hadoop in standalone mode, and put some files onto HDFS. I
> strictly followed the instructions in the Hadoop Quick Start.
>
> HDFS is mapped to a local directory in my local file system, right? And
> where is it?
>
> Thank you in advance!
>
>
>


Re: How to perform FILE IO with Hadoop DFS

2008-05-06 Thread vikas
Please find my response inline,

Thanks,
-Vikas

On Tue, May 6, 2008 at 3:43 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:

> vikas wrote:
>
> > Thank you very much for the right link. It really helped. Like many
> > others, I'm waiting for "Append to files in HDFS".
> >
> > Is there anything I can do to raise its priority? Does the Hadoop
> > developer community track any request counter for a particular feature,
> > to raise its priority? If that is the case I would like to add my vote
> > to this :)
> >
>
> Apache projects celebrate community contributions more than just votes.
>
>
> > I've registered on the mailing list, which gives me the privilege of
> > creating a JIRA issue and watching one. Can you tell me how I can get
> > into the developer community, so that if time permits I can contribute
> > through discussion or code?
> >
> >
> -get on the core-developer list
> -watch how things work. Most discussion is on specific bugs. Note also
> how Hudson tests all patches and rejects anything with no tests, or with
> javac or javadoc warnings.
> -check out SVN_HEAD and build it
> -start patching stuff on the side, non-critical things, so that people
> learn to trust your coding skills. The filesystem is taken very seriously,
> as a failure there could lose petabytes of data.
> -look at the test process. All changes need to fit in there. Even if you
> don't have 500 machines to spare, others do, so design your changes to be
> testable in that world, and to run on bigger clusters.


I'll work on this.


>
> Note that Append is not something you are ever going to see on S3 files;
> it's not part of the S3 REST API. So if you assume append everywhere, your
> app won't be so happy on the EC2 farms.
>

I was not able to follow you on this. From the dependent bug fixes of
HADOOP-1700 (yet to be resolved), what I saw was that I'll be getting an API
on FileSystem for appending data to an existing file.

That ability will give me the opportunity to create a FileSystem object (a
DFS Java object, in my case) and update the file of my interest, so as to
avoid multiple small files in the system.

I hope I'm watching the right issue; correct me if I'm wrong. If possible,
can you explain the terminology used above [S3 files, EC2 farms, S3 REST
API]?


> --
> Steve Loughran  http://www.1060.org/blogxter/publish/5
> Author: Ant in Action   http://antbook.org/
>


Re: How to perform FILE IO with Hadoop DFS

2008-05-05 Thread vikas
Thank you very much for the right link. It really helped. Like many others,
I'm waiting for "Append to files in HDFS".

Is there anything I can do to raise its priority? Does the Hadoop developer
community track any request counter for a particular feature, to raise its
priority? If that is the case I would like to add my vote to this :)

I've registered on the mailing list, which gives me the privilege of
creating a JIRA issue and watching one. Can you tell me how I can get into
the developer community, so that if time permits I can contribute through
discussion or code?

Best regards,
-Vikas


On Mon, May 5, 2008 at 9:43 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

>
> On May 4, 2008, at 6:27 PM, vikas wrote:
>
> > Hi All,
> >
> > I was looking for how multiple inputs can be written to the same
> > output, at different intervals of time (i.e., I want to re-open the
> > same file to append data to it).
> >
> > This link did not contain anything related to my question:
> > http://issues.apache.org/jira/browse/HADOOP-3149. Maybe you wanted to
> > suggest another link; the above link has the details of how an input
> > can be written to multiple output files.
> >
> >
> My apologies, the correct link is 
> http://issues.apache.org/jira/browse/HADOOP-1700
> - a copy/paste error.
>
> > Is anyone working on developing the usability of DFS? It would be
> > really effective if DFS operations were allowed, on top of which we
> > could use the map-reduce functionality.
> >
> > Please correct me if I'm understanding the programming model
> > differently. As of now it looks as if I need to write a separate
> > application to collect the input and then store it in Hadoop so that
> > it can be processed on a multi-node cluster.
> >
> >
> Yes, you will need to 'load' data onto HDFS and then run Map-Reduce
> programs on it.
>
> However, the input to your Map-Reduce program can be a 'directory', thus
> you can load data into the same directory periodically as separate files and
> then when you have all the data, process them.
>
> > I feel it would be better if one could directly store data into DFS
> > for processing. Updating the same file would give me a way to avoid
> > multiple small files and the redundant task of merging them into a
> > different file.
> >
> >
> This is a relevant problem and we are currently developing the notion of
> 'archives' to get over multiple small files (which place a fair bit of  load
> on the NameNode).
> http://issues.apache.org/jira/browse/HADOOP-3307 (I'm pretty sure the link
> _is_ right this time around... *smile*)
>
> Arun
>
>
>
> > Thank you very much for your time,
> >
> > -Vikas
> >
> > On Mon, May 5, 2008 at 12:43 AM, Arun C Murthy <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Vikas,
> > >
> > > On May 4, 2008, at 7:51 AM, vikas wrote:
> > >
> > > > Hi All,
> > > >
> > > > Can anyone please help me with the technique for writing to the
> > > > same file in the DFS of Hadoop.
> > > >
> > > > I want to perform insertion, deletion and update on a file in my
> > > > DFS.
> > > >
> > > HDFS doesn't support file updates; once written (i.e. after the file
> > > is 'closed'), the file is immutable.
> > >
> > > Appends to a file are coming soon:
> > > http://issues.apache.org/jira/browse/HADOOP-3149.
> > >
> > > Arun
>
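A minimal sketch of the load-then-process pattern Arun describes, using the
HDFS shell (the directory and file names below are hypothetical):

  # load batches into one input directory as they arrive
  bin/hadoop dfs -mkdir /data/incoming
  bin/hadoop dfs -put batch-2008-05-05.log /data/incoming/
  bin/hadoop dfs -put batch-2008-05-06.log /data/incoming/
  # when all the data is in, point the job at the directory, not a file
  bin/hadoop jar myjob.jar MyJob /data/incoming /data/output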


Re: How to perform FILE IO with Hadoop DFS

2008-05-04 Thread vikas
Hi All,

I was looking for how multiple inputs can be written to the same output, at
different intervals of time (i.e., I want to re-open the same file to append
data to it).

This link did not contain anything related to my question:
http://issues.apache.org/jira/browse/HADOOP-3149. Maybe you wanted to
suggest another link; the above link has the details of how an input can be
written to multiple output files.

Is anyone working on developing the usability of DFS? It would be really
effective if DFS operations were allowed, on top of which we could use the
map-reduce functionality.

Please correct me if I'm understanding the programming model differently. As
of now it looks as if I need to write a separate application to collect the
input and then store it in Hadoop so that it can be processed on a
multi-node cluster.

I feel it would be better if one could directly store data into DFS for
processing. Updating the same file would give me a way to avoid multiple
small files and the redundant task of merging them into a different file.

Thank you very much for your time,

-Vikas

On Mon, May 5, 2008 at 12:43 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> Vikas,
>
> On May 4, 2008, at 7:51 AM, vikas wrote:
>
> > Hi All,
> >
> > Can anyone please help me with the technique for writing to the same
> > file in the DFS of Hadoop.
> >
> > I want to perform insertion, deletion and update on a file in my DFS.
> >
> >
> HDFS doesn't support file updates; once written (i.e. after the file is
> 'closed'), the file is immutable.
>
> Appends to a file are coming soon:
> http://issues.apache.org/jira/browse/HADOOP-3149.
>
> Arun
>
>
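Given the write-once model Arun describes, the day-to-day pattern with the
HDFS shell looks like this; a minimal sketch (paths and file names
hypothetical):

  # write once: copy a finished local file into HDFS
  bin/hadoop dfs -put results-part-001.txt /user/vikas/results/part-001.txt
  # read it back
  bin/hadoop dfs -cat /user/vikas/results/part-001.txt
  # there is no in-place update: to change the file, delete and rewrite it
  bin/hadoop dfs -rm /user/vikas/results/part-001.txt
  bin/hadoop dfs -put results-v2.txt /user/vikas/results/part-001.txt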


How to perform FILE IO with Hadoop DFS

2008-05-04 Thread vikas
Hi All,

Can anyone please help me with the technique for writing to the same file in
the DFS of Hadoop?

I want to perform insertion, deletion and update on a file in my DFS.

So how can I get write access to a file that has already been created, and
what is the best way to update a file?


Thank You,
-Vikas.


Newbie quick questions :-)

2008-04-21 Thread vikas
Hi,

I'm new to Hadoop, and aiming to develop a good amount of code with it. I
have some quick questions; it would be highly appreciated if someone could
answer them.

I was able to run Hadoop in a Cygwin environment and run the examples both
in standalone mode and in a 2-node cluster.

1) How can I overcome the need to give a password for SSH logins whenever
the DataNodes are started? (See the sketch at the end of this message.)

2) I've put a file of some 1.5 GB on my master node, where a DataNode is
also running. I want to see how load balancing can be done so that disk
space is utilized on the other DataNodes as well.

3) How can I add a new DataNode without stopping Hadoop? (See the sketch
below.)

4) Suppose I want to shut down one DataNode for maintenance. Is there any
way to inform Hadoop that this particular DataNode is going down, so that it
makes sure the data on it is replicated elsewhere? (See the sketch below.)

5) I was going through some videos on map-reduce and a few Yahoo tech talks,
in which they said a Hadoop cluster has multiple cores. What does this mean?

  5.1) Can I have multiple instances of the NameNode running in a cluster,
apart from the secondary NameNode?

6) If I go on creating huge files, will they be balanced among all the
DataNodes, or do I need to change the file-creation location in the
application?

Expecting your kind response,

Thanking you,
-Vikas.
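
For questions 1, 3 and 4, a minimal sketch of the usual approaches on a
0.16/0.17-era release; the host name datanode5 and the excludes path below
are hypothetical:

  # 1) passwordless SSH for the start/stop scripts (from the Quick Start):
  ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

  # 3) add a DataNode without stopping the cluster: install Hadoop with
  # the same conf on the new machine, then start the daemon there directly
  bin/hadoop-daemon.sh start datanode

  # 4) decommission a DataNode: list its host name in the file named by
  # the dfs.hosts.exclude property in conf/hadoop-site.xml, then tell the
  # NameNode to re-read it; blocks are re-replicated off the node
  echo datanode5 >> /path/to/excludes
  bin/hadoop dfsadmin -refreshNodes
  # watch progress; the report also shows per-node disk usage (cf. Q2)
  bin/hadoop dfsadmin -report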