Read/write dependency on total data size in HDFS
Hi, I am storing data on an HDFS cluster (4 machines). I have seen that read/write performance is not much affected by the total size of the data stored in HDFS. I have used 20-30% of the cluster's capacity and have not filled it completely. Can someone explain why this is so? Does HDFS promise this behaviour, or am I missing something? Thanks, Wasim
Max. Possible No. of Files
Hi, Does anyone have data on the maximum possible number of files in HDFS? My second question: I created up to one lakh (100,000) small files with a small block size and read them back from HDFS; read performance remained almost unaffected as the number of files increased. The possible reasons I can think of are: 1. One lakh is not a large enough number to disturb HDFS performance (I used 1 namenode and 4 datanodes). 2. Reading is done directly from the datanodes after the initial interaction with the namenode, so reading from different nodes does not affect performance. If someone could add to or correct this, it would be highly appreciated. Cheers, Wasim
HDFS data for HBase and other projects
Hi, If we already have data stored in HDFS, which of the following sub-projects can use this data for further processing/operations? 1. Pig 2. HBase 3. ZooKeeper 4. Hive 5. Any other Hadoop-related project. Thanks, Wasim
Append in Hadoop
Hi, Can someone tell me about the append functionality in Hadoop? Is it available now in 0.20? Regards, Wasim
Some Storage communication related questions
Hi, I have several questions: Does Hadoop use some parallel technique for CopyFromLocal and CopyToLocal (like DistCp), or is it simple single-stream writing? For communication between Amazon S3 and the local system, does Hadoop use the REST or the SOAP service interface? Are there any new storage systems currently in the pipeline to be interfaced with Hadoop? Thanks, Wasim
File Transfer Rates
Hi, Could someone help me find some real figures (transfer rates) for Hadoop file transfers from the local filesystem to HDFS, S3, etc., and between storage systems (HDFS to S3, etc.)? Thanks, Wasim
Hadoop-KFS-FileSystem API
Hi, I am looking to use KFS as storage with the Hadoop FileSystem API. http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/kfs/package-summary.html This page describes using KFS with Hadoop and gives starting the map/reduce trackers as the last step. Is it necessary to start them? How does storage alone work with the FileSystem API? Thanks, Wasim
Append Algorithm
Does Hadoop append (0.19) work the same way as GFS does? The GFS algorithm is below:
1. Application originates record append request.
2. GFS client translates request and sends it to master.
3. Master responds with chunk handle and (primary + secondary) replica locations.
4. Client pushes write data to all locations.
5. Primary checks if record fits in specified chunk.
6. If record does not fit, then the primary:
   - pads the chunk,
   - tells secondaries to do the same,
   - and informs the client.
   Client then retries the append with the next chunk.
7. If record fits, then the primary:
   - appends the record,
   - tells secondaries to do the same,
   - receives responses from secondaries,
   - and sends final response to the client.
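The GFS steps listed above can be sketched as a small in-memory simulation. This is only an illustration of the pad-and-retry logic in steps 5 to 7, not Hadoop or GFS code: the names `Replica`, `primary_append`, `record_append`, and the tiny `CHUNK_SIZE` are all hypothetical, and the master lookup (steps 2 and 3) is reduced to "use the last chunk".

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 16  # hypothetical tiny chunk size in bytes, for illustration only


@dataclass
class Replica:
    """One replica holding a list of chunks (hypothetical stand-in for a chunkserver)."""
    chunks: list = field(default_factory=lambda: [bytearray()])

    def pad(self, index):
        # Fill the chunk up to CHUNK_SIZE and open the next chunk if needed.
        chunk = self.chunks[index]
        chunk.extend(b"\0" * (CHUNK_SIZE - len(chunk)))
        if index + 1 == len(self.chunks):
            self.chunks.append(bytearray())

    def append(self, index, record):
        self.chunks[index].extend(record)


def primary_append(primary, secondaries, index, record):
    """Steps 5-7: return the record's offset, or None if the chunk was padded."""
    used = len(primary.chunks[index])
    replicas = [primary, *secondaries]
    if used + len(record) > CHUNK_SIZE:
        # Step 6: record does not fit, so pad on every replica and tell the client.
        for replica in replicas:
            replica.pad(index)
        return None
    # Step 7: record fits, so append at the same offset on every replica.
    for replica in replicas:
        replica.append(index, record)
    return used


def record_append(primary, secondaries, record):
    """Client side: ask for the current chunk, then retry on the next chunk if padded."""
    if len(record) > CHUNK_SIZE:
        raise ValueError("record larger than a chunk")
    index = len(primary.chunks) - 1  # stands in for the master's answer (steps 2-3)
    while True:
        offset = primary_append(primary, secondaries, index, record)
        if offset is not None:
            return index, offset
        index += 1  # step 6, last part: retry with the next chunk
```

In this sketch a record that does not fit leaves padding at the end of the old chunk on every replica, which is why all replicas end up with the record at the same offset in the next chunk.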
Hadoop copy on same cluster
Hi, Is there any API which copies files from one folder to another on the same Hadoop cluster? (DistCp can be used, but its performance is not good.) Something like CopyFromLocal, but with source and destination both on the same Hadoop cluster. Cheers, Wasim
Conf Object without hadoop-default.xml and hadoop-site.xml
Hi, Is it possible to create a Configuration object without the hadoop-default.xml and hadoop-site.xml files and set the values in the Configuration object after creation? If yes, which values do I need to set in the Configuration object to get a FileSystem object? Thanks, Wasim
FileSystem.append and FSDataOutputStream.seek
Hello, Does anyone know when the Hadoop team plans to implement FileSystem.append(Path) and something seekable for FSDataOutputStream (I mean seek capability)? On which forum can we ask for features to be included? Thanks, Wasim
Anything like RandomAccessFile in Hadoop FS?
Hi, Is there any utility for Hadoop files which works the same way as RandomAccessFile in Java? Thanks, Wasim
DistCp and CopyFiles
Hi, In 0.18, CopyFiles.java (0.17) has been replaced by DistCp.java. Is there any difference between them? Thanks, Wasim
DistCp 0.18 Vs DistCp 0.17
Hi, The package for DistCp in 0.18 is "org.apache.hadoop.tools". Is it the same in 0.17 or a different one? Is there any difference between these two versions of DistCp? Thanks, Wasim
HDFS from non-hadoop Program
Hello, I am trying to access HDFS from a non-Hadoop program using Java. When I try to load the Configuration, it shows this exception in both DEBUG mode and normal mode: org.apache.hadoop.conf.Configuration: java.io.IOException: config() at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:156) With the same configuration files, when I try to access HDFS from a single standalone program, it runs perfectly fine. Some people have posted the same issue before, but no solution was posted. Has anyone found a solution? Thanks, Wasim
HDFS Login Security
Hi, Is there any Java class for logging in to HDFS programmatically with a traditional username/password mechanism? Or can we only use the system user or the user who started the NameNode? Thanks, Wasim
Data Transfer mechanism between different clusters
Hello All, What kind of support does Hadoop provide for data transfer between clusters located in different geographical locations (possibly over a WAN)? Is there any fast and efficient method available (like GridFTP in Globus)? Thanks, Wasim
HDFS read/write programmatically
Hi, I have configured HDFS on Windows and am running it using Cygwin. I want to access the files and folders in HDFS programmatically (I mean reading/writing files in HDFS using Java code). I used this example: http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample . The code runs fine and it loads the config file, but it doesn't write to HDFS; instead it writes to the C drive. I provided hdfs://user/A/B (hdfs is my FS name) as the argument, but it still writes to the C drive, and when I provide the same argument for reading a file, it says file not found. Could someone guide me? It looks like a path issue. Thanks
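One thing worth checking with an argument like hdfs://user/A/B is how the URI itself is split. The short sketch below uses Python's standard URL parser purely as an illustration of generic URI syntax, where the component right after `//` is the authority (host), not a directory. It is not Hadoop code, `split_fs_uri` is a hypothetical helper, and the host name and port in the second call are made-up placeholders. Whether this is the actual cause of the problem described above is an assumption.

```python
from urllib.parse import urlparse


def split_fs_uri(uri):
    """Split a filesystem URI into (scheme, authority, path) using generic URI rules."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


# "user" lands in the authority (host) position, and the path is only /A/B.
print(split_fs_uri("hdfs://user/A/B"))
# With an explicit (placeholder) namenode host and port, /user/A/B is the path.
print(split_fs_uri("hdfs://namenode.example.com:9000/user/A/B"))
```

If the intent was the directory /user/A/B, the sketch suggests either giving the namenode host and port explicitly in the URI or passing just the path and letting the configured default filesystem apply.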
Hadoop on Suse
Hi, Does anyone have experience with installing Hadoop or HDFS on SUSE Linux? Thanks
Re: HDFS Vs KFS
KFS is another distributed file system, implemented in C++. You can get details here: http://kosmosfs.sourceforge.net/
--
From: "rae l" <[EMAIL PROTECTED]>
Sent: Thursday, August 21, 2008 4:52 PM
To:
Subject: Re: HDFS Vs KFS
On Thu, Aug 21, 2008 at 9:44 PM, Wasim Bari <[EMAIL PROTECTED]> wrote:
Hi, Can some expert differentiate or compare HDFS with KFS? Apparently they have a similar architecture, with little difference and the same objective.
What's KFS? Which KFS? Everyone here knows HDFS, but someone like me doesn't know KFS; please specify in detail which KFS you mean.
HDFS Vs KFS
Hi, Can some expert differentiate or compare HDFS with KFS? Apparently they have a similar architecture, with little difference and the same objective. Thanks, Wasim
Hadoop DFS
Hi, I am new to Hadoop. Right now, I am only interested in working with Hadoop DFS. Can someone guide me on where to start? Does anyone have information about applications that have already integrated Hadoop DFS? Any information about material on Hadoop DFS (case studies, articles, books, etc.) would be very welcome. Thanks, Wasim