Re: hadoop File loading

2011-11-22 Thread Jeff Zhang
It will work as long as you account for the XML tag boundaries in your RecordReader. On Tue, Nov 22, 2011 at 9:20 AM, hari708 hari...@gmail.com wrote: Hi, I have a big file consisting of XML data. The XML is not represented as a single line in the file. If we stream this file using ./hadoop dfs
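
Here is a minimal sketch of what handling the tag boundary can mean, similar in spirit to the XmlInputFormat in Apache Mahout. A split claims every record whose start tag begins inside it, and it keeps reading past the split end to finish its last record. The next split scans forward from its own start offset, so it never sees that record's start tag. The `<record>` tag, the `records_in_split` name, and the assumptions that records are not nested and the tag carries no attributes are all hypothetical. A real RecordReader would read from the HDFS input stream rather than an in-memory byte string.

```python
from typing import List


def records_in_split(data: bytes, start: int, end: int,
                     start_tag: bytes = b"<record>",
                     end_tag: bytes = b"</record>") -> List[bytes]:
    """Return complete records whose start tag begins in [start, end).

    The last record may extend past `end`. The next split skips it because
    its own scan begins at its own start offset.
    """
    records = []
    pos = start
    while True:
        begin = data.find(start_tag, pos)
        if begin == -1 or begin >= end:
            break  # next record belongs to a later split (or there is none)
        close = data.find(end_tag, begin + len(start_tag))
        if close == -1:
            break  # unterminated record at end of input
        stop = close + len(end_tag)
        records.append(data[begin:stop])  # may read beyond `end`
        pos = stop
    return records


data = b"<root><record>a</record><record>bb</record><record>c</record></root>"
first = records_in_split(data, 0, 30)           # boundary falls inside record "bb"
second = records_in_split(data, 30, len(data))
```

Ownership is decided by where the start tag begins. Running the function over adjacent splits therefore returns every record exactly once, wherever the byte boundary falls.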

maven for hadoop

2011-11-22 Thread mohmmadanis moulavi
Hi all, How do I create a local Maven repository for Hadoop? I want to use Ant to build the code on a machine that is behind a proxy. What can I do about it? Please reply. Thanks & Regards, Mohmmadanis Moulavi

RE: Hadoop Metrics

2011-11-22 Thread Ravi teja ch n v
Hi Paolo, If you are using versions later than 0.23.0, then you can refer to Hadoop: The Definitive Guide, 2nd Edition. The metrics in the latest versions have changed a little, and documentation for Next Gen MapReduce is still awaited. Regards, Ravi Teja

Re: Regarding loading a big XML file to HDFS

2011-11-22 Thread Joey Echeverria
If your file is bigger than the block size (typically 64 MB or 128 MB), then it will be split into more than one block. The blocks may or may not be stored on different datanodes. If you're using a default InputFormat, then the input will be split between two tasks. Since you said you need the whole
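
The arithmetic behind Joey's description can be sketched roughly as follows. The block count is a ceiling division. The split loop is modelled on the one in FileInputFormat, which, as I understand it, lets the final split be up to 10% larger than the split size (SPLIT_SLOP = 1.1) instead of creating a tiny extra split. The 64 MB default and the helper names are assumptions for illustration. The real split size also depends on the configured minimum and maximum split sizes.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # last split may be up to 10% larger than split_size


def block_count(file_size, block_size=64 * MB):
    """HDFS blocks needed for a file of file_size bytes (integer ceiling)."""
    return (file_size + block_size - 1) // block_size


def split_sizes(file_size, split_size=64 * MB):
    """Split lengths for one file, following the slop rule described above."""
    splits = []
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining:
        splits.append(remaining)
    return splits
```

For example, a 70 MB file occupies two 64 MB blocks, yet this loop yields a single 70 MB split. A file only slightly over one block may therefore still be handled by one map task.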

Re: Regarding loading a big XML file to HDFS

2011-11-22 Thread Mridul Muralidharan
You cannot determine the start of an XML document within a collection of XML documents (in the DFS file) if you start at some arbitrary point within the collection (unless some data-specific hints are used). Regards, Mridul On Tuesday 22 November 2011 09:28 AM, Michael Segel wrote: Just
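
The following small Python sketch makes Mridul's point concrete. A reader that jumps to an arbitrary offset and searches forward for the next `<doc>` tag can land on text that only looks like a record start. Here that text is a `<doc>` mentioned inside a comment, and the resulting fragment still parses as well-formed XML, so the error goes unnoticed. The sample data, the `doc` tag, and the `naive_resync` helper are hypothetical. A data-specific hint such as "record tags never appear inside comments or CDATA" is what would make forward scanning safe.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: bytes) -> bool:
    """True if the bytes parse as a single XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


def naive_resync(data: bytes, offset: int, tag: bytes = b"doc") -> bytes:
    """Hypothetical reader: jump to `offset`, take the next <tag>...</tag> span."""
    begin = data.find(b"<" + tag + b">", offset)
    if begin == -1:
        return b""
    end = data.find(b"</" + tag + b">", begin)
    if end == -1:
        return b""
    return data[begin:end + len(tag) + 3]  # include the closing "</tag>"


data = b"<doc><!-- see <doc> below --><id>1</id></doc><doc><id>2</id></doc>"
bogus = naive_resync(data, 3)  # offset 3 is inside the first record
```

Here `bogus` starts at the `<doc>` inside the comment. It is a well-formed fragment, but it is not a real record, and a plain byte cut such as `data[:20]` does not parse at all.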

Build failure

2011-11-22 Thread Oscar Kene
Hello, I'm a fairly novice Hadoop user. I'm trying to compile on a freshly installed Debian Squeeze machine and am getting the following error. I'm leaning towards the repositories not being what is expected. Is this a problem on my side? [INFO]

Re: part-* output sorted?

2011-11-22 Thread Harsh J
Unless you use a TotalOrderPartitioner, the outputs are only sorted per partition file. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/TotalOrderPartitioner.html for achieving a total order sort output, which would give you what you want. On Tue, Nov 22, 2011
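
The following toy illustration shows why a total order partitioner produces globally sorted output. Keys are routed to reducers by binary search over sorted split points. In Hadoop these come from a partition file, typically written by InputSampler. Every key in part i is therefore smaller than every key in part i+1, and concatenating the individually sorted part files yields one sorted sequence. The split points below are hand-picked and the function names are hypothetical. `bisect_right` is used because, as far as I can tell, TotalOrderPartitioner also sends a key equal to a split point to the higher partition.

```python
import bisect


def total_order_partition(key, split_points):
    """Reducer index for `key`; split_points must be sorted (n reducers -> n-1 points)."""
    return bisect.bisect_right(split_points, key)


def run_job(keys, split_points):
    """Route keys to partitions, then sort each partition as its reducer would."""
    parts = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        parts[total_order_partition(key, split_points)].append(key)
    return [sorted(part) for part in parts]


keys = ["kiwi", "apple", "mango", "fig", "banana", "pear", "cherry"]
parts = run_job(keys, ["f", "n"])
# parts == [["apple", "banana", "cherry"], ["fig", "kiwi", "mango"], ["pear"]]
```

With a hash partitioner, each part would still be sorted internally, but keys from the whole range would be spread across all parts. That per-file ordering is what Harsh describes.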

Capacity Planning using Dfsadmin command

2011-11-22 Thread Abhishek Pratap Singh
Hi, I have a query about Hadoop; this is very important for capacity planning and requesting new hardware. The command *hadoop dfsadmin -report* shows DFS Used and DFS Remaining. I have checked that the DFS usage does not match the size reported by *hadoop dfs -dus* on root or user
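
One factor that often explains the gap is offered here as an assumption, not something the thread excerpt confirms. `dfsadmin -report` counts raw bytes stored on the datanodes, so every replica is counted, while `dfs -dus` sums logical file lengths. The sketch below estimates the expected raw usage from (file length, replication) pairs. Actual DFS Used is usually somewhat higher again, because each block also has a checksum .meta file. The helper names and sample figures are hypothetical.

```python
GB = 1024 ** 3


def logical_bytes(files):
    """Roughly what `hadoop dfs -dus` sums: file lengths, replicas not counted."""
    return sum(length for length, _ in files)


def raw_bytes(files):
    """Expected replica bytes on the datanodes (ignoring checksum .meta files)."""
    return sum(length * replication for length, replication in files)


# hypothetical namespace: 100 GB at replication 3, 20 GB at replication 2
files = [(100 * GB, 3), (20 * GB, 2)]
```

With these figures, `-dus` would report about 120 GB while the datanodes hold about 340 GB of block data. For capacity planning, it is the raw figure that has to fit on disk.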

Re: Hadoop next stable release

2011-11-22 Thread Konstantin Boudnik
We are expecting to release 0.22 very shortly. 0.22 is supposed to be considered stable because it has been heavily tested at scale by the eBay team (as far as I know). However, I will let 0.22's RM comment on that. Cos On Tue, Nov 22, 2011 at 12:05PM, Niranjan Balasubramanian wrote: Hello We

How is network distance for nodes calculated

2011-11-22 Thread Edmon Begoli
I am reading Hadoop: The Definitive Guide, 2nd Edition and I am struggling to figure out Hadoop's exact formula for network distance calculation (pages 64/65). (I have my guesses, but I would like to know the exact formula.) There is an example showing the following distances: For example, imagine a
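
The book's example distances are 0 for the same node, 2 for nodes on the same rack, 4 for different racks in the same data center, and 6 for different data centers. These are consistent with summing each node's distance up the tree to their closest common ancestor, which, as far as I know, is also how `NetworkTopology.getDistance` in Hadoop works. Below is a minimal sketch, assuming network locations written as `/datacenter/rack/node` paths; the function name is hypothetical.

```python
def network_distance(a: str, b: str) -> int:
    """Hops from each node up to the closest common ancestor, summed."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break  # first differing level: everything above is shared
        common += 1
    return (len(pa) - common) + (len(pb) - common)
```

For example, `/d1/r1/n1` and `/d1/r2/n3` share only `/d1`. Each node is two levels below that ancestor, giving a distance of 4.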

Re: part-* output sorted?

2011-11-22 Thread Leon Mergen
Hello, On Tue, Nov 22, 2011 at 3:30 PM, Harsh J ha...@cloudera.com wrote: Unless you use a TotalOrderPartitioner, the outputs are only sorted per partition file. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/TotalOrderPartitioner.html for achieving a

Re: Hadoop next stable release

2011-11-22 Thread Arun C Murthy
On Nov 22, 2011, at 12:05 PM, Niranjan Balasubramanian wrote: Hello We are currently using hadoop 0.20.203 on a 10 node cluster. We are considering upgrading to a newer version and I have two questions in this regard. 1) It seems 0.21 is unlikely to become a stable release anytime

Re: Hadoop next stable release

2011-11-22 Thread sridhar basam
On Tue, Nov 22, 2011 at 3:05 PM, Niranjan Balasubramanian niran...@cs.washington.edu wrote: Hello We are currently using hadoop 0.20.203 on a 10 node cluster. We are considering upgrading to a newer version and I have two questions in this regard. 1) It seems 0.21 is unlikely to become a

replication question

2011-11-22 Thread Rita
Hello, I am using HBase and I have a default replication factor of 2. Now, if I change the directory replication factor to 3, will all new files created there automatically be replicated 3 times? -- --- Get your facts first, then you can distort them as you please.--

Re: replication question

2011-11-22 Thread Harsh J
Rita, Yes. You may also `hadoop dfs -setrep -R N /the/older/dirs` to change their replication factor as well. On 23-Nov-2011, at 6:05 AM, Rita wrote: Hello, I am using hbase and I have a default replication factor of 2. Now, if I change the directory replication factor will all the new

Re: Matrix multiplication in Hadoop

2011-11-22 Thread Mike Spreitzer
I am looking at large dense matrix multiplication as an example problem for a class of middleware. I am also interested in sparse matrices, but am taking things one step at a time. There is a paper in IEEE CloudCom '10 about Hama, including a matrix multiplication technique. It is
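
For reference, here is a small in-memory simulation of the textbook one-pass MapReduce scheme for dense matrix multiplication C = A·B. It is not the Hama technique from the CloudCom '10 paper, only the simplest baseline. The map phase sends A[i][j] to every output cell (i, k) in row i and B[j][k] to every cell (i, k) in column k, and each reducer computes one dot product. The function names are hypothetical, and the dictionary grouping stands in for Hadoop's shuffle.

```python
from collections import defaultdict


def map_phase(A, B):
    """Emit ((i, k), (tag, j, value)) pairs; A is n x m, B is m x p."""
    n, m, p = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must match"
    for i in range(n):
        for j in range(m):
            for k in range(p):
                yield (i, k), ("A", j, A[i][j])
    for j in range(m):
        for k in range(p):
            for i in range(n):
                yield (i, k), ("B", j, B[j][k])


def reduce_phase(key, values):
    """Dot product of row i of A and column k of B for output cell `key`."""
    a = {j: v for tag, j, v in values if tag == "A"}
    b = {j: v for tag, j, v in values if tag == "B"}
    return key, sum(a[j] * b.get(j, 0) for j in a)


def matmul_mapreduce(A, B):
    groups = defaultdict(list)  # stands in for the shuffle: group values by key
    for key, value in map_phase(A, B):
        groups[key].append(value)
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for key, values in groups.items():
        (i, k), total = reduce_phase(key, values)
        C[i][k] = total
    return C
```

Each element of A is emitted p times and each element of B n times, which is why block-partitioned schemes are usually preferred for large dense matrices. For sparse input, the map phase can simply skip zero entries; `b.get(j, 0)` already tolerates missing terms.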