date:20111122

Re: hadoop File loading

2011-11-22 Thread Jeff Zhang

It will work as long as you consider the XML tag boundary in your RecordReader. On Tue, Nov 22, 2011 at 9:20 AM, hari708 hari...@gmail.com wrote: Hi, I have a big file consisting of XML data. The XML is not represented as a single line in the file. If we stream this file using ./hadoop dfs

maven for hadoop

2011-11-22 Thread mohmmadanis moulavi

Hi all, how do I create a local Maven repository for Hadoop? I want to build the code with Ant on a machine that sits behind a proxy. What can I do? Please reply. Thanks and regards, Mohmmadanis Moulavi

RE: Hadoop Metrics

2011-11-22 Thread Ravi teja ch n v

Hi Paolo, if you are using versions later than 23.0, then you can refer to the Hadoop Definitive Guide, 2nd Edition. The metrics in the latest versions have changed a little bit, and documentation for Next Gen MapReduce is still awaited. Regards, Ravi Teja

Re: Regarding loading a big XML file to HDFS

2011-11-22 Thread Joey Echeverria

If your file is bigger than the block size (typically 64 MB or 128 MB), then it will be split into more than one block. The blocks may or may not be stored on different datanodes. If you're using a default InputFormat, then the input will be split between two tasks. Since you said you need the whole

Re: Regarding loading a big XML file to HDFS

2011-11-22 Thread Mridul Muralidharan

You cannot determine the start of an XML document from a collection of XML documents (in the DFS file) if you start at some arbitrary point within the collection (unless some data-specific hints are used). Regards, Mridul On Tuesday 22 November 2011 09:28 AM, Michael Segel wrote: Just
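
The boundary handling discussed in the two XML threads above can be sketched in a few lines. This is not Hadoop's RecordReader API, only a plain-Python illustration; the function name `read_records`, the `record` tag and the byte offsets are assumptions made for the example. The convention assumed here is that a split owns every record whose opening tag begins inside it, and that the reader may run past the end of its split to finish the last record. As Mridul notes, scanning for a tag from an arbitrary offset is only safe when the data guarantees that the tag text cannot appear elsewhere (for instance inside CDATA or comments); this sketch assumes that guarantee, along with attribute-free and non-nested tags.

```python
def read_records(data: bytes, start: int, end: int, tag: bytes = b"record"):
    """Yield complete <tag>...</tag> elements whose opening tag begins in [start, end).

    Hypothetical helper: 'data' stands in for the whole file, and 'start'/'end'
    for the byte range of one input split.
    """
    open_tag = b"<" + tag + b">"
    close_tag = b"</" + tag + b">"
    pos = start
    while True:
        begin = data.find(open_tag, pos)
        if begin == -1 or begin >= end:
            return  # the next record (if any) belongs to a later split
        finish = data.find(close_tag, begin)
        if finish == -1:
            return  # truncated record: no closing tag in the data
        finish += len(close_tag)
        # May extend beyond 'end': the record is read to completion anyway.
        yield data[begin:finish]
        pos = finish


if __name__ == "__main__":
    sample = b"<doc><record>a</record><record>bb</record></doc>"
    cut = len(sample) // 2  # arbitrary split point, possibly mid-record
    for split in ((0, cut), (cut, len(sample))):
        print(split, list(read_records(sample, *split)))
```

Because ownership is decided by where the opening tag starts, each record is emitted by exactly one split no matter where the cut falls.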

Build failure

2011-11-22 Thread Oscar Kene

Hello, I'm a fairly novice Hadoop user. I'm trying to compile on a freshly installed Debian squeeze machine and am getting the following error. I suspect the repositories are not what is expected. Is this a problem on my side? [INFO]

Re: part-* output sorted?

2011-11-22 Thread Harsh J

Unless you use a TotalOrderPartitioner, the outputs are only sorted per partition file. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/TotalOrderPartitioner.html for achieving a total order sort output, which would give you what you want. On Tue, Nov 22, 2011
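
To illustrate why per-partition sorting alone does not give a globally sorted result, here is a minimal sketch in plain Python rather than the Hadoop API. The helper names (`range_partition`, `hash_partition`, `run_job`), the split points and the toy hash are all assumptions for the example; in particular, how a key equal to a split point is assigned is a choice made here, not a statement about TotalOrderPartitioner's behaviour. The idea shown is range partitioning: keys are routed by ordered split points, so every key in partition i sorts before every key in partition i+1.

```python
import bisect


def range_partition(key, split_points):
    """Route a key by ordered split points (range partitioning)."""
    return bisect.bisect_right(split_points, key)


def hash_partition(key, num_partitions):
    """Deterministic stand-in for a hash partitioner (not Hadoop's hash)."""
    return sum(key.encode()) % num_partitions


def run_job(keys, partition_fn, num_partitions):
    """Mimic the shuffle: route each key, then sort within each partition."""
    parts = [[] for _ in range(num_partitions)]
    for key in keys:
        parts[partition_fn(key)].append(key)
    return [sorted(part) for part in parts]


if __name__ == "__main__":
    keys = ["pear", "apple", "fig", "kiwi", "banana", "cherry"]
    splits = ["c", "k"]  # hypothetical split points giving three partitions
    print(run_job(keys, lambda k: hash_partition(k, 3), 3))
    print(run_job(keys, lambda k: range_partition(k, splits), 3))
```

With the hash-style routing each part file is sorted but the files cannot simply be concatenated; with range routing, reading the part files in order yields one sorted sequence.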

Capacity Planning using Dfsadmin command

2011-11-22 Thread Abhishek Pratap Singh

Hi, I have a query about Hadoop that is very important for capacity planning and requesting new hardware. The Hadoop command *hadoop dfsadmin -report* shows the DFS usage and DFS remaining. I have checked that the DFS usage does not match the size of *hadoop dfs -dus* on root or user

Re: Hadoop next stable release

2011-11-22 Thread Konstantin Boudnik

We are expecting to release 0.22 very shortly. 0.22 is supposed to be considered stable because it has been heavily tested at scale by the eBay team (as far as I know). However, I will let 0.22's RM comment on that. Cos On Tue, Nov 22, 2011 at 12:05PM, Niranjan Balasubramanian wrote: Hello We

How is network distance for nodes calculated

2011-11-22 Thread Edmon Begoli

I am reading the Hadoop Definitive Guide, 2nd Edition, and I am struggling to figure out Hadoop's exact formula for network distance calculation (page 64/65). (I have my guesses, but I would like to know the exact formula.) There is an example showing the following distances: For example, imagine a
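
One reading that matches the distances in the book's example (0 for the same node, 2 for the same rack, 4 for the same data center, 6 across data centers) treats the network as a tree and adds up each node's distance to their closest common ancestor. The sketch below is plain Python under that assumption, not Hadoop's own code; the function name `network_distance` and the `/datacenter/rack/node` path layout are chosen for the example.

```python
def network_distance(a: str, b: str) -> int:
    """Hops from each node up to the closest common ancestor, summed.

    Locations are assumed to be paths such as "/d1/r1/n1"
    (data center / rack / node).
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


if __name__ == "__main__":
    for other in ("/d1/r1/n1", "/d1/r1/n2", "/d1/r2/n3", "/d2/r3/n4"):
        print(other, network_distance("/d1/r1/n1", other))
```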

Re: part-* output sorted?

2011-11-22 Thread Leon Mergen

Hello, On Tue, Nov 22, 2011 at 3:30 PM, Harsh J ha...@cloudera.com wrote: Unless you use a TotalOrderPartitioner, the outputs are only sorted per partition file. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/TotalOrderPartitioner.html for achieving a

Re: Hadoop next stable release

2011-11-22 Thread Arun C Murthy

On Nov 22, 2011, at 12:05 PM, Niranjan Balasubramanian wrote: Hello We are currently using hadoop 0.20.203 on a 10 node cluster. We are considering upgrading to a newer version and I have two questions in this regard. 1) It seems 0.21 is unlikely to become a stable release anytime

Re: Hadoop next stable release

2011-11-22 Thread sridhar basam

On Tue, Nov 22, 2011 at 3:05 PM, Niranjan Balasubramanian niran...@cs.washington.edu wrote: Hello We are currently using hadoop 0.20.203 on a 10 node cluster. We are considering upgrading to a newer version and I have two questions in this regard. 1) It seems 0.21 is unlikely to become a

replication question

2011-11-22 Thread Rita

Hello, I am using HBase and I have a default replication factor of 2. Now, if I change the directory replication factor, will all the new files created there automatically be replicated as 3? -- --- Get your facts first, then you can distort them as you please.--

Re: replication question

2011-11-22 Thread Harsh J

Rita, Yes. You may also run `hadoop dfs -setrep -R N /the/older/dirs` to change their replication factor as well. On 23-Nov-2011, at 6:05 AM, Rita wrote: Hello, I am using hbase and I have a default replication factor of 2. Now, if I change the directory replication factor will all the new

Re: Matrix multiplication in Hadoop

2011-11-22 Thread Mike Spreitzer

I am looking at large dense matrix multiplication as an example problem for a class of middleware. I am also interested in sparse matrices, but am taking things one step at a time. There is a paper in IEEE CloudCom '10 about Hama, including a matrix multiplication technique. It is

16 matches
