It will work as long as you account for the XML tag boundaries in your
RecordReader.
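For reference, here is a rough sketch of what such a tag-aware reader could look like, along the lines of Mahout's XmlInputFormat. It assumes the new org.apache.hadoop.mapreduce API, and the config keys ("xmlinput.start"/"xmlinput.end") and tag names are only placeholders. Each reader scans forward from its split offset to the next start tag and then reads up to the matching end tag, continuing past the split boundary if needed, so a record straddling two splits is emitted exactly once:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class XmlRecordReader extends RecordReader<LongWritable, Text> {
  private byte[] startTag, endTag;
  private long start, end;
  private FSDataInputStream in;
  private final DataOutputBuffer buffer = new DataOutputBuffer();
  private final LongWritable key = new LongWritable();
  private final Text value = new Text();

  @Override
  public void initialize(InputSplit split, TaskAttemptContext ctx) throws IOException {
    Configuration conf = ctx.getConfiguration();
    startTag = conf.get("xmlinput.start").getBytes("UTF-8"); // e.g. "<record>"
    endTag = conf.get("xmlinput.end").getBytes("UTF-8");     // e.g. "</record>"
    FileSplit fileSplit = (FileSplit) split;
    start = fileSplit.getStart();
    end = start + fileSplit.getLength();
    Path file = fileSplit.getPath();
    in = file.getFileSystem(conf).open(file);
    in.seek(start);
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    // Only begin new records inside our own split.
    if (in.getPos() < end && readUntilMatch(startTag, false)) {
      try {
        buffer.write(startTag);
        if (readUntilMatch(endTag, true)) {   // may read past the split end
          key.set(in.getPos());
          value.set(buffer.getData(), 0, buffer.getLength());
          return true;
        }
      } finally {
        buffer.reset();
      }
    }
    return false;
  }

  private boolean readUntilMatch(byte[] match, boolean insideRecord) throws IOException {
    int i = 0;
    while (true) {
      int b = in.read();
      if (b == -1) return false;               // end of file
      if (insideRecord) buffer.write(b);       // keep the record's bytes
      if (b == match[i]) {                     // partial tag match so far
        if (++i >= match.length) return true;  // full tag matched
      } else {
        i = 0;
      }
      // Stop looking for new records once past the split boundary.
      if (!insideRecord && i == 0 && in.getPos() >= end) return false;
    }
  }

  @Override public LongWritable getCurrentKey() { return key; }
  @Override public Text getCurrentValue() { return value; }
  @Override public float getProgress() throws IOException {
    return Math.min(1.0f, (in.getPos() - start) / (float) Math.max(1, end - start));
  }
  @Override public void close() throws IOException { in.close(); }
}

It would be returned from the createRecordReader() of a FileInputFormat subclass; error handling and malformed-document cases are left out.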
On Tue, Nov 22, 2011 at 9:20 AM, hari708 hari...@gmail.com wrote:
Hi,
I have a big file consisting of XML data. The XML is not represented as a
single line in the file. If we stream this file using ./hadoop dfs
Hi all,
How do I create a local Maven repository for Hadoop?
I want to use Ant to build the code on a machine which is behind a proxy;
what can I do for this?
Please reply.
Thanks Regards,
Mohmmadanis Moulavi
Hi Paolo,
If you are using versions later than 0.23, then you can refer to the Hadoop
Definitive Guide, 2nd Edition.
The metrics in the latest versions have changed a little bit, and documentation
is awaited for Next Gen MapReduce.
Regards,
Ravi Teja
If your file is bigger than a block size (typically 64 MB or 128 MB), then it
will be split into more than one block. The blocks may or may not be stored on
different datanodes. If you're using a default InputFormat, then the input will
be split across at least two tasks. Since you said you need the whole
You cannot determine the start of an XML document from a collection of XML
documents (in the DFS file) if you start at some arbitrary point within
the collection (unless some data-specific hints are used).
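If the data offers no such hints, one common workaround (when each XML document can live in its own file) is to make the input unsplittable, so a single map task always reads a complete document from the first byte. A minimal sketch, assuming the new org.apache.hadoop.mapreduce API; the class name is just illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class UnsplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // the whole file goes to one split, hence one map task
  }
}

The mapper still receives the file line by line, but from a single task, so it can reassemble the document; the trade-off is that very large files lose parallelism and data locality.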
Regards,
Mridul
On Tuesday 22 November 2011 09:28 AM, Michael Segel wrote:
Just
Hello,
I'm a fairly novice Hadoop user. I'm trying to compile on a completely
freshly installed Debian Squeeze machine and I'm getting the following error.
I'm leaning toward the repositories not being what is expected. Is this a
problem on my side?
[INFO]
Unless you use a TotalOrderPartitioner, the outputs are only sorted
per partition file.
See
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/TotalOrderPartitioner.html
for how to achieve a totally ordered output, which would give you what you
want.
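For the archives, here is roughly the wiring this takes with the old mapred API (the one the page above documents). The path and sampler parameters are illustrative, and this assumes the map input and output keys are the same type, as in a plain sort job:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.InputSampler;
import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

public class TotalSortSetup {
  // Call after the input/output paths and key/value classes are set on the JobConf.
  public static void configure(JobConf conf) throws Exception {
    conf.setPartitionerClass(TotalOrderPartitioner.class);

    // Sample ~10% of the input keys (up to 10000 samples from at most 10 splits)
    // to choose the reducer boundary keys.
    InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);

    // Write the boundary keys to a partition file and register it with the job.
    Path partitionFile = new Path("/tmp/_partitions"); // illustrative path
    TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
    InputSampler.writePartitionFile(conf, sampler);

    // Ship the partition file to every task via the distributed cache.
    URI partitionUri = new URI(partitionFile.toString() + "#_partitions");
    DistributedCache.addCacheFile(partitionUri, conf);
    DistributedCache.createSymlink(conf);
  }
}

The sampler writes a partition file of N-1 boundary keys for N reducers; the partitioner then routes each key to the reducer owning its range, so reducer i only sees keys that sort before those of reducer i+1, and the concatenated part files are globally ordered.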
On Tue, Nov 22, 2011
Hi,
I have a query about Hadoop; this is very important for capacity planning
and requesting new hardware.
The Hadoop command *hadoop dfsadmin -report* depicts the DFS usage and
DFS remaining. I have checked that the DFS usage does not match the
size reported by *hadoop
dfs -dus* on root or user
We are expecting to release 0.22 very shortly. 0.22 is supposed to be
considered stable because it has been heavily tested at scale by the eBay team
(as far as I know). However, I will let 0.22's RM comment on that.
Cos
On Tue, Nov 22, 2011 at 12:05PM, Niranjan Balasubramanian wrote:
Hello
We
I am reading the Hadoop Definitive Guide, 2nd Edition, and I am struggling
to figure out Hadoop's exact formula for network distance calculation
(pages 64/65). (I have my guesses, but I would like to know the exact formula.)
There is an example showing the following distances:
For example, imagine a
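For what it's worth, my reading of the book's rule (not the actual org.apache.hadoop.net.NetworkTopology source) is that nodes are named by their position in a tree of data centers, racks, and hosts, and the distance between two nodes is the number of tree edges from each of them up to their closest common ancestor. A small sketch of that interpretation, reproducing the book's example distances:

public class NetworkDistance {
  // Nodes are addressed by tree paths like "/d1/r1/n1" (data center / rack / node).
  static int distance(String a, String b) {
    String[] pa = a.substring(1).split("/");
    String[] pb = b.substring(1).split("/");
    int common = 0;
    while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
      common++; // walk down the tree while the ancestors agree
    }
    // hops from each node up to the closest common ancestor
    return (pa.length - common) + (pb.length - common);
  }

  public static void main(String[] args) {
    System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // 0: same node
    System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // 2: same rack
    System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // 4: same data center, different rack
    System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // 6: different data centers
  }
}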
Hello,
On Tue, Nov 22, 2011 at 3:30 PM, Harsh J ha...@cloudera.com wrote:
Unless you use a TotalOrderPartitioner, the outputs are only sorted
per partition file.
See
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/TotalOrderPartitioner.html
for achieving a
On Nov 22, 2011, at 12:05 PM, Niranjan Balasubramanian wrote:
Hello
We are currently using hadoop 0.20.203 on a 10 node cluster. We are
considering upgrading to a newer version and I have two questions in this
regard.
1) It seems 0.21 is unlikely to become a stable release anytime
On Tue, Nov 22, 2011 at 3:05 PM, Niranjan Balasubramanian
niran...@cs.washington.edu wrote:
Hello
We are currently using hadoop 0.20.203 on a 10 node cluster. We are
considering upgrading to a newer version and I have two questions in this
regard.
1) It seems 0.21 is unlikely to become a
Hello,
I am using HBase and I have a default replication factor of 2. Now, if I
change the directory replication factor, will all the new files created
there automatically be replicated with a factor of 3?
--
--- Get your facts first, then you can distort them as you please.--
Rita,
Yes.
You may also run `hadoop dfs -setrep -R N /the/older/dirs` to change the
replication factor of the older directories as well.
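In case a programmatic route is useful, here is a rough equivalent of that command via the FileSystem API; the path and factor are just the placeholders from the command above. Replication is a per-file property, so directories are only recursed into:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    setRecursive(fs, new Path("/the/older/dirs"), (short) 3);
  }

  static void setRecursive(FileSystem fs, Path p, short factor) throws Exception {
    for (FileStatus s : fs.listStatus(p)) {
      if (s.isDir()) {
        setRecursive(fs, s.getPath(), factor);  // descend into subdirectories
      } else {
        fs.setReplication(s.getPath(), factor); // change replication on each file
      }
    }
  }
}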
On 23-Nov-2011, at 6:05 AM, Rita wrote:
Hello,
I am using hbase and I have a default replication factor of 2. Now, if I
change the directory replication factor will all the new
I am looking at large dense matrix multiplication as an example problem
for a class of middleware. I am also interested in sparse matrices, but
am taking things one step at a time.
There is a paper in IEEE CloudCom '10 about Hama, including a matrix
multiplication technique. It is