how to sort the output by value in reduce instead of by key?

2011-04-11 Thread leibnitz
Yes, my key is an IP address and the value is an object (which inherits from the Hadoop Record class and will be converted to visualized data), e.g.: key field1,field2,field3 (these are properties of the object) 12.121.23.121 121,11,/img/dd.jpg 32.121.23.222 221,11,/img/xx.jpg 1. I want to sort by field1

Steps in execution time

2011-04-11 Thread Christian Kumpe
Hi, I'm doing some measurements of Hadoop's execution time for my thesis. I discovered some steps in the job's execution time when raising the mappers' execution time continuously. Here is a plot of the execution times with 1, 30 and 60 mappers executing in parallel:

Re: Architectural question

2011-04-11 Thread sumit ghosh
The original posting said: the app does a simple match of every line of input data against every line of persistent data. Hence the key should be replaced by a String from the 10 GB store or a hash of it, so that we can match it with the hash or String from the persistent store.

Re: how to sort the output by value in reduce instead of by key?

2011-04-11 Thread leibnitz
Can anyone give me some tips? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-sort-the-output-by-value-in-reduce-instead-of-by-key-tp2805541p2805922.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Re: how to sort the output by value in reduce instead of by key?

2011-04-11 Thread sumit ghosh
Your field1 data can be split over multiple reducers. Is it possible to emit field1 as the key from the reducer (in case you do not need the ip anymore)? From: leibnitz se3g2...@gmail.com To: hadoop-u...@lucene.apache.org Sent: Mon, 11 April, 2011 12:02:46 PM

Re: INFO org.apache.hadoop.ipc.Server: Error register getProtocolVersion and other errors

2011-04-11 Thread Dieter Plaetinck
Anyone? Anyone at all? I figured out the issue with the jobtracker, but I still have the errors: * Error register getProtocolVersion * File (..) could only be replicated to 0 nodes, instead of 1 as explained in my first mail. The 2nd error can appear without _any_ errors in _any_ of the

What's the matter: problem cleaning system directory: null

2011-04-11 Thread 杨杰
Hi, some MR tasks on small files have been running on our Hadoop cluster these days, without a particularly high workload. But when I checked it tonight, the cluster refused to respond, so I restarted HDFS and mapred. But unexpected exceptions were thrown on startup, and even the hadoop fs -ls command could

Re: What's the matter: problem cleaning system directory: null

2011-04-11 Thread 杨杰
I found it similar to HADOOP-3027 (https://issues.apache.org/jira/browse/HADOOP-3027), but its error message is: org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: /tmp/hadoop/mapred/system while mine is: org.apache.hadoop.mapred.JobTracker: problem cleaning system directory:

Re: how to sort the output by value in reduce instead of by key?

2011-04-11 Thread Josh Patterson
Leibnitz, I think you are looking for secondary sort in this case where the data arrives in some sort of order at the reducer as opposed to in a group by key. Is that the case? For a look at secondary sort I've got a few blog articles:
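
For readers unfamiliar with the term, the secondary-sort idea mentioned above can be sketched outside Hadoop. This is a minimal Python simulation, not Hadoop code: it sorts on a composite key (ip, field1) but groups on the ip alone, so each group's values arrive ordered by field1. The first two records come from leibnitz's post; the third record and the name `secondary_sort` are hypothetical, added only for illustration.

```python
from itertools import groupby
from operator import itemgetter

# (ip, field1, field2, field3). The first two rows are from the original
# post; the third is a hypothetical row so that one ip has two values.
records = [
    ("32.121.23.222", 221, 11, "/img/xx.jpg"),
    ("12.121.23.121", 121, 11, "/img/dd.jpg"),
    ("12.121.23.121", 7, 11, "/img/aa.jpg"),
]

def secondary_sort(rows):
    """Simulate a shuffle with the composite key (ip, field1).

    Sorting uses the full composite key; grouping uses only the ip,
    so the values of each group come out ordered by field1.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))
    return {ip: [row[1:] for row in group]
            for ip, group in groupby(ordered, key=itemgetter(0))}

grouped = secondary_sort(records)
```

In a real job the same split would presumably be expressed with a sort comparator on the composite key and a grouping comparator on the ip; that mapping is an assumption here, not something stated in the thread.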

Re: Reg HDFS checksum

2011-04-11 Thread Josh Patterson
Thamizh, For a much older project I wrote a demo tool that computed the Hadoop-style checksum locally: https://github.com/jpatanooga/IvoryMonkey Checksum generator is a single-threaded replica of Hadoop's internal Distributed hash-checksum mechanic. What it's actually doing is saving the CRC32

Re: Birthday Calendar

2011-04-11 Thread Stephen Boesch
Forum moderator: pls mark emails from this user as spam. 2011/4/10 Tiru Murugan veera.tirumurugan...@gmail.com Hi I am creating a birthday calendar of all my friends and family. Can you please click on the link below to enter your birthday for me?

Re: Shared lib?

2011-04-11 Thread 顾荣
Hi Mark, I ran into the same problem and finally found a way around it. First, your basic idea is right: we need to move these jars into HDFS, because files in HDFS are automatically shared by all the nodes. There seem to be two solutions here. Solution: a) After you export your project as a jar, you add a

Re: Architectural question

2011-04-11 Thread Mehmet Tepedelenlioglu
That is how I interpreted it, but if by "simple" some matching function other than the most obvious one is meant, then it is still possible to extend the Text class and override the hashCode and equals functions to accommodate this new sort of equality. On Apr 11, 2011, at 1:41 AM, sumit ghosh
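
The point about redefining equality can be illustrated with a small Python analogue, where `__eq__` and `__hash__` play the role of Java's equals and hashCode. This is only a sketch: the class name `LooseKey` and the matching rule (ignore case and surrounding whitespace) are invented for the example and are not taken from the thread.

```python
class LooseKey:
    """Hypothetical key type: two keys are equal when they match after
    trimming whitespace and ignoring case."""

    def __init__(self, text):
        self.text = text

    def _normalized(self):
        return self.text.strip().lower()

    def __eq__(self, other):
        if not isinstance(other, LooseKey):
            return NotImplemented
        return self._normalized() == other._normalized()

    def __hash__(self):
        # Must agree with __eq__: keys that compare equal need equal hashes.
        return hash(self._normalized())

# A hash-based lookup now follows the custom notion of equality.
store = {LooseKey("Hello World"): "persistent line 1"}
match = store.get(LooseKey("  hello world "))
```

The important design constraint carries over to Java unchanged: the hash must be computed from the same normalized form that equality compares, otherwise equal keys can land in different buckets.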

RE: Shared lib?

2011-04-11 Thread Kevin.Leach
It seems like -libjars is for CLASSPATH only. To effect changes to LIBPATH on each node, -archives needs to be used along with a scheme to have each process set its own LIBPATH accordingly, once the -archives are untarred. I think the documentation for -libjars could be amended to

Using global reverse lookup tables

2011-04-11 Thread W.P. McNeill
I understand that part of the rules of MapReduce is that there's no shared global information; nevertheless I have a problem that requires shared global information and I'm trying to get a sense of what mechanisms are available to address it. I have a bunch of *sets* built on a vocabulary of

ganglia

2011-04-11 Thread malte . ehmke
Hello, I have a hadoop cluster with hbase 0.89 and hive 0.70 on ubuntu lucid 64 bit servers. I want to use ganglia. I did not install it via apt-get install because that gives me ganglia 3.1, but I need ganglia 3.0.x (http://wiki.apache.org/hadoop/GangliaMetrics). This didn't help me out

Re: Steps in execution time

2011-04-11 Thread abhishek sharma
Christian, The TaskTrackers send heartbeat messages to the JobTracker. The default interval for these messages is 3 seconds. This is one reason why you see the 3 second steps. Abhishek On Mon, Apr 11, 2011 at 3:19 AM, Christian Kumpe christ...@kumpe.de wrote: Hi, I'm doing some measurement
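
A toy model may help show why a fixed heartbeat interval produces steps. The sketch below assumes, as a simplification not stated in the thread, that a finished task is only noticed at the next heartbeat and that heartbeats fall on exact multiples of the interval; `observed_finish` is a hypothetical helper name.

```python
import math

HEARTBEAT_INTERVAL_S = 3.0  # default interval quoted in the message above

def observed_finish(actual_finish_s, interval_s=HEARTBEAT_INTERVAL_S):
    """Time at which a finish is first reported, assuming heartbeats are
    sent at exact multiples of interval_s (a simplification)."""
    return math.ceil(actual_finish_s / interval_s) * interval_s

# Raising the real duration continuously yields a staircase of observed times.
samples = [observed_finish(t) for t in (0.5, 2.9, 3.0, 3.1, 7.4)]
```

Under this model, durations of 0.5 s and 2.9 s are both observed at 3 s, while 3.1 s jumps to 6 s, which is the kind of step the plot shows.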

Re: Using global reverse lookup tables

2011-04-11 Thread Ted Dunning
Depending on the function that you want to use, it sounds like you want to use a self join to compute transposed cooccurrence. That is, it sounds like you want to find all the sets that share elements with X. If you have a binary matrix A that represents your set membership with one row per set
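
The self-join idea can be sketched in plain Python. The example assumes the product meant is A times its transpose, where A is the binary membership matrix with one row per set; the set names and contents below are hypothetical. Joining the membership data with itself on the element gives, for each pair of sets, the number of elements they share.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical set memberships: set name -> elements (the rows of A).
sets = {
    "X": {"a", "b", "c"},
    "Y": {"b", "c", "d"},
    "Z": {"e"},
}

def cooccurrence(memberships):
    """Count shared elements for every pair of sets via an inverted index.

    The count for (s, t) equals entry (s, t) of A * A^T for the binary
    membership matrix A.
    """
    inverted = defaultdict(list)  # element -> names of sets containing it
    for name, elements in memberships.items():
        for element in elements:
            inverted[element].append(name)
    counts = defaultdict(int)
    for names in inverted.values():  # the self join on the element
        for s, t in combinations(sorted(names), 2):
            counts[(s, t)] += 1
    return dict(counts)

shared = cooccurrence(sets)
```

Only pairs that share at least one element appear in the result, so the sets overlapping with X can be read off without any global lookup table.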

Re: Steps in execution time

2011-04-11 Thread Christian Kumpe
Hi Abhishek, thanks for your answer. So this is the reason for the 1s and 3s raster in the whole plot. Do you (or someone else) have any idea what may be causing the few downward outliers? The upward outliers can be caused by latencies in the network or in some of the nodes. No

Memory mapped resources

2011-04-11 Thread Benson Margulies
We have some very large files that we access via memory mapping in Java. Someone's asked us about how to make this conveniently deployable in Hadoop. If we tell them to put the files into hdfs, can we obtain a File for the underlying file on any given node?

Re: Memory mapped resources

2011-04-11 Thread Jason Rutherglen
Yes you can however it will require customization of HDFS. Take a look at HDFS-347 specifically the HDFS-347-branch-20-append.txt patch. I have been altering it for use with HBASE-3529. Note that the patch noted is for the -append branch which is mainly for HBase. On Mon, Apr 11, 2011 at 3:57

HOD exception: java.io.IOException: No valid local directories in property: mapred.local.dir

2011-04-11 Thread Boyu Zhang
Hi All, I was trying to run a program using HOD on a cluster. When I allocate 5 nodes, it runs fine, but when I allocate 6 nodes, every time I try to run a program I get this error: 11/04/11 19:45:50 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath.

Re: Memory mapped resources

2011-04-11 Thread Edward Capriolo
On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Yes you can however it will require customization of HDFS.  Take a look at HDFS-347 specifically the HDFS-347-branch-20-append.txt patch.  I have been altering it for use with HBASE-3529.  Note that the patch

Re: Memory mapped resources

2011-04-11 Thread Ted Dunning
Also, it only provides access to a local chunk of a file which isn't very useful. On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Yes you can however it will require customization

Re: Memory mapped resources

2011-04-11 Thread Jason Rutherglen
What do you mean by local chunk? I think it's providing access to the underlying file block? On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning tdunn...@maprtech.com wrote: Also, it only provides access to a local chunk of a file which isn't very useful. On Mon, Apr 11, 2011 at 5:32 PM, Edward

Re: ganglia

2011-04-11 Thread Juwei Shi
You may specify --with-gmetad parameter if you compile ganglia yourself. For example: ./configure --sysconfdir=/etc/ganglia --with-gmetad 2011/4/11 malte.eh...@gmx.de Hello, I have a hadoop cluster with hbase 0.89 and hive 0.70 on ubuntu lucid 64 bit servers. I want to use ganglia. I did

Re: how to sort the output by value in reduce instead of by key?

2011-04-11 Thread leibnitz
Thanks all. To Josh: I think you are right. I previously tried to use a group key of field1+ip at reduce, but it failed (the output was not sorted). I will check your point :)

Re: Memory mapped resources

2011-04-11 Thread Ted Dunning
Yes. But only one such block. That is what I meant by chunk. That is fine if you want that chunk but if you want to mmap the entire file, it isn't real useful. On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: What do you mean by local chunk? I think it's

Retrying connect error while configuring hadoop

2011-04-11 Thread prasunb
Hello, I am trying to configure Hadoop in fully distributed mode on three virtual Fedora machines. I do not get any errors during configuration. Even when I execute the script start-dfs.sh, there aren't any errors. But in practice the namenode isn't able to connect to the datanodes. These are

Retrying connect to server error while configuring hadoop

2011-04-11 Thread prasunb
Hello, I am trying to configure Hadoop in fully distributed mode on three virtual Fedora machines. I do not get any errors during configuration. Even when I execute the script start-dfs.sh, there aren't any errors. But in practice the namenode isn't able to connect to the datanodes. These