Re: Hadoop for real time

2008-10-20 Thread Ted Dunning
Hadoop may not be quite what you want for this. You could definitely use Hadoop for storage and streaming. You can also do various kinds of processing on Hadoop. But because Hadoop is primarily intended for batch-style operations, there is a bit of an assumption that some administrative tasks

Re: If I use the third party jar file, where will I put the file?

2008-10-20 Thread 丁光华
$HADOOP_HOME/lib 2008/10/20 imcaptor [EMAIL PROTECTED] -- Guanghua Ding My research interests include distributed computing, cloud computing, HPC, and data mining.

If I use the third party jar file, where will I put the file?

2008-10-20 Thread imcaptor

Re: Distributed cache Design

2008-10-20 Thread Ted Dunning
I was very surprised by this as well. I was doing variants on all-pairs shortest paths and found that the best representation really was triples containing from-node, to-node and distance. The nice side of this is that you get scaling like you wouldn't believe (subject to big-omega, of course)
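
The triple representation described above can be illustrated with a small in-memory sketch. This is not the code from the post: the class name TripleShortestPaths and the methods relax and distance are hypothetical, and the grouping and min steps only imitate what a map-reduce join on the shared node would do. One call extends every known path by one edge and keeps the shortest distance per (from-node, to-node) pair.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: shortest-path relaxation over (from, to, distance) triples. */
class TripleShortestPaths {
    /** One record: from-node, to-node and distance. */
    static final class Triple {
        final String from;
        final String to;
        final double distance;

        Triple(String from, String to, double distance) {
            this.from = from;
            this.to = to;
            this.distance = distance;
        }
    }

    /** Extends each path by at most one edge and keeps the minimum per (from, to). */
    static List<Triple> relax(List<Triple> paths, List<Triple> edges) {
        // Join key: group edges by their source node (the "map" side of a join).
        Map<String, List<Triple>> edgesBySource = new HashMap<>();
        for (Triple e : edges) {
            edgesBySource.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e);
        }
        // Keep the shortest candidate per (from, to) pair (the "reduce" side).
        Map<String, Triple> best = new HashMap<>();
        for (Triple p : paths) {
            keepShorter(best, p);
            for (Triple e : edgesBySource.getOrDefault(p.to, Collections.<Triple>emptyList())) {
                keepShorter(best, new Triple(p.from, e.to, p.distance + e.distance));
            }
        }
        return new ArrayList<>(best.values());
    }

    private static void keepShorter(Map<String, Triple> best, Triple candidate) {
        best.merge(candidate.from + "\t" + candidate.to, candidate,
                (a, b) -> a.distance <= b.distance ? a : b);
    }

    /** Looks up a distance; returns infinity when no path is recorded. */
    static double distance(List<Triple> paths, String from, String to) {
        for (Triple t : paths) {
            if (t.from.equals(from) && t.to.equals(to)) {
                return t.distance;
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

Repeating relax until no distance changes would give all-pairs shortest paths for this toy graph; on a cluster each round would be one job over flat files of triples.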

Re: Does anybody have tried to setup a cluster with multiple namenodes?

2008-10-20 Thread Alex Loddengaard
I believe the common practice is to have a secondary namenode, which by default is enabled. Secondary namenodes serve the purpose of having a redundant backup. However, as far as I'm aware, they are not hot swappable. This means that if your namenode fails, then your cluster will go down until

mysql in hadoop

2008-10-20 Thread Deepak Diwakar
Hi all, I am sure someone must have tried a MySQL connection from Hadoop, but I am having a problem. Basically, I can't figure out how to include the JDBC connector jar on the classpath in the hadoop run command. Or is there any other way to incorporate the JDBC connector jar into the main jar

Re: mysql in hadoop

2008-10-20 Thread Sandeep Dey
Hi Deepak, I can suggest a crude solution :) that might work. You can extract the JDBC connector jar and copy the classes from there to your class/build directory. This way the JDBC driver classes are included in the final jar. Hope that helps, Sandeep On Mon, 20 Oct 2008, Deepak Diwakar
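
Whichever way the driver ends up in the job jar, it may help to confirm that the class is actually loadable before opening a connection. The sketch below is an assumption, not part of the thread: DriverCheck and isOnClasspath are hypothetical names, and the only library call used is the standard Class.forName.

```java
/** Hypothetical helper: reports whether a class can be found on the current classpath. */
class DriverCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

For MySQL Connector/J 5.x the driver class is com.mysql.jdbc.Driver, so a call such as DriverCheck.isOnClasspath("com.mysql.jdbc.Driver") at the start of the job's main method would show whether the extracted classes made it into the final jar.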

Re: How do I implement a Writable into another Writable?

2008-10-20 Thread Chris Douglas
TupleWritable is not a general-purpose type. It's used for map-side joins, where the arity of a tuple is fixed by construction. Its intent is a transient type with very, very specific applications in mind. It sounds like you don't need a general list type, as you don't need to worry about

Re: Does anybody have tried to setup a cluster with multiple namenodes?

2008-10-20 Thread Chris Douglas
The secondary namenode is neither a backup service for the HDFS namespace nor a failover for requests: http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+NameNode The secondary namenode periodically merges an image (FSImage) of the namesystem with recent changes
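
The merge step mentioned above can be pictured with a toy model. This is only a sketch under loud assumptions: the namespace is reduced to a set of paths, the edit log to a list of create/delete records, and CheckpointSketch, Edit and merge are hypothetical names that do not reflect the real FSImage or edits file formats.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of a checkpoint: replay logged edits on top of a namespace image. */
class CheckpointSketch {
    enum Op { CREATE, DELETE }

    /** One logged namespace change (hypothetical, greatly simplified). */
    static final class Edit {
        final Op op;
        final String path;

        Edit(Op op, String path) {
            this.op = op;
            this.path = path;
        }
    }

    /** Returns a new image with the edits applied in order; the input image is not modified. */
    static SortedSet<String> merge(SortedSet<String> image, List<Edit> edits) {
        SortedSet<String> merged = new TreeSet<>(image);
        for (Edit e : edits) {
            if (e.op == Op.CREATE) {
                merged.add(e.path);
            } else {
                merged.remove(e.path);
            }
        }
        return merged;
    }
}
```

In this picture the merged image replaces the old one and the edit log can start again from empty, which bounds the size of the log but does not by itself provide failover.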

Re: Hadoop for real time

2008-10-20 Thread Stas Oskin
Hi Ted. Thanks for sharing some of the inner workings of Veoh, which, by the way, I'm a frequent user of (or at least when time permits :) ). I do recall reading somewhere that Veoh used a heavily modified version of MogileFS but has since switched, as it wasn't ready enough for Veoh's needs. If not

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

2008-10-20 Thread Colin Evans
Hi Edward, At Metaweb, we're experimenting with storing raw triples in HDFS flat files, and have written a simple query language and planner that executes the queries with chained map-reduce jobs. This approach works well for warehousing triple data, and doesn't require HBase. Queries may

Re: How do I implement a Writable into another Writable?

2008-10-20 Thread Chris Douglas
On Oct 20, 2008, at 6:43 PM, Yih Sun Khoo wrote: Thanks Chris and Joman for your detailed explanations. Would this be a good example of using a shallow copy? Also I'm trying to wrap my head around why the shallow copy is needed. You mentioned it is to eliminate any state from the values

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

2008-10-20 Thread Hyunsik Choi
Hi Colin, I'm a member of the RDF proposal. I have one question about Metaweb. Do you (or your company) plan to make Metaweb open source? Hyunsik Choi - Hyunsik Choi (Ph.D. Student) Laboratory of Prof. Yon Dohn Chung

Re: How do I implement a Writable into another Writable?

2008-10-20 Thread Yih Sun Khoo
Awesome! One question about the example you gave me. When you say clears the collection, the expected value should just be B0, B1 right? Because the collection gets cleared of the old contained value A0...A2. MyWritable foo = new MyWritable(); // foo contains A0, A1, A2 in its namelist
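
The clearing behaviour asked about here can be sketched without a Hadoop dependency. The class below is an assumption for illustration: NameListWritable, of and copyInto are hypothetical names, and the class only mirrors the write(DataOutput) and readFields(DataInput) method signatures of Hadoop's Writable interface rather than implementing it, so that it runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical Writable-style type holding a list of names. */
class NameListWritable {
    final List<String> names = new ArrayList<>();

    static NameListWritable of(String... values) {
        NameListWritable w = new NameListWritable();
        w.names.addAll(Arrays.asList(values));
        return w;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(names.size());
        for (String name : names) {
            out.writeUTF(name);
        }
    }

    public void readFields(DataInput in) throws IOException {
        // A reused instance must not keep values from the previous record.
        names.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            names.add(in.readUTF());
        }
    }

    /** Serializes src and reads it back into dst, as happens when an instance is reused. */
    static void copyInto(NameListWritable src, NameListWritable dst) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With foo built from A0, A1, A2 and bar built from B0, B1, calling NameListWritable.copyInto(bar, foo) leaves foo.names holding only B0 and B1, because readFields clears the list before reading. Without the clear() call the list would grow to five entries.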

Re: Problems running the Hadoop Quickstart

2008-10-20 Thread Amareshwari Sriramadasu
Has your task-tracker started? I mean, do you see non-zero nodes on your job tracker UI? -Amareshwari John Babilon wrote: Hello, I've been trying to get Hadoop up and running on a Windows Desktop running Windows XP. I've installed Cygwin and Hadoop. I run the start-all.sh script, it

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

2008-10-20 Thread Colin Evans
We've got a lot of open source projects related to Hadoop and to our graph data available at http://research.freebase.com, but we aren't planning on open sourcing our graph processing work around Hadoop yet. Hyunsik Choi wrote: Hi Colin, I'm a member of RDF proposal. I have one question

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

2008-10-20 Thread Edward J. Yoon
Oh, I remember freebase.com, which was mentioned by Barney Pell (Powerset CTO) at our company (NHN Corp.) meeting. Hmm, the two approaches seem slightly different. However, I hope we can work together in the near future if possible. /Edward On Tue, Oct 21, 2008 at 1:41 PM, Colin Evans [EMAIL

Re: Problems running the Hadoop Quickstart

2008-10-20 Thread Alex Loddengaard
Have you looked at your logs yet? You should look at your logs and post any errors or warnings. Alex On Mon, Oct 20, 2008 at 8:29 PM, Amareshwari Sriramadasu [EMAIL PROTECTED] wrote: Has your task-tracker started? I mean, do you see non-zero nodes on your job tracker UI? -Amareshwari

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

2008-10-20 Thread Ted Dunning
At Veoh the recommendation data amounts to many billions of (roughly) these triples and this approach works very well indeed, even on tiny development clusters. On Mon, Oct 20, 2008 at 6:23 PM, Colin Evans [EMAIL PROTECTED] wrote: Hi Edward, At Metaweb, we're experimenting with storing raw