Is it possible to group data blocks?
Hi all, I would like to achieve data locality. In other words, I want to place certain data blocks on one machine. In some problems, subsets of an entire dataset need one another to compute an answer; most graph problems are good examples. Is it possible? If it is impossible, can you give me any advice? Thank you in advance. - Hyunsik Choi -
Why doesn't Hadoop invoke the topology script with only IP addresses as arguments?
I tried to use the topology script defined by the topology.script.file.name property to enable rack awareness. After configuring it and restarting Hadoop, I ran the wordcount example to test rack awareness. However, some hosts were assigned valid rack IDs derived from IP addresses, while others were assigned invalid rack IDs derived from hostnames. To find the problem, I added code to the topology script to dump all of its arguments to a file. As a result, I found that the topology script is usually called with IP addresses, but in some cases it is called with hostnames. Is this the intended behavior? -- Hyunsik Choi Database & Information Systems Group, Korea University
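A quick way to see what Hadoop actually passes is a topology script that logs its raw arguments before answering. The sketch below is a minimal example only: the rack map and log path are hypothetical, and it normalizes hostnames to IPs before the lookup since, as observed above, either form may arrive. Hadoop expects one rack ID per argument on stdout.

```python
#!/usr/bin/env python
# Minimal topology script sketch (hypothetical rack map and log path).
# Hadoop invokes it with one or more hosts and reads rack IDs from stdout.
import socket
import sys

RACK_MAP = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.11": "/rack2",
}
DEFAULT_RACK = "/default-rack"

def to_ip(arg):
    """Normalize a hostname to an IP, since Hadoop may pass either form."""
    try:
        return socket.gethostbyname(arg)
    except socket.error:
        return arg  # leave unresolvable arguments as-is

def main(args):
    # Dump the raw arguments for debugging, as described in the post above.
    with open("/tmp/topology-args.log", "a") as log:
        log.write(" ".join(args) + "\n")
    # One rack ID per argument, space-separated, on a single stdout line.
    print(" ".join(RACK_MAP.get(to_ip(a), DEFAULT_RACK) for a in args))

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Inspecting the log file after a job run shows whether the NameNode passed IPs, hostnames, or a mix.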
Re: Block placement in HDFS
Hi All, I am trying to divide some data into partitions explicitly (like the regions of HBase). I wonder whether the following approach is the best method. For example, if we assume a block size of 64MB, is the file portion corresponding to bytes 0~63MB allocated to the first block? I have three questions: Is the above method valid? Is it the best method? Is there an alternative method? Thanks in advance. -- Hyunsik Choi Database & Information Systems Group Dept. of Computer Science & Engineering, Korea University On Mon, 2008-11-24 at 20:44 -0800, Mahadev Konar wrote: > Hi Dennis, > I don't think that it is possible to do. The block placement is determined > by HDFS internally (which is local, rack local and off rack). > > > mahadev > > > On 11/24/08 6:59 PM, "dennis81" <[EMAIL PROTECTED]> wrote: > > > > > Hi everyone, > > > > I was wondering whether it is possible to control the placement of the > > blocks of a file in HDFS. Is it possible to instruct HDFS about which nodes > > will hold the block replicas? > > > > Thanks!
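For the arithmetic behind the question: with a fixed block size, the mapping from a byte offset to a block is just integer division, so bytes 0 through 64MB-1 do land in the first block. A minimal sketch follows; the block size and helper names are illustrative, not an HDFS API, and note that which DataNodes hold each block's replicas is still chosen by the NameNode.

```python
# Sketch of how a file offset maps to a block index under a fixed 64 MB
# block size (illustrative helpers, not an HDFS API).
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB

def block_index(offset):
    """Return which block of the file a byte offset falls in."""
    return offset // BLOCK_SIZE

def block_range(index):
    """Return the [start, end) byte range covered by a block."""
    return index * BLOCK_SIZE, (index + 1) * BLOCK_SIZE

# Bytes 0 .. 64MB-1 land in block 0, the next 64 MB in block 1, and so on.
```

So partitioning data by writing it at block-aligned offsets controls which bytes share a block, but not where those blocks are physically placed.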
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Although we proposed the system for RDF data, we are actually considering a more general system for the graph data model. Much real-world data can be represented in a graph data model. In particular, besides web data, some data domains (e.g., biological data, chemical data, social networks, and so on) are naturally represented as graphs. What do you think about that? -- Hyunsik Choi Database & Information Systems Lab, Korea University Edward J. Yoon wrote: > Hi all, > > This RDF proposal is from a good while ago. Now we'd like to settle > down to research again. I attached our proposal; we'd love to hear > your feedback & stories!! > > Thanks. >
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Hi Colin, I'm a member of the RDF proposal. I have one question about Metaweb: do you intend to make Metaweb open source? Hyunsik Choi On Mon, 2008-10-20 at 18:23 -0700, Colin Evans wrote: > Hi Edward, > At Metaweb, we're experimenting with storing raw triples in HDFS flat > files, and have written a simple query language and planner that > executes the queries with chained map-reduce jobs. This approach works > well for warehousing triple data, and doesn't require HBase. Queries > may take a few minutes to execute, but the system scales to very large > datasets and result sets because it doesn't try to resolve queries in > memory. We're currently testing with more than 150MM triples and have > been happy with the results. > > -Colin > > > Edward J. Yoon wrote: > > Hi all, > > > > This RDF proposal is from a good while ago. Now we'd like to settle > > down to research again. I attached our proposal; we'd love to hear > > your feedback & stories!! > > > > Thanks. > > > -- - Hyunsik Choi (Ph.D Student) Laboratory of Prof. Yon Dohn Chung Database & Information Systems Group Dept. of Computer Science & Engineering, Korea University 1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea TEL : +82-2-3290-3580 -
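Colin's approach can be pictured with a toy stage: raw triples live one-per-line in flat files, a map step parses each line into a (subject, (predicate, object)) pair, and a reduce step groups the pairs by subject. The sketch below is a hedged illustration of that general pattern only; all names and data are made up, not Metaweb's code, and on Hadoop each step would be a streaming or Java map-reduce job chained with the next.

```python
# Toy sketch of one map-reduce stage over raw triples stored one-per-line
# in a flat file: group (predicate, object) pairs by subject.
# Hypothetical data and helper names, not Metaweb's implementation.
from collections import defaultdict

def map_triples(lines):
    """Map step: parse 'subject predicate object' lines into key/value pairs."""
    for line in lines:
        subject, predicate, obj = line.split()
        yield subject, (predicate, obj)

def reduce_by_subject(pairs):
    """Reduce step: collect all (predicate, object) pairs per subject."""
    grouped = defaultdict(list)
    for subject, po in pairs:
        grouped[subject].append(po)
    return dict(grouped)

triples = [
    "alice knows bob",
    "alice worksAt metaweb",
    "bob knows carol",
]
result = reduce_by_subject(map_triples(triples))
```

A query planner would chain several such stages, with each stage's output written back to flat files as the next stage's input, which is why nothing needs to fit in memory.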