Is it possible to group data blocks?

2009-06-23 Thread Hyunsik Choi
Hi all,

I would like to achieve data locality. In other words, I want to place
certain data blocks on the same machine. In some problems, subsets of
the entire dataset need one another to compute an answer; most graph
problems are good examples.

Is this possible? If not, can you advise me on an alternative?
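
At the moment I can only observe where blocks end up, not control it.
For example, here is a rough, untested sketch that prints the datanodes
holding each block of a file, using the standard FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(new Path(args[0]));

    // One BlockLocation per block; each lists the hosts holding a replica.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (int i = 0; i < blocks.length; i++) {
      System.out.printf("block %d at offset %d on hosts: %s%n",
          i, blocks[i].getOffset(), String.join(", ", blocks[i].getHosts()));
    }
  }
}

As far as I know, stock HDFS chooses replica locations internally, so
actually steering blocks onto a chosen machine would need a pluggable
placement policy (the direction proposed in HDFS-385).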

Thank you in advance.

- Hyunsik Choi -


Why doesn't Hadoop invoke the topology script with only the IP address as the argument?

2009-01-05 Thread Hyunsik Choi
I tried to use the topology script defined by the
topology.script.file.name property to enable rack awareness. After
setting the property and restarting Hadoop, I ran the wordcount example
to test it. However, some hosts were assigned valid rack IDs derived
from IP addresses, while others were assigned invalid rack IDs derived
from hostnames.

To track down the problem, I added debugging code to the topology
script that dumps all of its arguments to a file. It turns out that the
script is usually called with an IP address, but in some cases it is
called with a hostname.

Is this the expected behavior?
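
If the script can indeed receive either form, one workaround I am
considering is to make the mapping insensitive to it: a custom Java
DNSToSwitchMapping that resolves every incoming name to an IP before
assigning a rack, plugged in via topology.node.switch.mapping.impl
(net.topology.node.switch.mapping.impl in later releases) instead of a
script. A rough, untested sketch; the subnet-based rack rule is only a
made-up example:

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.net.DNSToSwitchMapping;

public class IpNormalizingRackMapping implements DNSToSwitchMapping {
  @Override
  public List<String> resolve(List<String> names) {
    List<String> racks = new ArrayList<String>(names.size());
    for (String name : names) {
      try {
        // Resolve hostnames to IPs; IP strings pass through unchanged.
        String ip = InetAddress.getByName(name).getHostAddress();
        racks.add(rackForIp(ip));
      } catch (UnknownHostException e) {
        racks.add("/default-rack"); // fall back for unresolvable names
      }
    }
    return racks;
  }

  // Made-up mapping rule: derive the rack ID from the /24 subnet.
  private String rackForIp(String ip) {
    return "/rack-" + ip.substring(0, ip.lastIndexOf('.')).replace('.', '-');
  }

  // No-ops; these are part of the interface in newer Hadoop releases
  // and harmless extra methods in older ones.
  public void reloadCachedMappings() {}
  public void reloadCachedMappings(List<String> names) {}
}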

--
Hyunsik Choi
Database & Information Systems Group,
Korea University


Re: Block placement in HDFS

2008-11-25 Thread Hyunsik Choi
Hi All,

I'm trying to divide data into partitions explicitly (like the regions
of HBase). I wonder whether the following approach is the best method.

For example, assuming a block size of 64MB, is the file portion
corresponding to 0~63MB allocated to the first block?
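
If so, my plan (a rough, untested sketch; the paths and payload are
made up) is to write each partition to its own file with an explicit
block size at least as large as the partition, so that every partition
occupies exactly one block:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    long blockSize = 64L * 1024 * 1024;  // 64MB, as in the example above
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);
    short replication = fs.getDefaultReplication();

    // One partition per file; with partition size <= blockSize, HDFS
    // stores the whole partition in a single block.
    FSDataOutputStream out = fs.create(
        new Path("/data/partitions/part-00000"),
        true, bufferSize, replication, blockSize);
    out.writeBytes("partition payload ...");
    out.close();
  }
}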

I have three questions:

Is the above method valid?
Is it the best method?
Is there an alternative method?

Thanks in advance.

-- 
Hyunsik Choi
Database & Information Systems Group
Dept. of Computer Science & Engineering, Korea University


On Mon, 2008-11-24 at 20:44 -0800, Mahadev Konar wrote:
> Hi Dennis,
>   I don't think that is possible. Block placement is determined by
> HDFS internally (local, rack-local, and off-rack).
> 
> 
> mahadev
> 
> 
> On 11/24/08 6:59 PM, "dennis81" <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Hi everyone,
> > 
> > I was wondering whether it is possible to control the placement of the
> > blocks of a file in HDFS. Is it possible to instruct HDFS about which nodes
> > will hold the block replicas?
> > 
> > Thanks!




Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

2008-10-20 Thread Hyunsik Choi
Although we proposed the system for RDF data, we are actually
considering a more general system for the graph data model. Many
real-world datasets can be represented in a graph data model. In
particular, besides web data, several domains (e.g., biological data,
chemical data, social networks, and so on) are naturally represented
as graph data.

What do you think about that?

-- 
Hyunsik Choi
Database & Information Systems Lab, Korea University


Edward J. Yoon wrote:
> Hi all,
>
> This RDF proposal is from a good while ago. Now we'd like to settle
> down to research again. I've attached our proposal; we'd love to hear
> your feedback & stories!!
>
> Thanks.
>   



Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

2008-10-20 Thread Hyunsik Choi
Hi Colin,

I'm a member of the RDF proposal team. I have one question about
Metaweb: do you intend to make Metaweb open source?

Hyunsik Choi

On Mon, 2008-10-20 at 18:23 -0700, Colin Evans wrote:
> Hi Edward,
> At Metaweb, we're experimenting with storing raw triples in HDFS flat 
> files, and have written a simple query language and planner that 
> executes the queries with chained map-reduce jobs.  This approach works 
> well for warehousing triple data, and doesn't require HBase.  Queries 
> may take a few minutes to execute, but the system scales for very large 
> datasets and result sets because it doesn't try to resolve queries in 
> memory.  We're currently testing with more than 150MM triples and have 
> been happy with the results.
> 
> -Colin
> 
> 
> Edward J. Yoon wrote:
> > Hi all,
> >
> > This RDF proposal is from a good while ago. Now we'd like to settle
> > down to research again. I've attached our proposal; we'd love to hear
> > your feedback & stories!!
> >
> > Thanks.
> >   
> 
-- 
-
Hyunsik Choi (Ph.D Student)

Laboratory of Prof. Yon Dohn Chung
Database & Information Systems Group
Dept. of Computer Science & Engineering, Korea University
1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea

TEL : +82-2-3290-3580
-
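
To make Colin's approach concrete, here is a minimal sketch (my own
construction, not Metaweb's code) of one stage of such a chained
map-reduce query plan: a map-only filter that scans tab-separated
(subject, predicate, object) triples in HDFS flat files and passes
through the ones matching a fixed predicate. A real planner would
chain several such jobs, feeding each stage's output directory into
the next; the predicate below is a made-up example.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TriplePredicateFilter
    extends Mapper<LongWritable, Text, Text, NullWritable> {

  // Hypothetical pattern; a planner would set this per query stage.
  private static final String PREDICATE = "<http://example.org/knows>";

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] spo = line.toString().split("\t", 3);
    if (spo.length == 3 && PREDICATE.equals(spo[1])) {
      context.write(line, NullWritable.get()); // emit the matching triple
    }
  }
}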


