Re: BSP Task Input/InputSplit Filename

Edward J. Yoon Wed, 22 May 2013 01:19:42 -0700

Good luck.

BTW, if you have to manage a lot of documents, I think you need to
merge the documents into a map file or sequence file (document ID as
key, document as value) on HDFS. Apache Nutch will be helpful. Then you
can create an inverted index MR program by editing a few lines of the
word-count MR example.
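
A minimal sketch of that idea, in plain Java with no Hadoop or Hama
dependency, so it only imitates the map and reduce steps in memory. The
class and method names (InvertedIndexSketch, map, reduce, build) are
hypothetical and not part of any library. The only change from a word
count is that the map step emits (term, document ID) instead of
(term, 1), and the reduce step counts per document ID, which gives a
postings list with term frequency as the payload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InvertedIndexSketch {

    /** "Map" step: emit one (term, docId) pair per token in the document. */
    public static List<String[]> map(String docId, String text) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new String[] {token, docId});
            }
        }
        return pairs;
    }

    /** "Reduce" step: group pairs by term and count occurrences per docId. */
    public static Map<String, Map<String, Integer>> reduce(List<String[]> pairs) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (String[] pair : pairs) {
            index.computeIfAbsent(pair[0], k -> new TreeMap<>())
                 .merge(pair[1], 1, Integer::sum);
        }
        return index;
    }

    /** Runs both steps over (docId -> document text) pairs. */
    public static Map<String, Map<String, Integer>> build(Map<String, String> docs) {
        List<String[]> pairs = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            pairs.addAll(map(doc.getKey(), doc.getValue()));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // Made-up sample documents, keyed by document ID.
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "to be or not to be");
        docs.put("doc2", "to do is to be");
        // Print each term followed by its postings (docId=termFrequency).
        build(docs).forEach((term, postings) ->
            System.out.println(term + " -> " + postings));
    }
}
```

In a real MR job the (document ID, document) pairs would come from the
sequence file rather than an in-memory map, and the grouping by term
would be done by the framework's shuffle.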


On Wed, May 22, 2013 at 4:42 PM, Steven van Beelen <[email protected]> wrote:
> For a project I'm trying to implement an Inverted Indexing algorithm, which
> has a 'term' and 'postingslist', in which the postings list consists of a
> 'document id' and 'payload' (in my case term frequency per document).
> I was thinking of inserting multiple different documents and taking the
> filename as documentID, hence the necessity.
> But I've found a way to work around this problem of mine by using different
> input which does not require the filename to be retrievable in a BSP task.
>
> If I need it later on in my project and start working on it, I'll
> let you know.
>
> Thanks for the help thus far!
>
>
>
> On Wed, May 22, 2013 at 1:16 AM, Edward J. Yoon <[email protected]> wrote:
>
>> Hi,
>>
>> Short answer is no, we don't provide an API for what you are trying to do.
>>
>> However, it can be added easily. See the BSPPeerImpl.initInput() method,
>> the InputSplit interface and the FileSplit classes.
>>
>> Why do you need that function? If there's a reasonable need, let's
>> add it together.
>>
>> On Tue, May 21, 2013 at 7:04 PM, Steven van Beelen <[email protected]>
>> wrote:
>> > Hi all,
>> >
>> > The title says it: is there a way to retrieve the filename of the
>> > input/inputsplit a BSP Task is working on? I've been looking for some
>> > time in the docs and source files, but cannot seem to find if one is
>> > able to retrieve the filename/pathname from the input used.
>> >
>> > Cheers
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
