Hi Tri,

The link to download DBPedia data is http://wiki.dbpedia.org/Downloads36 . There seem to be some issues with the DBPedia servers, but I think they will be sorted out in a day or two, and we should then be able to download the data.
Once I can download it, I should be able to help you more with getting DBPedia data to create a training set.

Regarding the Google search engine: it might be a good option, but I have never tried it myself, as it involves HTML parsing and other work. However, in the past I have used Yahoo's WebSearch API ( http://developer.yahoo.com/search/web/V1/webSearch.html ), which returns a short description for the query term and does not require writing an HTML parser.

Let me know if any of the above helps you.

--Thanks and Regards
Vaijanath N. Rao

-----Original Message-----
From: Tri Nguyen [mailto:[email protected]]
Sent: Wednesday, June 15, 2011 2:16 PM
To: [email protected]
Subject: Re: Hotel Name model

Hi Vaijanath,

So the title is the name of a hotel, and you try to find the sentences containing that name to use as training data lines, am I correct? Can we get the URLs of the articles in DBPedia? I am sorry to ask so much; I don't know much about DBPedia.

Since we cannot download data from DBPedia, can we choose the hotel names and query Google to collect the top pages as the data set? But I think this approach would not have high precision.

Thanks for your explanation,
Nguyen Van Tri.

On Wed, Jun 15, 2011 at 3:21 PM, Rao, Vaijanath <[email protected]> wrote:

> Hi Tri,
>
> The DBPedia link says that it identifies hotels. If we parse the
> DBPedia data and keep only those elements which have Hotel as their class
> (or parent class), we can then mark that data for training. Each
> article in DBPedia has a title and a description, so in the worst
> case we can look for the title in the description and mark that entity
> name for training.
>
> For some reason DBPedia is not allowing me to download data. But once
> I can download it, I will be able to code the wrapper from DBPedia to
> OpenNLP in a couple of days.
>
> --Thanks and Regards
> Vaijanath N. Rao
>
> -----Original Message-----
> From: Tri Nguyen [mailto:[email protected]]
> Sent: Wednesday, June 15, 2011 12:57 PM
> To: [email protected]
> Subject: Re: Hotel Name model
>
> Hi Vaijanath,
>
> Thanks so much for your reply. At first I thought I could make a Hotel
> model like the Job Title model described in chapter 6 of the
> book Introduction to Linguistic Annotation and Text Analytics. But it
> is difficult for me to choose the right corpus to build the training data.
> Because Hotel is a subclass of the Organization class (
> http://cs.nyu.edu/cs/faculty/grishman/NEtask20.book_8.html#HEADING26),
> I think I could take the corpus of the Organization model and remove the
> non-hotel training data to get training data for the Hotel model. But I don't
> know which corpus was used to build the Organization model. Could you tell me
> which one it is?
>
> Could you please explain your link in more detail? Do you mean that we
> can collect hotel names and build training data from them? I see a large list
> of hotel names at http://rtw.ml.cmu.edu/rtw/kbbrowser/pred:hotel, is it
> helpful for building training data?
>
> Thanks so much for your patience in reading this long question,
>
> Nguyen Van Tri.
>
>
> On Wed, Jun 15, 2011 at 12:35 PM, Rao, Vaijanath
> <[email protected]> wrote:
>
> > Hi Tri,
> >
> > You can try a model similar to Organization, and you would need some
> > training data for Hotel. You can start by looking at DBPedia data as
> > initial sample data.
> >
> > http://mappings.dbpedia.org/index.php/OntologyClass:Hotel ( This is
> > the Hotel ontology ). If there is larger interest, I can work on
> > contributing DBPedia data as a training set for a particular type.
> >
> >
> > --Thanks and Regards
> > Vaijanath N. Rao
> >
> > ________________________________________
> > From: Tri Nguyen [[email protected]]
> > Sent: Wednesday, June 15, 2011 08:33
> > To: [email protected]
> > Subject: Hotel Name model
> >
> > Hi,
> >
> > Could somebody guide me how to build a Hotel Name model?
> >
> > Thanks,
> > Tri.
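
The idea discussed in this thread (look for the article title in its description and mark that name for training) could be sketched roughly as below. This is only an illustration, not the wrapper mentioned in the thread: it assumes the (title, description) pairs have already been extracted from DBPedia for articles of class Hotel, it uses a crude regex tokenizer and sentence splitter as stand-ins for proper ones, and the function names and the sample article are made up. The output uses the OpenNLP name finder training format: one tokenized sentence per line, with names wrapped in <START:type> ... <END>.

```python
import re


def tokenize(text):
    """Crude tokenizer (an assumption): words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(description):
    """Crude sentence splitter (an assumption): break after ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", description.strip()) if s]


def mark_title(title, sentence, entity_type="hotel"):
    """Wrap each occurrence of the title in <START:type> ... <END>.

    Returns the tokenized, annotated sentence, or None when the title
    does not occur in the sentence.
    """
    title_toks = tokenize(title)
    sent_toks = tokenize(sentence)
    n = len(title_toks)
    if n == 0:
        return None
    out = []
    found = False
    i = 0
    while i < len(sent_toks):
        if sent_toks[i:i + n] == title_toks:
            out.append("<START:%s>" % entity_type)
            out.extend(title_toks)
            out.append("<END>")
            found = True
            i += n
        else:
            out.append(sent_toks[i])
            i += 1
    return " ".join(out) if found else None


def training_lines(articles, entity_type="hotel"):
    """Yield one training line per sentence that mentions the article title."""
    for title, description in articles:
        for sentence in split_sentences(description):
            line = mark_title(title, sentence, entity_type)
            if line is not None:
                yield line


if __name__ == "__main__":
    # Made-up sample article, standing in for parsed DBPedia data.
    sample = [("Grand Example Hotel",
               "The Grand Example Hotel opened in 1900. It has 200 rooms.")]
    for line in training_lines(sample):
        print(line)
```

Sentences that never mention the title are skipped here, and exact matching misses abbreviated or reworded names, so data produced this way would probably still need manual review before training.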
