Hi Vaijanath,

Thanks so much for your help. It makes things much clearer to me. I will study the material at your link. Yes, I have been using Yahoo Search BOSS, and we can retrieve the text with Apache Tika. But I think it will take a lot of time to tag the hotel names manually, since the training data needs at least 15,000 sentences.

Thanks so much,
Nguyen Van Tri.

On Wed, Jun 15, 2011 at 4:14 PM, Rao, Vaijanath <[email protected]> wrote:

> Hi Tri,
>
> The link to download DBPedia data is http://wiki.dbpedia.org/Downloads36 .
> There might be some issues with the DBPedia servers but I think that will
> be sorted out and we might be able to get data to download. ( this might
> take a day or 2 to get corrected )
>
> Once I can download I should be able to help you more in getting DBPedia
> data to create a training set for you.
>
> Regarding using the Google search engine, it might be a good option, but I
> have never tried it myself as it will involve html parsing and other stuff.
> However, I have in the past used Yahoo's WebSearch API (
> http://developer.yahoo.com/search/web/V1/webSearch.html ) where you would
> get some description of the query term and it would not involve writing an
> html parser.
>
> Let me know if any of the above things helps you.
>
> --Thanks and Regards
> Vaijanath N. Rao
>
> -----Original Message-----
> From: Tri Nguyen [mailto:[email protected]]
> Sent: Wednesday, June 15, 2011 2:16 PM
> To: [email protected]
> Subject: Re: Hotel Name model
>
> Hi Vaijanath,
>
> It means that the title is the name of a hotel and you try to find the
> sentences containing that name to use as training data lines, am I
> correct? Can we get the urls of the articles in DBPedia? I am sorry to ask
> you so much because I don't know about DBPedia.
> Since we can not download data from DBPedia, can we choose the hotel names
> and query Google to collect the top pages as data sets? But I think
> this way does not have high precision.
>
> Thanks for your explanation,
> Nguyen Van Tri.
>
> On Wed, Jun 15, 2011 at 3:21 PM, Rao, Vaijanath
> <[email protected]> wrote:
>
> > Hi Tri,
> >
> > The link of DBPedia says that it identified hotel, now if we parse the
> > DBPedia data and get only those elements which have Hotel as their class
> > ( Or parent class) we can then mark that data for training. So each
> > article in DBPedia will have a title and description, so in the worst
> > case we can look for the title in the description and mark that entity
> > name for training.
> >
> > For some reason DBPedia is not allowing me to download data. But once
> > I get it to download I will be able to code the wrapper from DBPedia to
> > OpenNLP in a couple of days' time.
> >
> > --Thanks and Regards
> > Vaijanath N. Rao
> >
> > -----Original Message-----
> > From: Tri Nguyen [mailto:[email protected]]
> > Sent: Wednesday, June 15, 2011 12:57 PM
> > To: [email protected]
> > Subject: Re: Hotel Name model
> >
> > Hi Vaijanath,
> >
> > Thanks so much for your reply. At first I thought I could make a Hotel
> > model like the Job Title model which is described in chapter 6 of the
> > book Introduction to Linguistic Annotation and Text Analytics. But it
> > is difficult for me to choose the right corpus to build the training
> > data. Because Hotel is a subclass of the Organization class (
> > http://cs.nyu.edu/cs/faculty/grishman/NEtask20.book_8.html#HEADING26),
> > I think I can get the corpus of the Organization model and remove the
> > non-hotel training data to get training data for the Hotel model. But I
> > don't know what corpus is used to build the Organization model. Could
> > you show me what it is?
> >
> > Could you please explain your link in more detail? You mean that we
> > can collect Hotel names and build training data? I see a large list of
> > hotel names at http://rtw.ml.cmu.edu/rtw/kbbrowser/pred:hotel, is it
> > helpful to us to build training data?
> >
> > Thanks so much for your patience in reading this long question,
> >
> > Nguyen Van Tri.
> >
> >
> > On Wed, Jun 15, 2011 at 12:35 PM, Rao, Vaijanath
> > <[email protected]> wrote:
> >
> > > Hi Tri,
> > >
> > > You can try a model similar to Organization and you would need some
> > > training data for Hotel. You can start by looking at DBPedia data as
> > > initial sample data.
> > >
> > > http://mappings.dbpedia.org/index.php/OntologyClass:Hotel ( This is
> > > the Hotel ontology ). If there is a larger interest I can work on
> > > contributing DBPedia data as a training set for a particular type.
> > >
> > >
> > > --Thanks and Regards
> > > Vaijanath N. Rao
> > >
> > > ________________________________________
> > > From: Tri Nguyen [[email protected]]
> > > Sent: Wednesday, June 15, 2011 08:33
> > > To: [email protected]
> > > Subject: Hotel Name model
> > >
> > > Hi,
> > >
> > > Could somebody guide me how to build a Hotel Name model?
> > >
> > > Thanks,
> > > Tri.
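
The idea discussed in this thread (look for an article's title inside its description and mark that span as a training example) could be sketched roughly as follows. This is only a minimal illustration, assuming the OpenNLP name finder training format of one whitespace-tokenized sentence per line with names wrapped in <START:type> ... <END>. The function names, the regex tokenizer, the naive sentence splitter, and the sample hotel are all hypothetical and are not part of any DBPedia-to-OpenNLP wrapper.

```python
import re

# Hypothetical, very simple tokenizer: runs of word characters, or single
# punctuation marks. A real pipeline would likely use OpenNLP's own tokenizer.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    return TOKEN.findall(text)


def annotate_sentence(sentence, title, entity_type="hotel"):
    """Wrap every occurrence of `title` in <START:type> ... <END>.

    Returns the annotated, whitespace-tokenized sentence, or None when the
    title does not occur in the sentence.
    """
    tokens = tokenize(sentence)
    name = tokenize(title)
    out, i, found = [], 0, False
    while i < len(tokens):
        if name and tokens[i:i + len(name)] == name:
            out += [f"<START:{entity_type}>", *name, "<END>"]
            i += len(name)
            found = True
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out) if found else None


def training_lines(title, description, entity_type="hotel"):
    """Keep only the sentences of `description` that mention `title`."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    lines = []
    for sentence in sentences:
        line = annotate_sentence(sentence, title, entity_type)
        if line is not None:
            lines.append(line)
    return lines


if __name__ == "__main__":
    # Made-up title/description pair standing in for one DBPedia article.
    for line in training_lines(
        "Example Grand Hotel",
        "The Example Grand Hotel opened in 1990. It has 200 rooms.",
    ):
        print(line)
    # prints: The <START:hotel> Example Grand Hotel <END> opened in 1990 .
```

Exact title matching like this will miss abbreviated or inflected mentions, so the resulting lines would still need a manual check before being used as training data.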
