Re: nutch-selenium help

Mattmann, Chris A (3980) Tue, 12 Apr 2016 22:30:54 -0700

Hi, the plugin is now part of Nutch, so you don’t need to use the
GitHub one. Can you send me a link to the wiki page? It’s likely
out of date.


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 4/12/16, 10:29 PM, "Sabah Sajjad Khan" <[email protected]> wrote:

>The link that I provided is the same as the one on the wiki page.
>
>> On Apr 13, 2016, at 1:13 AM, Mattmann, Chris A (3980) 
>> <[email protected]> wrote:
>> 
>> Please use the selenium plugin that is part of Nutch and described
>> on the wiki in the Advanced Ajax Interaction section.
>> 
>> 
>> On 4/12/16, 9:38 PM, "Sabah Sajjad Khan" <[email protected]> wrote:
>> 
>>> Hello,
>>> 
>>> 
>>> I am very new to Nutch and am having trouble getting the content I need 
>>> from my crawls. I am crawling electronic parts websites to see prices, but 
>>> when I use readdb to dump, I don't see all the data under content. I have 
>>> attached the dump file.
>>> 
>>> My setup is Nutch with Selenium, following 
>>> https://github.com/momer/nutch-selenium, but I don't run the last command 
>>> (bin/crawl) because I am not using Solr. Selenium and the headless browser 
>>> seem to be working, but no data seems to be extracted. Any help would be 
>>> appreciated. As I said, I'm very new, so if there is any other information 
>>> I could provide to help explain my problem, or a way I could track it 
>>> down, please let me know.
>>> 
>>> 
>>> Thank you in advance.
>>> 
>>> 
>
