Using Elasticsearch, Getting LUCENE_36 errors

2015-05-05 Thread Scott Lundgren
Elasticsearch 1.3.4 was installed on the master by fetching the Debian package from elastic.co and installing it via dpkg. The elasticsearch service was started, then I created an index matching the value defined in elastic.index. Scott Lundgren Software Engineer (704) 973-7388 slundg

URL Structure Rounds/Crawl Depth

2015-04-07 Thread Scott Lundgren
/.* do my rounds need to be set to 2 (i.e., everything under /prnewswire/press_releases/ is crawled) or 3 (/triangle/prnewswire/press_releases/)?
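The question turns on whether a pattern ending in `/.*` reaches the nested directories. The following is a minimal Python sketch, not Nutch code: the host `example.com` and the accept pattern are hypothetical, since the real pattern is truncated in the message above. It illustrates that `.*` also matches `/`, so a single pattern covers both path depths.

```python
import re
from urllib.parse import urlparse

# Hypothetical accept pattern; the real one is truncated in the message above.
ACCEPT = re.compile(r"^http://example\.com/.*")

def path_depth(url):
    """Count the non-empty path segments of a URL."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])

def accepted(url):
    """Return True when the URL matches the accept pattern."""
    return ACCEPT.match(url) is not None

urls = [
    "http://example.com/prnewswire/press_releases/",
    "http://example.com/triangle/prnewswire/press_releases/",
]
for url in urls:
    # Prints the directory depth, whether the pattern accepts it, and the URL.
    print(path_depth(url), accepted(url), url)
```

Both URLs pass the filter even though their depths differ. In general, each crawl round follows links discovered in the previous round, so the number of rounds needed tends to depend on how many link hops separate a page from the seed URL rather than on its directory depth.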

website structure discovery?

2015-03-30 Thread Scott Lundgren
If I want to crawl a website to learn its directory and information structure, is Nutch a good tool for this problem? Would you recommend a different tool?

Re: website structure discovery?

2015-03-30 Thread Scott Lundgren
of the site’s directory/URL structure when a sitemap file is not available.

Re: [MASSMAIL]Re: website structure discovery?

2015-03-30 Thread Scott Lundgren
have 200+ sites to set up. I’ll try standing up a separate instance of Nutch plus the link-extractor and D3.js solution.

configure name of index in elasticsearch

2015-03-30 Thread Scott Lundgren
but not index to configure.

url-regexfilter directory based sites

2015-03-25 Thread Scott Lundgren
directories of those subdirectories?

How to verify URLFilterChecker

2015-02-09 Thread Scott Lundgren
the output is --http://www.cabinet.com/cabinet/cabinetobituaries/1054824-435/robert-g.-judy.html If I press enter without providing a URL, the output is a blank line followed by a dash. I’m not sure what to expect as a response, or whether that indicates a pass or a failure.
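For reading that output: a leading `+` generally marks a URL the filters accepted and a leading `-` one they rejected. The sketch below is not Nutch code. It emulates, in Python, the first-match-wins behaviour of a regex-urlfilter.txt style rule list; the two rules and the helper names `parse_rules` and `check` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rules in the style of regex-urlfilter.txt:
# a leading '+' accepts, a leading '-' rejects.
RULE_LINES = [
    "# skip common static resources",
    r"-\.(gif|jpg|png|css|js)$",
    r"+^http://www\.cabinet\.com/cabinet/",
]

def parse_rules(lines):
    """Turn rule lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def check(url, rules):
    """Return the URL prefixed with '+' (accepted) or '-' (rejected)."""
    for accept, pattern in rules:
        if pattern.search(url):
            # The first matching rule decides the outcome.
            return ("+" if accept else "-") + url
    # No rule matched: treat the URL as rejected.
    return "-" + url

rules = parse_rules(RULE_LINES)
print(check("http://www.cabinet.com/cabinet/cabinetobituaries/1054824-435/robert-g.-judy.html", rules))
print(check("", rules))
```

Under these assumed rules the article URL comes back with a `+` prefix, while an empty line matches no accept rule and comes back as a bare `-`, which resembles the output described above.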

Re: How to verify URLFilterChecker

2015-02-09 Thread Scott Lundgren
-435/robert-g.-judy.html