[jira] [Updated] (NUTCH-1928) Indexing filter of documents by the MIME type

2015-02-13 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Luis Betancourt Gonzalez updated NUTCH-1928: -- Attachment: NUTCH-1928v5.patch Indexing filter of documents by

[jira] [Commented] (NUTCH-1928) Indexing filter of documents by the MIME type

2015-02-13 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319744#comment-14319744 ] Jorge Luis Betancourt Gonzalez commented on NUTCH-1928: --- [~lewismc]

Re: [nutch-cassandra-docker] Inquiry on contribution (#1)

2015-02-13 Thread Lewis John Mcgibbney
Hi Mohamed, This is fantastic, thank you for the response. I created an issue in our Jira issue tracker for this: https://issues.apache.org/jira/browse/NUTCH-1923 We recently added an HBase Docker container, which now resides in the docker directory of the Nutch source

Build failed in Jenkins: Nutch-nutchgora #1337

2015-02-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-nutchgora/1337/changes Changes: [markus] NUTCH-1925 Upgrade to Apache Tika 1.7 -- [...truncated 3200 lines...] compile: jar: deps-test: deploy: copy-generated-lib: deploy: [copy] Copying 1 file to

Vagrant Crushed When using Nutch-Selenium

2015-02-13 Thread Shuo Li
Hey guys, I'm trying to use Nutch-Selenium to crawl nutch.apache.org. However, my Vagrant VM seems to have crashed after a few minutes. I forced it to shut down, and it turns out it had crawled only 59 websites. My Nutch version is 1.10 and my OS is Ubuntu Trusty, 14.04. Is there anything I can provide to you

Re: Vagrant Crushed When using Nutch-Selenium

2015-02-13 Thread Mattmann, Chris A (3980)
Hi Shuo, Thanks for your email. I wonder if using selenium grid would help? Please see this plugin: https://github.com/momer/nutch-selenium-grid-plugin I’m CC’ing Mo the author of the plugin to see if he experienced this while running the original selenium plugin - Mo did using selenium grid

[jira] [Commented] (NUTCH-1942) Remove TopLevelDomain

2015-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320506#comment-14320506 ] Chris A. Mattmann commented on NUTCH-1942: -- OK I see this thread:

[GitHub] nutch pull request: Update README.txt

2015-02-13 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/7 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[jira] [Commented] (NUTCH-1942) Remove TopLevelDomain

2015-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320498#comment-14320498 ] Chris A. Mattmann commented on NUTCH-1942: -- cool I will join the discussion. I

Re: Vagrant Crushed When using Nutch-Selenium

2015-02-13 Thread Mattmann, Chris A (3980)
Oh yes, please up your memory to at least 2 GB. ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop:

Integrate Splash with Nutch akin to Selenium

2015-02-13 Thread Mattmann, Chris A (3980)
Hi Guys, As we bring Nutch into the realm of the dynamic deep web, I would like to be working on a plugin that has a similar idea to the Selenium stuff that Mo started and that Lewis and I are integrating - I would like to bring Splash as a component into Nutch too:

Re: Vagrant Crushed When using Nutch-Selenium

2015-02-13 Thread Mo Omer
Hey all, When I ran nutch-selenium, it was in a config such that zombie processes were created from closing Firefox windows and they couldn't be reaped (again, due to the docker configuration I had). In a normal setup, it should not be an issue - if you're running 20 threads in nutch that's

[jira] [Commented] (NUTCH-827) HTTP POST Authentication

2015-02-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320870#comment-14320870 ] Lewis John McGibbney commented on NUTCH-827: part 2 (new files) Committed

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Attachment: NUTCH-1934-trunkv2.patch Patch rebased against trunk. The same comments

[jira] [Updated] (NUTCH-1934) Refactor Fetcher in trunk

2015-02-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1934: Fix Version/s: 1.11 Refactor Fetcher in trunk -

[jira] [Created] (NUTCH-1943) Form authentication should not be global and ignore authScope

2015-02-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1943: --- Summary: Form authentication should not be global and ignore authScope Key: NUTCH-1943 URL: https://issues.apache.org/jira/browse/NUTCH-1943 Project:

Re: Vagrant Crushed When using Nutch-Selenium

2015-02-13 Thread Mohammed Omer
No worries man, glad everything works! Glad, since I was having hostname issues with nutch/hbase just now as I quickly tried to get it working/fixed for ya, ha. Mo On Fri, Feb 13, 2015 at 2:57 PM, Shuo Li sli...@usc.edu wrote: Hey guys, After changing my RAM to 2GB, everything works fine. My

[jira] [Updated] (NUTCH-827) HTTP POST Authentication

2015-02-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-827: -- Attachment: NUTCH-827-trunk-v3.patch Hi [~lewismc], attached patch fixes two points * the CSS

[jira] [Commented] (NUTCH-827) HTTP POST Authentication

2015-02-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320843#comment-14320843 ] Lewis John McGibbney commented on NUTCH-827: Fantastic [~wastl-nagel] I will

Re: Vagrant Crushed When using Nutch-Selenium

2015-02-13 Thread Shuo Li
Hey guys, After changing my RAM to 2GB, everything works fine. My bad. Thanks for your help. Regards, Shuo Li On Fri, Feb 13, 2015 at 11:34 AM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: Thank you Mo. I sincerely appreciate your guidance and contribution. I will work to

[jira] [Commented] (NUTCH-1942) Remove TopLevelDomain

2015-02-13 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319992#comment-14319992 ] Julien Nioche commented on NUTCH-1942: -- See

[jira] [Resolved] (NUTCH-1925) Upgrade Tika to version 1.7

2015-02-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1925. -- Resolution: Fixed Committed to branch 2x in revision 1659532. Thanks Tyler! Upgrade Tika to

[jira] [Resolved] (NUTCH-1724) LinkDBReader to support regex output filtering

2015-02-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1724. -- Resolution: Fixed Fix Version/s: (was: 1.11) 1.10 Committed to

[jira] [Commented] (NUTCH-827) HTTP POST Authentication

2015-02-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321194#comment-14321194 ] Hudson commented on NUTCH-827: -- SUCCESS: Integrated in Nutch-trunk #2976 (See