Fetch command returns immediately

2010-12-05 Thread Alexis
) @@ -174,6 +174,7 @@
     } else {
       currentJob.setNumReduceTasks(numTasks);
     }
+    currentJob.waitForCompletion(true);
     ToolUtil.recordJobStatus(null, currentJob, results);
     return results;
   }
Alexis
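The added line in the patch above makes the command block until the job has finished instead of returning as soon as it is submitted. The difference can be sketched with plain JDK concurrency. This is only an analogy under stated assumptions: BlockingWaitSketch, runAndWait and the 200 ms sleep are hypothetical stand-ins for the job, not Hadoop or Nutch code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a long-running job; not Hadoop's API.
class BlockingWaitSketch {

    // Starts background work and blocks until it is done,
    // analogous to waiting for job completion before recording its status.
    static boolean runAndWait() {
        AtomicBoolean finished = new AtomicBoolean(false);
        CompletableFuture<Void> job = CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(200); // simulated work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finished.set(true);
        });
        job.join(); // without this call the method could return before the work ends
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("finished before return: " + runAndWait());
    }
}
```

Dropping the join() call reproduces the symptom in the subject line: the caller returns while the work is still running.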

Re: Does Nutch 2.0 in good enough shape to test?

2010-12-18 Thread Alexis
/NUTCH-899 which is the same problem. I tried to come up with a JUnit test but it is still rather imperfect (I want to use org.apache.nutch.util.CrawlTestUtil.getServer for it). The whole patch is here: https://issues.apache.org/jira/secure/attachment/12466548/httpContentLimit.patch Alexis

Re: Does Nutch 2.0 in good enough shape to test?

2011-01-01 Thread Alexis
of the test. It worked for me after I patched a few things. They are described throughout the blog entry or in this new JIRA-950 issue which, among others, reopens JIRA-899. Hope this helps. Alexis.

Re: Welcome Alexis Detreglode as a Nutch Committer

2011-02-15 Thread Alexis
participate, please refer to the Nutch 2.0 section in the wiki. There are many ways to contribute: send a message to the mailing list, create an issue on JIRA (with or without a patch attached), update the wiki... Give it a shot! Alexis http://techvineyard.blogspot.com On Tue, Feb 15, 2011 at

Re: Nutch 2 and Cassandra

2011-08-01 Thread Alexis
Hi, libthrift is a dependency of cassandra-thrift, as listed here: http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1 During the Nutch build, you have to manually tweak the Ivy configuration depending on your choice of Gora store, in this case Cassandra. Basically you ne

Re: Nutch 2 and Cassandra

2011-08-01 Thread Alexis
his line to get the hector dependency: > >         conf="*->default"/> > > -Original Message- > From: Alexis [mailto:alexis.detregl...@gmail.com] > Sent: Monday, August 01, 2011 2:28 PM > To: dev@nutch.apache.org > Subject: Re: Nutch 2 and Cassandra

Re: InvocationTargetException with Nutch 2.0 Gora 0.2 and Cassandra 0.8.4

2011-08-30 Thread Alexis
Hi Tom, I'm having the same issue. The two jars missing from nutch-2.0-dev.job, cassandra-all-0.8.0.jar and hector-core-0.8.0-1.jar, were manually uploaded into the gora-cassandra/lib-ext SVN directory for the Gora build to work, because for some reason I did not get them downloaded through Mav

Re: [VOTE] Move 2.0 out of trunk

2011-09-19 Thread Alexis
order to implement search. They use HBase which is, by the way, Nutch 2.0 compatible. Take a look: http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#22 (sorry I don't think any video of the summit is available yet, not sure why) Alexis On Mon, Sep 19, 2011 at 1:05 AM, Jul

Re: Choosing an efficient family configuration for GORA HBase

2011-10-01 Thread Alexis
Dear Ferdy, This mapping is user-defined. It specifies where the Avro fields required by Nutch jobs are stored in HBase. You can tweak the schema according to these kinds of considerations by editing the config file. So content is populated by the Fetcher job (writes) that downloads the web page. It i

[jira] Commented: (NUTCH-873) Ivy configuration settings don't include Gora

2010-11-05 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928788#action_12928788 ] Alexis commented on NUTCH-873: -- It did not work as seamlessly for me. The gora build creat

[jira] Issue Comment Edited: (NUTCH-873) Ivy configuration settings don't include Gora

2010-11-05 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928788#action_12928788 ] Alexis edited comment on NUTCH-873 at 11/5/10 3:48 PM: --- It did

[jira] Issue Comment Edited: (NUTCH-873) Ivy configuration settings don't include Gora

2010-11-05 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928788#action_12928788 ] Alexis edited comment on NUTCH-873 at 11/5/10 3:51 PM: --- It did

[jira] Issue Comment Edited: (NUTCH-873) Ivy configuration settings don't include Gora

2010-11-05 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928788#action_12928788 ] Alexis edited comment on NUTCH-873 at 11/5/10 3:52 PM: --- It did

[jira] Commented: (NUTCH-880) REST API for Nutch

2010-11-05 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928896#action_12928896 ] Alexis commented on NUTCH-880: -- This revision introduced a bug in the nutch inject command

[jira] Commented: (NUTCH-899) java.sql.BatchUpdateException: Data truncation: Data too long for column 'content' at row 1

2010-12-10 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970336#action_12970336 ] Alexis commented on NUTCH-899: -- I ran into the exact same issue, with MySQL. The blob co

[jira] Updated: (NUTCH-899) java.sql.BatchUpdateException: Data truncation: Data too long for column 'content' at row 1

2010-12-18 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-899: - Attachment: httpContentLimit.patch We stick with the default gora schema for the MySQL backend, which says

[jira] Created: (NUTCH-950) Content-Length limit, URL filter and few minor issues

2011-01-01 Thread Alexis (JIRA)
Reporter: Alexis
1. crawl command (nutch1.patch) The class was renamed to Crawler but the references to it were not updated.
2. URL filter (nutch2.patch) This avoids an NPE on bogus URLs whose host does not have a suffix.
3. Content-Length limit (nutch3.patch) This is related to NUTCH-899

[jira] Updated: (NUTCH-950) Content-Length limit, URL filter and few minor issues

2011-01-01 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-950: - Attachment: nutch4.patch > Content-Length limit, URL filter and few minor iss

[jira] Updated: (NUTCH-950) Content-Length limit, URL filter and few minor issues

2011-01-01 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-950: - Attachment: nutch3.patch nutch2.patch nutch1.patch > Content-Length limit,

[jira] Created: (NUTCH-955) Ivy configuration

2011-01-10 Thread Alexis (JIRA)
Ivy configuration - Key: NUTCH-955 URL: https://issues.apache.org/jira/browse/NUTCH-955 Project: Nutch Issue Type: Improvement Components: build Affects Versions: 2.0 Reporter: Alexis As mentioned

[jira] Updated: (NUTCH-955) Ivy configuration

2011-01-10 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-955: - Attachment: ivy.patch In the patch, the required dependencies for MySQL and HBase are included in the Ivy config

[jira] Issue Comment Edited: (NUTCH-955) Ivy configuration

2011-01-10 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979525#action_12979525 ] Alexis edited comment on NUTCH-955 at 1/10/11 5:27 AM: --- In the p

[jira] Resolved: (NUTCH-950) Content-Length limit, URL filter and few minor issues

2011-01-10 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis resolved NUTCH-950. -- Resolution: Fixed Fix Version/s: 2.0 Sorry I missed the Ivy configuration file in the plugin directory

[jira] Created: (NUTCH-956) soldindex issues

2011-01-13 Thread Alexis (JIRA)
soldindex issues Key: NUTCH-956 URL: https://issues.apache.org/jira/browse/NUTCH-956 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 2.0 Reporter: Alexis I ran into a few

[jira] Updated: (NUTCH-956) soldindex issues

2011-01-13 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-956: - Attachment: solr.patch Here are the changes: - Avoid multiple values for id field. (NUTCH-819) - Allow multiple

[jira] Updated: (NUTCH-956) solrindex issues

2011-01-13 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-956: - Summary: solrindex issues (was: soldindex issues) > solrindex issues > > >

[jira] Commented: (NUTCH-955) Ivy configuration

2011-01-18 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983125#action_12983125 ] Alexis commented on NUTCH-955: -- Sorry please disregard the nutch.root first bullet in

[jira] Created: (NUTCH-965) Parsing takes up 100% CPU

2011-02-08 Thread Alexis (JIRA)
Parsing takes up 100% CPU - Key: NUTCH-965 URL: https://issues.apache.org/jira/browse/NUTCH-965 Project: Nutch Issue Type: Improvement Components: parser Reporter: Alexis The issue you

[jira] Updated: (NUTCH-965) Parsing takes up 100% CPU

2011-02-08 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-965: - Attachment: parserJob.patch In the parser mapper, compare Content-Length header to the size of the content
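The check described in this comment, comparing the Content-Length header with the size of the stored content, can be sketched as follows. This is a minimal illustration and not the code in parserJob.patch: the class TruncationCheck, the method isTruncated and its parameters are hypothetical names, and the sketch assumes the header value is available as a string and the stored content as a byte array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not taken from parserJob.patch.
class TruncationCheck {

    // Returns true when the declared Content-Length is larger than the
    // number of bytes actually stored, i.e. the download was cut short.
    // A missing or unparsable header is treated as "not truncated".
    static boolean isTruncated(String contentLengthHeader, int actualLength) {
        if (contentLengthHeader == null) {
            return false;
        }
        try {
            long declared = Long.parseLong(contentLengthHeader.trim());
            return declared > actualLength;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] content = "partial body".getBytes(StandardCharsets.UTF_8);
        System.out.println(isTruncated("4096", content.length));
        System.out.println(isTruncated(String.valueOf(content.length), content.length));
    }
}
```

A mapper using such a check would skip parsing when it returns true. Treating a missing header as "not truncated" is a design choice of this sketch; a stricter policy is equally possible.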

[jira] Updated: (NUTCH-965) Skip parsing for truncated documents

2011-02-10 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-965: - Summary: Skip parsing for truncated documents (was: Parsing takes up 100% CPU) > Skip parsing for trunca

[jira] [Commented] (NUTCH-956) solrindex issues

2011-07-12 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064148#comment-13064148 ] Alexis commented on NUTCH-956: -- I do get the NPE when indexing this url

[jira] [Updated] (NUTCH-956) solrindex issues

2011-07-12 Thread Alexis (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis updated NUTCH-956: - Attachment: solr.patch2 - NPE related to content-type field - tld field in Solr schema - string comparison in