Re: failed to subscribe 'nutch-user' maillist

2007-06-30 Thread Susam Pal
From: Oscar <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Subject: subscribe This is how you are trying to subscribe. This is incorrect. You should send a mail to the following email address to subscribe to the mailing list. [EMAIL PROTECTED] Regards, Susam Pal http://susam.in/ On 6/30/07,

Re: Build failed in Hudson: Nutch-Nightly #203

2007-09-11 Thread Susam Pal
Is it that the interface 'org.apache.nutch.net.URLFilter' was compiled with JDK 1.5 earlier? I have seen this problem happening with a beta version of JDK 1.6. Are you using the latest version, JDK 1.6 Update 2? Regards, Susam Pal http://susam.in/ On 9/11/07, Doğacan Güney <[EM

protocol-httpclient Authentication schemes

2007-09-14 Thread Susam Pal
where. Any suggestions? Regards, Susam Pal http://susam.in/

Re: Two suggestions

2007-10-05 Thread Susam Pal
-Jim Jim, Have you tried parse-pdf? Regards, Susam Pal http://susam.in/

Re: Choices in Nutch Web interface?

2007-10-10 Thread Susam Pal
mailing list. Regards, Susam Pal http://susam.in/ On 10/10/07, Christopher Bader <[EMAIL PROTECTED]> wrote: > I ran Nutch on a subset of Wikipedia, and it works. But for each search it > always gives exactly two choices. > > > > How do I configure it so that it gives (a) N

Re: [jira] Created: (NUTCH-599) nutch crawl and index problem

2008-01-07 Thread Susam Pal
t is not a bug in Nutch 0.9 This looks like a configuration problem at your end. Please discuss this properly in [EMAIL PROTECTED] instead of submitting it as a bug in Nutch. Regards, Susam Pal On Jan 8, 2008 7:16 AM, sudarat (JIRA) <[EMAIL PROTECTED]> wrote: > nutch craw

Re: [jira] Created: (NUTCH-599) nutch crawl and index problem

2008-01-07 Thread Susam Pal
I wanted to send this as a private reply but sent it to the list instead. Sorry for the inconvenience. On Jan 8, 2008 10:21 AM, Susam Pal <[EMAIL PROTECTED]> wrote: > I replied to this query of yours yesterday in > [EMAIL PROTECTED] If you haven't received the reply, > p

Re: nutch latest build - inject operation failing

2008-02-14 Thread Susam Pal
r in Linux. I am not well acquainted with the Hadoop code yet. Could someone throw light on what might be going wrong? Regards, Susam Pal On 2/7/08, DS jha <[EMAIL PROTECTED]> wrote: Hi - > > Looks like latest trunk version of nutch is failing with the following > exception when

Re: nutch latest build - inject operation failing

2008-02-14 Thread Susam Pal
this failed with the same error. Right now I don't have a Windows system with me. I will try setting it as /cygdrive/d/tmp/ tomorrow when I again have access to a Windows system and then I'll update the mailing list with the observations. Thanks for the suggestion. Regards, Susam Pal O

Re: nutch latest build - inject operation failing

2008-02-15 Thread Susam Pal
org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165) Regards, Susam Pal On Thu, Feb 14, 2008 at 10:07 PM, Susam Pal <[EMAIL PROTECTED]> wrote: > What I did try was setting hadoop.tmp.dir to /opt/tmp. I found the

Re: Problem in running Nutch where proxy authentication is required.

2008-03-14 Thread Susam Pal
I still can't see any DEBUG logs in your log file. Did you go through my earlier mail? Regards, Susam Pal On Wed, Mar 12, 2008 at 9:39 PM, <[EMAIL PROTECTED]> wrote: > > Hi All, > > I am facing a problem in running nutch where the proxy authentication is > requ

Why is Nutch not involved in Google Summer of Code - 2008?

2008-03-22 Thread Susam Pal
some valuable work can be done. What do you say? Regards, Susam Pal

Re: Why is Nutch not involved in Google Summer of Code - 2008?

2008-03-30 Thread Susam Pal
interact with the community and his assigned mentor through the mailing list and since the whole community is there to guide him, there is not much of a burden on the mentor. Regards, Susam Pal On Sun, Mar 30, 2008 at 8:55 PM, Dennis Kubes <[EMAIL PROTECTED]> wrote: > How much of a time commitm

Re: Pending Commits for Nutch Issues

2008-12-02 Thread Susam Pal
I agree with John too. You probably meant $0.02, since 0.02 cents is too little. It is usually 2 cents. :-P Regards, Susam Pal On Tue, Dec 2, 2008 at 6:09 PM, John Martyniak <[EMAIL PROTECTED]> wrote: > Is NUTCH-442 going to be part of the 1.0 release? I hope so, Nutch/Solr > integ

crawl-tool.xml mentions nutch-site.xml for overriding but it is not possible

2009-04-06 Thread Susam Pal
es 39 to 40) : conf.addResource("nutch-default.xml"); conf.addResource("nutch-site.xml"); So, shouldn't that XML comment be removed from 'conf/crawl-tool.xml' ? Regards, Susam Pal

Re: crawl-tool.xml mentions nutch-site.xml for overriding but it is not possible

2009-05-09 Thread Susam Pal
On Tue, Apr 7, 2009 at 1:07 AM, Susam Pal wrote: > The inline documentation of 'conf/crawl-tool.xml' mentions: > > > > > > However, I don't see any way of overriding the properties defined in > 'conf/crawl-tool.xml' as 'conf/nutch-site.x

Re: How can I get startted with Nutch 1.0

2009-06-01 Thread Susam Pal
a 6. The subversion details are available at: http://lucene.apache.org/nutch/version_control.html Regards, Susam Pal

Re: Crawling authenticated websites !

2010-03-18 Thread Susam Pal
e right place to ask this. I've included it in CC. This feature is not present in Nutch. We have recorded the summary of some old discussions regarding this here: http://wiki.apache.org/nutch/HttpPostAuthentication But this was never implemented. Regards, Susam Pal

[jira] Updated: (NUTCH-44) too many search results

2007-09-08 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-44: --- Attachment: NUTCH-44.patch Attached a patch. To apply:- patch -p0 < NUTCH-44.patch ant war cp bu

[jira] Updated: (NUTCH-44) too many search results

2007-09-08 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-44: --- Attachment: (was: NUTCH-44.patch) > too many search resu

[jira] Updated: (NUTCH-44) too many search results

2007-09-08 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-44: --- Attachment: NUTCH-44.patch Updated my previous patch to fix the issue in opensearch too. To apply:- patch

[jira] Updated: (NUTCH-281) cached.jsp: base-href needs to be outside comments

2007-09-09 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-281: Attachment: NUTCH-281.patch Uploading a patch. Put the tag outside comments and now the relative links in

[jira] Created: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication

2007-09-18 Thread Susam Pal (JIRA)
Type: Improvement Components: fetcher Affects Versions: 1.0.0 Reporter: Susam Pal 'protocol-http11' is a protocol plugin which supports retrieving documents via the HTTP 1.0, HTTP 1.1 and HTTPS protocols, optionally with Basic, Digest and NTLM authentication s

[jira] Updated: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication

2007-09-18 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-557: Attachment: protocol-http11v0.1.patch I have generated this patch against Nutch trunk. To apply:- patch

[jira] Updated: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication

2007-09-18 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-557: Priority: Minor (was: Major) > protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authenticat

[jira] Commented: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication

2007-09-19 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528854 ] Susam Pal commented on NUTCH-557: - No, there isn't any significant difference in performance. Here's a l

[jira] Commented: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication

2007-09-21 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529528 ] Susam Pal commented on NUTCH-557: - Thank you, Doğacan and Andrzej for your comments. I started developing it in a

[jira] Commented: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication

2007-09-21 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529530 ] Susam Pal commented on NUTCH-557: - Point no. 2 of my previous comment is incorrect. The SSL related files are being

[jira] Created: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-09-24 Thread Susam Pal (JIRA)
Components: fetcher Affects Versions: 1.0.0 Reporter: Susam Pal Priority: Minor Added basic, digest and NTLM authentication schemes to protocol-httpclient. The authentication schemes can be configured for proxy server as well as web servers of a domain

[jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-09-24 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-559: Attachment: NUTCH-559v0.1.patch I have generated this patch against Nutch trunk. It will add support for

[jira] Closed: (NUTCH-557) protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication

2007-09-24 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal closed NUTCH-557. --- Resolution: Won't Fix As per the discussion, 'protocol-http11' has been turned into a patch

[jira] Issue Comment Edited: (NUTCH-539) HttpClient plugin does not work with BasicAuthentication

2007-09-25 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530175 ] susam edited comment on NUTCH-539 at 9/25/07 10:54 AM: --- 1. There is a bug in the patch. The

[jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-09-25 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-559: Priority: Major (was: Minor) Apart from adding the authentication features, this patch would fix three

[jira] Commented: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit

2007-09-26 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530519 ] Susam Pal commented on NUTCH-560: - I analysed 'protocol-http' and it behaves almost in the same man

[jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-09-27 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-559: Attachment: NUTCH-559v0.2.patch Uploading a revised (v0.2) patch which accommodates most of the suggestions

[jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-10-30 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-559: Attachment: NUTCH-559v0.3.patch Uploading a revised (v0.3) patch that allows flexible authentication

[jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-11-01 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-559: Attachment: NUTCH-559v0.4.patch Uploading a revised (v0.4) patch that has all authentication configuration

[jira] Updated: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server

2007-11-28 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-559: Attachment: NUTCH-559v0.5.patch Uploading a revised (v0.5) patch with some test cases. Added a 's

[jira] Updated: (NUTCH-601) Recrawling on existing crawl directory using force option

2008-02-04 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-601: Attachment: NUTCH-601v0.2.patch Attached a revised patch (NUTCH-601v0.2.patch), which removes the old

[jira] Updated: (NUTCH-601) Recrawling on existing crawl directory using force option

2008-02-04 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-601: Attachment: NUTCH-601v0.1.patch Patch attached. > Recrawling on existing crawl directory using fo

[jira] Created: (NUTCH-601) Recrawling on existing crawl directory using force option

2008-02-04 Thread Susam Pal (JIRA)
Versions: 1.0.0 Reporter: Susam Pal Priority: Minor Added a '-force' option to the 'bin/nutch crawl' command line. With this option, one can crawl and recrawl in the following manner: {code} bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5 bin/n

[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option

2008-02-05 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565848#action_12565848 ] Susam Pal commented on NUTCH-601: - The 'if (newIndex != index)' condition i

[jira] Updated: (NUTCH-601) Recrawling on existing crawl directory using force option

2008-02-15 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-601: Attachment: NUTCH-601v1.0.patch Attached another patch (NUTCH-601v1.0.patch) that always deletes the old

[jira] Updated: (NUTCH-601) Recrawling on existing crawl directory using force option

2008-02-15 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-601: Attachment: NUTCH-601v0.3.patch Attached a revised patch (NUTCH-601v0.3.patch) that makes the code simpler

[jira] Updated: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl

2008-02-15 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-612: Attachment: NUTCH-612v0.1.patch Attached patch to fix the bug. This modifies Crawl.java and Generator.java

[jira] Created: (NUTCH-612) URL filtering is always disabled in Generator when invoked by Crawl

2008-02-15 Thread Susam Pal (JIRA)
Components: generator Affects Versions: 1.0.0 Reporter: Susam Pal Fix For: 1.0.0 When a crawl is done using the 'bin/nutch crawl' command, no filtering is done in Generator even if 'crawl.generate.filter' is set to true in the configuration f

[jira] Commented: (NUTCH-601) Recrawling on existing crawl directory using force option

2008-02-29 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573790#action_12573790 ] Susam Pal commented on NUTCH-601: - It continues the recrawl using the existing c

[jira] Created: (NUTCH-735) crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command

2009-05-08 Thread Susam Pal (JIRA)
Issue Type: Bug Components: web gui Affects Versions: 1.0.0 Reporter: Susam Pal Priority: Minor The inline documentation of 'conf/crawl-tool.xml' mentions: {code:xml} {code} However, I don't see any way of overriding the proper

[jira] Updated: (NUTCH-735) crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command

2009-05-09 Thread Susam Pal (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Susam Pal updated NUTCH-735: Attachment: NUTCH-735v0.1.patch Attached patch. > crawl-tool.xml must be read before nutch-site.xml w