[jira] [Resolved] (NUTCH-2136) Implement a different version of Naive Bayes Parse Filter

2015-10-12 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra resolved NUTCH-2136. --- Resolution: Fixed > Implement a different version of Naive Bayes Parse Filter >

[jira] [Updated] (NUTCH-2137) add changes.txt and ALV2 headers to the Naive Bayes Parse Filter

2015-10-12 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2137: -- Issue Type: Task (was: Bug) > add changes.txt and ALV2 headers to the Naive Bayes Parse Filter

[jira] [Reopened] (NUTCH-2136) Implement a different version of Naive Bayes Parse Filter

2015-10-12 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra reopened NUTCH-2136: --- Add ALv2 headers and add author to changes.txt > Implement a different version of Naive Bayes

[jira] [Commented] (NUTCH-2136) Implement a different version of Naive Bayes Parse Filter

2015-10-12 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953438#comment-14953438 ] Asitang Mishra commented on NUTCH-2136: --- Will do. > Implement a different version of Naive Bayes

[jira] [Updated] (NUTCH-2137) add changes.txt and ALV2 headers to the Naive Bayes Parse Filter

2015-10-12 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2137: -- Priority: Trivial (was: Major) > add changes.txt and ALV2 headers to the Naive Bayes Parse

[jira] [Created] (NUTCH-2137) add changes.txt and ALV2 headers to the Naive Bayes Parse Filter

2015-10-12 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2137: - Summary: add changes.txt and ALV2 headers to the Naive Bayes Parse Filter Key: NUTCH-2137 URL: https://issues.apache.org/jira/browse/NUTCH-2137 Project: Nutch

[jira] [Resolved] (NUTCH-2136) Implement a different version of Naive Bayes Parse Filter

2015-10-12 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra resolved NUTCH-2136. --- Resolution: Fixed > Implement a different version of Naive Bayes Parse Filter >

[jira] [Resolved] (NUTCH-2137) add changes.txt and ALV2 headers to the Naive Bayes Parse Filter

2015-10-12 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra resolved NUTCH-2137. --- Resolution: Fixed > add changes.txt and ALV2 headers to the Naive Bayes Parse Filter >

[jira] [Created] (NUTCH-2136) Implement a different version of Naive Bayes Parse Filter

2015-10-11 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2136: - Summary: Implement a different version of Naive Bayes Parse Filter Key: NUTCH-2136 URL: https://issues.apache.org/jira/browse/NUTCH-2136 Project: Nutch

[jira] [Commented] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-10-09 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951411#comment-14951411 ] Asitang Mishra commented on NUTCH-2110: --- Ack > Create the capability to provide seeds in the form

[jira] [Resolved] (NUTCH-2109) Create a brute force click-all-ajax-links utility fucntion for selenium interactive plugin

2015-10-08 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra resolved NUTCH-2109. --- Resolution: Fixed > Create a brute force click-all-ajax-links utility fucntion for selenium

[jira] [Resolved] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-10-08 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra resolved NUTCH-2108. --- Resolution: Fixed > Add a function to the selenium interactive plugin interface to do

[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-10-01 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940025#comment-14940025 ] Asitang Mishra commented on NUTCH-2108: --- [~chrismattmann] > Add a function to the selenium

[jira] [Commented] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933845#comment-14933845 ] Asitang Mishra commented on NUTCH-2110: --- To keep everything under one single url in the end (how it

[jira] [Created] (NUTCH-2126) Use selenium protocol for specific sites when switched on

2015-09-28 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2126: - Summary: Use selenium protocol for specific sites when switched on Key: NUTCH-2126 URL: https://issues.apache.org/jira/browse/NUTCH-2126 Project: Nutch

[jira] [Updated] (NUTCH-2126) Use selenium protocol for specific sites

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2126: -- Summary: Use selenium protocol for specific sites (was: Use selenium protocol for specific

[jira] [Updated] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2110: -- Description: Create the capability to provide seeds in the form of "url+xpath(including option

[jira] [Created] (NUTCH-2127) Provide the selenium protocol with basic authentication capabilities.

2015-09-28 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2127: - Summary: Provide the selenium protocol with basic authentication capabilities. Key: NUTCH-2127 URL: https://issues.apache.org/jira/browse/NUTCH-2127 Project: Nutch

[jira] [Updated] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2108: -- Priority: Major (was: Minor) > Add a function to the selenium interactive plugin interface to

[jira] [Updated] (NUTCH-2091) Increase robustness and crawling versatility of Nutch for the Deep Web

2015-09-28 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2091: -- Priority: Major (was: Minor) > Increase robustness and crawling versatility of Nutch for the

[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-24 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907197#comment-14907197 ] Asitang Mishra commented on NUTCH-2108: --- Successfully tested a new idea where I don't have to change

[jira] [Commented] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-21 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901798#comment-14901798 ] Asitang Mishra commented on NUTCH-2110: --- Also updated the description to tackle some basic problems

[jira] [Updated] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-21 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2110: -- Description: Create the capability to provide seeds in the form of "url+xpath(including option

[jira] [Commented] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-21 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901059#comment-14901059 ] Asitang Mishra commented on NUTCH-2110: --- Hi Sebastian, Yes, using the crawldatum is the perfect

[jira] [Commented] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-21 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901458#comment-14901458 ] Asitang Mishra commented on NUTCH-2108: --- Hi [~jo...@apache.org], Can you take a look at the changes

[jira] [Created] (NUTCH-2108) Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data

2015-09-17 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2108: - Summary: Add a function to the selenium interactive plugin interface to do multiple manipulation of driver and then return the data Key: NUTCH-2108 URL:

[jira] [Created] (NUTCH-2109) Create a brute force click-all-ajax-links utility fucntion for selenium interactive plugin

2015-09-17 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2109: - Summary: Create a brute force click-all-ajax-links utility fucntion for selenium interactive plugin Key: NUTCH-2109 URL: https://issues.apache.org/jira/browse/NUTCH-2109

[jira] [Created] (NUTCH-2110) Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium"

2015-09-17 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2110: - Summary: Create the capability to provide seeds in the form of "url+xpath(including option to enter seach terms).selenium" Key: NUTCH-2110 URL:

[jira] [Created] (NUTCH-2091) Make Nutch more robust and smart

2015-09-08 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2091: - Summary: Make Nutch more robust and smart Key: NUTCH-2091 URL: https://issues.apache.org/jira/browse/NUTCH-2091 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-1486) Upgrade to Solr 4.10.2

2015-08-19 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703291#comment-14703291 ] Asitang Mishra commented on NUTCH-1486: --- Hey Lewis, Your fix for the jar soup did

[jira] [Commented] (NUTCH-1486) Upgrade to Solr 4.10.2

2015-08-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701696#comment-14701696 ] Asitang Mishra commented on NUTCH-1486: --- Hey Lewis, Just noticed when I was

[jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop 2.4 stable

2015-08-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701498#comment-14701498 ] Asitang Mishra commented on NUTCH-2049: --- Hi Lewis, Had some issues applying your

[jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop 2.4 stable

2015-08-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701514#comment-14701514 ] Asitang Mishra commented on NUTCH-2049: --- Ack!! Upgrade Trunk to Hadoop 2.4 stable

[jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop 2.4 stable

2015-08-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701495#comment-14701495 ] Asitang Mishra commented on NUTCH-2049: --- Hi Chris, The Naive Bayes plugin, since

[jira] [Commented] (NUTCH-1486) Upgrade to Solr 4.10.2

2015-08-03 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652644#comment-14652644 ] Asitang Mishra commented on NUTCH-1486: --- yepp!! Upgrade to Solr 4.10.2

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-07-01 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610439#comment-14610439 ] Asitang Mishra commented on NUTCH-2038: --- Sure... Naive Bayes classifier based html

[jira] [Updated] (NUTCH-2056) Move the Mahout and Lucene dependencies to the plugin from the main ivy.xml for the Naive Bayes Parse Filter (NUTCH-2038)

2015-07-01 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2056: -- Labels: memex nutch (was: memex) Move the Mahout and Lucene dependencies to the plugin from

[jira] [Created] (NUTCH-2056) Move the Mahout and Lucene dependencies to the plugin from the main ivy.xml for the Naive Bayes Parse Filter (NUTCH-2038)

2015-07-01 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2056: - Summary: Move the Mahout and Lucene dependencies to the plugin from the main ivy.xml for the Naive Bayes Parse Filter (NUTCH-2038) Key: NUTCH-2056 URL:

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-07-01 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610476#comment-14610476 ] Asitang Mishra commented on NUTCH-2038: --- NUTCH-2057 Naive Bayes classifier based

[jira] [Created] (NUTCH-2057) Put all the files produced during training of the model for Naive Bayes classifier, in the Naive Bayed Parse Filter (NUTCH-2038), in a single folder

2015-07-01 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2057: - Summary: Put all the files produced during training of the model for Naive Bayes classifier, in the Naive Bayed Parse Filter (NUTCH-2038), in a single folder Key: NUTCH-2057

[jira] [Updated] (NUTCH-2057) Put all the files produced during training of the model for Naive Bayes classifier, in the Naive Bayes Parse Filter (NUTCH-2038), in a single folder

2015-07-01 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2057: -- Summary: Put all the files produced during training of the model for Naive Bayes classifier, in

[jira] [Comment Edited] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-07-01 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610439#comment-14610439 ] Asitang Mishra edited comment on NUTCH-2038 at 7/1/15 3:34 PM:

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609295#comment-14609295 ] Asitang Mishra commented on NUTCH-2038: --- I tried adding the jars to the main

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609542#comment-14609542 ] Asitang Mishra commented on NUTCH-2038: --- woot!!1 Naive Bayes classifier based html

[jira] [Comment Edited] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609542#comment-14609542 ] Asitang Mishra edited comment on NUTCH-2038 at 7/1/15 4:23 AM:

[jira] [Comment Edited] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-30 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606592#comment-14606592 ] Asitang Mishra edited comment on NUTCH-2038 at 6/30/15 5:43 PM:

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-29 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605661#comment-14605661 ] Asitang Mishra commented on NUTCH-2038: --- Yup, didn't fail for me as well... gonna list

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-29 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606592#comment-14606592 ] Asitang Mishra commented on NUTCH-2038: --- Hi [~wastl-nagel], I am facing the

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-26 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603977#comment-14603977 ] Asitang Mishra commented on NUTCH-2038: --- Oh Great will fix them all :) Naive

[jira] [Comment Edited] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-24 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600419#comment-14600419 ] Asitang Mishra edited comment on NUTCH-2038 at 6/25/15 12:19 AM:

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-24 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600419#comment-14600419 ] Asitang Mishra commented on NUTCH-2038: --- maybe rename the plugin to

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-24 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599656#comment-14599656 ] Asitang Mishra commented on NUTCH-2038: --- I still have to transfer the external

[jira] [Updated] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-24 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2038: -- Description: A html parse filter that will filter out the outlinks in two stages. One:

[jira] [Updated] (NUTCH-2038) Naive Bayes classifier based html Parse filter (for filtering outlinks)

2015-06-24 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2038: -- Description: A html parse filter that will filter out the outlinks in two stages. Classify the

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-23 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598715#comment-14598715 ] Asitang Mishra commented on NUTCH-2038: --- Hey [~wastl-nagel], I have decided to

[jira] [Comment Edited] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-22 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596239#comment-14596239 ] Asitang Mishra edited comment on NUTCH-2038 at 6/22/15 5:09 PM:

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-22 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596239#comment-14596239 ] Asitang Mishra commented on NUTCH-2038: --- From what I understand the problem is that

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591974#comment-14591974 ] Asitang Mishra commented on NUTCH-2038: --- It's the wrong one; see the 32nd patch

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592089#comment-14592089 ] Asitang Mishra commented on NUTCH-2038: --- There are three columns: 1. 1 or 0, 1 for

[jira] [Comment Edited] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592089#comment-14592089 ] Asitang Mishra edited comment on NUTCH-2038 at 6/18/15 4:53 PM:

[jira] [Updated] (NUTCH-2043) Interface and high level design for classification using models

2015-06-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2043: -- Description: To discuss and come up with a high level design or an interface for classification

[jira] [Commented] (NUTCH-2043) Interface and high level design for classification using models

2015-06-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592223#comment-14592223 ] Asitang Mishra commented on NUTCH-2043: --- [~chrismattmann] [~kwhitehall] [~sujenshah]

[jira] [Created] (NUTCH-2043) Interface and high level design for classification using models

2015-06-18 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2043: - Summary: Interface and high level design for classification using models Key: NUTCH-2043 URL: https://issues.apache.org/jira/browse/NUTCH-2043 Project: Nutch

[jira] [Issue Comment Deleted] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2038: -- Comment: was deleted (was: I am done with the code and Nutch integration in my branch, Will

[jira] [Updated] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2038: -- Description: A url filter that will filter out the urls (after the parsing stage, will keep

[jira] [Commented] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590080#comment-14590080 ] Asitang Mishra commented on NUTCH-2038: --- Have made a pull request for a rather

[jira] [Comment Edited] (NUTCH-2038) Naive Bayes classifier based url filter

2015-06-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590080#comment-14590080 ] Asitang Mishra edited comment on NUTCH-2038 at 6/17/15 4:51 PM:

[jira] [Commented] (NUTCH-2038) url filter that uses a model (from a classifier)

2015-06-16 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589069#comment-14589069 ] Asitang Mishra commented on NUTCH-2038: --- I am done with the code and Nutch

[jira] [Created] (NUTCH-2038) url filter that uses a model (from a classifier)

2015-06-10 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2038: - Summary: url filter that uses a model (from a classifier) Key: NUTCH-2038 URL: https://issues.apache.org/jira/browse/NUTCH-2038 Project: Nutch Issue

[jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-04 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573895#comment-14573895 ] Asitang Mishra commented on NUTCH-2027: --- Done seed list REST endpoint for Nutch

[jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571565#comment-14571565 ] Asitang Mishra commented on NUTCH-2027: --- Here is an example of the request format:

[jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571310#comment-14571310 ] Asitang Mishra commented on NUTCH-2027: --- [~lewismc] seed list REST endpoint for

[jira] [Commented] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-06-03 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572102#comment-14572102 ] Asitang Mishra commented on NUTCH-2027: --- Done ... [~chrismattmann] try again!!

[jira] [Created] (NUTCH-2027) seed list REST endpoint for Nutch 1.10

2015-05-31 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2027: - Summary: seed list REST endpoint for Nutch 1.10 Key: NUTCH-2027 URL: https://issues.apache.org/jira/browse/NUTCH-2027 Project: Nutch Issue Type: New

[jira] [Created] (NUTCH-2026) Crawl endpoint for the REST api

2015-05-31 Thread Asitang Mishra (JIRA)
Asitang Mishra created NUTCH-2026: - Summary: Crawl endpoint for the REST api Key: NUTCH-2026 URL: https://issues.apache.org/jira/browse/NUTCH-2026 Project: Nutch Issue Type: New Feature

[jira] [Commented] (NUTCH-2015) Make FetchNodeDb optional (off by default) if NutchServer is not used

2015-05-27 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561434#comment-14561434 ] Asitang Mishra commented on NUTCH-2015: --- Hi [~chrismattmann], I think [~sujenshah]

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-22 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556861#comment-14556861 ] Asitang Mishra commented on NUTCH-2011: --- Hi [~wastl-nagel], -For the persistent

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-18 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547778#comment-14547778 ] Asitang Mishra commented on NUTCH-2011: --- Hi [~wastl-nagel], -The answer to your

[jira] [Issue Comment Deleted] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-2011: -- Comment: was deleted (was: Thanks [~chrismattmann] !!) Endpoint to support realtime JSON

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547380#comment-14547380 ] Asitang Mishra commented on NUTCH-2011: --- Thanks [~chrismattmann] !! Endpoint to

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547381#comment-14547381 ] Asitang Mishra commented on NUTCH-2011: --- Thanks [~chrismattmann] !! Endpoint to

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547408#comment-14547408 ] Asitang Mishra commented on NUTCH-2011: --- ACK!! thanks... +1 Endpoint to support

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547420#comment-14547420 ] Asitang Mishra commented on NUTCH-2011: --- Hi [~wastl-nagel], The fetcher going out

[jira] [Commented] (NUTCH-2011) Endpoint to support realtime JSON output from the fetcher

2015-05-17 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547373#comment-14547373 ] Asitang Mishra commented on NUTCH-2011: --- I was trying to test the memory issue

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-14 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-1854: -- Attachment: NUTCH-1854ver4.patch Added NUTCH-1854ver4.patch : formatted the

[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-14 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495687#comment-14495687 ] Asitang Mishra commented on NUTCH-1854: --- okay done Lewis.. ./bin/crawl fails with

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-13 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-1854: -- Attachment: NUTCH-1854ver3.patch Added: NUTCH-1854ver3.patch Rebased it and made changes for

[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-09 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487622#comment-14487622 ] Asitang Mishra commented on NUTCH-1854: --- Sounds logical to add this check to the

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-09 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-1854: -- Attachment: NUTCH-1854ver2.patch Added NUTCH-1854ver2.patch : 1. made changes to ver1 of the

[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-06 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481649#comment-14481649 ] Asitang Mishra commented on NUTCH-1854: --- what should be the default behavior when we

[jira] [Updated] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-06 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-1854: -- Attachment: NUTCH-1854ver1.patch Added patch NUTCH-1854ver1.patch. This patch makes changes in

[jira] [Commented] (NUTCH-1854) ./bin/crawl fails with a parsing fetcher

2015-04-06 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482418#comment-14482418 ] Asitang Mishra commented on NUTCH-1854: --- I agree Lewis. ./bin/crawl fails with a

[jira] [Updated] (NUTCH-1941) Optional rolling http.agent.name's

2015-03-27 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-1941: -- Attachment: NUTCH-1941-ver6.patch Added NUTCH-1941-ver6.patch In this patch I made the

[jira] [Commented] (NUTCH-1941) Optional rolling http.agent.name's

2015-03-27 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384905#comment-14384905 ] Asitang Mishra commented on NUTCH-1941: --- Thank you and great work!! Optional

[jira] [Updated] (NUTCH-1941) Optional rolling http.agent.name's

2015-03-26 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asitang Mishra updated NUTCH-1941: -- Attachment: NUTCH-1941-itr4.patch Added NUTCH-1941-itr4.patch I have updated the patch and made

[jira] [Commented] (NUTCH-1941) Optional rolling http.agent.name's

2015-03-26 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382974#comment-14382974 ] Asitang Mishra commented on NUTCH-1941: --- Looked into why it's not working for

[jira] [Comment Edited] (NUTCH-1941) Optional rolling http.agent.name's

2015-03-26 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382974#comment-14382974 ] Asitang Mishra edited comment on NUTCH-1941 at 3/26/15 11:50 PM:

[jira] [Commented] (NUTCH-1941) Optional rolling http.agent.name's

2015-03-25 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380959#comment-14380959 ] Asitang Mishra commented on NUTCH-1941: --- I see the problem now. Basically the code

[jira] [Commented] (NUTCH-1941) Optional rolling http.agent.name's

2015-03-24 Thread Asitang Mishra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378994#comment-14378994 ] Asitang Mishra commented on NUTCH-1941: --- Hi Sebastian, The solution 1 is what the
