[jira] [Commented] (NUTCH-1631) Display Document Count Added To Solr Server

2013-08-23 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748967#comment-13748967 ]

Hudson commented on NUTCH-1631:
---

SUCCESS: Integrated in Nutch-nutchgora #730 (See 
[https://builds.apache.org/job/Nutch-nutchgora/730/])
NUTCH-1631 Display Document Count Added to Solr Server (lewismc: 
http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1517003)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/java/org/apache/nutch/indexer/IndexerJob.java
* /nutch/branches/2.x/src/java/org/apache/nutch/indexer/solr/SolrIndexerJob.java
* /nutch/branches/2.x/src/java/org/apache/nutch/indexer/solr/SolrWriter.java


> Display Document Count Added To Solr Server
> ---
>
> Key: NUTCH-1631
> URL: https://issues.apache.org/jira/browse/NUTCH-1631
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Affects Versions: 2.1, 2.2, 2.2.1
>Reporter: Furkan KAMACI
>Priority: Minor
> Fix For: 2.3
>
> Attachments: NUTCH-1631.patch
>
>
> Currently there is no way to see how many documents Nutch has added to the
> Solr server. One should be able to see how many documents are being added
> while the job runs (as a Hadoop counter), and the total document count should
> also be logged after all documents have been added to the Solr server.
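As a rough illustration of what the issue asks for, the Java sketch below counts documents as each batch is sent and logs the total when the writer is closed. It is not the committed NUTCH-1631 patch: `CountingBatchWriter` is a hypothetical class, the `Consumer` sink stands in for the Solr client call, and an `AtomicLong` stands in for the Hadoop counter that a real job would increment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: a batching writer that counts every document it hands
 * to a sink. The sink stands in for the Solr client and the AtomicLong stands
 * in for a Hadoop counter; neither is the actual Nutch code.
 */
public class CountingBatchWriter {
    private static final Logger LOG = Logger.getLogger(CountingBatchWriter.class.getName());

    private final Consumer<List<String>> sink; // hypothetical stand-in for a Solr add call
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final AtomicLong documentsAdded = new AtomicLong(); // running count, like a job counter

    public CountingBatchWriter(Consumer<List<String>> sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    public void write(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        documentsAdded.addAndGet(buffer.size()); // count only once the batch has been sent
        buffer.clear();
    }

    /** Live value that a progress report could read while the job is running. */
    public long getDocumentsAdded() {
        return documentsAdded.get();
    }

    /** Sends any remaining documents and logs the total, as the issue requests. */
    public long close() {
        flush();
        long total = documentsAdded.get();
        LOG.info("Total documents added: " + total);
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        CountingBatchWriter writer = new CountingBatchWriter(batches::add, 2);
        for (String doc : new String[] {"a", "b", "c"}) {
            writer.write(doc);
        }
        long total = writer.close();
        System.out.println(total + " documents in " + batches.size() + " batches");
    }
}
```

Counting after `sink.accept` returns means a batch that throws while being sent is not counted, which is the closer match to "documents added to Solr Server".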

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-693) Add configurable option for treating nofollow behaviour.

2013-08-23 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748958#comment-13748958 ]

Lewis John McGibbney commented on NUTCH-693:


Hi Santiago, if you would like to update the patch then please do so.
Please patch against trunk and/or 2.x HEAD and we will see where this goes.

> Add configurable option for treating nofollow behaviour.
> 
>
> Key: NUTCH-693
> URL: https://issues.apache.org/jira/browse/NUTCH-693
> Project: Nutch
>  Issue Type: New Feature
>Reporter: Andrew McCall
>Priority: Minor
> Attachments: nutch.nofollow.patch
>
>
> For my purposes I'd like to follow links even if they're marked nofollow.
> Ideally I'd like to follow them but not pass link juice between them.
> I've attached a patch that adds a configuration property,
> parser.html.outlinks.ignore_nofollow, which allows the parser to ignore the
> nofollow directives on a page.
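A minimal Java sketch of how a switch like `parser.html.outlinks.ignore_nofollow` could gate outlink extraction. Only the property name comes from the description above; `NofollowFilter`, `Link` and the boolean `ignoreNofollow` (which a real parser would read from its configuration) are hypothetical, and the "do not pass link juice" half of the request is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of an ignore-nofollow switch; not the attached patch. */
public class NofollowFilter {

    /** An extracted anchor: its target and the raw rel attribute (may be null). */
    static final class Link {
        final String href;
        final String rel;

        Link(String href, String rel) {
            this.href = href;
            this.rel = rel;
        }
    }

    /** True if the rel attribute contains the token "nofollow" (case-insensitive). */
    static boolean isNofollow(String rel) {
        if (rel == null) {
            return false;
        }
        for (String token : rel.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals("nofollow")) {
                return true;
            }
        }
        return false;
    }

    /** Keeps nofollow links only when ignoreNofollow is true. */
    static List<String> outlinks(List<Link> links, boolean ignoreNofollow) {
        List<String> result = new ArrayList<>();
        for (Link link : links) {
            if (ignoreNofollow || !isNofollow(link.rel)) {
                result.add(link.href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Link> links = Arrays.asList(
            new Link("http://a.example/", null),
            new Link("http://b.example/", "nofollow noopener"));
        System.out.println(outlinks(links, false)); // [http://a.example/]
        System.out.println(outlinks(links, true));  // [http://a.example/, http://b.example/]
    }
}
```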


[jira] [Resolved] (NUTCH-1631) Display Document Count Added To Solr Server

2013-08-23 Thread Lewis John McGibbney (JIRA)

 [ https://issues.apache.org/jira/browse/NUTCH-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved NUTCH-1631.
-

Resolution: Fixed

Committed at revision 1517003 in 2.x HEAD.
Thank you very much for the patch, Furkan.
In the future it would be great if you could produce your patches as
follows:

git diff --no-prefix 2.x > NUTCH-1631.patch

This means we can apply them cleanly to SVN. Thanks very much. :)
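To show what `--no-prefix` changes, here is a small Java sketch that rewrites git's default `a/` and `b/` path prefixes in patch headers into the no-prefix form. It assumes the committer applies patches to the SVN working copy with `patch -p0`, which expects paths without those prefixes; `NoPrefixPatch` is a hypothetical helper, and regenerating the patch with the command above remains the simpler route.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: converts git's a/ b/ header paths to the --no-prefix form. */
public class NoPrefixPatch {

    /** "diff --git a/X b/X" becomes "diff --git X X" (assumes paths contain no spaces). */
    static String rewriteDiffGitLine(String line) {
        String[] parts = line.substring("diff --git ".length()).split(" ");
        if (parts.length == 2 && parts[0].startsWith("a/") && parts[1].startsWith("b/")) {
            return "diff --git " + parts[0].substring(2) + " " + parts[1].substring(2);
        }
        return line;
    }

    /** Rewrites ---/+++ lines only in file headers, never inside hunks. */
    static List<String> toNoPrefix(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inHeader = false;
        for (String line : lines) {
            if (line.startsWith("diff --git ")) {
                inHeader = true; // a new file section starts
                out.add(rewriteDiffGitLine(line));
            } else if (line.startsWith("@@")) {
                inHeader = false; // hunk content follows; leave it untouched
                out.add(line);
            } else if (inHeader && line.startsWith("--- a/")) {
                out.add("--- " + line.substring(6));
            } else if (inHeader && line.startsWith("+++ b/")) {
                out.add("+++ " + line.substring(6));
            } else {
                out.add(line); // index lines, /dev/null headers and hunk lines pass through
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> patch = Arrays.asList(
            "diff --git a/src/java/Foo.java b/src/java/Foo.java",
            "--- a/src/java/Foo.java",
            "+++ b/src/java/Foo.java",
            "@@ -1 +1 @@",
            "-old",
            "+new");
        toNoPrefix(patch).forEach(System.out::println);
    }
}
```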


[jira] [Commented] (NUTCH-1619) Writes Dmoz Description and Title information to db with snippet argument

2013-08-23 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748921#comment-13748921 ]

Lewis John McGibbney commented on NUTCH-1619:
-

Please commit when you can, Feng.

> Writes Dmoz Description and Title information to db with snippet argument
> -
>
> Key: NUTCH-1619
> URL: https://issues.apache.org/jira/browse/NUTCH-1619
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 2.1
>Reporter: Yasin Kılınç
>Priority: Minor
> Fix For: 2.3
>
> Attachments: NUTCH-1619.patch, NUTCH-DMOZ-Snippet.patch
>
>
> We need the Dmoz information for fetched URLs to be written to the database,
> so that it can be used as a snippet by the indexer of the search engine we
> are working on.
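A hypothetical Java sketch of the extraction step: it reads the title and description attached to each URL in a Dmoz-style RDF dump using the JDK's StAX parser. The element names (`ExternalPage`, `d:Title`, `d:Description`) and the `about` attribute reflect an assumed layout of the Dmoz content dump; `DmozSnippets` and `Snippet` are illustrative names, and writing the results to the database is left out.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: collects title/description per URL from Dmoz-style RDF. */
public class DmozSnippets {

    /** Illustrative holder for the snippet data of one URL. */
    static final class Snippet {
        String title = "";
        String description = "";
    }

    static Map<String, Snippet> parse(String rdf) throws XMLStreamException {
        // The JDK factory is namespace-aware by default, so d:Title has local name "Title".
        XMLStreamReader r = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(rdf));
        Map<String, Snippet> result = new LinkedHashMap<>();
        Snippet current = null; // non-null only while inside an ExternalPage
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("ExternalPage")) {
                    current = new Snippet();
                    result.put(r.getAttributeValue(null, "about"), current); // key is the page URL
                } else if (current != null && name.equals("Title")) {
                    current.title = r.getElementText().trim();
                } else if (current != null && name.equals("Description")) {
                    current.description = r.getElementText().trim();
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && r.getLocalName().equals("ExternalPage")) {
                current = null;
            }
        }
        r.close();
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample =
            "<RDF xmlns:r=\"http://www.w3.org/TR/RDF/\""
            + " xmlns:d=\"http://purl.org/dc/elements/1.0/\""
            + " xmlns=\"http://dmoz.org/rdf/\">"
            + "<ExternalPage about=\"http://www.example.com/\">"
            + "<d:Title>Example</d:Title>"
            + "<d:Description>An example site.</d:Description>"
            + "<topic>Top/Reference</topic>"
            + "</ExternalPage></RDF>";
        for (Map.Entry<String, Snippet> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " | " + e.getValue().title + " | " + e.getValue().description);
        }
    }
}
```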


[jira] [Commented] (NUTCH-1631) Display Document Count Added To Solr Server

2013-08-23 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748913#comment-13748913 ]

Lewis John McGibbney commented on NUTCH-1631:
-

Nice work.
The patch looks good and I would be +1 to getting it into the codebase.
Thanks
Lewis


[jira] [Commented] (NUTCH-1631) Display Document Count Added To Solr Server

2013-08-23 Thread lufeng (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748595#comment-13748595 ]

lufeng commented on NUTCH-1631:
---

Good statistical methods. +1 


[jira] [Commented] (NUTCH-1629) there is no need to fail on empty lines in seed file when injecting.

2013-08-23 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748427#comment-13748427 ]

Hudson commented on NUTCH-1629:
---

SUCCESS: Integrated in Nutch-nutchgora #729 (See 
[https://builds.apache.org/job/Nutch-nutchgora/729/])
NUTCH-1629 Injector skips empty lines (jnioche: 
http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1516752)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/java/org/apache/nutch/crawl/InjectorJob.java


> there is no need to fail on empty lines in seed file when injecting.
> 
>
> Key: NUTCH-1629
> URL: https://issues.apache.org/jira/browse/NUTCH-1629
> Project: Nutch
>  Issue Type: Improvement
>  Components: injector
>Affects Versions: 1.7, 2.2.1
> Environment: Java 1.7.0_25
>Reporter: kaveh minooie
>  Labels: easyfix
> Fix For: 2.3, 1.8
>
> Attachments: NUTCH-1629--2.x.svn.patch, NUTCH-1629--trunk.svn.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Right now, if there is an empty line in a seed file, TableUtil.reverseUrl
> throws an exception that kills the inject job.
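The failure mode is easy to reproduce in plain Java: `new java.net.URL("")` throws `MalformedURLException`, so if reversing a URL starts by parsing it this way, an unguarded blank seed line aborts the loop that builds row keys. The sketch below trims each line and skips empty ones before building a key. `SeedReader` is hypothetical, and its `reverseUrl` is a simplified stand-in (reversed host, `:`, protocol, then path and query), not Nutch's `TableUtil.reverseUrl`; whether the committed fix also skips whitespace-only lines is an assumption here.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of seed-file handling that tolerates blank lines. */
public class SeedReader {

    /** Simplified row key: reversed host, ':', protocol, then path and query (port ignored). */
    static String reverseUrl(String urlString) throws MalformedURLException {
        URL url = new URL(urlString); // new URL("") throws MalformedURLException
        String[] labels = url.getHost().split("\\.");
        StringBuilder key = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            key.append(labels[i]);
            if (i > 0) {
                key.append('.');
            }
        }
        key.append(':').append(url.getProtocol());
        String file = url.getFile(); // path plus query; empty for "http://host"
        key.append(file.isEmpty() ? "/" : file);
        return key.toString();
    }

    /** Skips blank or whitespace-only seed lines instead of letting them abort the job. */
    static List<String> seedKeys(List<String> lines) throws MalformedURLException {
        List<String> keys = new ArrayList<>();
        for (String line : lines) {
            String url = line.trim();
            if (url.isEmpty()) {
                continue; // the behaviour NUTCH-1629 asks for
            }
            keys.add(reverseUrl(url));
        }
        return keys;
    }

    public static void main(String[] args) throws MalformedURLException {
        List<String> seeds = Arrays.asList(
            "http://www.example.com/", "", "   ", "http://nutch.apache.org/about.html");
        System.out.println(seedKeys(seeds));
        // [com.example.www:http/, org.apache.nutch:http/about.html]
    }
}
```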


[jira] [Resolved] (NUTCH-1629) there is no need to fail on empty lines in seed file when injecting.

2013-08-23 Thread Julien Nioche (JIRA)

 [ https://issues.apache.org/jira/browse/NUTCH-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche resolved NUTCH-1629.
--

Resolution: Fixed

Trunk: Committed revision 1516746.
2.x : Committed revision 1516752.

Thanks Kaveh


[jira] [Commented] (NUTCH-1629) there is no need to fail on empty lines in seed file when injecting.

2013-08-23 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748401#comment-13748401 ]

Hudson commented on NUTCH-1629:
---

SUCCESS: Integrated in Nutch-trunk #2328 (See 
[https://builds.apache.org/job/Nutch-trunk/2328/])
NUTCH-1629 Injector skips empty lines (jnioche: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1516746)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java

