[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Fix Version/s: 1.12
> Rely on Tika for outlink extract
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Component/s: parser
> Rely on Tika for outlink extract
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148590#comment-15148590
]
Markus Jelsma commented on NUTCH-1233:
--
Awesome! Everything works as expected s
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2210.
--
Resolution: Fixed
Committed to trunk in revision 1730686.
> Upgrade to Tika 1
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148572#comment-15148572
]
Markus Jelsma commented on NUTCH-2210:
--
Test passes, will commit shortly.
>
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2210:
-
Attachment: NUTCH-2210.patch
Patch for trunk.
> Upgrade to Tika 1
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148489#comment-15148489
]
Markus Jelsma commented on NUTCH-2197:
--
Hello Arun - no, this is not applie
[
https://issues.apache.org/jira/browse/NUTCH-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147735#comment-15147735
]
Markus Jelsma commented on NUTCH-2210:
--
Apache Tika 1.12 is available. Will upg
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Attachment: NUTCH-2216-NUTCH-2220-NUTCH-2221.patch
Patch for trunk. This includes all three
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2216:
-
Attachment: NUTCH-2216.patch
Patch for trunk, introducing db.ignore.treat.redirects.as.links
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2216:
-
Summary: db.ignore.*.links to optionally follow internal redirects (was:
ignore.internal.links
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Attachment: NUTCH-2221.patch
Patch for trunk. This includes the modified config of NUTCH-2220
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2220:
-
Patch Info: Patch Available
> Rename db.* options used only by the linkdb to lin
[
https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2220:
-
Attachment: NUTCH-2220.patch
Patch for trunk
> Rename db.* options used only by the linkdb
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Description:
FetcherThread has support for db.ignore.external.links. In config you can find
[
https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2221:
-
Summary: Introduce db.ignore.internal.links to FetcherThread (was:
Introduce
Markus Jelsma created NUTCH-2221:
Summary: Introduce db.ignore.external.links to FetcherThread
Key: NUTCH-2221
URL: https://issues.apache.org/jira/browse/NUTCH-2221
Project: Nutch
Issue Type
Markus Jelsma created NUTCH-2220:
Summary: Rename db.* options used only by the linkdb to linkdb.*
Key: NUTCH-2220
URL: https://issues.apache.org/jira/browse/NUTCH-2220
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2189.
--
Resolution: Fixed
> Domain filter must deactivate if no rules are pres
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2189:
-
Fix Version/s: 1.12
> Domain filter must deactivate if no rules are pres
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2189:
-
Affects Version/s: 1.11
> Domain filter must deactivate if no rules are pres
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reopened NUTCH-2189:
--
Fix version missing
> Domain filter must deactivate if no rules are pres
[
https://issues.apache.org/jira/browse/NUTCH-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2189.
> Domain filter must deactivate if no rules are pres
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144518#comment-15144518
]
Markus Jelsma commented on NUTCH-2216:
--
An option is to change the default
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144497#comment-15144497
]
Markus Jelsma commented on NUTCH-2216:
--
Additionally, it probably should no
[
https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144463#comment-15144463
]
Markus Jelsma commented on NUTCH-2216:
--
Apparently db.ignore.internal.links is
Markus Jelsma created NUTCH-2216:
Summary: ignore.internal.links to optionally follow internal
redirects
Key: NUTCH-2216
URL: https://issues.apache.org/jira/browse/NUTCH-2216
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2215:
-
Attachment: NUTCH-2215.patch
Tiny error in nutch-default description.
> Generator to restr
[
https://issues.apache.org/jira/browse/NUTCH-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2215:
-
Attachment: NUTCH-2215.patch
Patch for trunk. Unit test passes!
> Generator to restrict crawl
Markus Jelsma created NUTCH-2215:
Summary: Generator to restrict crawl to mime type
Key: NUTCH-2215
URL: https://issues.apache.org/jira/browse/NUTCH-2215
Project: Nutch
Issue Type
Markus Jelsma created NUTCH-2214:
Summary: Index clean to be flexible on what it deletes
Key: NUTCH-2214
URL: https://issues.apache.org/jira/browse/NUTCH-2214
Project: Nutch
Issue Type
Markus Jelsma created NUTCH-2212:
Summary: Decrease memory consumption by tuning stack size
Key: NUTCH-2212
URL: https://issues.apache.org/jira/browse/NUTCH-2212
Project: Nutch
Issue Type
[
https://issues.apache.org/jira/browse/NUTCH-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-2211.
> Filter and normalizer checkers missing in bin/nu
[
https://issues.apache.org/jira/browse/NUTCH-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2211.
--
Resolution: Fixed
Committed to trunk in revision 1728339.
> Filter and normalizer check
[
https://issues.apache.org/jira/browse/NUTCH-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2211:
-
Attachment: NUTCH-2211.patch
Patch for trunk.
> Filter and normalizer checkers missing in
Markus Jelsma created NUTCH-2211:
Summary: Filter and normalizer checkers missing in bin/nutch
Key: NUTCH-2211
URL: https://issues.apache.org/jira/browse/NUTCH-2211
Project: Nutch
Issue Type
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2197:
-
Fix Version/s: 1.12
> Add solr5 solrcloud indexer supp
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2197:
-
Affects Version/s: (was: 1.12)
1.11
> Add solr5 solrcloud inde
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2197.
--
Resolution: Fixed
Committed to trunk in revision 1728313. Thanks Jurian Broertjes!
>
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2197:
-
Attachment: NUTCH-2197.patch
Previous patch was missing a proper version in plugin.xml. Will
Markus Jelsma created NUTCH-2210:
Summary: Upgrade to Tika 1.12
Key: NUTCH-2210
URL: https://issues.apache.org/jira/browse/NUTCH-2210
Project: Nutch
Issue Type: Task
Reporter
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2197:
-
Attachment: NUTCH-2197.patch
Here's the updated patch with Solr 5.4.1
> Add solr5 s
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128366#comment-15128366
]
Markus Jelsma commented on NUTCH-2197:
--
I am going to commit this soon un
[
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1465:
-
Fix Version/s: 1.13
> Support sitemaps in Nu
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117024#comment-15117024
]
Markus Jelsma commented on NUTCH-961:
-
Yes! :)
> Expose Tika's boiler
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116975#comment-15116975
]
Markus Jelsma commented on NUTCH-961:
-
With boilerpipe, you get only a very
[
https://issues.apache.org/jira/browse/NUTCH-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114991#comment-15114991
]
Markus Jelsma commented on NUTCH-2205:
--
This looks like your cluster was down, n
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114989#comment-15114989
]
Markus Jelsma commented on NUTCH-961:
-
That is probably due to the patch parsing t
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111292#comment-15111292
]
Markus Jelsma commented on NUTCH-961:
-
Some news, the upstream Tika issue has
[
https://issues.apache.org/jira/browse/NUTCH-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110947#comment-15110947
]
Markus Jelsma commented on NUTCH-2202:
--
Yes, a patch would be a good place to s
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110797#comment-15110797
]
Markus Jelsma commented on NUTCH-2197:
--
This Solr 5 plugin is capable of indexin
[
https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2201:
-
Patch Info: Patch Available
> Remove loops program from webgraph pack
[
https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2201.
--
Resolution: Fixed
Committed to trunk revision 1725981. Thanks Dennis!
> Remove loops prog
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110729#comment-15110729
]
Markus Jelsma commented on NUTCH-1325:
--
Yes, they are very useful for fin
[
https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2201:
-
Attachment: NUTCH-2201.patch
Patch for trunk which removed the loops program and all references
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-1325.
--
Resolution: Fixed
Committed to trunk in revision 1725952. Many thanks to all contributors
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1325:
-
Component/s: hostdb
> HostDB for Nutch
>
>
> Key
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1325:
-
Fix Version/s: 1.12
> HostDB for Nutch
>
>
> Key
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1325:
-
Attachment: NUTCH-1325.patch
Updated patch for trunk contains more thorough config descriptions
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1325:
-
Patch Info: Patch Available
Description:
h1. HostDB for Apache Nutch 1.x
* automatically
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1325:
-
Attachment: NUTCH-1325.patch
TDigest is awesome! Here's with support for user configurable
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1325:
-
Attachment: NUTCH-1325.patch
Updated patch to use TDigest for streaming percentiles. But because
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110375#comment-15110375
]
Markus Jelsma commented on NUTCH-1233:
--
Yes, we'll get this support with
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110373#comment-15110373
]
Markus Jelsma commented on NUTCH-961:
-
Hello - that doesn't seem related to t
[
https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2201:
-
Fix Version/s: 1.12
> Remove loops program from webgraph pack
[
https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2201:
-
Affects Version/s: 1.11
> Remove loops program from webgraph pack
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1325:
-
Attachment: NUTCH-1325.patch
Updated patch for trunk, i think it's fairly complete now, incl
[
https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-1325:
Assignee: Markus Jelsma
> HostDB for Nutch
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2203:
Assignee: Markus Jelsma
> Suffix URL filter can't handle trailing/leading whi
[
https://issues.apache.org/jira/browse/NUTCH-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2203.
--
Resolution: Fixed
Committed to trunk in revision 1725538. Thanks Jurian Broertjes.
> Suf
[
https://issues.apache.org/jira/browse/NUTCH-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2203:
-
Fix Version/s: 1.12
> Suffix URL filter can't handle trailing/leading whi
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106783#comment-15106783
]
Markus Jelsma commented on NUTCH-961:
-
Update, i've updated NUTCH-1233 fo
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106633#comment-15106633
]
Markus Jelsma edited comment on NUTCH-1233 at 1/19/16 11:5
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106633#comment-15106633
]
Markus Jelsma commented on NUTCH-1233:
--
It seems Tika's link extraction
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Attachment: pre-1233-2.txt
post-1233-2.txt
Here's another set to compare
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Attachment: NUTCH-1233.patch
Updated patch. Patch now contains the old link extraction commented
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Attachment: pre-1233.txt
post-1233.txt
Two lists of extracted URLs, befor
[
https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1233:
-
Attachment: NUTCH-1233.patch
Updated patch for trunk
> Rely on Tika for outlink extract
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106570#comment-15106570
]
Markus Jelsma commented on NUTCH-961:
-
Yes but it requires NUTCH-1233.
>
[
https://issues.apache.org/jira/browse/NUTCH-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-1107.
Resolution: Won't Fix
> Log slow parse entries
> --
>
>
[
https://issues.apache.org/jira/browse/NUTCH-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed NUTCH-1149.
Resolution: Won't Fix
Will upload proper patch for NUTCH-1325 soon which already contains nu
[
https://issues.apache.org/jira/browse/NUTCH-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-1838.
--
Resolution: Fixed
> Host and domain based regex and automaton filter
[
https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2201:
Assignee: Markus Jelsma
> Remove loops program from webgraph pack
communication problem
I am using Solr 5.4 and Nutch 1.11
On Tue, Jan 19, 2016 at 1:46 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
Hi - it was an answer to your question whether i have ever used it. Yes, i
patched and committed it. And therefore i asked if you're using Solr 5 or no
[
https://issues.apache.org/jira/browse/NUTCH-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma reassigned NUTCH-2197:
Assignee: Markus Jelsma
> Add solr5 solrcloud indexer supp
[
https://issues.apache.org/jira/browse/NUTCH-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2201:
-
Summary: Remove loops program from webgraph package (was: Remove loops
program from webgrapg
Markus Jelsma created NUTCH-2201:
Summary: Remove loops program from webgrapg package
Key: NUTCH-2201
URL: https://issues.apache.org/jira/browse/NUTCH-2201
Project: Nutch
Issue Type: Task
: dev@nutch.apache.org
Subject: Re: Nutch/Solr communication problem
Mind sharing that patch?
On Mon, Jan 18, 2016 at 8:28 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
Yes i have used it, i made the damn patch myself years ago, and i used the same
configuration. Command line or conf
.
thanks
On Mon, Jan 18, 2016 at 4:50 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
Hi - This doesn't look like an HTTP basic authentication problem. Are you running
Solr 5.x?
Markus
-Original message-
From: Zara Parst <edotserv...@gmail.com>
Sent:
xingJob.run(IndexingJob.java:228)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
Hi - can you post the lo
Hi - can you post the log output?
Markus
-Original message-
From: Zara Parst
Sent: Monday 18th January 2016 2:06
To: dev@nutch.apache.org
Subject: Nutch/Solr communication problem
Hi everyone,
I have a situation here: I am using Nutch 1.11 and Solr 5.4.
Solr is protected by user name and
[
https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2194.
--
Resolution: Fixed
Committed to trunk in revision 1724771.
> Run IndexingFilterChecker
[
https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2194:
-
Attachment: NUTCH-2194.patch
Updated patch. Signature is now also added to CrawlDatum, in case an
[
https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2194:
-
Patch Info: Patch Available
> Run IndexingFilterChecker as simple Telnet ser
[
https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096263#comment-15096263
]
Markus Jelsma commented on NUTCH-2194:
--
Please check it out :)
[
https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2194:
-
Description:
We have used a customized IndexingFilterChecker running as server to be able to
[
https://issues.apache.org/jira/browse/NUTCH-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2194:
-
Attachment: NUTCH-2194.patch
Patch for trunk. With default settings this server needs just about
[
https://issues.apache.org/jira/browse/NUTCH-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096156#comment-15096156
]
Markus Jelsma commented on NUTCH-2196:
--
Committed to trunk in revision 172
[
https://issues.apache.org/jira/browse/NUTCH-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved NUTCH-2196.
--
Resolution: Fixed
> IndexingFilterChecker to optionally normal
[
https://issues.apache.org/jira/browse/NUTCH-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-2196:
-
Attachment: NUTCH-2196.patch
Patch for trunk introducing the -normalize flag. If enabled, input