[
https://issues.apache.org/jira/browse/NUTCH-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132676#comment-17132676
]
Hudson commented on NUTCH-2789:
---
SUCCESS: Integrated in Jenkins build Nutch-trunk #3686 (See
[
https://issues.apache.org/jira/browse/NUTCH-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132637#comment-17132637
]
Hudson commented on NUTCH-2788:
---
SUCCESS: Integrated in Jenkins build Nutch-trunk #3685 (See
[
https://issues.apache.org/jira/browse/NUTCH-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132638#comment-17132638
]
Hudson commented on NUTCH-2787:
---
SUCCESS: Integrated in Jenkins build Nutch-trunk #3685 (See
[
https://issues.apache.org/jira/browse/NUTCH-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132639#comment-17132639
]
Hudson commented on NUTCH-2790:
---
SUCCESS: Integrated in Jenkins build Nutch-trunk #3685 (See
[
https://issues.apache.org/jira/browse/NUTCH-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2791:
---
Affects Version/s: (was: 1.17)
1.16
> domainstats, protocolstats
[
https://issues.apache.org/jira/browse/NUTCH-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2791:
---
Fix Version/s: 1.17
> domainstats, protocolstats and crawlcomplete do not handle GCS URLs
>
[
https://issues.apache.org/jira/browse/NUTCH-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2789.
Resolution: Fixed
> Documentation: update links to point to cwiki
>
[
https://issues.apache.org/jira/browse/NUTCH-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2788.
Resolution: Implemented
Committed/merged for 1.17 - thanks for the reviews!
> ParseData:
sebastian-nagel merged pull request #529:
URL: https://github.com/apache/nutch/pull/529
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to
[
https://issues.apache.org/jira/browse/NUTCH-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132621#comment-17132621
]
ASF GitHub Bot commented on NUTCH-2788:
---
sebastian-nagel merged pull request #529:
URL:
[
https://issues.apache.org/jira/browse/NUTCH-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2788:
---
Fix Version/s: (was: 1.18)
1.17
> ParseData: improve presentation of
[
https://issues.apache.org/jira/browse/NUTCH-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2787.
Resolution: Fixed
Fixed/merged. Thanks for the reviews!
> CrawlDb JSON dump does not
[
https://issues.apache.org/jira/browse/NUTCH-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2790.
Resolution: Fixed
Thanks, [~pmezard]!
> CSVIndexWriter does not escape leading quotes
[
https://issues.apache.org/jira/browse/NUTCH-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132603#comment-17132603
]
ASF GitHub Bot commented on NUTCH-2790:
---
sebastian-nagel merged pull request #532:
URL:
[
https://issues.apache.org/jira/browse/NUTCH-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2790:
---
Fix Version/s: 1.17
> CSVIndexWriter does not escape leading quotes properly
>
[
https://issues.apache.org/jira/browse/NUTCH-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2790:
---
Component/s: plugin
> CSVIndexWriter does not escape leading quotes properly
>
sebastian-nagel merged pull request #532:
URL: https://github.com/apache/nutch/pull/532
sebastian-nagel commented on a change in pull request #533:
URL: https://github.com/apache/nutch/pull/533#discussion_r438319215
##
File path: src/java/org/apache/nutch/util/CrawlCompletionStats.java
##
@@ -153,9 +153,7 @@ public int run(String[] args) throws Exception {
[
https://issues.apache.org/jira/browse/NUTCH-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132600#comment-17132600
]
ASF GitHub Bot commented on NUTCH-2791:
---
sebastian-nagel commented on a change in pull request
sebastian-nagel commented on pull request #534:
URL: https://github.com/apache/nutch/pull/534#issuecomment-642132183
> Is it OK to just change the interface and implement what you suggest?
Yes, that's ok. We'll put a notice about a breaking change to the release
notes, so that users
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130921#comment-17130921
]
ASF GitHub Bot commented on NUTCH-2793:
---
sebastian-nagel commented on pull request #534:
URL:
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130918#comment-17130918
]
ASF GitHub Bot commented on NUTCH-2793:
---
sebastian-nagel commented on a change in pull request
sebastian-nagel commented on a change in pull request #534:
URL: https://github.com/apache/nutch/pull/534#discussion_r438267053
##
File path: src/plugin/indexer-csv/README.md
##
@@ -39,4 +39,4 @@ escapechar | Escape character used to escape a quote character |
pmezard commented on a change in pull request #534:
URL: https://github.com/apache/nutch/pull/534#discussion_r438258817
##
File path: src/plugin/indexer-csv/README.md
##
@@ -39,4 +39,4 @@ escapechar | Escape character used to escape a quote character |
maxfieldlength | Max.
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130906#comment-17130906
]
ASF GitHub Bot commented on NUTCH-2793:
---
pmezard commented on a change in pull request #534:
URL:
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130905#comment-17130905
]
ASF GitHub Bot commented on NUTCH-2793:
---
pmezard commented on pull request #534:
URL:
pmezard commented on pull request #534:
URL: https://github.com/apache/nutch/pull/534#issuecomment-642122887
What are the backward compatibility requirements for nutch? Is it OK to just
change the interface and implement what you suggest? Should it be best-effort
to keep things BC? Or is
[
https://issues.apache.org/jira/browse/NUTCH-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130814#comment-17130814
]
Patrick Mézard commented on NUTCH-2792:
---
What solution would you favor then [2], [3], something
[
https://issues.apache.org/jira/browse/NUTCH-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130803#comment-17130803
]
Sebastian Nagel commented on NUTCH-2792:
Agreed, the -params option should be used by all index
[
https://issues.apache.org/jira/browse/NUTCH-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2792:
---
Fix Version/s: 1.18
> nutch index -params is only used in Solr indexer
>
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2793:
---
Fix Version/s: 1.18
> CSV indexer does not work in distributed mode
>
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-2793:
---
Component/s: plugin
> CSV indexer does not work in distributed mode
>
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130783#comment-17130783
]
ASF GitHub Bot commented on NUTCH-2793:
---
sebastian-nagel commented on a change in pull request
sebastian-nagel commented on a change in pull request #534:
URL: https://github.com/apache/nutch/pull/534#discussion_r438197577
##
File path: src/plugin/indexer-csv/src/java/org/apache/nutch/indexwriter/csv/CSVIndexWriter.java
##
@@ -192,7 +189,7 @@ protected int find(String
[
https://issues.apache.org/jira/browse/NUTCH-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130718#comment-17130718
]
Sebastian Nagel commented on NUTCH-2755:
Hi [~mfeltscher], should be possible but I've never
[
https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130713#comment-17130713
]
ASF GitHub Bot commented on NUTCH-2501:
---
sebastian-nagel commented on a change in pull request
sebastian-nagel commented on a change in pull request #279:
URL: https://github.com/apache/nutch/pull/279#discussion_r438150802
##
File path: src/bin/crawl
##
@@ -171,6 +175,8 @@ fi
CRAWL_PATH="$1"
LIMIT="$2"
+JAVA_CHILD_HEAP_MB=`expr "$NUTCH_HEAP_MB" / "$NUM_TASKS"`
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130607#comment-17130607
]
Patrick Mézard commented on NUTCH-2793:
---
PR sent here https://github.com/apache/nutch/pull/534
>
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Mézard updated NUTCH-2793:
--
Comment: was deleted
(was: PR sent here https://github.com/apache/nutch/pull/534)
> CSV
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130593#comment-17130593
]
ASF GitHub Bot commented on NUTCH-2793:
---
pmezard opened a new pull request #534:
URL:
pmezard opened a new pull request #534:
URL: https://github.com/apache/nutch/pull/534
Before the change, the output file name was hard-coded to "nutch.csv".
When running in distributed mode, multiple reducers would clobber each
other's output.
After the change, the filename is
[
https://issues.apache.org/jira/browse/NUTCH-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Mézard updated NUTCH-2793:
--
Description:
Reasons are discussed in
Patrick Mézard created NUTCH-2793:
-
Summary: CSV indexer does not work in distributed mode
Key: NUTCH-2793
URL: https://issues.apache.org/jira/browse/NUTCH-2793
Project: Nutch
Issue Type:
Hello Lewis,
I understand the proposal. As an engineer, however, I have some points I would
like to address:
* The proposed change is not backward compatible, which weighs heavily because it
is also not a technical necessity.
* Our users, myself included, have to make a small or, depending on
[
https://issues.apache.org/jira/browse/NUTCH-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Mézard updated NUTCH-2792:
--
Description:
`nutch index` help displays:
{code:java}
General options:
...
-params
[
https://issues.apache.org/jira/browse/NUTCH-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130421#comment-17130421
]
Patrick Mézard commented on NUTCH-2792:
---
Patrick Mézard created NUTCH-2792:
-
Summary: nutch index -params is only used in Solr indexer
Key: NUTCH-2792
URL: https://issues.apache.org/jira/browse/NUTCH-2792
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130289#comment-17130289
]
ASF GitHub Bot commented on NUTCH-2787:
---
pmezard commented on pull request #531:
URL:
pmezard commented on pull request #531:
URL: https://github.com/apache/nutch/pull/531#issuecomment-641768990
+1, the change fixes my issue.