[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-03-07 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923720#comment-13923720 ] Markus Jelsma commented on NUTCH-1113: Alright! This fixes the issue! I will commit

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-03-07 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923773#comment-13923773 ] Hudson commented on NUTCH-1113: SUCCESS: Integrated in Nutch-trunk #2553 (See

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-03-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917087#comment-13917087 ] Sebastian Nagel commented on NUTCH-1113: Hi [~markus17], junit tests

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915831#comment-13915831 ] Sebastian Nagel commented on NUTCH-1113: Results of tests: The number of documents

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915917#comment-13915917 ] Julien Nioche commented on NUTCH-1113: Well done, thanks guys! Merging segments

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915969#comment-13915969 ] Hudson commented on NUTCH-1113: FAILURE: Integrated in Nutch-trunk #2545 (See

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-21 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908265#comment-13908265 ] Sebastian Nagel commented on NUTCH-1113: Hi [~markus17], your patch should work

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-21 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908520#comment-13908520 ] Markus Jelsma commented on NUTCH-1113: I'll get back to this next Monday, I

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-02-20 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907160#comment-13907160 ] Sebastian Nagel commented on NUTCH-1113: Hi [~markus17], tried test data from

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-23 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880026#comment-13880026 ] Markus Jelsma commented on NUTCH-1113: I have tried running long sequences with

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880007#comment-13880007 ] Sebastian Nagel commented on NUTCH-1113: Great! I'll try to verify it within the

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878441#comment-13878441 ] Markus Jelsma commented on NUTCH-1113: The last chronological index went wrong, some

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876283#comment-13876283 ] Markus Jelsma commented on NUTCH-1113: Yes, I reindexed them segment for segment.

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876304#comment-13876304 ] Markus Jelsma commented on NUTCH-1113: Ok, I got something! A record that wasn't

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876384#comment-13876384 ] Markus Jelsma commented on NUTCH-1113: I got fewer documents indexed when ignoring

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876426#comment-13876426 ] Markus Jelsma commented on NUTCH-1113: I have to reindex my control cluster segment

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876036#comment-13876036 ] Sebastian Nagel commented on NUTCH-1113: If (re)indexing multiple segments also

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873324#comment-13873324 ] Markus Jelsma commented on NUTCH-1113: I'm now indexing segment for segment to a

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873464#comment-13873464 ] Markus Jelsma commented on NUTCH-1113: SOLR-4260 is blocking every test I do, if

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873533#comment-13873533 ] Markus Jelsma commented on NUTCH-1113: NUTCH-1706 causes some stuff not to be

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870690#comment-13870690 ] Markus Jelsma commented on NUTCH-1113: This works too, but if we ditch most LINKED

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869399#comment-13869399 ] Markus Jelsma commented on NUTCH-1113: Ignoring LINKED completely means around line

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869439#comment-13869439 ] Markus Jelsma commented on NUTCH-1113: Sebastian's patch does solve a few problems

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869441#comment-13869441 ] Markus Jelsma commented on NUTCH-1113: Another record is also missing {code}

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-10 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867945#comment-13867945 ] Markus Jelsma commented on NUTCH-1113: I have two narrowed the problem down to a

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868024#comment-13868024 ] Sebastian Nagel commented on NUTCH-1113: Hi [~markus17], I've run your unit tests

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866635#comment-13866635 ] Markus Jelsma commented on NUTCH-1113: Alright, NUTCH-1113 isn't correct as well.

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866752#comment-13866752 ] Markus Jelsma commented on NUTCH-1113: There are some issues with the checks, will

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865624#comment-13865624 ] Sebastian Nagel commented on NUTCH-1113: Isn't this fixed with NUTCH-1520?

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865631#comment-13865631 ] Markus Jelsma commented on NUTCH-1113: No. We don't really seem to be losing

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865632#comment-13865632 ] Markus Jelsma commented on NUTCH-1113: I'll run some more tests tomorrow, at least I

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2012-01-25 Thread Sebastian Nagel (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193115#comment-13193115 ] Sebastian Nagel commented on NUTCH-1113: I had a look at the attached segment

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-16 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105972#comment-13105972 ] Lewis John McGibbney commented on NUTCH-1113: We have a pretty meaty JUnit

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105661#comment-13105661 ] Markus Jelsma commented on NUTCH-1113: Can you rule out the indexer and see what you

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105693#comment-13105693 ] Edward Drapkin commented on NUTCH-1113: Using this command: nutch readseg -get

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105704#comment-13105704 ] Edward Drapkin commented on NUTCH-1113: I don't have any idea what's causing this

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105715#comment-13105715 ] Markus Jelsma commented on NUTCH-1113: Investigation, debug report; same stuff

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105714#comment-13105714 ] Edward Drapkin commented on NUTCH-1113: Upon further inspection, it appears that

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-15 Thread Edward Drapkin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105758#comment-13105758 ] Edward Drapkin commented on NUTCH-1113: The more I look into this, the more I'm