[jira] [Created] (NUTCH-1484) TableUtil unreverseURL fails on file:// URLs

2012-11-01 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1484: Summary: TableUtil unreverseURL fails on file:// URLs Key: NUTCH-1484 URL: https://issues.apache.org/jira/browse/NUTCH-1484 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin

2012-11-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488558#comment-13488558 ] Sebastian Nagel commented on NUTCH-1483: Thanks! Issue with un-reversing URLs

[jira] [Comment Edited] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin

2012-11-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488558#comment-13488558 ] Sebastian Nagel edited comment on NUTCH-1483 at 11/1/12 8:55 AM:

[jira] [Created] (NUTCH-1485) TableUtil reverseURL to keep userinfo part

2012-11-01 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1485: Summary: TableUtil reverseURL to keep userinfo part Key: NUTCH-1485 URL: https://issues.apache.org/jira/browse/NUTCH-1485 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1461) Problem with TableUtil

2012-11-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488585#comment-13488585 ] Sebastian Nagel commented on NUTCH-1461: Cf. NUTCH-1484: same error with file://

[Nutch Wiki] Update of NutchTutorial by SebastianNagel

2012-11-01 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The NutchTutorial page has been changed by SebastianNagel: http://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=58&rev2=59 Comment: (because of recent request on the user mailing

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488721#comment-13488721 ] Lewis John McGibbney commented on NUTCH-1480: Hi Markus. Can I run multiple

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488728#comment-13488728 ] Julien Nioche commented on NUTCH-1480: Hi Lewis bq. Can I run multiple Solr servers

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488730#comment-13488730 ] Markus Jelsma commented on NUTCH-1480: Hi Lewis, Solr does not have a notion of

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488732#comment-13488732 ] Markus Jelsma commented on NUTCH-1480: Julien, yes. The Nutch tools are modified to

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488738#comment-13488738 ] Julien Nioche commented on NUTCH-1480: OK thanks. What about having a mechanism for

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488744#comment-13488744 ] Markus Jelsma commented on NUTCH-1480: I think you mean the just linked issue

[jira] [Created] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1486: Summary: schema-solr4.xml does not work with Solr 4.0 Key: NUTCH-1486 URL: https://issues.apache.org/jira/browse/NUTCH-1486 Project: Nutch

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488762#comment-13488762 ] Markus Jelsma commented on NUTCH-1486: Ah yes. The version field must be available

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488764#comment-13488764 ] Lewis John McGibbney commented on NUTCH-1486: Neither does Nutch schema.xml,

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488766#comment-13488766 ] Lewis John McGibbney commented on NUTCH-1486: What do you guys want to do

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488768#comment-13488768 ] Markus Jelsma commented on NUTCH-1486: Hmm, perhaps it was moved. Try

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488776#comment-13488776 ] Lewis John McGibbney commented on NUTCH-1486: Neither

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488777#comment-13488777 ] Markus Jelsma commented on NUTCH-1486: Then you got something built the wrong way,

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488783#comment-13488783 ] Lewis John McGibbney commented on NUTCH-1486: This is the example schema [0]

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488786#comment-13488786 ] Julien Nioche commented on NUTCH-1480: Nope. I meant implementing the distribution

[jira] [Created] (NUTCH-1487) Nutch parse fails first time for PDF files and works on reparse

2012-11-01 Thread kiran (JIRA)
kiran created NUTCH-1487: Summary: Nutch parse fails first time for PDF files and works on reparse Key: NUTCH-1487 URL: https://issues.apache.org/jira/browse/NUTCH-1487 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-1487) Nutch parse fails first time for PDF files and works on reparse

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1487: Component/s: storage parser Nutch parse fails first time for PDF files and

[jira] [Updated] (NUTCH-1487) Nutch parse fails first time for PDF files and works on reparse

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1487: Labels: mysql (was: ) Nutch parse fails first time for PDF files and works on reparse

[jira] [Updated] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1486: Attachment: NUTCH-1486-trunk.patch Patch for trunk. I've never played with Solr

[jira] [Updated] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1486: Attachment: NUTCH-1486-nutchgora.patch Patch for Nutchgora

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488852#comment-13488852 ] Lewis John McGibbney commented on NUTCH-1480: Mmm. Additionally there is a

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488854#comment-13488854 ] Markus Jelsma commented on NUTCH-1486: I'm a bit puzzled about the text field. Why

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488856#comment-13488856 ] Markus Jelsma commented on NUTCH-1480: Jul, yes, doing it via SolrJ and Zookeeper is

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488857#comment-13488857 ] Lewis John McGibbney commented on NUTCH-1486: Same here Markus. It seems that

[jira] [Comment Edited] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488857#comment-13488857 ] Lewis John McGibbney edited comment on NUTCH-1486 at 11/1/12 5:30 PM:

[jira] [Commented] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488864#comment-13488864 ] Markus Jelsma commented on NUTCH-1245: Sebastian, very interesting! Can you close

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488869#comment-13488869 ] Markus Jelsma commented on NUTCH-1486: Ah, I see, makes sense. You can copyField the

[jira] [Commented] (NUTCH-1486) schema-solr4.xml does not work with Solr 4.0

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488870#comment-13488870 ] Lewis John McGibbney commented on NUTCH-1486: OK so these patches are NOT OK.

[jira] [Commented] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again

2012-11-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488935#comment-13488935 ] Sebastian Nagel commented on NUTCH-1245: They are not duplicates but the effects

[jira] [Created] (NUTCH-1488) bin/nutch to run junit from any directory

2012-11-01 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1488: Summary: bin/nutch to run junit from any directory Key: NUTCH-1488 URL: https://issues.apache.org/jira/browse/NUTCH-1488 Project: Nutch Issue Type:

[jira] [Updated] (NUTCH-1488) bin/nutch to run junit from any directory

2012-11-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1488: Attachment: NUTCH-1488.patch bin/nutch to run junit from any directory

[jira] [Commented] (NUTCH-1488) bin/nutch to run junit from any directory

2012-11-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488994#comment-13488994 ] Lewis John McGibbney commented on NUTCH-1488: Hi Seb. In short, no, there is