How could I make more metadata indexed in Solr?

2014-10-25 Thread Mengying Wang
Hi everyone, When I run the ./nutch parsechecker command on a PDF file, I see a number of metadata fields, e.g., ETag="cbf961-5aafc-41e4319014b80" meta:creation-date=2004-11-10T21:34:35Z dcterms:modified=2004-11-10T21:34:35Z meta:save-date=2004-11-10T21:34:35Z xmpTPg:NPages=10, etc. However, when I run t

Can't crawl filesystem with protocol-file plugin - java.lang.NullPointerException

2014-10-25 Thread Mengying Wang
Hi Sebastian, I have downloaded the Nutch source code from GitHub (https://github.com/apache/nutch), applied the patches (NUTCH-1879 and NUTCH-1880), and then reinstalled Nutch. The good news is that all URLs now contain only one slash. Unfortunately, java.lang.NullPointerException warnin

[jira] [Updated] (NUTCH-1883) bin/crawl: use function to run bin/nutch and check exit value

2014-10-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1883: --- Attachment: NUTCH-1883-2x-v1.patch > bin/crawl: use function to run bin/nutch and check exit v

[jira] [Issue Comment Deleted] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin

2014-10-25 Thread Mengying Wang (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mengying Wang updated NUTCH-1483: - Comment: was deleted (was: Dear Sebastian, Sorry for the previous long comment. I have edited it

[jira] [Commented] (NUTCH-1883) bin/crawl: use function to run bin/nutch and check exit value

2014-10-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184128#comment-14184128 ] Chris A. Mattmann commented on NUTCH-1883: -- looking great Seb, +1! > bin/crawl:

[jira] [Updated] (NUTCH-1883) bin/crawl: use function to run bin/nutch and check exit value

2014-10-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1883: --- Attachment: NUTCH-1883-trunk-v1.patch > bin/crawl: use function to run bin/nutch and check exi

[jira] [Created] (NUTCH-1883) bin/crawl: use function to run bin/nutch and check exit value

2014-10-25 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1883: -- Summary: bin/crawl: use function to run bin/nutch and check exit value Key: NUTCH-1883 URL: https://issues.apache.org/jira/browse/NUTCH-1883 Project: Nutch

[Nutch Wiki] Trivial Update of "Becoming_A_Nutch_Developer" by SebastianNagel

2014-10-25 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Becoming_A_Nutch_Developer" page has been changed by SebastianNagel: https://wiki.apache.org/nutch/Becoming_A_Nutch_Developer?action=diff&rev1=14&rev2=15 Comment: link to HowToContr

[Nutch Wiki] Update of "HowToContribute" by SebastianNagel

2014-10-25 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "HowToContribute" page has been changed by SebastianNagel: https://wiki.apache.org/nutch/HowToContribute?action=diff&rev1=12&rev2=13 Comment: add section about testing/reviewing/appl

[jira] [Commented] (NUTCH-1483) Can't crawl filesystem with protocol-file plugin

2014-10-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184038#comment-14184038 ] Sebastian Nagel commented on NUTCH-1483: Not everything is ok: the url appears in