[jira] [Created] (NUTCH-985) Problems indexing lastModifiedDate in Solr

2011-04-19 Thread Dietrich Schmidt (JIRA)
Problems indexing lastModifiedDate in Solr
--

 Key: NUTCH-985
 URL: https://issues.apache.org/jira/browse/NUTCH-985
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Reporter: Dietrich Schmidt


I am using the index-more plugin to parse the lastModified data in web
pages in order to store it in a Solr data field.

In solrindex-mapping.xml I am mapping lastModified to a field changed in Solr:
<field dest="changed" source="lastModified"/>

However, when posting data to Solr the SolrIndexer posts it as a long,
not as a date:
<add><doc boost="1.0">
<field name="changed">107932680</field>
<field name="tstamp">20110414144140188</field>
<field name="date">20040315</field>

Solr rejects the data because of the improper data type.
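
For illustration, a Solr date field expects an ISO 8601 UTC timestamp such as 1995-12-31T23:59:59Z rather than a bare long. The sketch below shows one way such a conversion could look, assuming the long is a count of milliseconds since the Unix epoch. The class name SolrDateSketch, the method toSolrDate, and the sample value are hypothetical; none of this is taken from the Nutch source.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Nutch or SolrJ.
final class SolrDateSketch {

    // Assumes epochMillis is milliseconds since 1970-01-01T00:00:00Z.
    // ISO_INSTANT renders the instant in UTC with a trailing 'Z'.
    static String toSolrDate(long epochMillis) {
        return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Illustrative value only: midnight UTC on 2004-03-15.
        long illustrative = 1079308800000L;
        System.out.println(toSolrDate(illustrative)); // prints 2004-03-15T00:00:00Z
    }
}
```

Whether a conversion like this belongs in the indexer or in the plugin is a separate question.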


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-985) Problems indexing lastModifiedDate in Solr

2011-04-19 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021712#comment-13021712
 ] 

Markus Jelsma commented on NUTCH-985:
-

This is similar to another issue described today about the failing dedup. 
Although I believe it would be a good idea to convert longs to properly formatted 
dates for 1.3, I think it will be quite a task, since it involves more than 
reformatting the values before sending them over. Dedup, for example, relies on 
dates being stored as longs in Solr. I'm also unsure whether a simple reformat in 
the Solr indexer is a better idea than changing it in the plugins themselves.

Thoughts?

 Attachments: indexlastmodifieddate.jar


[jira] [Commented] (NUTCH-985) Problems indexing lastModifiedDate in Solr

2011-04-19 Thread Dietrich Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021732#comment-13021732
 ] 

Dietrich Schmidt commented on NUTCH-985:


Ideally, org.apache.nutch.indexer.more.MoreIndexingFilter should store the 
lastModifiedDate in date format. Having limited knowledge of the Nutch 
source, I am not sure whether dependencies exist that would break if that 
changed, but at this point I can't see what they would be.
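
As a rough illustration of storing a date instead of a long, the sketch below parses an HTTP Last-Modified header value into a java.util.Date. It assumes the value arrives as an RFC 1123 string. The class name LastModifiedSketch and the method parseLastModified are hypothetical, and this is not how MoreIndexingFilter is actually written.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper, not the actual MoreIndexingFilter code.
final class LastModifiedSketch {

    // Assumes headerValue is in RFC 1123 form, e.g. "Mon, 15 Mar 2004 00:00:00 GMT".
    // Throws java.time.format.DateTimeParseException if the value does not match.
    static Date parseLastModified(String headerValue) {
        ZonedDateTime parsed =
            ZonedDateTime.parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME);
        return Date.from(parsed.toInstant());
    }

    public static void main(String[] args) {
        Date d = parseLastModified("Mon, 15 Mar 2004 00:00:00 GMT");
        // The epoch-millisecond view remains available for code that needs a long.
        System.out.println(d.getTime()); // prints 1079308800000
    }
}
```

Keeping the Date as the stored value while deriving the long on demand is one possible way to serve both the Solr date field and any code that still compares longs.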



[jira] [Commented] (NUTCH-985) Problems indexing lastModifiedDate in Solr

2011-04-19 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021739#comment-13021739
 ] 

Markus Jelsma commented on NUTCH-985:
-

Yes, something has to be done. What did you attach, anyway? Is that a recompiled 
plugin with your modification? If so, please include the sources. JARs are not 
really useful here ;)

Anyway, thanks for pointing out this issue, Dietrich.



Build failed in Jenkins: Nutch-trunk #1462

2011-04-19 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Nutch-trunk/1462/

--
[...truncated 1009 lines...]
A src/plugin/subcollection/src/java/org/apache/nutch/collection
A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/package.html
A src/plugin/subcollection/src/java/org/apache/nutch/indexer
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
A src/plugin/subcollection/README.txt
A src/plugin/subcollection/plugin.xml
A src/plugin/subcollection/build.xml
A src/plugin/index-more
A src/plugin/index-more/ivy.xml
A src/plugin/index-more/src
A src/plugin/index-more/src/test
A src/plugin/index-more/src/test/org
A src/plugin/index-more/src/test/org/apache
A src/plugin/index-more/src/test/org/apache/nutch
A src/plugin/index-more/src/test/org/apache/nutch/indexer
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
A src/plugin/index-more/src/java
A src/plugin/index-more/src/java/org
A src/plugin/index-more/src/java/org/apache
A src/plugin/index-more/src/java/org/apache/nutch
A src/plugin/index-more/src/java/org/apache/nutch/indexer
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html
A src/plugin/index-more/plugin.xml
A src/plugin/index-more/build.xml
AU src/plugin/plugin.dtd
A src/plugin/parse-ext
A src/plugin/parse-ext/ivy.xml
A src/plugin/parse-ext/src
A src/plugin/parse-ext/src/test
A src/plugin/parse-ext/src/test/org
A src/plugin/parse-ext/src/test/org/apache
A src/plugin/parse-ext/src/test/org/apache/nutch
A src/plugin/parse-ext/src/test/org/apache/nutch/parse
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
A src/plugin/parse-ext/src/java
A src/plugin/parse-ext/src/java/org
A src/plugin/parse-ext/src/java/org/apache
A src/plugin/parse-ext/src/java/org/apache/nutch
A src/plugin/parse-ext/src/java/org/apache/nutch/parse
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
A src/plugin/parse-ext/plugin.xml
A src/plugin/parse-ext/build.xml
A src/plugin/parse-ext/command
A src/plugin/urlnormalizer-pass
A src/plugin/urlnormalizer-pass/ivy.xml
A src/plugin/urlnormalizer-pass/src
A src/plugin/urlnormalizer-pass/src/test
A src/plugin/urlnormalizer-pass/src/test/org
A src/plugin/urlnormalizer-pass/src/test/org/apache
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
A src/plugin/urlnormalizer-pass/src/java
A src/plugin/urlnormalizer-pass/src/java/org
A src/plugin/urlnormalizer-pass/src/java/org/apache
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java
AU src/plugin/urlnormalizer-pass/plugin.xml
AU src/plugin/urlnormalizer-pass/build.xml
A src/plugin/parse-html
A src/plugin/parse-html/ivy.xml
A src/plugin/parse-html/lib
A src/plugin/parse-html/lib/tagsoup.LICENSE.txt
A src/plugin/parse-html/src
A src/plugin/parse-html/src/test
A src/plugin/parse-html/src/test/org
A src/plugin/parse-html/src/test/org/apache
A src/plugin/parse-html/src/test/org/apache/nutch
A src/plugin/parse-html/src/test/org/apache/nutch/parse
A