Apologies for the delayed response, Terry.

Mahout's presently at Lucene 4.6.1 (both 0.9 and trunk).  The practice so far 
has been to upgrade to the latest Lucene version right before a planned 
release. 

I'm not sure what has changed in Solr/Lucene 4.7.1.

You could try a few things:
a) Check whether your index is spread across multiple shards.
b) Upgrade Mahout locally to Lucene 4.7.1, run your tests again, and see if 
that works.
c) It could possibly be a bug in lucene2seq, and we may not have adequate test 
coverage. Could you create a unit test to reproduce this scenario?

Would it be possible for you to share a sample index along with the Solr schema 
for testing?
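
In the meantime, one thing that may be worth ruling out (a guess on my part, 
not something I have verified against lucene2seq): the schema excerpt declares 
'body' as stored but not indexed, and 'content' as indexed but not stored. A 
field that isn't stored has no value to read back from a document, which could 
explain the empty values for '-f content'. The rough Python sketch below only 
parses a copy of the two field definitions from your message and prints their 
flags. The <fields> wrapper element and the field_flags helper are hypothetical 
names added for illustration, and the script does not touch the index itself.

```python
import xml.etree.ElementTree as ET

# Copy of the two field definitions from the schema excerpt, wrapped in a
# hypothetical <fields> root element so the fragment parses as one document.
SCHEMA_EXCERPT = """
<fields>
  <field name="content" type="text" stored="false" indexed="true" multiValued="true"/>
  <field name="body" type="string" stored="true" indexed="false"/>
</fields>
"""


def as_bool(value):
    """Map "true"/"false" to a bool; None means the attribute was not set."""
    return None if value is None else value == "true"


def field_flags(schema_xml):
    """Return {field name: (stored, indexed)} for every <field> element."""
    root = ET.fromstring(schema_xml)
    return {
        field.get("name"): (as_bool(field.get("stored")), as_bool(field.get("indexed")))
        for field in root.iter("field")
    }


flags = field_flags(SCHEMA_EXCERPT)
for name, (stored, indexed) in sorted(flags.items()):
    print(f"{name}: stored={stored} indexed={indexed}")
```

If the flags match what is in your deployed schema.xml, the stored/indexed 
split is the direction I'd look at first, though the sample index would still 
help confirm it.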





On Thursday, April 10, 2014 11:34 PM, Terry Blankers <[email protected]> 
wrote:
 
Hi All, I'm very new to trying to use lucene2seq so I'm not sure if it's 
just user error, but I'm experiencing some unexpected behavior when 
running lucene2seq against my solr index (4.7.1). I've tried using both 
0.9 and the trunk build of mahout. (And BTW, I have been able to 
successfully run the Reuters example as a test baseline.)


Here's the command I'm running:

    $MAHOUT_HOME/bin/mahout lucene2seq \
      -i /home/ec2-user/solr/solr-data/solrindex/index \
      -o solr/sequence -id key_sha1hex -f body -xm sequential \
      -q topics:diabetes -n 500


Excerpts from my solr schema:

<field name="content" type="text" stored="false" indexed="true" multiValued="true"/>
<field name="body" type="string" stored="true" indexed="false"/>

<!-- Use the indexed/un-stored "content" field for searching -->
<copyField source="body" dest="content" />
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>content</defaultSearchField>



When I use SolrAdmin and specify fl=body the search handler returns the 
'body' field with data as expected. Yet I get the following error when 
running lucene2seq and specify '-f body':

    IllegalArgumentException: Field 'body' does not exist in the index



And if I specify '-f content', lucene2seq runs without errors or 
warnings, but seqdumper output shows no values for any key:

    Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.Text
    Key: 96C4C76CF9D7449C724CA77CB8F650EAFD33E31C: Value:
    Key: D6842B81B8D09733B50BEDB4767C2A5C49E43B20: Value:
    Key: 61CB95FEE2C6BF0AC6E8A1F7738338CA36F42264: Value:
    Key: 0F9903B72A7C9F0373A5171403B3AAEB291B16E1: Value:


Can anyone give me any suggestions as to how to track down what might be 
happening here?

Many thanks,

Terry
