Where is the "title-clustering" field defined?

On Aug 13, 2011, at 1:59 PM, [email protected] wrote:

> I have solr version 1.4.1 and mahout version 0.4.
> I have created an index from the xml files given in the exampledocs directory.
> It is working fine. Queries are also working on those indexes. But when I'm
> trying to create mahout vectors from the solr index, it gives the message
> "Wrote: 0 vectors".
> I've set termVectors="true" in the schema.xml file for all the fields.
> Are there any configuration settings I'm missing?
> Right now I'm posting only two files, solr.xml & payload.xml, from the
> example/solr/conf directory for trial.
>
> A portion of my schema.xml looks like this:
>
> <field name="id" type="string" indexed="true" stored="true" required="true" termVectors="true" />
> <field name="sku" type="textTight" indexed="true" stored="true" omitNorms="true" termVectors="true"/>
> <field name="name" type="textgen" indexed="true" stored="true" termVectors="true"/>
> <field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false" termVectors="true"/>
> <field name="manu" type="textgen" indexed="true" stored="true" omitNorms="true" termVectors="true"/>
> <field name="cat" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
> <field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
> <field name="includes" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
>
> <field name="weight" type="float" indexed="true" stored="true" termVectors="true"/>
> <field name="price" type="float" indexed="true" stored="true" termVectors="true"/>
> <field name="popularity" type="int" indexed="true" stored="true" termVectors="true"/>
> <field name="inStock" type="boolean" indexed="true" stored="true" termVectors="true"/>
>
> <!-- Common metadata fields, named specifically to match up with
>      SolrCell metadata when parsing rich documents such as Word, PDF.
>      Some fields are multiValued only because Tika currently may return
>      multiple values for them.
> -->
> <field name="title" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>
> <field name="subject" type="text" indexed="true" stored="true" termVectors="true"/>
> <field name="description" type="text" indexed="true" stored="true" termVectors="true"/>
> <field name="comments" type="text" indexed="true" stored="true" termVectors="true"/>
> <field name="author" type="textgen" indexed="true" stored="true" termVectors="true"/>
> <field name="keywords" type="textgen" indexed="true" stored="true" termVectors="true"/>
> <field name="category" type="textgen" indexed="true" stored="true" termVectors="true"/>
> <field name="content_type" type="string" indexed="true" stored="true" multiValued="true" termVectors="true"/>
> <field name="last_modified" type="date" indexed="true" stored="true" termVectors="true"/>
> <field name="links" type="string" indexed="true" stored="true" multiValued="true" termVectors="true"/>
>
> <!-- catchall field, containing all other searchable text fields (implemented
>      via copyField further on in this schema -->
> <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
>
> <!-- catchall text field that indexes tokens both normally and in reverse for efficient
>      leading wildcard queries. -->
> <field name="text_rev" type="text_rev" indexed="true" stored="false" multiValued="true"/>
>
> <!-- non-tokenized version of manufacturer to make it easier to sort or group
>      results by manufacturer. copied from "manu" via copyField -->
> <field name="manu_exact" type="string" indexed="true" stored="false"/>
>
> <field name="payloads" type="payloads" indexed="true" stored="true"/>
>
> <!-- Uncommenting the following will create a "timestamp" field using
>      a default value of "NOW" to indicate when each document was indexed.
> -->
> <!--
> <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
> -->
>
> <!-- Dynamic field definitions. If a field name is not found, dynamicFields
>      will be used if the name matches any of the patterns.
>      RESTRICTION: the glob-like pattern in the name attribute must have
>      a "*" only at the start or the end.
>      EXAMPLE: name="*_i" will match any field ending in _i (like myid_i, z_i)
>      Longer patterns will be matched first. if equal size patterns
>      both match, the first appearing in the schema will be used. -->
> <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
> <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
> <dynamicField name="*_l" type="long" indexed="true" stored="true"/>
> <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
> <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
> <dynamicField name="*_f" type="float" indexed="true" stored="true"/>
> <dynamicField name="*_d" type="double" indexed="true" stored="true"/>
> <dynamicField name="*_dt" type="date" indexed="true" stored="true" termVectors="true"/>
>
> This is the output I'm getting:
>
> hadoop@dahlia:/home/sai/project/mahout-distribution-0.4$ bin/mahout lucene.vector --dir /home/sai/project/apache-solr-1.4.1/example/solr/data/index/ --output /home/sai/project/output/part-out.vec --field title-clustering --idField id --dictOut /home/sai/project/output/dict.out --norm 2
> Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
> HADOOP_CONF_DIR=/usr/local/hadoop/conf
> 11/08/11 14:55:15 INFO lucene.Driver: Output File: /home/sai/project/output/part-out.vec
> 11/08/11 14:55:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 11/08/11 14:55:16 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 11/08/11 14:55:16 INFO compress.CodecPool: Got brand-new compressor
> 11/08/11 14:55:16 INFO lucene.Driver: Wrote: 0 vectors
> 11/08/11 14:55:16 INFO lucene.Driver: Dictionary Output file: /home/sai/project/output/dict.out
> 11/08/11 14:55:16 INFO driver.MahoutDriver: Program took 1078 ms
>
> sai

--------------------------
Grant Ingersoll
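
One way to check the question at the top of this message is to look up the name passed to --field in the schema before running lucene.vector. The Python sketch below is only an illustration under stated assumptions: SCHEMA_SNIPPET is a trimmed stand-in for the schema.xml quoted above, and check_field is a hypothetical helper, not part of Solr or Mahout. It follows the matching rule described in the quoted schema comment (explicit fields first, then dynamicField patterns, longer patterns first).

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Trimmed stand-in for the schema.xml quoted above (assumption: only a few
# representative entries are copied here; load your real file instead).
SCHEMA_SNIPPET = """
<schema name="example" version="1.2">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" termVectors="true"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>
    <field name="features" type="text" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
    <dynamicField name="*_dt" type="date" indexed="true" stored="true" termVectors="true"/>
  </fields>
</schema>
"""


def check_field(schema_xml, field_name):
    """Return 'ok', 'no-term-vectors', or 'missing' for field_name.

    Hypothetical helper: it only inspects the schema text, not the index.
    """
    root = ET.fromstring(schema_xml)

    def status(elem):
        return "ok" if elem.get("termVectors") == "true" else "no-term-vectors"

    # Explicitly declared fields are checked first.
    for elem in root.iter("field"):
        if elem.get("name") == field_name:
            return status(elem)

    # Then dynamic fields: longer patterns first; sorted() is stable, so
    # equal-length patterns keep their order of appearance in the schema.
    dynamic = sorted(root.iter("dynamicField"),
                     key=lambda e: -len(e.get("name", "")))
    for elem in dynamic:
        if fnmatchcase(field_name, elem.get("name", "")):
            return status(elem)

    return "missing"


if __name__ == "__main__":
    for name in ("title-clustering", "title", "features", "created_dt"):
        print(name, "->", check_field(SCHEMA_SNIPPET, name))
```

Against the portion of the schema quoted above, "title-clustering" matches neither a declared field nor any dynamicField pattern, whereas "title" is declared with termVectors="true". Keep in mind that a termVectors change in schema.xml only applies to documents indexed after the change, so the documents need to be reindexed before term vectors show up in the index.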
