Re: SOLR 1.2 - Duplicate Documents??

2007-12-28 Thread cricdigs

I am having the same issue. Here are my schema.xml entries:

 <field name="id" type="string" indexed="true" stored="true"
        multiValued="false" required="true"/>
 <uniqueKey>id</uniqueKey>

I am following the EmbeddedSolr instructions on the current wiki page and
setting the following on my AddUpdateCommand:

  AddUpdateCommand addcmd = new AddUpdateCommand();
  addcmd.allowDups = false;
  addcmd.overwritePending = true;
  addcmd.overwriteCommitted = true;

Thanks!


ryantxu wrote:
 
 
 Schema.xml
  <field name="id" type="string" indexed="true" stored="true"/>
 
 Have you edited schema.xml since building a full index from scratch?  If 
 so, try rebuilding the index.
 
 People often get the behavior you describe if the 'id' is a 'text' field.
 
 ryan
 
 
 

-- 
View this message in context: 
http://www.nabble.com/SOLR-1.2---Duplicate-Documents---tp13621332p14531206.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR 1.2 - Duplicate Documents??

2007-11-08 Thread Yonik Seeley
On Nov 7, 2007 12:30 PM, realw5 [EMAIL PROTECTED] wrote:
 We did have Tomcat crash once (JVM OutOfMem) during an indexing process,
 could that be a possible source of the issue?

Yes.
Deletes are buffered and carried out in a different phase.

-Yonik
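
To make the two-phase behavior concrete, here is a toy model of how a crash between the phases could leave two copies of one id behind. It is only a sketch under stated assumptions: the class BufferedDeleteDemo and its methods are hypothetical and are not Solr's update handler; the model simply assumes that new copies are written immediately while the deletes of older copies are held in memory until commit.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only (hypothetical names): NOT Solr's actual update handler.
class BufferedDeleteDemo {
    // Ids of documents already written to the simulated index, oldest first.
    private final List<String> indexedIds = new ArrayList<>();
    // Ids whose older copies still have to be deleted; held in memory only.
    private final Set<String> pendingDeletes = new LinkedHashSet<>();

    // Phase 1: write the new copy right away and buffer the delete of older copies.
    void add(String id) {
        indexedIds.add(id);
        pendingDeletes.add(id);
    }

    // Phase 2: for every buffered id, drop all copies except the newest one.
    void commit() {
        for (String id : pendingDeletes) {
            int newest = indexedIds.lastIndexOf(id);
            for (int i = newest - 1; i >= 0; i--) {
                if (indexedIds.get(i).equals(id)) {
                    indexedIds.remove(i); // removes by position
                }
            }
        }
        pendingDeletes.clear();
    }

    // Simulated JVM crash: the in-memory buffer is lost, written documents stay.
    void crash() {
        pendingDeletes.clear();
    }

    // Number of copies of the given id currently in the simulated index.
    long count(String id) {
        return indexedIds.stream().filter(id::equals).count();
    }

    public static void main(String[] args) {
        BufferedDeleteDemo index = new BufferedDeleteDemo();
        index.add("hr-802waclighting");
        index.commit();                  // normal case: one copy
        index.add("hr-802waclighting");  // re-index the same id
        index.crash();                   // crash before the deletes are applied
        System.out.println(index.count("hr-802waclighting")); // prints 2
    }
}
```

In this model the duplicate survives because the only record of the pending delete lived in memory when the process died.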


Re: SOLR 1.2 - Duplicate Documents??

2007-11-07 Thread Ryan McKinley


Schema.xml
 <field name="id" type="string" indexed="true" stored="true"/>


Have you edited schema.xml since building a full index from scratch?  If 
so, try rebuilding the index.


People often get the behavior you describe if the 'id' is a 'text' field.

ryan
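
A rough illustration of why a tokenized 'id' tends to defeat duplicate detection, assuming the overwrite step looks up the old copy by one exact term equal to the raw id value. Everything here is a simplified assumption: the class UniqueKeyTermDemo and its methods are hypothetical, and the "text" analysis is reduced to lowercasing and splitting on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified sketch (hypothetical names): not Solr's analysis chain.
class UniqueKeyTermDemo {
    // "string"-like field: the whole value is indexed as a single term.
    static List<String> stringTerms(String value) {
        return Collections.singletonList(value);
    }

    // "text"-like field: lowercased and split on non-alphanumerics (assumption).
    static List<String> textTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Assumed overwrite check: is there an indexed term equal to the raw id?
    static boolean overwriteFindsOldCopy(List<String> indexedTerms, String rawId) {
        return indexedTerms.contains(rawId);
    }

    public static void main(String[] args) {
        String id = "hr-802waclighting";
        System.out.println(textTerms(id));                                // [hr, 802waclighting]
        System.out.println(overwriteFindsOldCopy(stringTerms(id), id));   // true
        System.out.println(overwriteFindsOldCopy(textTerms(id), id));     // false
    }
}
```

With the tokenized field no single term matches the full id, so under this assumption the old copy is never found and a second document is added next to it.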



Re: SOLR 1.2 - Duplicate Documents??

2007-11-07 Thread Chris Hostetter
: Hey all, I have a fairly odd case of duplicate documents in our solr index
: (See attached xml sample). The index is roughly 35k documents. The only

How did you index those documents?  

Any chance you inadvertently set the allowDups=true attribute when 
sending them to Solr (possibly because of an option whose meaning you 
didn't fully understand in solrj or solr-ruby, etc.)?




-Hoss
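
For anyone checking what their client actually sends, here is a minimal sketch that builds an XML add message with the allowDups attribute stated explicitly, so it can be logged and inspected. The class AddMessageDemo and its methods are hypothetical helpers, not part of solrj or any other client; the field names are taken from the sample document in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any Solr client library.
class AddMessageDemo {
    // Minimal XML escaping for attribute and text values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds an <add> message for one document with allowDups written out explicitly.
    static String buildAdd(Map<String, String> fields, boolean allowDups) {
        StringBuilder xml = new StringBuilder();
        xml.append("<add allowDups=\"").append(allowDups).append("\"><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue()))
               .append("</field>");
        }
        xml.append("</doc></add>");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "hr-802waclighting");
        doc.put("productID", "HR-802");
        // Log the message before posting it to the update handler.
        System.out.println(buildAdd(doc, false));
    }
}
```

If the logged message shows allowDups="true", the client option is the likely culprit rather than the index itself.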



Re: SOLR 1.2 - Duplicate Documents??

2007-11-07 Thread realw5

I haven't made any changes to the schema since the initial full-index. Do you
know if there is a way to rebuild the full index in the background, without
having to take down the current live index?

Dan



ryantxu wrote:
 
 
 Schema.xml
  <field name="id" type="string" indexed="true" stored="true"/>
 
 Have you edited schema.xml since building a full index from scratch?  If 
 so, try rebuilding the index.
 
 People often get the behavior you describe if the 'id' is a 'text' field.
 
 ryan
 
 
 




SOLR 1.2 - Duplicate Documents??

2007-11-06 Thread realw5

Hey all, I have a fairly odd case of duplicate documents in our Solr index
(see attached XML sample). The index is roughly 35k documents. The only
way I've found to fix the problem is to run a delete by id, which
deletes both copies; I can then re-index that one document. This happened
previously, but that turned out to be a case-sensitivity issue; this
time the ids appear identical!

Any assistance in tracking this down would be appreciated! I can provide any
other logs if necessary.

Thanks,

Dan

Sample Select Query:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <arr name="categoryId">
        <int>151</int>
        <int>962</int>
        <int>1493</int>
        <int>1830</int>
      </arr>
      <arr name="finish">
        <str>N/A</str>
      </arr>
      <bool name="hasDigiCast">false</bool>
      <bool name="hasDigiVista">false</bool>
      <str name="id">hr-802waclighting</str>
      <arr name="inStock">
        <bool>false</bool>
      </arr>
      <bool name="isNew">false</bool>
      <bool name="isTopSeller">true</bool>
      <str name="manufacturer">wac lighting</str>
      <arr name="masterFinish">
        <str>not applicable</str>
      </arr>
      <date name="modifiedDate">2007-10-15T23:10:01.510Z</date>
      <bool name="onSale">false</bool>
      <int name="popularity">1683</int>
      <arr name="price">
        <float>53.91</float>
      </arr>
      <date name="productAddDate">2007-07-05T00:00:00Z</date>
      <str name="productID">HR-802</str>
      <str name="productTitle">Low Voltage Miniature Housing for Recessed Lighting Fixture</str>
      <str name="series">low voltage miniature housings</str>
      <arr name="sku">
        <str/>
      </arr>
      <str name="theme"/>
      <arr name="upc">
        <str/>
      </arr>
    </doc>
    <doc>
      <arr name="categoryId">
        <int>151</int>
        <int>962</int>
        <int>1493</int>
        <int>1830</int>
      </arr>
      <arr name="finish">
        <str>N/A</str>
      </arr>
      <bool name="hasDigiCast">false</bool>
      <bool name="hasDigiVista">false</bool>
      <str name="id">hr-802waclighting</str>
      <arr name="inStock">
        <bool>false</bool>
      </arr>
      <bool name="isNew">false</bool>
      <bool name="isTopSeller">true</bool>
      <str name="manufacturer">wac lighting</str>
      <arr name="masterFinish">
        <str>not applicable</str>
      </arr>
      <date name="modifiedDate">2007-11-02T15:33:21.154Z</date>
      <bool name="onSale">false</bool>
      <int name="popularity">1683</int>
      <arr name="price">
        <float>53.91</float>
      </arr>
      <date name="productAddDate">2007-07-05T00:00:00Z</date>
      <str name="productID">HR-802</str>
      <str name="productTitle">Low Voltage Miniature Housing for Recessed Lighting Fixture</str>
      <str name="series">low voltage miniature housings</str>
      <arr name="sku">
        <str/>
      </arr>
      <str name="theme"/>
      <arr name="upc">
        <str/>
      </arr>
    </doc>
  </result>
</response>
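
To check a larger result set for repeated ids without reading it by eye, something along these lines might help. It is a minimal sketch using the JDK's built-in DOM parser; the class DuplicateIdFinder is hypothetical, and it assumes the unique key is returned as a str element with name="id", as in the sample response above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: counts how often each id occurs in a Solr XML response.
class DuplicateIdFinder {
    static Map<String, Integer> countIds(String responseXml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            Map<String, Integer> counts = new LinkedHashMap<>();
            NodeList strs = dom.getElementsByTagName("str");
            for (int i = 0; i < strs.getLength(); i++) {
                Element e = (Element) strs.item(i);
                // Assumption: the unique key field is named "id".
                if ("id".equals(e.getAttribute("name"))) {
                    counts.merge(e.getTextContent().trim(), 1, Integer::sum);
                }
            }
            return counts;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse response XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<response><result name=\"response\" numFound=\"2\" start=\"0\">"
                + "<doc><str name=\"id\">hr-802waclighting</str></doc>"
                + "<doc><str name=\"id\">hr-802waclighting</str></doc>"
                + "</result></response>";
        // Report every id that occurs more than once.
        countIds(xml).forEach((id, n) -> {
            if (n > 1) {
                System.out.println(id + " appears " + n + " times");
            }
        });
    }
}
```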

Schema.xml
 <field name="id" type="string" indexed="true" stored="true"/>
 <field name="sku" type="textTight" indexed="true" stored="true" multiValued="true"/>
 <field name="upc" type="textTight" indexed="true" stored="true" multiValued="true"/>
 ...
 <!-- field to use to determine and enforce document uniqueness. -->
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>text</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>
