OK, that is puzzling.

bq: If there were duplicates only one of the duplicates should be
removed and I still should be able to search for the ID and find one,
correct?

Correct.

Your Bad Request error is puzzling; you may be on to something there.
It looks like some of the documents you're sending to Solr aren't
getting indexed: either the requests are being dropped on the network,
or some docs have invalid fields or field formats (e.g. a date in the
wrong format) or some such. When the run completes, what are the
maxDoc and numDocs numbers on one of the nodes?
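
If clicking through the admin UI on every node gets tedious, something
like this rough SolrJ sketch pulls the same numbers from the Luke
handler (the core URL below is just a placeholder -- point it at each
replica in turn):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class CoreStats {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL -- check each replica of the collection.
        HttpSolrServer core = new HttpSolrServer(
            "http://192.168.20.57:7574/solr/inventory_shard1_replica1");

        LukeResponse luke = new LukeRequest().process(core);

        // numDocs = live docs; maxDoc = live docs plus deletes that
        // haven't been merged away yet.
        System.out.println("numDocs = " + luke.getIndexInfo().get("numDocs"));
        System.out.println("maxDoc  = " + luke.getIndexInfo().get("maxDoc"));

        core.shutdown();
    }
}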

What else do you see in the logs? They're pretty big after that many
adds, but try grepping for ERROR and see if anything interesting like
a stack trace turns up. Or even grep for "org.apache.solr". The latter
will give you some false hits, but it's still better than paging
through a huge log file....

Personally, in this kind of situation I sometimes use SolrJ to do my
indexing rather than DIH; I find it easier to debug, so that's another
possibility. In the worst case with SolrJ you can send the docs one at
a time, as in the sketch below....
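
Just to show the shape of it, here's a very rough SolrJ (4.x) sketch.
The ZooKeeper address and collection name are placeholders, and the
two field names are simply lifted from your wire dump; your real loop
would pull rows from the database the way your DIH config does:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OneAtATimeIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder ZK address and collection name -- substitute your own.
        CloudSolrServer server = new CloudSolrServer("192.168.20.57:2181");
        server.setDefaultCollection("inventory");

        // Sending one doc per add() means a "Bad Request" response
        // points at one specific document rather than a whole batch.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("ID", "750041421");
        doc.addField("EntryDate", "2011-11-17T10:46:00Z"); // bad date formats fail right here

        try {
            server.add(doc);
        } catch (Exception e) {
            System.err.println("Add failed for ID=750041421: " + e);
        }

        server.commit();
        server.shutdown();
    }
}

Once you know which document (or which field) triggers the 400, the
fix is usually obvious.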

Best,
Erick

On Fri, Oct 31, 2014 at 7:37 AM, AJ Lemke <aj.le...@securitylabs.com> wrote:
> Hi Erick:
>
> All of the records are coming out of an auto-numbered field so the IDs will
> all be unique.
>
> Here is the test I ran this morning:
>
> Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. 
> (Duration: 28m)
> Requests: 1 (0/s), Fetched: 903,993 (538/s), Skipped: 0, Processed: 903,993 
> (538/s)
> Started: 33 minutes ago
>
> Last Modified: 4 minutes ago
> Num Docs: 903829
> Max Doc: 903829
> Heap Memory Usage: -1
> Deleted Docs: 0
> Version: 1517
> Segment Count: 16
> Optimized: checked
> Current: checked
>
> If there were duplicates only one of the duplicates should be removed and I
> still should be able to search for the ID and find one, correct?
> As it is right now I am missing records that should be in the collection.
>
> I also noticed this:
>
> org.apache.solr.common.SolrException: Bad Request
>
>
>
> request: 
> http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
>         at 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> AJ
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, October 30, 2014 7:08 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Missing Records
>
> First question: Is there any possibility that some of the docs have duplicate 
> IDs (<uniqueKey>s)? If so, then some of the docs will be replaced, which will 
> lower your returns.
> One way to figure this out is to go to the admin screen: if numDocs <
> maxDoc, then documents have been replaced.
>
> Also, if numDocs is smaller than 903,993 then you probably have some docs
> being replaced. One warning, however: even if numDocs equals maxDoc, docs may
> still have been replaced, because deleted docs are purged when segments are
> merged.
>
> Best,
> Erick
>
> On Thu, Oct 30, 2014 at 3:12 PM, S.L <simpleliving...@gmail.com> wrote:
>> I am curious, how many shards do you have and what's the replication
>> factor you are using?
>>
>> On Thu, Oct 30, 2014 at 5:27 PM, AJ Lemke <aj.le...@securitylabs.com> wrote:
>>
>>> Hi All,
>>>
>>> We have a SOLR cloud instance that has been humming along nicely for
>>> months.
>>> Last week we started experiencing missing records.
>>>
>>> Admin DIH Example:
>>> Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s)
>>> A *:* search claims that there are only 903,902; this is the first full
>>> index.
>>> Subsequent full indexes give the following counts for the *:* search:
>>> 903,805
>>> 903,665
>>> 826,357
>>>
>>> All the while the admin returns "Fetched: 903,993 (x/s), Skipped: 0,
>>> Processed: 903,993 (x/s)" every time (the records-per-second figure
>>> varies).
>>>
>>>
>>> I found an item that should be in the index but is not found in a search.
>>>
>>> Here are the referenced lines of the log file.
>>>
>>> DEBUG - 2014-10-30 15:10:51.160;
>>> org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE
>>> add{,id=750041421}
>>> {{params(debug=false&optimize=true&indent=true&commit=true&clean=true
>>> &wt=json&command=full-import&entity=ads&verbose=false),defaults(confi
>>> g=data-config.xml)}}
>>> DEBUG - 2014-10-30 15:10:51.160;
>>> org.apache.solr.update.SolrCmdDistributor; sending update to
>>> http://192.168.20.57:7574/solr/inventory_shard1_replica2/ retry:0
>>> add{,id=750041421}
>>> params:update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.5
>>> 7%3A8983%2Fsolr%2Finventory_shard1_replica1%2F
>>>
>>> --- there are 746 lines of log between entries ---
>>>
>>> DEBUG - 2014-10-30 15:10:51.340; org.apache.http.impl.conn.Wire;  >>
>>> "[0x2][0xc3][0xe0]&params[0xa2][0xe0].update.distrib(TOLEADER[0xe0],d
>>> istrib.from?[0x17]
>>> http://192.168.20.57:8983/solr/inventory_shard1_replica1/[0xe0]&delBy
>>> Q[0x0][0xe0]'docsMap[0xe][0x13][0x10]8[0x8]?[0x80][0x0][0x0][0xe0]#Zi
>>> p%51106[0xe0]-IsReelCentric[0x2][0xe0](HasPrice[0x1][0xe0]*Make_Lower
>>> 'ski-doo[0xe0])StateName$Iowa[0xe0]-OriginalModel/Summit
>>> Highmark[0xe0]/VerticalSiteIDs!2[0xe0]-ClassBinaryIDp@[0xe0]#lat(42.4
>>> 8929[0xe0]-SubClassFacet01704|Snowmobiles[0xe0](FuelType%Other[0xe0]2
>>> DivisionName_Lower,recreational[0xe0]&latlon042.4893,-96.3693[0xe0]*P
>>> hotoCount!8[0xe0](HasVideo[0x2][0xe0]"ID)750041421[0xe0]&Engine
>>> [0xe0]*ClassFacet.12|Snowmobiles[0xe0]$Make'Ski-Doo[0xe0]$City*Sioux
>>> City[0xe0]#lng*-96.369302[0xe0]-Certification!N[0xe0]0EmotionalTagline0162"
>>> Long Track
>>> [0xe0]*IsEnhanced[0x1][0xe0]*SubClassID$1704[0xe0](NetPrice$4500[0xe0
>>> ]1IsInternetSpecial[0x2][0xe0](HasPhoto[0x1][0xe0]/DealerSortOrder!2[
>>> 0xe0]+Description?VThis Bad boy will pull you through the deepest
>>> snow!With the 162" track and 1000cc of power you can fly up any
>>> hill!![0xe0],DealerRadius+8046.720000[0xe0],Transmission
>>> [0xe0]*ModelFacet7Ski-Doo|Summit
>>> Highmark[0xe0]/DealerNameFacet9Certified
>>> Auto,
>>> Inc.|4150[0xe0])StateAbbr"IA[0xe0])ClassName+Snowmobiles[0xe0](Dealer
>>> ID$4150[0xe0]&AdCode$DX1Q[0xe0]*DealerName4Certified
>>> Auto,
>>> Inc.[0xe0])Condition$Used[0xe0]/Condition_Lower$used[0xe0]-ExteriorCo
>>> lor+Blue/Yellow[0xe0],DivisionName,Recreational[0xe0]$Trim(1000
>>> SDI[0xe0](SourceID!1[0xe0]0HasAdEnhancement!0[0xe0]'ClassID"12[0xe0].
>>> FuelType_Lower%other[0xe0]$Year$2005[0xe0]+DealerFacet?[0x8]4150|Cert
>>> ified Auto, Inc.|Sioux
>>> City|IA[0xe0],SubClassName+Snowmobiles[0xe0]%Model/Summit
>>> Highmark[0xe0])EntryDate42011-11-17T10:46:00Z[0xe0]+StockNumber&00010
>>> 5[0xe0]+PriceRebate!0[0xe0]+Model_Lower/summit
>>> highmark[\n]"
>>> What could be the issue, and how does one fix it?
>>>
>>> Thanks so much and if more information is needed I have preserved the
>>> log files.
>>>
>>> AJ
>>>
