Re: Multi Language Suggester Solr Issue

2014-12-28 Thread Michael Sokolov

I noticed that your suggester analyzers include

<filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])"
        replacement="" replace="all" />

which seems like a bad idea -- this will strip all those Arabic, Russian
and Japanese characters entirely, leaving you with probably only
whitespace in your tokens.  Try just removing that?
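
The reason: Java's \w matches only ASCII word characters by default, so every
Arabic, Cyrillic or CJK character falls into that negated class. A quick
illustration in plain Java (not the filter itself, just the same regex):

    // illustrative only -- the same pattern run through Java's regex engine
    "search engine".replaceAll("([^\\w\\d\\*æøåÆØÅ ])", "")  // -> "search engine"
    "これはテスト".replaceAll("([^\\w\\d\\*æøåÆØÅ ])", "")      // -> "" (all stripped)

If you really do need a cleanup filter there, a Unicode-aware class such as
pattern="([^\p{L}\p{Nd}\* ])" would at least keep letters from any script.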


-Mike

On 12/24/14 6:09 PM, alaa.abuzaghleh wrote:

I am trying to create a suggester handler using Solr 4.8. Everything works fine,
but when I try to get suggestions in a different language (Arabic or Japanese,
for example) I get results in mixed languages: even when searching only
in Japanese, I get Arabic mixed in too. The following is my schema.xml

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="people_schema" version="1.5">
  <fields>
    <field name="_version_" type="long" indexed="true" stored="true" />
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="first_name" type="txt_general" indexed="true" stored="true" multiValued="false" />
    <field name="last_name" type="txt_general" indexed="true" stored="true" multiValued="false" />
    <field name="about" type="text_general_edge_ngram" indexed="true" stored="true" multiValued="false" />
    <field name="year_birth" type="tint" indexed="true" stored="true" multiValued="false" />
    <field name="month_birth" type="tint" indexed="true" stored="true" multiValued="false" />
    <field name="day_birth" type="tint" indexed="true" stored="true" multiValued="false" />
    <field name="country" type="string" indexed="true" stored="true" required="false" multiValued="false" />
    <field name="country_tree" type="placetree" indexed="true" stored="false" multiValued="false" />
    <field name="state" type="string" indexed="true" stored="true" required="false" multiValued="false" />
    <field name="state_tree" type="placetree" indexed="true" stored="false" multiValued="false" />
    <field name="city" type="string" indexed="true" stored="true" required="false" multiValued="false" />
    <field name="city_tree" type="placetree" indexed="true" stored="false" multiValued="false" />
    <field name="job" type="string" indexed="true" stored="true" required="false" multiValued="false" />
    <field name="job_tree" type="txt_general" indexed="true" stored="true" multiValued="false" />
    <field name="company" type="string" indexed="true" stored="true" required="false" multiValued="false" />
    <field name="company_tree" type="companytree" indexed="true" stored="false" multiValued="false" />

    <field name="full_name" type="txt_general" indexed="true" stored="true" multiValued="false" />
    <field name="full_name_suggest" type="text_suggest" indexed="true" stored="true" multiValued="false" />
    <field name="full_name_edge" type="text_suggest_edge" indexed="true" stored="true" multiValued="false" />
    <field name="full_name_ngram" type="text_suggest_ngram" indexed="true" stored="true" multiValued="false" />
    <field name="full_name_sort" type="alphaNumericSort" indexed="true" stored="true" multiValued="false" />

    <field name="job_suggest" type="text_suggest" indexed="true" stored="true" multiValued="false" />
    <field name="job_edge" type="text_suggest_edge" indexed="true" stored="true" multiValued="false" />
    <field name="job_ngram" type="text_suggest_ngram" indexed="true" stored="true" multiValued="false" />
    <field name="job_sort" type="alphaNumericSort" indexed="true" stored="true" multiValued="false" />

    <copyField source="full_name" dest="full_name_suggest" />
    <copyField source="full_name" dest="full_name_edge" />
    <copyField source="full_name" dest="full_name_ngram" />
    <copyField source="full_name" dest="full_name_sort" />

    <copyField source="job_tree" dest="job_suggest" />
    <copyField source="job_tree" dest="job_edge" />
    <copyField source="job_tree" dest="job_ngram" />
    <copyField source="job_tree" dest="job_sort" />
  </fields>

  <uniqueKey>id</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField"
               sortMissingLast="true"

Re: distrib=false

2014-12-28 Thread S.L
Erik

I have attached a screenshot of the topology. As you can see, I have
three nodes, and no two replicas of the same shard reside on the same node;
this was done deliberately so as not to affect availability.

The query that I use is a general get-all query of the form *:*, for testing.

The behavior I notice is that even when a particular replica of a
shard is queried using distrib=false, the request goes to the other
replica of the same shard.

Thanks.

On Sat, Dec 27, 2014 at 2:10 PM, Erick Erickson erickerick...@gmail.com
wrote:

 How are you sending the request? AFAIK, setting distrib=false
 should keep the query from being sent to any other node,
 although I'm not quite sure what happens when you host multiple
 replicas of the _same_ shard on the same node.

 So we need:
 1> your topology: how many nodes, and which replicas are on each?
 2> the actual query you send.

 Best,
 Erick

 On Sat, Dec 27, 2014 at 8:14 AM, S.L simpleliving...@gmail.com wrote:
  Hi All,
 
  I have a question regarding distrib=false on a Solr query. It seems that
  distribution is restricted only across shards when the parameter is set to
  false, meaning that if I query a particular node within a shard that has a
  replication factor of more than one, the request could still go to another
  node within the same shard that is a replica of the node I made the
  initial request to. Is my understanding correct?

  If the answer to my question is yes, then how do we make sure that the
  request goes only to the node I intend to make the request to?
 
  Thanks.



How to implement multi-set in a Solr schema.

2014-12-28 Thread S.L
Hi All,

I have a use case where I need to group documents that share a field
called bookName, meaning if there are multiple documents with the same
bookName value and the user's input is matched by a query on bookName,
I need to be able to group all the documents with the same bookName together,
so that I can display them as a group in the UI.

What kind of support does Solr provide for such a scenario, and how should
I change my schema.xml, which has bookName as a single-valued text
field?

Thanks.


Re: Solr performance issues

2014-12-28 Thread Shawn Heisey
On 12/26/2014 7:17 AM, Mahmoud Almokadem wrote:
 We've installed a cluster of one collection of 350M documents on 3
 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is
 about 1.1TB, and the maximum volume size on Amazon is 1TB, so we added 2 SSD EBS
 General Purpose volumes (1x1TB + 1x500GB) on each instance. Then we created a
 1.5TB logical volume using LVM to fit our index.
 
 The response time is between 1 and 3 seconds for simple queries (1 token).
 
 Could the LVM become a bottleneck for our index?

SSD is very fast, but its speed is very slow when compared to RAM.  The
problem here is that Solr must read data off the disk in order to do a
query, and even at SSD speeds, that is slow.  LVM is not the problem
here, though it's possible that it may be a contributing factor.  You
need more RAM.
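
A quick way to see how much of the index the OS can currently cache (Linux
only, and just a sanity check, not a benchmark):

    free -g    # the "cached" column is RAM the OS can use to hold index data

If the cached figure is tiny next to a 1.1TB shard, nearly every query has to
go to disk.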

For Solr to be fast, a large percentage (ideally 100%, but smaller
fractions can often be enough) of the index must be loaded into unused
RAM by the operating system.  Your information seems to indicate that
the index is about 3 terabytes.  If that's the index size, I would guess
that you would need somewhere between 1 and 2 terabytes of total RAM for
speed to be acceptable.  Because RAM is *very* expensive on Amazon and
is not available in sizes like 256GB-1TB, that typically means a lot of
their virtual machines, with a lot of shards in SolrCloud.  You may find
that real hardware is less expensive for very large Solr indexes in the
long term than cloud hardware.

Thanks,
Shawn



How does text-rev work?

2014-12-28 Thread Alexandre Rafalovitch
I am looking at the collection1/techproducts schema and I can't figure
out how the reversed wildcard example is supposed to work.

We define the text_general_rev type and the text_rev field, but we don't seem
to populate it at any point. And running the example does not
seem to show any tokens in the field even when the non-reversed text
field does have some.

Apparently, there is some magic in the QueryParser to do something
about this at query time, but I see no explanation of what is supposed
to happen at index/schema time.
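
For reference, the reversed side is driven by ReversedWildcardFilterFactory in
the index-time analyzer; a rough sketch of such a fieldType (parameter values
quoted from memory, so double-check against the shipped schema):

    <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
                maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

As far as I can tell, the query parser checks whether a field's index analyzer
contains this filter and, if so, reverses leading-wildcard terms itself --
which would explain why nothing visible happens at index time.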

Does anybody have the skinny on this one?

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


Re: distrib=false

2014-12-28 Thread Shawn Heisey
On 12/28/2014 8:48 AM, S.L wrote:
 I have attached a screenshot of the topology. As you can see, I have
 three nodes, and no two replicas of the same shard reside on the same
 node; this was done deliberately so as not to affect availability.
 
 The query that I use is a general get-all query of the form *:*, for testing.
 
 The behavior I notice is that even when a particular replica of a
 shard is queried using distrib=false, the request goes to the other
 replica of the same shard.

Attachments almost never make it through the mailing list processing.
The screenshot you mentioned did not make it.

You'll need to host the image somewhere and provide a URL.  The dropbox
service is a good way to do this, but it's not the only way.  Just make
sure you don't remove the image quickly.  The message will live on for
years in the archive ... it would be nice to have the image live on for
years as well, though I know that is often not realistic.

I do not know exactly how SolrCloud handles such requests, but it would
not surprise me to learn that it forwards the request to another replica
of the same shard on another server.
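
For what it's worth, a non-distributed request has to name a concrete core,
roughly like this (host and core name here are made up):

    curl "http://node1:8983/solr/collection1_shard1_replica1/select?q=*:*&distrib=false&rows=0"

If the URL only names the collection, the receiving node is free to route the
request to whichever replica it chooses.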

An issue has been put forward to change the general load-balancing
behavior of SolrCloud.  There has been a fair amount of discussion on it:

https://issues.apache.org/jira/browse/SOLR-6832

Thanks,
Shawn



Re: How to implement multi-set in a Solr schema.

2014-12-28 Thread Aman Tandon
Hi,

You can use grouping in Solr. You can do this either via the query or via
solrconfig.xml.

*A) via query*

http://localhost:8983?your_query_params&group=true&group.field=bookName

You can limit the size of each group (how many documents you want to show);
suppose you want to show 5 documents per group on this bookName field, then
you can specify the parameter group.limit=5.

*B) via solrconfig*

<str name="group">true</str> <str name="group.field">bookName</str>
<str name="group.ngroups">true</str> <str name="group.truncate">true</str>
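
With wt=json, the grouped response then comes back in roughly this shape
(sketched by hand, values invented):

    {"grouped":
      {"bookName":
        {"matches":42, "ngroups":7,
         "groups":[
           {"groupValue":"some book name",
            "doclist":{"numFound":12, "start":0, "docs":[ ... ]}},
           ... ]}}}

Each entry in groups is one bookName bag, which you can render as a unit in
the UI.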

With Regards
Aman Tandon

On Sun, Dec 28, 2014 at 10:29 PM, S.L simpleliving...@gmail.com wrote:

 Hi All,

 I have a use case where I need to group documents that share a field
 called bookName, meaning if there are multiple documents with the same
 bookName value and the user's input is matched by a query on bookName,
 I need to be able to group all the documents with the same bookName together,
 so that I can display them as a group in the UI.

 What kind of support does Solr provide for such a scenario, and how should
 I change my schema.xml, which has bookName as a single-valued text
 field?

 Thanks.



Re: Multi Language Suggester Solr Issue

2014-12-28 Thread alaa.abuzaghleh
thanks, it works for me



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-Language-Suggester-Solr-Issue-tp4176075p4176324.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to implement multi-set in a Solr schema.

2014-12-28 Thread Meraj A. Khan
Thanks Aman. The thing is, the bookName field values are not exactly
identical, but nearly identical, so at indexing time I need to
figure out which other book name this one is similar to, using NLP
techniques, and then put it in the appropriate bag, so that at retrieval
time I retrieve all the elements from that bag if any one
element matches the search query.

Thanks.
On Dec 28, 2014 1:54 PM, Aman Tandon amantandon...@gmail.com wrote:

 Hi,

 You can use grouping in Solr. You can do this either via the query or via
 solrconfig.xml.

 *A) via query*

 http://localhost:8983?your_query_params&group=true&group.field=bookName

 You can limit the size of each group (how many documents you want to show);
 suppose you want to show 5 documents per group on this bookName field, then
 you can specify the parameter group.limit=5.

 *B) via solrconfig*

 <str name="group">true</str> <str name="group.field">bookName</str>
 <str name="group.ngroups">true</str> <str name="group.truncate">true</str>

 With Regards
 Aman Tandon

 On Sun, Dec 28, 2014 at 10:29 PM, S.L simpleliving...@gmail.com wrote:

  Hi All,

  I have a use case where I need to group documents that share a field
  called bookName, meaning if there are multiple documents with the same
  bookName value and the user's input is matched by a query on bookName,
  I need to be able to group all the documents with the same bookName
  together, so that I can display them as a group in the UI.

  What kind of support does Solr provide for such a scenario, and how
  should I change my schema.xml, which has bookName as a single-valued
  text field?

  Thanks.
 



RE: Solr performance issues

2014-12-28 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote:
 We've installed a cluster of one collection of 350M documents on 3
 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is
 about 1.1TB, and the maximum volume size on Amazon is 1TB, so we added 2 SSD EBS
 General Purpose volumes (1x1TB + 1x500GB) on each instance. Then we created a
 1.5TB logical volume using LVM to fit our index.

Your search speed will be limited by the slowest storage in your group, which 
would be your 500GB EBS. The General Purpose SSD option means (as far as I can 
read at http://aws.amazon.com/ebs/details/#piops) that your baseline of 3 
IOPS/GB gives 3 x 500 = 1500 IOPS, with bursts of 3000 IOPS. Unfortunately they 
do not say anything about latency.

For comparison, I checked the system logs from a local test with our 21TB / 7 
billion documents index. It used ~27,000 IOPS during the test, with mean search 
time a bit below 1 second. That was with ~100GB RAM for disk cache, which is 
about ½% of index size. The test was with simple term queries (1-3 terms) and 
some faceting. Back of the envelope: 27,000 IOPS for 21TB is ~1300 IOPS/TB. 
Your indexes are 1.1TB, so 1.1*1300 IOPS ~= 1400 IOPS.

All else being equal (which is never the case), getting 1-3 second response 
times for a 1.1TB index, when one link in the storage chain is capped at a few 
thousand IOPS, you are using networked storage and you have little RAM for 
caching, does not seem unrealistic. If possible, you could try temporarily 
boosting performance of the EBS, to see if raw IO is the bottleneck.

 The response time is between 1 and 3 seconds for simple queries (1 token).

Is the index updated while you are searching?
Do you do any faceting or other heavy processing as part of a search?
How many hits does a search typically have and how many documents are returned?
How many concurrent searches do you need to support? How fast should the 
response time be?

- Toke Eskildsen


Re: Solr server becomes non-responsive.

2014-12-28 Thread Modassar Ather
Thanks Jack for your suggestions.

Regards,
Modassar

On Fri, Dec 26, 2014 at 6:04 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 Either you have too little RAM on each node or too much data on each node.

 You may need to shard the data much more heavily so that the total work on
 a single query is distributed in parallel to more nodes, each node having a
 much smaller amount of data to work on.

 First, always make sure that the entire Lucene index for each node fits
 entirely in the system memory available for file system caching. Otherwise
 the queries will be I/O bound. Check your current queries to see if that is
 the case - are the nodes compute bound or I/O bound? If I/O bound, add more
 system memory until the queries are no longer I/O bound. If compute bound,
 shard more heavily until the query latency becomes acceptable.
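
 One quick way to check (Linux, illustrative): run something like

     iostat -x 5

 on a node while queries are in flight. Sustained high %util and await on the
 index volumes points at I/O bound; high CPU with mostly idle disks points at
 compute bound.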



 -- Jack Krupansky

 On Fri, Dec 26, 2014 at 1:02 AM, Modassar Ather modather1...@gmail.com
 wrote:

  Thanks for your suggestions Erick.
 
  This may be one of those situations where you really have to
  push back at the users and understand why they insist on these
  kinds of queries. They must be very patient since it won't be
  very performant. That said, I've seen this pattern; there are
  certainly valid conditions under which response times can be
  many seconds if there are few users and they are doing very
  complex/expert-level things.
 
  We have tried educating the users but it did not work because they are used
  to the old way. They feel that wildcards give more control over the results
  and may not fully understand stemming.
 
  Regards,
  Modassar
 
  On Thu, Dec 25, 2014 at 3:17 AM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
   There's no magic bullet here that I know of. If your requirements
   are to support these huge, many-wildcard queries then you only
   have a few choices:

   1> redo the index. I was surprised at how little adding ngrams bloated
   the index as far as memory required is concerned (see the sketch after
   this list). The key here is that there really aren't very many unique
   terms. If you use bigrams, then there are only maybe 36^2 distinct
   combinations (assuming English and including numbers).

   2> increase the number of shards, putting many fewer docs
   on each shard.

   3> give each shard a lot more memory. This isn't actually one
   of my preferred solutions since GC issues may raise their ugly
   heads here.

   4> insert creative solution here.
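
   A minimal sketch of what 1> could look like (type name and parameters are
   placeholders, not a drop-in config):

       <fieldType name="text_bigram" class="solr.TextField">
         <analyzer>
           <tokenizer class="solr.StandardTokenizerFactory"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
         </analyzer>
       </fieldType>

   A search for abc is then analyzed into the bigrams ab and bc and needs no
   wildcard expansion at all; the trade-off is some false positives where the
   grams match but the substring doesn't.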
  
   This may be one of those situations where you really have to
   push back at the users and understand why they insist on these
   kinds of queries. They must be very patient since it won't be
   very performant. That said, I've seen this pattern; there are
   certainly valid conditions under which response times can be
   many seconds if there are few users and they are doing very
   complex/expert-level things.
  
   Now, all that said, wildcards are often examples of poor habits
   or habits learned in DB systems where the only hammer was
   %whatever%. I've seen situations where users didn't
   understand that Solr broke the input stream up into words. And
   stemmed. And WordDelimiterFilterFactory did all the magic
   for finding, say D.C. and DC. So it's worth looking at the actual
   queries that are sent, perhaps talking to users and understanding
   what they _want_ out of the system, then perhaps educating them
   as to better ways to get what they want.
  
   Literally I've seen people insist on entering queries that
   wildcarded _everything_ both pre and post wildcards because
   they didn't realize that Solr tokenizes...
  
   Once you hit an OOM, all bets are off as Shawn outlined.
  
   Best,
   Erick
  
   On Wed, Dec 24, 2014 at 1:57 AM, Modassar Ather 
 modather1...@gmail.com
   wrote:
Thanks for your response.
   
 How many items in the collection ?
 There are about 100 million documents.

 How are the caches configured in solrconfig.xml ?
 Each cache has a size attribute of 128.

 Can you provide a sample of the query ?
 Does it fail immediately after SolrCloud startup or after several hours ?
 It is a query with many terms (more than a thousand) and phrases, where
 the phrases contain many wildcards.
 Once such a query is executed there are many ZooKeeper-related exceptions,
 and after a couple of such queries it goes to OutOfMemory.
   
Thanks,
Modassar
   
   
On Wed, Dec 24, 2014 at 1:37 PM, Dominique Bejean 
   dominique.bej...@eolya.fr
wrote:
   
 And you didn't say how much RAM is on each server?
   
2014-12-24 8:17 GMT+01:00 Dominique Bejean 
 dominique.bej...@eolya.fr
  :
   
 Modassar,

  How many items in the collection ?
  I mean, how many documents per collection ? 1 million, 10 million, …?

  How are the caches configured in solrconfig.xml ?
  What is the size attribute value for each cache ?

 Can you provide a sample of the query ?
 Does it fail 

Re: Loading data to FieldValueCache

2014-12-28 Thread Manohar Sripada
Erick,

I am trying to do a premature optimization. *There will be no updates to my
index, so no worries about ageing out or garbage collection.*
Let me check my understanding: when we talk about the filterCache, it
just stores the document IDs in the cache, right?

And my setup is as follows. There are 16 nodes in my SolrCloud, each having
64 GB of RAM, out of which I am allocating 45 GB to Solr. I have a
collection (say Products, which contains around 100 million docs) which I
created with 64 shards, replication factor 2, and 8 shards per node. Each
shard gets around 1.6 million documents. So my math here for the
filterCache for a specific filter will be:


   - an average filter query will be 20 bytes, so 1000 (distinct number of
   states) x 20 bytes = 20 KB
   - and considering that the union of doc IDs for all the values of a given
   filter equals the total number of doc IDs present in the index: there are
   1.6 million documents in a Solr core, so 1,600,000 x 8 bytes (for each
   doc ID) equals 12.8 MB
   - there will be 8 Solr cores per node: 8 x 12.8 MB = *102 MB*

This is the size of the cache for a single filter on a single node. Considering
the heap size I have given, I think this shouldn't be an issue.

Thanks,
Manohar

On Fri, Dec 26, 2014 at 10:56 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Manohar:

 Please approach this cautiously. You state that you have hundreds of
 states.
 Every 100 states will use roughly 1.2G of your filter cache. Just for this
 field. Plus it'll fill up the cache and they may soon be aged out anyway.
 Can you really afford the space? Is it really a problem that needs to be
 solved at this point? This _really_ sounds like premature optimization
 to me as you haven't
 demonstrated that there's an actual problem you're solving.

 OTOH, of course, if you're experimenting to better understand all the
 ins and outs
 of the process that's another thing entirely ;)

 Toke:

 I don't know the complete algorithm, but if the number of docs that
 satisfy the fq is small enough, then just the internal Lucene doc IDs are
 stored rather than a bitset. What exactly small enough is,
 I don't know off the top of my head. And I've got to assume looking
 stuff up in a list is slower than indexing into a bitset, so I suspect
 small enough is very small.

 On Fri, Dec 26, 2014 at 3:00 AM, Manohar Sripada manohar...@gmail.com
 wrote:
  Thanks Toke for the explanation, I will experiment with
  f.state.facet.method=enum
 
  Thanks,
  Manohar
 
  On Fri, Dec 26, 2014 at 4:09 PM, Toke Eskildsen t...@statsbiblioteket.dk
  wrote:
 
  Manohar Sripada [manohar...@gmail.com] wrote:
   I have 100 million documents in my index. The maxDoc here is the maximum
   number of documents in each shard, right? How is it determined that each
   entry will occupy approximately maxDoc/8?
 
  Assuming that it is random whether a document is part of the result set
 or
  not, the most efficient representation is 1 bit/doc (this is often
 called a
  bitmap or bitset). So the total number of bits will be maxDoc, which is
 the
  same as maxDoc/8 bytes.
 
  Of course, result sets are rarely random, so it is possible to have
 other
  and more compact representations. I do not know how that plays out in
  Lucene. Hopefully somebody else can help here.
 
    If I have to add facet.method=enum to the query every time, how should I
    specify it for each field separately?
 
  f.state.facet.method=enum
 
  See https://wiki.apache.org/solr/SimpleFacetParameters#Parameters
 
  - Toke Eskildsen
 



Re: How to implement multi-set in a Solr schema.

2014-12-28 Thread Jack Krupansky
You can also use group.query or group.func to group documents matching a
query, or by unique values of a function query. For the latter you could
implement an NLP algorithm.
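
For example (request sketched, not tested):

    ...&group=true&group.query=bookName:"harry potter"&group.query=bookName:"the hobbit"

returns one doclist per group.query, while group.func groups documents by the
value a function query produces for each of them.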


-- Jack Krupansky

On Sun, Dec 28, 2014 at 5:56 PM, Meraj A. Khan mera...@gmail.com wrote:

 Thanks Aman. The thing is, the bookName field values are not exactly
 identical, but nearly identical, so at indexing time I need to
 figure out which other book name this one is similar to, using NLP
 techniques, and then put it in the appropriate bag, so that at retrieval
 time I retrieve all the elements from that bag if any one
 element matches the search query.

 Thanks.
 On Dec 28, 2014 1:54 PM, Aman Tandon amantandon...@gmail.com wrote:

  Hi,

  You can use grouping in Solr. You can do this either via the query or via
  solrconfig.xml.

  *A) via query*

  http://localhost:8983?your_query_params&group=true&group.field=bookName

  You can limit the size of each group (how many documents you want to show);
  suppose you want to show 5 documents per group on this bookName field, then
  you can specify the parameter group.limit=5.

  *B) via solrconfig*

  <str name="group">true</str> <str name="group.field">bookName</str>
  <str name="group.ngroups">true</str> <str name="group.truncate">true</str>
 
  With Regards
  Aman Tandon
 
  On Sun, Dec 28, 2014 at 10:29 PM, S.L simpleliving...@gmail.com wrote:
 
   Hi All,

   I have a use case where I need to group documents that share a field
   called bookName, meaning if there are multiple documents with the same
   bookName value and the user's input is matched by a query on bookName,
   I need to be able to group all the documents with the same bookName
   together, so that I can display them as a group in the UI.

   What kind of support does Solr provide for such a scenario, and how
   should I change my schema.xml, which has bookName as a single-valued
   text field?

   Thanks.
  
 



Re: solr export get wrong results

2014-12-28 Thread Sandy Ding
Hi, Joel

Thanks for your reply.
It seems that the weird export results is because that I removed the str
namexsort/str invariant of the export request handler in the default
sorlconfig.xml to get csv-format output.
I don't quite understand the meaning of xsort, but I removed it because I
always get json response (as you said) with the xsort invariant.
Is there a way to get a csv output using export?
And also, can I get full results from all shards? (I tried to set
distrib=true but get SyntaxError:xport RankQuery is required for xsort:
rq={!xport}, and I do have rq={!xport} in the export invariants)


2014-12-27 3:21 GMT+08:00 Joel Bernstein joels...@gmail.com:

 Hi Sandy,

 I pulled Solr 4.10.3 to see if I could recreate the issue you are seeing
 with export, and I wasn't able to reproduce the bug. For
 example, the following query:

 http://localhost:8983/solr/collection1/export?q=join_i:[50 TO 500010]&wt=json&indent=true&sort=join_i+asc&fl=join_i,ShopId_i

 brings back the following result:

 {"responseHeader":{"status":0},"response":{"numFound":11,
 "docs":[{"join_i":50,"ShopId_i":578917},{"join_i":51,"ShopId_i":294217},{"join_i":52,"ShopId_i":199805},{"join_i":53,"ShopId_i":633461},{"join_i":54,"ShopId_i":472995},{"join_i":55,"ShopId_i":672122},{"join_i":56,"ShopId_i":394637},{"join_i":57,"ShopId_i":446443},{"join_i":58,"ShopId_i":697329},{"join_i":59,"ShopId_i":166988},{"join_i":500010,"ShopId_i":191261}]}}

 Notice the join_i values are all within the correct range.

 If you can post the export handler configuration, we should be able to
 see the issue.
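
 For reference, the stock definition in the 4.10 example solrconfig.xml looks
 roughly like this (quoted from memory, so treat it as a sketch):

     <requestHandler name="/export" class="solr.SearchHandler">
       <lst name="invariants">
         <str name="rq">{!xport}</str>
         <str name="wt">xsort</str>
         <str name="distrib">false</str>
       </lst>
       <arr name="components">
         <str>query</str>
       </arr>
     </requestHandler>

 All three invariants matter: rq={!xport} installs the streaming rank query,
 wt=xsort selects the streaming response writer, and distrib=false keeps each
 request on a single core.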


 Joel Bernstein
 Search Engineer at Heliosearch

 On Fri, Dec 26, 2014 at 1:50 PM, Joel Bernstein joels...@gmail.com
 wrote:

  Hi Sandy,
 
  The export handler should only return documents in JSON format. The
  results in your second example are in XML format, so something looks to
  be wrong in the configuration. Can you post what your solrconfig looks
  like?
 
  Joel
 
  Joel Bernstein
  Search Engineer at Heliosearch
 
  On Fri, Dec 26, 2014 at 12:43 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  I think you missed a very important part of Jack's reply:
 
  bq: I notice that you don't have distrib=false on your select, which
  would make your select be from all nodes, while export would only be
  docs from the specific node you sent the request to.
 
  And from the Reference Guide on export
 
  bq: The initial release treats all queries as non-distributed
  requests. So the client is responsible for making the calls to each
  Solr instance and merging the results.
 
  So the export statement you're sending is _only_ exporting the results
  from the shard on 8983 and completely ignoring the other (6?) shards,
  whereas the query you're sending is getting the results from all the
  shards.
 
  As Jack said, add distrib=false to the query, send it to the same
  shard you send the export command to and the results should match.
 
  Also, be sure your configuration for the /select handler doesn't have
  any additional default parameters that might alter the results, but I
  doubt that's really a problem here.
 
  Best,
  Erick
 
  On Fri, Dec 26, 2014 at 7:02 AM, Ahmet Arslan iori...@yahoo.com.invalid
 
  wrote:
   Hi,
  
   Do you have any custom solr components deployed? May be custom
 response
  writer?
  
   Ahmet
  
  
  
  
   On Friday, December 26, 2014 3:26 PM, Sandy Ding 
  sandy.ding...@gmail.com wrote:
   Hi, Ahmet,
  
    I use libuuid for unique IDs and I guess there shouldn't be duplicate IDs.
    Also, the results are not just incomplete, they are completely wrong.
  
  
   2014-12-26 20:19 GMT+08:00 Ahmet Arslan iori...@yahoo.com.invalid:
  
   Hi,
  
    Two different things:

    If you have a unique key defined, documents with the same id overwrite
    one another within a single shard.

    Plus, unique IDs are expected to be unique across shards.
  
   Ahmet
  
  
  
   On Friday, December 26, 2014 11:00 AM, Sandy Ding 
  sandy.ding...@gmail.com
   wrote:
   Hi, all
  
    I've recently set up a Solr cluster and found that export returns
    different results from select.
    And I confirmed that the export results are wrong by manually querying
    the results.
    Even simple queries like the following will get different results:

    curl "http://localhost:8983/solr/pa_info/select?q=*:*&fl=id&sort=id+desc":

    <response><lst name="responseHeader"><int name="status">0</int><int
    name="QTime">11</int><lst name="params"><str name="sort">id
    desc</str><str name="fl">id</str><str name="q">*:*</str></lst></lst><result
    name="response" *numFound="1197"* start="0"><doc>...</doc></result>

    curl "http://localhost:8983/solr/pa_info/export?q=*:*&fl=id&sort=id+desc":

    {*"numFound":172*, "docs":[..]}

    Don't have a clue why this happens! Anyone help?
  
   Don't have a clue why this happen! Anyone help?
  
   Best,
   Sandy