Re: Documents cannot be searched immediately when indexed using REST API with Solr Cloud

2015-03-19 Thread Liu Bo
Hi Edwin

Please review your commit/soft-commit configuration.
"soft commits are about visibility, hard commits are about durability"
  -- by a wise man. :)

If you are doing NRT indexing and searching, you probably need a short soft
commit interval, or to commit explicitly in your request handler. Be advised
that these strategies and configurations need to be tested and adjusted
according to your data size and your search and index update frequency.

You should be able to find the answer yourself here:
http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
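
For NRT the usual pattern is a frequent soft commit plus a less frequent hard
commit in solrconfig.xml, something like the sketch below (the intervals are
only examples; tune them to your own load):

  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit: durability -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>1000</maxTime>            <!-- soft commit: visibility -->
  </autoSoftCommit>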

All the best

Liu Bo

On 19 March 2015 at 17:54, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

 Hi,

 I'm using Solr Cloud now, with 2 shards known as shard1 and shard2, and
 when I try to index rich-text documents using the REST API or the default
 Documents module in the Solr Admin UI, the documents that are indexed do not
 appear immediately when I do a search. They only appear after I restart
 the Solr services (both shard1 and shard2).

 However, the same issue does not happen when I index the same documents using
 post.jar, and I can search for the indexed documents immediately.

 Here's my ExtractingRequestHandler in solrconfig.xml.

   <requestHandler name="/update/extract"
                   class="solr.extraction.ExtractingRequestHandler">
     <lst name="defaults">
       <str name="lowernames">true</str>
       <str name="uprefix">ignored_</str>

       <!-- capture link hrefs but ignore div attributes -->
       <str name="captureAttr">true</str>
       <str name="fmap.a">links</str>
       <str name="fmap.div">ignored_</str>
     </lst>
   </requestHandler>

 What could be the reason this is happening, and are there any solutions
 for it?

 Regards,
 Edwin



Re: Where to specify numShards when starting up a cloud setup

2014-04-18 Thread Liu Bo
Hi zzT

Putting numShards in core.properties also works.

I struggled a little bit while figuring out this configuration approach.
I knew I was not alone! ;-)
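
For reference, a core.properties along these lines worked for me (the names
and the shard count are just examples):

  name=collection1_shard1_replica1
  collection=collection1
  shard=shard1
  numShards=2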


On 2 April 2014 18:06, zzT zis@gmail.com wrote:

 It seems that I've figured out a configuration approach to this issue.

 I'm having the exact same issue, and the only viable solutions found on the
 net till now are
 1) Pass -DnumShards=x when starting up the Solr server
 2) Use the Collections API as indicated by Shawn.

 What I've noticed though - after making the call to /collections to create
 a collection - is that a new core entry is added inside solr.xml with the
 attribute numShards.

 So, right now I'm configuring solr.xml with the numShards attribute inside my
 core nodes. This way I don't have to worry about the annoying stuff you've
 already mentioned, e.g. waiting for Solr to start up etc.

 Of course the same logic applies here: the numShards param is meaningful only
 the first time. Even if you change it at a later point, the # of shards stays
 the same.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Where-to-specify-numShards-when-startup-up-a-cloud-setup-tp4078473p4128566.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
All the best

Liu Bo


Re: Multiple Languages in Same Core

2014-03-26 Thread Liu Bo
Hi Jeremy

There are a lot of multi-language discussions; the two main approaches are:
 1. like yours, one core per language
 2. all in one core, where each language has its own fields.

We have multi-language support in a single core; each multilingual field
has its own suffix, such as name_en_US. We customized the query handler to
hide the query details from the client.
The main reason we want to do this is NRT indexing and search.
Take products for example:

a product has price and quantity, which are common fields used for filtering
and sorting, while name and description are multi-language fields.
If we split products into different cores, an update to the common fields
may end up as an update in all of the multi-language cores.

As to scalability, we don't change Solr cores/collections when a new
language is added, but we probably need to update our customized index
process and run a full re-index.

This approach suits our requirements for now, but you may have your own
concerns.

We have a similar suggest-filter problem to yours: we want to return
suggest results filtered by store. I can't find a way to build the
dictionary from a query in my version of Solr, 4.6.

What I do is run a query on an N-gram analyzed field with filter queries
on the store_id field. The suggest is actually a query. It may not perform
as well as the suggester, but it can do the trick.

You can try building an additional N-gram field for suggestion only and
searching on it with an fq on your Locale field.
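
As a rough sketch (the field type, field names, and locale value below are
assumptions for illustration, not from a real schema), the N-gram field and
the filtered suggest query could look like:

  <fieldType name="text_suggest_ngram" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  /select?q=suggest_ngram:chai&fq=locale:en&rows=10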

All the best

Liu Bo




On 25 March 2014 09:15, Alexandre Rafalovitch arafa...@gmail.com wrote:

 Solr In Action has a significant discussion on the multi-lingual
 approach. They also have some code samples out there. Might be worth a
 look.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson
 jer...@thomersonfamily.com wrote:
  I recently deployed Solr to back the site search feature of a site I work
  on. The site itself is available in hundreds of languages. With the
 initial
  release of site search we have enabled the feature for ten of those
  languages. This is distributed across eight cores, with two Chinese
  languages plus Korean combined into one CJK core and each of the other
  seven languages in their own individual cores. The reason for splitting
  these into separate cores was so that we could have the same field names
  across all cores but have different configuration for analyzers, etc, per
  core.
 
  Now I have some questions on this approach.
 
  1) Scalability: Considering I need to scale this to many dozens more
  languages, perhaps hundreds more, is there a better way so that I don't end
  up needing dozens or hundreds of cores? My initial plan was that many
  languages that didn't have special support within Solr would simply get
  lumped into a single "default" core that has some default analyzers that
  are applicable to the majority of languages.
 
  1b) Related to this: is there a practical limit to the number of cores
 that
  can be run on one instance of Lucene?
 
  2) Auto Suggest: In phase two I intend to add auto-suggestions as a user
  types a query. In reviewing how this is implemented and how the
 suggestion
  dictionary is built I have concerns. If I have more than one language in
 a
  single core (and I keep the same field name for suggestions on all
  languages within a core) then it seems that I could get suggestions from
  another language returned with a suggest query. Is there a way to build a
  separate dictionary for each language, but keep these languages within
 the
  same core?
 
  If it's helpful to know: I have a field in every core for "Locale". Values
  will be the locale of the language of that document, i.e. "en", "es",
  "zh_hans", etc. I'd like to be able to: 1) when building a suggestion
  dictionary, divide it into multiple dictionaries, grouping them by
 locale,
  and 2) supply a parameter to the suggest query that allows the suggest
  component to only return suggestions from the appropriate dictionary for
  that locale.
 
  If the answer to #1 is "keep splitting groups of languages that have
  different analyzers into their own cores" and the answer to #2 is "that's
  not supported", then I'd be curious: where would I start to write my own
  extension that supported #2? I looked last night at the suggest lookup
  classes, dictionary classes, etc. But I didn't see a clear point where it
  would be clean to implement something like I'm suggesting above.
 
  Best Regards,
  Jeremy Thomerson




-- 
All the best

Liu Bo


Re: Grouping results with group.limit returns wrong numFound?

2014-01-01 Thread Liu Bo
Hi Ahmet

I've thought about using group.ngroups=true, but when you use
group.main=true, there's no ngroups field in the response.

And according to http://wiki.apache.org/solr/FieldCollapsing, the result
might not be correct in SolrCloud.

I don't like using facets for this but it seems I have to...
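
If you can live without group.main=true, ngroups does show up in the grouped
response format, e.g. (field name borrowed from this thread):

  group=true&group.field=publisher&group.limit=3&group.ngroups=true

Just keep in mind the SolrCloud caveat above: ngroups is only accurate when
all documents of a group live in the same shard.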


On 1 January 2014 00:35, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Tasmaniski,

 I don't follow. How come Liu's faceting workaround and ngroups=true
 produce different results?






 On Tuesday, December 31, 2013 6:08 PM, tasmaniski tasmani...@gmail.com
 wrote:
 @kamaci
 Of course. That is the problem.

 group.limit is: the number of results (documents) to return for each
 group.
 numFound is the total number found, but *not* the sum of the number returned
 for each group.

 @Liu Bo
 That seems to be the only workaround for the problem, but
 it's too expensive to go through all the groups and calculate the total
 number found/returned (I use PHP for the client :) ).

 @iorixxx
 Yes, I considered that (group.ngroups=true),
 but in some groups the number of found results is lesser than the limit.


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174p4108906.html

 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
All the best

Liu Bo


Re: Chaining plugins

2013-12-31 Thread Liu Bo
Hi

I've done similar things as Paul.

What I do is extend the default QueryComponent and override the
prepare() method;

then I just change the SolrParams according to our logic and call
super.prepare(). Then I replace the default QueryComponent with it in my
search/query handler.

In this way, nothing of Solr's default behavior is touched. I think you can
do your logic in the prepare method, and then let Solr proceed with the search.

I've tested it along with other components in both a single Solr node and
SolrCloud. It works fine.
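
A minimal sketch of this approach, in the context of this thread's
"save the query when logTofile=1" example (the class name is made up, and I
sketch the side effect as slf4j logging rather than actual file writing):

  import java.io.IOException;
  import org.apache.solr.common.params.CommonParams;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.handler.component.QueryComponent;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public class LoggingQueryComponent extends QueryComponent {
    private static final Logger log = LoggerFactory.getLogger(LoggingQueryComponent.class);

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      SolrParams params = rb.req.getParams();
      if (params.get("logTofile") != null) {
        // side effect only -- the search result itself is untouched
        log.info("query: {}", params.get(CommonParams.Q));
      }
      super.prepare(rb); // the default QueryComponent does the real work
    }
  }

Then register it in solrconfig.xml in place of the default "query" component.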

Hope it helps

Cheers

Bold



On 31 December 2013 06:03, Chris Hostetter hossman_luc...@fucit.org wrote:


 You don't need to write your own handler.

  See the previous comment about implementing a SearchComponent -- you can
  check for the params in your prepare() method and do whatever side effects
  you want, then register your custom component and hook it into the
  component chain of whatever handler configuration you want (either using
  the "components" arr or by specifying it as a first-components...


 https://cwiki.apache.org/confluence/display/solr/RequestHandlers+and+SearchComponents+in+SolrConfig

 : I want to save the query into a file when a user is changing a parameter
 in
 : the query, lets say he adds logTofile=1 then the searchHandler will
 : provide the same result as without this parameter, but in the background
 it
 : will do some logic(ex. save the query to file) .
 : But I dont want to touch solr source code, all I want is to add code(like
 : plugin). if i understand it right I want to write my own search handler
 , do
 : some logic , then pass the data to solr default search handler.




 -Hoss
 http://www.lucidworks.com/




-- 
All the best

Liu Bo


Re: Grouping results with group.limit returns wrong numFound?

2013-12-31 Thread Liu Bo
Hi

I've met the same problem, and I've googled around but haven't found a direct
solution.

But there's a workaround: do a facet on your group field, with parameters
like

   <str name="facet">true</str>
   <str name="facet.field">your_field</str>
   <str name="facet.limit">-1</str>
   <str name="facet.mincount">1</str>

and then count how many facet pairs are in the response. This should be the
same as the number of documents after grouping.
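
In SolrJ (4.x), the counting side of this workaround could look roughly like
this (the core URL and field name are placeholders):

  SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
  SolrQuery q = new SolrQuery("*:*");
  q.setFacet(true);
  q.addFacetField("your_field");
  q.setFacetLimit(-1);       // all values
  q.setFacetMinCount(1);     // only values that actually occur
  QueryResponse rsp = server.query(q);
  // one facet value per distinct group = the post-grouping document count
  int numGroups = rsp.getFacetField("your_field").getValueCount();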

Cheers

Bold




On 31 December 2013 06:40, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;

 group.limit is: the number of results (documents) to return for each group.
 Defaults to 1. Did you check the page here:
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604232

 Thanks;
 Furkan KAMACI


 On Wednesday, 25 December 2013, tasmaniski tasmani...@gmail.com
 wrote:
  Hi All, when I perform a search with grouping of results into groups and
  limit the results in one group, I see that *numFound* is the same as if I
  didn't use the limit. Looks like Solr first performs the search and
  calculates numFound, and then groups and limits the results. I do not know
  if this is a bug or a feature :) But I cannot use pagination and other
  stuff. Is there any workaround, or have I missed something? Example: I want
  to search book titles and limit the search to 3 results per publisher:
  q=book_title:solr php&group=true&group.field=publisher&group.limit=3&group.main=true
  I have 20 results for the apress publisher but I show only 3 - that works
  OK. But in numFound I still have 20 for the apress publisher...
 
 
 
  --
  View this message in context:

 http://lucene.472066.n3.nabble.com/Grouping-results-with-group-limit-return-wrong-numFound-tp4108174.html
  Sent from the Solr - User mailing list archive at Nabble.com.




-- 
All the best

Liu Bo


Re: PostingsSolrHighlighter

2013-12-18 Thread Liu Bo
Hi Josip

For your first question, we've done similar things: copying search fields to
a text field. But highlighting is normally on specific fields such as title;
depending on how the search content is displayed on the front end, you can
search on text and highlight the field you want by specifying hl.fl.

ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl
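
For example (assuming "searchable_text" is the catch-all copy field and
"title" is a stored field you display), a request along these lines searches
one field and highlights another:

  /select?q=searchable_text:labore&hl=true&hl.fl=title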


On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:

 Hi @all,

 I am playing with the PostingsSolrHighlighter. I'm running Solr 4.6.0
 and my configuration is from here:

 https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/PostingsSolrHighlighter.html

 Search query and result (not working):

 http://pastebin.com/13Uan0ZF

 Schema (not complete):

 http://pastebin.com/JGa38UDT

 Search query and result (working):

 http://pastebin.com/4CP8XKnr

 Solr config:

 <searchComponent class="solr.HighlightComponent" name="highlight">
   <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
 </searchComponent>

 So this is working just fine, but now I have some questions:

 1.) With the old default highlighter component it was possible to search
 in "searchable_text" and to retrieve highlighted "text". This is essential,
 because we use copyField to put almost everything into "searchable_text"
 (title, subtitle, description, ...)

 2.) I can't get ellipsis working. I tried hl.tag.ellipsis=...,
 f.text.hl.tag.ellipsis=..., and configuring it in the RequestHandler; nothing
 seems to work. maxAnalyzedChars is just cutting the sentence?

 Kind Regards

 Josip Delic




-- 
All the best

Liu Bo


Re: an array-like string is treated as multivalued when adding a doc to Solr

2013-12-18 Thread Liu Bo
Hi Alexandre

It's quite a rare case, just one out of tens of thousands.

I'm planning to make every multilingual field multivalued and just take
the first value when formatting the response into our business object.

The first-value update processor seems very helpful, thank you.
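
In case it helps someone else, a minimal sketch of such a chain in
solrconfig.xml (the chain name and field name are just examples):

  <updateRequestProcessorChain name="first-value">
    <processor class="solr.FirstFieldValueUpdateProcessorFactory">
      <str name="fieldName">name_en_US</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>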

All the best

Liu Bo


On 18 December 2013 15:26, Alexandre Rafalovitch arafa...@gmail.com wrote:

 If this happens rarely and you want to deal with it on the way into Solr,
 you could just keep one of the values, using a URP:

 http://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html

 Regards,
Alex

 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Wed, Dec 18, 2013 at 2:20 PM, Liu Bo diabl...@gmail.com wrote:

  Hey Furkan and Solr users
 
  This was a misreported problem. It's not a Solr problem but a data issue
  on our side. Sorry for this.
 
  A coupon happened to have two pieces of English description, which is not
  allowed in our business logic, but it happened, and we added name_en_US
  twice to the Solr document.
 
  I've done a set of tests and deep debugging into the Solr source code, and
  found out that an array-like string such as "[Get 20% Off Official Barca
  Kits, coupon]" won't be treated as a multivalued field.
 
  Sorry again for not digging more before sending out the question email. I
  trust our business logic and data integrity more than Solr; I will
  definitely not do this again. ;-)
 
  All the best
 
  Liu Bo
 
 
 
  On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote:
 
   Hi Liu;
  
    Yes, it is expected behavior. If you send data within square brackets,
    Solr will treat it as a multivalued field. You can test it this way:
    if you use SolrJ and use a List for a field, it will be considered
    multivalued too, because when you call the toString() method of your List
    you can see that the elements are printed within square brackets. This is
    the reason that a List can be used for a multivalued field.
  
   If you explain your situation I can offer a way how to do it.
  
   Thanks;
   Furkan KAMACI
  
  
   2013/12/6 Liu Bo diabl...@gmail.com
  
Dear solr users:
   
    I've met this kind of error several times:

    when adding an array-like string such as "[Get 20% Off Official Barça
    Kits, coupon]" to a multiValued="false" field, Solr will complain:

    org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692]
    multiple values encountered for non multiValued field name_en_US: [Get
    20% Off Official Barca Kits, coupon]

    my schema definition:
    <field name="name_en_US" type="text_en" indexed="true" stored="true"
           multiValued="false" />

    This field is stored because the search result needs this field and its
    value in the original format, and indexed to give it a boost while
    searching.

    What I do is add the name (java.lang.String) to the SolrInputDocument by
    the addField("name_en_US", product.getName()) method, and then add this
    to Solr using an AddUpdateCommand.

    It seems Solr treats this kind of string data as multivalued, even though
    I add this field to Solr only once.

    Is this a bug or expected behavior?

    Is there any way to tell Solr this is not a multivalued value and not to
    break it up?

    Your help and suggestions will be much appreciated.
   
--
All the best
   
Liu Bo
   
  
 
 
 
  --
  All the best
 
  Liu Bo
 




-- 
All the best

Liu Bo


Re: PostingsSolrHighlighter

2013-12-18 Thread Liu Bo
Hi Josip

That's quite weird; in my experience highlighting is strict on string fields,
which need an exact match, but text fields should be fine.

I copied your schema definition and did a quick test in a new core; everything
else is default from the tutorial, and the search component is using
solr.HighlightComponent.

Searching on searchable_text can highlight "text". I copied your search URL
and just changed the host part; the input parameters are exactly the same.

The result is attached.

Can you upload your complete solrconfig.xml and schema.xml?


On 18 December 2013 19:02, Josip Delic j...@lugensa.com wrote:

 On 18.12.2013 09:55, Liu Bo wrote:

 hi Josip


 hi liu,


  For your first question, we've done similar things: copying search fields
  to a text field. But highlighting is normally on specific fields such as
  title; depending on how the search content is displayed on the front end,
  you can search on text and highlight the field you want by specifying hl.fl.

  ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl


 that's exactly what I'm doing in that pastebin:

 http://pastebin.com/13Uan0ZF

 I'm searching there for 'q=searchable_text:labore'; this is present in 'text'
 and in the copyfield 'searchable_text', but it is not highlighted in 'text'
 (hl.fl=text).

 The same query is working if I set 'q=text:labore', as you can see in

 http://pastebin.com/4CP8XKnr

 For the second question, I figured out that the PostingsSolrHighlighter
 ellipsis is not, as I thought, for adding an ellipsis at the start or/and end
 of the highlighted text. It is instead used to combine multiple snippets
 together if snippets > 1.

 cheers

 josip




 On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:

  Hi @all,

  I am playing with the PostingsSolrHighlighter. I'm running Solr 4.6.0
  and my configuration is from here:

  https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/PostingsSolrHighlighter.html

 Search query and result (not working):

 http://pastebin.com/13Uan0ZF

 Schema (not complete):

 http://pastebin.com/JGa38UDT

 Search query and result (working):

 http://pastebin.com/4CP8XKnr

 Solr config:

  <searchComponent class="solr.HighlightComponent" name="highlight">
    <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
  </searchComponent>

  So this is working just fine, but now I have some questions:

  1.) With the old default highlighter component it was possible to search
  in "searchable_text" and to retrieve highlighted "text". This is essential,
  because we use copyField to put almost everything into "searchable_text"
  (title, subtitle, description, ...)

  2.) I can't get ellipsis working. I tried hl.tag.ellipsis=...,
  f.text.hl.tag.ellipsis=..., and configuring it in the RequestHandler;
  nothing seems to work. maxAnalyzedChars is just cutting the sentence?

 Kind Regards

 Josip Delic









-- 
All the best

Liu Bo
http://localhost:8080/solr/try/select?wt=json&fl=text%2Cscore&hl=true&hl.fl=text&q=%28searchable_text%3Alabore%29&rows=10&sort=score+desc&start=0

{
  "responseHeader": {
    "status": 0,
    "QTime": 36,
    "params": {
      "sort": "score desc",
      "fl": "text,score",
      "start": "0",
      "q": "(searchable_text:labore)",
      "hl.fl": "text",
      "wt": "json",
      "hl": "true",
      "rows": "10"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "text": "Lorem ipsum dolor sit amet, consetetur sadipscing
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores
et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum
dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed
diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."
      },
      {
        "text": "Lorem ipsum dolor sit amet, consetetur sadipscing
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores
et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum
dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed
diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed
diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet."
      },
      {
        "text": "Lorem ipsum dolor sit amet, consetetur sadipscing
elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna
aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores
et ea rebum. Stet clita kasd gubergren, no sea takimata

Re: an array-like string is treated as multivalued when adding a doc to Solr

2013-12-17 Thread Liu Bo
Hey Furkan and Solr users

This was a misreported problem. It's not a Solr problem but a data issue on
our side. Sorry for this.

A coupon happened to have two pieces of English description, which is not
allowed in our business logic, but it happened, and we added name_en_US twice
to the Solr document.

I've done a set of tests and deep debugging into the Solr source code, and
found out that an array-like string such as "[Get 20% Off Official Barca Kits,
coupon]" won't be treated as a multivalued field.

Sorry again for not digging more before sending out the question email. I
trust our business logic and data integrity more than Solr; I will definitely
not do this again. ;-)

All the best

Liu Bo



On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi Liu;

 Yes, it is expected behavior. If you send data within square brackets,
 Solr will treat it as a multivalued field. You can test it this way:
 if you use SolrJ and use a List for a field, it will be considered
 multivalued too, because when you call the toString() method of your List
 you can see that the elements are printed within square brackets. This is
 the reason that a List can be used for a multivalued field.

 If you explain your situation I can offer a way how to do it.

 Thanks;
 Furkan KAMACI


 2013/12/6 Liu Bo diabl...@gmail.com

  Dear solr users:
 
  I've met this kind of error several times:

  when adding an array-like string such as "[Get 20% Off Official Barça Kits,
  coupon]" to a multiValued="false" field, Solr will complain:

  org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692]
  multiple values encountered for non multiValued field name_en_US: [Get 20%
  Off Official Barca Kits, coupon]

  my schema definition:
  <field name="name_en_US" type="text_en" indexed="true" stored="true"
         multiValued="false" />

  This field is stored because the search result needs this field and its
  value in the original format, and indexed to give it a boost while
  searching.

  What I do is add the name (java.lang.String) to the SolrInputDocument by
  the addField("name_en_US", product.getName()) method, and then add this to
  Solr using an AddUpdateCommand.

  It seems Solr treats this kind of string data as multivalued, even though I
  add this field to Solr only once.

  Is this a bug or expected behavior?

  Is there any way to tell Solr this is not a multivalued value and not to
  break it up?

  Your help and suggestions will be much appreciated.
 
  --
  All the best
 
  Liu Bo
 




-- 
All the best

Liu Bo


an array-like string is treated as multivalued when adding a doc to Solr

2013-12-05 Thread Liu Bo
Dear solr users:

I've met this kind of error several times:

when adding an array-like string such as "[Get 20% Off Official Barça Kits,
coupon]" to a multiValued="false" field, Solr will complain:

org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692]
multiple values encountered for non multiValued field name_en_US: [Get 20%
Off Official Barca Kits, coupon]

my schema definition:
<field name="name_en_US" type="text_en" indexed="true" stored="true"
       multiValued="false" />

This field is stored because the search result needs this field and its value
in the original format, and indexed to give it a boost while searching.

What I do is add the name (java.lang.String) to the SolrInputDocument by the
addField("name_en_US", product.getName()) method, and then add this to Solr
using an AddUpdateCommand.

It seems Solr treats this kind of string data as multivalued, even though I
add this field to Solr only once.

Is this a bug or expected behavior?

Is there any way to tell Solr this is not a multivalued value and not to
break it up?

Your help and suggestions will be much appreciated.

-- 
All the best

Liu Bo


Re: deleting a doc inside a custom UpdateRequestProcessor

2013-11-18 Thread Liu Bo
Hi,

You can try this in your checkIfIsDuplicate(): build a query based on
your title and set it on a delete command:

    // build your query accordingly; this depends on how your title is
    // indexed (e.g. analyzed or not) -- be careful with it and do some tests
    DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
    cmd.commitWithin = commitWithin;
    cmd.setQuery(query);
    processDelete(cmd);

Processors are normally chained; you should make sure that your
processor comes first, so that it can control what happens next based
on your logic.

You can also try writing your own UpdateRequestHandler instead of a
customized processor.

You can do a set of operations in your handler method:

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
        throws Exception {}

Get your processor chain in this function and pass a delete command
to it, such as:

    SolrParams params = req.getParams();
    checkParameter(params);
    UpdateRequestProcessorChain processorChain =
        req.getCore().getUpdateProcessingChain(params.get(UpdateParams.UPDATE_CHAIN));
    UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);

    DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
    cmd.commitWithin = commitWithin;
    cmd.setQuery(query);
    processor.processDelete(cmd);

This is what I do when customizing an update request handler: I try
not to touch the original processor chain, but tell Solr what to do via
commands.


On 19 November 2013 10:01, Peyman Faratin pey...@robustlinks.com wrote:

 Hi

 I am building a custom UpdateRequestProcessor to intercept any doc heading
 to the index. Basically what I want to do is check if the current index
 has a doc with the same title (I am using IDs as the uniqueKey so I can't use
 that, and besides the logic of checking is a little more complicated). If
 the incoming doc has a duplicate and some other conditions hold, then one of
 2 things can happen:

 1- we don't index the incoming document
 2- we index the incoming doc and delete the duplicate currently in the
 index

 I think (1) can be done by simply not passing the call up the chain (not
 calling super.processAdd(cmd)). However, I don't know how to implement the
 second condition, deleting the duplicate document, inside a custom
 UpdateRequestProcessor. This thread is the closest to my goal

 http://lucene.472066.n3.nabble.com/SOLR-4-3-0-Migration-How-to-use-DeleteUpdateCommand-td4062454.html

 however I am not clear how to proceed. Code snippets below.

 thank you in advance for your help

 class isDuplicate extends UpdateRequestProcessor {

     public isDuplicate(UpdateRequestProcessor next) {
         super(next);
     }

     @Override
     public void processAdd(AddUpdateCommand cmd) throws IOException {
         try {
             boolean indexIncomingDoc = checkIfIsDuplicate(cmd);
             if (indexIncomingDoc)
                 super.processAdd(cmd);
         } catch (SolrServerException e) {
             e.printStackTrace();
         } catch (ParseException e) {
             e.printStackTrace();
         }
     }

     public boolean checkIfIsDuplicate(AddUpdateCommand cmd) ... {

         SolrInputDocument incomingDoc = cmd.getSolrInputDocument();
         if (incomingDoc == null) return false;
         String title = (String) incomingDoc.getFieldValue("title");
         SolrIndexSearcher searcher = cmd.getReq().getSearcher();
         boolean addIncomingDoc = true;
         Integer idOfDuplicate = searcher.getFirstMatch(new Term("title", title));
         if (idOfDuplicate != -1) {
             addIncomingDoc =
                 compareDocs(searcher, incomingDoc, idOfDuplicate, title, addIncomingDoc);
         }
         return addIncomingDoc;
     }

     private boolean compareDocs(...) {
         ...
         if (condition 1) {
             -- DELETE DUPLICATE DOC in INDEX --
             addIncomingDoc = true;
         }
         ...
         return addIncomingDoc;
     }
 }




-- 
All the best

Liu Bo


Re: Multi-core support for indexing multiple servers

2013-11-12 Thread Liu Bo
As far as I know about Magento, its DB schema is designed for extensible
property storage, and the relationships between DB tables are kind of complex.

A product has attribute sets and properties which are stored in different
tables. A configurable product may have different attribute values for each
of its sub simple products.

Handling relationships like this in DIH won't be easy, especially when you
want to group the attributes of a configurable product into one document.

But if you just need to search on name and description and not other
attributes, you can try writing DIH configs against the catalog_product_flat_x
tables; Magento may have several of them.

We used to use Lucene core to provide search on Magento products. What we
did was use the SOAP service provided by Magento to get products, and then
convert them to Lucene documents. Indexes were updated daily. This hides
lots of Magento implementation details, but it's kind of slow.




On 12 November 2013 22:41, Robert Veliz rob...@mavenbridge.com wrote:

 I have two sources/servers--one of them is Magento. Since Magento has a
 more or less out-of-the-box integration with Solr, my thought was to run the
 Solr server from the Magento instance and then use DIH to get/merge content
 from the other source/server. Does that seem feasible/appropriate? I spec'd
 it out and it seems to make sense...

 R

  On Nov 11, 2013, at 11:25 PM, Liu Bo diabl...@gmail.com wrote:
 
  Like Erick said, merging data from different data sources could be very
  difficult. SolrJ is much easier to use, but it may need another application
  to handle the index process if you don't want to extend Solr much.
 
  I eventually ended up with a customized request handler which uses
  SolrWriter from the DIH package to index data,
 
  so that I can fully control the index process. Much like with SolrJ, you
  can write code to convert your data into SolrInputDocuments and then post
  them to SolrWriter; SolrWriter handles the rest.
 
 
  On 8 November 2013 21:46, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Yep, you can define multiple data sources for use with DIH.
 
  Combining data from those multiple sources into a single
  index can be a bit tricky with DIH, personally I tend to prefer
  SolrJ, but that's mostly personal preference, especially if
  I want to get some parallelism going on.
 
  But whatever works
 
  Erick
 
 
  On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 manju16832...@gmail.com
  wrote:
 
   Eric,
   Just a question :-), wouldn't it be easier to use DIH to pull data from
   multiple data sources?
  
   I do use DIH to do that comfortably. I have three data sources:
   - MySQL
   - URLDataSource that returns XML from a .NET application
   - URLDataSource that connects to an API and returns XML
  
   Here is part of the data-config data source settings:
   <dataSource type="JdbcDataSource" name="solr"
               driver="com.mysql.jdbc.Driver"
               url="jdbc:mysql://localhost/employeeDB" batchSize="-1"
               user="root" password="root"/>
   <dataSource name="CRMServer" type="URLDataSource" encoding="UTF-8"
               connectionTimeout="5000" readTimeout="1"/>
   <dataSource name="ImageServer" type="URLDataSource" encoding="UTF-8"
               connectionTimeout="5000" readTimeout="1"/>
  
   Of course, in the application I do the same.
   To construct my results, I connect to MySQL and those two data
   sources.
  
   Basically we have two points of indexing:
   - Using DIH for one-time indexing
   - In the application, whenever there is a transaction on the details that
   we are storing in Solr.
 
 
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
  --
  All the best
 
  Liu Bo




-- 
All the best

Liu Bo


Re: eDisMax, multiple language support and stopwords

2013-11-11 Thread Liu Bo
Happy to see someone has a similar solution to ours.

We have a similar multi-language search feature, and we index different
language content into _fr, _en fields like you've done.

But in search, we need a language code as a parameter to specify the
language the client wants to search on, which is normally decided by the
website visited, such as: qf=name description&language=en

and in our search components we find the right fields, name_en and
description_en, to be searched on.

We used to support searching across all languages and removed that later: the
site tells the customer which languages are supported, and we also don't
think we have many language experts on our web sites who know more than two
languages and need to search them at the same time.
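
A stripped-down sketch of that search-component piece (the class name and the
"language" param handling are illustrative, not our production code):

  import java.io.IOException;
  import org.apache.solr.common.params.DisMaxParams;
  import org.apache.solr.common.params.ModifiableSolrParams;
  import org.apache.solr.handler.component.QueryComponent;
  import org.apache.solr.handler.component.ResponseBuilder;

  public class LanguageFieldQueryComponent extends QueryComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
      String lang = params.get("language", "en");
      // e.g. qf=name description&language=en  ->  qf=name_en description_en
      StringBuilder qf = new StringBuilder();
      for (String field : params.get(DisMaxParams.QF, "").split("\\s+")) {
        if (field.length() > 0) {
          qf.append(field).append('_').append(lang).append(' ');
        }
      }
      params.set(DisMaxParams.QF, qf.toString().trim());
      rb.req.setParams(params);
      super.prepare(rb);
    }
  }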


On 7 November 2013 23:01, Tom Mortimer tom.m.f...@gmail.com wrote:

 Ah, thanks Markus. I think I'll just add the Boolean operators to the
 stopwords list in that case.

 Tom



 On 7 November 2013 12:01, Markus Jelsma markus.jel...@openindex.io
 wrote:

  This is an ancient problem. The issue here is your mm parameter; it gets
  confused because for separate fields a different number of tokens is
  filtered/emitted, so it is never going to work just like this. The easiest
  option is not to use the stopfilter.
 
 
 
 http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
  https://issues.apache.org/jira/browse/SOLR-3085
 
  -Original message-
   From:Tom Mortimer tom.m.f...@gmail.com
   Sent: Thursday 7th November 2013 12:50
   To: solr-user@lucene.apache.org
   Subject: eDisMax, multiple language support and stopwords
  
   Hi all,
  
   Thanks for the help and advice I've got here so far!
  
   Another question - I want to support stopwords at search time, so that
  e.g.
    the query "oscar and wilde" is equivalent to "oscar wilde" (this is with
    lowercaseOperators=false). Fair enough, I have stopword "and" in the
    query analyser chain.
  
   However, I also need to support French as well as English, so I've got
  _en
   and _fr versions of the text fields, with appropriate stemming and
   stopwords. I index French content into the _fr fields and English into
  the
   _en fields. I'm searching with eDisMax over both versions, e.g.:
  
    <str name="qf">headline_en headline_fr</str>
  
    However, this means I get no results for "oscar and wilde". The parsed
    query is:
  
   (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar))
   DisjunctionMaxQuery((headline_fr:and))
   DisjunctionMaxQuery((headline_fr:wild |
 headline_en:wild)))~3))/no_coord
  
    If I add "and" to the French stopwords list, I *do* get results, and the
    parsed query is:
  
   (+((DisjunctionMaxQuery((headline_fr:osca | headline_en:oscar))
   DisjunctionMaxQuery((headline_fr:wild |
 headline_en:wild)))~2))/no_coord
  
   This implies that the only solution is to have a minimal, shared
  stopwords
   list for all languages I want to support. Is this correct, or is there
 a
   way of supporting this kind of searching with per-language stopword
  lists?
  
   Thanks for any ideas!
  
   Tom
  
 




-- 
All the best

Liu Bo


Re: Multi-core support for indexing multiple servers

2013-11-11 Thread Liu Bo
Like Erick said, merging data from different data sources could be very
difficult. SolrJ is much easier to use, but it may need another application to
handle the index process if you don't want to extend Solr much.

I eventually ended up with a customized request handler which uses SolrWriter
from the DIH package to index data,

so that I can fully control the index process. Much like with SolrJ, you can
write code to convert your data into SolrInputDocuments and then post them
to SolrWriter; SolrWriter handles the rest.
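
If you go the plain SolrJ route instead, the core of it is just this
(SolrJ 4.x; the URL and the getters are placeholders for your own data
objects):

  HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/content");
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", product.getId());
  doc.addField("name_en_US", product.getName());
  server.add(doc);
  server.commit();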


On 8 November 2013 21:46, Erick Erickson erickerick...@gmail.com wrote:

 Yep, you can define multiple data sources for use with DIH.

 Combining data from those multiple sources into a single
 index can be a bit tricky with DIH, personally I tend to prefer
 SolrJ, but that's mostly personal preference, especially if
 I want to get some parallelism going on.

 But whatever works

 Erick


 On Thu, Nov 7, 2013 at 11:17 PM, manju16832003 manju16832...@gmail.com
 wrote:

   Eric,
   Just a question :-), wouldn't it be easier to use DIH to pull data from
   multiple data sources?
  
   I do use DIH to do that comfortably. I have three data sources:
    - MySQL
    - URLDataSource that returns XML from a .NET application
    - URLDataSource that connects to an API and returns XML
  
   Here is part of the data-config data source settings:
   <dataSource type="JdbcDataSource" name="solr"
               driver="com.mysql.jdbc.Driver"
               url="jdbc:mysql://localhost/employeeDB" batchSize="-1"
               user="root" password="root"/>
   <dataSource name="CRMServer" type="URLDataSource" encoding="UTF-8"
               connectionTimeout="5000" readTimeout="1"/>
   <dataSource name="ImageServer" type="URLDataSource" encoding="UTF-8"
               connectionTimeout="5000" readTimeout="1"/>
  
   Of course, in the application I do the same.
   To construct my results, I connect to MySQL and those two data sources.
  
   Basically we have two points of indexing:
    - Using DIH for one-time indexing
    - In the application, whenever there is a transaction on the details that
   we are storing in Solr.
 
 
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Multi-core-support-for-indexing-multiple-servers-tp4099729p4099933.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 




-- 
All the best

Liu Bo


how does solr load plugins?

2013-10-16 Thread Liu Bo
Hi

I wrote a plugin to index content, reusing our DAO layer, which is developed
using Spring.

What I am doing now is putting the plugin jar and all the other jars the DAO
layer depends on into the shared lib folder under Solr home.

In the log, I can see all the jars are loaded through SolrResourceLoader
like:

INFO  - 2013-10-16 16:25:30.611; org.apache.solr.core.SolrResourceLoader;
Adding 'file:/D:/apache-tomcat-7.0.42/solr/lib/spring-tx-3.1.0.RELEASE.jar'
to classloader


Then I initialize the Spring context using:

ApplicationContext context = new
FileSystemXmlApplicationContext("/solr/spring/solr-plugin-bean-test.xml");


Then Spring will complain:

INFO  - 2013-10-16 16:33:57.432;
org.springframework.context.support.AbstractApplicationContext; Refreshing
org.springframework.context.support.FileSystemXmlApplicationContext@e582a85:
startup date [Wed Oct 16 16:33:57 CST 2013]; root of context hierarchy
INFO  - 2013-10-16 16:33:57.491;
org.springframework.beans.factory.xml.XmlBeanDefinitionReader; Loading XML
bean definitions from file
[D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml]
ERROR - 2013-10-16 16:33:59.944;
com.test.search.solr.spring.AppicationContextWrapper; Configuration
problem: Unable to locate Spring NamespaceHandler for XML schema namespace [
http://www.springframework.org/schema/context]
Offending resource: file
[D:\apache-tomcat-7.0.42\solr\spring\solr-plugin-bean-test.xml]

The Spring context requires spring-tx-3.1.xsd, which does exist in
spring-tx-3.1.0.RELEASE.jar under the org\springframework\transaction\config
package, but the program can't find it even though it can load the Spring
classes successfully.

The following won't work either:

ApplicationContext context = new
ClassPathXmlApplicationContext("classpath:spring/solr-plugin-bean-test.xml");
// the solr-plugin-bean-test.xml is packaged in plugin.jar as well

But when I put all the jars under TOMCAT_HOME/webapps/solr/WEB-INF/lib and
use

ApplicationContext context = new
ClassPathXmlApplicationContext("classpath:spring/solr-plugin-bean-test.xml");

everything works fine: I can initialize the Spring context and load DAO beans
to read data and then write them to the Solr index. But isn't modifying
solr.war a bad practice?

It seems SolrResourceLoader only loads classes from the plugin jars, but these
jars are NOT on the classpath. Please correct me if I am wrong.

Is there any way to use resources in plugin jars, such as configuration
files?

BTW, is there any difference between the SolrResourceLoader and the Tomcat
webapp classloader?

-- 
All the best

Liu Bo


Re: SolrDocumentList - bitwise operation

2013-10-13 Thread Liu Bo
A join query might be helpful: http://wiki.apache.org/solr/Join

Joins can work across indexes but probably won't work in SolrCloud.

Be aware that only "to"-side documents are retrievable; if you want content
from both documents, a join query won't work. And in Lucene, join queries
don't quite work with multiple join conditions; I haven't tested that in Solr
yet.
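
For example (field names assumed), a query like

  q={!join from=doc_id to=id}title:foo

only returns the "to"-side documents whose id matches the doc_id of a
document matching title:foo.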

I had a similar join case to yours; eventually I chose to denormalize our
data into one set of documents.


On 13 October 2013 22:34, Michael Tyler michaeltyler1...@gmail.com wrote:

 Hello,

 I have 2 different solr indexes returning 2 different sets of
 SolrDocumentList. Doc Id is the foreign key relation.

 After obtaining them, I want to perform AND operation between them and
 then return results to user. Can you tell me how do I get this? I am using
 solr 4.3

  SolrDocumentList results1 = responseA.getResults();
  SolrDocumentList results2 = responseB.getResults();

 results1  : d1, d2, d3
 results2  :  d1,d2, d4

 Return : d1, d2

 Regards,
 Michael




-- 
All the best

Liu Bo


Re: SolrCore 'collection1' is not available due to init failure

2013-10-11 Thread Liu Bo
org.apache.solr.core.SolrCore.init(SolrCore.java:821) ... 13 more Caused
by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out:
NativeFSLock@/usr/share/solr-4.5.0/example/solr/
collection1/data/index/write.lock:
java.io.FileNotFoundException:
/usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock
(Permission denied) at org.apache.lucene.store.Lock.obtain(Lock.java:84) at

It seems to be a permissions problem: the user that starts Tomcat doesn't have
permission to access your index folder.

Try granting read and write permissions on your Solr data folder to that
user, and restart Tomcat to see what happens.


-- 
All the best

Liu Bo


Re: Multiple schemas in the same SolrCloud ?

2013-10-10 Thread Liu Bo
You can try it this way:

Start the ZooKeeper servers first.

Upload your configurations to ZooKeeper and link them to your collections
using zkcli, just like Shawn said.

Let's say you have conf1 and conf2; you can link them to collection1 and
collection2.
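
With the zkcli script from Solr's cloud-scripts that would be roughly (hosts
and paths are examples):

  zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir /path/to/conf1 -confname conf1
  zkcli.sh -zkhost localhost:2181 -cmd linkconfig -collection collection1 -confname conf1

and the same again for conf2/collection2.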

Remove the bootstrap stuff and start the Solr servers.

After you have Solr running, create collection1 and collection2 via the core
admin; you don't need a conf directory there, because all your core-specific
configuration is in ZooKeeper.

Or you could use core discovery and have the collection name specified in
core.properties, see:
http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29



On 10 October 2013 23:57, maephisto my_sky...@yahoo.com wrote:

 On this topic, once you've uploaded your collection's configuration to ZK,
 how can you update it?
 Upload the new one with the same config name?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094729.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
All the best

Liu Bo


Re: documents are not committed distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-10-08 Thread Liu Bo
I've solved this problem myself.

If you use core discovery, you must specify the numShards parameter in
core.properties,
or else Solr won't allocate a hash range for each shard, and documents
won't be distributed properly.
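
For example, the core file from my earlier mail (quoted below) just needed
one extra line (my setup has 3 shards):

  name=content
  loadOnStartup=true
  transient=false
  shard=shard1
  collection=content_collection
  numShards=3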

Using core discovery to set up SolrCloud in Tomcat is much easier and
cleaner than the CoreAdmin approach described in the wiki:
http://wiki.apache.org/solr/SolrCloudTomcat.

It cost me some time to move from Jetty to Tomcat, but I think our IT team
will like it this way. :)




On 6 October 2013 23:53, Liu Bo diabl...@gmail.com wrote:

 Hi all

 I've sent out this mail before, but I had only subscribed to lucene-user and
 not solr-user at that time. Sorry for repeating, if so; your help will be
 much appreciated.

 I'm trying out the tutorial about SolrCloud, and I managed to write my
 own plugin to import data from our set of databases. I use SolrWriter from
 the DataImporter package, and the docs get distributed and committed across
 the shards.

 Everything works fine using Jetty from the Solr example, but when I move
 to Tomcat, SolrCloud seems not to be configured right, as the documents are
 just committed to the shard the update request goes to.

 The probable cause is that the range is null for the shards in
 clusterstate.json. The router is implicit instead of compositeId as well.

 Is there anything missing or configured wrong in the following steps? How
 can I fix it? Your help will be much appreciated.

 PS, the SolrCloud Tomcat wiki page isn't up to date for 4.4 with core
 discovery; I'm trying this out after reading the SolrCloud, SolrCloudJboss,
 and CoreAdmin wiki pages.

 Here's what I've done and some useful logs:

 1. start three zookeeper server.
 2. upload configuration files to zookeeper, the collection name is
 content_collection
 3. start three tomcat instants on three server with core discovery

 a) core.properties file:
    name=content
    loadOnStartup=true
    transient=false
    shard=shard1   (different on each server)
    collection=content_collection
 b) solr.xml

 <solr>
   <solrcloud>
     <str name="host">${host:}</str>
     <str name="hostContext">${hostContext:solr}</str>
     <int name="hostPort">8080</int>
     <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
     <str name="zkHost">10.199.46.176:2181,10.199.46.165:2181,10.199.46.158:2181</str>
     <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
   </solrcloud>

   <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
     <int name="socketTimeout">${socketTimeout:0}</int>
     <int name="connTimeout">${connTimeout:0}</int>
   </shardHandlerFactory>
 </solr>

 4. In the solr.log, I can see the three shards are recognized, and
 SolrCloud can see that content_collection has three shards as well.
 5. Write documents to content_collection using my update request; the
 documents are only committed to the shard the request goes to. In the log I
 can see the DistributedUpdateProcessorFactory is in the processor chain and
 a distributed commit is triggered:

 INFO  - 2013-09-30 16:31:43.205;
 com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
 updata request processor factories:

 INFO  - 2013-09-30 16:31:43.206;
 com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
 org.apache.solr.update.processor.LogUpdateProcessorFactory@4ae7b77

 INFO  - 2013-09-30 16:31:43.207;
 com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
 org.apache.solr.update.processor.*DistributedUpdateProcessorFactory*
 @5b2bc407

 INFO  - 2013-09-30 16:31:43.207;
 com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
 org.apache.solr.update.processor.RunUpdateProcessorFactory@1652d654

 INFO  - 2013-09-30 16:31:43.283; org.apache.solr.core.SolrDeletionPolicy;
 SolrDeletionPolicy.onInit: commits: num=1


 commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}

 INFO  - 2013-09-30 16:31:43.284; org.apache.solr.core.SolrDeletionPolicy;
 newest commit generation = 1

 INFO  - 2013-09-30 16:31:43.440; *org.apache.solr.update.SolrCmdDistributor;
 Distrib commit to*:[StdNode: http://10.199.46.176:8080/solr/content/,
 StdNode: http://10.199.46.165:8080/solr/content/]
 params:commit_end_point=truecommit=truesoftCommit=falsewaitSearcher=trueexpungeDeletes=false

 but the documents won't go to the other shards; the other shards only get a
 request with no documents:

 INFO  - 2013-09-30 16:31:43.841;
 org.apache.solr.update.DirectUpdateHandler2; start
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

 INFO  - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy;
 SolrDeletionPolicy.onInit: commits: num=1


 commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}

 INFO  - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy;
 newest commit

documents are not committed distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-10-06 Thread Liu Bo
@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)}

INFO  - 2013-09-30 16:31:43.870;
org.apache.solr.update.DirectUpdateHandler2; end_commit_flush

INFO  - 2013-09-30 16:31:43.870;
org.apache.solr.update.processor.LogUpdateProcessor; [content] webapp=/solr
path=/update
params={waitSearcher=truecommit=truewt=javabinexpungeDeletes=falsecommit_end_point=trueversion=2softCommit=false}
{commit=} 0 42

6) Later I found the range is null in clusterstate.json, which might have
caused the documents not to be committed distributively:

{"content_collection":{
    "shards":{
      "shard1":{
        *"range":null,*
        "state":"active",
        "replicas":{"core_node1":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.176:8080_solr",
            "base_url":"http://10.199.46.176:8080/solr",
            "leader":"true"}}},
      "shard3":{
        *"range":null,*
        "state":"active",
        "replicas":{"core_node2":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.202:8080_solr",
            "base_url":"http://10.199.46.202:8080/solr",
            "leader":"true"}}},
      "shard2":{
        *"range":null,*
        "state":"active",
        "replicas":{"core_node3":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.165:8080_solr",
            "base_url":"http://10.199.46.165:8080/solr",
            "leader":"true"}}}},
    *"router":"implicit"*}}



-- 
All the best

Liu Bo


documents are not committed distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-09-30 Thread Liu Bo
:31:43.870;
org.apache.solr.update.processor.LogUpdateProcessor; [content] webapp=/solr
path=/update
params={waitSearcher=truecommit=truewt=javabinexpungeDeletes=falsecommit_end_point=trueversion=2softCommit=false}
{commit=} 0 42

6) Later I found the range is null in clusterstate.json, which might have
caused the documents not to be committed distributively:

{"content_collection":{
    "shards":{
      "shard1":{
        *"range":null,*
        "state":"active",
        "replicas":{"core_node1":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.176:8080_solr",
            "base_url":"http://10.199.46.176:8080/solr",
            "leader":"true"}}},
      "shard3":{
        *"range":null,*
        "state":"active",
        "replicas":{"core_node2":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.202:8080_solr",
            "base_url":"http://10.199.46.202:8080/solr",
            "leader":"true"}}},
      "shard2":{
        *"range":null,*
        "state":"active",
        "replicas":{"core_node3":{
            "state":"active",
            "core":"content",
            "node_name":"10.199.46.165:8080_solr",
            "base_url":"http://10.199.46.165:8080/solr",
            "leader":"true"}}}},
    *"router":"implicit"*}}



-- 
All the best

Liu Bo


how can I use DataImportHandler on multiple MySQL databases with the same schema?

2013-09-17 Thread Liu Bo
Hi all

Our system has distributed MySQL databases; we create a database for every
customer who signs up and distribute it to one of our MySQL hosts.

We currently use Lucene core to perform search on these databases, and we
write Java code to loop through these databases and convert the data into a
Lucene index.

Right now we are planning to move to Solr for distribution, and I am
investigating it.

I tried to use DataImportHandler (http://wiki.apache.org/solr/DataImportHandler)
from the wiki page, but I can't figure out a way to use multiple data sources
with the same schema.
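
For what it's worth, DIH can declare several data sources statically and
point entities at them (a sketch with made-up connection details), but that
doesn't scale to databases that are only known at runtime:

  <dataConfig>
    <dataSource name="db1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host1/customer1" user="root" password="root"/>
    <dataSource name="db2" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://host2/customer2" user="root" password="root"/>
    <document>
      <entity name="products1" dataSource="db1" query="SELECT id, name FROM product"/>
      <entity name="products2" dataSource="db2" query="SELECT id, name FROM product"/>
    </document>
  </dataConfig>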

The other question is: we have the database connection data in one table; can
I create the datasource connection info from it and loop through the
databases using DataImporter?

If DataImporter isn't workable, is there a way to feed data to Solr using a
customized SolrRequestHandler without using SolrJ?

If neither of these two ways works, I think I am going to reuse the DAO layer
of the old project and feed the data to Solr using SolrJ, probably using an
embedded Solr server.

Your help will be much appreciated.

http://wiki.apache.org/solr/DataImportHandlerFaq

--
All the best

Liu Bo