API support upload file for External File Field

2015-06-17 Thread Floyd Wu
Is there any API to support uploading a file for ExternalFileField to the
/data/ directory, or any good practice for this?

My application and the Solr server are physically separated in two places.
The application calculates a score and generates a file for
ExternalFileField.

Thanks for any input.


Push ExternalFileField to Solr

2015-05-19 Thread Floyd Wu
Hi, I have two (physical) servers that run my application and Solr. I use
an external file field to do some search result ranking.

According to the wiki page, external file field data needs to reside in the
{solr}/data directory, but the EFF data is generated by my application. How
can I push this file to Solr? Is there any API, Solr web service, or other
mechanism that helps with this?

floyd


Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Floyd Wu
Will these awesome features be implemented in Solr soon?
 On 2014/6/20 at 10:43 PM, Yonik Seeley yo...@heliosearch.com wrote:

 On Fri, Jun 20, 2014 at 10:15 AM, Yago Riveiro yago.rive...@gmail.com
 wrote:
  Yonik,
 
  Does this native code use docValues in any way?

 Nope... not yet.  It is something I think we should look into in the
 future though.

  In the past I was forced to index a big portion of my data with
 docValues enabled. OOM problems with large term dictionaries and GC were
 my main problem.
 
  Another good optimization would be to do facet aggregations off the heap
 to minimize GC.

 Yeah, the single-valued string faceting in Heliosearch currently does
 this (the counts array is also off-heap).

  To ensure that facet aggregations have enough RAM we need a large heap;
 on machines with a lot of RAM, if the aggregation were done off-heap,
 that would let us reduce the heap size.

 Yeah, it's nice not having to worry so much about the correct heap size
 too.

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data



Re: [ANN] Heliosearch 0.06 released, native code faceting

2014-06-20 Thread Floyd Wu
Hi Yonik, I don't quite understand the relationship between Solr and
Heliosearch, since you are a committer on Solr?

I'm just curious.
On 2014/6/21 at 12:07 AM, Yonik Seeley yo...@heliosearch.com wrote:

 On Fri, Jun 20, 2014 at 11:16 AM, Floyd Wu floyd...@gmail.com wrote:
  Will these awesome features be implemented in Solr soon?
   On 2014/6/20 at 10:43 PM, Yonik Seeley yo...@heliosearch.com wrote:

 Given the current makeup of the joint Lucene/Solr PMC, it's unclear.
 I'm not worrying about that for now, and just pushing Heliosearch as
 far and as fast as I can.
 Come join us if you'd like to help!

 -Yonik
 http://heliosearch.org - native code faceting, facet functions,
 sub-facets, off-heap data



Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-16 Thread Floyd Wu
Hi Mikhail
Thanks for your suggestions.
Floyd


2014-06-16 17:28 GMT+08:00 Mikhail Khludnev mkhlud...@griddynamics.com:

 On Mon, Jun 16, 2014 at 6:57 AM, Floyd Wu floyd...@gmail.com wrote:

  Hi Mikhail,
  What are the pros of disabling the tlog?
 
 It consumes a lot of heap while providing benefits (real-time get,
 recovery of uncommitted docs on failure) that are not necessary in an
 old-school bulk indexing scenario.


  Each of my xml files contains two docs: one is the main content and the
  other is the acl.
 
 How can I guess how many of them you have? Once again, submitting a few
 (let's say ten) huge files in parallel lets you fully utilize the indexing
 JVM and yields the best performance.


  Currently I'm not using SolrCloud due to my poor understanding of this
  architecture and pros/cons.
  The main system is developed using .Net C# so using SolrJ won't be a
  solution.
 
 Anyway, if you submit small requests from C# code via REST, make sure that
 HTTP keep-alive is enabled so you don't waste time establishing a TCP
 connection each time. I might be wrong, but I thought the Lucid guys
 provide a C# client, or at least a scratch of one, for Solr. Don't they?
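
 As a minimal sketch of the "nuke tlog" advice (assuming a stock
 solrconfig.xml), the transaction log is disabled by commenting out the
 updateLog element inside the updateHandler:

   <updateHandler class="solr.DirectUpdateHandler2">
     <!-- Commented out to disable the tlog during old-school bulk
          indexing; re-enable it afterwards if you need real-time get
          or recovery of uncommitted docs.
     <updateLog>
       <str name="dir">${solr.ulog.dir:}</str>
     </updateLog>
     -->
   </updateHandler>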


 
  Floyd
 
 
 
  2014-06-15 18:14 GMT+08:00 Mikhail Khludnev mkhlud...@griddynamics.com
 :
 
   Hello Floyd,
  
    Did you consider disabling the tlog?
    Does each file consist of many docs?
    Do you have SolrCloud? Do you use just sh/curl, or do you have a java
    program?
    DIH is not really performant so far. Submitting roughly ten huge files
    in parallel is the way to get good performance. Once again, nuke tlog.
  
  
   On Sun, Jun 15, 2014 at 12:44 PM, Floyd Wu floyd...@gmail.com wrote:
  
 Hi,
 I have many XML message files formatted like this:
 https://wiki.apache.org/solr/UpdateXmlMessages

 These files are generated by my index builder daily.
 Currently I am sending these files through HTTP POST to Solr, but
 sometimes I hit an OOM exception or too many pending tlogs.

 Do you have a better way to import these files into Solr to build the
 index?

 Thanks for the suggestion

 Floyd
   
  
  
  
   --
   Sincerely yours
   Mikhail Khludnev
   Principal Engineer,
   Grid Dynamics
  
   http://www.griddynamics.com
mkhlud...@griddynamics.com
  
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Hi,
I have many XML message files formatted like this:
https://wiki.apache.org/solr/UpdateXmlMessages

These files are generated by my index builder daily.
Currently I am sending these files through HTTP POST to Solr, but sometimes
I hit an OOM exception or too many pending tlogs.

Do you have a better way to import these files into Solr to build the
index?

Thanks for the suggestion

Floyd


Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Thank you Alex.
I'm doing a commit every 100 files.
Maybe there is a better way to do this job, something like DIH (possible?).
Sometimes I have a much bigger xml file (2MB), and posting it to Solr
(running in Jetty) may be slow or exceed the request size limit.

Floyd



2014-06-15 16:48 GMT+08:00 Alexandre Rafalovitch arafa...@gmail.com:

 When are you doing a commit? You can issue one manually, have one with a
 timeout parameter (commitWithin), or you can configure commits to happen
 automatically (in solrconfig.xml).
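
 As a sketch of the automatic option (assuming a stock solrconfig.xml; the
 interval values are placeholders, not recommendations), autoCommit lives
 in the updateHandler:

   <updateHandler class="solr.DirectUpdateHandler2">
     <!-- hard commit every 15 seconds or every 10,000 docs, whichever
          comes first, without opening a new searcher -->
     <autoCommit>
       <maxTime>15000</maxTime>
       <maxDocs>10000</maxDocs>
       <openSearcher>false</openSearcher>
     </autoCommit>
   </updateHandler>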

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Sun, Jun 15, 2014 at 3:44 PM, Floyd Wu floyd...@gmail.com wrote:
  Hi,
  I have many XML message files formatted like this:
  https://wiki.apache.org/solr/UpdateXmlMessages
 
  These files are generated by my index builder daily.
  Currently I am sending these files through HTTP POST to Solr, but
  sometimes I hit an OOM exception or too many pending tlogs.
 
  Do you have a better way to import these files into Solr to build the
  index?
 
  Thanks for the suggestion
 
  Floyd



Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Hi Mikhail,
What are the pros of disabling the tlog?
Each of my xml files contains two docs: one is the main content and the
other is the acl.
Currently I'm not using SolrCloud due to my poor understanding of its
architecture and pros/cons.
The main system is developed in .NET C#, so using SolrJ won't be a
solution.

Floyd



2014-06-15 18:14 GMT+08:00 Mikhail Khludnev mkhlud...@griddynamics.com:

 Hello Floyd,

 Did you consider disabling the tlog?
 Does each file consist of many docs?
 Do you have SolrCloud? Do you use just sh/curl, or do you have a java
 program?
 DIH is not really performant so far. Submitting roughly ten huge files in
 parallel is the way to get good performance. Once again, nuke tlog.


 On Sun, Jun 15, 2014 at 12:44 PM, Floyd Wu floyd...@gmail.com wrote:

  Hi,
  I have many XML message files formatted like this:
  https://wiki.apache.org/solr/UpdateXmlMessages
 
  These files are generated by my index builder daily.
  Currently I am sending these files through HTTP POST to Solr, but
  sometimes I hit an OOM exception or too many pending tlogs.
 
  Do you have a better way to import these files into Solr to build the
  index?
 
  Thanks for the suggestion
 
  Floyd
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Hi Erick, thanks for your advice. autoCommit is configured at 30 sec in my
environment.
I'm using C# to develop the main system and Solr as a service, so using
SolrJ would be considered impossible (for now).
I'm seeking a better way to directly input (import) the offline-generated
XML to build the index.
Currently I'm using my own C# code to send these xml files one by one
through HTTP, but performance is poor (posting in parallel hits OOM or
generates lots of tlog files).

Actually the main question is: what is the best (better) way to rebuild the
whole index from scratch?

Floyd





2014-06-15 23:59 GMT+08:00 Erick Erickson erickerick...@gmail.com:

 A couple of things:

  Consider indexing them with SolrJ, here's a place to get started:
 http://searchhub.org/2012/02/14/indexing-with-solrj/. Especially if you
 use a SAX-based parser you have more control over memory consumption, it's
 on the client after all. And, you can rack together as many clients all
 going to Solr as you need.

  Here's a bunch of information about tlogs and commits that might be
 useful background.

 http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
 .
 Consider setting your autoCommit interval quite short (15 seconds)
 with openSearcher set to false. That'll truncate your tlog, although
 how that relates to your error is something of a mystery to me...
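
 For reference, a minimal SolrJ sketch of client-side indexing (SolrJ 4.x
 API; the URL, id scheme, and field names are placeholder assumptions, and
 a real loader would parse your XML files, e.g. with a SAX parser, rather
 than fabricating docs):

   import org.apache.solr.client.solrj.impl.HttpSolrServer;
   import org.apache.solr.common.SolrInputDocument;

   public class BulkIndexer {
     public static void main(String[] args) throws Exception {
       HttpSolrServer server =
           new HttpSolrServer("http://localhost:8983/solr/collection1");
       for (int i = 0; i < 1000; i++) {
         SolrInputDocument doc = new SolrInputDocument();
         doc.addField("id", "doc-" + i);
         doc.addField("summary", "body text for doc " + i);
         server.add(doc);        // no commit per document
       }
       server.commit();          // one explicit commit at the end
       server.shutdown();
     }
   }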

 Best,
 Erick

 On Sun, Jun 15, 2014 at 3:14 AM, Mikhail Khludnev
 mkhlud...@griddynamics.com wrote:
  Hello Floyd,
 
  Did you consider disabling the tlog?
  Does each file consist of many docs?
  Do you have SolrCloud? Do you use just sh/curl, or do you have a java
  program?
  DIH is not really performant so far. Submitting roughly ten huge files in
  parallel is the way to get good performance. Once again, nuke tlog.
 
 
  On Sun, Jun 15, 2014 at 12:44 PM, Floyd Wu floyd...@gmail.com wrote:
 
  Hi,
  I have many XML message files formatted like this:
  https://wiki.apache.org/solr/UpdateXmlMessages
 
  These files are generated by my index builder daily.
  Currently I am sending these files through HTTP POST to Solr, but
  sometimes I hit an OOM exception or too many pending tlogs.
 
  Do you have a better way to import these files into Solr to build the
  index?
 
  Thanks for the suggestion
 
  Floyd
 
 
 
 
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics
 
  http://www.griddynamics.com
   mkhlud...@griddynamics.com



Re: What is the best approach to send lots of XML Messages to Solr to build index?

2014-06-15 Thread Floyd Wu
Hi Shawn,
I've tried setting a 4GB heap for Solr; the OOM exceptions really did
decrease, and performance improved as well.

Floyd



2014-06-16 0:00 GMT+08:00 Shawn Heisey s...@elyograg.org:

 On 6/15/2014 2:54 AM, Floyd Wu wrote:
  Thank you Alex.
  I'm doing a commit every 100 files.
  Maybe there is a better way to do this job, something like DIH
  (possible?).
  Sometimes I have a much bigger xml file (2MB), and posting it to Solr
  (running in Jetty) may be slow or exceed the request size limit.

 If you are getting OOM exceptions on your Solr server, then you need to
 increase your Java heap size.  I have never seen the "pending too many"
 error that you mentioned at the beginning of the thread, and Google
 didn't turn up anything useful, so I don't know what needs to be done
 for that.  If you can post the entire exception stacktrace for this
 error, perhaps we can figure it out.  We would also need the exact Solr
 version.

 Solr has a default 2MB limit on POST requests.  This can be increased
 with the formdataUploadLimitInKB parameter on the requestDispatcher tag
 in solrconfig.xml -- assuming that you're running 4.1 or later.
 Previous versions required changing the request size in the servlet
 container config, but there was a bug in the example Jetty included in
 4.0.0 that made it impossible to change the size.

 https://bugs.eclipse.org/bugs/show_bug.cgi?id=397130
 https://issues.apache.org/jira/browse/SOLR-4223
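
 As a sketch (the limit values are illustrative), the parameter sits on
 the requestParsers element inside requestDispatcher in solrconfig.xml:

   <requestDispatcher handleSelect="false">
     <!-- raise the POST/form limit from the 2MB default to 32MB -->
     <requestParsers enableRemoteStreaming="false"
                     multipartUploadLimitInKB="32768"
                     formdataUploadLimitInKB="32768" />
   </requestDispatcher>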

 Regarding the advice to disable the updateLog: I never like to do this,
 but if you are sending a very large number of updates in a single
 request, it might be advisable until indexing is complete, so that Solr
 restart times are not excessive.

 Thanks,
 Shawn




Re: ranking retrieval measure

2014-04-01 Thread Floyd Wu
Usually an IR system is measured using precision & recall.
But it depends on what kind of system you are developing and what scenario
it should fit.

Take a look
http://en.wikipedia.org/wiki/Precision_and_recall
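
For reference, the standard definitions (a worked restatement, where R is
the set of relevant documents and S is the set of retrieved documents), in
LaTeX:

  \mathrm{precision} = \frac{|R \cap S|}{|S|}, \qquad
  \mathrm{recall} = \frac{|R \cap S|}{|R|}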



2014-04-01 10:23 GMT+08:00 azhar2007 azhar2...@outlook.com:

 Hi people. I've developed a search engine, and I want to implement and
 improve it using another search engine as a test case. Now I want to
 compare and test results from both to determine which is better. I am
 unaware of how to do this, so could someone please point me in the right
 direction?

 Regards



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ranking-retrieval-measure-tp4128324.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to index 20 MB plain-text xml

2014-03-31 Thread Floyd Wu
Hi Alex,

Thanks for your response. Personally I don't want to feed these big xml
files to Solr, but the users want it.
I'll try your suggestions later.

Many thanks.

Floyd



2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch arafa...@gmail.com:

 Without digging too deep into why exactly this is happening, here are
 the general options:

 0. Are you actually committing? Check the messages in the logs and see
 if the records show up when you expect them to.
 1. Are you actually trying to feed a 20MB file to Solr? Maybe it's the
 HTTP buffer that's blowing up? Try using stream.file instead (note the
 security warning though): http://wiki.apache.org/solr/ContentStream
 (see the sketch after this list)
 2. Split the file into smaller ones and commit each separately
 3. Set hard auto-commit in solrconfig.xml based on number of documents
 to flush in-memory structures to disk
 4. Switch to using DataImportHandler to pull from XML instead of pushing
 5. Increase the amount of memory given to Solr (-Xmx command line flag)
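
 A minimal sketch of option 1 (assuming a default local Solr URL, an
 update-format XML file at /data/docs.xml on the Solr server's disk, and
 that remote streaming has been enabled via enableRemoteStreaming="true"
 on requestParsers in solrconfig.xml):

   curl "http://localhost:8983/solr/update?stream.file=/data/docs.xml&stream.contentType=text/xml;charset=utf-8&commit=true"

 This makes Solr read the file from its local disk instead of pushing the
 whole body through an HTTP POST buffer.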

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency

 On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu floyd...@gmail.com wrote:
   I have many plain-text xml files that I transform into the Solr xml
   format.
   But every time I send them to Solr, I hit an OOM exception.
   How do I configure Solr to ingest these big xml files?
   Please guide me a way. Thanks
 
  floyd



Re: how to index 20 MB plain-text xml

2014-03-31 Thread Floyd Wu
Hi Upayavira,
Users don't hit Solr directly; they search documents through my
application. The application is an entrance for users to upload documents,
which are then indexed by Solr.
The situation is that they upload plain text, something like a dictionary.
You know, a dictionary is something big.
I'm trying to figure out a good technique, short of splitting these xml
files into small ones and streaming them to Solr.

Floyd



2014-04-01 2:55 GMT+08:00 Upayavira u...@odoko.co.uk:

 Tell the user they can't have it!

 Or, write a small app that reads in their XML in one go, and pushes it
 in parts to Solr. Generally, I'd say letting a user hit Solr directly is
 a bad thing - especially a user who doesn't know the details of how Solr
 works.

 Upayavira

 On Mon, Mar 31, 2014, at 07:17 AM, Floyd Wu wrote:
  Hi Alex,
 
  Thanks for your response. Personally I don't want to feed these big xml
  files to Solr, but the users want it.
  I'll try your suggestions later.
 
  Many thanks.
 
  Floyd
 
 
 
  2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch arafa...@gmail.com:
 
   Without digging too deep into why exactly this is happening, here are
   the general options:
  
    0. Are you actually committing? Check the messages in the logs and see
    if the records show up when you expect them to.
    1. Are you actually trying to feed a 20MB file to Solr? Maybe it's the
    HTTP buffer that's blowing up? Try using stream.file instead (note the
    security warning though): http://wiki.apache.org/solr/ContentStream
    2. Split the file into smaller ones and commit each separately
    3. Set hard auto-commit in solrconfig.xml based on number of documents
    to flush in-memory structures to disk
    4. Switch to using DataImportHandler to pull from XML instead of
    pushing
    5. Increase the amount of memory given to Solr (-Xmx command line flag)
  
   Regards,
  Alex.
  
   Personal website: http://www.outerthoughts.com/
   Current project: http://www.solr-start.com/ - Accelerating your Solr
   proficiency
  
   On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu floyd...@gmail.com wrote:
     I have many plain-text xml files that I transform into the Solr xml
     format.
     But every time I send them to Solr, I hit an OOM exception.
     How do I configure Solr to ingest these big xml files?
     Please guide me a way. Thanks
    
     floyd
  



how to index 20 MB plain-text xml

2014-03-30 Thread Floyd Wu
I have many plain-text xml files that I transform into the Solr xml format.
But every time I send them to Solr, I hit an OOM exception.
How do I configure Solr to ingest these big xml files?
Please guide me a way. Thanks

floyd


DocValues usage and scenarios?

2013-11-20 Thread Floyd Wu
Hi there,

I don't fully understand what kinds of usage examples DocValues can be
used for.

When I set docValues=true on a field, do I need to change anything in the
xml that I send to Solr for indexing?
Please point me.

Thanks

Floyd

PS: I've googled and read lots of DocValues discussions but am still confused.


Re: DocValues usage and scenarios?

2013-11-20 Thread Floyd Wu
Hi Yago

Thanks for your reply. I once thought that the DocValues feature was a way
for me to store some extra values.

May I summarize that DocValues is a feature that speeds up sorting and
faceting?

Floyd



2013/11/20 Yago Riveiro yago.rive...@gmail.com

 Hi Floyd,

 DocValues are useful for sorting and faceting, for example.

 You don't need to change anything in your xml's; the only thing you need
 to do is set docValues="true" in your field definition in the schema.

 If you don't want to use the default implementation (everything loaded in
 the heap), you need to add the tag
 <codecFactory class="solr.SchemaCodecFactory"/> in solrconfig.xml and
 docValuesFormat="Disk" on the fieldType definition.
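
 A minimal sketch of both pieces (field and type names are illustrative,
 for Solr 4.x):

   <!-- schema.xml: enable docValues on a field -->
   <field name="manu" type="string" indexed="true" stored="true"
          docValues="true"/>

   <!-- schema.xml: on-disk DocValues format on the type -->
   <fieldType name="string_dv_disk" class="solr.StrField"
              docValuesFormat="Disk"/>

   <!-- solrconfig.xml: required for per-type docValuesFormat -->
   <codecFactory class="solr.SchemaCodecFactory"/>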

 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Wednesday, November 20, 2013 at 9:38 AM, Floyd Wu wrote:

  Hi there,
 
  I don't fully understand what kinds of usage examples DocValues can be
  used for.
 
  When I set docValues=true on a field, do I need to change anything in
  the xml that I send to Solr for indexing?
 
  Please point me.
 
  Thanks
 
  Floyd
 
  PS: I've googled and read lots of DocValues discussions but am still confused.




Re: DocValues usage and scenarios?

2013-11-20 Thread Floyd Wu
Thanks Yago,

I've read this article:
http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
but I don't understand it well.
I'll try to figure out the missing part. Thanks for helping.

Floyd




2013/11/20 Yago Riveiro yago.rive...@gmail.com

 You should understand DocValues as a feature that allows you to do
 sorting and faceting without blowing up the heap.

 They are not necessarily faster than the traditional method; they are
 more memory efficient, and in huge indexes that is the main limitation.

 This post summarizes the docvalues feature and its main goals:
 http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/

 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Wednesday, November 20, 2013 at 10:15 AM, Floyd Wu wrote:

  Hi Yago
 
  Thanks for your reply. I once thought that the DocValues feature was a
  way for me to store some extra values.
 
  May I summarize that DocValues is a feature that speeds up sorting and
  faceting?
 
  Floyd
 
 
 
  2013/11/20 Yago Riveiro yago.rive...@gmail.com (mailto:
 yago.rive...@gmail.com)
 
   Hi Floyd,
  
    DocValues are useful for sorting and faceting, for example.
   
    You don't need to change anything in your xml's; the only thing you
    need to do is set docValues="true" in your field definition in the
    schema.
   
    If you don't want to use the default implementation (everything loaded
    in the heap), you need to add the tag
    <codecFactory class="solr.SchemaCodecFactory"/> in solrconfig.xml and
    docValuesFormat="Disk" on the fieldType definition.
  
   --
   Yago Riveiro
   Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
  
  
   On Wednesday, November 20, 2013 at 9:38 AM, Floyd Wu wrote:
  
Hi there,
   
     I don't fully understand what kinds of usage examples DocValues can
     be used for.
    
     When I set docValues=true on a field, do I need to change anything in
     the xml that I send to Solr for indexing?
   
Please point me.
   
Thanks
   
Floyd
   
     PS: I've googled and read lots of DocValues discussions but am still confused.




Re: Lots of tlog files remained, why?

2013-11-05 Thread Floyd Wu
Hi Erick,

Sorry for the late reply.
The tlog files have stayed there for one week and show no decrease. Most of
them are 3~5 MB, about 40MB in total.

I've read the article you pointed to many times, but it isn't working for
me. Every time I reindex files, Solr generates many tlogs, and no matter
how many hard commits I do, the tlogs are still there.

I'm using Solr 4.3.2 in a Windows Server 2003 32-bit environment.
If there is any other detail I should provide, please let me know.

PS: Should I upgrade Solr to 4.5.1?

Floyd



2013/11/4 Erick Erickson erickerick...@gmail.com

 What is your commit strategy? A hard commit
 (openSearcher=true or false doesn't matter)
 should close the current tlog file, open
 a new one and delete old ones. That said, there
 will be enough tlog files kept around to hold at
 least 100 documents. So if you're committing
 too often (say after every document or something),
 you can expect to have a bunch around. The
 real question is whether they stay around forever
 or not. If you index more documents, do old ones
 disappear?
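
 For reference, a sketch of issuing an explicit hard commit (assuming a
 default local install and core name):

   curl "http://localhost:8983/solr/update?commit=true"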

 Here's a write-up:

 http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 If that doesn't help, what version of Solr? How
 big are your tlog files? Details matter.

 Best,
 Erick


 On Sun, Nov 3, 2013 at 10:03 AM, Floyd Wu floyd...@gmail.com wrote:

  After re-indexing 2 xml files and doing commits and optimization many
  times, I still have many tlog files in the data/tlog directory.
 
  Why?
 
  How do I remove those files (delete them directly, or just ignore them)?
 
  What is the difference if the tlog files exist or not?
 
  Please kindly guide me.
 
  Thanks
 
  Floyd
 



Lots of tlog files remained, why?

2013-11-03 Thread Floyd Wu
After re-indexing 2 xml files and doing commits and optimization many
times, I still have many tlog files in the data/tlog directory.

Why?

How do I remove those files (delete them directly, or just ignore them)?

What is the difference if the tlog files exist or not?

Please kindly guide me.

Thanks

Floyd


Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Floyd Wu
After trying some search cases and different parameter combinations of
WordDelimiterFilter, I wonder what the best strategy is to index the string
"2DA012_ISO MARK 2" so that it can be searched by the term "2DA012".

What if I just want "_" to be removed at both query and index time: what
and how should I configure?

Floyd



2013/8/22 Floyd Wu floyd...@gmail.com

 Thank you all.
 By the way, Jack, I'm gonna buy your book. Where can I buy it?
 Floyd


 2013/8/22 Jack Krupansky j...@basetechnology.com

 "I thought that the StandardTokenizer always split on punctuation ..."

 Proving that you haven't read my book! The section on the standard
 tokenizer details the rules that the tokenizer uses (in addition to
 extensive examples.) That's what I mean by "deep dive."

 -- Jack Krupansky

 -Original Message- From: Shawn Heisey
 Sent: Wednesday, August 21, 2013 10:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to avoid underscore sign indexing problem?


 On 8/21/2013 7:54 PM, Floyd Wu wrote:

  When using StandardAnalyzer to tokenize the string "Pacific_Rim" I get:
 
  ST
  text         raw_bytes                            start  end  type      position
  pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]   0      11   ALPHANUM  1
 
  How do I make this string tokenize into the two tokens "Pacific" and
  "Rim"?
  Set "_" as a stopword?
  Please kindly help with this.
  Many thanks.


 Interesting.  I thought that the StandardTokenizer always split on
 punctuation, but apparently that's not the case for the underscore
 character.

  You can always use the WordDelimiterFilter after the StandardTokenizer.

  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

 Thanks,
 Shawn





Re: How to avoid underscore sign indexing problem?

2013-08-22 Thread Floyd Wu
Alright, thanks for all your help. I finally fixed this problem using
PatternReplaceFilterFactory + WordDelimiterFilterFactory.

I first replace "_" (underscore) using PatternReplaceFilterFactory, and
then use WordDelimiterFilterFactory to generate the word and number parts
to increase user search hits. Although this decreases search quality a
little, the users need a higher recall rate rather than precision.
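
A minimal sketch of such an analyzer chain (the field type name and exact
filter parameters are assumptions for illustration, not Floyd's actual
schema):

  <fieldType name="text_split_underscore" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- turn "_" into a space inside tokens -->
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="_" replacement=" " replace="all"/>
      <!-- split tokens into word/number parts, keeping the original -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>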

Thank you all.

Floyd





2013/8/22 Floyd Wu floyd...@gmail.com

  After trying some search cases and different parameter combinations of
  WordDelimiterFilter, I wonder what the best strategy is to index the
  string "2DA012_ISO MARK 2" so that it can be searched by the term
  "2DA012".
 
  What if I just want "_" to be removed at both query and index time: what
  and how should I configure?

 Floyd



 2013/8/22 Floyd Wu floyd...@gmail.com

  Thank you all.
  By the way, Jack, I'm gonna buy your book. Where can I buy it?
 Floyd


 2013/8/22 Jack Krupansky j...@basetechnology.com

  "I thought that the StandardTokenizer always split on punctuation ..."
 
  Proving that you haven't read my book! The section on the standard
  tokenizer details the rules that the tokenizer uses (in addition to
  extensive examples.) That's what I mean by "deep dive."

 -- Jack Krupansky

 -Original Message- From: Shawn Heisey
 Sent: Wednesday, August 21, 2013 10:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to avoid underscore sign indexing problem?


 On 8/21/2013 7:54 PM, Floyd Wu wrote:

  When using StandardAnalyzer to tokenize the string "Pacific_Rim" I get:
 
  ST
  text         raw_bytes                            start  end  type      position
  pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]   0      11   ALPHANUM  1
 
  How do I make this string tokenize into the two tokens "Pacific" and
  "Rim"?
  Set "_" as a stopword?
  Please kindly help with this.
  Many thanks.


 Interesting.  I thought that the StandardTokenizer always split on
 punctuation, but apparently that's not the case for the underscore
 character.

  You can always use the WordDelimiterFilter after the StandardTokenizer.

  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

 Thanks,
 Shawn






How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
When using StandardAnalyzer to tokenize the string "Pacific_Rim" I get:

ST
text         raw_bytes                            start  end  type      position
pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]   0      11   ALPHANUM  1

How do I make this string tokenize into the two tokens "Pacific" and
"Rim"?
Set "_" as a stopword?
Please kindly help with this.
Many thanks.

Floyd


Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
Thank you all.
By the way, Jack, I'm gonna buy your book. Where can I buy it?
Floyd


2013/8/22 Jack Krupansky j...@basetechnology.com

 "I thought that the StandardTokenizer always split on punctuation ..."

 Proving that you haven't read my book! The section on the standard
 tokenizer details the rules that the tokenizer uses (in addition to
 extensive examples.) That's what I mean by "deep dive."

 -- Jack Krupansky

 -Original Message- From: Shawn Heisey
 Sent: Wednesday, August 21, 2013 10:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to avoid underscore sign indexing problem?


 On 8/21/2013 7:54 PM, Floyd Wu wrote:

  When using StandardAnalyzer to tokenize the string "Pacific_Rim" I get:
 
  ST
  text         raw_bytes                            start  end  type      position
  pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]   0      11   ALPHANUM  1
 
  How do I make this string tokenize into the two tokens "Pacific" and
  "Rim"?
  Set "_" as a stopword?
  Please kindly help with this.
  Many thanks.


 Interesting.  I thought that the StandardTokenizer always split on
 punctuation, but apparently that's not the case for the underscore
 character.

  You can always use the WordDelimiterFilter after the StandardTokenizer.

  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

 Thanks,
 Shawn



Switch to new leader transparently?

2013-07-10 Thread Floyd Wu
Hi there,

I've built a SolrCloud cluster from the example, but I have some questions.
When I send a query to one leader (say
http://xxx.xxx.xxx.xxx:8983/solr/collection1), everything is fine.

When I shut down that leader, the other replica
(http://xxx.xxx.xxx.xxx:9983/solr/collection1) in the same shard will
become the new leader. The problem is:

The application doesn't know the new leader's location and still sends
requests to http://xxx.xxx.xxx.xxx:8983/solr/collection1, and of course
gets no response.

How can my application know the new leader?
Is there any mechanism so the application can send requests to one fixed
endpoint no matter who the leader is?

For example, the application just sends to
http://xxx.xxx.xxx.xxx:8983/solr/collection1
even if the real leader runs on http://xxx.xxx.xxx.xxx:9983/solr/collection1.

Please help with this, or give me some key information to google it.

Many thanks.

Floyd


Re: Switch to new leader transparently?

2013-07-10 Thread Floyd Wu
Hi Anshum,
Thanks for your response.
My application is developed in C#, so I can't use CloudSolrServer with
SolrJ.

My problem is that there is a setting in my application:

SolrUrl = http://xxx.xxx.xxx.xxx:8983/solr/collection1

When this Solr instance shuts down or crashes, I have to change this
setting. I read the source code of CloudSolrServer.java in SolrJ just a few
minutes ago.

It seems that CloudSolrServer first reads the cluster state from ZK (or
some live node) to retrieve info, and then uses this info to decide which
node to send the request to.

Maybe I have to modify my application to mimic the CloudSolrServer
implementation.

Any ideas?

Floyd




2013/7/10 Anshum Gupta ans...@anshumgupta.net

 You don't really need to direct any query specifically to a leader. It will
 automatically be routed to the right leader.
 You may put a load balancer on top to just fix the problem with querying a
 node that has gone away.

 Also, the ZK-aware SolrJ Java client load-balances across all nodes in
 the cluster.


 On Wed, Jul 10, 2013 at 2:52 PM, Floyd Wu floyd...@gmail.com wrote:

  Hi there,
 
  I've built a SolrCloud cluster from the example, but I have some
  questions.
  When I send a query to one leader (say
  http://xxx.xxx.xxx.xxx:8983/solr/collection1), everything is fine.
 
  When I shut down that leader, the other replica
  (http://xxx.xxx.xxx.xxx:9983/solr/collection1) in the same shard will
  become the new leader. The problem is:
 
  The application doesn't know the new leader's location and still sends
  requests to http://xxx.xxx.xxx.xxx:8983/solr/collection1, and of course
  gets no response.
 
  How can my application know the new leader?
  Is there any mechanism so the application can send requests to one fixed
  endpoint no matter who the leader is?
 
  For example, the application just sends to
  http://xxx.xxx.xxx.xxx:8983/solr/collection1
  even if the real leader runs on
  http://xxx.xxx.xxx.xxx:9983/solr/collection1.
 
  Please help with this, or give me some key information to google it.
 
  Many thanks.
 
  Floyd
 



 --

 Anshum Gupta
 http://www.anshumgupta.net



Re: Switch to new leader transparently?

2013-07-10 Thread Floyd Wu
Hi Furkan,
I'm using C#, so SolrJ won't help with this, but its implementation is a
good reference for me. Thanks for your help.

By the way, how do I fetch/get the cluster state from ZK directly, in plain
HTTP or over a TCP socket?
In my SolrCloud cluster, I'm using a standalone ZK to coordinate.

Floyd




2013/7/10 Furkan KAMACI furkankam...@gmail.com

 You can define a CloudSolrServer like this:

   private static CloudSolrServer solrServer;

 and then define the address of your zookeeper host:

   private static String zkHost = "localhost:9983";

 initialize your variable:

   solrServer = new CloudSolrServer(zkHost);

 You can get the leader list like this:

   ClusterState clusterState =
       solrServer.getZkStateReader().getClusterState();
   List<Replica> leaderList = new ArrayList<>();
   for (Slice slice : clusterState.getSlices(collectionName)) {
     leaderList.add(slice.getLeader());
   }

 For querying you can try this:

   SolrQuery solrQuery = new SolrQuery();
   // fill your solrQuery variable here
   QueryRequest queryRequest = new QueryRequest(solrQuery,
       SolrRequest.METHOD.POST);
   queryRequest.process(solrServer);

 CloudSolrServer uses LBHttpSolrServer by default. Its definition reads:
 "LBHttpSolrServer or Load Balanced HttpSolrServer is just a wrapper to
 CommonsHttpSolrServer. This is useful when you have multiple SolrServers
 and query requests need to be load balanced among them. It offers
 automatic failover when a server goes down and it detects when the
 server comes back up."

 2013/7/10 Anshum Gupta ans...@anshumgupta.net

  You don't really need to direct any query specifically to a leader. It
  will automatically be routed to the right leader.
  You may put a load balancer on top to just fix the problem of querying
  a node that has gone away.
 
  Also, the ZK-aware SolrJ Java client load-balances across all nodes in
  the cluster.
 
 
  On Wed, Jul 10, 2013 at 2:52 PM, Floyd Wu floyd...@gmail.com wrote:
 
   Hi there,
  
    I've built a SolrCloud cluster from the example, but I have some
    questions.
    When I send a query to one leader (say
    http://xxx.xxx.xxx.xxx:8983/solr/collection1), everything is fine.
   
    When I shut down that leader, the other replica
    (http://xxx.xxx.xxx.xxx:9983/solr/collection1) in the same shard will
    become the new leader. The problem is:
   
    The application doesn't know the new leader's location and still
    sends requests to http://xxx.xxx.xxx.xxx:8983/solr/collection1, and
    of course gets no response.
   
    How can my application know the new leader?
    Is there any mechanism so the application can send requests to one
    fixed endpoint no matter who the leader is?
   
    For example, the application just sends to
    http://xxx.xxx.xxx.xxx:8983/solr/collection1
    even if the real leader runs on
    http://xxx.xxx.xxx.xxx:9983/solr/collection1.
   
    Please help with this, or give me some key information to google it.
   
    Many thanks.
  
   Floyd
  
 
 
 
  --
 
  Anshum Gupta
  http://www.anshumgupta.net
 



Re: Switch to new leader transparently?

2013-07-10 Thread Floyd Wu
Thanks Aloke, I will do some research.
On 2013/7/10 at 9:45 PM, Aloke Ghoshal alghos...@gmail.com wrote:

 Hi Floyd,

  We use SolrNet to connect to Solr from a C# application. Since SolrNet
  is not aware of SolrCloud or ZK, we use an HTTP load balancer in front
  of the Solr nodes and query via the load balancer url. You could use
  something like HAProxy or an Apache reverse proxy for load balancing.

 On the other hand in order to write a ZK aware client in C# you could start
 here: https://github.com/ewhauser/zookeeper/tree/trunk/src/dotnet

 Regards,
 Aloke


 On Wed, Jul 10, 2013 at 4:11 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:

   By the way, this is not related to your question, but this may help you
   connect to Solr via C#: http://solrsharp.codeplex.com/
 
  2013/7/10 Floyd Wu floyd...@gmail.com
 
    Hi Furkan,
    I'm using C#, so SolrJ won't help with this, but its implementation
    is a good reference for me. Thanks for your help.
   
    By the way, how do I fetch/get the cluster state from ZK directly, in
    plain HTTP or over a TCP socket?
    In my SolrCloud cluster, I'm using a standalone ZK to coordinate.
  
   Floyd
  
  
  
  
   2013/7/10 Furkan KAMACI furkankam...@gmail.com
  
     You can define a CloudSolrServer like this:
    
       private static CloudSolrServer solrServer;
    
     and then define the address of your zookeeper host:
    
       private static String zkHost = "localhost:9983";
    
     initialize your variable:
    
       solrServer = new CloudSolrServer(zkHost);
    
     You can get the leader list like this:
    
       ClusterState clusterState =
           solrServer.getZkStateReader().getClusterState();
       List<Replica> leaderList = new ArrayList<>();
       for (Slice slice : clusterState.getSlices(collectionName)) {
         leaderList.add(slice.getLeader());
       }
    
     For querying you can try this:
    
       SolrQuery solrQuery = new SolrQuery();
       // fill your solrQuery variable here
       QueryRequest queryRequest = new QueryRequest(solrQuery,
           SolrRequest.METHOD.POST);
       queryRequest.process(solrServer);
    
     CloudSolrServer uses LBHttpSolrServer by default. Its definition
     reads: "LBHttpSolrServer or Load Balanced HttpSolrServer is just a
     wrapper to CommonsHttpSolrServer. This is useful when you have
     multiple SolrServers and query requests need to be load balanced
     among them. It offers automatic failover when a server goes down and
     it detects when the server comes back up."
   
2013/7/10 Anshum Gupta ans...@anshumgupta.net
   
      You don't really need to direct any query specifically to a leader.
      It will automatically be routed to the right leader.
      You may put a load balancer on top to just fix the problem of
      querying a node that has gone away.
     
      Also, the ZK-aware SolrJ Java client load-balances across all nodes
      in the cluster.


 On Wed, Jul 10, 2013 at 2:52 PM, Floyd Wu floyd...@gmail.com
  wrote:

  Hi there,
 
       I've built a SolrCloud cluster from the example, but I have some
       questions.
       When I send a query to one leader (say
       http://xxx.xxx.xxx.xxx:8983/solr/collection1), everything is fine.
      
       When I shut down that leader, the other replica
       (http://xxx.xxx.xxx.xxx:9983/solr/collection1) in the same shard
       will become the new leader. The problem is:
      
       The application doesn't know the new leader's location and still
       sends requests to http://xxx.xxx.xxx.xxx:8983/solr/collection1,
       and of course gets no response.
      
       How can my application know the new leader?
       Is there any mechanism so the application can send requests to one
       fixed endpoint no matter who the leader is?
      
       For example, the application just sends to
       http://xxx.xxx.xxx.xxx:8983/solr/collection1
       even if the real leader runs on
       http://xxx.xxx.xxx.xxx:9983/solr/collection1.
      
       Please help with this, or give me some key information to google
       it.
      
       Many thanks.
 
  Floyd
 



 --

 Anshum Gupta
 http://www.anshumgupta.net

   
  
 



Re: PostingsSolrHighlighter not working on Multivalue field

2013-06-23 Thread Floyd Wu
Any ideas that can help with this?


2013/6/22 Erick Erickson erickerick...@gmail.com

 Unfortunately, from here I need to leave it to people who know
 the highlighting code

 Erick

 On Wed, Jun 19, 2013 at 8:40 PM, Floyd Wu floyd...@gmail.com wrote:
  Hi Erick,
 
  multivalue was my typo, thanks for the reminder.
 
  There is no log showing anything wrong, and no exception occurred.
 
  The field definitions are as follows:
 
  <field name="summary" type="text" indexed="true" stored="true"
         omitNorms="false" termVectors="true" termPositions="true"
         termOffsets="true" storeOffsetsWithPositions="true"/>
 
  <dynamicField name="*" type="text" indexed="true" stored="true"
                multiValued="true" termVectors="true" termPositions="true"
                termOffsets="true" omitNorms="false"
                storeOffsetsWithPositions="true"/>
 
  The PostingsSolrHighlighter only highlights the summary field.
 
  When I send an xml file to Solr like this:
 
  <?xml version="1.0" encoding="utf-8"?>
  <command>
    <add>
      <doc>
        <field name="summary">facebook yahoo plurk twitter social
          networking</field>
        <field name="body_0">facebook yahoo plurk twitter social
          networking</field>
      </doc>
    </add>
  </command>
 
  As you can see, body_0 will be treated using the dynamicField
  definition.
 
  Part of the debug response returned by Solr looks like this:
 
  <lst name="highlighting">
    <lst name="645">
      <arr name="summary">
        <str><em>Facebook</em>... <em>Facebook</em></str>
      </arr>
      <arr name="body_0"/>
      ...
  </lst>
 
  I'm sure hl.fl contains both summary and body_0.
  This behavior differs between PostingsSolrHighlighter and
  FastVectorHighlighter.
 
  Please kindly help on this.
  Many thanks.
 
  Floyd
 
 
 
  2013/6/19 Erick Erickson erickerick...@gmail.com
 
   Well, _how_ does it fail? Unless it's a typo, it should be
   multiValued (note the capital 'V'). This probably isn't the
   problem, but just in case.
 
  Anything in the logs? What is the field definition?
  Did you re-index after changing to multiValued?
 
  Best
  Erick
 
  On Tue, Jun 18, 2013 at 11:01 PM, Floyd Wu floyd...@gmail.com wrote:
    In my test case, it seems this new highlighter is not working.
   
    When a field is set multiValued=true, the stored text in this field
    cannot be highlighted.
   
    Am I missing something? Or is this a current limitation? I've had no
    luck finding any documentation mentioning this.
  
   Floyd
 



Re: PostingsSolrHighlighter not working on Multivalue field

2013-06-19 Thread Floyd Wu
Hi Erick,

multivalue was my typo, thanks for the reminder.

There is no log showing anything wrong, and no exception occurred.

The field definitions are as follows:

<field name="summary" type="text" indexed="true" stored="true"
       omitNorms="false" termVectors="true" termPositions="true"
       termOffsets="true" storeOffsetsWithPositions="true"/>

<dynamicField name="*" type="text" indexed="true" stored="true"
              multiValued="true" termVectors="true" termPositions="true"
              termOffsets="true" omitNorms="false"
              storeOffsetsWithPositions="true"/>

The PostingsSolrHighlighter only highlights the summary field.

When I send an xml file to Solr like this:

<?xml version="1.0" encoding="utf-8"?>
<command>
  <add>
    <doc>
      <field name="summary">facebook yahoo plurk twitter social
        networking</field>
      <field name="body_0">facebook yahoo plurk twitter social
        networking</field>
    </doc>
  </add>
</command>

As you can see, body_0 will be treated using the dynamicField definition.

Part of the debug response returned by Solr looks like this:

<lst name="highlighting">
  <lst name="645">
    <arr name="summary">
      <str><em>Facebook</em>... <em>Facebook</em></str>
    </arr>
    <arr name="body_0"/>
    ...
</lst>

I'm sure hl.fl contains both summary and body_0.
This behavior differs between PostingsSolrHighlighter and
FastVectorHighlighter.

Please kindly help on this.
Many thanks.

Floyd



2013/6/19 Erick Erickson erickerick...@gmail.com

 Well, _how_ does it fail? Unless it's a typo, it should be
 multiValued (note the capital 'V'). This probably isn't the
 problem, but just in case.

 Anything in the logs? What is the field definition?
 Did you re-index after changing to multiValued?

 Best
 Erick

 On Tue, Jun 18, 2013 at 11:01 PM, Floyd Wu floyd...@gmail.com wrote:
  In my test case, it seems this new highlighter is not working.
 
  When a field is set multiValued=true, the stored text in this field
  cannot be highlighted.
 
  Am I missing something? Or is this a current limitation? I've had no
  luck finding any documentation mentioning this.
 
  Floyd



PostingsSolrHighlighter not working on Multivalue field

2013-06-18 Thread Floyd Wu
In my test case, it seems this new highlighter is not working.

When a field is set multiValued=true, the stored text in this field cannot
be highlighted.

Am I missing something? Or is this a current limitation? I've had no luck
finding any documentation mentioning this.

Floyd


Re: Slow Highlighter Performance Even Using FastVectorHighlighter

2013-06-17 Thread Floyd Wu
Hi Michael, how do I configure the PostingsHighlighter on my Solr 4.2 box?
Please kindly point me in the right direction. Many thanks.
On 2013/6/15 at 10:48 PM, Michael McCandless luc...@mikemccandless.com
wrote:

 You could also try the new[ish] PostingsHighlighter:

 http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html
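
 For reference, a minimal sketch of wiring it up in solrconfig.xml (based
 on the Solr 4.x PostingsSolrHighlighter javadocs; the highlighted fields
 must also be reindexed with storeOffsetsWithPositions="true"):

   <searchComponent class="solr.HighlightComponent" name="highlight">
     <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
   </searchComponent>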

 Mike McCandless

 http://blog.mikemccandless.com


 On Sat, Jun 15, 2013 at 8:50 AM, Michael Sokolov
 msoko...@safaribooksonline.com wrote:
  If you have very large documents (many MB) that can lead to slow
  highlighting, even with FVH.
 
  See https://issues.apache.org/jira/browse/LUCENE-3234
 
  and try setting phraseLimit=1 (or some bigger number, but not infinite,
  which is the default)
 
  -Mike
 
 
 
  On 6/14/13 4:52 PM, Andy Brown wrote:
 
  Bryan,
 
   For specifics, I'll refer you back to my original email where I
   specified all the fields/field types/handlers I use. Here's a general
   overview.
  
   I really only have 3 fields that I index and search against: name,
   description, and content. All of these are just general text (string)
   fields. I have a catch-all field called text that is only used for
   querying. It's indexed but not stored. The name, description, and
   content fields are copied into the text field.
  
   For partial word matching, I have 4 more fields: name_par,
   description_par, content_par, and text_par. The text_par field has the
   same relationship to the *_par fields as text does to the others (only
   used for querying). Those partial word matching fields are of type
   text_general_partial, which I created. That field type is analyzed
   differently than the regular text field in that it goes through an
   EdgeNGramFilterFactory with minGramSize=2 and maxGramSize=7 at index
   time.
  
   I query against both the text and text_par fields using the edismax
   deftype with my qf set to "text^2 text_par^1" to give full word matches
   a higher score. This part returns very fast, as previously stated. It's
   when I turn on highlighting that I take the huge performance hit.
  
   Again, I'm using the FastVectorHighlighter. The hl.fl is set to "name
   name_par description description_par content content_par" so that it
   returns highlights for full and partial word matches. All of those
   fields have indexed, stored, termPositions, termVectors, and
   termOffsets set to true.
  
   It all seems redundant just to allow for partial word
   matching/highlighting, but I didn't know of a better way. Does anything
   stand out to you that could be the culprit? Let me know if you need any
   more clarification.
  
   Thanks!
  
   - Andy
 
  -Original Message-
  From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
  Sent: Wednesday, May 29, 2013 5:44 PM
  To: solr-user@lucene.apache.org
  Subject: RE: Slow Highlighter Performance Even Using
  FastVectorHighlighter
 
  Andy,
 
   "I don't understand why it's taking 7 secs to return highlights. The
   size of the index is only 20.93 MB. The JVM heap Xms and Xmx are both
   set to 1024 for this verification purpose and that should be more than
   enough. The processor is plenty powerful enough as well.
  
   Running VisualVM shows all my CPU time being taken by mainly these 3
   methods:
  
   org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset()
   org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset()
   org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap()"
  
   That is a strange and interesting set of things to be spending most of
   your CPU time on. The implication, I think, is that the number of term
   matches in the document for terms in your query (or, at least, terms
   matching exact words or the beginnings of phrases in your query) is
   extremely high. Perhaps that's coming from this "partial word match"
   you mention -- how does that work?
 
  -- Bryan
 
  My guess is that this has something to do with how I'm handling
 
  partial
 
  word matches/highlighting. I have setup another request handler that
  only searches the whole word fields and it returns in 850 ms with
  highlighting.
 
  Any ideas?
 
  - Andy
 
 
  -Original Message-
  From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
  Sent: Monday, May 20, 2013 1:39 PM
  To: solr-user@lucene.apache.org
  Subject: RE: Slow Highlighter Performance Even Using
  FastVectorHighlighter
 
  My guess is that the problem is those 200M documents.
  FastVectorHighlighter is fast at deciding whether a match, especially
  a phrase, appears in a document, but it still starts out by walking the
  entire list of term vectors, and ends by breaking the document into
  candidate-snippet fragments, both processes that are proportional to
  the length of the document.
 
  It's hard to do much about the first, but for the second you could
  choose to expose FastVectorHighlighter's FieldPhraseList representation,
  and return offsets to 

Re: Very slow query when boosting involves an ExternalFileField

2013-03-21 Thread Floyd Wu
Can anybody point me in a direction?
Many thanks.



2013/3/20 Floyd Wu floyd...@gmail.com

 Hi everyone,

 I have a problem and have had no luck figuring it out.

 When I issue a query:

 Query 1

 http://localhost:8983/solr/select?q={!boost+b=recip(ms(NOW/HOUR,last_modified_datetime),3.16e-11,1,1)}all:java&start=0&rows=10&fl=score,author&sort=score+desc

 Query 2

 http://localhost:8983/solr/select?q={!boost+b=sum(ranking,recip(ms(NOW/HOUR,last_modified_datetime),3.16e-11,1,1))}all:java&start=0&rows=10&fl=score,author&sort=score+desc

 The difference between the two queries is the boost.
 The boost function of Query 2 uses a field named ranking, and this field
 is an ExternalFileField.
 The external file is key=value pairs, about 1 lines.

 Execution time
 Query 1 -- 100ms
 Query 2 -- 2300ms

 I tried to issue Query 3, changing ranking to a constant 1:

 http://localhost:8983/solr/select?q={!boost+b=sum(1,recip(ms(NOW/HOUR,last_modified_datetime),3.16e-11,1,1))}all:java&start=0&rows=10&fl=score,author&sort=score+desc

 Execution time
 Query 3 -- 110ms

 One thing I am sure of: involving an ExternalFileField slows down query
 execution time significantly. But I have no idea how to solve this
 problem, as my boost function must use the value of the ranking field.

 Please help with this.

 PS: I'm using Solr 4.1

 Floyd






Re: difference between these two queries

2012-12-10 Thread Floyd Wu
Thanks Otis.

When talking about query performance (ignoring scoring), is it better to use fq?

Floyd


2012/12/11 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 The fq one is a filter query that only does matching, but not scoring. Its
 results are stored in the filter cache, while the q uses the query cache.

 Otis
 --
 SOLR Performance Monitoring - http://sematext.com/spm/index.html





 On Mon, Dec 10, 2012 at 10:11 PM, Floyd Wu floyd...@gmail.com wrote:

  Hi There,
  Sorry for spamming if this question has already been asked.
 
  What's the main difference between
 
  q=fieldA:value AND fieldB:value
 
  q=fieldA:value&fq=fieldB:value
 
  Both queries give me the same result; I wonder what the main difference
  is, and in practice which is the better way?
 
  Thanks in advance
 
  Floyd
 



Difference between 'bf' and 'boost' when using eDismax

2012-12-03 Thread Floyd Wu
Hi there,

I'm not sure if I understand this clearly.

Is 'bf' additive, i.e. the value returned by bf is added to the final
score?
for example:  score + bf = final score

Is 'boost' multiplicative, i.e. the score is multiplied by the value
returned by boost?
for example: score * boost = final score

When using both ('bf' and 'boost'):
score * boost + bf = final score

If I would like to make recently created documents rank higher, would
using 'bf' or 'boost' be the better solution (assuming bf and boost use
the same function, recip(ms(NOW,datefield),3.16e-11,1,1))?

Please help with this.


Re: Difference between 'bf' and 'boost' when using eDismax

2012-12-03 Thread Floyd Wu
Thanks Jack!
It helps a lot.

Floyd



2012/12/4 Jack Krupansky j...@basetechnology.com

 bf is processed first, then boost.

 All the bf's will be added, then the resulting scores will be boosted by
 the product of all the boost function queries.
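
 As an illustrative sketch (the "datefield" and "popularity" field names
 are assumptions, not from this thread), an edismax request combining both
 parameters might look like:

   http://localhost:8983/solr/select?defType=edismax&q=solr
     &bf=recip(ms(NOW,datefield),3.16e-11,1,1)   <- added to the score
     &boost=popularity                           <- score multiplied by this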

 -- Jack Krupansky

 -Original Message- From: Floyd Wu
 Sent: Monday, December 03, 2012 11:00 PM
 To: solr-user@lucene.apache.org
 Subject: Difference between 'bf' and 'boost' when using eDismax


 Hi there,

  I'm not sure if I understand this clearly.
 
  Is 'bf' additive, i.e. the value returned by bf is added to the final
  score?
  for example:  score + bf = final score
 
  Is 'boost' multiplicative, i.e. the score is multiplied by the value
  returned by boost?
  for example: score * boost = final score
 
  When using both ('bf' and 'boost'):
  score * boost + bf = final score
 
  If I would like to make recently created documents rank higher, would
  using 'bf' or 'boost' be the better solution (assuming bf and boost use
  the same function, recip(ms(NOW,datefield),3.16e-11,1,1))?
 
  Please help with this.



Re: Dynamic ranking based on search term

2012-11-28 Thread Floyd Wu
Hi Upayavira

Let me explain what I need in other words.

The list is the result of analyzing logs.
The key-value pairs list actually means that when the search term is
"java", then boost these documents (doc1, doc2, doc5),
for example:
  java, doc1,doc2,doc5

Any ideas?

Thanks.




2012/11/28 Upayavira u...@odoko.co.uk

 Isn't this what Solr/Lucene are designed to do??

 On indexing a document, Lucene creates an inverted index, mapping terms
 back to their containing documents. The data you have is already
 inverted.

 I'd suggest uninverting it and then handing it to Solr in that format,
 thus:

 doc1: java
 doc2: java
 doc4: book
 doc5: java
 doc9: book
 doc77: book

 With that structure, you'll have in your index exactly what Solr
 expects, and will be able to take advantage of the inbuilt ranking
 capabilities of Lucene and Solr.

 Upayavira

 On Wed, Nov 28, 2012, at 10:15 AM, Floyd Wu wrote:
  Hi there,
 
  If I have a list of key-value pairs in a text field or database table,
  how do I achieve dynamic ranking based on the search term? That is to
  say, when a user searches the term "java", then doc1, doc2, and doc5
  will get a higher ranking.
 
  for example (the key is a search term, the value is the related indexed
  document's unique key):
  ==
  key, value
  ==
  java, doc1,doc2,doc5
  book, doc9, doc4,doc77
  ==
 
  I've finished an implementation using externalFileField to do ranking,
  but in this way the ranking is static.
 
  Please kindly point me to a way to do this.
 
  PS: SearchComponent maybe?



Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?

2012-11-20 Thread Floyd Wu
Hi Chris,

Thanks! Before your great suggestions arrived, I had given up on using a
function query to calculate the product of score and rankingField, and I am
using exactly the boost query solution you describe. Of course it works
fine. The next step will be to design a suitable function that outputs a
ranking value which also considers popularity, recency, relevance, and the
rating of documents.

Many thanks to the community.

Floyd



2012/11/21 Chris Hostetter hossman_luc...@fucit.org


 : But the sort=product(score, rankingField) is not working in my test. What
 : probably wrong?

 the problem is score is not a field or a function -- Solr doesn't know
 exactly what score you want it to use there (scores from which query?)

 You either need to reference the query in the function (using the
 query(...) function) or you need to incorporate your function directly
 into the score (using something like the boost QParser).

 Unless you need the score of the docs from your original query to be
 returned in the fl, or used in some other clause of your sort, I would
 suggest using the boost parser -- that way your final scores will match
 the scores you computed with the function...

qq=your original query
 q={!boost b=rankingField v=$qq}


 https://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/search/BoostQParserPlugin.html
 https://people.apache.org/~hossman/ac2012eu/


 -Hoss



Re: Custom ranking solutions?

2012-11-20 Thread Floyd Wu
Hi Dan,

Thanks! I'm using a boost query to solve this problem.

Floyd




2012/11/21 Daniel Rosher rosh...@gmail.com

 Hi

 The product function query needs a valuesource, not the pseudo score field.

 You probably need something like (with Solr 4.0):

  q={!lucene}*:*&sort=product(query($q),2) desc,score desc
  &fl=score,_score_:product(query($q),2),[explain]

 Cheers,
 Dan

 On Tue, Nov 20, 2012 at 2:29 AM, Floyd Wu floyd...@gmail.com wrote:

  Hi there,
 
  Before ExternalFileField was introduced, I changed document boost values
  to achieve custom ranking. My client app would update each document's
  boost value daily, and it seemed to work fine.
  The actual ranking could be predicted based on the boost value (the
  value is calculated based on clicks, recency, and rating).
 
  I'm now trying to use ExternalFileField to do some ranking; after some
  tests, I did not get what I expected.
 
  I'm doing a sort like this:
 
  sort=product(score,abs(rankingField))+desc
  But the query result ranking won't change anyway.
 
  The external file is as follows:
  doc1=3
  doc2=5
  doc3=9
 
  The original scores from the Solr result are as follows:
  doc1=41.042
  doc2=10.1256
  doc3=8.2135
 
  Expected ranking:
  doc1
  doc3
  doc2
 
  What is wrong in my test? Please kindly help with this.
 
  Floyd
 



Ranking by sorting score and rankingField better or by product(score, rankingField)?

2012-11-19 Thread Floyd Wu
Hi there,

I have a field (an ExternalFileField called rankingField) whose value
(type=float) is calculated by my client app.

In Solr's original scoring model, changing the boost value results in a
different ranking, so I think product(score,rankingField) may be
equivalent to the Solr scoring model.

What I'm curious about is which of these will be better in practice, and
what the different meanings of these three solutions are:

1. sort=score+desc,ranking+desc
2. sort=ranking+desc,score+desc
3. sort=product(score,ranking) --is this possible?

I'd like to hear your thoughts.

Many thanks

Floyd


Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?

2012-11-19 Thread Floyd Wu
Thanks Otis,

But sort=product(score, rankingField) is not working in my test. What
could be wrong?

Floyd


2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 3. yes, you can sort by function (see the sketch below) -
 http://search-lucene.com/?q=solr+sort+by+function
 2. this will sort by score only when there is a tie in ranking (two docs
 have the same rank value)
 1. the reverse of 2.
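
 For 3., a function over a plain indexed field works directly; a minimal
 sketch with the rankingField from this thread (note that score itself
 cannot be used inside the function; see the other messages in this thread
 about query($q) and the boost QParser):

 http://localhost:8983/solr/select?q=java&sort=sqrt(rankingField)+desc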

 Otis
 --
 Performance Monitoring - http://sematext.com/spm/index.html
 Search Analytics - http://sematext.com/search-analytics/index.html




 On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote:

   Hi there,

   I have a field (an ExternalFileField called rankingField) whose
   value (type=float) is calculated by my client app.

   In Solr's original scoring model, changing the boost value results in
   a different ranking, so I think product(score,rankingField) may be
   equivalent to the Solr scoring model.

   What I'm curious about is which of these will be better in practice,
   and what the different meanings of these three solutions are:
 
  1. sort=score+desc,ranking+desc
  2. sort=ranking+desc,score+desc
  3. sort=product(score,ranking) --is this possible?
 
  I'd like to hear your thoughts.
 
  Many thanks
 
  Floyd
 



Re: Custom ranking solutions?

2012-11-19 Thread Floyd Wu
Hi Otis,
The debug information is as follows; it seems there is no product() step
in there at all.

<lst name="debug">
  <str name="rawquerystring">_l_all:測試</str>
  <str name="querystring">_l_all:測試</str>
  <str name="parsedquery">PhraseQuery(_l_all:"測 試")</str>
  <str name="parsedquery_toString">_l_all:"測 試"</str>
  <lst name="explain">
    <str name="222">
      41.11747 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result of:
        41.11747 = fieldWeight in 0, product of:
          4.1231055 = tf(freq=17.0), with freq of:
            17.0 = phraseFreq=17.0
          1.4246359 = idf(), sum of:
            0.71231794 = idf(docFreq=3, maxDocs=3)
            0.71231794 = idf(docFreq=3, maxDocs=3)
          7.0 = fieldNorm(doc=0)
    </str>
    <str name="223">
      14.246359 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result of:
        14.246359 = fieldWeight in 0, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = phraseFreq=1.0
          1.4246359 = idf(), sum of:
            0.71231794 = idf(docFreq=3, maxDocs=3)
            0.71231794 = idf(docFreq=3, maxDocs=3)
          10.0 = fieldNorm(doc=0)
    </str>
    <str name="211">
      10.073696 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result of:
        10.073696 = fieldWeight in 0, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = phraseFreq=2.0
          1.4246359 = idf(), sum of:
            0.71231794 = idf(docFreq=3, maxDocs=3)
            0.71231794 = idf(docFreq=3, maxDocs=3)
          5.0 = fieldNorm(doc=0)
    </str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">6.0</double>
    <lst name="prepare">
      <double name="time">0.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>
    <lst name="process">
      <double name="time">6.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">3.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">3.0</double>
      </lst>
    </lst>
  </lst>
</lst>


2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi Floyd,

 Use debugQuery=true and let's see it.:)

 Otis
 --
 Performance Monitoring - http://sematext.com/spm/index.html
 Search Analytics - http://sematext.com/search-analytics/index.html




 On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu floyd...@gmail.com wrote:

   Hi there,

   Before ExternalFileField was introduced, I changed document boost
   values to achieve custom ranking. My client app updated each
   document's boost value daily, and that seemed to work fine. The
   actual ranking could be predicted from the boost value (the value is
   calculated from clicks, recency, and rating).

   I'm now trying to use ExternalFileField to do some ranking, but after
   some tests I did not get what I expected.

   I'm doing a sort like this:

   sort=product(score,abs(rankingField))+desc

   But the ranking of the query results won't change.

   The external file is as follows:
   doc1=3
   doc2=5
   doc3=9

   The original scores from the Solr result are as follows:
   doc1=41.042
   doc2=10.1256
   doc3=8.2135

   Expected ranking:
   doc1
   doc3
   doc2

   What is wrong in my test? Please kindly help with this.

   Floyd
 



Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?

2012-11-19 Thread Floyd Wu
Hi Otis,

There are no errors in the console or in the log file. I'm using Solr 4.0.
The external file is named external_rankingField.txt and exists in the
directory
 C:\solr-4.0.0\example\solr\collection1\data\external_rankingField.txt

The external file itself should be working: when I issue a query with
sort=sqrt(rankingField)+desc or sort=sqrt(rankingField)+asc, the ordering
changes accordingly.

By the way, I first tried the external file field according to the
documentation here:
http://lucidworks.lucidimagination.com/display/solr/Working+with+External+Files+and+Processes

Format of the External File

The file itself is located in Solr's index directory, which by default is
$SOLR_HOME/data/index. The name of the file should be external_<fieldname>
or external_<fieldname>.*. For the example above, then, the file could be
named external_entryRankFile or external_entryRankFile.txt.


But actually the external file should be put in
$SOLR_HOME/data/

Floyd




2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 Do you see any errors?
 Which version of Solr?
 What does debugQuery=true say?
 Are you sure your file with ranks is being used? (remove it, put some junk
 in it, see if that gives an error)

 Otis
 --
 Performance Monitoring - http://sematext.com/spm/index.html
 Search Analytics - http://sematext.com/search-analytics/index.html




 On Mon, Nov 19, 2012 at 10:16 PM, Floyd Wu floyd...@gmail.com wrote:

  Thanks Otis,
 
   But sort=product(score, rankingField) is not working in my test. What
   could be wrong?
 
  Floyd
 
 
  2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com
 
   Hi,
  
   3. yes, you can sort by function -
   http://search-lucene.com/?q=solr+sort+by+function
   2. this will sort by score only when there is a tie in ranking (two
 docs
   have the same rank value)
   1. the reverse of 2.
  
   Otis
   --
   Performance Monitoring - http://sematext.com/spm/index.html
   Search Analytics - http://sematext.com/search-analytics/index.html
  
  
  
  
   On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu floyd...@gmail.com wrote:
  
 Hi there,

 I have a field (an ExternalFileField called rankingField) whose
 value (type=float) is calculated by my client app.

 In Solr's original scoring model, changing the boost value results in
 a different ranking, so I think product(score,rankingField) may be
 equivalent to the Solr scoring model.

 What I'm curious about is which of these will be better in practice,
 and what the different meanings of these three solutions are:
   
1. sort=score+desc,ranking+desc
2. sort=ranking+desc,score+desc
3. sort=product(score,ranking) --is this possible?
   
I'd like to hear your thoughts.
   
Many thanks
   
Floyd
   
  
 



Re: Custom ranking solutions?

2012-11-19 Thread Floyd Wu
Hi Otis,

I'm doing a test like this:

http://localhost:8983/solr/select/?fl=score,_l_unique_key&defType=func&q=product(abs(rankingField),abs(score))

and I get the following response:

<lst name="error">
  <str name="msg">can not use FieldCache on unindexed field: score</str>
  <int name="code">400</int>
</lst>

If I change score to rankingField like this:

http://localhost:8983/solr/select/?fl=score,_l_unique_key&defType=func&q=product(abs(rankingField),abs(rankingField))

<result name="response" numFound="3" start="0" maxScore="2500.0">
  <doc>
    <str name="_l_unique_key">211</str>
    <float name="score">2500.0</float>
  </doc>
  <doc>
    <str name="_l_unique_key">223</str>
    <float name="score">4.0</float>
  </doc>
  <doc>
    <str name="_l_unique_key">222</str>
    <float name="score">0.01001</float>
  </doc>
</result>

It seems like score cannot be put into a function query?

Floyd




2012/11/20 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi Floyd,

 Use debugQuery=true and let's see it.:)

 Otis
 --
 Performance Monitoring - http://sematext.com/spm/index.html
 Search Analytics - http://sematext.com/search-analytics/index.html




 On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu floyd...@gmail.com wrote:

   Hi there,

   Before ExternalFileField was introduced, I changed document boost
   values to achieve custom ranking. My client app updated each
   document's boost value daily, and that seemed to work fine. The
   actual ranking could be predicted from the boost value (the value is
   calculated from clicks, recency, and rating).

   I'm now trying to use ExternalFileField to do some ranking, but after
   some tests I did not get what I expected.

   I'm doing a sort like this:

   sort=product(score,abs(rankingField))+desc

   But the ranking of the query results won't change.

   The external file is as follows:
   doc1=3
   doc2=5
   doc3=9

   The original scores from the Solr result are as follows:
   doc1=41.042
   doc2=10.1256
   doc3=8.2135

   Expected ranking:
   doc1
   doc3
   doc2

   What is wrong in my test? Please kindly help with this.

   Floyd
 



Re: BM25 model for solr 4?

2012-11-15 Thread Floyd Wu
Thanks everyone, especially Tom; you gave me a detailed explanation of
this topic.
Of course, in academic work we need to interpret results carefully. What I
care about, from the end-user's point of view, is: will using BM25 result
in better ranking than Lucene's original VSM+Boolean model? How
significant will the difference be?
I'd like to see some sharing from the community.

Floyd


2012/11/16 Tom Burton-West tburt...@umich.edu

 Hello Floyd,

 There is a ton of research literature out there comparing BM25 to vector
 space.  But you have to be careful interpreting it.

 BM25 originally beat the SMART vector space model in the early TRECs
 because it did better tf and length normalization.  Pivoted Document
 Length normalization was invented to get the vector space model to catch up
 to BM25.  (Just Google for Singhal length normalization. Amit Singhal,
 now chief of Google Search, did his doctoral thesis on this and it is
 available. Similarly, Stephen Robertson, now at Microsoft Research,
 published a ton of studies of BM25.)

 The default Solr/Lucene similarity class doesn't provide the length or tf
 normalization tuning params that BM25 does.  There is the sweetspot
 similarity, but that doesn't quite work the same way that the BM25
 normalizations do.

 Document length normalization needs and parameter tuning all depend on
 your data.  So if you are reading a comparison, you need to determine:
 1) When comparing recall/precision etc. between vector space and BM25, did
 the experimenter tune both the vector space and the BM25 parameters?
 2) Are the documents (and queries) they are using in the test similar in
 length characteristics to your documents and queries?

 We are planning to do some testing in the next few months for our use case,
 which is 10 million books where we index the entire book.  These are
 extremely long documents compared to most IR research.
 I'd love to hear about actual (non-research) production implementations
 that have tested the new ranking models available in Solr.

 Tom



 On Wed, Nov 14, 2012 at 9:16 PM, Floyd Wu floyd...@gmail.com wrote:

   Hi there,
   Can anybody kindly tell me how to set up Solr to use BM25?
   By the way, is there any experiment or research showing how BM25 and
   the classical VSM model compare in recall/precision?

   Thanks in advance.
 



BM25 model for solr 4?

2012-11-14 Thread Floyd Wu
Hi there,
Can anybody kindly tell me how to set up Solr to use BM25?
By the way, is there any experiment or research showing how BM25 and the
classical VSM model compare in recall/precision?

Thanks in advance.
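
A minimal schema.xml sketch for this, assuming Solr 4.x where
solr.BM25SimilarityFactory ships out of the box (k1 and b are shown at
their common defaults):

<!-- global similarity, declared at the top level of schema.xml -->
<similarity class="solr.BM25SimilarityFactory">
  <float name="k1">1.2</float>  <!-- term-frequency saturation -->
  <float name="b">0.75</float>  <!-- strength of length normalization -->
</similarity>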


Re: Limit the SolR access from the web for one user-agent?

2012-11-08 Thread Floyd Wu
Hi Alex, I'd like to know how to use Client and Server Certificates to
protect the connection, and how to embed those certificates into clients.

Please kindly share your experience.

Floyd


2012/11/8 Alexandre Rafalovitch arafa...@gmail.com

 It is very easy to do this on Apache, but you need to be aware that
 User-Agent is extremely easy to both sniff and spoof.
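
 For completeness, a minimal Apache 2.2-style sketch of that restriction
 (mod_setenvif plus mod_authz_host; the client's User-Agent string is
 hypothetical), with the caveat above firmly in mind:

 # allow /solr only to requests presenting the expected User-Agent
 SetEnvIf User-Agent "^BrunoClient/1\.0$" trusted_client
 <Location /solr>
     Order Deny,Allow
     Deny from all
     Allow from env=trusted_client
 </Location>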

 Have you thought of perhaps using Client and Server Certificates to protect
 the connection and embedding those certificates into clients?

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Thu, Nov 8, 2012 at 9:39 AM, Bruno Mannina bmann...@free.fr wrote:

  Dear All,
 
   I'm using an external program (my own client) to access my Apache-Solr
   database.
   I would like to restrict SOLR access to a specific User-Agent (defined
   in my program).

   I would like to know whether it's possible to do that directly in the
   Solr config, or whether I must handle it in the Apache server.

   My program only makes requests like this (i.e.):
   http://xxx.xxx.xxx.xxx:pp/solr/select/?q=ap%3Afuelcell&version=2.2&start=0&rows=10&indent=on
 
   I can add a User-Agent, login, password, etc. to my HTTP component
   properties, like a standard HTTP connection.
 
   To be complete: my software is distributed to several users, and I would
   like to limit SOLR access to these users and to my program.
   Firefox, Chrome, and I.E. will be unauthorized.
 
  thanks for your comment or help,
  Bruno
 
  Ubuntu 12.04LTS
  SolR 3.6
 



Remove underscore char when indexing and query problem

2012-03-02 Thread Floyd Wu
Hi there,

I have a document and its title is 20111213_solr_apache conference report.

When I use the analysis web interface to see exactly what tokens Solr
produces, the result is the following:

term text: 20111213_solr | apache | conference | report
term type: NUM | ALPHANUM | ALPHANUM | ALPHANUM

Why is 20111213_solr tokenized as NUM, and why isn't the _ character
removed? (I've added _ as a stop word in stopwords.txt.)

I did another test with 20111213_solr_apache conference_report.
As you can see, the difference is that I added an underscore character
between conference and report. Analyzing this string gives:

term text: 20111213_solr | apache | conference | report
term type: NUM | ALPHANUM | ALPHANUM | ALPHANUM

This time the underscore character between conference and report is
removed!

Why? How can I make Solr remove the underscore character and behave
consistently?
Please help on this.

Thanks in advance.

Floyd
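
One way to get consistent behavior (a sketch, an assumption rather than a
confirmed fix from this thread) is to strip underscores with a char filter
before tokenization; stopwords.txt cannot help here, because stop
filtering removes whole tokens rather than characters inside tokens:

<fieldType name="text_nounderscore" class="solr.TextField">
  <analyzer>
    <!-- turn every underscore into a space before the tokenizer runs -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="_" replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>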


Re: Separate ACL and document index

2011-11-23 Thread Floyd Wu
I've been reading a lot about document-level security:
https://issues.apache.org/jira/browse/SOLR-1895
https://issues.apache.org/jira/browse/SOLR-1872
https://issues.apache.org/jira/browse/SOLR-1834

But I'm not fully sure these patches solve my problem. It seems that
changing a document's ACL still requires rebuilding the index entry
together with the document content.

It makes no sense to rebuild when I only change the ACL.

Any ideas? Or am I just misunderstanding these patches?

Floyd



2011/11/23 Floyd Wu floyd...@gmail.com:
 Hi there,

 Is it possible to separate the ACL index from the document index and
 still search by user role in Solr?

 Currently my implementation indexes the ACL together with the document,
 but the ACL itself changes frequently, and I have to rebuild the index
 every time an ACL changes. That is heavy for the whole system because
 the documents are numerous and their content is huge.

 Do you have any solution to this problem? I've been reading the mailing
 list for a while, and there doesn't seem to be a suitable solution for me.

 I want each user's search to return only the results permitted by his
 role, but I don't want to re-index a document every time its ACL changes.

 Is it possible to perform a database-like join to achieve this? If so,
 how?

 Thanks

 Floyd



Re: Separate ACL and document index

2011-11-23 Thread Floyd Wu
Thank you for sharing. My current solution is similar to 2), but my
problem is that the ACL is early-binding (meaning I build the index with
the ACL embedded alongside the document content). I don't want to rebuild
the full index entry (a Lucene/Solr document with PDF content and ACL)
when the front end changes only the permission settings.

Solution 2) seems to have the same problem.

Floyd


2011/11/24 Robert Stewart bstewart...@gmail.com:
 I have used two different ways:

 1) Store mapping from users to documents in some external database
 such as MySQL.  At search time, lookup mapping for user to some unique
 doc ID or some group ID, and then build query or doc set which you can
 cache in SOLR process for some period.  Then use that as a filter in
 your search.  This is more involved approach but better if you have
 lots of ACLs per user, but it is non-trivial to implement it well.  I
 used this in a system with over 100 million docs, and approx. 20,000
 ACLs per user.  The ACL mapped user to a set of group IDs, and each
 group could have 10,000+ documents.

 2) Generate a query filter that you pass to SOLR as part of the
 search.  Potentially it could be a pretty large query if user has
 granular ACL over may documents or groups.  I've seen it work ok with
 up to 1000 or so ACLs per user query.  So you build that filter query
 from the client using some external database to lookup user ACLs
 before sending request to SOLR.
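
 For 2), the generated filter is just a big OR over the user's groups,
 something like this (the field name and group values are hypothetical):

 fq=acl_group:(finance OR sales OR board_members)

 Sending it as an fq also lets Solr keep it in the filter cache, so
 repeated searches by the same user stay cheap.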

 Bob


 On Tue, Nov 22, 2011 at 10:48 PM, Floyd Wu floyd...@gmail.com wrote:
 Hi there,

 Is it possible to separate the ACL index from the document index and
 still search by user role in Solr?

 Currently my implementation indexes the ACL together with the document,
 but the ACL itself changes frequently, and I have to rebuild the index
 every time an ACL changes. That is heavy for the whole system because
 the documents are numerous and their content is huge.

 Do you have any solution to this problem? I've been reading the mailing
 list for a while, and there doesn't seem to be a suitable solution for me.

 I want each user's search to return only the results permitted by his
 role, but I don't want to re-index a document every time its ACL changes.

 Is it possible to perform a database-like join to achieve this? If so,
 how?

 Thanks

 Floyd




Separate ACL and document index

2011-11-22 Thread Floyd Wu
Hi there,

Is it possible to separate the ACL index from the document index and still
search by user role in Solr?

Currently my implementation indexes the ACL together with the document,
but the ACL itself changes frequently, and I have to rebuild the index
every time an ACL changes. That is heavy for the whole system because the
documents are numerous and their content is huge.

Do you have any solution to this problem? I've been reading the mailing
list for a while, and there doesn't seem to be a suitable solution for me.

I want each user's search to return only the results permitted by his
role, but I don't want to re-index a document every time its ACL changes.

Is it possible to perform a database-like join to achieve this? If so, how?

Thanks

Floyd


Re: Replicating Large Indexes

2011-10-31 Thread Floyd Wu
Hi Jason,

I'm very curious how you build (and rebuild) such a big index efficiently.
Sorry to hijack this topic.

Floyd

2011/11/1 Jason Biggin jbig...@hipdigital.com:
 Wondering if anyone has experience with replicating large indexes.  We have a 
 Solr deployment with 1 master, 1 master/slave and 5 slaves.  Our index 
 contains 15+ million articles and is ~55GB in size.

 Performance is great on all systems.

 Debian Linux
 Apache-Tomcat
 100GB disk
 6GB RAM
 2 proc

 on VMWare ESXi 4.0


 We notice however that whenever the master is optimized, the complete index 
 is replicated to the slaves.  This causes a 100%+ bloat in disk requirements.

 Is this normal?  Is there a way around this?

 Currently our optimize is configured as such:

         curl 'http://localhost:8080/solr/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true'

 Willing to share our experiences with Solr.

 Thanks,
 Jason
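
One note (an observation, not something from this thread): optimizing down
to maxSegments=1 rewrites every segment file, so the slaves must fetch the
entire index afterwards. A sketch of a gentler merge, assuming a handful
of segments is acceptable for your query performance:

curl 'http://localhost:8080/solr/update?optimize=true&maxSegments=10'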



Re: Want to support did you mean xxx but in Chinese

2011-10-24 Thread Floyd Wu
Hi Li Li,

Thanks for your detailed explanation. Basically I have an implementation
similar to yours. I just wanted to know whether there is a better, more
complete solution. I'll keep trying, and I'll see if I come up with any
improvements I can share with you and the community.

Any ideas or advice are welcome.

Floyd



2011/10/21 Li Li fancye...@gmail.com:
We have implemented one that supports "did you mean" and prefix suggestion
for Chinese, but we based our work on Solr 1.4 and made many
modifications, so it will take time to integrate it into current
Solr/Lucene.

 Here is our solution; glad to receive any advice.

 1. Offline word and phrase discovery:
   we discover new words and new phrases by mining query logs.

 2. Online matching algorithm:
   for each word, e.g., 贝多芬,
   we convert it to the pinyin "bei duo fen", then we index it using
 n-grams, which means gram3:bei gram3:eid ...
   To get the "did you mean" result, we convert the query 背朵分 into
 n-grams as a boolean OR query, so there are many results (the words whose
 pinyin is similar to the query are ranked at the top).
   Then we rerank the top 500 results with a fine-grained algorithm:
   we use edit distance to align the query and each result, and we also
 take the characters themselves into consideration. E.g., for the query
 十度, the matches 十渡 and 是度 have exactly the same pinyin, but 十渡 is
 better than 是度 because 十 occurs in both the query and the match.
   You also need to consider the hotness (popularity) of different
 words/phrases, which can be learned from query logs.

   Another issue is converting Chinese into pinyin, because some
 characters have more than one pinyin reading. E.g., 长沙 vs. 长大: 长's
 pinyin is "chang" in 长沙, so you should segment the query and
 words/phrases first; word segmentation is a basic problem in Chinese IR.
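
 A minimal schema.xml sketch of the trigram index in step 2, assuming the
 application converts 贝多芬 to the pinyin string beiduofen before indexing:

 <fieldType name="pinyin_gram3" class="solr.TextField">
   <analyzer>
     <!-- the whole pinyin string arrives as a single token -->
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <!-- split it into overlapping 3-grams: bei, eid, idu, duo, ... -->
     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="3"/>
   </analyzer>
 </fieldType>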


 2011/10/21 Floyd Wu floyd...@gmail.com

 Does anybody know how to implement this idea in Solr? Please kindly
 point me in a direction.

 For example, a user enters the keyword 貝多芬 (this is
 Beethoven in Chinese)
 but keys in a wrong combination of characters, 背多分 (this has
 the same pronunciation as the previous keyword 貝多芬).

 The token 貝多芬 actually exists in the Solr index. How do I hit documents
 where 貝多芬 exists when 背多分 is entered?

 This is a basic function of commercial search engines, especially in
 Chinese processing. I wonder how to implement it in Solr and where the
 starting point is.

 Floyd




Does anybody have experience in Chinese soundex (sounds like) of SOLR?

2011-10-20 Thread Floyd Wu
Hi there,

There are many English soundex implementations that can be referenced,
but I wonder how to do a Chinese soundex (sounds-like) filter.

Any ideas?

Floyd


Re: Does anybody have experience in Chinese soundex (sounds like) of SOLR?

2011-10-20 Thread Floyd Wu
Hi Ken,

Indeed, I want to support something like phonetic (pinyin or zhuyin)
search, not soundex (sorry, and thanks for correcting me).

Any further ideas?

Floyd


2011/10/20 Ken Krugler kkrugler_li...@transpac.com:
 Wow, interesting question.  Can soundex even be applied to a language like 
 Chinese, which is tonal and doesn't have individual letters, but whole 
 characters?  I'm no expert, but intuitively speaking it sounds hard or maybe 
 even impossible...

 The only two cases I can think of are:

  - Cases where you have two (or more) characters that are variant forms. 
 Unicode tried to unify all of these, but some still exist. And in GB 18030 
 there are tons.

  - If you wanted to support phonetic (pinyin or zhuyin) search, then you 
 might want to collapse syllables that are commonly confused. But then of 
 course you'd have to be storing the phonetic forms for all of the words.

 -- Ken


 From: Floyd Wu floyd...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, October 20, 2011 5:43 AM
 Subject: Does anybody has experience in Chinese soundex(sounds like) of 
 SOLR?

 Hi there,

 There are many English soundex implementations that can be referenced,
 but I wonder how to do a Chinese soundex (sounds-like) filter.

 Any ideas?

 Floyd




 --
 Ken Krugler
 +1 530-210-6378
 http://bixolabs.com
 custom big data solutions  training
 Hadoop, Cascading, Mahout  Solr






Want to support did you mean xxx but in Chinese

2011-10-20 Thread Floyd Wu
Does anybody know how to implement this idea in Solr? Please kindly
point me in a direction.

For example, a user enters the keyword 貝多芬 (this is
Beethoven in Chinese)
but keys in a wrong combination of characters, 背多分 (this has
the same pronunciation as the previous keyword 貝多芬).

The token 貝多芬 actually exists in the Solr index. How do I hit documents
where 貝多芬 exists when 背多分 is entered?

This is a basic function of commercial search engines, especially in
Chinese processing. I wonder how to implement it in Solr and where the
starting point is.

Floyd


Re: How to make a valid date facet query?

2011-07-26 Thread Floyd Wu
Hi Tomás,

Do facet queries support queries like the following?

facet.query=onlinedate:[NOW/YEAR-3YEARS TO NOW/YEAR+5YEARS]

I tried this, but the returned result was not correct.

Am I missing something?

Floyd

2011/7/26 Tomás Fernández Löbbe tomasflo...@gmail.com

 Hi Floyd, I don't think the feature that allows using multiple gaps for a
 range facet has been committed. See
 https://issues.apache.org/jira/browse/SOLR-2366
 You can achieve a similar functionality by using facet.query. see:

 http://wiki.apache.org/solr/SimpleFacetParameters#Facet_Fields_and_Facet_Queries
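
 For your four buckets, that would be something like this (a sketch;
 adjust the date rounding to your needs):

 facet=true
 &facet.query=onlinedate:[NOW/DAY-1MONTH TO NOW]
 &facet.query=onlinedate:[NOW/DAY-3MONTHS TO NOW]
 &facet.query=onlinedate:[NOW/DAY-6MONTHS TO NOW]
 &facet.query=onlinedate:[* TO NOW/DAY-1YEAR]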

 Regards,

 Tomás
 On Tue, Jul 26, 2011 at 1:23 AM, Floyd Wu floyd...@gmail.com wrote:

   Hi all,

   I need to make a date-faceted query; I tried to use facet.range but
   can't get the result I need.

   I want to make 4 facets like the following:

   1 month, 3 months, 6 months, more than 1 year

   The onlinedate field in schema.xml looks like this:

   <field name="onlinedate" type="tdate" indexed="true" stored="true"/>

   I hit Solr with this URL:

   http://localhost:8983/solr/select/?q=*%3A*
   &start=0
   &rows=10
   &indent=on
   &facet=true
   &facet.range=onlinedate
   &f.onlinedate.facet.range.start=NOW-1YEARS
   &f.onlinedate.facet.range.end=NOW%2B1YEARS
   &f.onlinedate.facet.range.gap=NOW-1MONTHS, NOW-3MONTHS,
   NOW-6MONTHS,NOW-1YEAR

   But Solr complained: Exception during facet.range of onlinedate:
   org.apache.solr.common.SolrException: Can't add gap NOW-1MONTHS,
   NOW-3MONTHS, NOW-6MONTHS,NOW-1YEAR to value Mon Jul 26 11:56:40 CST 2010
   for ...

   What is the correct way to realize this requirement? Please help with
   this.
   Floyd
 



How to make a valid date facet query?

2011-07-25 Thread Floyd Wu
Hi all,

I need to make a date-faceted query; I tried to use facet.range but can't
get the result I need.

I want to make 4 facets like the following:

1 month, 3 months, 6 months, more than 1 year

The onlinedate field in schema.xml looks like this:

<field name="onlinedate" type="tdate" indexed="true" stored="true"/>

I hit Solr with this URL:

http://localhost:8983/solr/select/?q=*%3A*
&start=0
&rows=10
&indent=on
&facet=true
&facet.range=onlinedate
&f.onlinedate.facet.range.start=NOW-1YEARS
&f.onlinedate.facet.range.end=NOW%2B1YEARS
&f.onlinedate.facet.range.gap=NOW-1MONTHS, NOW-3MONTHS,
NOW-6MONTHS,NOW-1YEAR

But Solr complained: Exception during facet.range of onlinedate:
org.apache.solr.common.SolrException: Can't add gap NOW-1MONTHS,
NOW-3MONTHS, NOW-6MONTHS,NOW-1YEAR to value Mon Jul 26 11:56:40 CST 2010
for ...

What is the correct way to realize this requirement? Please help with
this.
Floyd


Re: Fuzzy Query Param

2011-06-30 Thread Floyd Wu
If this is an edit-distance implementation, what is the result when
applied to a CJK query? For example, 您好~3

Floyd


2011/6/30 entdeveloper cameron.develo...@gmail.com

 I'm using Solr trunk.

 If it's levenstein/edit distance, that's great, that's what I want. It just
 didn't seem to be officially documented anywhere so I wanted to find out
 for
 sure. Thanks for confirming.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3122418.html
 Sent from the Solr - User mailing list archive at Nabble.com.



how to get lots of fields this way?

2011-04-13 Thread Floyd Wu
Hi,

As I understand it, fl=*,score means we get all fields plus the score in
the returned search result, and if a field is stored, all of its text is
returned as part of the result.

Now I have 2x fields; some of the field names have no prefix or fixed
naming rule, so it cannot be predicted what the names will be.
I want to get all of them except one.

How do I list these field names in fl=...?

For example, I have two documents; doc A's fields are (a, b, df, gh, t, p)
and doc B's fields are (a, b, xc, zw, t, p).

What I want is all of them except p.

Please help on this.
Many thanks


Floyd


Re: how to get lots of fields this way?

2011-04-13 Thread Floyd Wu
Can Solr list fields in fl=... this way: fl=!fieldName,score?

Floyd


2011/4/14 Otis Gospodnetic otis_gospodne...@yahoo.com

 Floyd,

 You need to explicitly list all fields in fl=...
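
 For the two example documents, that means spelling out every field
 except p:

 fl=a,b,df,gh,t,xc,zw,score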

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
  From: Floyd Wu floyd...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Wed, April 13, 2011 2:34:49 PM
  Subject: how to get lots fields this way?
 
   Hi,

   As I understand it, fl=*,score means we get all fields plus the score
   in the returned search result, and if a field is stored, all of its
   text is returned as part of the result.

   Now I have 2x fields; some of the field names have no prefix or fixed
   naming rule, so it cannot be predicted what the names will be.
   I want to get all of them except one.

   How do I list these field names in fl=...?

   For example, I have two documents; doc A's fields are (a, b, df, gh, t, p)
   and doc B's fields are (a, b, xc, zw, t, p).

   What I want is all of them except p.

   Please help on this.
   Many thanks


   Floyd
 



Re: Solr with example Jetty and score problem

2010-10-20 Thread Floyd Wu
Hi Hoss,

I tried this work-around, but it seems not to work for me.
I still get an array of score values in the response.

I have two physical servers, A and B:

localhost -- A
test -- B

I issue a query to A like this:

http://localhost:8983/solr/core0/select?shards=test:8983/solr,localhost:8983/solr/core0&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard

But when I change the query to

http://localhost:8983/solr/core0/select?shards=test:8983/solr&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard

the score will be normal (that's just like issuing the query to test:8983
directly).

any idea?



2010/10/16 Chris Hostetter hossman_luc...@fucit.org


 : Thanks. But do you have any suggestion or work-around to deal with it?

 Posted in SOLR-2140:

   <field name="score" type="ignored" multiValued="false" />

 ...the key is to make sure Solr knows score is not multiValued


 -Hoss



Re: Solr with example Jetty and score problem

2010-10-20 Thread Floyd Wu
OK, I did a little test after my previous email. The work-around that Hoss
provided does not work when you issue the query *:*.

I tried issuing a query like key:aaa, and the work-around works no matter
whether there is one shard node, two, or more.

Thanks, Hoss. Maybe you could try it too and help me confirm that this
situation is not a coincidence.




2010/10/20 Floyd Wu floyd...@gmail.com

 Hi Hoss,

 I tried this work-around, but it seems not to work for me.
 I still get an array of score values in the response.

 I have two physical servers, A and B:

 localhost -- A
 test -- B

 I issue a query to A like this:

 http://localhost:8983/solr/core0/select?shards=test:8983/solr,localhost:8983/solr/core0&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard

 But when I change the query to

 http://localhost:8983/solr/core0/select?shards=test:8983/solr&indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard

 the score will be normal (that's just like issuing the query to
 test:8983 directly).

 any idea?



 2010/10/16 Chris Hostetter hossman_luc...@fucit.org


 : Thanks. But do you have any suggestion or work-around to deal with it?

 Posted in SOLR-2140:

   <field name="score" type="ignored" multiValued="false" />

 ...the key is to make sure Solr knows score is not multiValued


 -Hoss





Tuning Solr

2010-10-05 Thread Floyd Wu
Hi there,

If I don't need MoreLikeThis, spellcheck, or highlighting,
can I remove those configuration sections from solrconfig.xml?
In other words, does Solr load and use these SearchComponents on startup
and during runtime?

Will removing this configuration speed up queries or not?

Thanks


Re: Solr with example Jetty and score problem

2010-10-04 Thread Floyd Wu
Hi Chris,

Thanks. But do you have any suggestion or work-around to deal with it?

Floyd



2010/10/2 Chris Hostetter hossman_luc...@fucit.org


 : But when I issue the query with shards (two instances), the response
 : XML will be like the following.
 : as you can see, the score has been transferred into an arr element of
 : doc ...
 : <arr name="score">
 :   <float name="score">1.9808292</float>
 : </arr>

 The root cause of these seems to be your catchall dynamic field
 declaration...

 :    <dynamicField name="*" type="text" indexed="true" stored="true"
 :      multiValued="true" termVectors="true"
 :      termPositions="true"
 :      termOffsets="true" omitNorms="false"/>

 ...that line (specifically the fact that it's multiValued="true") seems to
 be confusing the results aggregation code.  My guess is that it's
 looping over all the fields, and looking them up in the schema to see if
 they are single/multi valued, but not recognizing that score is
 special.

 https://issues.apache.org/jira/browse/SOLR-2140


 -Hoss

 --
 http://lucenerevolution.org/  ...  October 7-8, Boston
 http://bit.ly/stump-hoss  ...  Stump The Chump!




Difference between Lucid dist. and Apache dist.?

2010-10-04 Thread Floyd Wu
Hi there,

What is the difference between the Lucid distribution of Solr and the
Apache distribution?

And can I use the Lucid distribution for free in my commercial project?


Re: Solr with example Jetty and score problem

2010-09-29 Thread Floyd Wu
Can anybody help with this?
Many thanks



Solr with example Jetty and score problem

2010-09-28 Thread Floyd Wu
Hi there,

I have a problem. When I issue a query to a single instance, the Solr
response XML is like the following; as you can see, the score is a normal
<float name="score"> element:
===
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">23</int>
    <lst name="params">
      <str name="fl">_l_title,score</str>
      <str name="start">0</str>
      <str name="q">_l_unique_key:12</str>
      <str name="hl.fl">*</str>
      <str name="hl">true</str>
      <str name="rows">999</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="1.9808292">
    <doc>
      <float name="score">1.9808292</float>
      <str name="_l_title">GTest</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="12">
      <arr name="_l_unique_key">
        <str><em>12</em></str>
      </arr>
    </lst>
  </lst>
</response>
===

But when I issue the query with shards (two instances), the response XML
is like the following; as you can see, the score has been transferred into
an <arr> element of <doc>:
===
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">64</int>
    <lst name="params">
      <str name="shards">localhost:8983/solr/core0,172.16.6.35:8983/solr</str>
      <str name="fl">_l_title,score</str>
      <str name="start">0</str>
      <str name="q">_l_unique_key:12</str>
      <str name="hl.fl">*</str>
      <str name="hl">true</str>
      <str name="rows">999</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="1.9808292">
    <doc>
      <str name="_l_title">Gtest</str>
      <arr name="score">
        <float name="score">1.9808292</float>
      </arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="12">
      <arr name="_l_unique_key">
        <str><em>12</em></str>
      </arr>
    </lst>
  </lst>
</response>
===
My schema.xml is like the following:

<fields>
  <field name="_l_unique_key" type="string" indexed="true" stored="true"
    required="true" omitNorms="true"/>
  <field name="_l_read_permission" type="string" indexed="true"
    stored="true" omitNorms="true" multiValued="true"/>
  <field name="_l_title" type="text" indexed="true" stored="true"
    omitNorms="false" termVectors="true" termPositions="true"
    termOffsets="true"/>
  <field name="_l_summary" type="text" indexed="true" stored="true"
    omitNorms="false" termVectors="true" termPositions="true"
    termOffsets="true"/>
  <field name="_l_body" type="text" indexed="true" stored="true"
    multiValued="true" termVectors="true" termPositions="true"
    termOffsets="true" omitNorms="false"/>

  <dynamicField name="*" type="text" indexed="true" stored="true"
    multiValued="true" termVectors="true" termPositions="true"
    termOffsets="true" omitNorms="false"/>
</fields>
<uniqueKey>_l_unique_key</uniqueKey>
<defaultSearchField>_l_body</defaultSearchField>

I don't really know what happened. Is it a problem with my schema, or is
it Solr's behavior? Please help with this.