Re: Solr edismax clarification

2012-02-17 Thread Jan Høydahl
Please provide your full query, including your qf parameter and all other 
request parameters, and also the relevant fields/field-types from schema. Do 
you use stopwords? Can you also add debugQuery=true and paste in the 
parsedQuery?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 16. feb. 2012, at 18:07, Indika Tantrigoda wrote:

 Hi All,
 
 I am using the edismax SearchHandler in my search and I have some issues with
 the search results. As I understand it, if the defaultOperator is set to OR, the
 search query will implicitly be passed as The OR quick OR brown OR fox.
 However, if I search for The quick brown fox, I get fewer results than when
 explicitly adding the OR. Another issue is that if I search for The quick
 brown fox, other documents that contain the word fox are not in the search
 results.
 
 Thanks.



Re: custom scoring

2012-02-17 Thread Carlos Gonzalez-Cadenas
Thanks Em, Robert, Chris for your time and valuable advice. We'll make some
tests and will let you know soon.



On Thu, Feb 16, 2012 at 11:43 PM, Em mailformailingli...@yahoo.de wrote:

 Hello Carlos,

 I think we misunderstood each other.

 As an example:
 BooleanQuery (
  clauses: (
 MustMatch(
   DisjunctionMaxQuery(
   TermQuery(stopword_field, barcelona),
   TermQuery(stopword_field, hoteles)
   )
 ),
 ShouldMatch(
  FunctionQuery(
*please insert your function here*
 )
 )
  )
 )

 Explanation:
 You construct an artificial BooleanQuery which wraps your user's query
 as well as your function query.
 Your user's query - in that case - is just a DisjunctionMaxQuery
 consisting of two TermQueries.
 In the real world you might construct another BooleanQuery around your
 DisjunctionMaxQuery in order to have more flexibility.
 However, the interesting part of the given example is that we specify
 the user's query as a MustMatch condition of the BooleanQuery and the
 FunctionQuery just as a ShouldMatch.
 Constructed that way, I expect the FunctionQuery to score only those
 documents which fit the MustMatch condition.
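
For concreteness, a minimal Lucene/Solr 3.x sketch of that query shape (the
class names assume the 3.x org.apache.solr.search.function package, and the
float field driving the FunctionQuery is an illustrative assumption, not the
poster's actual function):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.search.function.FloatFieldSource;
import org.apache.solr.search.function.FunctionQuery;

public class WrappedFunctionQuery {
  public static Query build() {
    // The user's query: a DisjunctionMaxQuery over two TermQueries (tie-break 0.0f).
    DisjunctionMaxQuery userQuery = new DisjunctionMaxQuery(0.0f);
    userQuery.add(new TermQuery(new Term("stopword_field", "barcelona")));
    userQuery.add(new TermQuery(new Term("stopword_field", "hoteles")));

    // Stand-in FunctionQuery over a hypothetical float field; plug in your own ValueSource.
    FunctionQuery scoreQuery = new FunctionQuery(new FloatFieldSource("myFloatField"));

    // Wrap both: the user's query MUST match; the FunctionQuery is only a SHOULD
    // clause, so it contributes score only for documents selected by the MUST clause.
    BooleanQuery wrapped = new BooleanQuery();
    wrapped.add(userQuery, Occur.MUST);
    wrapped.add(scoreQuery, Occur.SHOULD);
    return wrapped;
  }
}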

 I conclude that from the fact that the FunctionQuery-class also has a
 skipTo-method and I would expect that the scorer will use it to score
 only matching documents (however I did not search where and how it might
 get called).

 If my conclusion is wrong, then hopefully Robert Muir (as far as I can
 see, the author of that class) can tell us what the intention was behind
 constructing an every-time-match-all FunctionQuery.

 Can you validate whether your QueryParser constructs a query in the form
 I drew above?

 Regards,
 Em

 On 16.02.2012 20:29, Carlos Gonzalez-Cadenas wrote:
  Hello Em:
 
  1) Here's a printout of an example DisMax query (as you can see, mostly
  MUST terms except for some SHOULD terms used for boosting scores for
  stopwords):

  ((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona
  stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
  +stopword_phrase:barcelona stopword_phrase:en) |
  (+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona
  stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
  +stopword_phrase:barcelona stopword_phrase:en) |
  (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona
  stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
  +wildcard_stopword_phrase:barcelona stopword_phrase:en) |
  (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona
  stopword_shortened_phrase:en) | (+stopword_phrase:hoteles
  +wildcard_stopword_phrase:barcelona stopword_phrase:en))
  2)* *The collector is inserted in the SolrIndexSearcher (replacing the
  TimeLimitingCollector). We trigger it through the SOLR interface by
 passing
  the timeAllowed parameter. We know this is a hack but AFAIK there's no
  out-of-the-box way to specify custom collectors by now (
  https://issues.apache.org/jira/browse/SOLR-1680). In any case the
 collector
  part works perfectly as of now, so clearly this is not the problem.
 
  3) Re: your sentence:

  "I would expect that with a shrinking set of matching documents to
  the overall-query, the function query only checks those documents that are
  guaranteed to be within the result set."

  Yes, I agree with this, but this snippet of code in FunctionQuery.java
  seems to say otherwise:
 
  // instead of matching all docs, we could also embed a query.
  // the score could either ignore the subscore, or boost it.
  // Containment:  floatline(foo:myTerm, myFloatField, 1.0, 0.0f)
  // Boost:        foo:myTerm^floatline(myFloatField,1.0,0.0f)
  @Override
  public int nextDoc() throws IOException {
    for (;;) {
      ++doc;
      if (doc >= maxDoc) {
        return doc = NO_MORE_DOCS;
      }
      if (acceptDocs != null && !acceptDocs.get(doc)) continue;
      return doc;
    }
  }
 
  It seems that the author also thought of maybe embedding a query in order
  to restrict matches, but this doesn't seem to be in place as of now (or
  maybe I'm not understanding how the whole thing works :) ).
 
  Thanks
  Carlos
 
  Carlos Gonzalez-Cadenas
  CEO, ExperienceOn - New generation search
  http://www.experienceon.com
 
  Mobile: +34 652 911 201
  Skype: carlosgonzalezcadenas
  LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
 
 
  On Thu, Feb 16, 2012 at 8:09 PM, Em mailformailingli...@yahoo.de
 wrote:
 
  Hello Carlos,
 
  We have some more tests on that matter: now we're moving from issuing
  this
  large query through the SOLR interface to creating our own
  QueryParser. The
  initial tests we've done in our QParser (that internally creates
 multiple
  queries and inserts them inside a 

Re: Solr edismax clarification

2012-02-17 Thread O. Klein

Indika Tantrigoda wrote
 
 Hi All,
 
 I am using edismax SearchHandler in my search and I have some issues in
 the
 search results. As I understand if the defaultOperator is set to OR the
 search query will be passed as  - The OR quick OR brown OR fox
 implicitly.
 
 

Did you also remove mm? If not, defaultOperator is ignored and it
follows the mm settings:
http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
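
To illustrate, a hedged SolrJ sketch of the point above (the Solr URL and qf
fields are assumptions): with edismax, the effective OR/AND behavior follows
mm, not the schema defaultOperator.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxMmExample {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("The quick brown fox");
    q.set("defType", "edismax");
    q.set("qf", "title text");  // assumed query fields
    q.set("mm", "1");           // at least one clause must match: OR-like behavior
                                // ("100%" would make it effectively AND)
    QueryResponse rsp = solr.query(q);
    System.out.println("numFound=" + rsp.getResults().getNumFound());
  }
}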

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-edismax-clarification-tp3751013p3753260.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to connect embedded solr with each other by sharding

2012-02-17 Thread mustafozbek
I have been using sharding with multiple basic Solr servers for clustering. I
have also used one embedded Solr server (SolrJ Java API) with many basic Solr
servers, connecting them via sharding, with the embedded Solr server acting as
the caller. I used the code below for this purpose:
SolrQuery query = new SolrQuery();
query.set("shards", "solr1URL,solr2URL,...");
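
(For reference, a hedged SolrJ sketch of the setup described above, with an
embedded server acting as the caller; the solr home path and shard URLs are
illustrative assumptions.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

public class ShardedEmbeddedQuery {
  public static void main(String[] args) throws Exception {
    System.setProperty("solr.solr.home", "/path/to/solr/home");  // assumption
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");

    SolrQuery query = new SolrQuery("*:*");
    // Comma-separated shard URLs, given as host:port/solr[/core] without http://
    query.set("shards", "solr1:8983/solr,solr2:8983/solr");
    QueryResponse rsp = server.query(query);
    System.out.println("numFound=" + rsp.getResults().getNumFound());
    coreContainer.shutdown();
  }
}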

Now, I have many embedded Solr servers running on different computers and
they are unaware of each other. I want them to communicate with each other
via sharding. Is this possible? If yes, how? If not, what other options can
you advise when using embedded Solr servers?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-connect-embedded-solr-with-each-other-by-sharding-tp3753337p3753337.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error Indexing in solr 3.5

2012-02-17 Thread mechravi25
Hi Chantal,

I checked my client. It was pointing to the old solrj. After changing that,
it got indexed properly.

Thanks a lot.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-Indexing-in-solr-3-5-tp3746735p3753359.html
Sent from the Solr - User mailing list archive at Nabble.com.


Removing empty dynamic fields from a Solr 1.4 index

2012-02-17 Thread Andrew Ingram
Hi all

(Note: this question is cross-posted on stackoverflow: 
http://stackoverflow.com/questions/9327542/removing-empty-dynamic-fields-from-a-solr-1-4-index)

I have a Solr index that uses quite a few dynamic fields. I've recently changed 
my code to reduce the amount of data we index with Solr, significantly reducing 
the number of dynamic fields that are in use.

I've reindexed my data, and the doc count (as displayed in the admin schema 
browser) for the old fields has dropped to zero. But I'm confused as to why the 
fields still exist. I've done an optimize, and restarted the server, but I 
can't find any information on whether there's a way to get these fields to 
disappear.

Am I now stuck with these fields unless I recreate the index from scratch?
We're talking about a significant reduction in fields (from about 200 down to
30), and I'm worried about the performance impact of keeping them floating around.

Thanks, 
Andrew Ingram


How to handle to run testcases in ruby code for solr

2012-02-17 Thread solr
Hi all,
I am writing a Rails application using the solr_ruby gem to access Solr.
Can anybody suggest how to handle test cases for the Solr code and connections
in functional testing?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-handle-to-run-testcases-in-ruby-code-for-solr-tp3753479p3753479.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Realtime search with multi clients updating index simultaneously.

2012-02-17 Thread Erick Erickson
See below

On Thu, Feb 16, 2012 at 6:18 AM, v_shan varun.c...@gmail.com wrote:
 I have a helpdesk application developed in PHP/MySQL. I want to implement
 real-time full-text search, and I have shortlisted Solr. The MySQL database will
 store all the tickets and their updates, and that data will be imported for
 building the Solr index. All search requests will be handled by Solr.

 What I want is a real time search. The moment someone updates a ticket, it
 should be available for search.

 As per my understanding of Solr, this is how I think the system will work:
 a user updates a ticket -> the database record is modified -> a request is sent
 to the Solr server to modify the corresponding document in the index.

The first thing to understand: Solr does not update a document, it deletes
the old one and adds a new one based on uniqueKey.
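
For example, a hedged SolrJ sketch of what an "update" amounts to: re-sending
the complete document with the same uniqueKey (the URL and field names are
assumptions):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class TicketUpdateExample {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "ticket-42");           // same uniqueKey as the existing doc
    doc.addField("subject", "Printer broken");
    doc.addField("status", "resolved");        // every field must be sent again

    solr.add(doc);    // effectively: delete the old doc with id=ticket-42, add this one
    solr.commit();    // changes become searchable only after a commit
  }
}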


 I have read a book on Solr and below questions are troubling me.
 1. The book mentions that commits are slow in Solr. Depending on the index
 size, Solr's auto-warming
 configuration, and Solr's cache state prior to committing, a commit can take
 a non-trivial amount of time. Typically, it takes a few seconds, but it can
 take
 some number of minutes in extreme cases. If this is true, then how will I
 know when the data will be available for search, and how can I implement
 realtime search? Also, I don't want the ticket update operation to be slowed
 down (by adding the extra step of updating the Solr index).

Well, Solr trunk is in the midst of getting NRT searching (Near Real Time),
so that may be of interest. Otherwise, there is some latency defined by
time until commit + replication time + autowarming time. You haven't
indicated how big your data set is, so what those numbers really are
is hard to even guess. Even if you do know how many records will
be there, the answer is still try it and see. Replication time may not
be necessary if you have a small enough system, it is possible to
index and search on the same machine. On larger installations,
a latency of a few minutes is common.



 2. It is also mentioned that there is no transaction isolation. This means
 that if more than one Solr client
 were to submit modifications and commit them at overlapping times, it is
 possible for part of one client's set of changes to be committed before that
 client told Solr to commit. This applies to rollback as well. If this is a
 problem
 for your architecture then consider using one client process responsible for
 updating Solr.

 Does it mean that due to lack of transactional commits, Solr can mess up the
 updates when multiple people update the ticket simultaneously?

As above, Solr deletes and replaces complete documents. So in this case
your update process would simply honor the last-received.

But I think you're missing a bit here. Users won't update your Solr index.
Somewhere, you'll have a process that queries your MySql database
and updates any changed records. The MySql database is your
system-of-record and where your transactional integrity is maintained.
The process that queries the database and sends the results to Solr will
just see the results of the aggregate changes to the underlying database
as single records, so I don't think this is an issue.
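
A hedged sketch of that kind of indexing process (table, column and field
names, the JDBC URL and the Solr URL are all illustrative assumptions; the
MySQL JDBC driver is expected on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class TicketIndexer {
  public static void indexChangedTickets(Timestamp since) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    Connection db = DriverManager.getConnection(
        "jdbc:mysql://localhost/helpdesk", "user", "password");
    PreparedStatement stmt = db.prepareStatement(
        "SELECT id, subject, body, status FROM tickets WHERE updated_at > ?");
    stmt.setTimestamp(1, since);
    ResultSet rs = stmt.executeQuery();
    while (rs.next()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", rs.getString("id"));
      doc.addField("subject", rs.getString("subject"));
      doc.addField("body", rs.getString("body"));
      doc.addField("status", rs.getString("status"));
      solr.add(doc);  // replaces any older document with the same id
    }
    solr.commit();    // make the whole batch visible to searchers
    rs.close();
    stmt.close();
    db.close();
  }
}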

Best
Erick

 Now the question before me is: Is Solr fit in my case? If yes, How?

Can't answer this for you.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Realtime-search-with-multi-clients-updating-index-simultaneously-tp3749881p3749881.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Payload and exact search - 2

2012-02-17 Thread Erick Erickson
OK, payloads are a bit of a mystery to me, so this may be way off
base.

But...

The ordering of your analysis chain is suspicious, the admin/analysis
page is a life-saver.

WordDelimiterFilterFactory is breaking up your input before it gets to
the payload filter, I think, so your payload information is completely
disassociated from your terms and treated as individual terms
all by themselves. At that point, what you get
in your index *probably* has no payloads attached at all!

Use the admin/schema browser link to actually look at the data (or
just go straight to Luke) and I believe you'll see that your position
information is being treated just like any other token in the input stream.

There should be nothing about payloads that prevents a normal
text query on the text part, though.
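
To check this outside Solr, here is a hedged Lucene 3.x sketch that runs a
whitespace tokenizer plus the delimited-payload filter directly and prints
whether each term still carries a payload (the sample text mirrors Leonardo's
example below; the Version constant is an assumption):

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.IdentityEncoder;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.Version;

public class PayloadInspector {
  public static void main(String[] args) throws Exception {
    String text = "Comune|326.29,948.16,349.07,954.25 di|350.74,948.16,355.62,954.25";
    // Payload filter applied directly after the tokenizer: the text after '|'
    // stays attached to each word token as its payload.
    TokenStream ts = new DelimitedPayloadTokenFilter(
        new WhitespaceTokenizer(Version.LUCENE_35, new StringReader(text)),
        '|', new IdentityEncoder());
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    PayloadAttribute payload = ts.addAttribute(PayloadAttribute.class);
    while (ts.incrementToken()) {
      System.out.println(term.toString() + " -> payload bytes: "
          + (payload.getPayload() == null ? 0 : payload.getPayload().length()));
    }
    ts.close();
  }
}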

Best
Erick

On Thu, Feb 16, 2012 at 9:18 AM, leonardo2 leonardo.rigut...@gmail.com wrote:
 Hello,
 I already posted this question but for some reason it was attached to a
 thread with different topic.


 Is it possible to perform an 'exact search' on a payload field?

 I have to index text with auxiliary info for each word. In particular, each
 word is associated with the bounding box containing it in the original PDF
 page (it is used for highlighting the search terms in the PDF). I used the
 payload to store that information.

 In the schema.xml, the fieldType definition is:

 ---
 <fieldtype name="wppayloads" stored="false" indexed="true"
            class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="identity"/>
   </analyzer>
 </fieldtype>
 ---

 while the field definition is:

 ---
 <field name="words" type="wppayloads" indexed="true" stored="true"
        required="true" multiValued="true"/>
 ---

 When indexing, the field 'words' contains a list of word|box as in the
 following example:

 ---
 doc_id=example
 words={Fonte:|307.62,948.16,324.62,954.25 Comune|326.29,948.16,349.07,954.25
 di|350.74,948.16,355.62,954.25 Bologna|358.95,948.16,381.28,954.25}
 ---

 Such a solution works well except in the case of an exact search. For example,
 assuming the only indexed doc is the 'example' doc shown above, the query
 words:"Comune di Bologna" returns no results.

 Does anyone know if it is possible to perform an 'exact search' on a
 payload field?

 Thanks in advance,
 Leonardo

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Payload-and-exact-search-2-tp3750355p3750355.html
 Sent from the Solr - User mailing list archive at Nabble.com.


customizing standard tokenizer

2012-02-17 Thread Torsten Krah
Hi,

is it possible to extend the standard tokenizer, or use a custom one
(possibly by extending the standard one), to add some custom tokens,
e.g. so that "Lucene-Core" is kept as one token?

regards




Re: problem to indexing pdf directory

2012-02-17 Thread alessio crisantemi
Thanks Gora for your help.
I installed Maven and downloaded Tika following the guide, but I got an
error during the build of Tika about the 'tika compiler', and the Maven
installation of Tika stopped.

Is there another way?
Thank you,
a.

2012/2/16 Gora Mohanty g...@mimirtech.com

 On 16 February 2012 21:37, alessio crisantemi
 alessio.crisant...@gmail.com wrote:
  here the log:
 
 
  org.apache.solr.handler.dataimport.DataImporter doFullImport
  Grave: Full Import failed
  org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir'
 is
  a required attribute Processing Document # 1
 [...]

 The exception message above is pretty clear. You need to define a
 baseDir attribute for the second entity.

 However, even if you fix this, the setup will *not* work for indexing
 PDFs. Did you read the URLs that I sent earlier?

 Regards,
 Gora



Re: problem to indexing pdf directory

2012-02-17 Thread Erick Erickson
You should not have to do anything with Maven; the instructions
you followed were from the 1.4.1 days.

Assuming you're working with a 3.x build, here's a data-config
that worked for me, just a straight distro. But note a couple of things:
1> For simplicity, I changed the schema.xml to NOT require
   the id field. You'll probably have to change this back and
   select a good uniqueKey.
2> I had to add this line to solrconfig.xml to find the path:
   <lib dir="../../dist/" regex="apache-solr-dataimporthandler-extras-\d.*\.jar"/>
3> If this all works without errors in the Solr log and you still
   can't find anything, be sure you issue a commit.

Best
Erick

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity baseDir="/Users/Erick/testdocs" fileName=".*pdf" name="sd"
            processor="FileListEntityProcessor" recursive="true"
            rootEntity="false">
      <entity dataSource="bin" format="text" name="tika-test"
              processor="TikaEntityProcessor" url="${sd.fileAbsolutePath}">
        <field column="Author" meta="true" name="author"/>
        <field column="Content-Type" meta="true" name="title"/>
        <!-- <field column="title" name="title" meta="true"/> -->
        <field column="text" name="text"/>
      </entity>
      <!-- <field column="fileLastModified" name="date"
           dateTimeFormat="-MM-dd'T'hh:mm:ss"/> -->
      <field column="fileSize" meta="true" name="size"/>
    </entity>
  </document>
</dataConfig>
On Fri, Feb 17, 2012 at 9:35 AM, alessio crisantemi
alessio.crisant...@gmail.com wrote:
 thanks gora for your help.
 I installed Maven and downloaded Tika following the guide: But I have an
 errore during the built of Tika about 'tika compiler', and the maven
 installation of Tika is stopped.

 there is another way?
 thank you
 a.

 2012/2/16 Gora Mohanty g...@mimirtech.com

 On 16 February 2012 21:37, alessio crisantemi
 alessio.crisant...@gmail.com wrote:
  here the log:
 
 
  org.apache.solr.handler.dataimport.DataImporter doFullImport
  Grave: Full Import failed
  org.apache.solr.handler.dataimport.DataImportHandlerException: 'baseDir'
 is
  a required attribute Processing Document # 1
 [...]

 The exception message above is pretty clear. You need to define a
 baseDir attribute for the second entity.

 However, even if you fix this, the setup will *not* work for indexing
 PDFs. Did you read the URLs that I sent earlier?

 Regards,
 Gora



Re: distributed deletes working?

2012-02-17 Thread Jamie Johnson
Thanks Mark.  I'm still seeing some issues while indexing though.  I
have the same setup described in my previous email.  I do some indexing
to the cluster with everything up and everything looks good.  I then
take down one instance which is running 2 cores (shard2 slice 1 and
shard 1 slice 2) and do some more inserts.  I then bring this second
instance back up expecting that the system will recover the missing
documents from the other instance but this isn't happening.  I see the
following log message

Feb 17, 2012 9:53:11 AM org.apache.solr.cloud.RecoveryStrategy run
INFO: Sync Recovery was succesful - registering as Active

which leads me to believe things should be in sync, but they are not.
I've made no changes to the default solrconfig.xml, not sure if I need
to or not but it looks like everything should work now.  Am I missing
a configuration somewhere?

Initial state

{"collection1":{
  "slice1":{
    "JamiesMac.local:8501_solr_slice1_shard1":{
      "shard_id":"slice1",
      "leader":"true",
      "state":"active",
      "core":"slice1_shard1",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8501_solr",
      "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice1_shard2":{
      "shard_id":"slice1",
      "state":"active",
      "core":"slice1_shard2",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8502_solr",
      "base_url":"http://JamiesMac.local:8502/solr"}},
  "slice2":{
    "JamiesMac.local:8501_solr_slice2_shard2":{
      "shard_id":"slice2",
      "leader":"true",
      "state":"active",
      "core":"slice2_shard2",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8501_solr",
      "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice2_shard1":{
      "shard_id":"slice2",
      "state":"active",
      "core":"slice2_shard1",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8502_solr",
      "base_url":"http://JamiesMac.local:8502/solr"}}}}


state with 1 solr instance down

{"collection1":{
  "slice1":{
    "JamiesMac.local:8501_solr_slice1_shard1":{
      "shard_id":"slice1",
      "leader":"true",
      "state":"active",
      "core":"slice1_shard1",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8501_solr",
      "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice1_shard2":{
      "shard_id":"slice1",
      "state":"active",
      "core":"slice1_shard2",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8502_solr",
      "base_url":"http://JamiesMac.local:8502/solr"}},
  "slice2":{
    "JamiesMac.local:8501_solr_slice2_shard2":{
      "shard_id":"slice2",
      "leader":"true",
      "state":"active",
      "core":"slice2_shard2",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8501_solr",
      "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice2_shard1":{
      "shard_id":"slice2",
      "state":"active",
      "core":"slice2_shard1",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8502_solr",
      "base_url":"http://JamiesMac.local:8502/solr"}}}}

state when everything comes back up after adding documents

{"collection1":{
  "slice1":{
    "JamiesMac.local:8501_solr_slice1_shard1":{
      "shard_id":"slice1",
      "leader":"true",
      "state":"active",
      "core":"slice1_shard1",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8501_solr",
      "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice1_shard2":{
      "shard_id":"slice1",
      "state":"active",
      "core":"slice1_shard2",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8502_solr",
      "base_url":"http://JamiesMac.local:8502/solr"}},
  "slice2":{
    "JamiesMac.local:8501_solr_slice2_shard2":{
      "shard_id":"slice2",
      "leader":"true",
      "state":"active",
      "core":"slice2_shard2",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8501_solr",
      "base_url":"http://JamiesMac.local:8501/solr"},
    "JamiesMac.local:8502_solr_slice2_shard1":{
      "shard_id":"slice2",
      "state":"active",
      "core":"slice2_shard1",
      "collection":"collection1",
      "node_name":"JamiesMac.local:8502_solr",
      "base_url":"http://JamiesMac.local:8502/solr"}}}}


On Thu, Feb 16, 2012 at 10:24 PM, Mark Miller markrmil...@gmail.com wrote:
 Yup - deletes are fine.


 On Thu, Feb 16, 2012 at 8:56 PM, Jamie Johnson jej2...@gmail.com wrote:

 With solr-2358 being committed to trunk do deletes and updates get
 distributed/routed like adds do? Also when a down shard comes back up are
 the deletes/updates forwarded as well? Reading the jira I believe the
 answer is yes, I just want to verify before bringing the latest into my
 environment.




 --
 - Mark

 http://www.lucidimagination.com


Re: distributed deletes working?

2012-02-17 Thread Jamie Johnson
and having looked at this more closely, shouldn't the down node no longer be
marked as active when I stop that Solr instance?

On Fri, Feb 17, 2012 at 10:04 AM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks Mark.  I'm still seeing some issues while indexing though.  I
 have the same setup describe in my previous email.  I do some indexing
 to the cluster with everything up and everything looks good.  I then
 take down one instance which is running 2 cores (shard2 slice 1 and
 shard 1 slice 2) and do some more inserts.  I then bring this second
 instance back up expecting that the system will recover the missing
 documents from the other instance but this isn't happening.  I see the
 following log message

 Feb 17, 2012 9:53:11 AM org.apache.solr.cloud.RecoveryStrategy run
 INFO: Sync Recovery was succesful - registering as Active

 which leads me to believe things should be in sync, but they are not.
 I've made no changes to the default solrconfig.xml, not sure if I need
 to or not but it looks like everything should work now.  Am I missing
 a configuration somewhere?

 Initial state

 {collection1:{
    slice1:{
      JamiesMac.local:8501_solr_slice1_shard1:{
        shard_id:slice1,
        leader:true,
        state:active,
        core:slice1_shard1,
        collection:collection1,
        node_name:JamiesMac.local:8501_solr,
        base_url:http://JamiesMac.local:8501/solr},
      JamiesMac.local:8502_solr_slice1_shard2:{
        shard_id:slice1,
        state:active,
        core:slice1_shard2,
        collection:collection1,
        node_name:JamiesMac.local:8502_solr,
        base_url:http://JamiesMac.local:8502/solr}},
    slice2:{
      JamiesMac.local:8501_solr_slice2_shard2:{
        shard_id:slice2,
        leader:true,
        state:active,
        core:slice2_shard2,
        collection:collection1,
        node_name:JamiesMac.local:8501_solr,
        base_url:http://JamiesMac.local:8501/solr},
      JamiesMac.local:8502_solr_slice2_shard1:{
        shard_id:slice2,
        state:active,
        core:slice2_shard1,
        collection:collection1,
        node_name:JamiesMac.local:8502_solr,
        base_url:http://JamiesMac.local:8502/solr


 state with 1 solr instance down

 {collection1:{
    slice1:{
      JamiesMac.local:8501_solr_slice1_shard1:{
        shard_id:slice1,
        leader:true,
        state:active,
        core:slice1_shard1,
        collection:collection1,
        node_name:JamiesMac.local:8501_solr,
        base_url:http://JamiesMac.local:8501/solr},
      JamiesMac.local:8502_solr_slice1_shard2:{
        shard_id:slice1,
        state:active,
        core:slice1_shard2,
        collection:collection1,
        node_name:JamiesMac.local:8502_solr,
        base_url:http://JamiesMac.local:8502/solr}},
    slice2:{
      JamiesMac.local:8501_solr_slice2_shard2:{
        shard_id:slice2,
        leader:true,
        state:active,
        core:slice2_shard2,
        collection:collection1,
        node_name:JamiesMac.local:8501_solr,
        base_url:http://JamiesMac.local:8501/solr},
      JamiesMac.local:8502_solr_slice2_shard1:{
        shard_id:slice2,
        state:active,
        core:slice2_shard1,
        collection:collection1,
        node_name:JamiesMac.local:8502_solr,
        base_url:http://JamiesMac.local:8502/solr

 state when everything comes back up after adding documents

 {collection1:{
    slice1:{
      JamiesMac.local:8501_solr_slice1_shard1:{
        shard_id:slice1,
        leader:true,
        state:active,
        core:slice1_shard1,
        collection:collection1,
        node_name:JamiesMac.local:8501_solr,
        base_url:http://JamiesMac.local:8501/solr},
      JamiesMac.local:8502_solr_slice1_shard2:{
        shard_id:slice1,
        state:active,
        core:slice1_shard2,
        collection:collection1,
        node_name:JamiesMac.local:8502_solr,
        base_url:http://JamiesMac.local:8502/solr}},
    slice2:{
      JamiesMac.local:8501_solr_slice2_shard2:{
        shard_id:slice2,
        leader:true,
        state:active,
        core:slice2_shard2,
        collection:collection1,
        node_name:JamiesMac.local:8501_solr,
        base_url:http://JamiesMac.local:8501/solr},
      JamiesMac.local:8502_solr_slice2_shard1:{
        shard_id:slice2,
        state:active,
        core:slice2_shard1,
        collection:collection1,
        node_name:JamiesMac.local:8502_solr,
        base_url:http://JamiesMac.local:8502/solr


 On Thu, Feb 16, 2012 at 10:24 PM, Mark Miller markrmil...@gmail.com wrote:
 Yup - deletes are fine.


 On Thu, Feb 16, 2012 at 8:56 PM, Jamie Johnson jej2...@gmail.com wrote:

 With solr-2358 being committed to trunk do deletes and updates get
 distributed/routed like adds do? Also when a down shard comes back up are
 the deletes/updates forwarded as well? Reading the jira I believe the
 answer is yes, I just want to verify before bringing the latest into my
 environment.




 --
 - Mark

 

Cloud tab hanging?

2012-02-17 Thread Ranjan Bagchi
Hi,

I'm pretty new to solr and especially solr cloud, so hopefully this isn't
too dumb:  I followed the wiki instructions for setting up a small cloud.
 Things seem to work, *except* on the UI [using chrome and safari], the
cloud tab hangs.  It says Zookeeper Data, and then there's a loading
symbol. The old UI allows me to see what's in zookeeper, so I'm pretty
sure it's mostly working.

There's nothing in the logs at all about a connection timing out -- any
help?

Thanks,

Ranjan


Re: distributed deletes working?

2012-02-17 Thread Sami Siren
On Fri, Feb 17, 2012 at 5:10 PM, Jamie Johnson jej2...@gmail.com wrote:
 and having looked at this closer, shouldn't the down node not be
 marked as active when I stop that solr instance?

Currently the shard state is not updated in the cloudstate when a node
goes down. This behavior should probably be changed at some point.

--
 Sami Siren


Re: How to handle to run testcases in ruby code for solr

2012-02-17 Thread Erik Hatcher
Just FYI the solr-ruby (hyphen, not underscore to be precise) is 
deprecated in that the source no longer lives under Apache's svn.  The gem is 
still out there, and it's still a useful library, but the Ruby/Solr world seems 
to use RSolr the most.  Both have their pros/cons, but solr-ruby works just 
fine as you'll see.   The source code for it was relocated to my personal 
github account for posterity: https://github.com/erikhatcher/solr-ruby-flare

All that being said, the solr-ruby library itself has extensive coverage with 
unit and functional tests.  For the functional side, you can see here 
https://github.com/erikhatcher/solr-ruby-flare/blob/master/solr-ruby/test/functional/server_test.rb
   which ends up getting wrapped with a test Solr instance and leveraged in the 
:test Rake task here: 
https://github.com/erikhatcher/solr-ruby-flare/blob/master/solr-ruby/Rakefile

Hope that helps.

Erik




On Feb 17, 2012, at 07:12 , solr wrote:

 Hi  all,
 Am writing rails application by using solr_ruby gem to access solr . 
 Can anybody suggest how to handle testcaeses for solr code and connections
 in functionaltetsing.
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/How-to-handle-to-run-testcases-in-ruby-code-for-solr-tp3753479p3753479.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: distributed deletes working?

2012-02-17 Thread Jamie Johnson
Thanks Sami, as long as it's expected ;)

In regards to the replication not working the way I think it should,
am I missing something or is it simply not working the way I think?

On Fri, Feb 17, 2012 at 11:01 AM, Sami Siren ssi...@gmail.com wrote:
 On Fri, Feb 17, 2012 at 5:10 PM, Jamie Johnson jej2...@gmail.com wrote:
 and having looked at this closer, shouldn't the down node not be
 marked as active when I stop that solr instance?

 Currently the shard state is not updated in the cloudstate when a node
 goes down. This behavior should probably be changed at some point.

 --
  Sami Siren


Re: Frequent garbage collections after a day of operation

2012-02-17 Thread Erick Erickson
A wonderful writeup on various memory collection concerns
http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/



On Fri, Feb 17, 2012 at 12:27 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 One thing that could fit the pattern you describe would be Solr caches
 filling up and getting you too close to your JVM or memory limit

 This [uncommitted] issue would solve that problem by allowing the GC
 to collect caches that become too large, though in practice, the cache
 setting would need to be fairly large for an OOM to occur from them:
 https://issues.apache.org/jira/browse/SOLR-1513

 On Thu, Feb 16, 2012 at 7:14 PM, Bryan Loofbourrow
 bloofbour...@knowledgemosaic.com wrote:
 A couple of thoughts:

 We wound up doing a bunch of tuning on the Java garbage collection.
 However, the pattern we were seeing was periodic very extreme slowdowns,
 because we were then using the default garbage collector, which blocks
 when it has to do a major collection. This doesn't sound like your
 problem, but it's something to be aware of.

 One thing that could fit the pattern you describe would be Solr caches
 filling up and getting you too close to your JVM or memory limit. For
 example, if you have large documents, and have defined a large document
 cache, that might do it.

 I found it useful to point jconsole (free with the JDK) at my JVM, and
 watch the pattern of memory usage. If the troughs at the bottom of the GC
 cycles keep rising, you know you've got something that is continuing to
 grab more memory and not let go of it. Now that our JVM is running
 smoothly, we just see a sawtooth pattern, with the troughs approximately
 level. When the system is under load, the frequency of the wave rises. Try
 it and see what sort of pattern you're getting.

 -- Bryan

 -Original Message-
 From: Matthias Käppler [mailto:matth...@qype.com]
 Sent: Thursday, February 16, 2012 7:23 AM
 To: solr-user@lucene.apache.org
 Subject: Frequent garbage collections after a day of operation

 Hey everyone,

 we're running into some operational problems with our SOLR production
 setup here and were wondering if anyone else is affected or has even
 solved these problems before. We're running a vanilla SOLR 3.4.0 in
 several Tomcat 6 instances, so nothing out of the ordinary, but after
 a day or so of operation we see increased response times from SOLR, up
 to 3 times increases on average. During this time we see increased CPU
 load due to heavy garbage collection in the JVM, which bogs down the
 whole system, so throughput decreases, naturally. When restarting
 the slaves, everything goes back to normal, but that's more like a
 brute force solution.

 The thing is, we don't know what's causing this and we don't have that
 much experience with Java stacks since we're for most parts a Rails
 company. Are Tomcat 6 or SOLR known to leak memory? Is anyone else
 seeing this, or can you think of a reason for this? Most of our
 queries to SOLR involve the DismaxHandler and the spatial search query
 components. We don't use any custom request handlers so far.

 Thanks in advance,
 -Matthias

 --
 Matthias Käppler
 Lead Developer API  Mobile

 Qype GmbH
 Großer Burstah 50-52
 20457 Hamburg
 Telephone: +49 (0)40 - 219 019 2 - 160
 Skype: m_kaeppler
 Email: matth...@qype.com

 Managing Director: Ian Brotherston
 Amtsgericht Hamburg
 HRB 95913

 This e-mail and its attachments may contain confidential and/or
 privileged information. If you are not the intended recipient (or have
 received this e-mail in error) please notify the sender immediately
 and destroy this e-mail and its attachments. Any unauthorized copying,
 disclosure or distribution of this e-mail and  its attachments is
 strictly forbidden. This notice also applies to future messages.


Re: customizing standard tokenizer

2012-02-17 Thread Em
Hi Torsten,

did you have a look at WordDelimiterTokenFilter?

Sounds like it fits your needs.

Regards,
Em

On 17.02.2012 15:14, Torsten Krah wrote:
 Hi,
 
 is it possible to extend the standard tokenizer or use a custom one
 (possible via extending the standard one) to add some custom tokens
 like Lucene-Core to be one token.
 
 regards


Re: distributed deletes working?

2012-02-17 Thread Sami Siren
On Fri, Feb 17, 2012 at 6:03 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks Sami, so long at it's expected ;)

 In regards to the replication not working the way I think it should,
 am I missing something or is it simply not working the way I think?

It should work. I also tried to reproduce your issue but was not able
to. Could you try to reproduce your problem with the provided scripts
that are in solr/cloud-dev/? I think example2.sh might be a good start.
It's not identical to your situation (it has 1 core per instance) but
would be great if you could verify that you see the issue with that
setup or not.

--
 Sami Siren


Re: distributed deletes working?

2012-02-17 Thread Mark Miller

On Feb 17, 2012, at 11:03 AM, Jamie Johnson wrote:

 Thanks Sami, so long at it's expected ;)

Yeah, it's expected - we always use both the live nodes info and state to 
determine the full state for a shard.

 
 In regards to the replication not working the way I think it should,
 am I missing something or is it simply not working the way I think?

This should work - in fact I just did the same testing this morning.

Are you indexing while you bring the shard down and then up (it should still 
work fine)?
Or do you stop indexing, bring down the shard, index, bring up the shard?

How far out of sync is it?

When exactly is this build from?

 
 On Fri, Feb 17, 2012 at 11:01 AM, Sami Siren ssi...@gmail.com wrote:
 On Fri, Feb 17, 2012 at 5:10 PM, Jamie Johnson jej2...@gmail.com wrote:
 and having looked at this closer, shouldn't the down node not be
 marked as active when I stop that solr instance?
 
 Currently the shard state is not updated in the cloudstate when a node
 goes down. This behavior should probably be changed at some point.
 
 --
  Sami Siren

- Mark Miller
lucidimagination.com













Re: distributed deletes working?

2012-02-17 Thread Yonik Seeley
On Fri, Feb 17, 2012 at 11:13 AM, Mark Miller markrmil...@gmail.com wrote:
 When exactly is this build from?

Yeah... I just checked in a fix yesterday dealing with sync while
indexing is going on.

-Yonik
lucidimagination.com


Re: distributed deletes working?

2012-02-17 Thread Jamie Johnson
I stop the indexing, stop the shard, then start indexing again.  So I
shouldn't need Yonik's latest fix?  In regards to how far out of sync it
is: it's completely out of sync, meaning I index 100 documents to the
cluster (40 on shard1, 60 on shard2), then stop the instance, index 100
more, and when I bring the instance back up, if I issue queries to just the
Solr instance I brought up, the counts are the old counts.

I'll startup the same test with out using multiple cores.  Give me a
few and I'll provide the details.



On Fri, Feb 17, 2012 at 11:19 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Feb 17, 2012 at 11:13 AM, Mark Miller markrmil...@gmail.com wrote:
 When exactly is this build from?

 Yeah... I just checked in a fix yesterday dealing with sync while
 indexing is going on.

 -Yonik
 lucidimagination.com


Re: Cloud tab hanging?

2012-02-17 Thread Mark Miller

On Feb 17, 2012, at 11:00 AM, Ranjan Bagchi wrote:

 Hi,
 
 I'm pretty new to solr and especially solr cloud, so hopefully this isn't
 too dumb:  I followed the wiki instructions for setting up a small cloud.
 Things seem to work, *except* on the UI [using chrome and safari], the
 cloud tab hangs.  It says Zookeeper Data, and then there's a loading
 symbol.The old ui allows me to see what's in zookeeper, so I'm pretty
 sure it's mostly working.
 
 There's nothing in the logs at all about a connection timing out -- any
 help?
 
 Thanks,
 
 Ranjan


I've intermittently seen this myself I think - its hard to debug without 
something like firebug to see what is actually failing (in the past, with the 
new UI, i've seen it choke on some json response (that was valid according to 
other tools)).

You might want to file a JIRA issue on it and in the mean time, try using the 
old UI for this? localhost:8983/solr/collection1/admin/zookeeper.jsp

It's still more full featured anyhow, in that you can actually inspect what 
data is on each node (important for being able to see the clusterstate.json!)

- Mark Miller
lucidimagination.com













Re: distributed deletes working?

2012-02-17 Thread Jamie Johnson
I'm seeing the following.  Do I need a _version_ long field in my schema?

Feb 17, 2012 1:15:50 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {delete=[f2c29abe-2e48-4965-adfb-8bd611293ff0]} 0 0
Feb 17, 2012 1:15:50 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: missing _version_ on
update from leader
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:707)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:478)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:137)
at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:235)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:166)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1523)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:405)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:255)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



On Fri, Feb 17, 2012 at 11:25 AM, Jamie Johnson jej2...@gmail.com wrote:
 I stop the indexing, stop the shard, then start indexing again.  So
 shouldn't need Yonik's latest fix?  In regards to how far out of sync,
 it's completely out of sync, meaning index 100 documents to the
 cluster (40 on shard1 60 on shard2) then stop the instance, index 100
 more, when I bring the instance back up if I issue queries to just the
 solr instance I brought up the counts are the old counts.

 I'll startup the same test with out using multiple cores.  Give me a
 few and I'll provide the details.



 On Fri, Feb 17, 2012 at 11:19 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Fri, Feb 17, 2012 at 11:13 AM, Mark Miller markrmil...@gmail.com wrote:
 When exactly is this build from?

 Yeah... I just checked in a fix yesterday dealing with sync while
 indexing is going on.

 -Yonik
 lucidimagination.com


Re: distributed deletes working?

2012-02-17 Thread Yonik Seeley
On Fri, Feb 17, 2012 at 1:27 PM, Jamie Johnson jej2...@gmail.com wrote:
 I'm seeing the following.  Do I need a _version_ long field in my schema?

Yep... versions are the way we keep things sane (shuffled updates to a
replica can be correctly reordered, etc).

-Yonik
lucidimagination.com


Re: distributed deletes working?

2012-02-17 Thread Jamie Johnson
Ok, so I'm making some progress now.  With _version_ in the schema
(I forgot about this because I remember asking about it before), deletes
across the cluster work when I delete by id.  Updates work as well; if
a node is down, it recovers fine.  Something that didn't work, though,
was when a node was down while a delete happened and then comes back up:
that node still lists the id I deleted.  Is this currently supported?


On Fri, Feb 17, 2012 at 1:33 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Feb 17, 2012 at 1:27 PM, Jamie Johnson jej2...@gmail.com wrote:
 I'm seeing the following.  Do I need a _version_ long field in my schema?

 Yep... versions are the way we keep things sane (shuffled updates to a
 replica can be correctly reordered, etc).

 -Yonik
 lucidimagination.com


Re: distributed deletes working?

2012-02-17 Thread Yonik Seeley
On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson jej2...@gmail.com wrote:
 Something that didn't work though
 was if a node was down when a delete happened and then comes back up,
 that node still listed the id I deleted.  Is this currently supported?

Yes, that should work fine.  Are you still seeing that behavior?

-Yonik
lucidimagination.com


Custom Query Component: parameters are not appended to query

2012-02-17 Thread Vadim Kisselmann
Hello folks,

I built a simple custom component for the "hl.q" query.
My use case was to inject hl.q params on the fly, because of filter params like
fields which were in my standard query. These were being highlighted, because
Solr/Lucene has no way of interpreting an extended q clause and saying "this
part is a query and should be highlighted, and this part isn't".
If it works, the community can have it :)

Facts:  q=roomba AND irobot AND language:de

My component extends SearchComponent. I use the ResponseBuilder to get
all needed params like field names from the schema, the q params, etc.



My component is called first (this works, verified via debugging and
debugQuery) from my SearchHandler:

<arr name="first-components">
  <str>highlightQuery</str>
</arr>



Important Clippings from Sourcecode:

public class HighlightQueryComponent extends SearchComponent {

  ...

  public void process(ResponseBuilder rb) throws IOException {

    if (rb.doHighlights) {

      List<String> terms = new ArrayList<String>(0);
      SolrQueryRequest req = rb.req;
      IndexSchema schema = req.getSchema();
      Map<String, SchemaField> fields = schema.getFields();
      SolrParams params = req.getParams();
      ...
      ... magic ...
      ...
      Query hlq = new TermQuery(new Term("text", hlQuery.toString()));
      rb.setHighlightQuery(hlq);   // hlq = text:(roomba AND irobot)



Problem:
In the last step my query is adjusted (the hlq params from debugging are
"text:(roomba AND irobot)"). It looks fine; the magic in the process()
method works.
But nothing happens. If I continue to debug, the next components are called,
but my query is the same, without changes.
Either setHighlightQuery doesn't work, or my params are overridden in the
following components.
What can it be?

Best Regards
Vadim


Re: distributed deletes working?

2012-02-17 Thread Jamie Johnson
Yes, still seeing that.  Master has 8 items, replica has 9.  So the
delete didn't seem to work when the node was down.

On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson jej2...@gmail.com wrote:
 Something that didn't work though
 was if a node was down when a delete happened and then comes back up,
 that node still listed the id I deleted.  Is this currently supported?

 Yes, that should work fine.  Are you still seing that behavior?

 -Yonik
 lucidimagination.com


Re: distributed deletes working?

2012-02-17 Thread Mark Miller
Hmm...just tried this with only deletes, and the replica sync'd fine for me.

Is this with your multi core setup or were you trying with instances?

On Feb 17, 2012, at 1:52 PM, Jamie Johnson wrote:

 Yes, still seeing that.  Master has 8 items, replica has 9.  So the
 delete didn't seem to work when the node was down.
 
 On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson jej2...@gmail.com wrote:
 Something that didn't work though
 was if a node was down when a delete happened and then comes back up,
 that node still listed the id I deleted.  Is this currently supported?
 
 Yes, that should work fine.  Are you still seing that behavior?
 
 -Yonik
 lucidimagination.com

- Mark Miller
lucidimagination.com













Re: distributed deletes working?

2012-02-17 Thread Jamie Johnson
This was with the cloud-dev solrcloud-start.sh script (after that I've
used solrcloud-start-existing.sh).

Essentially I run ./solrcloud-start-existing.sh
index docs
kill 1 of the solr instances (using kill -9 on the pid)
delete a doc from running instances
restart killed solr instance

on doing this the deleted document is still lingering in the instance
that was down.

On Fri, Feb 17, 2012 at 2:04 PM, Mark Miller markrmil...@gmail.com wrote:
 Hmm...just tried this with only deletes, and the replica sync'd fine for me.

 Is this with your multi core setup or were you trying with instances?

 On Feb 17, 2012, at 1:52 PM, Jamie Johnson wrote:

 Yes, still seeing that.  Master has 8 items, replica has 9.  So the
 delete didn't seem to work when the node was down.

 On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson jej2...@gmail.com wrote:
 Something that didn't work though
 was if a node was down when a delete happened and then comes back up,
 that node still listed the id I deleted.  Is this currently supported?

 Yes, that should work fine.  Are you still seing that behavior?

 -Yonik
 lucidimagination.com

 - Mark Miller
 lucidimagination.com













RE: customizing standard tokenizer

2012-02-17 Thread Steven A Rowe
Hi Torsten,

The Lucene StandardTokenizer is written in JFlex (http://jflex.de) - you can 
see the version 3.X specification at: 

http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex?view=markup

You can make changes to this file, then run ant jflex-StandardAnalyzer from 
the checked-out branch_3x sources or a source release (in the lucene/core/ 
directory in branch_3x, and in the lucene/ directory in a pre-3.6 source 
release), to generate the corresponding java source code at:

  
lucene/core/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.java

However, I recommend a simpler strategy: use a MappingCharFilter[1] in front of 
your tokenizer to map the tokens you want left intact to strings that will not 
be broken up by the tokenizer.  For example, Lucene-Core could be mapped to 
Lucene_Core, because UAX#29[2], upon which StandardTokenizer is based, 
considers the underscore to be a word character, and so will leave 
Lucene_Core as a single token.  You would need to use this strategy at both 
index-time and query-time.
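
A hedged Lucene 3.x sketch of that MappingCharFilter strategy (the Version
constant and the sample text are assumptions):

import java.io.StringReader;

import org.apache.lucene.analysis.CharReader;
import org.apache.lucene.analysis.MappingCharFilter;
import org.apache.lucene.analysis.NormalizeCharMap;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class MappingCharFilterExample {
  public static void main(String[] args) throws Exception {
    NormalizeCharMap map = new NormalizeCharMap();
    map.add("Lucene-Core", "Lucene_Core");   // pre-tokenizer character mapping

    MappingCharFilter mapped = new MappingCharFilter(
        map, CharReader.get(new StringReader("I index Lucene-Core documents")));
    StandardTokenizer tokenizer = new StandardTokenizer(Version.LUCENE_35, mapped);

    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString());   // ... "Lucene_Core", "documents"
    }
    tokenizer.close();
  }
}

In schema.xml the same idea would be a charFilter (e.g. solr.MappingCharFilterFactory 
with a mapping file) placed before the tokenizer in both the index and query analyzers.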

(I was going to add that if you wanted your indexed tokens to be the same as 
their original form, you could add a MappingTokenFilter after your tokenizer to 
do the reverse mapping, but such a thing does not yet exist :( - however, there 
is a JIRA issue for this idea: 
https://issues.apache.org/jira/browse/SOLR-1978.)

Steve

[1] 
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/analysis/MappingCharFilter.html

[2] http://unicode.org/reports/tr29/

 -Original Message-
 From: Torsten Krah [mailto:tk...@fachschaft.imn.htwk-leipzig.de]
 Sent: Friday, February 17, 2012 9:15 AM
 To: solr-user@lucene.apache.org
 Subject: customizing standard tokenizer
 
 Hi,
 
 is it possible to extend the standard tokenizer or use a custom one
 (possible via extending the standard one) to add some custom tokens
 like Lucene-Core to be one token.
 
 regards


Re: distributed deletes working?

2012-02-17 Thread Yonik Seeley
On Fri, Feb 17, 2012 at 2:07 PM, Jamie Johnson jej2...@gmail.com wrote:
 This was with the cloud-dev solrcloud-start.sh script (after that I've
 used solrcloud-start-existing.sh).

 Essentially I run ./solrcloud-start-existing.sh
 index docs
 kill 1 of the solr instances (using kill -9 on the pid)
 delete a doc from running instances
 restart killed solr instance

 on doing this the deleted document is still lingering in the instance
 that was down.

Hmmm.  Shot in the dark : is your id field type something other than string?

-Yonik
lucidimagination.com


Re: distributed deletes working?

2012-02-17 Thread Mark Miller
You are committing in that mix right?

On Feb 17, 2012, at 2:07 PM, Jamie Johnson wrote:

 This was with the cloud-dev solrcloud-start.sh script (after that I've
 used solrcloud-start-existing.sh).
 
 Essentially I run ./solrcloud-start-existing.sh
 index docs
 kill 1 of the solr instances (using kill -9 on the pid)
 delete a doc from running instances
 restart killed solr instance
 
 on doing this the deleted document is still lingering in the instance
 that was down.
 
 On Fri, Feb 17, 2012 at 2:04 PM, Mark Miller markrmil...@gmail.com wrote:
 Hmm...just tried this with only deletes, and the replica sync'd fine for me.
 
 Is this with your multi core setup or were you trying with instances?
 
 On Feb 17, 2012, at 1:52 PM, Jamie Johnson wrote:
 
 Yes, still seeing that.  Master has 8 items, replica has 9.  So the
 delete didn't seem to work when the node was down.
 
 On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson jej2...@gmail.com wrote:
 Something that didn't work though
 was if a node was down when a delete happened and then comes back up,
 that node still listed the id I deleted.  Is this currently supported?
 
 Yes, that should work fine.  Are you still seing that behavior?
 
 -Yonik
 lucidimagination.com
 
 - Mark Miller
 lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 

- Mark Miller
lucidimagination.com













Re: problem to indexing pdf directory

2012-02-17 Thread alessio crisantemi
I'll try... but I work with Solr 1.4.1.

On 17 February 2012 15:59, Erick Erickson erickerick...@gmail.com wrote:

 You should not have to do anything with Maven, the instructions
 you followed were from 1.4.1 days..

 Assuming you're working with a 3.x build, here's a data-config
 that worked for me, just a straight distro. But note a couple of things:
 1 for simplicity, I changed the schema.xml to NOT require
 the id field. You'll have to change this back probably and
 select a good uniqueKey
 2 I had to add this line to solrconfig.xml to find the path:
 lib dir=../../dist/
 regex=apache-solr-dataimporthandler-extras-\d.*\.jar/
 3 If this all works without errors in the Solr log and you still
 can't find anything, be sure you issue a commit.

 Best
 Erick

 dataConfig
  dataSource name=bin type=BinFileDataSource/
  document
entity baseDir=/Users/Erick/testdocs fileName=.*pdf name=sd
 processor=FileListEntityProcessor recursive=true
 rootEntity=false
  entity dataSource=bin format=text name=tika-test
 processor=TikaEntityProcessor url=${sd.fileAbsolutePath}
field column=Author meta=true name=author/
field column=Content-Type meta=true name=title/
!-- field column=title name=title meta=true/ --
field column=text name=text/
  /entity
  !-- field column=fileLastModified name=date
 dateTimeFormat=-MM-dd'T'hh:mm:ss / --
  field column=fileSize meta=true name=size/
/entity
  /document
 /dataConfig
 On Fri, Feb 17, 2012 at 9:35 AM, alessio crisantemi
 alessio.crisant...@gmail.com wrote:
  thanks gora for your help.
  I installed Maven and downloaded Tika following the guide: But I have an
  errore during the built of Tika about 'tika compiler', and the maven
  installation of Tika is stopped.
 
  there is another way?
  thank you
  a.
 
  2012/2/16 Gora Mohanty g...@mimirtech.com
 
  On 16 February 2012 21:37, alessio crisantemi
  alessio.crisant...@gmail.com wrote:
   here the log:
  
  
   org.apache.solr.handler.dataimport.DataImporter doFullImport
   Grave: Full Import failed
   org.apache.solr.handler.dataimport.DataImportHandlerException:
 'baseDir'
  is
   a required attribute Processing Document # 1
  [...]
 
  The exception message above is pretty clear. You need to define a
  baseDir attribute for the second entity.
 
  However, even if you fix this, the setup will *not* work for indexing
  PDFs. Did you read the URLs that I sent earlier?
 
  Regards,
  Gora
 



Re: problem to indexing pdf directory

2012-02-17 Thread Erick Erickson
Sorry, my error! In that case you *do* have to do some fiddling to get
it all to work.

Good Luck!
Erick

On Fri, Feb 17, 2012 at 3:27 PM, alessio crisantemi
alessio.crisant...@gmail.com wrote:
 I tried it... but I work with Solr 1.4.1

 Il giorno 17 febbraio 2012 15:59, Erick Erickson
 erickerick...@gmail.comha scritto:

 You should not have to do anything with Maven, the instructions
 you followed were from 1.4.1 days..

 Assuming you're working with a 3.x build, here's a data-config
 that worked for me, just a straight distro. But note a couple of things:
 1 for simplicity, I changed the schema.xml to NOT require
 the id field. You'll have to change this back probably and
 select a good uniqueKey
 2 I had to add this line to solrconfig.xml to find the path:
 lib dir=../../dist/
 regex=apache-solr-dataimporthandler-extras-\d.*\.jar/
 3 If this all works without errors in the Solr log and you still
     can't find anything, be sure you issue a commit.

 Best
 Erick

 dataConfig
  dataSource name=bin type=BinFileDataSource/
  document
    entity baseDir=/Users/Erick/testdocs fileName=.*pdf name=sd
 processor=FileListEntityProcessor recursive=true
 rootEntity=false
      entity dataSource=bin format=text name=tika-test
 processor=TikaEntityProcessor url=${sd.fileAbsolutePath}
        field column=Author meta=true name=author/
        field column=Content-Type meta=true name=title/
        !-- field column=title name=title meta=true/ --
        field column=text name=text/
      /entity
      !-- field column=fileLastModified name=date
 dateTimeFormat=-MM-dd'T'hh:mm:ss / --
      field column=fileSize meta=true name=size/
    /entity
  /document
 /dataConfig
 On Fri, Feb 17, 2012 at 9:35 AM, alessio crisantemi
 alessio.crisant...@gmail.com wrote:
  thanks gora for your help.
  I installed Maven and downloaded Tika following the guide: But I have an
  errore during the built of Tika about 'tika compiler', and the maven
  installation of Tika is stopped.
 
  there is another way?
  thank you
  a.
 
  2012/2/16 Gora Mohanty g...@mimirtech.com
 
  On 16 February 2012 21:37, alessio crisantemi
  alessio.crisant...@gmail.com wrote:
   here the log:
  
  
   org.apache.solr.handler.dataimport.DataImporter doFullImport
   Grave: Full Import failed
   org.apache.solr.handler.dataimport.DataImportHandlerException:
 'baseDir'
  is
   a required attribute Processing Document # 1
  [...]
 
  The exception message above is pretty clear. You need to define a
  baseDir attribute for the second entity.
 
  However, even if you fix this, the setup will *not* work for indexing
  PDFs. Did you read the URLs that I sent earlier?
 
  Regards,
  Gora
 



Re: problem to indexing pdf directory

2012-02-17 Thread alessio crisantemi
I'm confused now...
So, my last question:
I added this to my solrconfig.xml:


<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">c:\solr\conf\db-config.xml</str>
  </lst>
</requestHandler>


And I wrote my db-config.xml like this:
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin" />
  <document>
    <entity name="sd"
            processor="FileListEntityProcessor"
            newerThan="'NOW-30DAYS'"
            fileName=".*pdf$"
            baseDir="D:\myfiles"
            recursive="true"
            rootEntity="false"
            transformer="DateFormatTransformer">
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${sd.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="description" name="description" />
        <field column="comments" name="comments" />
        <field column="content_type" name="content_type" />
        <field column="last_modified" name="last_modified" />
      </entity>

      <!-- <field column="fileLastModified" name="date"
                  dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" /> -->
      <field column="fileSize" name="size"/>
      <field column="file" name="filename"/>
    </entity>
  </document>
</dataConfig>
That should work, in your opinion, or do you see an error in this code?
thanks,
alessio



Il giorno 17 febbraio 2012 21:29, Erick Erickson
erickerick...@gmail.comha scritto:

 Sorry, my error! In that case you *do* have to do some fiddling to get
 it all to work.

 Good Luck!
 Erick

 On Fri, Feb 17, 2012 at 3:27 PM, alessio crisantemi
 alessio.crisant...@gmail.com wrote:
  I tried it... but I work with Solr 1.4.1
 
  Il giorno 17 febbraio 2012 15:59, Erick Erickson
  erickerick...@gmail.comha scritto:
 
  You should not have to do anything with Maven, the instructions
  you followed were from 1.4.1 days..
 
  Assuming you're working with a 3.x build, here's a data-config
  that worked for me, just a straight distro. But note a couple of things:
  1 for simplicity, I changed the schema.xml to NOT require
  the id field. You'll have to change this back probably and
  select a good uniqueKey
  2 I had to add this line to solrconfig.xml to find the path:
  lib dir=../../dist/
  regex=apache-solr-dataimporthandler-extras-\d.*\.jar/
  3 If this all works without errors in the Solr log and you still
  can't find anything, be sure you issue a commit.
 
  Best
  Erick
 
  dataConfig
   dataSource name=bin type=BinFileDataSource/
   document
 entity baseDir=/Users/Erick/testdocs fileName=.*pdf name=sd
  processor=FileListEntityProcessor recursive=true
  rootEntity=false
   entity dataSource=bin format=text name=tika-test
  processor=TikaEntityProcessor url=${sd.fileAbsolutePath}
 field column=Author meta=true name=author/
 field column=Content-Type meta=true name=title/
 !-- field column=title name=title meta=true/ --
 field column=text name=text/
   /entity
   !-- field column=fileLastModified name=date
  dateTimeFormat=-MM-dd'T'hh:mm:ss / --
   field column=fileSize meta=true name=size/
 /entity
   /document
  /dataConfig
  On Fri, Feb 17, 2012 at 9:35 AM, alessio crisantemi
  alessio.crisant...@gmail.com wrote:
   thanks gora for your help.
   I installed Maven and downloaded Tika following the guide: But I have
 an
   errore during the built of Tika about 'tika compiler', and the maven
   installation of Tika is stopped.
  
   there is another way?
   thank you
   a.
  
   2012/2/16 Gora Mohanty g...@mimirtech.com
  
   On 16 February 2012 21:37, alessio crisantemi
   alessio.crisant...@gmail.com wrote:
here the log:
   
   
org.apache.solr.handler.dataimport.DataImporter doFullImport
Grave: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException:
  'baseDir'
   is
a required attribute Processing Document # 1
   [...]
  
   The exception message above is pretty clear. You need to define a
   baseDir attribute for the second entity.
  
   However, even if you fix this, the setup will *not* work for indexing
   PDFs. Did you read the URLs that I sent earlier?
  
   Regards,
   Gora
  
 



Re: Solritas: Modify $content in layout.vm

2012-02-17 Thread Erick Erickson
Why do you want to? That is, what are you trying to accomplish by
modifying that variable? You may not really need to...

This seems like an XY problem...

Best
Erick

On Thu, Feb 16, 2012 at 11:06 PM, remi tassing tassingr...@gmail.com wrote:
 Hi all,

 How do we modify the $content variable in the layout.vm file? I
 managed to change other stuff in doc.vm or header.vm but not this one.

 Is there any tutorial on this?

 Remi


Re: distributed deletes working?

2012-02-17 Thread Jamie Johnson
yes committing in the mix.

id field is a UUID.

On Fri, Feb 17, 2012 at 3:22 PM, Mark Miller markrmil...@gmail.com wrote:
 You are committing in that mix right?

 On Feb 17, 2012, at 2:07 PM, Jamie Johnson wrote:

 This was with the cloud-dev solrcloud-start.sh script (after that I've
 used solrcloud-start-existing.sh).

 Essentially I run ./solrcloud-start-existing.sh
 index docs
 kill 1 of the solr instances (using kill -9 on the pid)
 delete a doc from running instances
 restart killed solr instance

 on doing this the deleted document is still lingering in the instance
 that was down.

 On Fri, Feb 17, 2012 at 2:04 PM, Mark Miller markrmil...@gmail.com wrote:
 Hmm...just tried this with only deletes, and the replica sync'd fine for me.

 Is this with your multi core setup or were you trying with instances?

 On Feb 17, 2012, at 1:52 PM, Jamie Johnson wrote:

 Yes, still seeing that.  Master has 8 items, replica has 9.  So the
 delete didn't seem to work when the node was down.

 On Fri, Feb 17, 2012 at 1:41 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Fri, Feb 17, 2012 at 1:38 PM, Jamie Johnson jej2...@gmail.com wrote:
 Something that didn't work though
 was if a node was down when a delete happened and then comes back up,
 that node still listed the id I deleted.  Is this currently supported?

 Yes, that should work fine.  Are you still seeing that behavior?

 -Yonik
 lucidimagination.com

 - Mark Miller
 lucidimagination.com












 - Mark Miller
 lucidimagination.com













Re: distributed deletes working?

2012-02-17 Thread Mark Miller

On Feb 17, 2012, at 3:56 PM, Jamie Johnson wrote:

 id field is a UUID.

Strange - I was using UUIDs myself in the same test this morning...

I'll try again soon.

- Mark Miller
lucidimagination.com













proper syntax for using sort query parameter in responseHandler

2012-02-17 Thread geeky2
What is the proper syntax for including a sort directive in my responseHandler?

I tried this but got an error:


  <requestHandler name="partItemNoSearch" class="solr.SearchHandler"
                  default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">10</int>
      <str name="qf">itemNo^1.0</str>
      <str name="q.alt">*:*</str>
     *<str name="sort">rankNo desc</str>*
    </lst>
    <lst name="appends">
      <str name="fq">itemType:1</str>
    </lst>
    <lst name="invariants">
      <str name="facet">false</str>
    </lst>
  </requestHandler>


thank you
mark

--
View this message in context: 
http://lucene.472066.n3.nabble.com/proper-syntax-for-using-sort-query-parameter-in-responseHandler-tp3755077p3755077.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solritas: Modify $content in layout.vm

2012-02-17 Thread Erik Hatcher
$content is output of the main template rendered.

To modify what is generated into $content, modify the main template or the 
sub-#parsed templates (which is what you've discovered, looks like) that is 
rendered (browse.vm, perhaps, if you're using the default example setup).  The 
main template that is rendered is specified as v.template (in the /browse 
handler definition in solrconfig.xml, again if you're using the example 
configuration).
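
Roughly, the relevant wiring in the example solrconfig.xml looks like the sketch below (exact defaults vary by Solr version, so treat this as an approximation rather than your actual config):

 <requestHandler name="/browse" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="wt">velocity</str>
     <str name="v.template">browse</str>   <!-- main template: conf/velocity/browse.vm -->
     <str name="v.layout">layout</str>     <!-- layout.vm wraps the rendered template as $content -->
     <!-- query defaults (qf, facets, etc.) omitted -->
   </lst>
 </requestHandler>

So to change what ends up in $content, edit browse.vm (or whatever v.template points to) rather than layout.vm itself.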

Does that help?  If not, let us know what you're trying to do exactly.

Erik




On Feb 16, 2012, at 23:06 , remi tassing wrote:

 Hi all,
 
 How do we modify the $content variable in the layout.vm file? I
 managed to change other stuff in doc.vm or header.vm but not this one.
 
 Is there any tutorial on this?
 
 Remi



Indexing 100Gb of readonly numeric data

2012-02-17 Thread Pedro Ferreira
Hi guys, I'm cross-posting this from the Lucene list as I guess I can get
better help here for this scenario.
Suppose I want to index 100Gb+ of numeric data. I'm not yet sure of the
specifics, but I can expect the following:
- Data is expected to be in one gigantic table. Conceptually, it is like a
  spreadsheet table: rows are objects and columns are properties.
- Values are mostly floating point numbers, and I expect them to be, let's
  say, unique or discrete, or almost randomly distributed
  (1.89868776E+50, 1.434E-12).
- The data is read-only. It will never change.
Now I need to query this data based mostly on range queries on the
columns. Something like:
SELECT * FROM Table WHERE (Col1 > 1.2E2 AND Col1 < 1.8E2) OR (Col3 == 0)
which basically means: give me all the rows that satisfy these criteria.
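For illustration, a minimal sketch of how such columns might be modelled in a
Solr 3.x schema, assuming hypothetical field names col1/col3 and Trie-encoded
doubles; the query above would then become something like
col1:{1.2E2 TO 1.8E2} OR col3:0 (curly braces give exclusive bounds):

  <!-- sketch only: Trie fields trade a little index size for fast range queries -->
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
             omitNorms="true" positionIncrementGap="0"/>
  <field name="col1" type="tdouble" indexed="true" stored="true"/>
  <field name="col3" type="tdouble" indexed="true" stored="true"/>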
I believe this could be easily done with a standard RDBMS, but I
would like to avoid that route.
While thinking about this, and assuming this could work well with Solr,
there were some things I couldn't answer:
- In this case, it makes total sense to store the data in the index.
  If I index all columns, I might as well have the data right there.
- Does it make any sense to index this whole thing once, while
  offline, and then upload only the index to the servers?
- I'm almost sure I will have to shard the index in some way, and that
  isn't difficult. But what are the possible hardware requirements to
  host this thing? I know this depends on lots of information I didn't
  provide (searches/sec for example), but can someone throw out a number?
  I have completely no idea...

Thanks
--
Pedro Ferreira

mobile: 00 44 7712 557303
skype: pedrosilvaferreira
email: psilvaferre...@gmail.com
linkedin: http://uk.linkedin.com/in/pedrosilvaferreira


Re: Indexing 100Gb of readonly numeric data

2012-02-17 Thread Pedro Ferreira
Ouch... sorry about the format... I have no idea why gmail turned my
text into that...

On Fri, Feb 17, 2012 at 10:07 PM, Pedro Ferreira
psilvaferre...@gmail.com wrote:
 Hi guys, I'm cross posting this from lucene list as I guess I can have
 better help here for this scenario.
 Suppose I want to index 100Gb+ of numeric data. I'm not yet sure the
 specifics, but I can expect the following:
 - data is expected to be in one gigantic table. conceptually, is likea
 spreadsheet table: rows are objects and columns are properties.-
 values are mostly floating point numbers, and I expect them to
 be,let's say, unique or discreet, or almost randomly distributed
 (1.89868776E+50,1.434E-12)- The data is readonly. it will never
 change.
 Now I need to query this data based mostly in range queries on
 thecolumns. Something like:
 SELECT * FROM Table WHERE (Col1  1.2E2 AND Col1  1.8E2) OR (Col3 == 0)
 which is basically give me all the rows that satisfy this criteria.
 I believe this could be easily done with a standard RDBMS, but I
 wouldlike to avoid that route.
 While thinking about this, and assuming this could work well withSolr,
 I had some things I couldn't answer:-
 - In this case, it makes total sense to store the data in the index.
 If I will index all columns, I might as well have the data right
 there.
 - Does it make any sense to index this whole thing once, while
 offline, and then upload only the index to the servers?
 - I'm almost sure I will have to shard the index in some way, and this
 isn't difficult. But what are the possible hardware requirements to
 host this thing? I know this depends on lots of information I didn't
 provide (searches/sec for example), but can someone throw a number? I
 have completely no ideia...

 Thanks
 --
 Pedro Ferreira

 mobile: 00 44 7712 557303
 skype: pedrosilvaferreira
 email: psilvaferre...@gmail.com
 linkedin: http://uk.linkedin.com/in/pedrosilvaferreira



-- 
Pedro Ferreira

mobile: 00 44 7712 557303
skype: pedrosilvaferreira
email: psilvaferre...@gmail.com
linkedin: http://uk.linkedin.com/in/pedrosilvaferreira


Re: proper syntax for using sort query parameter in responseHandler

2012-02-17 Thread Tommaso Teofili
Hi Mark,
Having a look at that requestHandler, it looks OK [1]; are you experiencing
any errors?
If so, did you check the wiki page FieldOptionsByUseCase [2]? Maybe that
field's (rankNo) options contain indexed=false or multiValued=true.
HTH,
Tommaso

[1] : http://wiki.apache.org/solr/CommonQueryParameters#sort
[2] : http://wiki.apache.org/solr/FieldOptionsByUseCase
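
As a sketch (assuming rankNo is an integer rank; the actual field type in your
schema may differ), a sortable declaration in schema.xml would look roughly like:

  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
             omitNorms="true" positionIncrementGap="0"/>
  <!-- sorting requires an indexed, single-valued (non-multiValued) field -->
  <field name="rankNo" type="tint" indexed="true" stored="true" multiValued="false"/>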


2012/2/17 geeky2 gee...@hotmail.com

 what is the proper syntax for including sort directive in my
 responseHandler?

 i tried this but got an error:


  requestHandler name=partItemNoSearch class=solr.SearchHandler
 default=false
lst name=defaults
  str name=defTypeedismax/str
  str name=echoParamsall/str
  int name=rows10/int
  str name=qfitemNo^1.0/str
  str name=q.alt*:*/str
 * str name=sortrankNo desc/str*
/lst
lst name=appends
  str name=fqitemType:1/str
/lst
lst name=invariants
  str name=facetfalse/str
/lst
  /requestHandler


 thank you
 mark

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/proper-syntax-for-using-sort-query-parameter-in-responseHandler-tp3755077p3755077.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Solr Wiki and mailing lists

2012-02-17 Thread Lance Norskog
The Apache Solr main page does not mention the mailing lists. The wiki
main page has a broken link. I have had to search my incoming mail to
find out how to unsubscribe to solr-user.

Someone with full access- please fix these problems.

Thanks,

-- 
Lance Norskog
goks...@gmail.com


Re: Solr Wiki and mailing lists

2012-02-17 Thread Artem Lokotosh
To unsubscribe, e-mail: solr-user-unsubscr...@lucene.apache.org

Also you can request a FAQ, e-mail: solr-user-...@lucene.apache.org


On Sat, Feb 18, 2012 at 12:38 AM, Lance Norskog goks...@gmail.com wrote:
 The Apache Solr main page does not mention the mailing lists. The wiki
 main page has a broken link. I have had to search my incoming mail to
 find out how to unsubscribe to solr-user.

 Someone with full access- please fix these problems.

 Thanks,

 --
 Lance Norskog
 goks...@gmail.com



-- 
Best regards,
Artem Lokotosh        mailto:arco...@gmail.com


RE: Improving proximity search performance

2012-02-17 Thread Bryan Loofbourrow
Apologies. I meant to type “1.4 TB” and somehow typed “1.4 GB.” Little
wonder that no one thought the question was interesting, or figured I must
be using Sneakernet to run my searches.



-- Bryan Loofbourrow


  --

*From:* Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
*Sent:* Thursday, February 16, 2012 7:07 PM
*To:* 'solr-user@lucene.apache.org'
*Subject:* Improving proximity search performance



Here’s my use case. I expect to set up a Solr index that is approximately
1.4GB (this is a real number from the proof-of-concept using the real data,
which consists of about 10 million documents, many of significant size, and
making use of the FastVectorHighlighter to do highlighting on the body text
field, which is of course stored, and with termVectors, termPositions, and
termOffsets on).



I no longer have the proof-of-concept Solr core available (our live site
uses Solr 1.4 and the ordinary Highlighter), so I can’t get an empirical
answer to this question: Will storing that extra information about the
location of terms help the performance of proximity searches?



A significant and important subset of my users make extensive use of
proximity searches. These sophisticated users have found that they are best
able to locate what they want by doing searches about THISWORD within 5
words of THATWORD, or much more sophisticated variants on that theme,
including plenty of booleans and wildcards. The problem I’m facing is
performance. Some of these searches, when common words are used, can take
many minutes, even with the index on an SSD.



The question is, how to improve the performance. It occurred to me as
possible that all of that term vector information, stored for the benefit
of the FastVectorHighlighter, might be a significant aid to the performance
of these searches.



First question: is that already the case? Will storing this extra
information automatically improve my proximity search performance?



Second question: If not, I’m very willing to dive into the code and come up
with a patch that would do this. Can someone with knowledge of the
internals comment on whether this is a plausible strategy for improving
performance, and, if so, give tips about the outlines of what a successful
approach to the problem might look like?



Third question: Any tips in general for improving the performance of these
proximity searches? I have explored the question of whether the customers
might be weaned off of them, and that does not appear to be an option.



Thanks,



-- Bryan Loofbourrow


Using nested entities in FileDataSource import of xml file contents

2012-02-17 Thread Mike O'Leary
Can anybody help me understand the right way to define a data-config.xml file 
with nested entities for indexing the contents of an XML file?

I used this data-config.xml file to index a database containing sample patient 
records:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/bioscope" user="db_user" password=""/>
  <document name="bioscope">
    <entity name="docs" pk="doc_id" query="SELECT doc_id, type FROM bioscope.docs">
      <field column="doc_id" name="doc_id"/>
      <field column="type" name="doc_type"/>
      <entity name="codes" query="SELECT id, origin, type, code FROM bioscope.codes
                                  WHERE doc_id='${docs.doc_id}'">
        <field column="origin" name="code_origin"/>
        <field column="type" name="code_type"/>
        <field column="code" name="code_value"/>
      </entity>
      <entity name="notes" query="SELECT id, origin, type, text FROM bioscope.texts
                                  WHERE doc_id='${docs.doc_id}'">
        <field column="origin" name="note_origin"/>
        <field column="type" name="note_type"/>
        <field column="text" name="note_text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

I would like to do the same thing with an XML file containing the same data as 
is in the database. That XML file looks like this:

<docs>
  <doc id="97634811" type="RADIOLOGY_REPORT">
    <codes>
      <code origin="CMC_MAJORITY" type="ICD-9-CM">786.2</code>
      <code origin="COMPANY3" type="ICD-9-CM">786.2</code>
      <code origin="COMPANY1" type="ICD-9-CM">786.2</code>
      <code origin="COMPANY2" type="ICD-9-CM">786.2</code>
    </codes>
    <texts>
      <text origin="CCHMC_RADIOLOGY" type="CLINICAL_HISTORY">Seventeen year old with cough.</text>
      <text origin="CCHMC_RADIOLOGY" type="IMPRESSION">Normal.</text>
    </texts>
  </doc>

</docs>

I tried using this data-config.xml file, in order to preserve the nested entity 
structure used with the database case:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document name="bioscope">
    <entity name="doc" processor="XPathEntityProcessor" stream="true"
            forEach="/docs/doc" url="C:/data/bioscope.xml">
      <field column="doc_id" xpath="/docs/doc/@id"/>
      <field column="doc_type" xpath="/docs/doc/@type"/>
      <entity name="code" processor="XPathEntityProcessor" stream="true"
              forEach="/docs/doc[@id='${doc.doc_id}']/codes/code" url="C:/data/bioscope.xml">
        <field column="code_origin" xpath="/docs/doc[@id='${doc.doc_id}']/codes/code/@origin"/>
        <field column="code_type" xpath="/docs/doc[@id='${doc.doc_id}']/codes/code/@type"/>
        <field column="code_value" xpath="/docs/doc[@id='${doc.doc_id}']/codes/code"/>
      </entity>
      <entity name="note" processor="XPathEntityProcessor" stream="true"
              forEach="/docs/doc[@id='${doc.doc_id}']/texts/text" url="C:/data/bioscope.xml">
        <field column="note_origin" xpath="/docs/doc[@id='${doc.doc_id}']/texts/text/@origin"/>
        <field column="note_type" xpath="/docs/doc[@id='${doc.doc_id}']/texts/text/@type"/>
        <field column="note_text" xpath="/docs/doc[@id='${doc.doc_id}']/texts/text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

This is wrong, and it fails to index any of the codes and texts blocks in 
the XML file. I'm sure that part of the problem  must be that the xpath 
expressions such as /docs/doc[@id='${doc.doc_id}']/texts/text/@origin fail to 
match anything in the XML file, because when I try the same import without 
nested entities, using this data-config.xml file, the codes and texts 
blocks are also not indexed:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document name="bioscope">
    <entity name="doc" processor="XPathEntityProcessor" stream="true"
            forEach="/docs/doc" url="C:/data/bioscope.xml">
      <field column="doc_id" xpath="/docs/doc/@id"/>
      <field column="doc_type" xpath="/docs/doc/@type"/>
      <field column="code_origin" xpath="/docs/doc[@id='${doc.doc_id}']/codes/code/@origin"/>
      <field column="code_type" xpath="/docs/doc[@id='${doc.doc_id}']/codes/code/@type"/>
      <field column="code_value" xpath="/docs/doc[@id='${doc.doc_id}']/codes/code"/>
      <field column="note_origin" xpath="/docs/doc[@id='${doc.doc_id}']/texts/text/@origin"/>
      <field column="note_type" xpath="/docs/doc[@id='${doc.doc_id}']/texts/text/@type"/>
      <field column="note_text" xpath="/docs/doc[@id='${doc.doc_id}']/texts/text"/>
    </entity>
  </document>
</dataConfig>

However, when I use this data-config.xml file, which doesn't use nested 
entities, all of the fields are included in the index:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document name="bioscope">
    <entity name="doc" processor="XPathEntityProcessor" stream="true"
            forEach="/docs/doc" url="C:/data/bioscope.xml">
      <field column="doc_id" xpath="/docs/doc/@id"/>
      <field column="doc_type" xpath="/docs/doc/@type"/>
      <field column="code_origin" xpath="/docs/doc/codes/code/@origin"/>
      <field column="code_type" xpath="/docs/doc/codes/code/@type"/>
      <field column="code_value" xpath="/docs/doc/codes/code"/>
      <field column="note_origin" xpath="/docs/doc/texts/text/@origin"/>
      <field column="note_type" xpath="/docs/doc/texts/text/@type"/>
      <field column="note_text"

Re: how to delta index linked entities in 3.5.0

2012-02-17 Thread AdamLane
Thanks for your thoughts Shawn.  I did notice 3.x tightened things up a lot, and I did
account for it by making sure I had pk defined and columns explicitly
aliased with the same name (and I will make sure the bug text reflects
that).

To help others who are having the same problem, I just found a thread
describing a workaround using group_concat() in MySQL plus a transformer
on the Solr side.  So far this appears to work and also seems to run delta
imports around 10x faster.  The only disadvantage is that the delta import
process doesn't tell you how many rows have changed.  It just says 1 row,
because you are hacking deltaQuery to return a single dummy row and making
deltaImportQuery take in last_index_time and return all rows that have changed.
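
Concretely, the delta side of an entity like the one quoted below ends up
looking roughly like this (a sketch; get_solr_delta is a hypothetical stored
procedure, and the dummy pk column name is arbitrary):

<entity name="my_entity" pk="id"
        query="call get_solr_full();"
        deltaQuery="SELECT 1 AS id"
        deltaImportQuery="call get_solr_delta('${dataimporter.last_index_time}');"
        transformer="RegexTransformer">
    <!-- field mappings as in the quoted example -->
</entity>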

Quote:

The following (MySql) query concatenates 3 lang_code fields from the main
table into one field and multiple emails from a secondary table into
another field:
SELECT u.id,
   u.name,
   IF((u.lang_code1 IS NULL AND u.lang_code2 IS NULL AND
u.lang_code3 IS NULL), NULL,
   CONVERT(CONCAT_WS('|', u.lang_code1, u.lang_code2,
u.lang_code3) USING ascii)) AS multi_lang_codes,
   GROUP_CONCAT(e.email SEPARATOR '|') AS multiple_emails
FROM users_tb u
LEFT JOIN emails_tb e ON u.id = e.id
GROUP BY u.id

The entity in data-config.xml looks something like:
<entity name="my_entity"
        query="call get_solr_full();"
        transformer="RegexTransformer">
    <field name="email" column="multiple_emails" splitBy="\|" />
    <field name="lang_code" column="multiple_lang_codes" splitBy="\|" />
</entity>

Full Thread:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3E

So until the bug is fixed or docs are changed I hope this helps someone else
searching for this same error message.

Adam


--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-delta-index-linked-entities-in-3-5-0-tp3752455p3755453.html
Sent from the Solr - User mailing list archive at Nabble.com.


PointType hard-coded to Doubles?

2012-02-17 Thread Lance Norskog
The PointType seems to be hard-coded to use doubles. Where in the code
does this happen?

-- 
Lance Norskog
goks...@gmail.com