Re: Autocomplete with Solr 3.1

2011-07-28 Thread scorpking
Can nobody help me?



Collapsing MultiValue fields

2011-07-28 Thread FatMan Corp
Hello,

I understand collapsing is not yet possible for multi-valued fields, but I
still wonder what the best way is to solve the issue I am having.

I have the following document data fields:

1. Title (max 200 chars)
2. Abstract (max 2000 chars)
3. Body (can be quite long)
4. Author (multi valued)
5. link (multi valued)
6. more fields

I would like to collapse search results based on the different links, but
currently can't.
My solution for now is to have a single Solr document for each link+title
combination. But that multiplies my data tremendously, and I am also
noticing that there is no smart storage of the stored fields in Solr.
Some of the fields I am storing (abstract, body) are huge and I would like
to avoid duplicating them.

Any ideas?

Thanks in advance,
Fattie


I can't pass the unit tests when compiling from apache-solr-3.3.0-src

2011-07-28 Thread Bing Yu
I just go to apache-solr-3.3.0/solr and run 'ant test'.

I find that the JUnit tests always fail and tell me 'BUILD FAILED',

but if I type 'ant dist', I can get an apache-solr-3.3-SNAPSHOT.war
with no warning.

Is this a problem only for me?

My server: CentOS 5.6 64-bit, apache-ant-1.8.2, JUnit, and JDK (both
JRockit and Sun JDK 1.6 fail).


Re: Dealing with keyword stuffing

2011-07-28 Thread Pranav Prakash
On Thu, Jul 28, 2011 at 08:31, Chris Hostetter hossman_luc...@fucit.org wrote:


 : Presumably, they are doing this by increasing tf (term frequency),
 : i.e., by repeating keywords multiple times. If so, you can use a custom
 : similarity class that caps term frequency, and/or ensures that the scoring
 : increases less than linearly with tf. Please see


In some cases, yes, they are repeating keywords multiple times, stuffing
different combinations: Solr, Solr Lucene, Solr Search, Solr Apache, Solr
Guide.



 in particular, using something like SweetSpotSimilarity tuned to know what
 values make sense for good content in your domain can be useful because
 it can actually penalize documents that are too short/long or have term
 freqs that are outside of a reasonable expected range.


I am not a Solr expert, but I was thinking in this direction. The ratio of
tokens/total_length would be nearer to 1 for a stuffed document, while it
would be nearer to 0 for a bogus document. Somewhere between the two lie the
documents that are more likely to be meaningful. I am not sure how to use
SweetSpotSimilarity. I am googling this, but any useful insights are
much appreciated.
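
For reference, a minimal sketch of the tf-capping idea mentioned above
(Lucene/Solr 3.x Similarity API; the class name and cap value are
illustrative assumptions, not recommendations):

  import org.apache.lucene.search.DefaultSimilarity;

  // Caps term frequency so that repeating a keyword beyond MAX_TF
  // adds nothing further to the score.
  public class CappedTfSimilarity extends DefaultSimilarity {
      private static final float MAX_TF = 5.0f; // hypothetical cap
      @Override
      public float tf(float freq) {
          return super.tf(Math.min(freq, MAX_TF)); // sqrt of the capped freq
      }
  }

Such a class would then be registered near the bottom of schema.xml with
<similarity class="com.example.CappedTfSimilarity"/>.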


Index time boosting with DIH

2011-07-28 Thread Bürkle, David
Can someone point me to an example of using index-time boosting with the 
DataImportHandler?



Re: Index time boosting with DIH

2011-07-28 Thread Shalin Shekhar Mangar
On Thu, Jul 28, 2011 at 3:56 PM, Bürkle, David david.buer...@irix.ch wrote:

 Can someone point me to an example of using index-time boosting with the
 DataImportHandler?


You can use the special flag variable $docBoost to add an index-time boost.

http://wiki.apache.org/solr/DataImportHandler#Special_Commands
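
For reference, a minimal data-config sketch (entity and column names are
assumptions) that sets $docBoost through a ScriptTransformer:

  <dataConfig>
    <script><![CDATA[
        function setBoost(row) {
            // hypothetical: boost each document by its popularity column
            row.put('$docBoost', row.get('popularity'));
            return row;
        }
    ]]></script>
    <document>
      <entity name="item" transformer="script:setBoost"
              query="SELECT id, name, popularity FROM item">
        ...
      </entity>
    </document>
  </dataConfig>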

-- 
Regards,
Shalin Shekhar Mangar.


Re: Dealing with keyword stuffing

2011-07-28 Thread Gora Mohanty
On Thu, Jul 28, 2011 at 3:48 PM, Pranav Prakash pra...@gmail.com wrote:
[...]
 I am not sure how to use SweetSpotSimilarity. I am googling on this, but
 any useful insights are so much appreciated.

Replace the existing DefaultSimilarity class in schema.xml (look towards
the bottom of the file) with the SweetSpotSimilarity class, e.g., have a line
like:
  <similarity class="org.apache.lucene.search.SweetSpotSimilarity"/>

Regards,
Gora


Reusing SolrServer instances when swapping cores

2011-07-28 Thread Michael Szalay
Hi all

We work with two cores (active and passive) and swap them when
reindexing is finished.

Is it allowed to reuse the same instance of the SolrServer (both Embedded and
Commons)?
I.e., do they point to the other core after the swap?

Regards Michael

-- 
Michael Szalay
Senior Software Engineer

basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business 



Re: please help explaining debug output

2011-07-28 Thread Erick Erickson
IDF is based on the frequency of the term in that field across the entire
index, not in the specific document.

So it means that the term is in that field for some document somewhere,
but not in that particular document I believe...

Which leads me to wonder if the document is getting indexed as you
expect, although there's nothing in the data that you've provided that
I can point to as the culprit, it all looks like it *should* work

If you can get a copy of Luke and look at the document in question
and/or look at the schema browser for that particular field it might
help, but frankly I'm at a loss to understand what the problem is...

Sorry I can't be of more help
Erick

On Tue, Jul 26, 2011 at 1:04 PM, Robert Petersen rober...@buy.com wrote:
 That didn't help.  Seems like another case where I should get matches but 
 don't and this time it is only for some documents.  Others with similar 
 content do match just fine.  The debug output 'explain other' section for a 
 non-matching document seems to say the term frequency is 0 for my problematic 
 term, although I know it is in the content.

 I ended up making a synonym to do what the analysis stack *should* be doing: 
 splitting LaserJet on case changes.  I.e., putting "LaserJet, laser jet" in 
 synonyms at index time makes this work.  I don't know why, though.
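
 For reference, that workaround amounts to an index-time synonym entry plus
 the stock synonym filter (file name and analyzer placement assumed):

   # synonyms.txt
   LaserJet, laser jet

   <!-- in the index-time analyzer of the moreWords field type -->
   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
           ignoreCase="true" expand="true"/>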

 Question:  Does this debug output mean it is matching the terms, but the term 
 frequency vector is returning 0 for the frequency of this term?  I.e., does 
 this mean the term is in the doc but not in the tf array?

 0.0 = no match on required clause (moreWords:"laser jet")

    0.0 = weight(moreWords:"laser jet" in 32497), product of:

      0.60590804 = queryWeight(moreWords:"laser jet"), product of:

        14.597603 = idf(moreWords: laser=26731 jet=12685)

        0.041507367 = queryNorm

      0.0 = fieldWeight(moreWords:"laser jet" in 32497), product of:

        0.0 = tf(phraseFreq=0.0)

        14.597603 = idf(moreWords: laser=26731 jet=12685)

        0.078125 = fieldNorm(field=moreWords, doc=32497)




 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Monday, July 25, 2011 3:28 PM
 To: solr-user@lucene.apache.org
 Subject: Re: please help explaining debug output

 Hmmm, I can't find a convenient 1.4.0 to download, but re-indexing is a good
 idea since this seems like it *should* work.

 Erick

 On Mon, Jul 25, 2011 at 5:32 PM, Robert Petersen rober...@buy.com wrote:
 I'm still on solr 1.4.0 and the analysis page looks like they should match, 
 and other products with the same content do in fact match.  I'm reindexing 
 the non-matching ones to rule that out.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Monday, July 25, 2011 1:58 PM
 To: solr-user@lucene.apache.org
 Subject: Re: please help explaining debug output

 Hmmm, I'm assuming that moreWords is your default text field, yes?

 But it works for me (tm), using 1.4.1. What version of Solr are you on?

 Also, take a glance at the admin/analysis page, that might help...

 Gotta run

 Erick

 On Mon, Jul 25, 2011 at 4:52 PM, Robert Petersen rober...@buy.com wrote:
 Sorry, to clarify: a search for "P1102W" matches all three docs, but a
 search for "p1102w LaserJet" only matches the second two.  Someone asked
 me a question while I was typing and I got distracted; apologies for any
 confusion.

 -Original Message-
 From: Robert Petersen [mailto:rober...@buy.com]
 Sent: Monday, July 25, 2011 1:42 PM
 To: solr-user@lucene.apache.org
 Subject: please help explaining debug output

 I have three documents with the following product titles in a text field
 called moreWords with analysis stack matching the solr example text
 field definition.



 1.       HP LaserJet P1102W Monochrome Laser Printer
 http://www.buy.com/prod/hp-laserjet-p1102w-monochrome-laser-printer/q/loc/101/213824965.html

 2.       HP CE285A (85A) Remanufactured Black Toner Cartridge for
 LaserJet M1212nf, P1102, P1102W Series
 http://www.buy.com/prod/hp-ce285a-85a-remanufactured-black-toner-cartridge-for-laserjet/q/loc/101/217145536.html

 3.       Black HP CE285A Toner Cartridge For LaserJet P1102W, LaserJet
 M1130, LaserJet M1132, LaserJet M1210
 http://www.buy.com/prod/black-hp-ce285a-toner-cartridge-for-laserjet-p1102w-laserjet-m1130/q/loc/101/222045267.html



 A search for P1102W matches (2) and (3), but not (1) above.  Can someone
 explain the debug output?  It looks like I am getting a non-match on (1)
 because the term frequency is zero?  Am I reading that right?  If so, how
 could that be?  The searched terms are equivalently in all three docs.  I
 don't get it.





 <lst name="debug">

 <str name="rawquerystring">p1102w LaserJet </str>

 <str name="querystring">p1102w LaserJet </str>

 <str name="parsedquery">+PhraseQuery(moreWords:"p 1102 w")
 +PhraseQuery(moreWords:"laser jet")</str>

 <str name="parsedquery_toString">+moreWords:"p 1102 w" +moreWords:"laser
 jet"</str>

 <lst name="explain">

 <str name="222045267">

 3.64852 = (MATCH) sum 

Re: Exact match not the first result returned

2011-07-28 Thread Brian Lamb
That's a clever idea. I'll put something together and see how it turns out.
Thanks for the tip.

On Wed, Jul 27, 2011 at 10:55 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : With your solution, RECORD 1 does appear at the top but I think that's just
 : blind luck more than anything else because RECORD 3 shows as having the same
 : score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd
 : like all three records returned with RECORD 1 being the first listing.

 with omitNorms RECORD1 and RECORD3 have the same score because only the
 tf() matters, and both docs contain the term "frank" exactly twice.

 the reason RECORD1 isn't scoring higher even though it (as you put it)
 matches 'Fred' exactly is that from a term perspective, RECORD1
 doesn't actually match myname:Fred exactly, because there are in fact
 other terms in that field, because it's multivalued.

 one way to indicate that you *only* want documents where entire field
 values match your input (ie: RECORD1 but no other records) would be to
 use a StrField instead of a TextField, or an analyzer that doesn't split up
 tokens (ie: something using KeywordTokenizer).  that way a query on
 myname:Frank would not match a document where you had indexed the value
 "Frank Stalone", but a query for myname:"Frank Stalone" would.

 in your case, you don't want *only* the exact field value matches, but you
 want them boosted, so you could do something like copyField myname into
 myname_str and then do...

  q=+myname:Frank myname_str:Frank^100

 ...in which case a match on myname is required, but a match on
 myname_str will greatly increase the score.

 dismax (and edismax) are really designed for situations like this...

  defType=dismax  qf=myname  pf=myname_str^100  q=Frank
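
 For reference, a minimal schema sketch of that copyField setup (the type
 and field names are assumptions):

   <fieldType name="text_exactish" class="solr.TextField">
     <analyzer>
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>

   <field name="myname_str" type="text_exactish" indexed="true"
          stored="false" multiValued="true"/>
   <copyField source="myname" dest="myname_str"/>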



 -Hoss



Possible to use quotes in dismax qf?

2011-07-28 Thread O. Klein
I want to do a dismax search for the original query plus this query as a
phrase query:

q=sail boat needs to be converted to the dismax query q=sail boat "sail boat"

qf=title^10 content^2

What is the best way to do this?



Re: slave data files way bigger than master

2011-07-28 Thread Erick Erickson
My utter and complete shot in the dark is that the slave isn't getting
its data from the master you think it is. I know it's a silly comment, but
I've chased my tail this way more than once <g>...

None of the files match. None of the dates match, etc.

I'm assuming that bouncing the slave doesn't make any of the
files go away. This shouldn't be necessary, but it might do to
test to see if somehow the slave Solr has the files open. They
*should* be removed when the slave reopens its searchers, but
we're looking for really odd things here...

Are you totally sure that the slave has attempted a replication? Take
a look in the logs to see if there are errors being reported for the
replication process. You can also issue the replication command
via HTTP, see: http://wiki.apache.org/solr/SolrReplication#HTTP_API
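
For example, forcing a pull on the slave (host and port assumed):

  http://slave-host:8983/solr/replication?command=fetchindex

and command=details reports the current replication state.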

This is all speculation, but something is massively not right with the
file lists you've posted

Best
Erick

On Tue, Jul 26, 2011 at 3:45 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 So I've got Solr 1.4.  I've got replication going on.

 Once a day, before replication, I optimize on master.  Then I replicate.

 I'd expect optimization before replicate would basically replace all files
 on slave, this is expected.

 But that means I'd also expect that the index files on slave would be
 identical, and the same size, as on master, after replication, this is the
 point of replication, yes?

 But they are not. The master is only 12G, the slave is 39G.  The index files
 in slave and master have completely different filenames too, I don't know if
 that's expected, but it's not what I expected.  I'll post complete file
 lists below.

 Anyone have any idea what's going on?  Also... I wonder if these extra index
 files on the slave are just extra, not even looked at by the slave Solr, or
 if instead they actually ARE included in the indexes!  If the latter, and we
 have 'ghost' documents in the index, that could explain some weird problems
 I'm having with the slave getting Java out of heap space errors due to huge
 uninverted indexes, even though the index is basically the same with the
 same solrconfig.xml settings as it has been for a while, without such
 problems.

 Greatly appreciate if anyone has any ideas.


 MASTER: ls -lh master_index

 total 12G
 -rw-rw-r-- 1 tomcat tomcat  3.0G Jul 26 06:37 _24p.fdt
 -rw-rw-r-- 1 tomcat tomcat   15M Jul 26 06:37 _24p.fdx
 -rw-rw-r-- 1 tomcat tomcat   836 Jul 26 06:33 _24p.fnm
 -rw-rw-r-- 1 tomcat tomcat  1.2G Jul 26 06:44 _24p.frq
 -rw-rw-r-- 1 tomcat tomcat   49M Jul 26 06:44 _24p.nrm
 -rw-rw-r-- 1 tomcat tomcat  1.1G Jul 26 06:44 _24p.prx
 -rw-rw-r-- 1 tomcat tomcat  7.8M Jul 26 06:44 _24p.tii
 -rw-rw-r-- 1 tomcat tomcat  660M Jul 26 06:44 _24p.tis
 -rw-rw-r-- 1 tomcat tomcat  2.1G Jul 26 08:54 _2k4.fdt
 -rw-rw-r-- 1 tomcat tomcat  7.6M Jul 26 08:54 _2k4.fdx
 -rw-rw-r-- 1 tomcat tomcat   836 Jul 26 08:51 _2k4.fnm
 -rw-rw-r-- 1 tomcat tomcat  719M Jul 26 08:59 _2k4.frq
 -rw-rw-r-- 1 tomcat tomcat   25M Jul 26 08:59 _2k4.nrm
 -rw-rw-r-- 1 tomcat tomcat  797M Jul 26 08:59 _2k4.prx
 -rw-rw-r-- 1 tomcat tomcat  5.0M Jul 26 08:59 _2k4.tii
 -rw-rw-r-- 1 tomcat tomcat  436M Jul 26 08:59 _2k4.tis
 -rw-rw-r-- 1 tomcat tomcat  211M Jul 26 09:25 _2n3.fdt
 -rw-rw-r-- 1 tomcat tomcat  774K Jul 26 09:25 _2n3.fdx
 -rw-rw-r-- 1 tomcat tomcat   836 Jul 26 09:25 _2n3.fnm
 -rw-rw-r-- 1 tomcat tomcat   72M Jul 26 09:26 _2n3.frq
 -rw-rw-r-- 1 tomcat tomcat  2.5M Jul 26 09:26 _2n3.nrm
 -rw-rw-r-- 1 tomcat tomcat   78M Jul 26 09:26 _2n3.prx
 -rw-rw-r-- 1 tomcat tomcat  668K Jul 26 09:26 _2n3.tii
 -rw-rw-r-- 1 tomcat tomcat   53M Jul 26 09:26 _2n3.tis
 -rw-rw-r-- 1 tomcat tomcat  186M Jul 26 09:49 _2q6.fdt
 -rw-rw-r-- 1 tomcat tomcat  774K Jul 26 09:49 _2q6.fdx
 -rw-rw-r-- 1 tomcat tomcat   836 Jul 26 09:49 _2q6.fnm
 -rw-rw-r-- 1 tomcat tomcat   60M Jul 26 09:50 _2q6.frq
 -rw-rw-r-- 1 tomcat tomcat  2.5M Jul 26 09:50 _2q6.nrm
 -rw-rw-r-- 1 tomcat tomcat   64M Jul 26 09:50 _2q6.prx
 -rw-rw-r-- 1 tomcat tomcat  562K Jul 26 09:50 _2q6.tii
 -rw-rw-r-- 1 tomcat tomcat   45M Jul 26 09:50 _2q6.tis
 -rw-rw-r-- 1 tomcat tomcat  246M Jul 26 10:16 _2t9.fdt
 -rw-rw-r-- 1 tomcat tomcat  774K Jul 26 10:16 _2t9.fdx
 -rw-rw-r-- 1 tomcat tomcat   836 Jul 26 10:16 _2t9.fnm
 -rw-rw-r-- 1 tomcat tomcat   68M Jul 26 10:17 _2t9.frq
 -rw-rw-r-- 1 tomcat tomcat  2.5M Jul 26 10:17 _2t9.nrm
 -rw-rw-r-- 1 tomcat tomcat   89M Jul 26 10:17 _2t9.prx
 -rw-rw-r-- 1 tomcat tomcat  602K Jul 26 10:17 _2t9.tii
 -rw-rw-r-- 1 tomcat tomcat   53M Jul 26 10:17 _2t9.tis
 -rw-rw-r-- 1 tomcat tomcat  221M Jul 26 10:45 _2wc.fdt
 -rw-rw-r-- 1 tomcat tomcat  774K Jul 26 10:45 _2wc.fdx
 -rw-rw-r-- 1 tomcat tomcat   836 Jul 26 10:45 _2wc.fnm
 -rw-rw-r-- 1 tomcat tomcat   69M Jul 26 10:46 _2wc.frq
 -rw-rw-r-- 1 tomcat tomcat  2.5M Jul 26 10:46 _2wc.nrm
 -rw-rw-r-- 1 tomcat tomcat   82M Jul 26 10:46 _2wc.prx
 -rw-rw-r-- 1 tomcat tomcat  613K Jul 26 10:46 _2wc.tii
 -rw-rw-r-- 1 tomcat tomcat   53M Jul 26 10:46 _2wc.tis
 -rw-rw-r-- 1 

Re: how to get solr core information using solrj

2011-07-28 Thread Jiang mingyuan
hi Stefan,


thanks for your advice; I wrote a JSP file to obtain that information,
which looks like:

  CoreContainer cores =
      (CoreContainer) request.getAttribute("org.apache.solr.CoreContainer");

then cores.getCores() gets the core information.

Later I translate the info to JSON format.

On the client side I use HttpClient to request this page, then parse the
JSON into Java objects.

Finally I got what I wanted.


I am not familiar with Solr, so I'm not sure whether there are other good
interfaces.

On http://wiki.apache.org/solr/CoreAdmin#STATUS,
I have not found a direct method to get information about core names, paths,
etc.

Many thanks for your advice.



On Wed, Jul 20, 2011 at 3:01 PM, Stefan Matheis 
matheis.ste...@googlemail.com wrote:

 Jiang,

 what about http://wiki.apache.org/solr/CoreAdmin#STATUS ?

 Regards
 Stefan

 On 20.07.2011 05:40, Jiang mingyuan wrote:

  hi all,

 Our Solr server contains two cores, core0 and core1, and they both work well.

 Now I'm trying to find a way to get information about core0 and core1.

 Can SolrJ or another API do this?


 thanks very much.




Re: how to get solr core information using solrj

2011-07-28 Thread Jiang mingyuan
Hi Erick,

On the page you showed me, I found some useful methods,
but it seems it does not contain methods for obtaining core names and core
paths.

So I followed the Solr index page's method and wrote a JSP page, like:

  CoreContainer cores =
      (CoreContainer) request.getAttribute("org.apache.solr.CoreContainer");

then cores.getCores() gets the core information.

Later I translate the info to JSON format.

On the client side I use HttpClient to request this page, then parse the
JSON into Java objects.

Finally I got what I wanted.

Thanks again.
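
For reference, a sketch of pulling core names and instance directories
through SolrJ's CoreAdmin STATUS support that the earlier replies pointed
to (SolrJ 3.x API; the server URL is an assumption):

  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;
  import org.apache.solr.client.solrj.response.CoreAdminResponse;
  import org.apache.solr.common.util.NamedList;

  CommonsHttpSolrServer server =
      new CommonsHttpSolrServer("http://localhost:8983/solr");
  // null = request the status of all cores
  CoreAdminResponse status = CoreAdminRequest.getStatus(null, server);
  NamedList<NamedList<Object>> cores = status.getCoreStatus();
  for (int i = 0; i < cores.size(); i++) {
      // core name and its instance directory
      System.out.println(cores.getName(i) + " -> "
          + cores.getVal(i).get("instanceDir"));
  }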

On Mon, Jul 25, 2011 at 9:40 PM, Erick Erickson erickerick...@gmail.com wrote:


 http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/CoreAdminRequest.html

 That should get you started.


 Best
 Erick

 On Tue, Jul 19, 2011 at 11:40 PM, Jiang mingyuan
 mailtojiangmingy...@gmail.com wrote:
  hi all,
 
  Our Solr server contains two cores, core0 and core1, and they both work well.

  Now I'm trying to find a way to get information about core0 and core1.

  Can SolrJ or another API do this?
 
 
  thanks very much.
 



Re: Solr DataImport with multiple DBs

2011-07-28 Thread Erick Erickson
Often, the easiest solution when DIH gets really complex is to do one of
two things:
1) Use SolrJ instead. You can often do complex things more easily than with
   DIH.
2) You could consider using a custom Transformer in conjunction with your
   primary delta query to access the second table; see:
   http://wiki.apache.org/solr/DIHCustomTransformer
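
For reference, a sketch of option 2 (the transformRow contract is from that
wiki page; the second-database lookup itself is hypothetical):

  package com.example;

  import java.util.Map;

  // DIH calls transformRow() once per row of the primary entity.
  public class KeywordLookupTransformer {
      public Object transformRow(Map<String, Object> row) {
          Object id = row.get("ID");
          // hypothetical: query the second database here, e.g.
          //   SELECT Keyword FROM keywords WHERE ID = ?
          // and add the result to the row:
          // row.put("keyword", keywordValue);
          return row;
      }
  }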


Best
Erick

On Tue, Jul 26, 2011 at 7:27 PM, spravin spravin.li...@gmail.com wrote:
 Hi All

 I am stuck with an issue with delta-import while configuring solr in an
 environment where multiple databases exist.

 My schema looks like this:
 id, name, keyword
 names exist in one DB and keywords in a table in the other DB (with id as
 foreign key).

 For delta import, I would need to check against the updated column in both
 tables. But they are in two different databases, so I can't do this in a
 single deltaQuery,
 so I'm not able to detect when a field in the second database has changed.

 The relevant part of my dataconfig xml looks like this:

 <dataConfig>
   <dataSource name="ds1" ... />
   <dataSource name="ds2" ... />
   <document>
     <entity name="name" dataSource="ds1"
             query="SELECT ID, Name, Updated FROM records"
             deltaImportQuery="SELECT ID, Name, Updated FROM records WHERE ID
                               = '${dataimporter.delta.ID}'"
             deltaQuery="SELECT ID FROM records WHERE Updated >
                         '${dataimporter.last_index_time}'">

             <entity name="keywords" dataSource="ds2"
                     query="SELECT Keyword, Updated AS KeywordUpdated FROM
                            keywords WHERE ID = '${name.ID}'">
             </entity>

     </entity>
   </document>
 </dataConfig>

 I'm hoping someone in this list could point me to a solution: a way to
 specify deltaQuery across multiple databases.

 (In the above example, I would like to add OR ID IN (SELECT ID FROM
 keywords WHERE Updated > '${dataimporter.last_index_time}') to the
 deltaQuery, but this table can be accessed only from a different dataSource.)

 Thanks
 - PS


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-DataImport-with-multiple-DBs-tp3201843p3201843.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: what data type for geo fields?

2011-07-28 Thread Peter Wolanin
Thanks for the feedback.  I'll have look more at how geohash works.

Looking at the sample schema more closely, I see:

 <fieldType name="double" class="solr.TrieDoubleField"
            precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

So in fact double is also Trie, but just with precisionStep 0 in the example.

-Peter

On Wed, Jul 27, 2011 at 9:57 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Wed, Jul 27, 2011 at 9:01 AM, Peter Wolanin peter.wola...@acquia.com 
 wrote:
 Looking at the example schema:

 http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml

 the solr.PointType field type uses double (is this just an example
 field, or used for geo search?)

 While you could possibly use PointType for geo search, it doesn't have
 good support for it (it's more of a general n-dimensional point).
 The LatLonType has all the geo support currently.

, while the solr.LatLonType field uses
 tdouble and it's unclear how the geohash is translated into lat/lon
 values or if the geohash itself might typically be used as a copyfield
 and use just for matching a query on a geohash?

 There's no geohash used in LatLonType.
 It is indexed as a lat and a lon under the covers (using the suffix _d).

 Is there an advantage in terms of speed to using Trie fields for
 solr.LatLonType?

 Currently only for explicit range queries... like point:[10,10 TO 20,20]

  I would assume so, e.g. for bbox operations.

 It's a bit of an implementation detail, but bbox doesn't currently use
 range queries.

 -Yonik
 http://www.lucidimagination.com




-- 
Peter M. Wolanin, Ph.D.      : Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com : 978-296-5247

Get a free, hosted Drupal 7 site: http://www.drupalgardens.com


Re: what data type for geo fields?

2011-07-28 Thread Yonik Seeley
On Thu, Jul 28, 2011 at 10:24 AM, Peter Wolanin
peter.wola...@acquia.com wrote:
 Thanks for the feedback.  I'll have look more at how geohash works.

 Looking at the sample schema more closely, I see:

  <fieldType name="double" class="solr.TrieDoubleField"
             precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

 So in fact double is also Trie, but just with precisionStep 0 in the 
 example.

Right, which means it's a normal numeric field with one token
indexed per value (i.e. no tradeoff to speed up range queries by
increasing index size).

-Yonik
http://www.lucidimagination.com


Re: Possible to use quotes in dismax qf?

2011-07-28 Thread Juan Grande
Hi,

You can use the pf parameter of the DismaxQParserPlugin:
http://wiki.apache.org/solr/DisMaxQParserPlugin#pf_.28Phrase_Fields.29

This parameter receives a list of fields using the same syntax as the qf
parameter. After determining the list of matching documents,
DismaxQParserPlugin will boost the docs where the terms of the query match
as a phrase in one of those fields. You can also use the ps parameter to set
a phrase slop and boost docs where the terms appear in close proximity
instead of as an exact phrase.
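
For example, a request along those lines (the boost values are assumptions):

  q=sail boat&defType=dismax&qf=title^10 content^2&pf=title^30 content^6&ps=2

Every document still has to match the plain terms via qf; pf then boosts the
ones where "sail boat" occurs as a (slop-2) phrase in title or content.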

Regards,

*Juan*



On Thu, Jul 28, 2011 at 11:00 AM, O. Klein kl...@octoweb.nl wrote:

 I want to do a dismax search for the original query plus this query as a
 phrase query:

 q=sail boat needs to be converted to the dismax query q=sail boat "sail boat"

 qf=title^10 content^2

 What is the best way to do this?




RE: Solr DataImport with multiple DBs

2011-07-28 Thread Dyer, James
Would it be possible to just run two separate deltas, one that updates records 
that changed in ds1 and another that updates records that changed in ds2?  Of 
course this would be inefficient if a lot of records typically change in both 
places at the same time.

With this approach, you might have to run the deltas using command=full-import 
with clean=false, as shown here: 
http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta
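
For example (host, port, and the root entity names are assumptions):

  http://localhost:8983/solr/dataimport?command=full-import&clean=false&entity=changed_in_ds1
  http://localhost:8983/solr/dataimport?command=full-import&clean=false&entity=changed_in_ds2

where each root entity selects only the rows changed in its own database
since ${dataimporter.last_index_time}.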

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, July 28, 2011 9:14 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr DataImport with multiple DBs

Often, the easiest solution when DIH gets really complex is to do one of
two things:
1) Use SolrJ instead. You can often do complex things more easily than with
   DIH.
2) You could consider using a custom Transformer in conjunction with your
   primary delta query to access the second table; see:
   http://wiki.apache.org/solr/DIHCustomTransformer


Best
Erick

On Tue, Jul 26, 2011 at 7:27 PM, spravin spravin.li...@gmail.com wrote:
 Hi All

 I am stuck with an issue with delta-import while configuring solr in an
 environment where multiple databases exist.

 My schema looks like this:
 id, name, keyword
 names exist in one DB and keywords in a table in the other DB (with id as
 foreign key).

 For delta import, I would need to check against the updated column in both
 tables. But they are in two different databases, so I can't do this in a
 single deltaQuery,
 so I'm not able to detect when a field in the second database has changed.

 The relevant part of my dataconfig xml looks like this:

 <dataConfig>
   <dataSource name="ds1" ... />
   <dataSource name="ds2" ... />
   <document>
     <entity name="name" dataSource="ds1"
             query="SELECT ID, Name, Updated FROM records"
             deltaImportQuery="SELECT ID, Name, Updated FROM records WHERE ID
                               = '${dataimporter.delta.ID}'"
             deltaQuery="SELECT ID FROM records WHERE Updated >
                         '${dataimporter.last_index_time}'">

             <entity name="keywords" dataSource="ds2"
                     query="SELECT Keyword, Updated AS KeywordUpdated FROM
                            keywords WHERE ID = '${name.ID}'">
             </entity>

     </entity>
   </document>
 </dataConfig>

 I'm hoping someone in this list could point me to a solution: a way to
 specify deltaQuery across multiple databases.

 (In the above example, I would like to add OR ID IN (SELECT ID FROM
 keywords WHERE Updated > '${dataimporter.last_index_time}') to the
 deltaQuery, but this table can be accessed only from a different dataSource.)

 Thanks
 - PS




Re: colocated term stats

2011-07-28 Thread Jonathan Rochkind

Not sure if this will do what you want, but one way might be using facets.

Take the term you are interested in, and apply it as an fq.  Now the 
result set will include only documents that include that term.  So also 
request facets for that result set; the top 10 facets are the top 10 
terms that appear in that result set -- which is the top 10 terms that 
appear in documents together with your fq constraint. (Okay, you might 
need to look at 11, because one of the facet values will be the same 
term you fq constrained.) You don't need to look at actual documents at 
all (rows=0), just the facet response.
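
For example (the field name is an assumption):

  q=*:*&fq=text:solr&rows=0&facet=true&facet.field=text&facet.limit=11

The facet counts on the text field are then document counts of terms
co-occurring with "solr".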


Make sense? Does that do what you want?

On 7/27/2011 9:12 PM, Twomey, David wrote:

Given a query term, is it possible to get from the index the top 10 collocated
terms in the index?

I.e.: return the top 10 terms that appear with this term, based on doc count.

A plus would be to add some constraints on how near the terms are in the docs.






Re: Exact match not the first result returned

2011-07-28 Thread Jonathan Rochkind
Keep in mind that if you use a field type whose tokens can include spaces 
(e.g. StrField, or KeywordTokenizer), then if you're using the dismax or 
lucene query parsers, the only way to find matches in this field on queries 
that include spaces will be to do explicit phrase searches with double 
quotes.


These fields will, however, work fine with pf in dismax/edismax as per 
Hoss's example.


But yeah, I do what Hoss recommends -- I've got a KeywordTokenizer copy 
of my searchable field. I use a pf on that field with a very high boost 
to try and boost truly complete matches, ones that match the entirety of 
the value.  It's not exactly 'exact': I still do some normalization, 
including flattening unicode to ascii, and normalizing one or more 
whitespace-or-punctuation characters to exactly one space using a char regex 
filter.
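
For reference, a sketch of such a normalizing exact-ish field type (the
names are assumptions and the regex is only illustrative):

  <fieldType name="text_exact_norm" class="solr.TextField">
    <analyzer>
      <!-- collapse runs of whitespace/punctuation to a single space -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="[\s\p{Punct}]+" replacement=" "/>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- flatten accented characters to plain ASCII -->
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>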


It seems to pretty much work -- this is just one of various relevancy 
tweaks I've got going on, to the extent that my relevancy has become 
pretty complicated and hard to predict and doesn't always do what I'd 
expect/intend, but this particular aspect seems to mostly pretty much work.


On 7/27/2011 10:55 PM, Chris Hostetter wrote:

: With your solution, RECORD 1 does appear at the top but I think that's just
: blind luck more than anything else because RECORD 3 shows as having the same
: score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd
: like all three records returned with RECORD 1 being the first listing.

with omitNorms RECORD1 and RECORD3 have the same score because only the
tf() matters, and both docs contain the term "frank" exactly twice.

the reason RECORD1 isn't scoring higher even though it (as you put it)
matches 'Fred' exactly is that from a term perspective, RECORD1
doesn't actually match myname:Fred exactly, because there are in fact
other terms in that field, because it's multivalued.

one way to indicate that you *only* want documents where entire field
values match your input (ie: RECORD1 but no other records) would be to
use a StrField instead of a TextField, or an analyzer that doesn't split up
tokens (ie: something using KeywordTokenizer).  that way a query on
myname:Frank would not match a document where you had indexed the value
"Frank Stalone", but a query for myname:"Frank Stalone" would.

in your case, you don't want *only* the exact field value matches, but you
want them boosted, so you could do something like copyField myname into
myname_str and then do...

   q=+myname:Frank myname_str:Frank^100

...in which case a match on myname is required, but a match on
myname_str will greatly increase the score.

dismax (and edismax) are really designed for situations like this...

   defType=dismax  qf=myname  pf=myname_str^100  q=Frank



-Hoss



Re: Possible to use quotes in dismax qf?

2011-07-28 Thread Jonathan Rochkind
It's not clear to me why you would try to do that; I'm not sure it makes 
a lot of sense.


You want to find all documents that have "sail boat" as a phrase AND 
have "sail" somewhere in them AND have "boat" somewhere in them?  That's 
exactly the same as just all documents that have "sail boat" as a phrase 
-- such documents will necessarily include "sail" and "boat", right?  So 
why not just ask for q="sail boat"?


What are you actually trying to do?

Maybe dismax 'pf', which boosts the relevancy of documents that have your 
input as a phrase, is what you really want?  Then you'd just search for 
q=sail boat, and documents that included "sail boat" as a phrase 
would be boosted, at the boost you specify.


On 7/28/2011 10:00 AM, O. Klein wrote:

I want to do a dismax search for the original query plus this query as a
phrase query:

q=sail boat needs to be converted to the dismax query q=sail boat "sail boat"

qf=title^10 content^2

What is the best way to do this?




about the Solr request filter

2011-07-28 Thread 于浩
Hello,Dear friends,
  I have got an problem in developing with solr.
  In My Application ,It must sends multiple query to solr server after the page 
is loaded. Then I found a problem: some request will return statusCode:0 and 
QTime:0, The solr has accepted the request, but It does not return a result 
document.  If I send each request  one by one manually ,It will return the 
result. But If I send the request frequently in a very  short times, It will 
return nothing only statusCode:0 and QTime:0.
 I think this may be a stratege for solr. but i can't find any documents or 
discussions on the internet. 
 so i want you can help me.
  
  --
 Surely, 你永远是最棒的!

Re: Store complete XML record (DIH XPathEntityProcessor)

2011-07-28 Thread Chantal Ackermann

Hi g,

have a look at the PlainTextEntityProcessor:
http://wiki.apache.org/solr/DataImportHandler#PlainTextEntityProcessor

you will have to call the URL twice that way, but I don't think you can
get the complete document (the root element with all structure) via
xpath - so the XPathEntityProcessor cannot help you.

If calling the URL twice slows your indexer down in unacceptable ways
you can always subclass XPathEntityProcessor (knowing Java is helpful,
though...). There surely is a way to make it return what you need. Or
maybe an entity processor that caches the content and uses XPath EP and
PlainText EP to accomplish your needs (not sure whether the API allows
for that).



Cheers,
Chantal



On Thu, 2011-07-28 at 05:53 +0200, solruser@9913 wrote:
 I am trying to use DIH to import an XML based file with multiple XML records
 in it.  Each record corresponds to one document in Lucene.  I am using the
 DIH FileListEntityProcessor (to get file list) followed by the
 XPathEntityProcessor to create the entities.  
 
 It works perfectly and I am able to map XML elements to fields . however
 I also need to store the entire XML record as separate 'full text' field. 
 Is there any way the XPathEntityProcessor provides a variable like 'rawLine'
 or 'plainText' that I can map to a field.  
 
 I tried to use the Plain Text processor after this  - but that does not
 recognize the XML boundaries and just gives the whole XML file.
 
 
 <entity name="x" rootEntity="true" dataSource="logfilereader"
         processor="XPathEntityProcessor"
         url="${logfile.fileAbsolutePath}" stream="false"
         forEach="/xml/myrecord"
         transformer="...">
   <field column="mycol1"
          xpath="/xml/myrecord/@something"
          />

 and so on ...
 This works perfectly.  However I also need something like ...

   <field column="fullxmlrecord" name="plainText" />
 
 Any help is much appreciated. I am a newbie and may be missing something
 obvious here
 
 -g
 
 
 



Exception in thread "main" org.apache.solr.common.SolrException: No such core: core1

2011-07-28 Thread automata
Hi 

I am very new to Solr, in fact I just started today, so forgive my lack of
knowledge on the subject.

Everything went fine until the point where I started to get the exception
Exception in thread "main" org.apache.solr.common.SolrException: No such
core: core1, and I have been stuck at the same point for a couple of hours now.

*Below is the test code :*

public class UtilSolR {

  private static EmbeddedSolrServer embeddedSolrServer = null;

  private static SolrServer httpSolrServer = null;

  /**
   * @param args
   */
  public static void main(String[] args) {
    //SolrServer server = getHttpSolRServer();
    SolrServer server = getEmbeddedSolRServer();

    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("tenant_id", "tenant_id", 1.0f);
    doc1.addField("displayas", "displayas", 1.0f);
    doc1.addField("btel", "btel", 1.0f);
    doc1.addField("htel", "htel", 1.0f);

    //SolrInputDocument doc2 = new SolrInputDocument();
    //doc2.addField( "id", "id2", 1.0f );
    //doc2.addField( "name", "doc2", 1.0f );
    //doc2.addField( "price", 20 );

    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    docs.add( doc1 );
    //docs.add( doc2 );

    try {
      server.add( docs );

      server.commit();
    } catch (SolrServerException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    } catch (IOException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
    System.out.println("Done !!");
  }

  public static EmbeddedSolrServer getEmbeddedSolRServer(){
    if(embeddedSolrServer == null){
      CoreContainer coreContainer;
      System.setProperty("solr.solr.home",
          "/home/automata/solr/apache-solr-3.3.0/example/solr/");
      CoreContainer.Initializer initializer = new
          CoreContainer.Initializer();
      try {
        coreContainer = initializer.initialize();
        embeddedSolrServer = new EmbeddedSolrServer(coreContainer, "");
      } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
      } catch (ParserConfigurationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
      } catch (SAXException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
      }
    }
    return embeddedSolrServer;
  }

}



*The solr.xml file is as follows :*

<solr persistent="false">

  <cores adminPath="/admin/cores" defaultCoreName="collection1">

    <core name="collection1" instanceDir="." />

    <core name="core1" instanceDir="core1" />

  </cores>

</solr>


The structure of the example folder is standard (just as supplied by
Apache) and no change has been made to it.


The Solr admin interface doesn't mention any core names there, but it does
not throw a 404 on opening the admin page.

Any help resolving the problem would be really great. Please let me know if
I can provide any more information.

thanks !






Re: Possible to use quotes in dismax qf?

2011-07-28 Thread O. Klein
I removed the post as it might confuse people.

But because my analyzers combine two words into one token (using shingles
and a position filter), and because of the usage of dismax, I need q to be
the original query plus the original query as a phrase query. That way the
combined words are also highlighted and I get the results I need.

qf is not the place to do this, it seems. Is there any way to do this in Solr?



question about exception in faceting

2011-07-28 Thread Koji Sekiguchi
If I get an exception during faceting (e.g. an undefined field), Solr doesn't
return HTTP 400 but 200, with the exception stack trace in an
<arr name="exception">...</arr> tag. Why is it implemented so? I checked
Solr 1.1 and saw the same behavior.

Unlike FacetComponent, HighlightComponent for example: if I use a bad regex
pattern for RegexFragmenter, HighlightComponent throws an exception and then
Solr returns 400.

Thank you!

koji
-- 
Check out Query Log Visualizer
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Re: Store complete XML record (DIH XPathEntityProcessor)

2011-07-28 Thread solruser@9913
Thanks Chantal
I am OK with the second call and I already tried using that.  Unfortunately
it reads the whole file into a field.  My file is like the example below:
<xml>
  <record>
  ...
  </record>

  <record>
  ...
  </record>

  <record>
  ...
  </record>

</xml>

Now the XPath processor does the 'for each /record' part.  For each record I
also need to store the raw log in there.  If I use the
PlainTextEntityProcessor then it gives me the whole file (from <xml> to
</xml>) and not each <record>...</record>.

Am I using the PlainTextEntityProcessor wrong?

THanks
g




ShingleFilterFactory class error

2011-07-28 Thread Pradeep Pujari
Hi,

I am trying to create shingles with minShingleSize = 10, but it also returns
bi-grams. Here is my schema definition:

<filter class="solr.ShingleFilterFactory" minShingleSize="10"
        maxShingleSize="25"
        outputUnigrams="false" outputUnigramsIfNoShingles="false"
        tokenSeparator=" "/>


For the input string "Apple - iPad 3G Wi-Fi - 32GB", it breaks into
"Apple -"
"- iPad"

My understanding is that it should be a 10-gram token.

Is this a bug, or does some configuration need to be added?

Thank you in advance.
Pradeep


RE: ShingleFilterFactory class error

2011-07-28 Thread Steven A Rowe
Pradeep,

As indicated on the wiki 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory,
 the minShingleSize option is not available in Solr versions prior to 3.1.

What version of Solr are you using?

(By the way, I am only replying on solr-user@lucene.apache.org mailing list - 
the d...@lucene.apache.org mailing list is for the development of Lucene/Solr, 
not for questions about using the products; please ask first on 
solr-user@lucene.apache.org, if you think you have found a bug.  If you don't 
get an answer in a day or two, then it makes sense to escalate to 
d...@lucene.apache.org.)

Steve


 -Original Message-
 From: Pradeep Pujari [mailto:prade...@rocketmail.com]
 Sent: Thursday, July 28, 2011 1:43 PM
 To: solr-user@lucene.apache.org
 Subject: ShingleFilterFactory class error
 
 Hi,
 
 I am trying to create shingles with minShingleSize = 10, but it also
 returns bi-grams too. Heres is my schema defn
 
 <filter class="solr.ShingleFilterFactory" minShingleSize="10"
         maxShingleSize="25"
         outputUnigrams="false" outputUnigramsIfNoShingles="false"
         tokenSeparator=" "/>
 
 
 For the input string "Apple - iPad 3G Wi-Fi - 32GB", it breaks into
 "Apple -"
 "- iPad"

 My understanding is that it should be a 10-gram token.

 Is this a bug, or does some configuration need to be added?
 
 Thank you in advance.
 Pradeep


field with repeated data in index

2011-07-28 Thread Mark juszczec
Hello all

I created an index consisting of orders and the names of the salesmen who
are responsible for the order.

As you can imagine, the same name can be associated with many different
orders.

No problem.  Until I try to do a faceted search on the salesman name field.
 Right now, I have the data indexed as follows:

<field name="PRIMARY_AC" type="string" indexed="false" stored="true"
       required="true" default="PRIMARY_AC unavailable"/>

My faceted search gives me the following response:

response={responseHeader={status=0,QTime=358,params={facet=on,indent=true,q=*:*,facet.field=PRIMARY_AC,wt=javabin,rows=0,version=2}},response={numFound=954178,start=0,docs=[]},facet_counts={facet_queries={},facet_fields={PRIMARY_AC={}},facet_dates={},facet_ranges={}}}

Which just isn't right.  I KNOW there's data in there, but am confused as to
how to properly identify it to Solr.

Any suggestions?

Mark


RE: field with repeated data in index

2011-07-28 Thread Dyer, James
You need to index the field you want to facet on.
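
That is, a sketch of the corrected definition (only the indexed attribute
changes):

  <field name="PRIMARY_AC" type="string" indexed="true" stored="true"
         required="true" default="PRIMARY_AC unavailable"/>

Existing documents must be reindexed before the facet counts appear.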

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Mark juszczec [mailto:mark.juszc...@gmail.com] 
Sent: Thursday, July 28, 2011 3:50 PM
To: solr-user@lucene.apache.org
Subject: field with repeated data in index

Hello all

I created an index consisting of orders and the names of the salesmen who
are responsible for the order.

As you can imagine, the same name can be associated with many different
orders.

No problem.  Until I try to do a faceted search on the salesman name field.
 Right now, I have the data indexed as follows:

<field name="PRIMARY_AC" type="string" indexed="false" stored="true"
       required="true" default="PRIMARY_AC unavailable"/>

My faceted search gives me the following response:

response={responseHeader={status=0,QTime=358,params={facet=on,indent=true,q=*:*,facet.field=PRIMARY_AC,wt=javabin,rows=0,version=2}},response={numFound=954178,start=0,docs=[]},facet_counts={facet_queries={},facet_fields={PRIMARY_AC={}},facet_dates={},facet_ranges={}}}

Which just isn't right.  I KNOW there's data in there, but am confused as to
how to properly identify it to Solr.

Any suggestions?

Mark


Re: field with repeated data in index

2011-07-28 Thread Mark juszczec
James

Wow.  That was fast.  Thanks!

But I thought you couldn't index a field that has duplicate values?

Mark


On Thu, Jul 28, 2011 at 4:53 PM, Dyer, James james.d...@ingrambook.com wrote:

 You need to index the field you want to facet on.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Mark juszczec [mailto:mark.juszc...@gmail.com]
 Sent: Thursday, July 28, 2011 3:50 PM
 To: solr-user@lucene.apache.org
 Subject: field with repeated data in index

 Hello all

 I created an index consisting of orders and the names of the salesmen who
 are responsible for the order.

 As you can imagine, the same name can be associated with many different
 orders.

 No problem.  Until I try to do a faceted search on the salesman name field.
  Right now, I have the data indexed as follows:

 <field name="PRIMARY_AC" type="string" indexed="false" stored="true"
        required="true" default="PRIMARY_AC unavailable"/>

 My faceted search gives me the following response:


 response={responseHeader={status=0,QTime=358,params={facet=on,indent=true,q=*:*,facet.field=PRIMARY_AC,wt=javabin,rows=0,version=2}},response={numFound=954178,start=0,docs=[]},facet_counts={facet_queries={},facet_fields={PRIMARY_AC={}},facet_dates={},facet_ranges={}}}

 Which just isn't right.  I KNOW there's data in there, but am confused as
 to
 how to properly identify it to Solr.

 Any suggestions?

 Mark



RE: field with repeated data in index

2011-07-28 Thread Dyer, James
I'm not sure what you're getting at when you mention duplicate values, but 
pretty much any way I interpret it, it's allowed.  The only case where it 
wouldn't be is if the field is your primary key and you try to index a second 
document with the same key as an existing document.  In that case the second 
document will replace the first.

It might save you some time in the long run, if you haven't already, to go 
through the step-by-step tutorial at 
http://lucene.apache.org/solr/tutorial.html .  There are also links there for 
the Solr Book and the Lucid reference guide.  These are both excellent 
detailed tutorials and should help you get up to speed pretty fast.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Mark juszczec [mailto:mark.juszc...@gmail.com] 
Sent: Thursday, July 28, 2011 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: field with repeated data in index

James

Wow.  That was fast.  Thanks!

But I thought you couldn't index a field that has duplicate values?

Mark


On Thu, Jul 28, 2011 at 4:53 PM, Dyer, James james.d...@ingrambook.com wrote:

 You need to index the field you want to facet on.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Mark juszczec [mailto:mark.juszc...@gmail.com]
 Sent: Thursday, July 28, 2011 3:50 PM
 To: solr-user@lucene.apache.org
 Subject: field with repeated data in index

 Hello all

 I created an index consisting of orders and the names of the salesmen who
 are responsible for the order.

 As you can imagine, the same name can be associated with many different
 orders.

 No problem.  Until I try to do a faceted search on the salesman name field.
  Right now, I have the data indexed as follows:

 <field name="PRIMARY_AC" type="string" indexed="false" stored="true"
        required="true" default="PRIMARY_AC unavailable"/>

 My faceted search gives me the following response:


 response={responseHeader={status=0,QTime=358,params={facet=on,indent=true,q=*:*,facet.field=PRIMARY_AC,wt=javabin,rows=0,version=2}},response={numFound=954178,start=0,docs=[]},facet_counts={facet_queries={},facet_fields={PRIMARY_AC={}},facet_dates={},facet_ranges={}}}

 Which just isn't right.  I KNOW there's data in there, but am confused as
 to
 how to properly identify it to Solr.

 Any suggestions?

 Mark



[WARNING] Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7

2011-07-28 Thread Uwe Schindler
Hello Apache Lucene & Apache Solr users,
Hello users of other Java-based Apache projects,

Oracle released Java 7 today. Unfortunately it contains hotspot compiler
optimizations, which miscompile some loops. This can affect code of several
Apache projects. Sometimes JVMs only crash, but in several cases, results
calculated can be incorrect, leading to bugs in applications (see Hotspot
bugs 7070134 [1], 7044738 [2], 7068051 [3]).

Apache Lucene Core and Apache Solr are two Apache projects, which are
affected by these bugs, namely all versions released until today. Solr users
with the default configuration will have Java crashing with SIGSEGV as soon
as they start to index documents, as one affected part is the well-known
Porter stemmer (see LUCENE-3335 [4]). Other loops in Lucene may be
miscompiled, too, leading to index corruption (especially on Lucene trunk
with pulsing codec; other loops may be affected, too - LUCENE-3346 [5]).

These problems were detected only 5 days before the official Java 7 release,
so Oracle had no time to fix those bugs, affecting also many more
applications. In response to our questions, they proposed to include the
fixes into service release u2 (eventually into service release u1, see [6]).
This means you cannot use Apache Lucene/Solr with Java 7 releases before
Update 2! If you do, please don't open bug reports; it is not the
committers' fault! At least disable loop optimizations using the
-XX:-UseLoopPredicate JVM option so as not to risk index corruption.
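
For example, for the Jetty-based Solr example distribution (a sketch; your
launcher may differ):

  java -XX:-UseLoopPredicate -jar start.jar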

Please note: Also Java 6 users are affected, if they use one of those JVM
options, which are not enabled by default: -XX:+OptimizeStringConcat or
-XX:+AggressiveOpts

It is strongly recommended not to use any hotspot optimization switches in
any Java version without extensive testing!

In case you upgrade to Java 7, remember that you may have to reindex, as the
unicode version shipped with Java 7 changed and tokenization behaves
differently (e.g. lowercasing). For more information, read
JRE_VERSION_MIGRATION.txt in your distribution package!

On behalf of the Lucene project,
Uwe

[1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134
[2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738
[3] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051
[4] https://issues.apache.org/jira/browse/LUCENE-3335
[5] https://issues.apache.org/jira/browse/LUCENE-3346
[6] http://s.apache.org/StQ

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




solr.TrieFloatField with multiValued=false treated as `UnInverted multi-valued field`

2011-07-28 Thread RaVbaker
Hi!

I have a problem coding my own SearchComponent. My schema.xml contains:

...
<fieldType name="decimal" class="solr.TrieFloatField"
           precisionStep="2" omitNorms="true" positionIncrementGap="0" />
...
<field name="price_min" type="decimal" indexed="true"
       stored="true" multiValued="false" />
...

When I use this value in my code, it leaves this in the log:

Jul 28, 2011 4:29:04 PM org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field
{field=price_min,memSize=13758712,tindexSize=28852,time=2407,phase1=2398,nTerms=184366,bigTerms=0,termInstances=3248049,uses=0}

So it suggests that `price_min` is multiValued, but it isn't, and I'm confused.
In the code these values are also false:

SchemaField sf = searcher.getSchema().getField(field);
 FieldType ft = sf.getType();

sf.multiValued() || ft.multiValuedFieldCache() // is false

I also don't understand why UnInvertedField is used for this field. Could
anybody explain it to me? I'm confused when I try:

FieldCache.StringIndex si =
FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName);
String termText = si.lookup[si.order[docID]];

And here `termText` is every time equal to '~', and si.order[docID] is at the
end of the array.
It would be great to hear any helpful ideas.
-- 
Rafał RaVbaker Piekarski.


Index

2011-07-28 Thread GAURAV PAREEK
Hi All,

How we can check the particular file is not INDEX in solr ?

Regards,
Gaurav


Re: Index

2011-07-28 Thread Jonathan Rochkind
I have no idea what you mean. A file on your disk? What does INDEX in 
solr mean?   Be more specific and clear, perhaps provide an example,  
and maybe someone can help you.


On 7/28/2011 5:45 PM, GAURAV PAREEK wrote:

Hi All,

How we can check the particular file is not INDEX in solr ?

Regards,
Gaurav



Re: Index

2011-07-28 Thread Nicholas Chase
Do you mean, how can you check whether it has been indexed by solr, and 
is searchable?


  Nick

On 7/28/2011 5:45 PM, GAURAV PAREEK wrote:

Hi All,

How we can check the particular file is not INDEX in solr ?

Regards,
Gaurav



Re: Index

2011-07-28 Thread GAURAV PAREEK
Yes Nick, you are correct.
How can you check whether it has been indexed by Solr and is searchable?

On Fri, Jul 29, 2011 at 3:27 AM, Nicholas Chase nch...@earthlink.net wrote:

 Do you mean, how can you check whether it has been indexed by solr, and is
 searchable?

   Nick


 On 7/28/2011 5:45 PM, GAURAV PAREEK wrote:

 Hi All,

  How we can check the particular file is not INDEX in solr ?

 Regards,
 Gaurav




Re: question about exception in faceting

2011-07-28 Thread Koji Sekiguchi
Correction:

 Unlike FacetComponent, HighlightComponent for example: if I use a bad regex 
 pattern
 for RegexFragmenter, HighlightComponent throws an exception and then Solr 
 returns 400.

Solr returns 500 in this case actually. I think it should be 400 (bad request).

koji
-- 
Check out Query Log Visualizer
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


(11/07/29 1:18), Koji Sekiguchi wrote:
 If I get an exception during faceting (e.g. an undefined field), Solr doesn't
 return HTTP 400 but 200, with the exception stack trace in an
 <arr name="exception">...</arr> tag. Why is it implemented so? I checked
 Solr 1.1 and saw the same behavior.

 Unlike FacetComponent, HighlightComponent for example: if I use a bad regex
 pattern for RegexFragmenter, HighlightComponent throws an exception and then
 Solr returns 400.
 
 Thank you!
 
 koji




Re: question about exception in faceting

2011-07-28 Thread Chris Hostetter

: If I get an exception during faceting (e.g. an undefined field), Solr doesn't
: return HTTP 400 but 200, with the exception stack trace in an
: <arr name="exception">...</arr> tag. Why is it implemented so? I checked
: Solr 1.1 and saw the same behavior.

super historic, pre-Apache code ... the idea at the time was that some 
parts of the response (like faceting, highlighting, whatever...) would be 
optional, and if there was an error computing that data it wouldn't fail 
the main request.

that logic should really be ripped out.


-Hoss


Re: question about exception in faceting

2011-07-28 Thread Koji Sekiguchi

(11/07/29 8:52), Chris Hostetter wrote:


: If I get an exception during faceting (e.g. an undefined field), Solr doesn't
: return HTTP 400 but 200, with the exception stack trace in an
: <arr name="exception">...</arr> tag. Why is it implemented so? I checked
: Solr 1.1 and saw the same behavior.

super historic, pre-Apache code ... the idea at the time was that some
parts of the response (like faceting, highlighting, whatever...) would be
optional, and if there was an error computing that data it wouldn't fail
the main request.

that logic should really be ripped out.


Thank you for the response; it's what I expected! I opened:

https://issues.apache.org/jira/browse/SOLR-2682

koji
--
Check out Query Log Visualizer
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/