Re: Separate ACL and document index

2011-11-23 Thread Floyd Wu
I've been reading a lot about Document Level Security:
https://issues.apache.org/jira/browse/SOLR-1895
https://issues.apache.org/jira/browse/SOLR-1872
https://issues.apache.org/jira/browse/SOLR-1834

But I'm not fully sure these patches solve my problem.
It seems that changing a document's ACL requires re-building
the index with the document content.

It makes no sense to rebuild when I only change the ACL.

Any ideas? Or am I just misunderstanding these patches?

Floyd



2011/11/23 Floyd Wu floyd...@gmail.com:
 Hi there,

 Is it possible to separate the ACL index from the document index and still
 search by user role in Solr?

 Currently my implementation indexes the ACL together with the document, but
 the document's ACL changes frequently. I have to rebuild the index every
 time an ACL changes. That's heavy on the whole system, because there are so
 many documents and their content is huge.

 Do you guys have any solution to this problem? I've been reading the
 mailing list for a while, and there seems to be no suitable solution for me.

 I want a user's search to return only the results he is allowed to see
 according to his role, but I don't want to re-index a document every time
 its ACL changes.

 Is it possible to perform a database-like join to achieve this? If so, how?

 Thanks

 Floyd
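
One direction worth noting (hedged, since it needs the join query parser on
Solr trunk/4.0, SOLR-2272, rather than the patches above): keep each ACL as
its own tiny document and join at query time, so an ACL change only
re-indexes the small ACL document. The field names below are made up for
illustration:

  # content docs: id=D1, body=...          (large, rarely re-indexed)
  # ACL docs:     acl_for=D1, role=editor  (tiny, re-indexed on ACL change)
  # return content docs whose ACL doc grants the user's role:
  q=body:contract
  fq={!join from=acl_for to=id}role:editor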



Re: date range in solr 3.1

2011-11-23 Thread do3do3
It works now.
Great, thanks to you :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3530038.html
Sent from the Solr - User mailing list archive at Nabble.com.


Sort question

2011-11-23 Thread vraa
Hi

I have a query where I sort by a price column. This field can contain the
following values:

10
75000
15
1
225000
50
40

I want to sort these values so that anything between 0 and 100 always comes
last.

E.g. sorting by price asc should look like this:
75000
10
15
225000
1
40
50

Is this possible?
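
One hedged possibility, assuming Solr 3.1+ (where sorting by a function
query is supported) and the five-argument map() function: map the 0-100
range to a flag and sort on the flag first, then on price:

  sort=map(price,0,100,1,0) asc, price asc

map(price,0,100,1,0) returns 1 for prices in [0,100] and 0 for everything
else, so the over-100 prices come first and each group stays ascending.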

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sort-question-tp3530070p3530070.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collection Distribution vs Replication in Solr

2011-11-23 Thread Andre Bois-Crettez

Indeed, I cannot see any of the 3 images here:
http://wiki.apache.org/solr/SolrReplication#Admin_Page_for_Replication
It just displays the name of the image file, as the img URLs seem to point
to a login-only link such as this one:
http://wiki.apache.org/solr/SolrReplication?action=AttachFile&do=get&target=replication.png

Is that an oversight, or by design to force people to log into the wiki?

André Bois-Crettez



Alireza Salimi wrote:

I can't see those benchmarks, can you?

On Thu, Oct 27, 2011 at 5:20 PM, Marc Sturlese marc.sturl...@gmail.com wrote:



Replication is easier to manage and a bit faster. See the performance
numbers: http://wiki.apache.org/solr/SolrReplication

--
View this message in context:
http://lucene.472066.n3.nabble.com/Collection-Distribution-vs-Replication-in-Solr-tp3458724p3459178.html
Sent from the Solr - User mailing list archive at Nabble.com.




Search on multiple fields is not working

2011-11-23 Thread sivaprasad
Hi,

I have two indexed fields called profileId and tagName. When I issue a query
like q=profileId:99964 OR profileId:10076 OR tagName:MUSIC AND
DESIGNER, I am getting only the results for tagName:MUSIC AND
DESIGNER. The results do not contain profileId 99964 or 10076.

Can anybody tell me what I am doing wrong?

Regards,
Siva

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-on-multiple-fields-is-not-working-tp3530145p3530145.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Integrating Surround Query Parser

2011-11-23 Thread Rahul Mehta
After this I tried with the solr-3.1 source.


   - This time I found the core folder in the previous installation (that is
   when this folder got created):
   -
/home/reach121/basf/apache-solr-3.1.0/core/src/test/org/apache/solr/search/TestSurroundQueryParser.java
   - I added <queryParser name="surround"
   class="org.apache.solr.search.SurroundQParserPlugin"/>
   - but when I run Solr, it gives me an error:
   -
   - SEVERE: org.apache.solr.common.SolrException: Error loading class
   'org.apache.solr.search.SurroundQParserPlugin'
   - at
   
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389)
   - at
   org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423)
   - at
   org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:445)
   - at
   org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1545)
   - at
   org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1539)
   - at
   org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1572)
   - at
   org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1489)
   - at org.apache.solr.core.SolrCore.init(SolrCore.java:555)
   - at
   org.apache.solr.core.CoreContainer.create(CoreContainer.java:458)
   - at
   org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
   - at
   org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
   - at
   
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
   - at
   org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
   - at
   org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
   - at
   org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   - at
   org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
   -

Please suggest what I should do.

On Wed, Nov 23, 2011 at 11:19 AM, Rahul Mehta rahul23134...@gmail.com wrote:

 This is what I tried:


- Went to the solr 3.1 directory, which was downloaded from here:
http://www.trieuvan.com/apache//lucene/solr/3.1.0/apache-solr-3.1.0.tgz
- wget
https://issues.apache.org/jira/secure/attachment/12493167/SOLR-2703.patch
- ran: patch -p0 -i SOLR-2703.patch --dry-run
- got an error:
   - patching file
   core/src/test/org/apache/solr/search/TestSurroundQueryParser.java
   - patching file core/src/test-files/solr/conf/schemasurround.xml
   - patching file core/src/test-files/solr/conf/solrconfigsurround.xml
   - patching file
   core/src/java/org/apache/solr/search/SurroundQParserPlugin.java
   - patching file example/solr/conf/solrconfig.xml
   - Hunk #1 FAILED at 1538.
   - 1 out of 1 hunk FAILED -- saving rejects to file
   example/solr/conf/solrconfig.xml.rej
- our solrconfig.xml ends at line 1508.
- tried sudo find / -name TestSurroundQueryParser.java, which
found nothing in the directory.
- and when I do svn up it gives me Skipped '.'

 Please suggest: what should I do now?

 On Wed, Nov 23, 2011 at 10:39 AM, Rahul Mehta rahul23134...@gmail.com wrote:

 How to apply this patch https://issues.apache.org/jira/browse/SOLR-2703 with
 solr 3.1 to install surround as plugin?


 On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 The surround query parser is fully wired into Solr trunk/4.0, if that
 helps.  See http://wiki.apache.org/solr/SurroundQueryParser and the
 JIRA issue linked there in case you want to patch it into a different
 version.

Erik

 On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote:

  Hi All
 
  I want to integrate the Surround Query Parser with Solr. To do this I
  downloaded the jar file from the internet and then pasted that jar
 file in
  web-inf/lib
 
  and configured the query parser in solrconfig.xml as
  <queryParser name="SurroundQParser"
  class="org.apache.lucene.queryParser.surround.parser.QueryParser"/>
 
  now when I load the Solr admin page the following exception comes up:
  org.apache.solr.common.SolrException: Error Instantiating
 QParserPlugin,
  org.apache.lucene.queryParser.surround.parser.QueryParser is not a
  org.apache.solr.search.QParserPlugin
 
  I think I didn't get the right plugin. Can anybody guide me on where
 to get the right plugin for the surround query parser, or how to
 accurately integrate this plugin with Solr?
 
 
  thanx
  Ahsan
 
 
 




 --
 Thanks & Regards

 Rahul Mehta






 --
 Thanks & Regards

 Rahul Mehta






-- 
Thanks & Regards

Rahul Mehta


DIH Strange Problem

2011-11-23 Thread Husain, Yavar

I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing 
data. Indexing and all was working perfectly fine. However today when I started 
full indexing again, Solr halts/gets stuck at the line "Creating a connection 
for entity". There are no further messages after that. I can see that DIH 
is busy, and on the DIH console I can see "A command is still running"; I can 
also see total rows fetched = 0 and total requests made to datasource = 1, and 
the time is increasing, but it is not doing anything. This is the exact 
configuration that worked for me. I am not really able to understand the 
problem here. Also, in the index directory where I am storing the index there 
are just 3 files: 2 segment files + 1 lucene*-write.lock file.
...
data-config.xml:

<dataSource type="JdbcDataSource"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders" user="testUser"
password="password"/>
<document>
.
.

Logs:

INFO: Server startup in 2016 ms
Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
INFO: Starting Full Import
Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 
QTime=11
Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter 
readIndexerProperties
INFO: Read dataimport.properties
Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
   
commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6]
Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1322041133719
Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity SampleText with URL: 
jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders


Re: Search on multiple fields is not working

2011-11-23 Thread Dmitry Kan
you probably wanted to query this:

q=profileId:99964 OR profileId:10076 OR tagName:(MUSIC AND DESIGNER)

otherwise Solr matches DESIGNER against your default field (whatever it
is) and ANDs it with tagName:MUSIC.
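
A quick way to confirm this (hedged example; the host and default field are
assumptions): add debugQuery=true and inspect the parsedquery entry in the
response:

  http://localhost:8983/solr/select?q=profileId:99964+OR+profileId:10076+OR+tagName:(MUSIC+AND+DESIGNER)&debugQuery=true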

On Wed, Nov 23, 2011 at 11:07 AM, sivaprasad sivaprasa...@echidnainc.com wrote:

 Hi,

 I have two indexed fields called profileId and tagName. When I issue a query
 like q=profileId:99964 OR profileId:10076 OR tagName:MUSIC AND
 DESIGNER, I am getting only the results for tagName:MUSIC AND
 DESIGNER. The results do not contain profileId 99964 or 10076.

 Can anybody tell me what I am doing wrong?

 Regards,
 Siva

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Search-on-multiple-fields-is-not-working-tp3530145p3530145.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,

Dmitry Kan


Re: date range in solr 3.1

2011-11-23 Thread do3do3
What I got is the count for this period, but I want to get only these
results. What is the query to get that, something like
fq=source:news


--
View this message in context: 
http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3530424.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to configure /select handler ?

2011-11-23 Thread neuron005
Another newbie question here.
The browse handler works perfectly. Now I want to configure my /select handler
so that I can use ajax-solr against it.
How do I do that? The website
https://github.com/evolvingweb/ajax-solr
explains how to use it. I want to do the same by configuring my /select
handler, or should I create a new handler?
Thanks in advance
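
For reference, a minimal sketch of what such a handler could look like in
solrconfig.xml; the JSON defaults are an assumption based on ajax-solr
speaking JSON(P), not something mandated by the ajax-solr docs:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <!-- ajax-solr consumes JSON; it sends json.wrf itself for JSONP -->
      <str name="wt">json</str>
      <int name="rows">10</int>
    </lst>
  </requestHandler>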

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-configure-select-handler-tp3530493p3530493.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Search for misspelled search term

2011-11-23 Thread neuron005
Do you mean stemming?
For misspelled words you will have to edit your dictionary (stopwords.txt),
I think, where you can set a solution for misspelled words!
Hope so :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-for-misspelled-search-term-tp3529961p3530504.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH Strange Problem

2011-11-23 Thread Chantal Ackermann
Hi Yavar,

my experience with similar problems was that there was something wrong
with the database connection or the database.

Chantal


On Wed, 2011-11-23 at 11:57 +0100, Husain, Yavar wrote:
 I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing 
 data. Indexing and all was working perfectly fine. However today when I 
 started full indexing again, Solr halts/stucks at the line Creating a 
 connection for entity. There are no further messages after that. I 
 can see that DIH is busy and on the DIH console I can see A command is still 
 running, I can also see total rows fetched = 0 and total request made to 
 datasource = 1 and time is increasing however it is not doing anything. This 
 is the exact configuration that worked for me. I am not really able to 
 understand the problem here. Also in the index directory where I am storing 
 the index there are just 3 files: 2 segment files + 1  lucene*-write.lock 
 file.
 ...
 data-config.xml:
 
 <dataSource type="JdbcDataSource"
 driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
 url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders"
 user="testUser" password="password"/>
 <document>
 .
 .
 
 Logs:
 
 INFO: Server startup in 2016 ms
 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter 
 doFullImport
 INFO: Starting Full Import
 Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 
 QTime=11
 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter 
 readIndexerProperties
 INFO: Read dataimport.properties
 Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
 INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
 Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit
 INFO: SolrDeletionPolicy.onInit: commits:num=1

 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6]
 Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
 INFO: newest commit = 1322041133719
 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 
 call
 INFO: Creating a connection for entity SampleText with URL: 
 jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders



Re: Solr Search for misspelled search term

2011-11-23 Thread meghana

I have configured the spellchecker component in my solrconfig;
below is the configuration:

<requestHandler name="/spellcheck" class="solr.SearchHandler" lazy="true">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Using the above configuration it works with the URL below:
http://192.168.1.59:8080/solr/core0/spellcheck?q=sc:directry&spellcheck=true&spellcheck.build=true

But when I set the same config in my standard request handler, it doesn't
work. Below is the config setting for that:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Then it's not working with the URL below:
http://192.168.1.59:8080/solr/core0/select?q=sc:directry&spellcheck=true&spellcheck.build=true

Anybody have any idea?
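
For completeness, a sketch of the spellcheck component that both handlers
reference; the field name and index directory here are assumptions:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">sc</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>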
neuron005 wrote
 
 Do you mean stemming?
 For misspelled  words you will have to edit your dictionary
 (stopwords.txt) i think where you can set solution for misspelled words!
 Hope So :)
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-for-misspelled-search-term-tp3529961p3530526.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to make effective search with fq and q params

2011-11-23 Thread meghana
Hi Erik,

Actually, right now almost everything is done in filtering, passing q
as *:*, but we need to find a better way if there is one. So according to
pravesh, I am thinking of passing the user-entered text in the query, and
the date and other fields in the filter query? Or, as per you, is q=*:* fast?

I have below fields to search 
Search Term  :  User Entered Text Field (passing it in q)
Title : User Entered Text Field (passing it in fq)
Desc : User Entered Text Field (passing it in fq)
Appearing : User Entered Text Field (passing it in fq)
Date Range : (passing it in fq)
Time Zone : (EST , CST ,MST , PST) (passing it in fq)
Category : (multiple choice) (passing it in fq)
Market : (multiple choice) (passing it in fq)
Affiliate Network : (multiple choice) (passing it in fq)

I really appreciate your view.
Meghana
Jeff Schmidt wrote
 
 Hi Erik:
 
 When using [e]dismax, does configuring q.alt=*:* and not specifying q
 affect the performance/caching in any way?
 
 As a side note, a while back I configured q.alt=*:*, and the application
 (via SolrJ) still set q=*:* if no user input was provided (faceting). With
 both of them set that way, I got zero results. (Solr 3.4.0)  Interesting.
 
 Thanks,
 
 Jeff
 
 On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:
 
 If all you're doing is filtering (browsing by facets perhaps), it's
 perfectly fine to have q=*:*.  MatchAllDocsQuery is fast (and would be
 cached anyway), so use *:* as appropriate without worries.
 
  Erik
 
 
 
 On Nov 22, 2011, at 07:18 , pravesh wrote:
 
 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
 Use the 'fq' to limit the searches to certain criterias like location,
 date-ranges etc.
 
 Also, avoid using the q=*:* as it implicitly translates to
 matchalldocsquery
 
 Regds
 Pravesh
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 jas@
 http://www.535consulting.com
 (650) 423-1068
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3529876.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: wild card search and lower-casing

2011-11-23 Thread Erick Erickson
Ah, I see what you're doing, go for it.

I intend to commit it today, but things happen.

About changing the setLowercaseExpandedTerms(true), yes
that'll take care of this issue, although it has some
locale-specific assumptions (i.e. string.toLowerCase() uses the
default locale). That may not matter in your situation though.

Best
Erick

On Tue, Nov 22, 2011 at 10:46 AM, Dmitry Kan dmitry@gmail.com wrote:
 Thanks, Erick. I was in fact reading the patch (the one attached as a
 file to the aforementioned jira) you updated sometime yesterday. I'll
 watch the issue, but as said the change of a hard-coded boolean to its
 opposite worked just fine for me.

 Best,
 Dmitry


 On 11/22/11, Erick Erickson erickerick...@gmail.com wrote:
 No, no, no That's something buried in Lucene, it has nothing to
 do with the patch! The patch has NOT yet been applied to any
 released code.

 You could pull the patch from the JIRA and apply it to trunk locally if
 you wanted. But there's no patch for 3.x, I'll probably put that up
 over the holiday.

 But things have changed a bit (one of the things I'll have to do is
 create some documentation). You *should* be able to specify
 just legacyMultiTerm=true in your fieldType if you want to
 apply the 3.x patch to pre 3.6 code. It would be a good field test
 if that worked for you.

 But you can't do any of this until the JIRA (SOLR-2438) is
 marked Resolution: Fixed.

 Don't be fooled by Fix Version. Fix Version simply says
 that those are the earliest versions it *could* go in.

 Best
 Erick

 Best
 Erick

 On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com wrote:
 I guess, I have found your comment, thanks.

 For our current needs I have just set:

 setLowercaseExpandedTerms(true); // changed from default false

 in the SolrQueryParser's constructor and that seem to work so far.

 In order not to start a separate thread on wildcards. Is it so, that for
 the trailing wildcard there is a minimum of 2 preceding characters for a
 search to happen?

 Dmitry

 On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson
 erickerick...@gmail.comwrote:

 It may be. The tricky bit is that there is a constant governing the
 behavior of
 this that restricts it to 3.6 and above. You'll have to change it after
 applying
 the patch for this to work for you. Should be trivial, I'll leave a note
 in the
 code about this, look for SOLR-2438 in the 3x code line for the place
 to change.

 On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote:
  Thanks Erick.
 
  Do you think the patch you are working on will be applicable as well to
 3.4?
 
  Best,
  Dmitry
 
  On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson
  erickerick...@gmail.com
 wrote:
 
  As it happens I'm working on SOLR-2438 which should address this. This
  patch
  will provide two things:
 
  The ability to define a new analysis chain in your schema.xml,
  currently
  called
  multiterm that will be applied to queries of various sorts,
  including wildcard,
  prefix, range. This will be somewhat of an expert thing to make
  yourself...
 
  In the absence of an explicit definition it'll synthesize a multiterm
  analyzer
   out of the query analyzer, taking any char filters, and
  lowercaseFilter (if present),
  and ASCIIFoldingfilter (if present) and putting them in the multiterm
  analyzer along
  with a (hardcoded) WhitespaceTokenizer.
 
  As of 3.6 and 4.0, this will be the default behavior, although you can
  explicitly
  define a field type parameter to specify the current behavior.
 
  The reason it is on 3.6 is that I want it to bake for a while before
  getting into the
  wild, so I have no intention of trying to get it into the 3.5 release.
 
  The patch is up for review now, I'd like another set of eyeballs or
  two on it before
  committing.
 
  The patch that's up there now is against trunk but I hope to have a 3x
  patch that
  I'll apply to the 3x code line after 3.5 RC1 is cut.
 
  Best
  Erick
 
 
  On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
   You're right:
  
   public SolrQueryParser(IndexSchema schema, String
   defaultField) {
   ...
   setLowercaseExpandedTerms(false);
   ...
   }
  
   Please note that lowercaseExpandedTerms uses String.toLowercase()
 (uses
   default Locale) which is a Locale sensitive operation.
  
   In Lucene AnalyzingQueryParser exists for this purposes, but I am
   not
  sure if it is ported to solr.
  
  
 
 http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
  
 
 





 --
 Regards,

 Dmitry Kan
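
For reference, a sketch of the explicit "multiterm" analyzer the patch
describes (based on Erick's outline above; exact attribute names may still
change before SOLR-2438 is committed):

  <fieldType name="text_mt" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <!-- applied to wildcard/prefix/range queries -->
    <analyzer type="multiterm">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
  </fieldType>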



need a way so that solr return result for misspelled terms

2011-11-23 Thread meghana
Hi,

I have configured the spellchecker component in my Solr. It works with a custom
request handler (however, it's not working with the standard request handler, but
that is not the concern right now). But it returns suggestions for the matching
spellings; instead, we want to directly get results for the closest spellings
of the misspelled search term.

Can we do this?
Any help much appreciated.
Meghana


--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530584.html
Sent from the Solr - User mailing list archive at Nabble.com.


Huge Performance: Solr distributed search

2011-11-23 Thread Artem Lokotosh
Hi!

* Data:
- Solr 3.4;
- 30 shards ~ 13GB, 27-29M docs each shard.

* Machine parameters (Ubuntu 10.04 LTS):
user@Solr:~$ uname -a
Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011
x86_64 GNU/Linux
user@Solr:~$ cat /proc/cpuinfo
processor   : 0 - 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   X5690  @ 3.47GHz
stepping: 2
cpu MHz : 3458.000
cache size  : 12288 KB
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology
tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1
sse4_2 popcnt aes hypervisor lahf_lm ida arat
bogomips: 6916.00
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
user@Solr:~$ cat /proc/meminfo
MemTotal:   16992680 kB
MemFree:  110424 kB
Buffers:9976 kB
Cached: 11588380 kB
SwapCached:41952 kB
Active:  9860764 kB
Inactive:6198668 kB
Active(anon):4062144 kB
Inactive(anon):   398972 kB
Active(file):5798620 kB
Inactive(file):  5799696 kB
Unevictable:   0 kB
Mlocked:   0 kB
SwapTotal:  46873592 kB
SwapFree:   46810712 kB
Dirty:36 kB
Writeback: 0 kB
AnonPages:   4424756 kB
Mapped:   940660 kB
Shmem:40 kB
Slab: 362344 kB
SReclaimable: 350372 kB
SUnreclaim:11972 kB
KernelStack:2488 kB
PageTables:68568 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:55369932 kB
Committed_AS:5740556 kB
VmallocTotal:   34359738367 kB
VmallocUsed:  350532 kB
VmallocChunk:   34359384964 kB
HardwareCorrupted: 0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:   10240 kB
DirectMap2M:17299456 kB

- Apache Tomcat 6.0.32:
<!-- java arguments -->
-XX:+DisableExplicitGC
-XX:PermSize=512M
-XX:MaxPermSize=512M
-Xmx12G
-Xms3G
-XX:NewSize=128M
-XX:MaxNewSize=128M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=50
-XX:GCTimeRatio=9
-XX:MinHeapFreeRatio=25
-XX:MaxHeapFreeRatio=25
-verbose:gc
-XX:+PrintGCTimeStamps
-Xloggc:/opt/search/tomcat/logs/gc.log

Our search setup is:
- 5 servers with configuration above;
- one tomcat6 application on each server with 6 solr applications.

- Full addresses are:
1) 
http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,...,http://192.168.1.85:8080/solr6
2) 
http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,...,http://192.168.1.86:8080/solr12
...
5) 
http://192.168.1.89:8080/solr25,http://192.168.1.89:8080/solr26,...,http://192.168.1.89:8080/solr30
- On another server there is an additional common application with the
shards parameter:
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="shards">192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.89:8080/solr30</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>
- schema and solrconfig are identical for all shards; for the first shard
see the attachment;
- these servers do search only; indexing is done elsewhere
(shards optimized to 2 segments are replicated with ssh/rsync scripts).

So now the major problem is the poor performance of distributed search.
Take a look, for example, at these logs.
This is on 30 shards:
INFO: [] webapp=/solr
path=/select/ params={fl=*,score&ident=true&start=0&q=(barium)&rows=2000}
status=0 QTime=40712
INFO: [] webapp=/solr
path=/select/ params={fl=*,score&ident=true&start=0&q=(pittances)&rows=2000}
status=0 QTime=36097
INFO: [] webapp=/solr
path=/select/ params={fl=*,score&ident=true&start=0&q=(reliability)&rows=2000}
status=0 QTime=75756
INFO: [] webapp=/solr
path=/select/ params={fl=*,score&ident=true&start=0&q=(blessing's)&rows=2000}
status=0 QTime=30342
INFO: [] webapp=/solr
path=/select/ params={fl=*,score&ident=true&start=0&q=(reiterated)&rows=2000}
status=0 QTime=55690

Sometimes QTime is more than 15. But when we run identical queries
on one shard separately, QTime is between 200 and 1500.
Is distributed Solr search really this slow, or is our architecture
non-optimal? Or do we maybe need to use some third-party applications?
Thanks for any replies.

--
Best regards,
Artem


Re: how to make effective search with fq and q params

2011-11-23 Thread Jeff Schmidt
Thanks, Erik.  I'm moving on to edismax, and will set q.alt=*:* and not specify 
q if no user provided terms.

Take it easy,

Jeff

On Nov 22, 2011, at 11:53 AM, Erik Hatcher wrote:

 I think you're using dismax, not edismax. edismax will take q=*:* just fine 
 as it handles all Lucene syntax queries also.  dismax does not.
 
 So, if you're using dismax and there is no actual query (but you want to get 
 facets), you set q.alt=*:* and omit q - that's entirely by design.
 
 If there's a non-empty q parameter, q.alt is not considered so there 
 shouldn't be any issues with always have q.alt set if that's what you want.
 
   Erik
 
 
 On Nov 22, 2011, at 11:15 , Jeff Schmidt wrote:
 
 Hi Erik:
 
 It's not in the SolrJ library, but rather my use of it:
 
 In my application code:
 
 protected static final String SOLR_ALL_DOCS_QUERY = "*:*";
 
 /*
 * If no search terms provided, then return all neighbors.
 * Results are to be returned in neighbor symbol alphabetical order.
 */
 
 if (searchTerms == null) {
  searchTerms = SOLR_ALL_DOCS_QUERY;
  nodeQuery.addSortField("n_name", SolrQuery.ORDER.asc);
 }
 
 So, if no user search terms are provided, I search all documents (there are 
 other fqs in effect) and return them in name order.
 
 That worked just fine.  Then I read more about [e]dismax, and went and 
 configured:
 
 <str name="q.alt">*:*</str>
 
 Then I would get zero results.  It's not a SolrJ issue though, as this 
 request in my browser also resulted in zero results:
 
 http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&q=*:*&rows=5&facet=true&facet.mincount=1&facet.field=n_neighborof_processExact&facet.field=n_neighborof_edge_type
 
 That was due to the q=*:*.  Once I set, say, q=cancer, I got results.  So I 
 guess this is a [e]dismax thing?  (partner-tmo is the name of my request 
 handler).
 
 I solved my problem by not setting *:* in my application, and left q.alt=*:* 
 in place.
 
 Hope this helps.  Again, this is stock Solr 3.4.0, running the Apache war 
 under Tomcat 6.
 
 Jeff
 
 On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote:
 
 
 On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote:
 When using [e]dismax, does configuring q.alt=*:* and not specifying q 
 affect the performance/caching in any way?
 
 No different than using q=*:* with the lucene query parser.  
 MatchAllDocsQuery is possibly the fastest query out there!  (it simply 
 matches documents in index order, all scores are 1.0)
 
 As a side note, a while back I configured q.alt=*:*, and the application 
 (via SolrJ) still set q=*:* if no user input was provided (faceting). With 
 both of them set that way, I got zero results. (Solr 3.4.0)  Interesting.
 
 Ouch.  Really?  I don't see in the code (looking at my trunk checkout) 
 where there's any *:* used in the SolrJ library.  Can you provide some 
 details on how you used SolrJ?  It'd be good to track this down as that 
 seems like a bug to me.
 
 Erik
 
 
 
 Thanks,
 
 Jeff
 
 On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:
 
 If all you're doing is filtering (browsing by facets perhaps), it's 
 perfectly fine to have q=*:*.  MatchAllDocsQuery is fast (and would be 
 cached anyway), so use *:* as appropriate without worries.
 
   Erik
 
 
 
 On Nov 22, 2011, at 07:18 , pravesh wrote:
 
 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
 Use the 'fq' to limit the searches to certain criterias like location,
 date-ranges etc.
 
 Also, avoid using the q=*:* as it implicitly translates to 
 matchalldocsquery
 
 Regds
 Pravesh
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 j...@535consulting.com
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 
 
 
 
 
 
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 j...@535consulting.com
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 
 
 
 
 
 
 



--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068











Re: need a way so that solr return result for misspelled terms

2011-11-23 Thread Erik Hatcher
Meghana -

There's currently no facility in Solr to return results for suggestions 
automatically.  You'll have to code this into your client to make another 
request to Solr for the suggestions returned from the first request.

Erik

On Nov 23, 2011, at 07:58 , meghana wrote:

 Hi,
 
 I have configured spellchecker component in my solr. it works with custom
 request handler (however its not working with standard request handler , but
 this is not concern at now) . but its returning suggestions for the matching
 spells, instead of it we want that we can directly get result for relative
 spells of misspelled search term.
 
 Can we do this. 
 Any help much appreciated.
 Meghana
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530584.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to make effective search with fq and q params

2011-11-23 Thread Erik Hatcher
Jeff,

Just to clarify - with edismax, q=*:* is fine and matches all documents.  With 
dismax (and also edismax), q.alt with no q is needed to match all documents. 

Erik



On Nov 23, 2011, at 08:20 , Jeff Schmidt wrote:

 Thanks, Erik.  I'm moving on to edismax, and will set q.alt=*:* and not 
 specify q if no user provided terms.
 
 Take it easy,
 
 Jeff
 
 On Nov 22, 2011, at 11:53 AM, Erik Hatcher wrote:
 
 I think you're using dismax, not edismax. edismax will take q=*:* just fine 
 as it handles all Lucene syntax queries also.  dismax does not.
 
 So, if you're using dismax and there is no actual query (but you want to get 
 facets), you set q.alt=*:* and omit q - that's entirely by design.
 
 If there's a non-empty q parameter, q.alt is not considered so there 
 shouldn't be any issues with always have q.alt set if that's what you want.
 
  Erik
 
 
 On Nov 22, 2011, at 11:15 , Jeff Schmidt wrote:
 
 Hi Erik:
 
 It's not in the SolrJ library, but rather my use of it:
 
 In my application code:
 
 protected static final String SOLR_ALL_DOCS_QUERY = "*:*";
 
 /*
 * If no search terms provided, then return all neighbors.
 * Results are to be returned in neighbor symbol alphabetical order.
 */
 
 if (searchTerms == null) {
 searchTerms = SOLR_ALL_DOCS_QUERY;
 nodeQuery.addSortField("n_name", SolrQuery.ORDER.asc);
 }
 
 So, if no user search terms are provided, I search all documents (there are 
 other fqs in effect) and return them in name order.
 
 That worked just fine.  Then I read more about [e]dismax, and went and 
 configured:
 
 <str name="q.alt">*:*</str>
 
 Then I would get zero results.  It's not a SolrJ issue though, as this 
 request in my browser also resulted in zero results:
 
 http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&q=*:*&rows=5&facet=true&facet.mincount=1&facet.field=n_neighborof_processExact&facet.field=n_neighborof_edge_type
 
 That was due to the q=*:*.  Once I set, say, q=cancer, I got results.  So I 
 guess this is a [e]dismax thing?  (partner-tmo is the name of my request 
 handler).
 
 I solved my problem by not setting *:* in my application, and left 
 q.alt=*:* in place.
 
 Hope this helps.  Again, this is stock Solr 3.4.0, running the Apache war 
 under Tomcat 6.
 
 Jeff
 
 On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote:
 
 
 On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote:
 When using [e]dismax, does configuring q.alt=*:* and not specifying q 
 affect the performance/caching in any way?
 
 No different than using q=*:* with the lucene query parser.  
 MatchAllDocsQuery is possibly the fastest query out there!  (it simply 
 matches documents in index order, all scores are 1.0)
 
 As a side note, a while back I configured q.alt=*:*, and the application 
 (via SolrJ) still set q=*:* if no user input was provided (faceting). 
 With both of them set that way, I got zero results. (Solr 3.4.0)  
 Interesting.
 
 Ouch.  Really?  I don't see in the code (looking at my trunk checkout) 
 where there's any *:* used in the SolrJ library.  Can you provide some 
 details on how you used SolrJ?  It'd be good to track this down as that 
 seems like a bug to me.
 
Erik
 
 
 
 Thanks,
 
 Jeff
 
 On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:
 
 If all you're doing is filtering (browsing by facets perhaps), it's 
 perfectly fine to have q=*:*.  MatchAllDocsQuery is fast (and would be 
 cached anyway), so use *:* as appropriate without worries.
 
  Erik
 
 
 
 On Nov 22, 2011, at 07:18 , pravesh wrote:
 
 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
 Use the 'fq' to limit the searches to certain criterias like location,
 date-ranges etc.
 
 Also, avoid using the q=*:* as it implicitly translates to 
 matchalldocsquery
 
 Regds
 Pravesh
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 j...@535consulting.com
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 
 
 
 
 
 
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 j...@535consulting.com
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 
 
 
 
 
 
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 j...@535consulting.com
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 
 
 
 
 
 



Re: how to make effective search with fq and q params

2011-11-23 Thread Erik Hatcher
Meghana -

Some important points about q/fq -

  * q is used for scoring.  fq is for filtering, no scoring.

  * fq and q are cached independently

You may want to combine the user entered terms (search term, title, and desc) 
in the q parameter.  It's complicated/advanced, but you can use nested queries 
to achieve a spread of different query contexts with different field 
configurations.  Check out Yonik's blog entry for inspiration: 
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/

Erik



On Nov 23, 2011, at 00:59 , meghana wrote:

 Hi Erik , 
 
 Actually, right now almost everything is done in filtering, passing q
 as *:*, but we need to find a better way if there is one. So according to
 pravesh, I am thinking of passing the user-entered text in the query, and
 the date and other fields in the filter query? Or, as per you, is q=*:* fast?
 
 I have below fields to search 
 Search Term  :  User Entered Text Field (passing it in q)
 Title : User Entered Text Field (passing it in fq)
 Desc : User Entered Text Field (passing it in fq)
 Appearing : User Entered Text Field (passing it in fq)
 Date Range : (passing it in fq)
 Time Zone : (EST , CST ,MST , PST) (passing it in fq)
 Category : (multiple choice) (passing it in fq)
 Market : (multiple choice) (passing it in fq)
 Affiliate Network : (multiple choice) (passing it in fq)
 
 I really appreciate your view.
 Meghana
 Jeff Schmidt wrote
 
 Hi Erik:
 
 When using [e]dismax, does configuring q.alt=*:* and not specifying q
 affect the performance/caching in any way?
 
 As a side note, a while back I configured q.alt=*:*, and the application
 (via SolrJ) still set q=*:* if no user input was provided (faceting). With
 both of them set that way, I got zero results. (Solr 3.4.0)  Interesting.
 
 Thanks,
 
 Jeff
 
 On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote:
 
 If all you're doing is filtering (browsing by facets perhaps), it's
 perfectly fine to have q=*:*.  MatchAllDocsQuery is fast (and would be
 cached anyway), so use *:* as appropriate without worries.
 
 Erik
 
 
 
 On Nov 22, 2011, at 07:18 , pravesh wrote:
 
 Usually,
 
 Use the 'q' parameter to search for the free text values entered by the
 users (where you might want to parse the query and/or apply
 boosting/phrase-sloppy, minimum match,tie etc )
 
 Use the 'fq' to limit the searches to certain criterias like location,
 date-ranges etc.
 
 Also, avoid using the q=*:* as it implicitly translates to
 matchalldocsquery
 
 Regds
 Pravesh
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 --
 Jeff Schmidt
 535 Consulting
 jas@
 http://www.535consulting.com
 (650) 423-1068
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3529876.html
 Sent from the Solr - User mailing list archive at Nabble.com.
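
Putting Erik's points together with the field list above, a hedged sketch
of such a request (the field names are taken from meghana's labels and are
assumptions, not a verified schema):

  q=searchterm:(user text) AND title:(user title) AND desc:(user desc)
  fq=daterange:[2011-11-01T00:00:00Z TO 2011-11-30T23:59:59Z]
  fq=timezone:(EST OR CST)
  fq=category:(cat1 OR cat2)
  fq=market:(m1 OR m2)

The q part is analyzed and scored; each fq is cached independently, so the
repeated choices (time zone, category, market) stay cheap across requests.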



Re: wild card search and lower-casing

2011-11-23 Thread Dmitry Kan
Yes, it should be ok, as currently we are on the English side. If that's
beneficial for the effort, I could do a field test on 3.4 after you close
the jira.

Best,
Dmitry

On Wed, Nov 23, 2011 at 2:52 PM, Erick Erickson erickerick...@gmail.com wrote:

 Ah, I see what you're doing, go for it.

 I intend to commit it today, but things happen.

 About changing the setLowercaseExpandedTerms(true), yes
 that'll take care of this issue, although it has some
 locale-specific assumptions (i.e. string.toLowerCase() uses the
 default locale). That may not matter in your situation though.

 Best
 Erick

 On Tue, Nov 22, 2011 at 10:46 AM, Dmitry Kan dmitry@gmail.com wrote:
  Thanks, Erick. I was in fact reading the patch (the one attached as a
  file to the aforementioned jira) you updated sometime yesterday. I'll
  watch the issue, but as said the change of a hard-coded boolean to its
  opposite worked just fine for me.
 
  Best,
  Dmitry
 
 
  On 11/22/11, Erick Erickson erickerick...@gmail.com wrote:
  No, no, no That's something buried in Lucene, it has nothing to
  do with the patch! The patch has NOT yet been applied to any
  released code.
 
  You could pull the patch from the JIRA and apply it to trunk locally if
  you wanted. But there's no patch for 3.x, I'll probably put that up
  over the holiday.
 
  But things have changed a bit (one of the things I'll have to do is
  create some documentation). You *should* be able to specify
  just legacyMultiTerm=true in your fieldType if you want to
  apply the 3.x patch to pre 3.6 code. It would be a good field test
  if that worked for you.
 
  But you can't do any of this until the JIRA (SOLR-2438) is
  marked Resolution: Fixed.
 
  Don't be fooled by Fix Version. Fix Version simply says
  that those are the earliest versions it *could* go in.
 
  Best
  Erick
 
  Best
  Erick
 
  On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com
 wrote:
  I guess, I have found your comment, thanks.
 
  For our current needs I have just set:
 
  setLowercaseExpandedTerms(true); // changed from default false
 
  in the SolrQueryParser's constructor and that seem to work so far.
 
  In order not to start a separate thread on wildcards. Is it so, that
 for
  the trailing wildcard there is a minimum of 2 preceding characters for
 a
  search to happen?
 
  Dmitry
 
  On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson
  erickerick...@gmail.comwrote:
 
  It may be. The tricky bit is that there is a constant governing the
  behavior of
  this that restricts it to 3.6 and above. You'll have to change it
 after
  applying
  the patch for this to work for you. Should be trivial, I'll leave a
 note
  in the
  code about this, look for SOLR-2438 in the 3x code line for the place
  to change.
 
  On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com
 wrote:
   Thanks Erick.
  
   Do you think the patch you are working on will be applicable as
 well to
  3.4?
  
   Best,
   Dmitry
  
   On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson
   erickerick...@gmail.com
  wrote:
  
   As it happens I'm working on SOLR-2438 which should address this.
 This
   patch
   will provide two things:
  
   The ability to define a new analysis chain in your schema.xml,
   currently
   called
   multiterm that will be applied to queries of various sorts,
   including wildcard,
   prefix, range. This will be somewhat of an expert thing to make
   yourself...
  
   In the absence of an explicit definition it'll synthesize a
 multiterm
   analyzer
    out of the query analyzer, taking any char filters, and
   lowercaseFilter (if present),
   and ASCIIFoldingfilter (if present) and putting them in the
 multiterm
   analyzer along
   with a (hardcoded) WhitespaceTokenizer.
  
   As of 3.6 and 4.0, this will be the default behavior, although you
 can
   explicitly
   define a field type parameter to specify the current behavior.
  
   The reason it is on 3.6 is that I want it to bake for a while
 before
   getting into the
   wild, so I have no intention of trying to get it into the 3.5
 release.
  
   The patch is up for review now, I'd like another set of eyeballs or
   two on it before
   committing.
  
   The patch that's up there now is against trunk but I hope to have
 a 3x
   patch that
   I'll apply to the 3x code line after 3.5 RC1 is cut.
  
   Best
   Erick
  
  
   On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com
  wrote:
   
You're right:
   
public SolrQueryParser(IndexSchema schema, String
defaultField) {
...
setLowercaseExpandedTerms(false);
...
}
   
Please note that lowercaseExpandedTerms uses String.toLowercase()
  (uses
default Locale) which is a Locale sensitive operation.
   
In Lucene AnalyzingQueryParser exists for this purposes, but I am
not
   sure if it is ported to solr.
   
   
  
 
 http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
 

Re: Integrating Surround Query Parser

2011-11-23 Thread Ahmet Arslan
 After this i tried with solr3.1-src.
 Please suggest what should i do ?

Please use solr-trunk.
svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk
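
Once on trunk, where the parser is already wired in (see Erik's note
earlier in this thread), registration and usage would look roughly like
this; the query is a hedged sketch of the surround syntax from the wiki:

  <!-- solrconfig.xml on trunk -->
  <queryParser name="surround" class="org.apache.solr.search.SurroundQParserPlugin"/>

  q={!surround}3w(apache, solr)   (3w = ordered proximity, 3n = unordered)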


Re: Solr Performance/Architecture

2011-11-23 Thread Shawn Heisey

On 11/22/2011 11:52 PM, Husain, Yavar wrote:

Hi Shawn

That was so great of you to explain the architecture in such detail. I
enjoyed reading it multiple times.

I have a question here:

You mentioned that we can use crc32(DocumentId) % NumServers. Now I am
actually using that in my data-config.xml, in the SQL query itself,
something like:

For Documents to be indexed on Server 1: select DocumentId,PNum,... from Sample 
where crc32(DocumentId)%2=0;
For Documents to be indexed on Server 2: select DocumentId,PNum,... from Sample 
where crc32(DocumentId)%2=1;

Will that be a right way? Will it not be a slow query?

Thanks once again.


Those queries look good.  Compared to an unqualified SELECT, I'm sure the 
crc32 will slow it down, but unless your database hardware is not up to 
the job, Solr will probably be more of a bottleneck than the DB.


You can have a generic DIH config and pass the information in with the 
dataimport:



url="jdbc:mysql://${dataimporter.request.dbHost}/${dataimporter.request.dbSchema}?zeroDateTimeBehavior=convertToNull"

<snip>
SELECT * FROM ${dataimporter.request.dataView}
WHERE (
  (
    did &gt; ${dataimporter.request.minDid}
    AND did &lt;= ${dataimporter.request.maxDid}
  )
  ${dataimporter.request.extraWhere}
) AND (crc32(did) % ${dataimporter.request.numShards})
  IN (${dataimporter.request.modVal})

This is the URL template that will work with the above DIH config:

http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dbHost=DBSERVER&dbSchema=DBSCHEMA&dataView=DATAVIEW&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID&extraWhere=EXTRAWHERE

Under normal circumstances extraWhere is blank.  It's there for 
special-purpose importing.


Thanks,
Shawn
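
For illustration, that URL template filled in with made-up values (two
shards, this request feeding shard 0):

  http://localhost:8983/solr/core0/dataimport?command=full-import&dbHost=dbserver&dbSchema=orders&dataView=Sample&numShards=2&modVal=0&minDid=0&maxDid=1000000&extraWhere=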



Re: Huge Performance: Solr distributed search

2011-11-23 Thread Dmitry Kan
Hello,

Is this log from the frontend SOLR (aggregator) or from a shard?
Can you merge, e.g., 3 shards together, or is it too much effort for your team?

In our setup we currently have 16 shards with ~30GB each, but we rarely
search in all of them at once.

Best,
Dmitry
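
If merging shards is an option, CoreAdmin can merge on-disk indexes (a
hedged sketch; the core name and paths are made up, the source indexes must
not have open writers, and the target core needs a commit afterwards):

  http://192.168.1.85:8080/solr/admin/cores?action=mergeindexes&core=solr1&indexDir=/opt/search/solr2/data/index&indexDir=/opt/search/solr3/data/index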

On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh arco...@gmail.com wrote:

 Hi!

 * Data:
 - Solr 3.4;
 - 30 shards ~ 13GB, 27-29M docs each shard.

 * Machine parameters (Ubuntu 10.04 LTS):
 user@Solr:~$ uname -a
 Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011
 x86_64 GNU/Linux
 user@Solr:~$ cat /proc/cpuinfo
 processor   : 0 - 3
 vendor_id   : GenuineIntel
 cpu family  : 6
 model   : 44
 model name  : Intel(R) Xeon(R) CPU   X5690  @ 3.47GHz
 stepping: 2
 cpu MHz : 3458.000
 cache size  : 12288 KB
 fpu : yes
 fpu_exception   : yes
 cpuid level : 11
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx
 rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology
 tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1
 sse4_2 popcnt aes hypervisor lahf_lm ida arat
 bogomips: 6916.00
 clflush size: 64
 cache_alignment : 64
 address sizes   : 40 bits physical, 48 bits virtual
 power management:
 user@Solr:~$ cat /proc/meminfo
 MemTotal:   16992680 kB
 MemFree:  110424 kB
 Buffers:9976 kB
 Cached: 11588380 kB
 SwapCached:41952 kB
 Active:  9860764 kB
 Inactive:6198668 kB
 Active(anon):4062144 kB
 Inactive(anon):   398972 kB
 Active(file):5798620 kB
 Inactive(file):  5799696 kB
 Unevictable:   0 kB
 Mlocked:   0 kB
 SwapTotal:  46873592 kB
 SwapFree:   46810712 kB
 Dirty:36 kB
 Writeback: 0 kB
 AnonPages:   4424756 kB
 Mapped:   940660 kB
 Shmem:40 kB
 Slab: 362344 kB
 SReclaimable: 350372 kB
 SUnreclaim:11972 kB
 KernelStack:2488 kB
 PageTables:68568 kB
 NFS_Unstable:  0 kB
 Bounce:0 kB
 WritebackTmp:  0 kB
 CommitLimit:55369932 kB
 Committed_AS:5740556 kB
 VmallocTotal:   34359738367 kB
 VmallocUsed:  350532 kB
 VmallocChunk:   34359384964 kB
 HardwareCorrupted: 0 kB
 HugePages_Total:   0
 HugePages_Free:0
 HugePages_Rsvd:0
 HugePages_Surp:0
 Hugepagesize:   2048 kB
 DirectMap4k:   10240 kB
 DirectMap2M:17299456 kB

 - Apache Tomcat 6.0.32:
 <!-- java arguments -->
 -XX:+DisableExplicitGC
 -XX:PermSize=512M
 -XX:MaxPermSize=512M
 -Xmx12G
 -Xms3G
 -XX:NewSize=128M
 -XX:MaxNewSize=128M
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSClassUnloadingEnabled
 -XX:CMSInitiatingOccupancyFraction=50
 -XX:GCTimeRatio=9
 -XX:MinHeapFreeRatio=25
 -XX:MaxHeapFreeRatio=25
 -verbose:gc
 -XX:+PrintGCTimeStamps
 -Xloggc:/opt/search/tomcat/logs/gc.log

 Our search setup is:
 - 5 servers with configuration above;
 - one tomcat6 application on each server with 6 solr applications.

 - Full addresses are:
 1) http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,...,
 http://192.168.1.85:8080/solr6
 2) http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,...,
 http://192.168.1.86:8080/solr12
 ...
 5) http://192.168.1.89:8080/solr25,http://192.168.1.89:8080/solr26,...,
 http://192.168.1.89:8080/solr30
 - On another server there is an additional common application with the
 shards parameter:
 <requestHandler name="search" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <str name="shards">192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.89:8080/solr30</str>
     <int name="rows">10</int>
   </lst>
 </requestHandler>
 - schema and solrconfig are identical for all shards; for the first shard
 see the attachment;
 - these servers do search only; indexing is done elsewhere
 (shards optimized to 2 segments are replicated with ssh/rsync scripts).

 So now the major problem is the poor performance of distributed search.
 Take a look, for example, at these logs.
 This is on 30 shards:
 INFO: [] webapp=/solr
 path=/select/ params={fl=*,score&ident=true&start=0&q=(barium)&rows=2000}
 status=0 QTime=40712
 INFO: [] webapp=/solr
 path=/select/ params={fl=*,score&ident=true&start=0&q=(pittances)&rows=2000}
 status=0 QTime=36097
 INFO: [] webapp=/solr
 path=/select/ params={fl=*,score&ident=true&start=0&q=(reliability)&rows=2000}
 status=0 QTime=75756
 INFO: [] webapp=/solr
 path=/select/ params={fl=*,score&ident=true&start=0&q=(blessing's)&rows=2000}
 status=0 QTime=30342
 INFO: [] webapp=/solr
 path=/select/ params={fl=*,score&ident=true&start=0&q=(reiterated)&rows=2000}
 status=0 QTime=55690

 Sometimes QTime is more than 15. But when we run identical queries
 on one shard separately, QTime is between 200 and 1500.
 Is distributed Solr search really this slow, or is our architecture 

Re: Collection Distribution vs Replication in Solr

2011-11-23 Thread Mark Miller

On Oct 27, 2011, at 2:57 PM, Alireza Salimi wrote:

 Hi guys,
 
 If we ignore the features that Replication provides
 (http://wiki.apache.org/solr/SolrReplication#Features),
 which approach is better?
 Are there any performance problems with Replication?
 
 Replication seems much easier (no special configuration, ssh setup, cron
 setup),
 while rsync is a robust protocol.
 
 Which one do you recommend?
 
 Thanks
 
 -- 
 Alireza Salimi
 Java EE Developer


Replication with scripts is basically deprecated I'd say. Java replication is 
the path forward and what I would use.

- Mark Miller
lucidimagination.com













Re: need a way so that solr return result for misspelled terms

2011-11-23 Thread meghana
Hi Erik, 

Thanks for your reply. I have come to know that Lucene provides fuzzy
search by applying a tilde (~) at the end of a search term, like
delll~0.8

Can we apply such fuzzy logic in Solr in any way?
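
(Fuzzy syntax does work with Solr's default lucene query parser, so a
hedged example along the lines of the URLs above, field and host assumed,
would be:

  http://192.168.1.59:8080/solr/core0/select?q=sc:directry~0.7

where 0.7 is the minimum similarity between 0.0 and 1.0; fuzzy terms can be
slow on large indexes.)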

Thanks 
Meghana
Erik Hatcher-4 wrote
 
 Meghana -
 
 There's currently no facility in Solr to return results for suggestions
 automatically.  You'll have to code this into your client to make another
 request to Solr for the suggestions returned from the first request.
 
   Erik
 
 On Nov 23, 2011, at 07:58 , meghana wrote:
 
 Hi,
 
 I have configured spellchecker component in my solr. it works with custom
 request handler (however its not working with standard request handler ,
 but
 this is not concern at now) . but its returning suggestions for the
 matching
 spells, instead of it we want that we can directly get result for
 relative
 spells of misspelled search term.
 
 Can we do this. 
 Any help much appreciated.
 Meghana
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530584.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530769.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: To push the terms.limit parameter from the master core to all the shard cores.

2011-11-23 Thread Mark Miller

On Nov 22, 2011, at 1:31 PM, mechravi25 wrote:

 Can you please suggest the definition of the terms component for the
 underlying shard cores. 

If you look at my earlier email, you will see the limit is set in invariants 
rather than defaults. This makes it so the param cannot be dynamically 
overridden, so it's what you want to use on your underlying shards.
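
For example, on each shard something like this (a sketch -- the handler
name, component wiring, and the value 10 are illustrative, adjust to your
shard configs):

<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="invariants">
    <int name="terms.limit">10</int>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>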

- Mark Miller
lucidimagination.com













Re: DIH Strange Problem

2011-11-23 Thread Shawn Heisey

On 11/23/2011 5:21 AM, Chantal Ackermann wrote:

Hi Yavar,

my experience with similar problems was that there was something wrong
with the database connection or the database.

Chantal


It's also possible that your JDBC driver might be trying to buffer the 
entire result set.  There's a link on the wiki specifically for this 
problem on MS SQL server.  Hopefully it's that, but Chantal could be 
right too.


http://wiki.apache.org/solr/DataImportHandlerFaq

Here's the URL to the specific paragraph, but it's likely that it won't 
survive the email trip in a clickable form:


http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F
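
The short version of the fix described there is to tell the driver to
stream results instead of buffering them, via the JDBC URL -- roughly
like this (a sketch; host, database name, and credentials are
placeholders):

<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://localhost;databaseName=mydb;responseBuffering=adaptive"
            user="solr" password="secret"/>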

Thanks,
Shawn



UpdateRequestProcessor - processCommit

2011-11-23 Thread Matthew Parker
TWIMC:

I'm creating a custom UpdateRequestProcessor chain, where I need to commit
records to a database once the import process has completed.

I'm assuming the processCommit method is called for each
UpdateRequestProcessor chain class when the records are being committed to
the Lucene index.
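
For reference, the shape of such a processor (a simplified sketch, not the
exact code; the database calls are elided):

import java.io.IOException;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.CommitUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class DbCommitProcessor extends UpdateRequestProcessor {
    public DbCommitProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        // stage the record for the external database here -- this IS called
        super.processAdd(cmd);
    }

    @Override
    public void processCommit(CommitUpdateCommand cmd) throws IOException {
        // commit staged records to the database here -- this is NOT called
        super.processCommit(cmd);
    }
}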

I'm debugging the processor chain using the debug functionality in the
dataimport.jsp page, and I have selected verbose and commit as options.
When I import 10 records,
the processAdd methods are getting called, but the processCommit methods
aren't.

Is there something obvious that I'm missing here?

I'm using SOLR 1.4

TIA,

M.

--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.


Re: need a way so that solr return result for misspelled terms

2011-11-23 Thread Ahmet Arslan
 I have configured spellchecker component in my solr. it
 works with custom
 request handler (however its not working with standard
 request handler , but
 this is not concern at now) . but its returning suggestions
 for the matching
 spells, instead of it we want that we can directly get
 result for relative
 spells of misspelled search term.

You might be interested in this :

http://sematext.com/products/dym-researcher/index.html


Re: Integrating Surround Query Parser

2011-11-23 Thread Rahul Mehta
Is this in the trunk of Solr 4.0? Can't I implement it in Solr 3.1?

On Wed, Nov 23, 2011 at 7:23 PM, Ahmet Arslan iori...@yahoo.com wrote:

  After this i tried with solr3.1-src.
  Please suggest what should i do ?

 Please use solr-trunk.
 svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk




-- 
Thanks & Regards

Rahul Mehta


Re: Collection Distribution vs Replication in Solr

2011-11-23 Thread Alireza Salimi
Yeah, and actually later I found that someone mentioned they had done
some benchmarks and found that replication is faster than collection
distribution.

Thanks

On Wed, Nov 23, 2011 at 9:02 AM, Mark Miller markrmil...@gmail.com wrote:


 On Oct 27, 2011, at 2:57 PM, Alireza Salimi wrote:

  Hi guys,
 
  If we ignore the features that Replication provides (
  http://wiki.apache.org/solr/SolrReplication#Features),
  which approach is better?
  Is there any performance problems with Replication?
 
  Replications seems quite easier (no special configuration, ssh setting,
 cron
  setting),
  while rsync is a robust protocol.
 
  Which one do you recommend?
 
  Thanks
 
  --
  Alireza Salimi
  Java EE Developer


 Replication with scripts is basically deprecated I'd say. Java replication
 is the path forward and what I would use.

 - Mark Miller
 lucidimagination.com














-- 
Alireza Salimi
Java EE Developer


Re: Huge Performance: Solr distributed search

2011-11-23 Thread Artem Lokotosh
 Is this log from the frontend SOLR (aggregator) or from a shard?
from aggregator

 Can you merge, e.g. 3 shards together or is it much effort for your team?
Yes, we can merge. We'll try to do this and review how it works.
Thanks, Dmitry

Any other ideas?

On Wed, Nov 23, 2011 at 4:01 PM, Dmitry Kan dmitry@gmail.com wrote:
 Hello,

 Is this log from the frontend SOLR (aggregator) or from a shard?
 Can you merge, e.g. 3 shards together or is it much effort for your team?

 In our setup we currently have 16 shards with ~30GB each, but we rarely
 search in all of them at once.

 Best,
 Dmitry

 On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh arco...@gmail.com wrote:

 Hi!

 * Data:
 - Solr 3.4;
 - 30 shards ~ 13GB, 27-29M docs each shard.

 * Machine parameters (Ubuntu 10.04 LTS):
 user@Solr:~$ uname -a
 Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011
 x86_64 GNU/Linux
 user@Solr:~$ cat /proc/cpuinfo
 processor       : 0 - 3
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 44
 model name      : Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz
 stepping        : 2
 cpu MHz         : 3458.000
 cache size      : 12288 KB
 fpu             : yes
 fpu_exception   : yes
 cpuid level     : 11
 wp              : yes
 flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
 mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx
 rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology
 tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1
 sse4_2 popcnt aes hypervisor lahf_lm ida arat
 bogomips        : 6916.00
 clflush size    : 64
 cache_alignment : 64
 address sizes   : 40 bits physical, 48 bits virtual
 power management:
 user@Solr:~$ cat /proc/meminfo
 MemTotal:       16992680 kB
 MemFree:          110424 kB
 Buffers:            9976 kB
 Cached:         11588380 kB
 SwapCached:        41952 kB
 Active:          9860764 kB
 Inactive:        6198668 kB
 Active(anon):    4062144 kB
 Inactive(anon):   398972 kB
 Active(file):    5798620 kB
 Inactive(file):  5799696 kB
 Unevictable:           0 kB
 Mlocked:               0 kB
 SwapTotal:      46873592 kB
 SwapFree:       46810712 kB
 Dirty:                36 kB
 Writeback:             0 kB
 AnonPages:       4424756 kB
 Mapped:           940660 kB
 Shmem:                40 kB
 Slab:             362344 kB
 SReclaimable:     350372 kB
 SUnreclaim:        11972 kB
 KernelStack:        2488 kB
 PageTables:        68568 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
 CommitLimit:    55369932 kB
 Committed_AS:    5740556 kB
 VmallocTotal:   34359738367 kB
 VmallocUsed:      350532 kB
 VmallocChunk:   34359384964 kB
 HardwareCorrupted:     0 kB
 HugePages_Total:       0
 HugePages_Free:        0
 HugePages_Rsvd:        0
 HugePages_Surp:        0
 Hugepagesize:       2048 kB
 DirectMap4k:       10240 kB
 DirectMap2M:    17299456 kB

 - Apache Tomcat 6.0.32:
 <!-- java arguments -->
 -XX:+DisableExplicitGC
 -XX:PermSize=512M
 -XX:MaxPermSize=512M
 -Xmx12G
 -Xms3G
 -XX:NewSize=128M
 -XX:MaxNewSize=128M
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSClassUnloadingEnabled
 -XX:CMSInitiatingOccupancyFraction=50
 -XX:GCTimeRatio=9
 -XX:MinHeapFreeRatio=25
 -XX:MaxHeapFreeRatio=25
 -verbose:gc
 -XX:+PrintGCTimeStamps
 -Xloggc:/opt/search/tomcat/logs/gc.log

 Our search schema is:
 - 5 servers with configuration above;
 - one tomcat6 application on each server with 6 solr applications.

 - Full addresses are:
 1) http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,...,
 http://192.168.1.85:8080/solr6
 2) http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,...,
 http://192.168.1.86:8080/solr12
 ...
 5) http://192.168.1.89:8080/solr25,http://192.168.1.89:8080/solr26,...,
 http://192.168.1.89:8080/solr30
 - On another server there is an additional common application with a
 shards parameter:
 <requestHandler name="search" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <str name="shards">192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.89:8080/solr30</str>
     <int name="rows">10</int>
   </lst>
 </requestHandler>
 - schema and solrconfig are identical for all shards; for the first shard
 see the attachment;
 - these servers handle search only; indexing is done on another server
 (shards optimized to 2 segments are replicated with ssh/rsync scripts).

 So now the major problem is very poor distributed search performance.
 Take a look at, for example, these logs:
 This is on 30 shards:
 INFO: [] webapp=/solr
 path=/select/ params={fl=*,score&ident=true&start=0&q=(barium)&rows=2000}
 status=0 QTime=40712
 INFO: [] webapp=/solr
 path=/select/ params={fl=*,score&ident=true&start=0&q=(pittances)&rows=2000}
 status=0 QTime=36097
 INFO: [] webapp=/solr
 path=/select/ params={fl=*,score&ident=true&start=0&q=(reliability)&rows=2000}
 status=0 QTime=75756
 INFO: [] webapp=/solr

 

Re: need a way so that solr return result for misspelled terms

2011-11-23 Thread Erik Hatcher
Sure... if you're using the lucene query parser and put a ~ after every term 
in the query :)

But that would mean that either the users or your application do this.
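
For example, with the lucene query parser the client could append the
tilde to each term before sending the query (misspelled terms are
illustrative; the 0.8 is the optional minimum similarity):

q=delll~0.8 iphne~0.8

Each term is then matched fuzzily.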

Erik

On Nov 23, 2011, at 09:03 , meghana wrote:

 Hi Erik, 
 
 Thanks for your reply. i come to know  that  Lucene provides the fuzzy
 search by applying tilde(~) symbol at the end of search with like
 delll~0.8
 
 can we apply such fuzzy logic in solr in any way?
 
 Thanks 
 Meghana
 Erik Hatcher-4 wrote
 
 Meghana -
 
 There's currently no facility in Solr to return results for suggestions
 automatically.  You'll have to code this into your client to make another
 request to Solr for the suggestions returned from the first request.
 
  Erik
 
 On Nov 23, 2011, at 07:58 , meghana wrote:
 
 Hi,
 
 I have configured spellchecker component in my solr. it works with custom
 request handler (however its not working with standard request handler ,
 but
 this is not concern at now) . but its returning suggestions for the
 matching
 spells, instead of it we want that we can directly get result for
 relative
 spells of misspelled search term.
 
 Can we do this. 
 Any help much appreciated.
 Meghana
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530584.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530769.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Autocomplete(terms) performance problem

2011-11-23 Thread roySolr
Thanks for your answer Nagendra,

The problem is I want to do some infix searches. When I search for sisco I
want the autocomplete to suggest san fran*sisco*. In the example you gave me
it's also not possible.

Roy

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3530891.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Huge Performance: Solr distributed search

2011-11-23 Thread Dmitry Kan
If the response time from each shard shows decent figures, then the
aggregator seems to be the bottleneck. Do you btw have a lot of concurrent
users?

On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh arco...@gmail.com wrote:

  Is this log from the frontend SOLR (aggregator) or from a shard?
 from aggregator

  Can you merge, e.g. 3 shards together or is it much effort for your team?
 Yes, we can merge. We'll try to do this and review how it will works
 Thanks, Dmitry

 Any another ideas?

 On Wed, Nov 23, 2011 at 4:01 PM, Dmitry Kan dmitry@gmail.com wrote:
  Hello,
 
  Is this log from the frontend SOLR (aggregator) or from a shard?
  Can you merge, e.g. 3 shards together or is it much effort for your team?
 
  In our setup we currently have 16 shards with ~30GB each, but we rarely
  search in all of them at once.
 
  Best,
  Dmitry
 
  On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh arco...@gmail.com
 wrote:
 
  Hi!
 
  * Data:
  - Solr 3.4;
  - 30 shards ~ 13GB, 27-29M docs each shard.
 
  * Machine parameters (Ubuntu 10.04 LTS):
  user@Solr:~$ uname -a
  Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011
  x86_64 GNU/Linux
  user@Solr:~$ cat /proc/cpuinfo
  processor   : 0 - 3
  vendor_id   : GenuineIntel
  cpu family  : 6
  model   : 44
  model name  : Intel(R) Xeon(R) CPU   X5690  @ 3.47GHz
  stepping: 2
  cpu MHz : 3458.000
  cache size  : 12288 KB
  fpu : yes
  fpu_exception   : yes
  cpuid level : 11
  wp  : yes
  flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
  mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx
  rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology
  tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1
  sse4_2 popcnt aes hypervisor lahf_lm ida arat
  bogomips: 6916.00
  clflush size: 64
  cache_alignment : 64
  address sizes   : 40 bits physical, 48 bits virtual
  power management:
  user@Solr:~$ cat /proc/meminfo
  MemTotal:   16992680 kB
  MemFree:  110424 kB
  Buffers:9976 kB
  Cached: 11588380 kB
  SwapCached:41952 kB
  Active:  9860764 kB
  Inactive:6198668 kB
  Active(anon):4062144 kB
  Inactive(anon):   398972 kB
  Active(file):5798620 kB
  Inactive(file):  5799696 kB
  Unevictable:   0 kB
  Mlocked:   0 kB
  SwapTotal:  46873592 kB
  SwapFree:   46810712 kB
  Dirty:36 kB
  Writeback: 0 kB
  AnonPages:   4424756 kB
  Mapped:   940660 kB
  Shmem:40 kB
  Slab: 362344 kB
  SReclaimable: 350372 kB
  SUnreclaim:11972 kB
  KernelStack:2488 kB
  PageTables:68568 kB
  NFS_Unstable:  0 kB
  Bounce:0 kB
  WritebackTmp:  0 kB
  CommitLimit:55369932 kB
  Committed_AS:5740556 kB
  VmallocTotal:   34359738367 kB
  VmallocUsed:  350532 kB
  VmallocChunk:   34359384964 kB
  HardwareCorrupted: 0 kB
  HugePages_Total:   0
  HugePages_Free:0
  HugePages_Rsvd:0
  HugePages_Surp:0
  Hugepagesize:   2048 kB
  DirectMap4k:   10240 kB
  DirectMap2M:17299456 kB
 
  - Apache Tomcat 6.0.32:
  <!-- java arguments -->
  -XX:+DisableExplicitGC
  -XX:PermSize=512M
  -XX:MaxPermSize=512M
  -Xmx12G
  -Xms3G
  -XX:NewSize=128M
  -XX:MaxNewSize=128M
  -XX:+UseParNewGC
  -XX:+UseConcMarkSweepGC
  -XX:+CMSClassUnloadingEnabled
  -XX:CMSInitiatingOccupancyFraction=50
  -XX:GCTimeRatio=9
  -XX:MinHeapFreeRatio=25
  -XX:MaxHeapFreeRatio=25
  -verbose:gc
  -XX:+PrintGCTimeStamps
  -Xloggc:/opt/search/tomcat/logs/gc.log
 
  Our search schema is:
  - 5 servers with configuration above;
  - one tomcat6 application on each server with 6 solr applications.
 
  - Full addresses are:
  1) http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,...,
  http://192.168.1.85:8080/solr6
  2) http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,...,
  http://192.168.1.86:8080/solr12
  ...
  5) http://192.168.1.89:8080/solr25,http://192.168.1.89:8080/solr26,...,
  http://192.168.1.89:8080/solr30
  - On another server there is an additional common application with a
  shards parameter:
  <requestHandler name="search" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="shards">192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.89:8080/solr30</str>
      <int name="rows">10</int>
    </lst>
  </requestHandler>
  - schema and solrconfig are identical for all shards; for the first shard
  see the attachment;
  - these servers handle search only; indexing is done on another server
  (shards optimized to 2 segments are replicated with ssh/rsync scripts).

  So now the major problem is very poor distributed search performance.
  Take a look at, for example, these logs:
  This is on 30 shards:
  INFO: [] webapp=/solr
  

Re: how to : multicore setup with same config files

2011-11-23 Thread Vadim Kisselmann
Hi,
yes, see http://wiki.apache.org/solr/DistributedSearch
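
For example, one distributed query across the two cores from the quoted
setup below (host and port from the standard Jetty example) looks like:

http://localhost:8983/solr/core0/select?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1
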
Regards
Vadim


2011/11/2 Val Minyaylo vminya...@centraldesktop.com

 Have you tried to query multiple cores at same time?


 On 10/31/2011 8:30 AM, Vadim Kisselmann wrote:

 it works.
 it was one wrong placed backslash in my config;)
 sharing the config/schema files is not a problem.
 regards vadim


 2011/10/31 Vadim Kisselmann v.kisselm...@googlemail.com

  Hi folks,

 i have a small blockade in the configuration of an multicore setup.
 i use the latest solr version (4.0) from trunk and the example (with
 jetty).
 single core is running without problems.

 We assume that i have this structure:

  /solr-trunk/solr/example/multicore/

solr.xml

core0/

core1/


 /solr-data/

   /conf/

 schema.xml

 solrconfig.xml

   /data/

 core0/

   index

 core1/

   index


 I want to share the config files (same instanceDir but different docDir).

 How can I configure this so that it works (solrconfig.xml, solr.xml)?

 Do I need the directories for core0/core1 in solr-trunk/...?


 I found issues in Jira with old patches which unfortunately don't work.


 Thanks and Regards

 Vadim









Re: Huge Performance: Solr distributed search

2011-11-23 Thread Artem Lokotosh
 If the response time from each shard shows decent figures, then aggregator
 seems to be a bottleneck. Do you btw have a lot of concurrent users?
For now it is not a problem, but we expect from 1K to 10K concurrent users,
and maybe more.
On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan dmitry@gmail.com wrote:
 If the response time from each shard shows decent figures, then aggregator
 seems to be a bottleneck. Do you btw have a lot of concurrent users?

 On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh arco...@gmail.com wrote:

  Is this log from the frontend SOLR (aggregator) or from a shard?
 from aggregator

  Can you merge, e.g. 3 shards together or is it much effort for your team?
 Yes, we can merge. We'll try to do this and review how it will works
 Thanks, Dmitry

 Any another ideas?


-- 
Best regards,
Artem Lokotosh        mailto:arco...@gmail.com


Re: Integrating Surround Query Parser

2011-11-23 Thread Ahmet Arslan

 is this is the trunk of solr 4.0 ,
 can't i implement in solr 3.1 .?

The author of the patch would know the answer to this. But why not use trunk?


Re: Huge Performance: Solr distributed search

2011-11-23 Thread Robert Stewart
If you request 1000 docs from each shard, then the aggregator is really
fetching 30,000 total documents, which it must then merge (re-sort the
results, and take the top 1000 to return to the client).  It's possible that
SOLR's merging implementation needs optimizing, but it does not seem like
it could be that slow.  How big are the documents you return (how many
fields, avg KB per doc, etc.)?  I would take a look at the network to make
sure that is not the bottleneck, and also to make sure there is not
some underlying issue with making 30 concurrent HTTP requests from the
aggregator.  I am not an expert in Java, but under .NET there is a
setting that limits concurrent outgoing HTTP requests from a process
that must be overridden via configuration, otherwise by default it is
very limiting.

Does performance get much better if you only request top 100, or top
10 documents instead of top 1000?

What if you only request a couple fields, instead of fl=*?

What if you only search 10 shards instead of 30?

I would collect those numbers and try to determine if time increases
linearly or not as you increase shards and/or # of docs.
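
For example, comparing the 2000-row requests from the logs against a
minimal request on one shard (host taken from the setup described earlier)
would show how much the row count and fl=* contribute:

http://192.168.1.85:8080/solr1/select?q=(barium)&rows=10&fl=id,score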





On Wed, Nov 23, 2011 at 9:55 AM, Artem Lokotosh arco...@gmail.com wrote:
  If the response time from each shard shows decent figures, then aggregator
  seems to be a bottleneck. Do you btw have a lot of concurrent users?
 For now it is not a problem, but we expect from 1K to 10K concurrent users,
 and maybe more.
 On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan dmitry@gmail.com wrote:
 If the response time from each shard shows decent figures, then aggregator
 seems to be a bottleneck. Do you btw have a lot of concurrent users?

 On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh arco...@gmail.com wrote:

  Is this log from the frontend SOLR (aggregator) or from a shard?
 from aggregator

  Can you merge, e.g. 3 shards together or is it much effort for your team?
 Yes, we can merge. We'll try to do this and review how it will works
 Thanks, Dmitry

 Any another ideas?


 --
 Best regards,
 Artem Lokotosh        mailto:arco...@gmail.com



Problem with Solr logging under Jetty

2011-11-23 Thread Shawn Heisey
I am having a problem with jdk logging with Solr, using the jetty 
included with Solr.


In jetty.xml, I have the following defined:
<Call class="java.lang.System" name="setProperty">
  <Arg>java.util.logging.config.file</Arg>
  <Arg>etc/logging.properties</Arg>
</Call>

Contents of etc/logging.properties:
==
#  Logging level
.level=WARNING

# Write to a file
handlers = java.util.logging.FileHandler

# Write log messages in human readable format:
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.ConsoleHander.formatter = 
java.util.logging.SimpleFormatter


# Log to the log subdirectory, with log files named solr_log-n.log
java.util.logging.FileHandler.pattern = ./log/solr_log-%g.log
java.util.logging.FileHandler.append = true
java.util.logging.FileHandler.count = 10
java.util.logging.FileHandler.limit = 10485760
==

This actually all seems to work perfectly at first.  I changed the 
logging level to INFO in the solr admin, and it still seemed to work.  
Then at some point it stopped logging to solr_log-0.log and started 
logging to stderr.  My init script for Solr sends that to a file, but 
there's no log rotation on that file and it is overwritten whenever Solr 
is restarted.


With the same config, OS version, java version, and everything else I 
can think of, my test server is still working, but all of my production 
servers aren't.  It does seem to be related to changing the log level to 
INFO in the gui, but making that change doesn't make it fail right away.


What information can I provide to help troubleshoot this?

Thanks,
Shawn



Highlighting too much, indexing not seeing commas?

2011-11-23 Thread Robert Brown

Solr 3.3.0

I have a field/type indexed as below.

For a particular document the content of this field is 
'FreeBSD,Perl,Linux,Unix,SQL,MySQL,Exim,Postgresql,Apache,Exim'


Using eDismax, mm=1

When I query for...

+perl +(apache sql) +(linux unix)

Strangely, the highlighting is being returned as...

FreeBSD,<em>Perl,Linux,Unix,SQL,MySQL,Exim,Postgresql,Apache</em>,Exim


The full call is...

/select/?qt=core&q=%2Bperl%20%2B%28apache%20sql%29%20%2B%28linux%20unix%29&fl=skills&hl=true&hl.fl=skills&fq=id:2819615

I've checked the matching in the online analyser, which looks fine, so I
can't understand why the highlighting isn't correct. I would have
thought the highlighting would highlight in the same way the
analyser tool does?




Is it an index-time/field type issue, or am I missing something in the 
request?



Thanks in advance...



<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="skills" type="textgen" indexed="true" stored="true"
       multiValued="false"/>






--

IntelCompute
Web Design  Local Online Marketing

http://www.intelcompute.com



Re: Architecture and Capacity planning for large Solr index

2011-11-23 Thread Erick Erickson
Whether three shards will give you adequate throughput is not an
answerable question. Here's what I suggest. Get a single box
of the size you expect your servers to be and index 1/3 of your
documents on it. Run stress tests. That's really the only way to
be fairly sure your hardware is adequate.

As far as SANs are concerned, local storage is almost always
better. I'd advise against trying to share the index amongst
slaves, SAN or not. And using the SAN for each slave's copy
seems unnecessary with storage as cheap as it is, what
advantage do you see in this scenario?

Best
Erick

On Mon, Nov 21, 2011 at 3:18 PM, Rahul Warawdekar
rahul.warawde...@gmail.com wrote:
 Thanks Otis !
 Please ignore my earlier email which does not have all the information.

 My business requirements have changed a bit.
 We now need one year rolling data in Production, with the following details
    - Number of records - 1.2 million
    - Solr index size for these records comes to approximately 200 - 220
 GB. (includes large attachments)
    - Approx 250 users who will be searching the application, with a peak of
 1 search request every 40 seconds.

 I am planning to address this using Solr distributed search on a VMWare
 virtualized environment as follows.

 1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves
 (load balanced)

 2. Master configuration for each server is as follows
    - 4 CPUs
    - 16 GB RAM
    - 300 GB disk space

 3. Slave configuration for each server is as follows
    - 4 CPUs
    - 16 GB RAM
    - 150 GB disk space

 4. I am planning to use SAN instead of local storage to store Solr index.

 And my questions are as follows:
 Will 3 shards serve the purpose here?
 Is SAN a good option for storing the Solr index, given the high index volume?




 On Mon, Nov 21, 2011 at 3:05 PM, Rahul Warawdekar 
 rahul.warawde...@gmail.com wrote:

 Thanks !

 My business requirements have changed a bit.
 We need one year rolling data in Production.
 The index size for the same comes to approximately 200 - 220 GB.
 I am planning to address this using Solr distributed search as follows.

 1. Whole index to be split up between 3 shards, with 3 masters and 6
 slaves (load balanced)
 2. Master configuration
  will be 4 CPU



 On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

 Hi Rahul,

 This is unfortunately not enough information for anyone to give you very
 precise answers, so I'll just give some rough ones:

 * best disk - SSD :)
 * CPU - multicore, depends on query complexity, concurrency, etc.
 * sharded search and failover - start with SolrCloud, there are a couple
 of pages about it on the Wiki and
 http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/


 
 From: Rahul Warawdekar rahul.warawde...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Tuesday, October 11, 2011 11:47 AM
 Subject: Architecture and Capacity planning for large Solr index
 
 Hi All,
 
 I am working on a Solr search based project, and would highly appreciate
 help/suggestions from you all regarding Solr architecture and capacity
 planning.
 Details of the project are as follows
 
 1. There are 2 databases from which, data needs to be indexed and made
 searchable,
                 - Production
                 - Archive
 2. Production database will retain 6 months old data and archive data
 every
 month.
 3. Archive database will retain 3 years old data.
 4. Database is SQL Server 2008 and Solr version is 3.1
 
 Data to be indexed contains a huge volume of attachments (PDF, Word,
 excel
 etc..), approximately 200 GB per month.
 We are planning to do a full index every month (multithreaded) and
 incremental indexing on a daily basis.
 The Solr index size is coming to approximately 25 GB per month.
 
 If we were to use distributed search, what would be the best
 configuration
 for Production as well as Archive indexes ?
 What would be the best CPU/RAM/Disk configuration ?
 How can I implement failover mechanism for sharded searches ?
 
 Please let me know in case I need to share more information.
 
 
 --
 Thanks and Regards
 Rahul A. Warawdekar
 
 
 




 --
 Thanks and Regards
 Rahul A. Warawdekar




 --
 Thanks and Regards
 Rahul A. Warawdekar



Re: Problem with pdf files indexing

2011-11-23 Thread Erick Erickson
The first thing I'd do is go over to the server and
try using the admin interface to query on *:*. If that
returns nothing, look at the admin/schema browser page
and see what's in your fields, if anything. Then go back
to SolrJ and work on the query part sans the indexing
part once you're sure you have data to work with.

Also, do your Solr logs show anything?

Best
Erick

On Tue, Nov 22, 2011 at 4:13 AM, Dali medalibenmans...@gmail.com wrote:
 Hi! I'm using Solr version 3.3 and I have some pdf files which I want to
 index. I followed the instructions from the wiki page:
 http://wiki.apache.org/solr/ExtractingRequestHandler
 The problem is that I can add my documents to Solr but I cannot query
 them. Here is what I have:

 *solrconfig.xml*:
 <requestHandler name="/update/extract"
                 startup="lazy"
                 class="solr.extraction.ExtractingRequestHandler">
   <lst name="defaults">
     <str name="fmap.content">text</str>
     <str name="lowernames">true</str>
     <str name="uprefix">ignored_</str>
     <str name="captureAttr">true</str>
     <str name="fmap.a">links</str>
     <str name="fmap.div">ignored_</str>
   </lst>
 </requestHandler>

 *schema.xml*:
 <field name="title" type="string" indexed="true" stored="true"/>
 <field name="author" type="string" indexed="true" stored="true"/>
 <field name="text" type="text_general" indexed="true" stored="true"
        multiValued="true"/>

 *data-config.xml*:
 ...
 <dataSource type="BinFileDataSource" name="ds-file"/>
 ...
 <entity processor="TikaEntityProcessor" dataSource="ds-file"
         url="../${document.filename}">
   <field column="Author" name="author" meta="true"/>
   <field column="title" name="title" meta="true"/>
   <field column="text" name="text"/>
 </entity>
 ...

 I use SolrJ to add documents as follows:
 SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
 ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
 up.addFile(new File("d:\\test.pdf"));
 up.setParam("literal.id", "test");
 up.setParam("extractOnly", "true");
 server.commit();
 NamedList result = server.request(up);
 System.out.println("Result: " + result);  // can display information about test.pdf
 QueryResponse rsp = server.query(new SolrQuery("*:*"));
 System.out.println("rsp: " + rsp); // returns nothing

 Any suggestion?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Problem-with-pdf-files-indexing-tp3527202p3527202.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Re : AW: How to select all docs of 'today' ?

2011-11-23 Thread Erick Erickson
One subtlety to note is that caching is messed up by this form
since NOW evaluates to the second, and submitting two
successive queries exactly like this won't re-use the cache. On
a query like this it may not matter unless you're paging

But on filter queries, its a good habit to cultivate to write
something like
[NOW/DAY TO NOW/DAY+1DAY]
which will be reused until midnight tonight...
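
For example, with the fetch-time field from the original question below:

fq=fetch-time:[NOW/DAY TO NOW/DAY+1DAY]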

Best
Erick

On Tue, Nov 22, 2011 at 12:02 PM, Danicela nutch
danicela-nu...@mail.com wrote:
 Thanks it works.

  All this is based on the fact that NOW/DAY means the beginning of the day.

 - Original Message -
 From: sebastian.pet...@tib.uni-hannover.de
 Sent: 22.11.11 16:46
 To: solr-user@lucene.apache.org
 Subject: AW: How to select all docs of 'today' ?

  Hi,

  fetch-time:[NOW/DAY TO NOW] should do it.

  Best
  Sebastian

  -Original Message-
  From: Danicela nutch [mailto:danicela-nu...@mail.com]
  Sent: Tuesday, 22 November 2011 16:08
  To: solr-user@lucene.apache.org
  Subject: How to select all docs of 'today' ?

  Hi, I have a fetch-time (date) field to know when the documents were
  fetched. I want to make a query to get all documents fetched today.
  I tried: fetch-time:NOW/DAY but it always returns 0.
  fetch-time:[NOW/DAY TO NOW/DAY] (it returns 0).
  fetch-time:[NOW/DAY-1DAY TO NOW/DAY] but it returns documents fetched
  yesterday. fetch-time:[NOW/DAY-1HOUR TO NOW/DAY] but it's incorrect too.
  Do you have any idea? Thanks in advance.



Re: Problems with AutoSuggest feature(Terms Components)

2011-11-23 Thread Erick Erickson
I'll have to defer that to one of the sharding experts.

Best
Erick

On Tue, Nov 22, 2011 at 1:28 PM, mechravi25 mechrav...@yahoo.co.in wrote:
 Hi Erick,

 Thanks for your reply. I would like to know all the options that can be
 given under the defaults section and how they can be overridden. Is there
 any documentation available in the Solr forum? We tried searching and
 weren't able to find any.

 My exact scenario is that I have one master core which has many underlying
 shard cores (distributed architecture). I want terms.limit to default to 10
 in the underlying shard cores. When I hit the master core, it will in turn
 hit the underlying shard cores. At this point, the terms.limit which has
 been passed to the master core has to be passed on to these underlying
 shard cores, overriding the default value set. Can you please suggest the
 definition of the terms component for the underlying shard cores.

 Regards,
 Sivaganesh


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Problems-with-AutoSuggest-feature-Terms-Components-tp3512734p3528597.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: FunctionQuery score=0

2011-11-23 Thread Chris Hostetter

: Which answers my query needs. BUT, my boost function actually changes some
: of the results to be of score 0, which I want to be excluded from the
: result set.

Ok .. so the crux of the issue is that your boost function results in a 
value of 0 for some documents, and you would like those documents excluded 
from your results...

eqsim(alltokens,xyz)

eqsim is not a function that ships with Solr (as far as i know) so i'm 
guessing it's something custom .. can you clarify what it does?

: 2) This is why I used the frange query to solve the issue with the score 0:
: q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 title^0.08
: categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '})
: 
: But this time, the remaining results lost their *boosted* scores, and
: therefore the sort by score got all mixed up.

correct: frange produces a ConstantScoreQuery, it can only be used to 
filter documents based on wether the function it wraps falls in/out of the 
range.

: 3) I assume I can use filter queries, but from my understanding FQs
: actually perform another query before the main one and these queries are
: expensive in time and I would like to avoid it if possible.

Unless you actually see noticeable performance problems I wouldn't assume 
it will be an issue -- test first, get it working, then optimize if it's 
too slow.  For most people the overhead of the fq won't be a factor.  

One option you might consider is the cache=false local param which tells 
Solr not to cache the fq (handy if you know the query you are 
filtering on is not going to be reused much) and since it's not being 
cached, Solr will execute it in parallel with the main query and ignore 
anything that it already knows isn't going to matter in the final query.

In your case however, you can already optimize the fq solution a bit 
because what you really need to filter out isn't documents matching your 
main query with a score less than zero; that set is the same as the set of 
documents for whom your eqsim function returns 0, so you can just use 
*that* in your fq.  Something like this should work...

q={!edismax ... boost=$eqsim}
fq={!frange l=0 incl=false v=$eqsim}
eqsim=eqsim(alltokens,xyz)

...but there may still be ways to clean that up and make it faster 
depending on what exactly your eqsim function does (ie: there may be a 
simple query that can be faster than that frange to identify the docs 
that get non-zero values from that function).

-Hoss


Re: Solr dismax scoring and weight

2011-11-23 Thread darul
Thanks a lot Erick for this explanation. Do you mean words are stored in
bytes, that's it ? 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-dismax-scoring-and-weight-tp3490096p3531917.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: strange behavior of scores and term proximity use

2011-11-23 Thread Ariel Zerbib
I tested with the version 4.0-2011-11-04_09-29-42.

Ariel


2011/11/17 Erick Erickson erickerick...@gmail.com

 Hmmm, I'm not seeing similar behavior on a trunk from today, when did
 you get your copy?

 Erick

 On Wed, Nov 16, 2011 at 2:06 PM, Ariel Zerbib ariel.zer...@gmail.com
 wrote:
  Hi,
 
  For this term proximity query: ab_main_title_l0:"to be or not to be"~1000
 
 
 http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000&sort=score+desc&start=0&rows=3&fl=ab_main_title_l0%2Cscore%2Cid&debugQuery=true
 
  The third first results are the following one:
 
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">5</int>
   </lst>
   <result name="response" numFound="318" start="0" maxScore="3.0814114">
     <doc>
       <long name="id">2315190010001021</long>
       <arr name="ab_main_title_l0">
         <str>og54ct8n To be or not to be a Jew. 5w8ojsx2</str>
       </arr>
       <float name="score">3.0814114</float>
     </doc>
     <doc>
       <long name="id">2313006480001021</long>
       <arr name="ab_main_title_l0">
         <str>og54ct8n To be or not to be 5w8ojsx2</str>
       </arr>
       <float name="score">3.0814114</float>
     </doc>
     <doc>
       <long name="id">2356410250001021</long>
       <arr name="ab_main_title_l0">
         <str>og54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2</str>
       </arr>
       <float name="score">3.0814114</float>
     </doc>
   </result>
   <lst name="debug">
     <str name="rawquerystring">ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000</str>
     <str name="querystring">ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000</str>
     <str name="parsedquery">PhraseQuery(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000)</str>
     <str name="parsedquery_toString">ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000</str>
     <lst name="explain">
       <str name="2315190010001021">
 5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 378403) [DefaultSimilarity], result of:
   5.337161 = fieldWeight in 378403, product of:
     0.57735026 = tf(freq=0.3334), with freq of:
       0.3334 = phraseFreq=0.3334
     29.581549 = idf(), sum of:
       1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       4.3826413 = idf(docFreq=112108, maxDocs=3301436)
       6.3982043 = idf(docFreq=14937, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
     0.3125 = fieldNorm(doc=378403)
       </str>
       <str name="2313006480001021">
 9.244234 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 482807) [DefaultSimilarity], result of:
   9.244234 = fieldWeight in 482807, product of:
     1.0 = tf(freq=1.0), with freq of:
       1.0 = phraseFreq=1.0
     29.581549 = idf(), sum of:
       1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       4.3826413 = idf(docFreq=112108, maxDocs=3301436)
       6.3982043 = idf(docFreq=14937, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
     0.3125 = fieldNorm(doc=482807)
       </str>
       <str name="2356410250001021">
 5.337161 = (MATCH) weight(ab_main_title_l0:"og54ct8n to be or not to be 5w8ojsx2"~1000 in 1317563) [DefaultSimilarity], result of:
   5.337161 = fieldWeight in 1317563, product of:
     0.57735026 = tf(freq=0.3334), with freq of:
       0.3334 = phraseFreq=0.3334
     29.581549 = idf(), sum of:
       1.0012436 = idf(docFreq=3297332, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       4.3826413 = idf(docFreq=112108, maxDocs=3301436)
       6.3982043 = idf(docFreq=14937, maxDocs=3301436)
       3.0405464 = idf(docFreq=429046, maxDocs=3301436)
       5.3583193 = idf(docFreq=42257, maxDocs=3301436)
       1.0017256 = idf(docFreq=3295743, maxDocs=3301436)
     0.3125 = fieldNorm(doc=1317563)
       </str>
     </lst>
   </lst>
 </response>
 
  The used version is a 4.0 October snapshot.
 
  I have 2 questions about the result:
  - Why are the debug print and the scores in the result different?
  - What is the expected behavior of this kind of term proximity query?
   - The debug scores seem to be well ordered but the result scores
  seem to be wrong.
 
 
  Thanks,
  Ariel
 



Re: Separate ACL and document index

2011-11-23 Thread Robert Stewart
I have used two different ways:

1) Store mapping from users to documents in some external database
such as MySQL.  At search time, lookup mapping for user to some unique
doc ID or some group ID, and then build query or doc set which you can
cache in SOLR process for some period.  Then use that as a filter in
your search.  This is more involved approach but better if you have
lots of ACLs per user, but it is non-trivial to implement it well.  I
used this in a system with over 100 million docs, and approx. 20,000
ACLs per user.  The ACL mapped user to a set of group IDs, and each
group could have 10,000+ documents.

2) Generate a query filter that you pass to SOLR as part of the
search.  Potentially it could be a pretty large query if user has
granular ACL over many documents or groups. I've seen it work ok with
up to 1000 or so ACLs per user query.  So you build that filter query
from the client using some external database to lookup user ACLs
before sending request to SOLR.
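
For example, the per-request part of approach 2 with SolrJ amounts to
something like this (a sketch; the field name and group IDs are
illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

// "server" is an existing SolrServer instance
SolrQuery query = new SolrQuery("user search terms");
// group IDs come from the external ACL database lookup for this user
query.addFilterQuery("group_id:(12 OR 57 OR 344)");
QueryResponse rsp = server.query(query);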

Bob


On Tue, Nov 22, 2011 at 10:48 PM, Floyd Wu floyd...@gmail.com wrote:
 Hi there,

 Is it possible to separate ACL index and document index and achieve to
 search by user role in SOLR?

 Currently my implementation is to index ACL with document, but the
 document itself change frequently. I have to perform rebuild index
 every time when ACL change. It's heavy for whole system due to
 document are so many and content are huge.

 Do you guys have any solution to solve this problem. I've been read
 mailing list for a while. Seem there is not suitable solution for me.

 I want user searches result only for him according to his role but I
 don't want to re-index document every time when document's ACL change.

 To my knowledge, is this possible to perform a join like database to
 achieve this? How and possible?

 Thanks

 Floyd



trouble with CollationKeyFilter

2011-11-23 Thread Michael Sokolov
I'm using CollationKeyFilter to sort my documents using the Unicode 
root collation, and my documents do appear to be getting sorted 
correctly, but I'm getting weird results when performing range filtering 
using the sort key field.  For example:


ifp_sortkey_ls:["youth culture" TO "youth culture"]

and

ifp_sortkey_ls:{"youth culture" TO "youth culture"}

both return 0 hits

but

ifp_sortkey_ls:"youth culture"

returns 1 hit

It seems as if any query using the ifp_sortkey_ls:[A to B] syntax is 
acting as if the terms A, B are greater than all documents whose 
sortkeys start with an A-Z character, but less than a few documents that 
have greek letters as their first characters of their sortkeys.


the analysis chain for ifp_sortkey_ls is:

<fieldType name="sortkey" stored="false" indexed="true"
           class="solr.TextField" positionIncrementGap="100"
           omitNorms="true" omitTermFreqAndPositions="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.CollationKeyFilterFactory"
            language=""
            strength="primary"/>
  </analyzer>
</fieldType>

Does anyone have any idea what might be going on here?



Re: FunctionQuery score=0

2011-11-23 Thread John
Thanks Hoss,

I will give those a try and let you know.

Cheers.

On Wed, Nov 23, 2011 at 8:35 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : Which answers my query needs. BUT, my boost function actually changes
 some
 : of the results to be of score 0, which I want to be excluded from the
 : result set.

 Ok .. so the crux of the issue is that your boost function results in a
 value of 0 for some documents, and you would like those documents excluded
 from your results...

eqsim(alltokens,xyz)

 eqsim is not a function thta ships with Solr (as far as i know) so i'm
 guessing it's something custom .. can you clarify what it does?

 : 2) This is why I used the frange query to solve the issue with the score
 0:
 : q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02
 title^0.08
 : categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '})
 :
 : But this time, the remaining results lost their *boosted* scores, and
 : therefore the sort by score got all mixed up.

 correct: frange produces a ConstantScoreQuery, it can only be used to
 filter documents based on wether the function it wraps falls in/out of the
 range.

 : 3) I assume I can use filter queries, but from my understanding FQs
 : actually perform another query before the main one and these queries are
 : expensive in time and I would like to avoid it if possible.

 Unless you actaully see notisable performance problems I wouldn't assume
 it will be an issue -- test first, get it working, then optimize if it's
 too slow.  For most people the overhead of the fq won't a factor.

 One option you might consider is the cache=false local param which tells
 Solr not to cache the fq (handy if you know the query you are
 filtering on is not going to be reused much) and since it's not being
 cached, Solr will execute it in parallel with the main query and ignore
 anything that it already knows isn't going to matter in the final query.

 In your case however, you can already optimize the fq solution a bit
 because what you really need to filter out isn't documents matching your
 main query with a score less then zero; that set is the same as the set of
 documents for whom your eqsim function returns 0, so you can just use
 *that* in your fq.  Something like this should work...

q={!edismax ... boost=$eqsim}
fq={!frange l=0 incl=false v=$eqsim}
eqsim=eqsim(alltokens,xyz)

 ...but there may still be ways to clean that up and make it faster
 depending on what exactly your eqsim function does (ie: there may be a
 simple query that can be faster then that frange to identify the docs
 that get non-zero values from that function.

 -Hoss



WordDelimiterFilter MultiPhraseQuery case insesitive Issue

2011-11-23 Thread Uomesh
Hi,

Case insensitive search is not working if I use WordDelimiterFilter with
splitOnCaseChange=1.

I am searching for the word norton and here is the result:

norton: returns results
Norton: returns results
but
nOrton: no results

I want nOrton to return results as well. Please help. Below is my field type.

<fieldType autoGeneratePhraseQueries="true" class="solr.TextField"
           name="text" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true"
            ignoreCase="true" words="stopwords.txt"/>
    <filter catenateAll="0" catenateNumbers="1" catenateWords="1"
            class="solr.WordDelimiterFilterFactory" generateNumberParts="1"
            generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true"
            ignoreCase="true" words="stopwords.txt"/>
    <filter catenateAll="0" catenateNumbers="0" catenateWords="0"
            class="solr.WordDelimiterFilterFactory" generateNumberParts="1"
            generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.SynonymFilterFactory" expand="true"
            ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/WordDelimiterFilter-MultiPhraseQuery-case-insesitive-Issue-tp3532209p3532209.html
Sent from the Solr - User mailing list archive at Nabble.com.


Synonyms 1 fetching 2001, how to avoid

2011-11-23 Thread RaviWhy
Hi,

I am searching on movie titles, with a synonyms text file containing the
mapping 1,one.

With this, when I search for '1' I am expecting '1 in kind', but I am
getting results which have titles like 2001: My year.

I am using a query-time analyzer with

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>

I am going to try with expand=false. But is there anything else I need to
look at?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Synonyms-1-fetching-2001-how-to-avoid-tp3532398p3532398.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: WordDelimiterFilter MultiPhraseQuery case insesitive Issue

2011-11-23 Thread Shawn Heisey

On 11/23/2011 2:54 PM, Uomesh wrote:

Hi,

case insesitive search is not working if I use WordDelimiterFilter
splitOnCaseChange=1

I am searching for word norton and here is result

norton: returns result
Norton: returns result
but
nOrton: no results

I want nOrton should results. Please help. below is my field type.


Try adding preserveOriginal=1 to your WDF options.  You may not need 
to actually reindex before you see results, but it would be a good idea 
to reindex.  This will result in an increase in your index size.
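
i.e., on the index-side analyzer something like this (based on the field
type you posted, with the one attribute added):

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>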


Thanks,
Shawn



Re: Dismax, pf and qf

2011-11-23 Thread Chris Hostetter

: Now there are some scenario when I want just the pf active (without
: qf). Othen then surrounding my query with double quotes, is there
: another way to do that? I mean, i would like to do the following
: 
: _query_:"{!dismax pf=author^100}vincent kwner"

...nope ... the pf is just a boosting factor to improve scores, there's no 
way to force a match in the pf fields.

wrapping the input in quotes and using qf is the only way I know of to get 
what you are describing.
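
i.e., something along these lines (a sketch reusing the field and terms
from the question; the embedded phrase quotes are what force the match):

q={!dismax qf=author^100 v='"vincent kwner"'}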


-Hoss


Re: trouble with CollationKeyFilter

2011-11-23 Thread Robert Muir
hi,

locale sensitive range queries don't work with these filters, only sort,
although erick erickson has a patch that will enable this (the lowercasing
wildcards patch, then you could add this filter to your multiterm chain).

separately locale range queries and sort both work easily on trunk (with
binary terms)... just use collationfield or icucollationfield if you are
able to use trunk...

otherwise for 3.x I think that patch is pretty close any day now, so we can
add an example for localized range queries that makes use of it.
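
on trunk that looks roughly like this in schema.xml (a sketch; the type
name is illustrative, and ICUCollationField ships in the analysis-extras
contrib):

<fieldType name="collatedROOT" class="solr.ICUCollationField"
           locale="" strength="primary"/>
<field name="ifp_sortkey_ls" type="collatedROOT" indexed="true" stored="false"/>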

On Nov 23, 2011 4:39 PM, Michael Sokolov soko...@ifactory.com wrote:

 I'm using CollectionKeyFilter to sort my documents using the Unicode root
collation, and my documents do appear to be getting sorted correctly, but
I'm getting weird results when performing range filtering using the sort
key field.  For example:

 ifp_sortkey_ls:[youth culture TO youth culture]

 and

 ifp_sortkey_ls:{youth culture TO youth culture}

 both return 0 hits

 but

 ifp_sortkey_ls:youth culture

 returns 1 hit

 It seems as if any query using the ifp_sortkey_ls:[A to B] syntax is
acting as if the terms A, B are greater than all documents whose sortkeys
start with an A-Z character, but less than a few documents that have greek
letters as their first characters of their sortkeys.

 the analysis chain for ifp_sortkey_ls is:

 <fieldType name="sortkey" stored="false" indexed="true"
            class="solr.TextField" positionIncrementGap="100"
            omitNorms="true" omitTermFreqAndPositions="true">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!-- The TrimFilter removes any leading or trailing whitespace -->
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.CollationKeyFilterFactory"
             language=""
             strength="primary"/>
   </analyzer>
 </fieldType>

 Does anyone have any idea what might be going on here?



Re: Autocomplete(terms) performance problem

2011-11-23 Thread solr-ra
I have now enabled infix search, so you can do both edge and infix
searches. Type francisco peak in the edge field and, in the infix input
field below it, try cisco peak; both will get you to the same
selections.

Please give it a try now:

http://solr-ra.tgels.org/solr-ra-autocomplete.jsp



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3532656.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Separate ACL and document index

2011-11-23 Thread Floyd Wu
Thank you for sharing. My current solution is similar to 2).
But my problem is that the ACL is early-binding (which means I build the
index with the ACL embedded alongside the document content). I don't want
to rebuild the full index (a lucene/solr Document with PDF content and ACL)
when the front end changes only permission settings.

Seems solution 2) has the same problem.

Floyd


2011/11/24 Robert Stewart bstewart...@gmail.com:
 I have used two different ways:

 1) Store mapping from users to documents in some external database
 such as MySQL.  At search time, lookup mapping for user to some unique
 doc ID or some group ID, and then build query or doc set which you can
 cache in SOLR process for some period.  Then use that as a filter in
 your search.  This is more involved approach but better if you have
 lots of ACLs per user, but it is non-trivial to implement it well.  I
 used this in a system with over 100 million docs, and approx. 20,000
 ACLs per user.  The ACL mapped user to a set of group IDs, and each
 group could have 10,000+ documents.

 2) Generate a query filter that you pass to SOLR as part of the
 search.  Potentially it could be a pretty large query if user has
 granular ACL over may documents or groups.  I've seen it work ok with
 up to 1000 or so ACLs per user query.  So you build that filter query
 from the client using some external database to lookup user ACLs
 before sending request to SOLR.

 Bob


 On Tue, Nov 22, 2011 at 10:48 PM, Floyd Wu floyd...@gmail.com wrote:
 Hi there,

 Is it possible to separate ACL index and document index and achieve to
 search by user role in SOLR?

 Currently my implementation is to index ACL with document, but the
 document itself change frequently. I have to perform rebuild index
 every time when ACL change. It's heavy for whole system due to
 document are so many and content are huge.

 Do you guys have any solution to solve this problem. I've been read
 mailing list for a while. Seem there is not suitable solution for me.

 I want user searches result only for him according to his role but I
 don't want to re-index document every time when document's ACL change.

 To my knowledge, is this possible to perform a join like database to
 achieve this? How and possible?

 Thanks

 Floyd




Re: Solr real time update

2011-11-23 Thread yu shen
Thanks for the information. I will play with it.

Spark

2011/11/23 Nagendra Nagarajayya nnagaraja...@transaxtions.com

 Spark:

 Solr with RankingAlgorithm is not a plugin but a change of search library
 from Lucene to RankingAlgorithm. Here is more info on the changes you will
 need to make to your solrconfig.xml:

 http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search

 Regards,

 - Nagendra Nagrajayya
 http://solr-ra.tgels.org/
 http://rankingalgorithm.tgels.org/


 On 11/22/2011 5:40 PM, yu shen wrote:

 Hi Nagarajayya,

 Thanks for your information. Do I need to change any configuration of my
 current solr server to integrate your plugin?

 Spark


 2011/11/22 Nagendra Nagarajayya nnagaraja...@transaxtions.com:

  Yu:

 To get Near Real Time update in Solr 1.4.1 you will need to use Solr
 1.4.1
 with RankingAlgorithm. This allows you to update documents in near real
 time. You can download and give this a try from here:

 http://solr-ra.tgels.org/

 Regards,

 - Nagendra Nagarajayya
 http://solr-ra.tgels.org/
 http://rankingalgorithm.tgels.org/



 On 11/21/2011 9:47 PM, yu shen wrote:

  Hi All,

 After some study, I used the snippet below. The documents seem to get
 updated, but it still takes a long time. It feels like the parameter does
 not take effect. Any comments?
 UpdateRequest req = new UpdateRequest();
 req.add(solrDocs);
 req.setCommitWithin(5000);
 req.setParam("commitWithin", "5000");
 req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 req.process(SOLR_SERVER);
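
 Note that setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true) adds
 commit=true (with waitFlush and waitSearcher) to the request, so every
 process() call performs a full commit - which would explain the slow
 response. A minimal sketch relying on commitWithin alone, assuming the
 server honors it (the quoted posts below suggest it may not in 1.4.1):

 UpdateRequest req = new UpdateRequest();
 req.add(solrDocs);
 // Ask the server to commit within 5 seconds; no explicit commit is
 // requested, so process() should return quickly.
 req.setCommitWithin(5000);
 req.process(SOLR_SERVER);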

 2011/11/22 yu shen shenyu...@gmail.com:

  Hi All,

 I am trying to do a 'nearly real time update' to Solr. My Solr version is
 1.4.1. I read the Solr CommitWithin wiki
 (http://wiki.apache.org/solr/CommitWithin) and a related thread
 (http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.html),
 mostly on the difficulty of doing this.

 My issue is I tried the code snippet in the wiki:

 UpdateRequest req = new UpdateRequest();
 req.add(mySolrInputDocument);
 req.setCommitWithin(1);
 req.process(server);

 But my index did not get updated unless I called SOLR_SERVER.commit()
 explicitly. The latter call takes more than 1 minute on average to
 return.

 Can I do a real-time update on Solr 1.4.1? Could someone show a workable
 code snippet?

 Spark






Re: trouble with CollationKeyFilter

2011-11-23 Thread Michael Sokolov

Thanks for confirming that, and laying out the options, Robert.

-Mike

On 11/23/2011 9:03 PM, Robert Muir wrote:

hi,

locale sensitive range queries don't work with these filters, only sort,
although erick erickson has a patch that will enable this (the lowercasing
wildcards patch, then you could add this filter to your multiterm chain).

separately locale range queries and sort both work easily on trunk (with
binary terms)... just use collationfield or icucollationfield if you are
able to use trunk...

otherwise for 3.x I think that patch is pretty close any day now, so we can
add an example for localized range queries that makes use of it.
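
To see why raw-text endpoints land in the wrong place against such a field:
CollationKeyFilter indexes an encoded form of the binary collation key, not
the original text. A minimal sketch of the key generation with plain
java.text.Collator, matching the language="" / strength="primary" config
quoted below (the class is illustrative, not Solr code):

import java.text.Collator;
import java.util.Locale;

public class SortkeyDemo {
    public static void main(String[] args) {
        // Root locale, primary strength - as in the schema below.
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY);

        // The filter indexes an encoded form of this binary key, so a raw
        // range endpoint like "youth culture" is compared byte-for-byte
        // against encoded keys and sorts into the wrong place.
        byte[] key = collator.getCollationKey("youth culture").toByteArray();
        System.out.println("collation key length: " + key.length);
    }
}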

On Nov 23, 2011 4:39 PM, Michael Sokolov soko...@ifactory.com wrote:

I'm using CollationKeyFilter to sort my documents using the Unicode root
collation, and my documents do appear to be getting sorted correctly, but
I'm getting weird results when performing range filtering using the sort
key field. For example:

ifp_sortkey_ls:["youth culture" TO "youth culture"]

and

ifp_sortkey_ls:{"youth culture" TO "youth culture"}

both return 0 hits

but

ifp_sortkey_ls:"youth culture"

returns 1 hit

It seems as if any query using the ifp_sortkey_ls:[A TO B] syntax is
acting as if the terms A and B are greater than all documents whose sortkeys
start with an A-Z character, but less than a few documents whose sortkeys
begin with Greek letters.

the analysis chain for ifp_sortkey_ls is:

<fieldType name="sortkey" stored="false" indexed="true"
class="solr.TextField" positionIncrementGap="100" omitNorms="true"
omitTermFreqAndPositions="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- The TrimFilter removes any leading or trailing whitespace -->
<filter class="solr.TrimFilterFactory" />
<filter class="solr.CollationKeyFilterFactory"
language=""
strength="primary"
/>
</analyzer>
</fieldType>

Does anyone have any idea what might be going on here?





Re: need a way so that solr return result for misspelled terms

2011-11-23 Thread meghana
We are using the Solr query parser... we just need some schema and/or
solrconfig configuration to do the misspelled-term search and return results.
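
One common schema-level technique for this (a suggestion, not something
discussed in this thread) is phonetic matching: Solr's PhoneticFilterFactory
can index Double Metaphone codes so that misspellings that sound alike still
match. A minimal sketch of the underlying encoder from commons-codec:

import org.apache.commons.codec.language.DoubleMetaphone;

public class PhoneticMatchDemo {
    public static void main(String[] args) {
        DoubleMetaphone dm = new DoubleMetaphone();
        // Differently spelled forms encode to the same phonetic code, so a
        // query for the misspelling still matches the indexed term.
        System.out.println(dm.doubleMetaphone("Smith"));                  // SM0
        System.out.println(dm.doubleMetaphone("Smyth"));                  // SM0
        System.out.println(dm.isDoubleMetaphoneEqual("Smith", "Smyth"));  // true
    }
}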

--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3532979.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need a way so that solr return result for misspelled terms

2011-11-23 Thread meghana
This seems good. If possible, I want to do it through Solr features /
configuration changes, but I can go with this approach if that is not
possible or not compatible enough.

Thanks.
iorixxx wrote
 
 I have configured the spellchecker component in my Solr. It works with a
 custom request handler (however, it's not working with the standard
 request handler, but that is not the concern right now). But it returns
 suggestions for the matching spellings; instead, we want to directly get
 results for the related spellings of the misspelled search term.
 
 You might be interested in this :
 
 http://sematext.com/products/dym-researcher/index.html
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3532983.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: DIH Strange Problem

2011-11-23 Thread Husain, Yavar
Hi

Thanks for your replies.

I carried out these two steps (they did not solve my problem):

1. I tried setting responseBuffering to adaptive. It did not work.
2. To check the database connection, I wrote a simple Java program to connect
to the database and fetch some results with the same driver that I use for
Solr. It worked, so the problem does not seem to be with the connection.
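
A minimal sketch of such a connection test, with the responseBuffering
property from the FAQ added to the URL (the table name is an assumption;
this is illustrative, not the exact program used):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnTest {
    public static void main(String[] args) throws Exception {
        // responseBuffering=adaptive keeps the sqljdbc driver from
        // buffering the entire result set in memory.
        String url = "jdbc:sqlserver://127.0.0.1:1433;"
                + "databaseName=SampleOrders;responseBuffering=adaptive";
        try (Connection conn = DriverManager.getConnection(url, "testUser", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT TOP 5 * FROM Orders")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}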

Now I am stuck where the Tomcat log says "Creating a connection for entity" 
and then does nothing. After this log line we usually get "getConnection() 
took x milliseconds", but I don't get that; I can just see the time 
increasing with no records being fetched.

Original Problem listed again:


I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing 
data. Indexing and everything else was working perfectly fine. However, today 
when I started a full import again, Solr halted/got stuck at the line 
"Creating a connection for entity". There are no further messages after that. 
I can see that DIH is busy, and on the DIH console I can see "A command is 
still running"; I can also see total rows fetched = 0 and total requests made 
to datasource = 1, and the time keeps increasing, but it is not doing 
anything. This is the exact configuration that worked for me before. I am not 
really able to understand the problem here. Also, in the index directory where 
I am storing the index there are just 3 files: 2 segment files + 1 
lucene*-write.lock file.
...
data-config.xml:

<dataSource type="JdbcDataSource"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders"
user="testUser" password="password"/>
<document>
.
.

Logs:

INFO: Server startup in 2016 ms
Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter 
doFullImport
INFO: Starting Full Import
Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 
QTime=11 
Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter 
readIndexerProperties
INFO: Read dataimport.properties
Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
   
commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6]
Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1322041133719
Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity SampleText with URL: 
jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, November 23, 2011 7:36 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH Strange Problem

On 11/23/2011 5:21 AM, Chantal Ackermann wrote:
 Hi Yavar,

 my experience with similar problems was that there was something wrong
 with the database connection or the database.

 Chantal

It's also possible that your JDBC driver might be trying to buffer the 
entire result set.  There's a link on the wiki specifically for this 
problem on MS SQL server.  Hopefully it's that, but Chantal could be 
right too.

http://wiki.apache.org/solr/DataImportHandlerFaq

Here's the URL to the specific paragraph, but it's likely that it won't 
survive the email trip in a clickable form:

http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F

Thanks,
Shawn

**
 
This message may contain confidential or proprietary information intended only 
for the use of the 
addressee(s) named above or may contain information that is legally privileged. 
If you are 
not the intended addressee, or the person responsible for delivering it to the 
intended addressee, 
you are hereby notified that reading, disseminating, distributing or copying 
this message is strictly 
prohibited. If you have received this message by mistake, please immediately 
notify us by 
replying to the message and delete the original message and any copies 
immediately thereafter. 

Thank you.- 
**
FAFLD