Does ContentStreamDataSource support delta import?

2012-06-29 Thread jueljust
ContentStreamDataSource works fine with the full-import command,
but I can't make it work with the delta-import command; I have to use
full-import with no clean instead.
Does ContentStreamDataSource support delta import?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-ContentStreamDataSource-support-delta-import-tp3992008.html
Sent from the Solr - User mailing list archive at Nabble.com.


NGram and full word

2012-06-29 Thread Arkadi Colson

Hi

I have a question regarding the NGram filter and full-word search.

When I insert "arkadicolson" into Solr and search for "arkadic", Solr
will find a match.
When searching for "arkadicols", Solr will not find a match because the
maxGramSize is set to 8.
However, when searching for the full word "arkadicolson", Solr will also
not match.


Is there a way to also match the full word in combination with NGram?

Thanks!

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="8"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>

--
Smartbit bvba
Hoogstraat 13
B-3670 Meeuwen
T: +32 11 64 08 80
F: +32 89 46 81 10
W: http://www.smartbit.be
E: ark...@smartbit.be



leaks in solr

2012-06-29 Thread Bernd Fehling
Hi list,

while monitoring my Solr 3.6.1 installation I noticed an increase in memory
usage in the OldGen JVM heap on my slave. I decided to force a Full GC from
jvisualvm and send an optimize to the already-optimized slave index. Normally
this helps, as I have seen while monitoring this issue in the past. But not
this time: the Full GC didn't free any memory. So I decided to take a heap
dump and see what MemoryAnalyzer shows. The heap dump is about 23 GB in size.

1.)
Report Top consumers - Biggest Objects:
Total: 12.3 GB
org.apache.lucene.search.FieldCacheImpl : 8.1 GB
class java.lang.ref.Finalizer   : 2.1 GB
org.apache.solr.util.ConcurrentLRUCache : 1.5 GB
org.apache.lucene.index.ReadOnlySegmentReader : 622.5 MB
...

As you can see, Finalizer has already reached 2.1 GB!!!

* java.util.concurrent.ConcurrentHashMap$Segment[16] @ 0x37b056fd0
  * segments java.util.concurrent.ConcurrentHashMap @ 0x39b02d268
* map org.apache.solr.util.ConcurrentLRUCache @ 0x398f33c30
  * referent java.lang.ref.Finalizer @ 0x37affa810
* next java.lang.ref.Finalizer @ 0x37affa838
...

It seems to be org.apache.solr.util.ConcurrentLRUCache.
The attributes are:

Type    | Name                | Value
--------|---------------------|------------------------------------------------------
boolean | isDestroyed         | true
ref     | cleanupThread       | null
ref     | evictionListener    | null
long    | oldestEntry         | 0
int     | acceptableWaterMark | 9500
ref     | stats               | org.apache.solr.util.ConcurrentLRUCache$Stats @ 0x37b074dc8
boolean | islive              | true
boolean | newThreadForCleanup | false
boolean | isCleaning          | false
ref     | markAndSweepLock    | java.util.concurrent.locks.ReentrantLock @ 0x39bf63978
int     | lowerWaterMark      | 9000
int     | upperWaterMark      | 1
ref     | map                 | java.util.concurrent.ConcurrentHashMap @ 0x39b02d268




2.)
While searching for open files and their references, I noticed that there are
references to index files which have already been deleted from disk.
E.g. the recent index files are data/index/_2iqw.frq and data/index/_2iqx.frq,
but I also see references to data/index/_2hid.frq, which is quite old and was
deleted way back by earlier replications.
I have to analyze this a bit deeper.


So far my report; I'll keep analyzing this huge heap dump.
If you need any other info, or even the heap dump itself, let me know.


Regards
Bernd



Re: Strange behaviour with default request handler

2012-06-29 Thread Ahmet Arslan
 And when i search for soph, i only get Sophie in the
 results and not Sophia.

Do you want your query q=soph to return both Sophie and Sophia?
If that's the case, you can use a wildcard query: q=soph*

Also, you didn't provide the field definition for type=text. It seems that you
have a stemming filter in your analysis chain.

You can inspect how the tokens Sophie and Sophia are indexed using the
solr/admin/analysis.jsp page.




Re: what is precisionStep and positionIncrementGap:

2012-06-29 Thread Erick Erickson
For precisionStep, see:
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/NumericRangeQuery.html?is-external=true

positionIncrementGap is for multiValued text fields; it is the gap
put between the last token of one entry and the first token of the next,
e.g.
<field name="mv">some stuff</field>
<field name="mv">more things</field>

Assume the two were in a single document you added, and assume the
increment gap were 100. The token positions would be 0, 1, 101 and
102, so the phrase "stuff more" wouldn't match.


Best
Erick

On Tue, Jun 26, 2012 at 1:47 AM, ZHANG Liang F
liang.f.zh...@alcatel-sbell.com.cn wrote:
 Hi,
 in the schema.xml, usually there will be a fieldType definition like this:
 <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
  omitNorms="true" positionIncrementGap="0"/>

 precisionStep and positionIncrementGap are not very clear to me. Could you
 please elaborate more on these two?

 Thanks!


Re: Query Logic Question

2012-06-29 Thread Erick Erickson
I think you're assuming that this is Boolean logic. It's not, see:
http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

Best
Erick

On Thu, Jun 28, 2012 at 9:27 AM, Rublex ruble...@hotmail.com wrote:
 Jack,

 Thank you, the *:* solution seems to work.



Re: Is it compulsory to define a tokenizer when defining field types in solr

2012-06-29 Thread Erick Erickson
Yes, it's mandatory to define at least one tokenizer (and only one
tokenizer). If
you need the whole input treated as one token, you can use
KeywordTokenizerFactory.

Best
Erick

On Thu, Jun 28, 2012 at 11:10 AM, Kissue Kissue kissue...@gmail.com wrote:
 Hi,

 When defining a fieldtype is it compulsory to include a tokenizer in its
 definition?

 I have a field defined as follows, without a tokenizer:

 <fieldType name="lowercase_pattern" class="solr.TextField"
  positionIncrementGap="100">
   <analyzer type="index">
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Using this field, when I try to start up Solr it says the field is not
 recognised. But when I change it to the following, with a tokenizer
 included, it works:

 <fieldType name="lowercase_pattern" class="solr.TextField"
  positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Thanks.


Wildcard searches with leading and ending wildcard

2012-06-29 Thread maurizio1976
Hi all,
I've been searching for an answer to this everywhere, but I can never find an
answer that fits my case, so I'll ask it myself.

I'm on Solr 3.6.
I'm using the ReversedWildcardFilterFactory on a field containing a
telephone number, so only one word is indexed: no phrases, no strange tokens.
To be exact:
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
        maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>

I can check with Luke that two words are being indexed, one the reverse of
the other. Perfect.

I can run a query like Num:*1234 that will match docs starting
with 1234, and I can run a query like Num:1234* that will match docs ending
with 1234.

But this is the question that everybody seems to be asking:
can I run, in any way, a query that will match records that contain the
value 1234?

If I write Num:*1234*, this will match docs containing 1234 but also
docs containing 4321, which is wrong. This means the query Num:*4321* and
the query Num:*1234* return exactly the same result.

Is this the wrong approach? has anybody tried the N-gram solution to this
problem?

thanks very much
Maurizio




Replication Issue

2012-06-29 Thread Michael Della Bitta
Hi, I'm having trouble with replication on a brand new rollout of 3.6.

Basically I've traced it to the slave always thinking the index it
creates when it warms up is newer than what's on the master, no matter
what I do... deleting the slave's index, committing or optimizing on
the master, etc. I can see the replication request come in on the
master, but nothing happens, presumably because of the Index Version
discrepancy.

The clocks of the two machines are within 3 seconds of one another,
but I don't know if that's significant.

Actually, I'm having trouble figuring out how Index Version is
calculated at all, and before I dive into the source, I thought I'd
ask here. My slave is saying Index Version 1340979968338, Generation
1, and my master says Index Version 1340052708476, Generation 83549.

Anybody have any ideas?

Thanks,

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


Re: Strange spikes in query response times...any ideas where else to look?

2012-06-29 Thread solr

Otis,

Thanks for the response. We'll check out that tool and see how it goes.

Regarding JMeter...you are exactly correct in that I was assuming 1  
thread = 1 query per second. I thought we had set up some sort of  
throttling mechanism to ensure that...and clearly I was mistaken. By  
the math we are getting A LOT more qps...and in a preliminary look  
those spikes look like they just might be correlated to high qps. We  
are pursuing this line and my gut tells me this *is* the problem.


Thanks for the info on the tool (which we will look at) and for the  
heads-up on the qps.



Peter Lee
ProQuest

Quoting Otis Gospodnetic otis_gospodne...@yahoo.com:


Peter,

These could be JVM, or it could be index reopening and warmup  
queries, or  
Grab SPM for Solr - http://sematext.com/spm - in 24-48h we'll  
release an agent that tracks and graphs errors and timings of each  
Solr search component, which may reveal interesting stuff.  In the  
mean time, look at the graph with IO as well as graph with caches.  
 That's where I'd first look for signs.


Re users/threads question - if I understand correctly, this is the
problem: "JMeter is set up to run 15 threads from a single test
machine...but I noticed that the JMeter report is showing close to
47 queries per second." It sounds like you're equating # of threads
to QPS, which isn't right. Imagine you had 10 threads and each
query took 0.1 seconds (processed by a single CPU core) and the  
server had 10 CPU cores.  That would mean that your 1 thread could  
run 10 queries per second utilizing just 1 CPU core. And 10 threads  
would utilize all 10 CPU cores and would give you 10x higher  
throughput - 10x10=100 QPS.


So if you need to simulate just 2-5 QPS, just lower the number of  
threads.  What that number should be depends on query complexity and  
hw resources (cores or IO).


Otis

Performance Monitoring for Solr / ElasticSearch / HBase -  
http://sematext.com/spm 






From: s...@isshomefront.com s...@isshomefront.com
To: solr-user@lucene.apache.org
Sent: Thursday, June 28, 2012 9:20 PM
Subject: RE: Strange spikes in query response times...any ideas  
where else to look?


Michael,

Thank you for responding...and for the excellent questions.

1) We have never seen this response time spike with a  
user-interactive search. However, in the span of about 40 minutes,  
which included about 82,000 queries, we only saw a handful of  
near-equally distributed spikes. We have tried sending queries  
from the admin tool while the test was running, but given those  
odds, I'm not surprised we've never hit on one of those few  
spikes we are seeing in the test results.


2) Good point and I should have mentioned this. We are using  
multiple methods to track these response times.
  a) Looking at the catalina.out file and plotting the response  
times recorded there (I think this is logging the QTime as seen by  
Solr).
  b) Looking at what JMeter is reporting as response times. In  
general, these are very close if not identical to what is being  
seen in the Catalina.out file. I have not run a line-by-line  
comparison, but putting the query response graphs next to each  
other shows them to be nearly (or possibly exactly) the same.  
Nothing looked out of the ordinary.


3) We are using multiple threads. Before your email I was looking  
at the results, doing some math, and double checking the reports  
from JMeter. I did notice that our throughput is much higher than  
we meant for it to be. JMeter is set up to run 15 threads from a  
single test machine...but I noticed that the JMeter report is  
showing close to 47 queries per second. We are only targeting TWO  
to FIVE queries per second. This is up next on our list of things  
to look at and how to control more effectively. We do have three  
separate machines set up for JMeter testing and we are  
investigating to see if perhaps all three of these machines are  
inadvertently being launched during the test at one time and  
overwhelming the server. This *might* be one facet of the problem.  
Agreed on that.


Even as we investigate this last item regarding the number of  
users/threads, I wouldn't mind any other thoughts you or anyone  
else had to offer. We are checking on this user/threads issue and  
for the sake of anyone else who finds this discussion useful I'll
note what we find.


Thanks again.

Peter S. Lee
ProQuest

Quoting Michael Ryan mr...@moreover.com:


A few questions...

1) Do you only see these spikes when running JMeter? I.e., do you  
ever see a spike when you manually run a query?


2) How are you measuring the response time? In my experience there  
are three different ways to measure query speed. Usually all of  
them will be approximately equal, but in some situations they can  
be quite different, and this difference can be a clue as to where  
the bottleneck is:

   1) The response time as seen by the end user (in this case, 

Re: How do we use HTMLStripCharFilterFactory

2012-06-29 Thread derohit
Thanks @Kiran, will do the things you have suggested and hope it works. Thanks
again.

Rgds
Rohit



Re: Trying to avoid filtering on score, as I'm told that's bad

2012-06-29 Thread mcb
Thanks, this worked using:

qq={!func}sub(sum(geodist(pt1,30.271567,-97.741886),geodist(pt2,36.054889,-95.716187),product(1.609344,
Dist)), 1000) asc

sort=$qq

fq={!frange u=100}$qq




Using custom user-defined caches to store user app data while indexing

2012-06-29 Thread Iana Atanassova
Hi,

I'm trying to implement a custom UpdateRequestProcessorFactory class that
works with the XSLT request handler for indexing.
My UpdateRequestProcessorFactory has to examine some of the document fields
and compare them against some regular expressions that are stored in an
external MySQL database.
Currently, my UpdateRequestProcessorFactory works by establishing a
connection to the database and then retrieving the regular expressions for
every new document that needs to be indexed.

However, I would like to speed up this processing and store the regular
expressions in memory. I tried to define a new user cache in solrconfig.xml
(http://wiki.apache.org/solr/SolrCaching#User.2BAC8-Generic_Caches). As far
as I understand, these caches can be used to store any user application
data. But from my UpdateRequestProcessorFactory I cannot manage to access
this cache.

What would be the method to read/write into a user-defined Solr cache while
indexing? How can I access the current SolrIndexSearcher from my code? Are
there any other solutions that I should look at?
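For reference, a generic user cache is declared in solrconfig.xml roughly as in the sketch below; the cache name and sizes here are illustrative, not from the original message. At request time such a cache is looked up by name on the current SolrIndexSearcher (e.g. searcher.getCache("regexCache")), which may be the obstacle here, since an update processor does not automatically hold a searcher reference.

```xml
<!-- solrconfig.xml: a user/generic cache; name and sizes are illustrative -->
<cache name="regexCache"
       class="solr.LRUCache"
       size="1024"
       initialSize="128"
       autowarmCount="0"/>
```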

Thanks!

Iana


Why won't dismax create multiple DisjunctionMaxQueries when autoGeneratePhraseQueries is false?

2012-06-29 Thread Joel Rosen
Hi, I am trying to configure Solr for Chinese search and I've been having
trouble getting the dismax query parser to behave correctly.

In schema.xml, I'm using SmartChineseAnalyzer on my fulltext field with
autoGeneratePhraseQueries=false.  I've verified that it is correctly
tokenizing Chinese words, and the query parser is in fact not generating
phrase queries.  But I can't figure out why dismax is only producing a
single DisjunctionMaxQuery object for multiple Chinese terms, thereby
producing an OR effect, which is not what I want.

Here's an example of the parsed query debug output that I get for a
multiple term English query:

<str name="rawquerystring">my friend</str>
<str name="querystring">my friend</str>
<str name="parsedquery">
+((DisjunctionMaxQuery((t_field_keywords:unified_fulltext:my)~0.01)
DisjunctionMaxQuery((t_field_keywords:unified_fulltext:friend)~0.01))~2)
</str>
<str name="parsedquery_toString">
+(((t_field_keywords:unified_fulltext:my)~0.01
(t_field_keywords:unified_fulltext:friend)~0.01)~2)
</str>

This is exactly what I want to happen for Chinese queries.  But for a
Chinese query, you can see that I only get a single DisjunctionMaxQuery
object:

<str name="rawquerystring">我的朋友</str>
<str name="querystring">我的朋友</str>
<str name="parsedquery">
+DisjunctionMaxQuery(((t_field_keywords:unified_fulltext:我
t_field_keywords:unified_fulltext:的
t_field_keywords:unified_fulltext:朋友))~0.01)
</str>
<str name="parsedquery_toString">
+((t_field_keywords:unified_fulltext:我 t_field_keywords:unified_fulltext:的
t_field_keywords:unified_fulltext:朋友))~0.01
</str>

The result of this is that an increase in the number of terms increases the
number of results, instead of narrowing them as it should.

I feel like this is so close to working... does anybody know what I need to
do to get the query parser to behave correctly?  Any help would be much
appreciated!

Joel


Solr - query

2012-06-29 Thread gopes
Hi,

I am searching for a string using a wildcard and I would like to change my
query from
 http://localhost:/solr/addrinst/select?q=1234+BAY&start=0&rows=10
to
 http://localhost:/solr/addrinst/select?q=1234 BAY&start=0&rows=10

My request handler is:

<requestHandler class="solr.SearchHandler" default="true" name="auto">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="qf">id name Street_Addr</str>
    <str name="fl">id,name,Street_Addr</str>
  </lst>
</requestHandler>

Can someone give me a clue where I am going wrong?

Thanks



Re: Is it compulsory to define a tokenizer when defining field types in solr

2012-06-29 Thread Kissue Kissue
Thanks Erick for the clarification.

Cheers!



Searching against stored wild cards

2012-06-29 Thread Kissue Kissue
Hi,

I want to know if it is in any way possible for me to do this in Solr:

1. Store this field in Solr index - AB-CD-EF-*
2. Do a search for AB-CD-EF-GH and return back AB-CD-EF-*

Thanks.


Re: Replication Issue

2012-06-29 Thread Erick Erickson
Clocks on the separate machines are irrelevant, so don't worry about that bit.

The index version _starts out_ as a timestamp, as I understand it, but
from then on, when you change the index and commit, it should just bump
up, NOT get a new timestamp.

1) It's strange that the version on the master didn't change when you
committed. _Unless_ you didn't actually change the index: a commit doesn't
do anything at all without some underlying change to the index, not even
bump the index version, I don't think. But you should be seeing the very
last digits change on commit _if_ there have been underlying changes.

2) It looks like you somehow changed the index on the slave at some
point. Did you update the index there sometime, independent of the master?
Even though the slave gets a default timestamp of "right now" when you
fire it up for the first time, that is changed to the version that
corresponds to the master on the first replication.

3) Blowing away the index on the slave should have worked _if_ you removed
the index directory. Just issuing a delete on *:* wouldn't do much. When I
want to be absolutely, completely sure I've gotten rid of an index, I shut
down the server and rm -rf solr_home/data/index (you can also just
rm -rf solr_home/data). It's important that you remove the _directory_,
not just the contents of solr_home/data/index.

Bottom line:

I suspect something else happened in the meantime that changed the
underlying slave timestamp and got you into this situation; perhaps you
directly updated the slave index at some point?

Best
Erick

On Fri, Jun 29, 2012 at 12:54 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Nevermind, I realized that my master index was not tickling the index
 version number when a commit or optimize happened. I gave in and nuked
 and paved it, and now it seems fine.

 Is there any known reason why this would happen, so I can avoid it
 in the future?

 Thanks,


 Michael Della Bitta

 
 Appinions, Inc. -- Where Influence Isn’t a Game.
 http://www.appinions.com




Re: Searching against stored wild cards

2012-06-29 Thread Upayavira
Skip the asterisk and analyse your search terms as n-grams, maybe
edge n-grams, and then it'll match.

You'd be querying for:

A
AB
AB-
AB-C
AB-CD
AB-CD-
etc...

Any of those terms would match your terms.
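A sketch of what that might look like in schema.xml: index the stored pattern minus its trailing asterisk as a single keyword token, and expand the query side into edge n-grams so that one of the grams of AB-CD-EF-GH equals the stored prefix AB-CD-EF-. The field type name and gram sizes below are illustrative, not from the original messages:

```xml
<fieldType name="pattern_prefix" class="solr.TextField">
  <analyzer type="index">
    <!-- store "AB-CD-EF-*" as "AB-CD-EF-" (strip the asterisk before indexing) -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- "AB-CD-EF-GH" becomes A, AB, AB-, ..., AB-CD-EF-GH; one gram is "AB-CD-EF-" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
  </analyzer>
</fieldType>
```

A query for the full code then produces one token per prefix, and any prefix that equals a stored (asterisk-stripped) pattern matches.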

Upayavira



Re: Replication Issue

2012-06-29 Thread Michael Della Bitta
Ugh, after a mess of additional flailing around, it appears I just
discovered that the Replicate Now form on the Replication Admin page
does not work in the text-based browser 'links'. :(

Running /replication?command=fetchindex with curl did the trick. Now
everything is synced up.

Thanks for your reply, Erick!

Michael Della Bitta






Re: Solr - query

2012-06-29 Thread Michael Della Bitta
I think quotes are legal in URL encoding, so you might get away with
just putting a + between 1234 and BAY or failing that, %20.

Usually it's easier if you use a Solr client-side library to make
these types of calls so URL encoding isn't your problem, but I'm not
sure if that's a route that's available to you.

Michael Della Bitta

P.S. I think I had Thai food once near 1234 Bay. :)






RE: NGram and full word

2012-06-29 Thread Klostermeyer, Michael
With the help of this list, I solved a similar issue by altering my query as 
follows:

Before (did not return full word matches): q=searchTerm*
After (returned full-word matches and wildcard searches as you would expect): 
q=searchTerm OR searchTerm*

You can also boost the exact match by doing the following: q=searchTerm^2 OR 
searchTerm*

Not sure if the NGram changes things or not, but it might be a starting point.

Mike





Re: NGram and full word

2012-06-29 Thread Lan
The search for the full word arkadicolson exceeds 8 characters, so that's
why it's not working.

The fix is to add another field that tokenizes into full words.

The query would look like this

some_field_ngram:arkadicolson AND some_field_whole_word:arkadicolson
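As a sketch, the parallel whole-word field could be populated with a copyField; the field and type names below are illustrative, with "text_whole" assumed to be the same analysis chain as the n-gram type but without the NGramFilter:

```xml
<field name="some_field_ngram" type="text" indexed="true" stored="false"/>
<field name="some_field_whole_word" type="text_whole" indexed="true" stored="false"/>
<copyField source="some_field_ngram" dest="some_field_whole_word"/>
```

The query-side combination can then be tuned: an OR of the two fields would let either a gram or the exact whole word produce a match.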



Re: Wildcard searches with leading and ending wildcard

2012-06-29 Thread Jack Krupansky
I think a double-ended wildcard essentially defeats the whole point of the
reverse wildcard filter, which is to improve performance by avoiding a
leading wildcard. So, if your data is such that a leading wildcard is okay,
just use normal wildcards to begin with.
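If the n-gram route mentioned at the end of the question is attractive, a minimal sketch would index every substring of the number within a gram-size range, so a plain query on the fragment, with no wildcards at all, matches numbers that contain it. The type name and gram sizes below are illustrative:

```xml
<fieldType name="phone_contains" class="solr.TextField">
  <analyzer type="index">
    <!-- "5551234987" is indexed as all 4..10-character substrings, incl. "1234" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="10"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Num:1234 would then match any number containing 1234 (as long as the fragment's length falls within the gram range) and would not match a number containing only 4321.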


-- Jack Krupansky
