Re: counter field

2012-04-06 Thread Manish Bafna
Yes, before indexing we check whether that document is already in the index,
because along with the document we also have metadata that needs to be
appended.

So we have a few multivalued metadata fields, which we update if the same
document is found again.


On Fri, Apr 6, 2012 at 10:17 AM, Walter Underwood wun...@wunderwood.org wrote:

 So you will need to do a search for each document before adding it to the
 index, in case it is already there. That will be slow.

 And where do you store the last-assigned number?

 And there are plenty of other problems, like reloading after a corrupted
 index (disk failure), or deleted documents which are re-added later, or
 duplicates, splitting content across shards (requires a global lock across
 all shards to index each document), ...

 Two recommendations:

 1. Having two different unique IDs is likely to cause problems, so choose
 one.

 2. If you must have two IDs, use one table in a lightweight relational
 database to store the relationships between the md5 value and the serial
 number.

 wunder

 On Apr 5, 2012, at 9:37 PM, Manish Bafna wrote:

  Actually not.
  If I am updating an existing document, I need to keep the old number
  itself.

  Maybe we can do it this way: if we pass the number in the field, it will
  take that value; if we don't pass it, it will auto-increment.
  Because if we update, I will have the old number and I will pass it as a
  field again.
 
  On Fri, Apr 6, 2012 at 9:59 AM, Walter Underwood wun...@wunderwood.org
 wrote:
 
  Why?
 
  When you reindex, is it OK if they all change?
 
  If you reindex one document, is it OK if it gets a new sequential
 number?
 
  wunder
 
  On Apr 5, 2012, at 9:23 PM, Manish Bafna wrote:
 
  We already have a unique key (We use md5 value).
  We need another id (sequential numbers).
 
 On Fri, Apr 6, 2012 at 9:47 AM, Chris Hostetter
  hossman_luc...@fucit.org wrote:
 
 
  : We need to have a document id available for every document (Per
 core).
 
  : We can pass docid as one of the parameter for fq, and it will return
  the
  : docid in the search result.
 
 
  So it sounds like you need a *unique* id, but nothing you described
  requires that it be a counter.
 
  Take a look at the UUIDField, or consider using the
  SignatureUpdateProcessor to generate a key based on a hash of all the
  field values.
 
  -Hoss
 
 
 
 
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org
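
A minimal sketch of the pass-it-or-auto-increment idea discussed above, written
against the Solr 3.x update-processor API. The field name seq_no is made up,
and the in-memory counter has exactly the restart and multi-shard problems
Walter describes, so it is an illustration rather than a solution:

import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SequenceNumberProcessorFactory extends UpdateRequestProcessorFactory {
  // In-memory counter only: it resets on restart and is not shared across shards.
  private final AtomicLong counter = new AtomicLong(0);

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // Keep the value if the client passed one, otherwise assign the next number.
        if (doc.getFieldValue("seq_no") == null) {
          doc.setField("seq_no", counter.incrementAndGet());
        }
        super.processAdd(cmd);
      }
    };
  }
}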






Re: It costs so much memory with solrj 3.5, how to decrease it?

2012-04-06 Thread a sd
Studying the update behaviour more deeply, I logged the elapsed-time value of
every UpdateResponse; the results are listed below.
It seems that adding/updating one document generally takes almost 20 ms, so I
counted requests that took less than 20 ms per document as normal and the
others as abnormal.
I couldn't get a working setup of Solr 1.4, so I used Solr 3.2, which showed
the same performance as Solr 1.4 during the test.
solr 3.5 vs solr 3.2

solr 3.5
  sum of docs: 31998
  sum of elapsed time: 1218344 ms
  average: 38.0744 ms/doc
  sum of normal docs: 28409
  sum of normal elapsed time: 442258 ms
  average: 15.5675 ms/doc
  normal percentage: 28409/31998 = 88.78%
  abnormal docs: 3590

solr 3.2
  sum of docs: 31998
  sum of elapsed time: 852935 ms
  average: 26.6559 ms/doc
  sum of normal docs: 28416
  sum of normal elapsed time: 443045 ms
  average: 15.5914 ms/doc
  normal percentage: 28409/31998 = 88.80%
  abnormal docs: 3160


What can be concluded from these numbers?

B.R.

murphy



On Fri, Apr 6, 2012 at 10:28 AM, a sd liurx.cn@gmail.com wrote:

 hi, Erick.
 Thanks first of all.
 I watched the status of the JVM at runtime with the help of jconsole and
 jmap.
 1. When Xmx was not set, the Old Gen area filled up to about 1.5 GB, and its
 major content was String instances. When the whole heap reached its maximum
 (about 2 GB), the JVM ran GC, which consumed CPU time, and the throughput
 dropped sharply from 100,000 docs per minute to 10,000 docs per minute. As a
 test, I deliberately set Xmx=1024m and the rate dropped to 1,000 docs per
 minute.
 2. When I set Xmx=4096m, I found that the Old Gen grew to 2.1 GB and the
 whole JVM to about 3 GB, but the throughput of 100,000 docs per minute could
 be attained.
 During all of the tests above I only adjusted the client settings; the client
 connects to the identical Solr server, and I empty the data directory of the
 Solr home before every test.
 By the way, I know the client code is very ugly and occupies a lot of heap
 itself, but I am not permitted to improve it before I obtain a benchmark
 using solrj 3.5 that matches what the old version achieved with solrj 1.4.
 B.R
 murphy


 On Fri, Apr 6, 2012 at 5:54 AM, Erick Erickson erickerick...@gmail.com wrote:

 What's memory? Really, how are you measuring it?

 If it's virtual, you don't need to worry about it. Is this
 causing you a real problem or are you just nervous about
 the difference?

 Best
 Erick

 On Wed, Apr 4, 2012 at 11:23 PM, a sd liurx...@gmail.com wrote:
  hi, all.
  I have written a program which sends data to Solr using the update
  request handler. When I adopted the server and client library (namely
  solrj) with version 4.0 or 3.2, the JVM heap size was up to about 1.0 GB,
  but when I moved all of them to Solr 3.5 (both server and client libs),
  the heap size went up to 3.0 GB! The server configuration and the program
  are identical. What is wrong with the new version of solrj 3.5? I have
  looked at the source code, and there is no difference between solrj 3.2
  and solrj 3.5 in the parts my program may invoke. What can I do to
  decrease the memory cost of solrj 3.5?
  Any advice will be appreciated!
  murphy





Re: Choosing tokenizer based on language of document

2012-04-06 Thread Dominique Bejean

Hi,

Yes, I agree it is not an easy issue. Indexing all languages with the
appropriate char filter, tokenizer and filters for each language is not
possible without developing a new field type and a new analyzer.


If you plan to index up to 10 different languages, I suggest one text 
field per language or one index per language.


One field for all languages can be interesting if you plan to index a lot
of different languages in the same index. In that case, having one field
per language (text_en, text_fr, ...) can be complicated if you want the
user to be able to retrieve documents in any language with a single query.
The query becomes complex if you have 50 different languages (text_en:... OR
text_fr:... OR ...).


In order to achieve this you will need to develop a specific analyzer.
This analyzer will be in charge of using the correct char filter, tokenizer
and filters for the language of the document. You will also need the
analyzer to be configurable, so that you can change language-specific
settings (enable stemming or not, choose a specific stopwords file, ...).


I did this several years ago for Solr 1.4.1, and it still works with
Solr 3.x. The drawback of this analyzer is that all language settings are
hard-coded (tokenizer, filters, stopwords, ...). With Solr 4.0, the
analyzer does not work anymore, so I decided to redevelop it in order to be
able to configure all language settings in an external configuration
file and have nothing hard-coded.


I had to develop not only the analyzer but also a field type.

The main issue is that the analyzer is not aware of the values in other
fields, so it is not possible to use another field to specify the content
language. The only way I found is to start the content with a specific
character sequence: [en]... or [fr]...
The analyzer needs to know the language of the query too, so query
criteria for the multilingual field have to include the same character
sequence: [en]...
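
A rough sketch of what such a prefix-driven delegating analyzer could look
like against the Lucene 3.x Analyzer API. This is not Dominique's
implementation; the class name, the two-letter tag convention and the
hard-coded language map are assumptions for illustration only:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class LanguagePrefixAnalyzer extends Analyzer {
  private final Map<String, Analyzer> byLanguage = new HashMap<String, Analyzer>();
  private final Analyzer fallback = new StandardAnalyzer(Version.LUCENE_35);

  public LanguagePrefixAnalyzer() {
    byLanguage.put("en", new StandardAnalyzer(Version.LUCENE_35));
    byLanguage.put("fr", new FrenchAnalyzer(Version.LUCENE_35));
  }

  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    BufferedReader buffered = new BufferedReader(reader);
    Analyzer delegate = fallback;
    try {
      buffered.mark(4);
      char[] tag = new char[4];
      int read = buffered.read(tag, 0, 4);
      if (read == 4 && tag[0] == '[' && tag[3] == ']') {
        String lang = new String(tag, 1, 2);          // e.g. "[en]" -> "en"
        if (byLanguage.containsKey(lang)) {
          delegate = byLanguage.get(lang);            // prefix consumed, analyze the rest
        } else {
          buffered.reset();                           // unknown tag, keep the text as-is
        }
      } else if (read > 0) {
        buffered.reset();                             // no language prefix at all
      }
    } catch (IOException e) {
      // fall through and analyze whatever is left with the fallback analyzer
    }
    return delegate.tokenStream(fieldName, buffered);
  }
}

The same analyzer has to be used at query time so that a criterion like
[en]searchterm is routed to the same tokenizer chain as the indexed content.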


If you are interested by this work, let me know.

If someone knows another way to provide the content language to the
analyzer at index time, or the query language at query time, I am
interested :).


Regards.

Dominique











On 05/04/12 23:36, Erick Erickson wrote:

This is really difficult to imagine working well. Even if you
do choose the appropriate analysis chain (and it must
be a chain here), and manage to appropriately tokenize
for each language, what happens at query time?

How do you expect to get matches on, say, Ukrainian when
the tokens of the query are in Erse?

This feels like an XY problem, can you explain at a
higher level what your requirements are?

Best
Erick

On Wed, Apr 4, 2012 at 8:29 AM, Prakashganesh, Prabhu
prabhu.prakashgan...@dowjones.com  wrote:

Hi,
  I have documents in different languages and I want to choose the 
tokenizer to use for a document based on the language of the document. The 
language of the document is already known and is indexed in a field. What I 
want to do is when I index the text in the document, I want to choose the 
tokenizer to use based on the value of the language field. I want to use one 
field for the text in the document (defining multiple fields for each language 
is not an option). It seems like I can define a tokenizer for a field, so I 
guess what I need to do is to write a custom tokenizer that looks at the 
language field value of the document and calls the appropriate tokenizer for 
that language (e.g. StandardTokenizer for English, CJKTokenizer for CJK 
languages etc.). From what I have read, it seems quite straightforward to 
write a custom tokenizer, but how would this custom tokenizer know the language 
of the document? Is there some way I can pass in this value to the tokenizer? 
Or is there some way the tokenizer will have access to other fields in the 
document? It would be really helpful if someone could provide an answer.

Thanks
Prabhu




Re: A tool for frequent re-indexing...

2012-04-06 Thread Valeriy Felberg
I've implemented something like described in
https://issues.apache.org/jira/browse/SOLR-3246. The idea is to add an
update request processor at the end of the update chain in the core
you want to copy. The processor converts the SolrInputDocument to XML
(there is some utility method for doing this) and dumps the XML into a
file which can be fed into Solr again with curl. If you have many
documents you will probably want to distribute the XML files into
different directories using some common prefix in the id field.
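
A rough sketch of that kind of processor, assuming Solr/SolrJ 3.x and using
ClientUtils.toXML for the conversion mentioned above. The output directory,
the file naming by id, and the factory class name are all made up; the
SOLR-3246 patch itself may look different:

import java.io.FileWriter;
import java.io.IOException;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class DumpToXmlUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        // ClientUtils.toXML produces the <doc>...</doc> fragment for one document.
        String xml = ClientUtils.toXML(cmd.getSolrInputDocument());
        String id = cmd.getSolrInputDocument().getFieldValue("id").toString();
        // Assumes the dump directory already exists; wrap in <add> so curl can post it.
        FileWriter out = new FileWriter("/tmp/solr-dump/" + id + ".xml");
        try {
          out.write("<add>" + xml + "</add>");
        } finally {
          out.close();
        }
        super.processAdd(cmd); // let the rest of the update chain run as usual
      }
    };
  }
}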

On Fri, Apr 6, 2012 at 5:18 AM, Ahmet Arslan iori...@yahoo.com wrote:
 I am considering writing a small tool that would read from one solr core
 and write to another as a means of quick re-indexing of data.  I have a
 large-ish set (hundreds of thousands) of documents that I've already parsed
 with Tika, and I keep changing bits and pieces in schema and config to try
 new things often.  Instead of having to go through the process of
 re-indexing from docs (and some DBs), I thought it may be much faster to
 just read from one core and write into a new core with a new schema,
 analysers and/or settings.

 I was wondering if anyone else has done anything similar already?  It would
 be handy if I could use this sort of thing to spin off another core, write
 to it, and then swap the two cores, discarding the older one.

 You might find these relevant :

 https://issues.apache.org/jira/browse/SOLR-3246

 http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor




Re: Creating a query-able dictionary using Solr

2012-04-06 Thread Serdyn du Toit
Hi Joel,

Not an advanced Solr user myself - I have only been looking at it for a
while.  Still, maybe you are looking to use a suggester?

http://wiki.apache.org/solr/Suggester (the examples at the bottom of the
page are very helpful)

I haven't worked with PDF documents in Solr yet but the suggester does
seem to have the behavior you're looking for (when generating the
suggestions from an index).

Kind regards,
Serdyn du Toit
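
Besides the Suggester, the TermsComponent can also return the indexed words
matching a prefix. A small SolrJ sketch, assuming a /terms request handler is
configured in solrconfig.xml and that the PDF text ends up in a field named
text (both assumptions):

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class PrefixWordList {
  // Returns indexed words from the "text" field that start with the given prefix,
  // e.g. "a" -> aardvark, apple, ...
  public static List<TermsResponse.Term> wordsStartingWith(SolrServer server,
      String prefix) throws SolrServerException {
    SolrQuery q = new SolrQuery();
    q.setQueryType("/terms");        // request handler that exposes the TermsComponent
    q.set("terms", true);
    q.set("terms.fl", "text");
    q.set("terms.prefix", prefix);
    q.set("terms.limit", 100);
    QueryResponse rsp = server.query(q);
    return rsp.getTermsResponse().getTerms("text");
  }
}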


On Tue, Mar 6, 2012 at 6:25 AM, Beach, Joel jtbe...@qualcomm.com wrote:

 Hi there,

 Am looking at using Solr to perform the following tasks:

 1. Push a lot of PDF documents into SOLR.
 2. Build a database of all the words encountered in those documents.
 3. Be able to query for a list of words matching a string like a*

 For example, if the collection contains the words aardvark, apple, doctor
 and zebra,
 I would expect a query of a* to return the list:

 [ aardvark, apple ]

 I have done a google around for this in Solr and found similar things
 involving
 spell-checkers, but nothing that seems exactly the same.

 Anyone who has already done this, or something similar, in Solr willing to
 point me in the right direction?

 Cheers,

 Joel


Re: A little confusion with maxPosAsterisk

2012-04-06 Thread Dmitry Kan
Let's first figure out, why reversing a token is helpful for doing leading
wildcard searches. I'll assume you refer to ReversedWildcardFilterFactory.

If you have the query *foo, using a straightforward approach, you would
need to scan through the entire dictionary of terms (which can be billions)
in your solr index and try to match the suffix foo (which can start with
any prefix) = very time consuming and non-optimal.

If we used the ReversedWildcardFilterFactory instead, it would reverse
every token in the index and store both:

koongfoo (original token)

and

oofgnook (reversed token)

Now when searching with *foo, we could also reverse it to oof* and only
scan part of the dictionary, with terms starting with letter o, and
further, if applying some binary search, we could directly jump to tokens
that have oof as their prefix. Thus we have turned ineffective suffix
search into effective prefix search.

Back to your question:

The maxPosAsterisk parameter controls when an inefficient suffix query term
should be identified and an efficient prefix query term generated from it.
With the default of 2, both *foo and f*oo (as an example) are treated as
suffix queries and turned into the prefix queries oof* and oo*f.

Hope this helps.

Dmitry

On Fri, Apr 6, 2012 at 5:11 AM, neosky neosk...@yahoo.com wrote:

 maxPosAsterisk - maximum position (1-based) of the asterisk wildcard ('*')
 that triggers the reversal of query term. Asterisk that occurs at positions
 higher than this value will not cause the reversal of query term. Defaults
 to 2, meaning that asterisks on positions 1 and 2 will cause a reversal.

 I can't understand what will cause a reversal means.
 I know Solr will keep both the original token and the reversed token when
 the withOriginal parameter is enabled.
 Does that mean the searcher will use the reversed one to help process the
 query when a reversal is triggered?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/A-little-onfusion-with-maxPosAsterisk-tp3889226p3889226.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,

Dmitry Kan


Re: SolrCloud replica and leader out of Sync somehow

2012-04-06 Thread Jamie Johnson
awesome Yonik.  I'll indeed try this.  Thanks!

On Thu, Apr 5, 2012 at 10:20 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Thu, Apr 5, 2012 at 12:19 AM, Jamie Johnson jej2...@gmail.com wrote:
 Not sure if this got lost in the shuffle, were there any thoughts on this?

 Sorting by id could be pretty expensive (memory-wise), so I don't
 think it should be default or anything.
 We also need a way for a client to hit the same set of servers again
 anyway (to handle other possible variations like commit time).

 To handle the tiebreak stuff, you could also sort by _version_ - that
 should be unique in an index and is already used under the covers and
 hence shouldn't add any extra memory overhead.  Versions increase over
 time, so _version_ desc should give you newer documents first.

 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10




 On Wed, Mar 21, 2012 at 11:02 AM, Jamie Johnson jej2...@gmail.com wrote:
 Given that in a distributed environment the docids are not guaranteed
 to be the same across shards should the sorting use the uniqueId field
 as the tie breaker by default?

 On Tue, Mar 20, 2012 at 2:10 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Tue, Mar 20, 2012 at 2:02 PM, Jamie Johnson jej2...@gmail.com wrote:
 I'll try to dig for the JIRA.  Also I'm assuming this could happen on
 any sort, not just score correct?  Meaning if we sorted by a date
 field and there were duplicates in that date field order wouldn't be
 guaranteed for the same reasons right?

 Correct - internal docid is the tiebreaker for all sorts.

 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10
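
A small SolrJ sketch of that tiebreak, assuming SolrJ 3.x and an index that
maintains the _version_ field (as SolrCloud does); the sort fields here are
only an example:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

public class VersionTiebreakQuery {
  // Relevance sort first, then _version_ descending as a deterministic tiebreaker.
  public static QueryResponse query(SolrServer server, String userQuery)
      throws SolrServerException {
    SolrQuery q = new SolrQuery(userQuery);
    q.addSortField("score", SolrQuery.ORDER.desc);
    q.addSortField("_version_", SolrQuery.ORDER.desc);
    return server.query(q);
  }
}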


Re: Is there any performance cost of using lots of OR in the solr query

2012-04-06 Thread Shawn Heisey

On 4/5/2012 3:49 PM, Erick Erickson wrote:

Of course putting more clauses in an OR query will
have a performance cost, there's more work to do

OK, being a smart-alec aside you will probably
be fine with a few hundred clauses. The question
is simply whether the performance hit is acceptable.
I'm afraid that question can't be answered in the
abstract, you'll have to test...

Since you're putting them in an fq, there's also some chance
that they'll be re-used from the cache, at least if there
are common patterns.


Roz,

I have a similar situation going on in my index.  Because employees have
access to far more than real users, they get filter queries constructed
that have a HUGE number of clauses in them.  We have implemented a new
field for a feature that we call search groups, but it has not
penetrated all aspects of the application yet.  Also, until we can make
those groups use a hierarchy, which is not a trivial undertaking, we may
be stuck with large filter queries.


These complex filters have led to a problem that you have probably not 
considered - really long filterCache autowarm times.  I have reduced the 
autoWarm value on my filterCache to FOUR, and there are still times that 
the autowarm takes up to 60 seconds.  Most of the time it is only a few 
seconds, with up to 30 seconds being relatively common.


I just thought of a new localparam feature for this situation and filed 
SOLR-.  I will talk to our developers about using the existing 
localparam that skips filterCache entirely.


Thanks,
Shawn
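
The localparam Shawn refers to is {!cache=false}, which keeps a single filter
query out of the filterCache. A small SolrJ sketch; the acl field and group
names are made up for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

public class UncachedAclFilter {
  // "acl" and the group names are hypothetical; only the {!cache=false} syntax matters here.
  public static QueryResponse query(SolrServer server, String userQuery)
      throws SolrServerException {
    SolrQuery q = new SolrQuery(userQuery);
    q.addFilterQuery("{!cache=false}acl:(group1 OR group7 OR group42)");
    return server.query(q);
  }
}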



Re: counter field

2012-04-06 Thread Shawn Heisey

On 4/5/2012 1:53 AM, Manish Bafna wrote:

Hi,
Is it possible to define a field as Counter Column which can be
auto-incremented.


Manish,

Where does your data come from?  Can you add the autoincrement field to 
the data source?


My data comes from MySQL, where the primary key is an autoincrement 
field.  MySQL is very good at autoincrement fields.


Walter, we do have two unique ID values in our system, enforced by 
MySQL, and it hasn't caused us any problems yet.  One is the 
autoincrement field just mentioned and the other is another id that is 
specific to our application.  We use the autoincrement field to identify 
deleted documents and as a position indicator for the build program to 
add new documents to Solr.  The other unique field is Solr's uniqueKey.


Thanks,
Shawn



SolrCloud Zookeeper view does not work on latest snapshot

2012-04-06 Thread Jamie Johnson
I just downloaded the latest snapshot and fired it up to take a look
around and I'm getting the following error when looking at the Cloud
view.

Loading of undefined failed with HTTP-Status 404

The request I see going out is as follows

http://localhost:8501/solr/slice1_shard1/zookeeper?wt=json

this doesn't work but this does

http://localhost:8501/solr/zookeeper?wt=json

Any thoughts why this would happen?


Re: SolrCloud Zookeeper view does not work on latest snapshot

2012-04-06 Thread Jamie Johnson
I looked at our old system and indeed it used to make a call to
/solr/zookeeper, not /solr/corename/zookeeper.  I am making a change
locally so I can run with this, but is this a bug, or did I muck
something up with my configuration?

On Fri, Apr 6, 2012 at 9:33 AM, Jamie Johnson jej2...@gmail.com wrote:
 I just downloaded the latest snapshot and fired it up to take a look
 around and I'm getting the following error when looking at the Cloud
 view.

 Loading of undefined failed with HTTP-Status 404

 The request I see going out is as follows

 http://localhost:8501/solr/slice1_shard1/zookeeper?wt=json

 this doesn't work but this does

 http://localhost:8501/solr/zookeeper?wt=json

 Any thoughts why this would happen?


Re: waitFlush and waitSearcher with SolrServer.add(docs, commitWithinMs)

2012-04-06 Thread Erick Erickson
You've got it. That's the post I was talking about, I was rushed and couldn't
find it quickly...

LucidWorks Enterprise uses a trunk version of Solr, so DWPT is in that
code in 2.0. For Solr-only, you can just check out a trunk build.

Best
Erick


On Thu, Apr 5, 2012 at 7:54 PM, Mike O'Leary tmole...@uw.edu wrote:
 First of all, what I was seeing was different from what I thought I was 
 seeing because a few weeks ago I uncommented the autoCommit block in the 
 solrconfig.xml file and I didn't realize it until yesterday just before I 
 went home, so that was controlling the commits more than the add and commit 
 calls that I was making. When I commented that block out again, the times for 
 indexing with add(docs, commitWithinMs) and with add(docs) and commit(false, 
 false) were very similar. Both of them were about 20 minutes faster (38 
 minutes instead of about an hour) than indexing with autoCommit set to 
 commit after every 1,000 documents or fifteen minutes.

 Is this the blog post you are talking about: 
 http://www.searchworkings.org/blog/-/blogs/gimme-all-resources-you-have-i-can-use-them!/?
  It seems to be about the right topic.

 I am using Solr 3.5. The feature matrix on one of the Lucid Imagination web 
 pages says that DocumentsWriterPerThread is available in Solr 4.0 and 
 LucidWorks 2.0. I assume that means LucidWorks Enterprise. Is that right?
 Thanks,
 Mike

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Thursday, April 05, 2012 2:45 PM
 To: solr-user@lucene.apache.org
 Subject: Re: waitFlush and waitSearcher with SolrServer.add(docs, 
 commitWithinMs)

 Solr version? I suspect your outlier is due to merging segments; if so, this
 should have happened quite some time into the run. See Simon Willnauer's blog
 post on the DocumentsWriterPerThread (trunk) code.

 What commitWithin time are you using?


 Best
 Erick

 On Wed, Apr 4, 2012 at 7:50 PM, Mike O'Leary tmole...@uw.edu wrote:
 I am indexing some database contents using add(docs, commitWithinMs), and 
 those add calls are taking over 80% of the time once the database begins 
 returning results. I was wondering if setting waitSearcher to false would 
 speed this up. Many of the calls take 1 to 6 seconds, with one outlier that 
 took over 11 minutes.
 Thanks,
 Mike

 -Original Message-
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Wednesday, April 04, 2012 4:15 PM
 To: solr-user@lucene.apache.org
 Subject: Re: waitFlush and waitSearcher with SolrServer.add(docs,
 commitWithinMs)


 On Apr 4, 2012, at 6:50 PM, Mike O'Leary wrote:

 If you index a set of documents with SolrJ and use
 StreamingUpdateSolrServer.add(CollectionSolrInputDocument docs, int
 commitWithinMs), it will perform a commit within the time specified, and it 
 seems to use default values for waitFlush and waitSearcher.

 Is there a place where you can specify different values for waitFlush
 and waitSearcher, or if you want to use different values do you have
 to call StreamingUpdateSolrServer.add(CollectionSolrInputDocument
 docs) and then call StreamingUpdateSolrServer.commit(waitFlush, 
 waitSearcher) explicitly?
 Thanks,
 Mike


 waitFlush actually does nothing in recent versions of Solr. waitSearcher 
 doesn't seem so important when the commit is not done explicitly by the user 
 or a client.

 - Mark Miller
 lucidimagination.com
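
The two calling patterns discussed in this thread, side by side as a SolrJ 3.5
sketch; the 60-second commitWithin value is arbitrary:

import java.io.IOException;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class IndexingPatterns {
  // Pattern 1: let Solr commit on its own within 60 seconds of the add;
  // waitFlush/waitSearcher are not exposed on this call.
  public static void addWithCommitWithin(SolrServer server,
      Collection<SolrInputDocument> docs) throws SolrServerException, IOException {
    server.add(docs, 60000);
  }

  // Pattern 2: add, then commit explicitly with waitFlush=false, waitSearcher=false,
  // so the call returns as soon as the commit has been issued.
  public static void addWithExplicitCommit(SolrServer server,
      Collection<SolrInputDocument> docs) throws SolrServerException, IOException {
    server.add(docs);
    server.commit(false, false);
  }
}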













Re: Is there any performance cost of using lots of OR in the solr query

2012-04-06 Thread Erick Erickson
Shawn:

Ahhh, so *that* was what your JIRA was about

Consider https://issues.apache.org/jira/browse/SOLR-2429
for your ACL calculations, that's what this was developed
for.

The basic idea is that you can write a custom filter that returns
whether the document should be included in the results set that's
only called _after_ all other clauses (search and FQs) have been
satisfied.

Here's the issue. Normally, fqs are calculated across the entire
document set. That's what allows them to be cached and
re-used. But, as you've found, doing ACL calculations
for the entire document set is expensive. So this is an attempt
to make a lower-cost alternative. The downside is that it is NOT
cached, so it must be calculated anew each time. But it's only
calculated for a subset of documents.

Best
Erick

On Fri, Apr 6, 2012 at 9:00 AM, Shawn Heisey s...@elyograg.org wrote:
 On 4/5/2012 3:49 PM, Erick Erickson wrote:

 Of course putting more clauses in an OR query will
 have a performance cost, there's more work to do

 OK, being a smart-alec aside you will probably
 be fine with a few hundred clauses. The question
 is simply whether the performance hit is acceptable.
 I'm afraid that question can't be answered in the
 abstract, you'll have to test...

 Since you're putting them in an fq, there's also some chance
 that they'll be re-used from the cache, at least if there
 are common patterns.


 Roz,

 I have a similar situation going on in my index.  Because employees have
 access to far more than real users, they get filter queries constructed that
 have HUGE number of clauses in them.  We have implemented a new field for a
 feature that we call search groups but it has not penetrated all aspects
 of the application yet.  Also, until we can make those groups use a
 hierarchy, which is not a trivial undertaking, we may be stuck with large
 filter queries.

 These complex filters have led to a problem that you have probably not
 considered - really long filterCache autowarm times.  I have reduced the
 autoWarm value on my filterCache to FOUR, and there are still times that the
 autowarm takes up to 60 seconds.  Most of the time it is only a few seconds,
 with up to 30 seconds being relatively common.

 I just thought of a new localparam feature for this situation and filed
 SOLR-.  I will talk to our developers about using the existing
 localparam that skips filterCache entirely.

 Thanks,
 Shawn



RE: upgrade solr from 1.4 to 3.5 not working

2012-04-06 Thread Robert Petersen
Note that I am trying to upgrade from the Lucid Imagination distribution
of Solr 1.4, dunno if that makes a difference.  We have an existing
index of 11 million documents which I am trying to preserve in the
upgrade process.

-Original Message-
From: Robert Petersen [mailto:rober...@buy.com] 
Sent: Thursday, April 05, 2012 2:21 PM
To: solr-user@lucene.apache.org
Subject: upgrade solr from 1.4 to 3.5 not working

Hi folks, I'm a little stumped here.

 

I have an existing solr 1.4 setup which is well configured.  I want to
upgrade to the latest solr release, and after reading release notes, the
wiki, etc, I concluded the correct path would be to not change any
config items and just replace the solr.war file in tomcat's webapps
folder with the new one and then start tomcat back up.

 

This worked fine, solr came up.  The problem is that on the solr info
page it still says that I am running solr 1.4 even after several
restarts and even a server reboot.  Am I missing something?  Info says
this though there is no solr 1.4 war file anywhere under tomcat root:

 

Solr Specification Version: 1.4.0.2009.12.10.10.34.34

Solr Implementation Version: 1.4 exported - sam - 2009-12-10
10:34:34

Lucene Specification Version: 2.9.1

Lucene Implementation Version: 2.9.1 exported - 2009-12-10
10:32:14

Current Time: Thu Apr 05 12:56:12 PDT 2012

Server Start Time: Thu Apr 05 12:52:25 PDT 2012

 

Any help would be appreciated.

Thanks

Robi



RE: upgrade solr from 1.4 to 3.5 not working

2012-04-06 Thread Robert Petersen
OK I found in the tomcat documentation that I not only have to drop the
war file into webapps but also have to delete the expanded version of
the war that tomcat makes.  Now tomcat doesn't find the velocity
response writer which I seem to recall seeing some note about.  I'll try
to find that again.  Thanks for the help?  Oh well...

-Original Message-
From: Robert Petersen [mailto:rober...@buy.com] 
Sent: Friday, April 06, 2012 8:27 AM
To: solr-user@lucene.apache.org
Subject: RE: upgrade solr from 1.4 to 3.5 not working

Note that I am trying to upgrade from the Lucid Imagination distribution
of Solr 1.4, dunno if that makes a difference.  We have an existing
index of 11 million documents which I am trying to preserve in the
upgrade process.

-Original Message-
From: Robert Petersen [mailto:rober...@buy.com] 
Sent: Thursday, April 05, 2012 2:21 PM
To: solr-user@lucene.apache.org
Subject: upgrade solr from 1.4 to 3.5 not working

Hi folks, I'm a little stumped here.

 

I have an existing solr 1.4 setup which is well configured.  I want to
upgrade to the latest solr release, and after reading release notes, the
wiki, etc, I concluded the correct path would be to not change any
config items and just replace the solr.war file in tomcats webapps
folder with the new one and then start tomcat back up.

 

This worked fine, solr came up.  The problem is that on the solr info
page it still says that I am running solr 1.4 even after several
restarts and even a server reboot.  Am I missing something?  Info says
this though there is no solr 1.4 war file anywhere under tomcat root:

 

Solr Specification Version: 1.4.0.2009.12.10.10.34.34

Solr Implementation Version: 1.4 exported - sam - 2009-12-10
10:34:34

Lucene Specification Version: 2.9.1

Lucene Implementation Version: 2.9.1 exported - 2009-12-10
10:32:14

Current Time: Thu Apr 05 12:56:12 PDT 2012

Server Start Time:Thu Apr 05 12:52:25 PDT 2012

 

Any help would be appreciated.

Thanks

Robi



Re: schema design question

2012-04-06 Thread Erick Erickson
I'd consider a field like associated_with_album, and a
field that identifies the kind of record this is: track or album.

Then you can form a query like -associated_with_album:true
(where '-' is the Lucene NOT).

And then group by kind to get separate groups of albums and
tracks.

Hope this helps
Erick

On Thu, Apr 5, 2012 at 9:00 PM, N. Tucker
ntucker-ml-solr-us...@august20th.com wrote:
 Apologies if this is a very straightforward schema design problem that
 should be fairly obvious, but I'm not seeing a good way to do it.
 Let's say I have an index that wants to model Albums and Tracks, and
 they all have arbitrary tags attached to them (represented by
 multivalue string type fields).  Tracks also have an album id field
 which can be used to associate them with an album.  I'd like to
 perform a query which shows both Track and Album results, but
 suppresses Tracks that are associated with Albums in the result set.

 I am tempted to use a join here, but I have reservations because it
 is my understanding that joins cannot work across shards, and I'm not
 sure it's a good idea to limit myself in that way if possible.  Any
 suggestions?  Is there a standard solution to this type of problem
 where you've got hierarchical items and you don't want children shown
 in the same result as the parent?


Re: A little confusion with maxPosAsterisk

2012-04-06 Thread neosky
great! thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/A-little-onfusion-with-maxPosAsterisk-tp3889226p3890776.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: schema design question

2012-04-06 Thread Neal Tucker
Thanks, but I don't want to exclude all tracks that are associated
with albums, I want to exclude tracks that are associated with albums
*which match the query* (tracks and their associated albums may have
different tags).  I don't think your suggestion covers that.

On Fri, Apr 6, 2012 at 9:35 AM, Erick Erickson erickerick...@gmail.com wrote:
 I'd consider a field like associated_with_album, and a
 field that identifies the kind of record this is track or album.

 Then you can form a query like -associated_with_album:true
 (where '-' is the Lucene or NOT).

 And then group by kind to get separate groups of albums and
 tracks.

 Hope this helps
 Erick

 On Thu, Apr 5, 2012 at 9:00 PM, N. Tucker
 ntucker-ml-solr-us...@august20th.com wrote:
 Apologies if this is a very straightforward schema design problem that
 should be fairly obvious, but I'm not seeing a good way to do it.
 Let's say I have an index that wants to model Albums and Tracks, and
 they all have arbitrary tags attached to them (represented by
 multivalue string type fields).  Tracks also have an album id field
 which can be used to associate them with an album.  I'd like to
 perform a query which shows both Track and Album results, but
 suppresses Tracks that are associated with Albums in the result set.

 I am tempted to use a join here, but I have reservations because it
 is my understanding that joins cannot work across shards, and I'm not
 sure it's a good idea to limit myself in that way if possible.  Any
 suggestions?  Is there a standard solution to this type of problem
 where you've got hierarchical items and you don't want children shown
 in the same result as the parent?


SolrEntityProcessor Configuration Problem

2012-04-06 Thread michael . kroh
Dear all,
I'm facing a problem with SolrEntityProcessor when it is configured
under a JDBC data source.
My configuration looks like this:

<entity name="V_MARKET_STUDIES" datasource="jdbc-2"
        query="select * from V_MARKET_STUDIES" transformer="ClobTransformer">

    <field column="ID" name="id" />
    <field column="TYPE" name="type" />
    <field column="LOCALE" name="locale" />

    <field column="TITLE" name="title" />

    <field column="KEYWORDS" name="keywords" clob="true" />
    <field column="TOPICS" name="topics" />
    <field column="EXTENDED_KEYWORDS" name="extended_keywords" clob="true" />
    <field column="PUBLICATION_DATE" name="publication_date" />

    <field column="OWNER" name="owner" />

    <field column="DL_FILE_ENTRY_ID" name="dl_file_entry_id" />
    <field column="DL_FILE_VERSION_ID" name="dl_file_version_id" />
    <field column="DL_FOLDER_ID" name="dl_folder_id" />
    <field column="FILE_NAME" name="file_name" />
    <field column="EXTENSION" name="extension" />
    <field column="URL_LINK" name="urllink" />

    <entity name="sep" processor="SolrEntityProcessor"
            fl="content" url="http://vmcenter120:8983/solr/"
            query="folderId:${V_MARKET_STUDIES.DL_FOLDER_ID}"
            fq="entryClassPK:${V_MARKET_STUDIES.DL_FILE_ENTRY_ID}">
        <field column="content" name="content" />
    </entity>
</entity>

I have 6 rows in the Oracle database, but only the first row is processed
correctly, meaning that the second Solr is queried and the results go into
the document; the remaining 5 rows were processed without querying the
second Solr and therefore did not have the content field filled.

Any suggestions?
Did I configure something wrong, or misunderstand something?
Thanks for your help


Best regards
Michael

solr analysis-extras configuration

2012-04-06 Thread N. Tucker
Hello, I'm running into an odd problem trying to use ICUTokenizer
under a solr installation running under tomcat on ubuntu.  It seems
that all the appropriate JAR files are loaded:

INFO: Adding 'file:/usr/share/solr/lib/lucene-stempel-3.5.0.jar' to classloader
INFO: Adding 'file:/usr/share/solr/lib/lucene-smartcn-3.5.0.jar' to classloader
INFO: Adding 'file:/usr/share/solr/lib/icu4j-4_8_1_1.jar' to classloader
INFO: Adding 'file:/usr/share/solr/lib/lucene-icu-3.5.0.jar' to classloader
INFO: Adding 'file:/usr/share/solr/lib/apache-solr-analysis-extras-3.5.0.jar'
to classloader
... but later: ...
SEVERE: java.lang.NoClassDefFoundError:
org/apache/lucene/analysis/icu/segmentation/ICUTokenizer

I'm not too clear on the correct way to add the contrib bits other
than copying them into the 'lib' directory under solrhome.  They are
obviously found there (and I have verified that ICUTokenizer is in
lucene-icu-3.5.0.jar), but there's still a problem loading the
ICUTokenizer class.  Any tips on troubleshooting this?  Are there more
dependencies that I'm unaware of?


Re: SolrCloud Zookeeper view does not work on latest snapshot

2012-04-06 Thread Ryan McKinley
There have been a bunch of changes getting the zookeeper info and UI
looking good.  The info moved from being on the core to using a
servlet at the root level.

Note, it is not a request handler anymore, so the wt=XXX has no
effect.  It is always JSON

ryan


On Fri, Apr 6, 2012 at 7:01 AM, Jamie Johnson jej2...@gmail.com wrote:
 I looked at our old system and indeed it used to make a call to
 /solr/zookeeper not /solr/corename/zookeeper.  I am making a change
 locally so I can run with this, but is this a bug, or did I muck
 something up with my configuration?

 On Fri, Apr 6, 2012 at 9:33 AM, Jamie Johnson jej2...@gmail.com wrote:
 I just downloaded the latest snapshot and fired it up to take a look
 around and I'm getting the following error when looking at the Cloud
 view.

 Loading of undefined failed with HTTP-Status 404

 The request I see going out is as follows

 http://localhost:8501/solr/slice1_shard1/zookeeper?wt=json

 this doesn't work but this does

 http://localhost:8501/solr/zookeeper?wt=json

 Any thoughts why this would happen?


Re: Solr dismax not returning expected results

2012-04-06 Thread dboychuck
Adding autoGeneratePhraseQueries=true to my field definitions has solved
the problem

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-dismax-not-returning-expected-results-tp3891346p3891594.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr analysis-extras configuration

2012-04-06 Thread N. Tucker
Further info: I can make this work if I stay out of tomcat -- I
download a fresh solr binary distro, copy those five JARs from 'dist'
and 'contrib' into example/solr/lib/, copy my solrconfig.xml and
schema.xml, and run 'java -jar start.jar', and it works fine.  But
trying to add those same JARs to my tomcat instance's solrhome/lib
doesn't work.  Any ideas how to troubleshoot?


On Fri, Apr 6, 2012 at 12:15 PM, N. Tucker
ntucker-ml-solr-us...@august20th.com wrote:
 Hello, I'm running into an odd problem trying to use ICUTokenizer
 under a solr installation running under tomcat on ubuntu.  It seems
 that all the appropriate JAR files are loaded:

 INFO: Adding 'file:/usr/share/solr/lib/lucene-stempel-3.5.0.jar' to 
 classloader
 INFO: Adding 'file:/usr/share/solr/lib/lucene-smartcn-3.5.0.jar' to 
 classloader
 INFO: Adding 'file:/usr/share/solr/lib/icu4j-4_8_1_1.jar' to classloader
 INFO: Adding 'file:/usr/share/solr/lib/lucene-icu-3.5.0.jar' to classloader
 INFO: Adding 'file:/usr/share/solr/lib/apache-solr-analysis-extras-3.5.0.jar'
 to classloader
 ... but later: ...
 SEVERE: java.lang.NoClassDefFoundError:
 org/apache/lucene/analysis/icu/segmentation/ICUTokenizer

 I'm not too clear on the correct way to add the contrib bits other
 than copying them into the 'lib' directory under solrhome.  They are
 obviously found there (and I have verified that ICUTokenizer is in
 lucene-icu-3.5.0.jar), but there's still a problem loading the
 ICUTokenizer class.  Any tips on troubleshooting this?  Are there more
 dependencies that I'm unaware of?


Re: SolrCloud Zookeeper view does not work on latest snapshot

2012-04-06 Thread Jamie Johnson
Thanks Ryan.  So to be clear this is a bug then?  I went into the
cloud.js and changed the url used to access this information so that
it would work, wasn't sure if it was kosher or not.

On 4/6/12, Ryan McKinley ryan...@gmail.com wrote:
 There have been a bunch of changes getting the zookeeper info and UI
 looking good.  The info moved from being on the core to using a
 servlet at the root level.

 Note, it is not a request handler anymore, so the wt=XXX has no
 effect.  It is always JSON

 ryan


 On Fri, Apr 6, 2012 at 7:01 AM, Jamie Johnson jej2...@gmail.com wrote:
 I looked at our old system and indeed it used to make a call to
 /solr/zookeeper not /solr/corename/zookeeper.  I am making a change
 locally so I can run with this, but is this a bug, or did I muck
 something up with my configuration?

 On Fri, Apr 6, 2012 at 9:33 AM, Jamie Johnson jej2...@gmail.com wrote:
 I just downloaded the latest snapshot and fired it up to take a look
 around and I'm getting the following error when looking at the Cloud
 view.

 Loading of undefined failed with HTTP-Status 404

 The request I see going out is as follows

 http://localhost:8501/solr/slice1_shard1/zookeeper?wt=json

 this doesn't work but this does

 http://localhost:8501/solr/zookeeper?wt=json

 Any thoughts why this would happen?



Re: SolrEntityProcessor Configuration Problem

2012-04-06 Thread Lance Norskog
The SolrEntityProcessor resolves all of its parameters at start time,
not for each query. This technique cannot work. I filed it:

https://issues.apache.org/jira/browse/SOLR-3336

On Fri, Apr 6, 2012 at 11:13 AM,  michael.k...@basf.com wrote:
 Dear all,
 I'm facing a problem with SolrEntityProcessor, when having it configured
 under a JDBC Datasource.
 My configuration looks like this:

 <entity name="V_MARKET_STUDIES" datasource="jdbc-2"
         query="select * from V_MARKET_STUDIES" transformer="ClobTransformer">

     <field column="ID" name="id" />
     <field column="TYPE" name="type" />
     <field column="LOCALE" name="locale" />

     <field column="TITLE" name="title" />

     <field column="KEYWORDS" name="keywords" clob="true" />
     <field column="TOPICS" name="topics" />
     <field column="EXTENDED_KEYWORDS" name="extended_keywords" clob="true" />
     <field column="PUBLICATION_DATE" name="publication_date" />

     <field column="OWNER" name="owner" />

     <field column="DL_FILE_ENTRY_ID" name="dl_file_entry_id" />
     <field column="DL_FILE_VERSION_ID" name="dl_file_version_id" />
     <field column="DL_FOLDER_ID" name="dl_folder_id" />
     <field column="FILE_NAME" name="file_name" />
     <field column="EXTENSION" name="extension" />
     <field column="URL_LINK" name="urllink" />

     <entity name="sep" processor="SolrEntityProcessor"
             fl="content" url="http://vmcenter120:8983/solr/"
             query="folderId:${V_MARKET_STUDIES.DL_FOLDER_ID}"
             fq="entryClassPK:${V_MARKET_STUDIES.DL_FILE_ENTRY_ID}">
         <field column="content" name="content" />
     </entity>
 </entity>

 I have 6 rows in the Oracle database, but only the first row is processed
 correctly, meaning that the second Solr is queried and the results go into
 the document; the remaining 5 rows were processed without querying the
 second Solr and therefore did not have the content field filled.

 Any suggestions?
 Did I configure something wrong, or misunderstand something?
 Thanks for your help


 Best regards
 Michael



-- 
Lance Norskog
goks...@gmail.com


Re: schema design question

2012-04-06 Thread Lance Norskog
(albums:query OR tracks:query) AND NOT(tracks:query - albums:query)

Is this it? That last clause does sound like a join.

How do you shard? Is it possible to put all associated albums and
tracks in one shard? You can then do a join query against each shard
and merge the output yourself.

On Fri, Apr 6, 2012 at 9:59 AM, Neal Tucker ntuc...@august20th.com wrote:
 Thanks, but I don't want to exclude all tracks that are associated
 with albums, I want to exclude tracks that are associated with albums
 *which match the query* (tracks and their associated albums may have
 different tags).  I don't think your suggestion covers that.

 On Fri, Apr 6, 2012 at 9:35 AM, Erick Erickson erickerick...@gmail.com 
 wrote:
 I'd consider a field like associated_with_album, and a
 field that identifies the kind of record this is track or album.

 Then you can form a query like -associated_with_album:true
 (where '-' is the Lucene or NOT).

 And then group by kind to get separate groups of albums and
 tracks.

 Hope this helps
 Erick

 On Thu, Apr 5, 2012 at 9:00 PM, N. Tucker
 ntucker-ml-solr-us...@august20th.com wrote:
 Apologies if this is a very straightforward schema design problem that
 should be fairly obvious, but I'm not seeing a good way to do it.
 Let's say I have an index that wants to model Albums and Tracks, and
 they all have arbitrary tags attached to them (represented by
 multivalue string type fields).  Tracks also have an album id field
 which can be used to associate them with an album.  I'd like to
 perform a query which shows both Track and Album results, but
 suppresses Tracks that are associated with Albums in the result set.

 I am tempted to use a join here, but I have reservations because it
 is my understanding that joins cannot work across shards, and I'm not
 sure it's a good idea to limit myself in that way if possible.  Any
 suggestions?  Is there a standard solution to this type of problem
 where you've got hierarchical items and you don't want children shown
 in the same result as the parent?



-- 
Lance Norskog
goks...@gmail.com


Re: solr analysis-extras configuration

2012-04-06 Thread Lance Norskog
Tomcat needs an explicit parameter somewhere to use UTF-8 text. It's
on the wiki how to do this.

On Fri, Apr 6, 2012 at 4:41 PM, N. Tucker
ntucker-ml-solr-us...@august20th.com wrote:
 Further info: I can make this work if I stay out of tomcat -- I
 download a fresh solr binary distro, copy those five JARs from 'dist'
 and 'contrib' into example/solr/lib/, copy my solrconfig.xml and
 schema.xml, and run 'java -jar start.jar', and it works fine.  But
 trying to add those same JARs to my tomcat instance's solrhome/lib
 doesn't work.  Any ideas how to troubleshoot?


 On Fri, Apr 6, 2012 at 12:15 PM, N. Tucker
 ntucker-ml-solr-us...@august20th.com wrote:
 Hello, I'm running into an odd problem trying to use ICUTokenizer
 under a solr installation running under tomcat on ubuntu.  It seems
 that all the appropriate JAR files are loaded:

 INFO: Adding 'file:/usr/share/solr/lib/lucene-stempel-3.5.0.jar' to 
 classloader
 INFO: Adding 'file:/usr/share/solr/lib/lucene-smartcn-3.5.0.jar' to 
 classloader
 INFO: Adding 'file:/usr/share/solr/lib/icu4j-4_8_1_1.jar' to classloader
 INFO: Adding 'file:/usr/share/solr/lib/lucene-icu-3.5.0.jar' to classloader
 INFO: Adding 'file:/usr/share/solr/lib/apache-solr-analysis-extras-3.5.0.jar'
 to classloader
 ... but later: ...
 SEVERE: java.lang.NoClassDefFoundError:
 org/apache/lucene/analysis/icu/segmentation/ICUTokenizer

 I'm not too clear on the correct way to add the contrib bits other
 than copying them into the 'lib' directory under solrhome.  They are
 obviously found there (and I have verified that ICUTokenizer is in
 lucene-icu-3.5.0.jar), but there's still a problem loading the
 ICUTokenizer class.  Any tips on troubleshooting this?  Are there more
  dependencies that I'm unaware of?



-- 
Lance Norskog
goks...@gmail.com