RE: Question about dates and SolrJ

2013-01-13 Thread Uwe Clement
In 3.6.1 I also got back a Date instance; now with 4.0 I receive a String
instead.

I don't like this, but I adapted my software now.

Is there no way to change this behavior in the config?

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Sunday, January 13, 2013 07:53
To: solr-user@lucene.apache.org
Subject: Re: Question about dates and SolrJ

On 1/12/2013 7:51 PM, Jack Park wrote:
 My work engages SolrJ, with which I send documents off to Solr 4 which
 properly store, as viewed in the admin panel, as this example:
 2013-02-04T02:11:39.995Z

 When I retrieve a document with that date, I use the SolrDocument
 returned as a Map<String,Object> in which the date now looks like
 this:
 Sun Feb 03 18:11:39 PST 2013

 I am thinking that I am missing something in the SolrJ configuration,
 though it could be in how I structure the query; for now, here is the
 simplistic way I setup SolrJ:

 HttpSolrServer server = new HttpSolrServer(solrURL);
 server.setParser(new XMLResponseParser());

 Is there something I am missing to retain dates as Solr stores them?

Quick note: setting the parser is NOT necessary unless you are trying to
connect radically different versions of Solr and SolrJ (1.x and 3.x/later,
to be precise), and will in fact make SolrJ slightly slower when
contacting Solr.  Just let it use the default javabin parser -- it's
faster.

If your date field in Solr is an actual date type, then you should be
getting back a Date object in Java which you can manipulate in all the
usual Java ways.  The format that you are seeing matches the toString()
output from a Date object:

http://docs.oracle.com/javase/6/docs/api/java/util/Date.html#toString%28%29

You'll almost certainly have to cast the object so it's the right type:

Date dateField = (Date) doc.get(datefieldname);
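
Putting that together, a minimal SolrJ sketch (the URL and field name here
are placeholders, not from your setup):

import java.util.Date;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class DateFieldExample {
    public static void main(String[] args) throws Exception {
        // default javabin parser -- no setParser() call needed
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        for (SolrDocument doc : rsp.getResults()) {
            // a Solr date field comes back as a java.util.Date, not a String
            Date stored = (Date) doc.get("mydatefield");
            System.out.println(stored); // Date.toString(), e.g. "Sun Feb 03 18:11:39 PST 2013"
        }
    }
}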

Thanks,
Shawn



SolrJ | Atomic Updates | How works exactly?

2013-01-13 Thread uwe72
I have very big documents in the index.

I want to update a multivalued field of a document, without loading the whole
document.

How can I do this?

Is there good documentation somewhere?

Regards





RE: CoreAdmin STATUS performance

2013-01-13 Thread Shahar Davidson
Thanks for sharing this, Per - it may prove valuable for me
in the future.

Shahar.

-Original Message-
From: Per Steffensen [mailto:st...@designware.dk] 
Sent: Thursday, January 10, 2013 6:10 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

The collections are created dynamically, though not on update. We use one 
collection per month and we have a timer-job running (every hour or so), which 
checks that all collections that need to exist actually do exist - if not, it 
creates the collection(s). The rule is that the collection for the next month 
has to exist as soon as we enter the current month, so the first time the 
timer-job runs on e.g. July 1 it will create the August-collection. We never 
get data with a timestamp in the future. Therefore, as long as the timer-job 
gets to run once within every month, we will always have the needed 
collections ready.
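
A minimal sketch of what such a timer-job check might look like (a sketch 
only - the collection naming, shard counts and base URL are assumptions; 
check the exact Collections API parameters for your Solr version):

import java.net.HttpURLConnection;
import java.net.URL;
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class MonthCollectionJob {

    // naming scheme is an assumption: one collection per month, e.g. "data-2013-02"
    static String collectionNameFor(Calendar month) {
        return "data-" + new SimpleDateFormat("yyyy-MM").format(month.getTime());
    }

    // ensure the collection for next month exists; run from a timer every hour or so
    static void ensureNextMonthCollection(String solrBaseUrl) throws Exception {
        Calendar next = Calendar.getInstance();
        next.add(Calendar.MONTH, 1);
        String url = solrBaseUrl + "/admin/collections?action=CREATE&name="
                + collectionNameFor(next) + "&numShards=5&replicationFactor=2";
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        int rc = conn.getResponseCode();
        if (rc != 200) {
            // a real job would distinguish "already exists" from real failures,
            // log, and retry on the next timer tick
        }
        conn.disconnect();
    }
}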

We create collections using the new Collection API in Solr. We used to manage 
creation of every single Shard/Replica/Core of the collections through the Core 
Admin API in Solr, but since the Collection API was introduced we decided that 
we had better use that. In 4.0 it did not have the features we needed, which 
triggered SOLR-4114, SOLR-4120 and SOLR-4140, which will be available in 4.1. 
With those features we are now using the Collection API.

BTW, our timer-job also handles deletion of old collections. In our system 
you can configure how many historic month-collections to keep before it is 
OK to delete them. Let's say this is configured to 3: as soon as it becomes 
July 1 the timer-job will delete the March-collection (the historic 
collections to keep will just have become the April-, May- and 
June-collections). This way we will always have at least 3 months of 
historic data, and late in a month close to 4 months of history. It does not 
matter that we have a little too much history, as long as we do not go below 
the lower limit on the length of historic data. We also use the new 
Collection API for deletion.

Regards, Per Steffensen

On 1/10/13 3:04 PM, Shahar Davidson wrote:
 Hi Per,

 Thanks for your reply!

 That's a very interesting approach.

 In your system, how are the collections created? In other words, are the 
 collections created dynamically upon an update (for example, per new day)?
 If they are created dynamically, who handles their creation (client/server)  
 and how is it done?

 I'd love to hear more about it!

 Appreciate your help,

 Shahar.

 -Original Message-
 From: Per Steffensen [mailto:st...@designware.dk]
 Sent: Thursday, January 10, 2013 1:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: CoreAdmin STATUS performance

 On 1/10/13 10:09 AM, Shahar Davidson wrote:
 search request, the system must be aware of all available cores in 
 order to execute distributed search on _all_ relevant cores
 For this purpose I would definitely recommend that you go SolrCloud.

 Further more we do something ekstra:
 We have several collections each containing data from a specific 
 period in time - timestamp of ingoing data decides which collection it 
 is indexed into. One important search-criterion for our clients is 
 search on timestamp-interval. Therefore most searches can be 
 restricted to only consider a subset of all our collections. Instead 
 of having the logic calculating the subset of collections to search 
 (given the timestamp
 search-interval) in clients, we just let clients do dumb searches by giving 
 the timestamp-interval. The subset of collections to search are calculated on 
 server-side from the timestamp-interval in the search-query. We handle this 
 in a Solr SearchComponent which we place early in the chain of 
 SearchComponents. Maybe you can get some inspiration by this approach, if it 
 is also relevant for you.
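
A rough sketch of the server-side subset calculation described here (the 
"logs-yyyy-MM" naming and the "from"/"to" month strings are assumptions); 
the joined list would go into the SolrCloud "collection" parameter from a 
custom SearchComponent's prepare():

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

public class CollectionSubset {
    // list the month-collections that can contain documents in the interval
    public static String collectionsFor(String fromMonth, String toMonth)
            throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM");
        Calendar cur = Calendar.getInstance();
        cur.setTime(fmt.parse(fromMonth));
        Calendar end = Calendar.getInstance();
        end.setTime(fmt.parse(toMonth));
        List<String> names = new ArrayList<String>();
        while (!cur.after(end)) {
            names.add("logs-" + fmt.format(cur.getTime()));
            cur.add(Calendar.MONTH, 1);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(names.get(i));
        }
        return sb.toString(); // e.g. "logs-2013-01,logs-2013-02,logs-2013-03"
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(collectionsFor("2013-01", "2013-03"));
    }
}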

 Regards, Per Steffensen






RE: CoreAdmin STATUS performance

2013-01-13 Thread Shahar Davidson
Shawn, Per and anyone else who has participated in this thread - thank you!

I have finally resorted to applying a minor patch to the Solr code. 
I have noticed that most of the time of the STATUS request is spent when 
collecting Index related info (such as segmentCount, sizeInBytes, numDocs, 
etc.).
In the STATUS request I added support for a new parameter which, if present, 
will skip collection of the Index info (hence will only return general static 
info, among it the core name) - this will, in fact, cut down the request time 
by two orders of magnitude!
In my case, it decreased the request time from around 800ms to around 1ms-4ms.

Regards,

Shahar.

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Thursday, January 10, 2013 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: CoreAdmin STATUS performance

On 1/10/2013 2:09 AM, Shahar Davidson wrote:
 As for your first question, the core info needs to be gathered upon every 
 search request because cores are created dynamically.
 When a user initiates a search request, the system must be aware of 
 all available cores in order to execute distributed search on _all_ relevant 
 cores. (the user must get reliable and most up to date data) The reason that 
 800ms seems a lot to me is because the overall execution time takes about 
 2500ms and a large part of it is due to the STATUS request.

 The minimal interval concept is a good idea and indeed we've considered it, 
 yet it poses a slight problem when building a RT system which needs to return 
 the most up-to-date data.
 I am just trying to understand if there's some other way to hasten the 
 STATUS reply (for example, by asking the STATUS request to return just 
 certain core attributes, such as name, instead of collecting 
 everything)

Are there a *huge* number of SolrJ clients in the wild, or is it something like 
a server farm where you are in control of everything?  If it's the latter, what 
I think I would do is have an asynchronous thread that periodically (every few 
seconds) updates the client's view of what cores exist.  When a query is made, 
it will use that information, speeding up your queries by 800 milliseconds and 
ensuring that new cores will not have long delays before they become 
searchable.  If you have a huge number of clients in the wild, it would still 
be possible, but ensuring that those clients get updated might be hard.
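
A sketch of that asynchronous refresh with SolrJ 4.x (the URL and the
refresh interval are placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class CoreListCache {
    private final AtomicReference<List<String>> cores =
            new AtomicReference<List<String>>(new ArrayList<String>());
    private final SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    public void start() {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                try {
                    // STATUS for all cores; this is the expensive call, now off
                    // the query path
                    CoreAdminResponse status = CoreAdminRequest.getStatus(null, server);
                    List<String> names = new ArrayList<String>();
                    for (int i = 0; i < status.getCoreStatus().size(); i++) {
                        names.add(status.getCoreStatus().getName(i));
                    }
                    cores.set(names); // queries read this snapshot instead
                } catch (Exception e) {
                    // keep the previous snapshot on failure
                }
            }
        }, 0, 5, TimeUnit.SECONDS);
    }

    public List<String> currentCores() {
        return cores.get();
    }
}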

If you also delete cores as well as add them, that complicates things.  
You'd have to have the clients be smart enough to exclude the last core on the 
list (by whatever sorting mechanism you require), and you'd have to wait long 
enough (30 seconds, maybe?) before *actually* deleting the last core to be sure 
that no clients are accessing it.

Or you could use SolrCloud, as Per suggested, but with 4.1, not the released 
4.0.  SolrCloud manages your cores for you automatically.  
You'd probably be using a slightly customized SolrCloud, including the custom 
hashing capability added by SOLR-2592.  I don't know what other customizations 
you might need.

Thanks,
Shawn




Re: CoreAdmin STATUS performance

2013-01-13 Thread Stefan Matheis
Shahar


Would you mind opening a JIRA issue for that, attaching your changes as a 
typical patch?
Perhaps we could use that for the UI, in those cases where we don't need the 
full set of information.

Stefan 


On Sunday, January 13, 2013 at 12:28 PM, Shahar Davidson wrote:

 Shawn, Per and anyone else who has participated in this thread - thank you!
 
 I have finally resorted to applying a minor patch to the Solr code. 
 I have noticed that most of the time of the STATUS request is spent when 
 collecting Index related info (such as segmentCount, sizeInBytes, numDocs, 
 etc.).
 In the STATUS request I added support for a new parameter which, if present, 
 will skip collection of the Index info (hence will only return general static 
 info, among it the core name) - this will, in fact, cut down the request time 
 by two orders of magnitude!
 In my case, it decreased the request time from around 800ms to around 1ms-4ms.
 
 Regards,
 
 Shahar.
 
 -Original Message-
 From: Shawn Heisey [mailto:s...@elyograg.org] 
 Sent: Thursday, January 10, 2013 5:14 PM
 To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
 Subject: Re: CoreAdmin STATUS performance
 
 On 1/10/2013 2:09 AM, Shahar Davidson wrote:
  As for your first question, the core info needs to be gathered upon every 
  search request because cores are created dynamically.
  When a user initiates a search request, the system must be aware of 
  all available cores in order to execute distributed search on _all_ 
  relevant cores. (the user must get reliable and most up to date data) The 
  reason that 800ms seems a lot to me is because the overall execution time 
  takes about 2500ms and a large part of it is due to the STATUS request.
  
  The minimal interval concept is a good idea and indeed we've considered 
  it, yet it poses a slight problem when building a RT system which needs to 
  return the most up-to-date data.
  I am just trying to understand if there's some other way to hasten the 
  STATUS reply (for example, by asking the STATUS request to return just 
  certain core attributes, such as name, instead of collecting 
  everything)
 
 
 
 Are there a *huge* number of SolrJ clients in the wild, or is it something 
 like a server farm where you are in control of everything? If it's the 
 latter, what I think I would do is have an asynchronous thread that 
 periodically (every few seconds) updates the client's view of what cores 
 exist. When a query is made, it will use that information, speeding up your 
 queries by 800 milliseconds and ensuring that new cores will not have long 
 delays before they become searchable. If you have a huge number of clients in 
 the wild, it would still be possible, but ensuring that those clients get 
 updated might be hard.
 
 If you also delete cores as well as add them, that complicates things. 
 You'd have to have the clients be smart enough to exclude the last core on 
 the list (by whatever sorting mechanism you require), and you'd have to wait 
 long enough (30 seconds, maybe?) before *actually* deleting the last core to 
 be sure that no clients are accessing it.
 
 Or you could use SolrCloud, as Per suggested, but with 4.1, not the released 
 4.0. SolrCloud manages your cores for you automatically. 
 You'd probably be using a slightly customized SolrCloud, including the custom 
 hashing capability added by SOLR-2592. I don't know what other customizations 
 you might need.
 
 Thanks,
 Shawn
 
 




Re: Problems running solr 4.0 in Websphere 7

2013-01-13 Thread Erick Erickson
This looks like you're getting old jars somewhere in your classpath on the
websphere box. I know some classes have moved around between 3.6 and 4.0.

It's tedious, but take a look at your solr log file, you should see a bunch
of messages about what jars are being loaded. Do all of them look correct?

Best
Erick


On Thu, Jan 10, 2013 at 3:46 PM, Riad I.A riad...@hotmail.com wrote:


 I'm trying to run solr 4.0 on Websphere 7 and have problems starting
 solr.
 I tried with solr 3.6 and everything's OK, as I can access the admin UI.
 For solr 4.0, when I try to access the admin page I get a
 ClassNotFoundException on solr.WhitespaceTokenizerFactory.
 I noticed that many analysis classes were moved from solr-core to the
 lucene-core analysis packages.

 To solve the problem I replaced the shorthand class names with
 org.apache.lucene.analysis.core ones in the schema.xml file and the problem
 disappeared, but I got another ClassNotFoundException about another
 TokenizerFactory or filter, and again replaced it with the right package,
 and so forth.

 Strangely I didn't have those kinds of errors under Tomcat using the same
 WAR! Is there any special config to do with Websphere 7 to get it running?

 thanks for your help



Re: Suggestion that preserve original phrase case

2013-01-13 Thread Erick Erickson
One way I've seen this done is to index pairs like
lowercaseversion:LowerCaseVersion. You can't push this whole thing through
your field as defined since it'll all be lowercased, you have to produce
the left hand side of the above yourself and just use KeywordTokenizer
without LowercaseFilter.

Then, your application displays the right-hand-side of the returned token.

Simple solution, not very elegant, but sometimes the easiest...
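
A small sketch of the pairing idea (field names are made up; the suggest
field is assumed to use KeywordTokenizer with no LowercaseFilter):

import org.apache.solr.common.SolrInputDocument;

public class SuggestPairs {
    public static SolrInputDocument buildDoc(String id, String label) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("label", label);
        // left side is what gets matched case-insensitively,
        // right side keeps the original casing for display
        doc.addField("label_autocomplete", label.toLowerCase() + ":" + label);
        return doc;
    }

    // at display time, show only the right-hand side of the returned token
    public static String display(String suggestion) {
        return suggestion.substring(suggestion.indexOf(':') + 1);
    }
}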

Best
Erick


On Fri, Jan 11, 2013 at 1:30 AM, Selvam s.selvams...@gmail.com wrote:

 Hi,

 I have been trying to figure out a way for case-insensitive suggestion
 which should return the original phrase as result. I am using solr 3.5.

 For e.g.:

 If I index 'Hello world' and search for 'hello' it needs to return 'Hello
 world', not 'hello world'. My configurations are as follows.

 New field type:
 <fieldType class="solr.TextField" name="text_auto">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Field values:
 <field name="label" type="text" indexed="true" stored="true"
     termVectors="true" omitNorms="true"/>
 <field name="label_autocomplete" type="text_auto" indexed="true"
     stored="true" multiValued="false"/>
 <copyField source="label" dest="label_autocomplete"/>

 Spellcheck component:
 <searchComponent name="suggest" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">text_auto</str>
   <lst name="spellchecker">
     <str name="name">suggest</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
     <str name="buildOnOptimize">true</str>
     <str name="buildOnCommit">true</str>
     <str name="field">label_autocomplete</str>
   </lst>
 </searchComponent>


 Kindly share your suggestions to implement this behavior.

 --
 Regards,
 Selvam
 KnackForge http://knackforge.com
 Acquia Service Partner
 No. 1, 12th Line, K.K. Road, Venkatapuram,
 Ambattur, Chennai,
 Tamil Nadu, India.
 PIN - 600 053.



Re: SolrCloud removing shard (how to not loose data)

2013-01-13 Thread Erick Erickson
I don't think this will work in the long run with Solr4 (not sure you're
using this or not). Solr4 will assign updates to a shard based on a hash of
the uniqueKey. So let's say you have docs on  your original three shards:
shard 1 has docs 1, 4, 7
shard 2 has docs 2, 5, 8
shard 3 has docs 3, 6, 9

Now you merge shards 2 and 3, and you have
shard 1 - 1, 4, 7
shard 2 - 2, 3, 5, 6, 8, 9

Now if you update docs 1 or 2, everything's fine. But, if you re-index doc
3, it'll be assigned shard 1. Now you have two live documents on different
shards with the same ID. You'll get both back for searches, one will be
stale, etc. This is a Bad Thing.

And even if you're on 3.x and assigning docs to shards yourself, you now
have pretty unbalanced shards, shard2 is twice as big as shard1.

NOTE: The actual doc-shard assignment is NOT a simple round-robin, this is
just for illustration
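
A toy illustration of the failure mode (Solr 4 really routes on hash ranges
of the uniqueKey, not hashCode() modulo, but the effect is the same):

public class ShardHashDemo {
    public static void main(String[] args) {
        String docId = "3";
        int h = Math.abs(docId.hashCode());
        System.out.println("with 3 shards, doc routes to shard " + (h % 3));
        System.out.println("with 2 shards, doc routes to shard " + (h % 2));
        // If the old copy was merged into a different shard than the one
        // routing now picks, a re-index creates a second live copy instead
        // of overwriting the first.
    }
}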

Unless re-indexing is _really_ expensive, I'd just count on re-indexing
when changing the number of shards. At least until shard splitting is in
place for Solr4. And I'm not sure shard splitting will also handle shard
merging, I'd check before assuming so...

Best
Erick


On Fri, Jan 11, 2013 at 8:47 AM, mizayah miza...@gmail.com wrote:

 Seems I'm too lazy.
 I found this http://wiki.apache.org/solr/MergingSolrIndexes, and it really
 works.






Re: How to disable\clear filterCache(from SolrIndexSearcher ) in a custom searchComponent

2013-01-13 Thread Erick Erickson
I admit I only skimmed your post, and at the level you're at I'm not sure
how to hook it in, but see:
https://issues.apache.org/jira/browse/SOLR-2429 (Solr 3.4), which allows
you to specify cache=false, which will specifically
NOT put the filter into the filter cache at the Solr level.
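
For a request-supplied filter, the usage looks roughly like this from SolrJ
(field and value names are made up); how to apply the same flag to a filter
built inside a component is the open question:

import org.apache.solr.client.solrj.SolrQuery;

public class NoCacheFilterExample {
    public static SolrQuery build() {
        SolrQuery query = new SolrQuery("field1:termValue1");
        // the {!cache=false} local param from SOLR-2429 keeps this one
        // filter out of the filterCache without disabling the cache globally
        query.addFilterQuery("{!cache=false}payloadField:payload1");
        return query;
    }
}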

Best
Erick


On Fri, Jan 11, 2013 at 11:24 AM, radu radu.moldo...@atigeo.com wrote:

 Hello, and thank you in advance for your help!

 Context:
 I have implemented a custom search component that receives 3 parameters:
 field, termValue and payloadX.
 The component should search for a termValue in the requested Lucene field
 and, for each termValue, check payloadX against the information in its
 associated payload.

 Constraints:
 I don't want to disable the filterCache in solrconfig.xml (the <filterCache
 class="solr.FastLRUCache" .../> entry) since I have other searchComponents
 that could use the filterCache.

 I have implemented the payload search using SpanTermQuery and
 attached it to q=field:termValue:
 public class MySearchComponent extends XPatternsSearchComponent {

   public void prepare(ResponseBuilder rb) {
     ...
     rb.setQueryString(parameters.get(CommonParams.Q));
     ...
   }

   public void process(ResponseBuilder rb) {
     ...
     SolrIndexSearcher.QueryResult queryResult = new
         SolrIndexSearcher.QueryResult(); // ??? question for help

     // search for payloadCriteria in the payload in a specific
     // field for a specific term
     CustomSpanTermQuery customFilterQuery =
         new CustomSpanTermQuery(field, term, payload);
     QueryCommand queryCommand =
         rb.getQueryCommand().setFilterList(customFilterQuery);

     rb.req.getSearcher().search(queryResult, queryCommand);
     ...
   }
 }

 Issue:
 If I call the search component with field1, termValue1 and:
  - payload1 (the first search): the result from filtering is saved
    in the filterCache.
  - payload2 (the second time): the results from the first
    search (from the filterCache) are returned, and not the different
    expected result set.

 Findings:
 I noticed that in SolrIndexSearcher, filterCache is private so I cannot
 change\clear it through inheritance.
 Also, I tried to use rb.getQueryCommand().replaceFlags() but
 SolrIndexSearcher.NO_CHECK_FILTERCACHE|NO_CHECK_QCACHE|NO_SET_QCACHE
 are not public either.

 Question:
 How do I disable\clear the filterCache (from SolrIndexSearcher) only for a
 custom search component?
 Do I have other options\approaches?

 Best regards,
 Radu



Re: Solr 4.0, slow opening searchers

2013-01-13 Thread Erick Erickson
In addition to Alan's comment, are you doing any warmup queries? Your Solr
logs should show you some interesting stats, and the admin page also has
some stats about warmup times. Although I'd expect similar issues when
reopening searchers if it was just warmup queries.

But 267M docs on a single machine (spread over 9 cores or not) is quite a
lot (depending, of course, on how beefy the machine is and the
characteristics of your corpus). It's possible you're just I/O bound at
startup, experiencing memory pressure, etc. that is, your index is just too
large for your hardware. I've seen machines vary from 10M to 300M docs
being reasonable.

FWIW,
Erick


On Fri, Jan 11, 2013 at 12:31 PM, Alan Woodward a...@flax.co.uk wrote:

 Hi Marcel,

 Are you committing data with hard commits or soft commits?  I've seen
 systems where we've inadvertently only used soft commits, which means that
 the entire transaction log has to be re-read on startup, which can take a
 long time.  Hard commits flush indexed data to disk, and make it a lot
 quicker to restart.
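
A minimal SolrJ sketch of issuing an explicit hard commit (URL and core name
are placeholders); the same effect can be had with an autoCommit section in
solrconfig.xml:

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class HardCommitExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
        // a hard commit: flushes indexed data to disk so restarts don't have
        // to replay a huge transaction log
        server.commit(true, true); // waitFlush, waitSearcher
        server.shutdown();
    }
}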

 Alan Woodward
 a...@flax.co.uk


 On 11 Jan 2013, at 13:51, Marcel Bremer wrote:

  Hi,
 
  We're experiencing slow startup times of searchers in Solr when it
 contains a large number of documents.
 
  We use Solr v4.0 with Jetty and currently have 267.657.634 documents
 stored, spread across 9 cores. These documents contain keywords, with
 additional statistics, which we are using for suggestions and related
 keywords. When we (re)start Solr on one of our servers it can take up to
 two hours before Solr has opened all of its searchers and starts accepting
 connections again. We can't figure out why it takes so long to open those
 searchers. Also the CPU and memory usage of Solr while opening searchers is
 not extremely high.
 
  Are there any known issues or tips someone could give us to speed up
 opening searchers?
 
  If you need more details, please ping me.
 
 
  Best regards,
 
  Marcel Bremer
  Vinden.nl BV




Re: SolrJ | Atomic Updates | How works exactly?

2013-01-13 Thread Erick Erickson
Atomic updates work by storing (stored=true) all the fields (note, you
don't have to set stored=true for the destinations of copyField). Anyway,
when you use the atomic update syntax under the covers Solr reads all the
stored fields out, re-assembles the document and re-indexes it. So your
index may be significantly larger. Also note that in the 4.1 world, stored
fields are automatically compressed so this may not be so much of a problem.
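
For reference, a sketch of the SolrJ atomic-update syntax (field names are
made up): only the uniqueKey and the modifier map go over the wire, and Solr
re-reads the stored fields server-side:

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-42");
        Map<String, Object> add = new HashMap<String, Object>();
        add.put("add", "newValue");            // append to a multiValued field
        doc.addField("myMultiValuedField", add);
        server.add(doc);
        server.commit();
    }
}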

And, there's been at least 1 or 2 fixes to this since 4.0 as I remember, so
you might want to wait for 4.1 to experiment with (there's talk of cutting
RC1 for Solr4.1 early next week) or use a nightly build.

Best
Erick


On Sun, Jan 13, 2013 at 3:43 AM, uwe72 uwe.clem...@exxcellent.de wrote:

 I have very big documents in the index.

 I want to update a multivalued field of a document, without loading the
 whole
 document.

 How can I do this?

 Is there good documentation somewhere?

 Regards






Re: Index sharing between multiple slaves

2013-01-13 Thread suri
Sorry, I should have shared more info. We are planning to have the index files
on a NAS and share these index files across multiple nodes. We have 4 slave
nodes. For redundancy we might have 2 nodes per shared index. Any issues you
foresee with this? I will post details once we test this.

Cheers,





How to manage solr cloud collections-sharding?

2013-01-13 Thread adfel70
Hi,
I know a few questions on this issue have already been posted, but I didn't
find full answers in any of those posts.

I'm using solr-4.0.0.
I need my solr cluster to have multiple collections, each collection with a
different configuration (at least a different schema.xml file).
I follow the solrCloud tutorial page and execute this command:
java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=5 -jar start.jar
When I start the solr servers I have collection1 in clusterstate.json with
each node assigned to some shard.

Questions so far:
1. Is this first command 100% necessary?
2. Do I have to define the number of shards before starting solr instances?
3. What if I want to add a shard after I started all solr instances and
haven't indexed yet?
4. What if I want to add a shard after indexing?
5. What role does clusterstate.json play? Is it just a json file to
show in the GUI? Or is it the only file that persists the current state of
the cluster?
6. Can I edit it manually? Should I?

I add another schema-B.xml file to the zookeeper and open another collection
by using the coreAdmin REST API.
I want this collection to have 10 shards and not 5 as I defined for the
previous collection.
So I run
http://server:port/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schem_file_name.xml&dataDir=data&shard=shard_id
10 times with a different shard_id each run.

Questions:
1. Is this an appropriate way to use the core admin API? Should I specify
the shard id? I do it because it gives me a way to control the number of
shards (each new shard id creates a new shard), but should I use it this
way?
2. Can I have a different number of shards in different collections on the
same cluster?
3. If yes - then what is the purpose of the first bootstrap command?


Another question:
I saw that in the 4.1 version each shard has another parameter - range. What
is this parameter used for? Would I have to re-index when upgrading from 4.0
to 4.1?


This will help a lot in understanding the whole collection-sharding
architecture in solr cloud.
Thanks





Re: NULL POINTER EXCEPTION WITH SOLR SUGGESTER

2013-01-13 Thread Jack Krupansky

What URL did you use? What is your data like?

I tried your exact config but with a field name of "name" rather than 
"spell_check", using the Solr 4.0 example. Then I added the following data:


curl http://localhost:8983/solr/update?commit=true -H 
'Content-type:application/csv' -d '

id,name
sug-1,aardvark abacus ball bill cat cello
sug-2,abate accord band bell cattle check
sug-3,adorn border clean clock'

Then I issued a suggest request using curl and got the expected response:

Jack Krupansky@JackKrupansky ~ $ curl "http://localhost:8983/solr/suggest?q=b&indent=true"

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
</lst>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="b">
      <int name="numFound">5</int>
      <int name="startOffset">0</int>
      <int name="endOffset">1</int>
      <arr name="suggestion">
        <str>ball</str>
        <str>band</str>
        <str>bell</str>
        <str>bill</str>
        <str>border</str>
      </arr>
    </lst>
    <str name="collation">ball</str>
  </lst>
</lst>
</response>

So, try that simple example first and make sure it works for you, then see 
what else is different in your failing scenario.


-- Jack Krupansky

-Original Message- 
From: obi240

Sent: Saturday, January 12, 2013 12:15 PM
To: solr-user@lucene.apache.org
Subject: NULL POINTER EXCEPTION WITH SOLR SUGGESTER

Hi,

I'm currently working with SOLR 4. I tried calling my suggester feature and
got the error below:

500 1 java.lang.NullPointerException at
org.apache.lucene.search.suggest.fst.FSTCompletionLookup.lookup(FSTCompletionLookup.java:237)
at
org.apache.solr.spelling.suggest.Suggester.getSuggestions(Suggester.java:190)
at
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:172)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:964)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

My suggest searchcomponent and request handler are as below:


 <searchComponent class="solr.SpellCheckComponent" name="suggest">
   <lst name="spellchecker">
     <str name="name">suggest</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
     <str name="field">spell_check</str>
     <float name="threshold">0.005</float>
     <str name="buildOnCommit">true</str>
   </lst>
 </searchComponent>


 <requestHandler class="org.apache.solr.handler.component.SearchHandler"
     name="/suggest">
   <lst name="defaults">
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.count">5</str>
     <str name="spellcheck.collate">true</str>
   </lst>
   <arr name="components">
     <str>suggest</str>
   </arr>
 </requestHandler>

Can anyone point out what I'm doing wrong here?

Thanks






Re: Index sharing between multiple slaves

2013-01-13 Thread Upayavira
It can work, so I believe. However, it is not normal Solr usage, so you
are less likely to find people who can support you in it.

Upayavira

On Sun, Jan 13, 2013, at 03:59 PM, suri wrote:
 Sorry, I should have shared more info. We are planning to have the index
 files on a NAS and share these index files across multiple nodes. We have 4
 slave nodes. For redundancy we might have 2 nodes per shared index. Any
 issues you foresee with this? I will post details once we test this.
 
 Cheers,
 
 
 


RE: SolrJ | Atomic Updates | How works exactly?

2013-01-13 Thread Uwe Clement
Thanks erick,

The main reason why I want to use atomic updates is to speed up updating
existing, rather large documents.

So if, under the covers, everything is the same (loading the whole doc,
updating it, re-indexing the whole doc), it is not interesting for me anymore.

What is the best, most performant way to update a large document?

Any recommendations?

THANKS!

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, January 13, 2013 16:53
To: solr-user@lucene.apache.org
Subject: Re: SolrJ | Atomic Updates | How works exactly?

Atomic updates work by storing (stored=true) all the fields (note, you
don't have to set stored=true for the destinations of copyField).
Anyway, when you use the atomic update syntax under the covers Solr reads
all the stored fields out, re-assembles the document and re-indexes it. So
your index may be significantly larger. Also note that in the 4.1 world,
stored fields are automatically compressed so this may not be so much of a
problem.

And, there's been at least 1 or 2 fixes to this since 4.0 as I remember,
so you might want to wait for 4.1 to experiment with (there's talk of
cutting
RC1 for Solr4.1 early next week) or use a nightly build.

Best
Erick


On Sun, Jan 13, 2013 at 3:43 AM, uwe72 uwe.clem...@exxcellent.de wrote:

 I have very big documents in the index.

 I want to update a multivalued field of a document, without loading the
 whole document.

 How can I do this?

 Is there good documentation somewhere?

 Regards






Re: SolrJ | Atomic Updates | How works exactly?

2013-01-13 Thread Yonik Seeley
On Sun, Jan 13, 2013 at 1:51 PM, Uwe Clement uwe.clem...@exxcellent.de wrote:
 What is the best, most performant way to update a large document?

That *is* the best way to update a large document that we currently have.
Although it re-indexes under the covers, it ensures that it's atomic,
and it's faster because it does everything in a single request.

-Yonik
http://lucidworks.com


RE: SolrJ | Atomic Updates | How works exactly?

2013-01-13 Thread Uwe Clement
Thanks Yonik.

Is this already working well in solr 4.0, or is it better to wait for solr
4.1?


-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On behalf of Yonik
Seeley
Sent: Sunday, January 13, 2013 20:24
To: solr-user@lucene.apache.org
Subject: Re: SolrJ | Atomic Updates | How works exactly?

On Sun, Jan 13, 2013 at 1:51 PM, Uwe Clement uwe.clem...@exxcellent.de
wrote:
 What is the best, most performant way to update a large document?

That *is* the best way to update a large document that we currently have.
Although it re-indexes under the covers, it ensures that it's atomic, and
it's faster because it does everything in a single request.

-Yonik
http://lucidworks.com


Re: Question about dates and SolrJ

2013-01-13 Thread Jack Park
Thanks Shawn.

I stopped setting the parser as suggested.

I found that what I had to do is just store Date objects in my
documents and then, at the last minute, when building a SolrDocument to
send, convert them with DateField. When I export to XML, I export the
DateField string, then convert the Zulu string back to a Date object
as needed.
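
A sketch of that round-trip in plain Java (Solr's canonical date form is
UTC, e.g. 2013-02-04T02:11:39.995Z; the class and method names here are
made up):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ZuluDates {
    // SimpleDateFormat is not thread-safe, so create one per use
    private static SimpleDateFormat zulu() {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    public static void main(String[] args) throws Exception {
        Date now = new Date();
        String asZulu = zulu().format(now);     // for export / sending to Solr
        Date roundTrip = zulu().parse(asZulu);  // back to a Date as needed
        System.out.println(asZulu + " -> " + roundTrip);
    }
}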

Seems to be working fine now.

Many thanks
Jack

On Sat, Jan 12, 2013 at 10:52 PM, Shawn Heisey s...@elyograg.org wrote:
 On 1/12/2013 7:51 PM, Jack Park wrote:

 My work engages SolrJ, with which I send documents off to Solr 4 which
 properly store, as viewed in the admin panel, as this example:
 2013-02-04T02:11:39.995Z

 When I retrieve a document with that date, I use the SolrDocument
 returned as a Map<String,Object> in which the date now looks like
 this:
 Sun Feb 03 18:11:39 PST 2013

 I am thinking that I am missing something in the SolrJ configuration,
 though it could be in how I structure the query; for now, here is the
 simplistic way I setup SolrJ:

 HttpSolrServer server = new HttpSolrServer(solrURL);
 server.setParser(new XMLResponseParser());

 Is there something I am missing to retain dates as Solr stores them?


 Quick note: setting the parser is NOT necessary unless you are trying to
 connect radically different versions of Solr and SolrJ (1.x and 3.x/later,
 to be precise), and will in fact make SolrJ slightly slower when contacting
 Solr.  Just let it use the default javabin parser -- it's faster.

 If your date field in Solr is an actual date type, then you should be
 getting back a Date object in Java which you can manipulate in all the usual
 Java ways.  The format that you are seeing matches the toString() output
 from a Date object:

 http://docs.oracle.com/javase/6/docs/api/java/util/Date.html#toString%28%29

 You'll almost certainly have to cast the object so it's the right type:

 Date dateField = (Date) doc.get(datefieldname);

 Thanks,
 Shawn



SolrCloud sort inconsistency

2013-01-13 Thread yriveiro
How is it possible that this sorted query returns different results?

The highest value is the id P2450024023; sometimes the value returned is not
the highest.

This is an example; the second curl request is the correct result.

NOTE: I did the query while an indexing process was running.

➜  ~  curl -H "Cache-Control: no-cache"
http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id:\*\&rows\=10\&fl\=id\=sort\=id%20desc\&cache\=False
{
  "responseHeader":{
    "status":0,
    "QTime":5,
    "params":{
      "cache":"False",
      "rows":"10",
      "fl":"id=sort=id desc",
      "q":"id:*"}},
  "response":{"numFound":2387312,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"P2443605077"},
      {
        "id":"P2443588094"},
      {
        "id":"P2443647855"},
      {
        "id":"P2443613193"},
      {
        "id":"P2443572098"},
      {
        "id":"P2443562507"},
      {
        "id":"P2443643935"},
      {
        "id":"P2443556464"},
      {
        "id":"P2443625267"},
      {
        "id":"P2443580781"}]
  }}
➜  ~  curl -H "Cache-Control: no-cache"
http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id:\*\&rows\=10\&fl\=id\=sort\=id%20desc\&cache\=False
{
  "responseHeader":{
    "status":0,
    "QTime":4,
    "params":{
      "cache":"False",
      "rows":"10",
      "fl":"id=sort=id desc",
      "q":"id:*"}},
  "response":{"numFound":2387312,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"P2450024023"},
      {
        "id":"P2450017490"},
      {
        "id":"P2450062568"},
      {
        "id":"P2450053498"},
      {
        "id":"P2449990839"},
      {
        "id":"P2449973572"},
      {
        "id":"P2449957535"},
      {
        "id":"P2450099098"},
      {
        "id":"P2450090195"},
      {
        "id":"P2450072528"}]
  }}
➜  ~  curl -H "Cache-Control: no-cache"
http://192.168.1.241:8983/solr/ST-SHARD_0212/query\?q\=id:\*\&rows\=10\&fl\=id\=sort\=id%20desc\&cache\=False
{
  "responseHeader":{
    "status":0,
    "QTime":6,
    "params":{
      "cache":"False",
      "rows":"10",
      "fl":"id=sort=id desc",
      "q":"id:*"}},
  "response":{"numFound":2387312,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"P2450024023"},
      {
        "id":"P2450017490"},
      {
        "id":"P2450062568"},
      {
        "id":"P2450053498"},
      {
        "id":"P2449990839"},
      {
        "id":"P2449973572"},
      {
        "id":"P2449957535"},
      {
        "id":"P2450099098"},
      {
        "id":"P2450090195"},
      {
        "id":"P2450072528"}]
  }}
➜  ~



Best regards


Re: SolrJ | Atomic Updates | How works exactly?

2013-01-13 Thread Upayavira
This is present in 4.0. Not sure if there are any improvements in 4.1.

Upayavira

On Sun, Jan 13, 2013, at 07:35 PM, Uwe Clement wrote:
 Thanks Yonik.
 
 Is this already working well on solr 4.0? or better to wait until solr
 4.1?!
 
 
 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On behalf of Yonik
 Seeley
 Sent: Sunday, January 13, 2013 20:24
 To: solr-user@lucene.apache.org
 Subject: Re: SolrJ | Atomic Updates | How works exactly?
 
 On Sun, Jan 13, 2013 at 1:51 PM, Uwe Clement uwe.clem...@exxcellent.de
 wrote:
  What is the best, most performant way to update a large document?
 
 That *is* the best way to update a large document that we currently have.
 Although it re-indexes under the covers, it ensures that it's atomic, and
 it's faster because it does everything in a single request.
 
 -Yonik
 http://lucidworks.com


Re: SolrJ | Atomic Updates | How works exactly?

2013-01-13 Thread Erik Hatcher
There are several JIRA issues, but several were duplicates of the same 
underlying issue:

   
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20issuetype%20%3D%20Bug%20AND%20fixVersion%20%3D%20%224.1%22%20AND%20status%20%3D%20Resolved%20AND%20text%20~%20%22atomic%20update%22

Erik


On Jan 13, 2013, at 19:49 , Upayavira wrote:

 This is present in 4.0. Not sure if there re ny improvements in 4.1.
 
 Upayavira
 
 On Sun, Jan 13, 2013, at 07:35 PM, Uwe Clement wrote:
 Thanks Yonik.
 
 Is this already working well in solr 4.0, or is it better to wait for solr
 4.1?
 
 
 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On behalf of Yonik
 Seeley
 Sent: Sunday, January 13, 2013 20:24
 To: solr-user@lucene.apache.org
 Subject: Re: SolrJ | Atomic Updates | How works exactly?
 
 On Sun, Jan 13, 2013 at 1:51 PM, Uwe Clement uwe.clem...@exxcellent.de
 wrote:
 What is the best, most performant way to update a large document?
 
 That *is* the best way to update a large document that we currently have.
 Although it re-indexes under the covers, it ensures that it's atomic, and
 it's faster because it does everything in a single request.
 
 -Yonik
 http://lucidworks.com



RSS tutorial that comes with the apache-solr not indexing

2013-01-13 Thread bibhor
Hi
I am trying to use the RSS tutorial that comes with apache-solr.
I am not sure if I missed anything, but when I do a full-import no indexing
happens.
These are the steps that I am taking:

1) Download apache-solr-3.6.2 (http://lucene.apache.org/solr/)
2) Start solr by doing: java -Dsolr.solr.home=./example-DIH/solr/ -jar
start.jar
3) Go to the URL:
http://192.168.1.12:8983/solr/rss/dataimport?command=full-import
4) When I do this it says: Indexing completed. Added/Updated: 0 documents.
Deleted 0 documents.

Now I know that the default example is getting the RSS from:
http://rss.slashdot.org/Slashdot/slashdot
This default feed appears empty when I view it in Chrome. It does have XML
data in the source, but I am not sure if this has anything to do with the
import failure.

I also modified the rss-config so that I can test other RSS sources. I used
http://www.feedforall.com/sample.xml and updated rss-config.xml, but this
did the same and did not add/update any documents.
Any help is appreciated.


