SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Hello everyone, I've tested atomic updates via Ajax calls and now I'm starting with atomic updates via SolrJ... but the way I'm proceeding doesn't seem to work well. Here is the snippet: *SolrInputDocument doc = new SolrInputDocument();* *doc.addField("id", myId);* * * *Map<String, List<String>>

Re: Solr 4.0 Spatial Search schema.xml and data-config.xml

2012-11-15 Thread jmlucjav
If you are using DIH, it is just a matter of doing something like this (from a MySQL project I have around, for example): CONCAT(lat, ',', lon) AS latlon -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-schema-xml-and-data-config-xml-tp4020376p4020437.html Sent

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Thread update: When I use a simple: *Map operation = new HashMap();* instead of: *Map<String, List<String>> operation = new HashMap<String, List<String>>();* The result looks better, but it's still wrong: fieldName: [ [Value1, Value2] ], However, List<String> value is received as a simple String

RE: Solr 4.0 indexing performance

2012-11-15 Thread Markus Jelsma
Hi - you're likely seeing a drop in performance because of durability which is enabled by default via a transaction log. When disabled 4.0 is iirc slightly faster than 3.x. -Original message- From:Nils Weinander nils.weinan...@gmail.com Sent: Thu 15-Nov-2012 10:35 To:

Re: Solr 4.0 indexing performance

2012-11-15 Thread Nils Weinander
Ah, thanks Markus! That's a good thing. I tried disabling the transaction log; the difference in performance is marginal. So, I'll stick with the transaction logging. On Thu, Nov 15, 2012 at 11:02 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi - you're likely seeing a drop in performance

Re: SolrJ: atomic updates.

2012-11-15 Thread Sami Siren
On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda luisca...@gmail.com wrote: Thread update: When I use a simple: *Map operation = new HashMap();* Instead of: *Map<String, List<String>> operation = new HashMap<String, List<String>>();* The result looks better, but it's still wrong:

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Hello, Sami. This will be the first issue that I open, so should I create it under the Solr 4.0 version or the Solr 4.1.0 one? Thanks, - Luis Cappa. 2012/11/15 Sami Siren ssi...@gmail.com On Thu, Nov 15, 2012 at 11:51 AM, Luis Cappa Banda luisca...@gmail.com wrote: Thread update: When I use

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Ok, done: https://issues.apache.org/jira/browse/SOLR-4080 Regards, - Luis Cappa. 2012/11/15 Luis Cappa Banda luisca...@gmail.com Hello, Sami. It will be the first issue that I open so, should I create it under Solr 4.0 version or in Solr 4.1.0 one? Thanks, - Luis Cappa. 2012/11/15

Re: SolrJ: atomic updates.

2012-11-15 Thread Sami Siren
Actually it seems that the xml/binary request writers only behave differently when using array[] as the value. If I use ArrayList it also works with the xml format (4.1 branch). Still, it's annoying that the two request writers behave differently, so I guess it's worth adding the jira anyway. The

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
I'll have a look at the Solr source code and try to fix the bug. If I succeed I'll update the JIRA issue with it, :-) 2012/11/15 Sami Siren ssi...@gmail.com Actually it seems that xml/binary request writers only behave differently when using array[] as the value. if I use ArrayList it also works with

Re: Faceting Question

2012-11-15 Thread Alexey Serba
Seems like pivot faceting is what you are looking for ( http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting ) Note: it currently does not work in distributed mode - see https://issues.apache.org/jira/browse/SOLR-2894 On Thu, Nov 15, 2012 at 7:46 AM, Jamie Johnson

Re: Solr defining Schema structure trouble.

2012-11-15 Thread denl0
Yes, this is what I'm trying to do. But stuff related to the document like language/title/... (I've got way more fields) is stored many times. Each page has a part of its data that's the same; is it possible to separate that data?

Re: consistency in SolrCloud replication

2012-11-15 Thread Bill Au
Thanks for the info, Mark. By "a request won't return until it's affected all replicas", are you referring to the update request or the query? Bill On Wed, Nov 14, 2012 at 7:57 PM, Mark Miller markrmil...@gmail.com wrote: It's included as soon as it has been indexed - though a request won't

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Hi, Sami. Doing some tests I've used the same code as you and did a quick execution: *HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solrserver/core1");* * * * try {* * * * HashMap editTags = new HashMap();* * editTags.put("set",

Re: SolrJ: atomic updates.

2012-11-15 Thread Sami Siren
Try setting the request writer to binary like this: server.setParser(new BinaryResponseParser()); server.setRequestWriter(new BinaryRequestWriter()); Or, instead of a string array, use an ArrayList<String> that contains your strings as the value for the map. On Thu, Nov 15, 2012 at 3:58
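
The second suggestion above (a List instead of a string array) can be sketched without SolrJ on the classpath. This is only an illustration: the class `AtomicUpdateSketch` and the helper `setOperation` are hypothetical names, and the snippet builds nothing more than the map that would be handed to `SolrInputDocument.addField` as the field value. The comparison of text renderings is an assumption about why an array might trip a text-based request writer; the thread itself does not confirm the cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class AtomicUpdateSketch {

    // Hypothetical helper: builds the {"set": [...]} map that would be passed
    // as the field value of an atomic update.
    public static Map<String, Object> setOperation(String... values) {
        Map<String, Object> operation = new HashMap<String, Object>();
        // Copy the values into an ArrayList instead of storing the String[] itself.
        operation.put("set", new ArrayList<String>(Arrays.asList(values)));
        return operation;
    }

    public static void main(String[] args) {
        String[] raw = {"Value1", "Value2"};

        // A List renders its elements when converted to text...
        System.out.println(setOperation(raw));

        // ...while a bare array renders as a type-and-hash identity string.
        System.out.println(String.valueOf((Object) raw));
    }
}
```

With SolrJ available, the returned map would be used as the value in something like `doc.addField("tags", setOperation(...))`, where the field name `tags` is again only a placeholder.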

Re: SolrJ: atomic updates.

2012-11-15 Thread Luis Cappa Banda
Uhm, after setting both the response parser and the request writer it worked OK with *HttpSolrServer*. I've tried to find a way to set BinaryResponseParser and BinaryRequestWriter with *CloudServer* (or even via *LbHttpSolrServer*) but I found nothing. Suggestions? :-/ - Luis Cappa. 2012/11/15 Sami Siren

RE: Error loading class solr.CJKBigramFilterFactory

2012-11-15 Thread Frederico Azeiteiro
:) Just installed 3.6.1 and it's working just fine. Something must be wrong with my tomcat/solr install. Thank you Robert. //Frederico -Original message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, 14 November 2012 19:18 To:

Re: Solr 4.0 Spatial Search schema.xml and data-config.xml

2012-11-15 Thread David Smiley (@MITRE.org)
The particular JavaScript I referred to is this:

    function processAdd(cmd) {
      doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument
      lat = doc.getFieldValue("LATITUDE");
      lon = doc.getFieldValue("LONGITUDE");
      if (lat != null && lon != null)
        doc.setField("latLon", lat + "," + lon);
    }
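
For readers working on the Java side, the null-checked concatenation done by the JavaScript function in this message can be sketched as plain Java. This is an illustration only: `LatLonSketch` and `combineLatLon` are hypothetical names, not part of Solr, and the sketch leaves out the `SolrInputDocument` handling entirely.

```java
public class LatLonSketch {

    // Hypothetical helper: returns "lat,lon", or null when either part is missing,
    // mirroring the null check in the script.
    public static String combineLatLon(Object lat, Object lon) {
        if (lat == null || lon == null) {
            return null;
        }
        return lat + "," + lon;
    }

    public static void main(String[] args) {
        System.out.println(combineLatLon(45.5, -93.2));
        System.out.println(combineLatLon(null, -93.2));
    }
}
```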

Re: consistency in SolrCloud replication

2012-11-15 Thread Mark Miller
I'm talking about an update request. So if you make an update, when it returns, your next search will see the update, because it will be on all replicas. Another process that is searching rapidly may see an eventually consistent view though (very briefly). We have some ideas to make that view more

RE: DIH nested entities don't work

2012-11-15 Thread mroosendaal
Hi James, Just gave it a go and it worked! That's the good news. The problem now is getting it to work faster. It took over 2 hours just to index 4 views and I need to get information from 26. I tried adding the defaultRowPrefetch=2 as a jdbc parameter but it does not seem to honour that. It

CloudSolrServer and LBHttpSolrServer: setting BinaryResponseParser and BinaryRequestWriter.

2012-11-15 Thread Luis Cappa Banda
Hello, I've found what seems to be a bug (JIRA SOLR-4080: https://issues.apache.org/jira/browse/SOLR-4080?focusedCommentId=13498055&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13498055 ) with CloudSolrServer during atomic updates via SolrJ. Thanks to Sami I

Re: Solr Indexing MAX FILE LIMIT

2012-11-15 Thread Alexandre Rafalovitch
Maybe you can start by testing this with split -l and xargs :-) These are standard Unix toolkit approaches and since you use one of them (curl) you may be happy to use others too. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn:

DataImportHandler in Solr 1.4 bug?

2012-11-15 Thread Sébastien Lorber
Hello, I don't know if this is a bug or a missing feature, nor if it was corrected in new versions of Solr (can't find any JIRA about it), so I just want to show you the problem... I can't test with Solr 4.0, I have a legacy system, not a lot of time, not a Solr expert at all and it seems just

Re: DataImportHandler in Solr 1.4 bug?

2012-11-15 Thread Andy Lester
On Nov 15, 2012, at 8:02 AM, Sébastien Lorber lorber.sebast...@gmail.com wrote: <entity name="PARAM" query="SELECT key_name AS KEY, string_val AS VALUE FROM BATCH_JOB_PARAMS WHERE JOB_INSTANCE_ID = ${JOB_EXEC.JOB_INSTANCE_ID}"> <field column="VALUE" name="JOB_PARAM_${PARAM.KEY}" />

Re: consistency in SolrCloud replication

2012-11-15 Thread David Smiley (@MITRE.org)
Mark Miller-3 wrote I'm talking about an update request. So if you make an update, when it returns, your next search will see the update, because it will be on all replicas. I presume this is only the case if (of course) the client also sent a commit. So you're saying the commit call will not

High Slave CPU Intermittently After Replication

2012-11-15 Thread richardg
Here is our setup: Solr 4.0 Master replicates to three slaves after optimize. We have a problem where every so often after replication the CPU load on the slave servers maxes out and requests slow to a crawl. We do a dataimport every 10 minutes and depending on the number of updates since the

Re: Unable to run two multicore Solr instances under Tomcat

2012-11-15 Thread Erick Erickson
Thanks for wrapping this up, it's always nice to get closure, especially when it comes to googling G.. On Wed, Nov 14, 2012 at 5:34 AM, Adam Neal an...@mass.co.uk wrote: Just to wrap up this one. Previously all the lib jars were located in the war file on our setup, this was mainly to ease

Re: Solr defining Schema structure trouble.

2012-11-15 Thread Jack Krupansky
Ah... sure, you can create a schema that has several different document types in it, with extra fields that are used in some but not all documents - books have the metadata fields but no page bodies while pages have page bodies but no metadata. And maybe even do a Solr join for the block of

Re: best practicies dealing with solr collections and instances

2012-11-15 Thread Erick Erickson
Well, what does maintenance entail? Changing schema? Rebuilding the index? Many operations under the maintenance rubric can be done with core admin handler requests, see: http://wiki.apache.org/solr/CoreAdmin But if that doesn't solve your problem, then probably running in two separate JVMs is

Re: Run multiple instances of solr using single data directory

2012-11-15 Thread Erick Erickson
I think this is rather dangerous. How would these multiple slaves coordinate replication? Would they all replicate at once? If only one was configured to replicate, how would the others know to reopen searchers? Furthermore, simply opening up more Solr instances on the same machine isn't expanding

Re: Nested Join Queries

2012-11-15 Thread Erick Erickson
Gerald: Here's the place to start: http://wiki.apache.org/solr/HowToContribute But the basic setup is 1) create a JIRA login (anyone can) 2) create a JIRA if one doesn't exist 3) generate the patch. From your root level (the one that contains solr and lucene dirs) and svn diff > SOLR-###.patch wher

Re: SolrCloud: Shard resize

2012-11-15 Thread Erick Erickson
Currently you have to re-index all of your data. If you don't you'll have a situation in which the same document (by uniqueKey) exists in two shards and that document may show up twice in your results list. NOTE: by reindex all your data, you need to _delete_ all your data first. If you just add
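
Why a changed shard count can leave the same uniqueKey in two shards can be illustrated with a toy router. The sketch below assumes the simplest possible rule, hash of the id modulo the number of shards; this is an assumption made purely for illustration and is not how Solr actually routes documents. `ShardMoveSketch`, `shardFor`, and `countMoved` are hypothetical names.

```java
public class ShardMoveSketch {

    // Assumed routing rule, for illustration only: hash of the uniqueKey
    // modulo the shard count.
    public static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    // Counts the ids "doc0".."doc{numDocs-1}" that land on a different shard
    // once the shard count changes from oldShards to newShards.
    public static int countMoved(int numDocs, int oldShards, int newShards) {
        int moved = 0;
        for (int i = 0; i < numDocs; i++) {
            String id = "doc" + i;
            if (shardFor(id, oldShards) != shardFor(id, newShards)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Every id counted here would be re-added to a new shard while its old
        // copy stays behind, unless the old data is deleted first.
        System.out.println(countMoved(1000, 2, 3) + " of 1000 ids change shard");
    }
}
```

Under this toy rule, any id counted by `countMoved` would exist twice after a plain re-add, which is the duplicate-result situation described above and the reason for deleting everything before reindexing.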

how make a suggester?

2012-11-15 Thread iwo
Hello, I would like to implement a suggester with Solr. Which is the best way now, in your opinion? Thanks in advance, I. - Complicare è facile, semplificare é difficile. Complicated is easy, simple is hard. quote: http://it.wikipedia.org/wiki/Bruno_Munari

Re: consistency in SolrCloud replication

2012-11-15 Thread Mark Miller
It depends - no commit necessary for realtime get. Otherwise, yes, you would need to do at least a soft commit. That works the same way though - so if you make your update, then do a soft commit, you can be sure your next search will see the update on all the replicas. And with realtime get, of

Re: CloudSolrServer and LBHttpSolrServer: setting BinaryResponseParser and BinaryRequestWriter.

2012-11-15 Thread Sami Siren
Hi, did you try setting your values in a List, for example an ArrayList? It should work when you use that, even without specifying a request/response writer. -- Sami Siren On Thu, Nov 15, 2012 at 4:56 PM, Luis Cappa Banda luisca...@gmail.com wrote: Hello, I've found what It seems to be a bug

Re: CloudSolrServer and LBHttpSolrServer: setting BinaryResponseParser and BinaryRequestWriter.

2012-11-15 Thread Luis Cappa Banda
Yes, my first attempt was with a List<String>, but it didn't work. Then I started to try other ways, such as a String[] array, with no success. Regards, - Luis Cappa. 2012/11/15 Sami Siren ssi...@gmail.com hi, did you try setting your values in a List, for example ArrayList it should work

Re: PointType multivalued query

2012-11-15 Thread David Smiley (@MITRE.org)
Oh I'm sorry, I should have read your question more clearly. I totally forgot that solr.PointType supports a configurable number of dimensions. If you need more than 2 dimensions as your example shows you do, then you'll have to resort to indexing your spatial data in another Solr core as

Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Dominique Debailleux
Hi yun, I'm not sure I understand your need... There is no relationship between a query string and DIH. What you want to achieve (if fetch 1 rows means select 1 rows from a table) can be done by limiting the number of rows your SQL select will return (the syntax differs from DBMS to DBMS).

Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Dominique Debailleux
Wasn't obvious ;). Maybe you could try local params...something like q={!q.op=OR%20rows=3}yourQueryHere Hope this helps Dom 2012/11/15 jefferyyuan yuanyun...@gmail.com Thanks for the reply. I am using SolrEntityProcessor to import data from another remote solr server - not database, so

Re: BM25 model for solr 4?

2012-11-15 Thread Tom Burton-West
Hello Floyd, There is a ton of research literature out there comparing BM25 to vector space. But you have to be careful interpreting it. BM25 originally beat the SMART vector space model in the early TRECs because it did better tf and length normalization. Pivoted Document Length

RE: DIH nested entities don't work

2012-11-15 Thread Dyer, James
Depending on how much data you're pulling back, 2 hours might be a reasonable amount of time. Of course if you had it a lot faster with Endeca Forge, I can understand your questioning this. Keep in mind that the way you're setting up, it will build each cache, 1 at a time. I'm pretty sure

Re: PointType multivalued query

2012-11-15 Thread blopez
Hi David, thanks for your reply. I've tested this datatype and the values are indexed fine (I'm using 6-dimension points). I'm trying to retrieve results and it works only with the first 2 dimensions (X and Y), but it's not taking into account the other 4 dimensions. I've been reading the

Re: Admin Permissions

2012-11-15 Thread Michael Long
I figured out you can disable the core admin in solr.xml, but then it breaks the admin as apparently it relies on that. I tried tomcat security but haven't been able to make it work. I think at this point I may just write a query/debugging app that the developers could use. On 11/13/2012

Re: PointType multivalued query

2012-11-15 Thread David Smiley (@MITRE.org)
Borja, Umm, I'm quite confused with the use-case you present. ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tp4020445p4020609.html Sent from the Solr -

cores shards and disks in SolrCloud

2012-11-15 Thread Buttler, David
Hi, I have a question about the optimal way to distribute solr indexes across a cloud. I have a small number of collections (less than 10). And a small cluster (6 nodes), but each node has several disks - 5 of which I am using for my solr indexes. The cluster is also a hadoop cluster, so the

Re: PointType multivalued query

2012-11-15 Thread blopez
Hi, I think it's not a good idea to make Join operations between Solr cores because of the performance (we manage a lot of data). The point is that we want to store documents, each one with several information sets (let's name them Points), each one identified by 6 values (that's why I was

Re: PointType multivalued query

2012-11-15 Thread blopez
Sorry, I tried to explain it too fast. Imagine the use case that I wrote in the first post. A document can have more than one 6-dimension point. So my first approach was: <doc> <field name="pk">1</field> <field name="docId">10</field> <field name="point">2,2,2,2,2,2</field> </doc> <doc> <field name="pk">2</field>

Re: PointType multivalued query

2012-11-15 Thread David Smiley (@MITRE.org)
Sorry, you're out of luck. SRPT could be generalized but that's a bit of work. The trickiest part I think would be writing a multi-dimensional SpatialPrefixTree impl. If the # of discrete values at each dimension is pretty small (100? ish?), then there is a way using term positions and span

Re: cores shards and disks in SolrCloud

2012-11-15 Thread Upayavira
Personally I see no benefit to have more than one JVM per node, cores can handle it. I would say that splitting a 20m index into 25 shards strikes me as serious overkill, unless you expect to expand significantly. 20m would likely be okay with two or three shards. You can store the indexes for

Re: High Slave CPU Intermittently After Replication

2012-11-15 Thread Upayavira
One question is, why optimise? The newer TieredMergePolicy, as I understand it, takes away much of the need for optimising an index. As to maxing, after a replication, your caches need warming. Watch how often you replicate, and check on the admin UI how long it takes to warm caches. You may be

RE: cores shards and disks in SolrCloud

2012-11-15 Thread Buttler, David
The main reason to split a collection into 25 shards is to reduce the impact of the loss of a disk. I was running an older version of solr, a disk went down, and my entire collection was offline. Solr 4 offers shards.tolerant to reduce the impact of the loss of a disk: fewer documents will be

Re: zkcli issues

2012-11-15 Thread Nick Chase
Unfortunately, this doesn't seem to solve the issue; now I'm beginning to wonder if maybe it's because I'm on Windows. Has anyone successfully run ZkCLI on Windows? Nick On 11/12/2012 2:27 AM, Jeevanandam Madanagopal wrote: Nick - Sorry, embedded links are not shown in previous email.

Re: consistency in SolrCloud replication

2012-11-15 Thread Otis Gospodnetic
I think Bill was asking about search. I think the question is whether the query hitting the shard where a doc was sent for indexing would see that doc even before that doc has been copied to replicas. I didn't test it, but I'd think the answer would be positive because of the transaction log. Otis --

Re: Solr 4.0 indexing performance

2012-11-15 Thread Otis Gospodnetic
But slower indexing with solr 4.0 sounds suspicious to me... you compared your configs? JVM parameters? GC? IO? CPU? Otis -- Performance Monitoring - http://sematext.com/spm On Nov 15, 2012 5:26 AM, Nils Weinander nils.weinan...@gmail.com wrote: Ah, thanks Markus! That's a good thing. I

Re: Solr 4.0 indexing performance

2012-11-15 Thread Jack Krupansky
Did you start from scratch, or did you bulk index into an existing index? There is some backcompat logic in there, which is convenient, but not necessarily the best performance. -- Jack Krupansky -Original Message- From: Nils Weinander Sent: Thursday, November 15, 2012 1:29 AM To:

Re: cores shards and disks in SolrCloud

2012-11-15 Thread Otis Gospodnetic
Hi, I think here you want to use a single JVM per server - no need for multiple JVMs, JVM per Collection and such. If you can spread data over more than 1 disk on each of your servers, great, that will help. Re data loss - yes, you really should just be using replication. Sharding a ton will

Re: Patch Needed for Issue Solr-3790

2012-11-15 Thread mechravi25
Hi Koji, Thank you for your reply..will test for the same. -- View this message in context: http://lucene.472066.n3.nabble.com/Patch-Needed-for-Issue-Solr-3790-tp4019256p4020651.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: how make a suggester?

2012-11-15 Thread Otis Gospodnetic
Hi Iwo, This is kind of a common question. Have a look at http://search-lucene.com/?q=autocomplete+OR+suggester&fc_project=Solr&fc_type=mail+_hash_+user for lots of discussions on this topic. In short, you could use the Suggester that comes with Solr or you could do

Re: BM25 model for solr 4?

2012-11-15 Thread Floyd Wu
Thanks everyone, especially Tom, who gave me a detailed explanation of this topic. Of course in academia we do need to interpret results carefully; what I care about is, from the end user's point of view, will using BM25 result in better ranking than Lucene's original VSM+Boolean model?

Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Dominique Debailleux
The first query is OK; it just doesn't fit your need, if I understand correctly. Could you confirm that the expected result is 6 rows (3 rows w/ppt plus 3 rows/pdf)? 2012/11/15 jefferyyuan yuanyun...@gmail.com Thanks :) local param is very useful, but seems it doesn't work here: I tried:

Re: Custom Solr indexer/searcher

2012-11-15 Thread John Whelan
Scott, I probably have no idea as to what I'm saying, but if you're looking for finding results in a N-dimensional space, you might look at creating a field of type 'point'. Point-type fields have a dimension attribute; I believe that it can be set to a large integer value. Barring that, there

Re: Is there a way to limit returned rows directly in a query string?

2012-11-15 Thread Mikhail Khludnev
Yun, Literally you can call another QParser from the middle of a query and apply local params to it via the nested queries feature http://searchhub.org/2009/03/31/nested-queries-in-solr/ The syntax is a little bit tricky though. But calling another QParser and attempting to specify the number of rows for it makes

Re: Custom Solr indexer/searcher

2012-11-15 Thread Mikhail Khludnev
Scott, It sounds like you need to look into a few samples of similar things in Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms similar to the given one in an FST for query expansion. Generic query expansion is done via MultiTermQuery. Index-time term expansion is shown in TrieField and