Re: field title_ngram was indexed without position data; cannot run PhraseQuery

2013-10-15 Thread Jason Hellman
If you consider what n-grams do this should make sense to you. Consider the following piece of data: White iPod If the field is fed through a bigram filter (n-gram with size of 2) the resulting token stream would appear as such: wh hi it te ip po od The usual use of n-grams is to match
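As a rough illustration of the token stream described above, the following Python sketch mimics a bigram filter applied after lowercasing and whitespace tokenization. The function name `bigrams` is hypothetical, and this is only a model of the output for a gram size of 2, not Solr's own filter implementation.

```python
def bigrams(text, size=2):
    """Lowercase, split on whitespace, and emit character n-grams per token."""
    grams = []
    for token in text.lower().split():
        # Slide a window of `size` characters across each token.
        for i in range(len(token) - size + 1):
            grams.append(token[i:i + size])
    return grams


print(" ".join(bigrams("White iPod")))
```

Because the grams are emitted per token, no gram spans the space between the two words.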

Re: Concurent indexing

2013-10-14 Thread Jason Hellman
The limit on how many threads you can use to load data is primarily driven by factors on your hardware: CPU, heap usage, I/O, and the like. It is common for most index load processes to be able to handle more incoming data on the Solr side of the equation than can typically be loaded

Re: Update existing documents when using ExtractingRequestHandler?

2013-10-10 Thread Jason Hellman
As an endorsement of Erick's like, the primary benefit I see to processing through your own code is better error-, exception-, and logging-handling which is trivial for you to write. Consider that your code could reside on any server, either receiving through a PUSH or PULLing the data from

Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-10 Thread Jason Hellman
The best use case I see for atomic updates typically involves avoiding transmission of large documents for small field updates. If you are updating a readCount field of a PDF document that is 1MB in size you will avoid resending the 1MB PDF document's data in order to increment the readCount
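A minimal sketch of what such a small update body might look like, assuming the JSON atomic-update syntax with the `inc` modifier and a unique key field named `id`. The document id `pdf-123` and the helper name `increment_payload` are hypothetical.

```python
import json


def increment_payload(doc_id, field, amount=1):
    """Build a JSON atomic-update body that increments one numeric field."""
    # Only the key and the modified field are sent, not the stored PDF content.
    return json.dumps([{"id": doc_id, field: {"inc": amount}}])


print(increment_payload("pdf-123", "readCount"))
```

The body is a few dozen bytes regardless of how large the stored document is.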

Re: Solr auto suggestion not working

2013-10-10 Thread Jason Hellman
Very specifically, what is the field definition that is being used for the suggestions? On Oct 10, 2013, at 5:49 AM, Furkan KAMACI furkankam...@gmail.com wrote: What is your configuration for auto suggestion? 2013/10/10 ar...@skillnetinc.com ar...@skillnetinc.com Hi, We are

Re: How to achieve distributed spelling check in SolrCloud ?

2013-10-08 Thread Jason Hellman
The shards.qt parameter is the easiest one to forget, with the most dramatic of consequences! On Oct 8, 2013, at 11:10 AM, shamik sham...@gmail.com wrote: James, Thanks for your reply. The shards.qt did the trick. I read the documentation earlier but was not clear on the implementation,

Re: Delete a field - Atomic updates (SOLR 4.1.0) without using null=true

2013-10-07 Thread Jason Hellman
I don't know if there's a way to accomplish your goal directly, but as a pure workaround, you can write a routine to fetch all the stored values and resubmit the document without the field in question. This is what atomic updates do, minus the overhead of the transmission. On Oct 7, 2013, at

Re: Adding OR operator in querystring and grouping fields?

2013-10-07 Thread Jason Hellman
fq=here:there OR this:that For the lurker: an AND should be: fq=here:there&fq=this:that While you can, technically, pass: fq=here:there AND this:that Solr will cache the separate fq= parameters and reuse them in any context. The AND(ed) filter will be cached as a single
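The separate-parameter form can be sketched in Python as follows. This is only an illustration of building the query string with one fq per clause; the helper name `build_query` and the `*:*` main query are assumptions.

```python
from urllib.parse import urlencode


def build_query(q, filters):
    """Return a query string with one fq parameter per filter clause."""
    params = [("q", q)]
    # Each clause gets its own fq so it can be cached and reused on its own.
    params.extend(("fq", clause) for clause in filters)
    return urlencode(params)


print(build_query("*:*", ["here:there", "this:that"]))
```

Passing a list of tuples rather than a dict is what allows the fq key to repeat.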

Re: Some text not indexed in solr4.4

2013-09-17 Thread Jason Hellman
Utkarsh, Check to see if the value is actually indexed into the field by using the Terms request handler: http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d (adjust the prefix to whatever you're looking for) This should get you going in the right direction. Jason On Sep 17, 2013
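For anyone scripting this check, here is a small sketch that assembles the same request URL. The host, port, and `/terms` path are taken from the message; the function name `terms_url` is hypothetical, and the script only builds the URL, it does not contact a server.

```python
from urllib.parse import urlencode


def terms_url(field, prefix, base="http://localhost:8983/solr/terms"):
    """Build a Terms request URL listing indexed terms that start with `prefix`."""
    return base + "?" + urlencode({"terms.fl": field, "terms.prefix": prefix})


print(terms_url("text", "d"))
```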

Re: JSON update request handler commitWithin

2013-09-05 Thread Jason Hellman
They have modified the mechanisms for committing documents…Solr in DSE is not stock Solr...so you are likely encountering a boundary where stock Solr behavior is not fully supported. I would definitely reach out to them to find out if they support the request. On Sep 5, 2013, at 8:27 AM, Ryan,

Re: data/index naming format

2013-09-05 Thread Jason Hellman
The circumstance I've most typically seen the index.timestamp show up is when an update is sent to a slave server. The replication then appears to preserve the updated slave index in a separate folder while still respecting the correct data from the master. On Sep 5, 2013, at 8:03 PM, Shawn

Re: SolrCloud Set up

2013-08-30 Thread Jason Hellman
One additional thought here: from a paranoid risk-management perspective it's not a good idea to have two critical services dependent upon a single point of failure if the hardware fails. Obviously risk-management is suited to taste, so you may feel the cost/benefit does not merit the

Re: Indexing hangs when more than 1 server in a cluster

2013-08-14 Thread Jason Hellman
(with openSearcher=true) will work just fine (YMMV). Jason On Aug 14, 2013, at 4:51 AM, Erick Erickson erickerick...@gmail.com wrote: right, SOLR-5081 is possible but somewhat unlikely given the fact that you actually don't have very many nodes in your cluster. soft commits aren't relevant

Re: Facet field display name

2013-08-13 Thread Jason Hellman
It's been my experience that using the convenient feature to change the output key still doesn't save you from having to map it back to the field name underlying it in order to trigger the filter query. With that in mind it just makes more sense to me to leave the effort in the View portion

Re: Indexing hangs when more than 1 server in a cluster

2013-08-13 Thread Jason Hellman
While I don't have a past history of this issue to use as reference, if I were in your shoes I would consider trying your updates with softCommit disabled. My suspicion is you're experiencing some issue with the transaction logging and how it's managed when your hard commit occurs. If you can

Re: Spelling suggestions.

2013-08-09 Thread Jason Hellman
The majority of the behavior outlined in that wiki page should work quite sufficiently for 3.5.0. Note that there are only a few items that are marked Solr4.0 only (DirectSolrSpellChecker and WordBreakSolrSpellChecker, for example). On Aug 9, 2013, at 6:26 AM, Kamaljeet Kaur

Re: Phrase query with prefix query

2013-08-02 Thread Jason Hellman
Or shingles, presuming you want to tokenize and output unigrams. On Aug 2, 2013, at 11:33 AM, Walter Underwood wun...@wunderwood.org wrote: Search against a field using edge N-grams. --wunder On Aug 2, 2013, at 11:16 AM, T. Kuro Kurosaka wrote: Is there a query parser that supports a

Re: restricting a query by a set of field values

2013-07-29 Thread Jason Hellman
the result set you desire. Please beware that a very large boolean set (your IN(…) parameter) may be expensive to run. Jason On Jul 29, 2013, at 7:33 AM, Benjamin Ryan benjamin.r...@manchester.ac.uk wrote: Hi, Is it possible to construct a query in SOLR to perform a query

Re: Solr 4.3.1 - query does not return documents, just numFounds, 2 shards, replication Factor 1

2013-07-29 Thread Jason Hellman
Nitin, You need to ensure the fields you wish to see are marked stored=true in your schema.xml file, and you should include fields in your fl= parameter (fl=*,score is a good place to start). Jason On Jul 29, 2013, at 8:08 AM, Nitin Agarwal 2nitinagar...@gmail.com wrote: Hi, I am using Solr

Re: solr - set fileds as default search field

2013-07-29 Thread Jason Hellman
Or use the copyField technique to a single searchable field and set df= to that field. The example schema does this with the field called text. On Jul 29, 2013, at 8:35 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, df is a single valued parameter. Only one field can be a default field.

Re: solr 4.3, autocommit, maxdocs

2013-07-15 Thread Jason Hellman
can either change the value to true, or alternatively call a deterministic commit call at the end of your load (a solr/update?commit=true will default to openSearcher=true). Hope that's of use! Jason On Jul 15, 2013, at 9:52 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I have a solr 4.3

Re: Commit different database rows to solr with same id value?

2013-07-11 Thread Jason Huang
cool. so far I've been using the default collection 1 only. thanks, Jason On Thu, Jul 11, 2013 at 7:57 AM, Erick Erickson erickerick...@gmail.com wrote: Just use the address in the url. You don't have to use the core name if the defaults are set, which is usually collection1. So it's

Commit different database rows to solr with same id value?

2013-07-10 Thread Jason Huang
primary key still exist? We don't want to have to always change the primary key format to ensure a uniqueness of the primary key among all different types of database tables. thanks! Jason

Re: Commit different database rows to solr with same id value?

2013-07-10 Thread Jason Huang
want to commit the data from table2 to a new core? Anyone knows how I can do that? thanks, Jason On Wed, Jul 10, 2013 at 11:18 AM, David Quarterman da...@corexe.com wrote: Hi Jason, Assuming you're using DIH, why not build a new, unique id within the query to use as the 'doc_id' for SOLR

Re: Using the Schema API from SolrJ

2013-07-06 Thread Jason Hellman
you can also call the file admin request handler: http://localhost:8983/solr/admin/file?file=schema.xml …and parse the whole stinking thing :) Jason On Jul 6, 2013, at 1:59 PM, Steven Glass steven.gl...@zekira.com wrote: Does anyone have any idea how I can access the schema version info

Re: Surprising score?

2013-07-05 Thread Jason Hellman
/solr/4_3_1/solr-core/org/apache/solr/search/similarities/SweetSpotSimilarityFactory.html Jason On Jul 5, 2013, at 5:59 AM, pravesh suyalprav...@yahoo.com wrote: Is there a way to omitNorms and still be able to use {!boost b=boost} ? OR you could let /omitNorms=false/ as usual and have your

Re: 2.1billion+ document

2013-07-05 Thread Jason Hellman
to isolate domain data to single shards so as to allow isolated queries against dedicated data models in single shards. But if you just want the basics, it really is as easy as described above. Jason On Jul 5, 2013, at 7:36 PM, Ali, Saqib docbook@gmail.com wrote: Hello Otis, I was thinking

Re: how to replicate Solr Cloud

2013-06-25 Thread Jason Hellman
that approach? Jason On Jun 25, 2013, at 10:07 AM, Kevin Osborn kevin.osb...@cbsi.com wrote: We are going to have two datacenters, each with their own SolrCloud and ZooKeeper quorums. The end result will be that they should be replicas of each other. One method that has been mentioned is that we

Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-24 Thread Jason Hellman
Vinay, What autoCommit settings do you have for your indexing process? Jason On Jun 24, 2013, at 1:28 PM, Vinay Pothnis poth...@gmail.com wrote: Here is the ulimit -a output: core file size (blocks, -c) 0 data seg size(kbytes, -d) unlimited scheduling priority

Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-24 Thread Jason Hellman
the commit occurs at either breakpoint. 30 seconds is plenty of time for 5 parallel processes of 20 document submissions to push you over the edge. Jason On Jun 24, 2013, at 2:21 PM, Vinay Pothnis poth...@gmail.com wrote: I have 'softAutoCommit' at 1 second and 'hardAutoCommit' at 30 seconds

Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-24 Thread Jason Hellman
but with continued dialog through channels like these there are fewer territories without good cartography :) Hope that's of use! Jason On Jun 24, 2013, at 7:12 PM, Scott Lundgren scott.lundg...@carbonblack.com wrote: Jason, Regarding your statement push you over the edge- what does

Re: Restarting SOLR will remove all cache?

2013-06-24 Thread Jason Hellman
Shalin, There's one point to test without caches, which is to establish how much value a cache actually provides. For me, this primarily means providing a benchmark by which to decide when to stop obsessing over caches. But yes, for load testing I definitely agree :) Jason On Jun 21, 2013

Re: in Solr 3.5, optimization increase the index size to double

2013-06-16 Thread Jason Hellman
by directory size (and not explicitly by the viewable files) you may very well be seeing this. Jason On Jun 16, 2013, at 4:53 AM, Erick Erickson erickerick...@gmail.com wrote: Optimizing will _temporarily_ double the index size, but it shouldn't be permanent. Is it possible that you have

Re: Filtering down terms in suggest

2013-06-12 Thread Jason Hellman
with wildcard searches, or better yet NGram (EdgeNGram) behavior to get the right suggestion data back. I would suggest an additional core to accomplish this (fed via replication) to avoid cache entry collision with your normal queries. Hope that's useful to you. Jason On Jun 12, 2013, at 7:43 AM

Re: Filtering down terms in suggest

2013-06-11 Thread Jason Hellman
(again, easily configured via wildcard patterns) and then send the suggestion query to the right field. Obviously this will get out of hand if you have too many of these...so this has limits. Jason On Jun 11, 2013, at 8:29 AM, Aloke Ghoshal alghos...@gmail.com wrote: Hi, Trying to find a way

Re: Two instances of solr - the same datadir?

2013-06-04 Thread Jason Hellman
Roman, Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote: OK

Re: Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Jason Hellman
you'd need a similar construct for each. I cannot attest to performance at scale with such a construct…but just showing a way you can go about this if you feel compelled enough to do so. Jason On Jun 3, 2013, at 8:08 AM, Jack Krupansky j...@basetechnology.com wrote: No, but you can

Re: Getting tons of EofException with jetty/SolrCloud

2013-05-31 Thread Jason Hellman
Those are default, though autoSoftCommit is commented out by default. Keep in mind about the hard commit running every 15 seconds: it is not updating your searchable data (due to the openSearcher=false setting). In theory, your data should be searchable due to autoSoftCommit running every 1

Re: 2 VM setup for SOLRCLOUD?

2013-05-30 Thread Jason Hellman
-robin distribute requests to other shards once a query begins execution. But you do need an entry point externally to be defined through your load balancer. Hope this is useful! Jason On May 30, 2013, at 12:48 PM, James Dulin jdu...@crelate.com wrote: Working to setup SolrCloud in Windows Azure

Re: Nested Facets and distributed shard system.

2013-05-28 Thread Jason Hellman
You have mentioned Pivot Facets, but have you looked at the Path Hierarchy Tokenizer Factory: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory This matches your use case, as best as I understand it. Jason On May 28, 2013, at 12:47 PM, vibhoreng04

Re: split document or not

2013-05-28 Thread Jason Hellman
absolutely isolated results for paragraphs, and give you a great deal of flexibility on how to query the results in cases where you do or do not need them grouped. Jason On May 28, 2013, at 3:10 PM, Hard_Club meddn...@gmail.com wrote: Thanks, Alexandre. But I need to know in which paragraph

Re: filter query by string length or word count?

2013-05-22 Thread Jason Hellman
Sam, I would highly suggest counting the words in your external pipeline and sending that value in as a specific field. It can then be queried quite simply with a: wordcount:{80 TO *] (Note the { next to 80, excluding the value of 80) Jason On May 22, 2013, at 11:37 AM, Sam Lee skyn
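A minimal sketch of the pipeline step suggested above, assuming documents are plain dictionaries before they are sent for indexing. The `wordcount` field name comes from the message; the `body` source field and the helper name `with_wordcount` are hypothetical, and splitting on whitespace is only one possible definition of a word.

```python
def with_wordcount(doc, text_field="body"):
    """Return a copy of `doc` with a precomputed whitespace-delimited word count."""
    enriched = dict(doc)
    enriched["wordcount"] = len(doc.get(text_field, "").split())
    return enriched


doc = with_wordcount({"id": "1", "body": "counted once at index time"})
print(doc["wordcount"])

# Range filter from the message: more than 80 words, excluding exactly 80.
WORDCOUNT_FILTER = "wordcount:{80 TO *]"
```

Computing the count once at index time keeps the query itself a simple range lookup.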

Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread Jason Hellman
And use the /terms request handler to view what is present in the field: /solr/terms?terms.fl=text_es&terms.prefix=a You're looking to ensure the index does, in fact, have the accented characters present. It's just a sanity check, but could possibly save you a little (sanity, that is). Jason

Re: multiple cache for same field

2013-05-20 Thread Jason Hellman
Most definitely not the number of unique elements in each segment. My 32 document sample index (built from the default example docs data) has the following: entry#0: 'StandardDirectoryReader(segments_b:29 _8(4.2.1):C32)'='manu_exact',class

Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-18 Thread Jason Hellman
Rishi, Fantastic! Thank you so very much for sharing the details. Jason On May 17, 2013, at 12:29 PM, Rishi Easwaran rishi.easwa...@aol.com wrote: Hi All, Its Friday 3:00pm, warm sunny outside and it was a good week. Figured I'd share some good news. I work for AOL mail team

Re: Deleting an entry from a collection when they key has : in it

2013-05-17 Thread Jason Hellman
The first rule of Solr without Unique Key is that we don't talk about Solr without a Unique Key. The second rule... On May 16, 2013, at 8:47 PM, Jack Krupansky j...@basetechnology.com wrote: Technically, core Solr does not require a unique key. A lot of features in Solr do require unique

Re: Aggregate word counts over a subset of documents

2013-05-16 Thread Jason Hellman
category…and, of course, the entire set of documents considered for these facets is constrained by the current query. I think this maps to your requirement. Jason On May 16, 2013, at 12:29 PM, David Larochelle dlaroche...@cyber.law.harvard.edu wrote: Is there a way to get aggregate word counts

RE: Looking to see if solrj 3.5 could be used with solr server 4.2.1

2013-05-14 Thread Jason M. Hellman
Peter, Thanks for taking the time to spell out what you were going through. It's great to have details like this to mull over. Jason On 2013-05-14 12:44, Lee, Peter wrote: Thank you one and all for your input. The problem we were tripping over turned out NOT to be related to using

Re: Solr - Best Java Combination for performance?

2013-05-11 Thread Jason Hellman
I have run across plenty of implementations using just about every common servlet container on the market, and haven't run across any common problems to dissuade you against any one of them. On the JVM front most people seem to use Oracle because of its ubiquity. But I have also run across a

Re: Negative Boosting at Recent Versions of Solr?

2013-05-10 Thread Jason Hellman
You learned the gosh-darndest things: http://localhost:8983/solr/browse?q=ipod&boost=product(price,-2)&debugQuery=on …nets: -0.3797992 = (MATCH) sum of: 0.13510442 = (MATCH) max of: 0.045963455 = (MATCH) weight(text:ipod^0.5 in 4) [DefaultSimilarity], result of: 0.045963455 =

Re: Looking for Best Practice of Spellchecker

2013-05-10 Thread Jason Hellman
save a lot of headache and time. Jason On May 10, 2013, at 7:32 AM, Dyer, James james.d...@ingramcontent.com wrote: Nicholas, It sounds like you might want to use WordBreakSolrSpellChecker, which gets obscure mention in the wiki. Read through this section: http://wiki.apache.org/solr

Re: Sharing index data between two Solr instances

2013-05-10 Thread Jason Hellman
of queries. You may also want to consider having a Master/Slave relationship via replication for higher availability. It is trivial to set up and works like a charm. Jason On May 10, 2013, at 8:14 AM, milen.ti...@materna.de wrote: Hello together! I've been googling on this topic

Re: Sharing index data between two Solr instances

2013-05-10 Thread Jason Hellman
SolrCloud) configuration. You have a lot of options! But the replication master/slave behavior is rock solid and does nearly everything you seek. Jason On May 10, 2013, at 8:40 AM, milen.ti...@materna.de wrote: Hello Jason, Thanks for Your quick response! The alternative of using the Solr

Re: SOLR guidance required

2013-05-10 Thread Jason Hellman
One more tip on the use of filter queries. DO: fq=name1:value1&fq=name2:value2&fq=namen:valuen DON'T: fq=name1:value1 AND name2:value2 AND name3:value3 Where OR operators apply, this does not matter. But your Solr cache will be much more savvy with the first construct. Jason On May 10, 2013

Re: Does Distributed Search are Cached Only the By Node That Runs Query?

2013-05-10 Thread Jason Hellman
And for 10,000 documents across n shards, that can be significant! On May 10, 2013, at 11:43 AM, Joel Bernstein joels...@gmail.com wrote: How many shards are in your collection? The query aggregator node will pull back the results from each shard and hold the results in memory. Then it will

Re: Use case for storing positions and offsets in index?

2013-05-09 Thread Jason Hellman
Consider further that term vector data and highlighting becomes very useful if you highlight externally to Solr. That is to say, you have the data stored externally and wish to re-parse positions of terms (especially synonyms) from source material. This is a (not too uncommon) technique used

Re: Grouping search results by field returning all search results for a given query

2013-05-09 Thread Jason Hellman
the group.offset parameter. This will shift the position in the returned array of documents to the value provided. Thus: group.limit=1&group.field=companyid&group.offset=1 …would return the second item in each companyid group matching your current query. Jason On May 9, 2013, at 10:30 AM, Luis
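The offset behavior described above can be sketched as a small parameter builder. The `group.limit`, `group.field`, and `group.offset` values mirror the message; the `group=true` switch is added on the assumption that grouping is not already enabled in the handler defaults, and the helper name `nth_in_group_params` and the `*:*` query are hypothetical.

```python
from urllib.parse import urlencode


def nth_in_group_params(q, field, n):
    """Parameters selecting the n-th (0-based) document of every group."""
    return {
        "q": q,
        "group": "true",
        "group.field": field,
        "group.limit": 1,    # return one document per group
        "group.offset": n,   # skip the first n documents inside each group
    }


# Offset 1 asks for the second item in each companyid group.
print(urlencode(nth_in_group_params("*:*", "companyid", 1)))
```

Stepping `n` upward walks through each group one document at a time.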

Re: 4.3 logging setup

2013-05-09 Thread Jason Hellman
From: http://lucene.apache.org/solr/4_3_0/changes/Changes.html#4.3.0.upgrading_from_solr_4.2.0 Slf4j/logging jars are no longer included in the Solr webapp. All logging jars are now in example/lib/ext. Changing logging impls is now as easy as updating the jars in this folder with those

Re: More Like This and Caching

2013-05-09 Thread Jason Hellman
Purely from empirical observation, both the DocumentCache and QueryResultCache are being populated and reused in reloads of a simple MLT search. You can see in the cache inserts how much extra-curricular activity is happening to populate the MLT data by how many inserts and lookups occur on

Re: 4.3 logging setup

2013-05-09 Thread Jason Hellman
If you nab the jars in example/lib/ext and place them within the appropriate folder in Tomcat (and this will somewhat depend on which version of Tomcat you are using…let's presume tomcat/lib as a brute-force approach) you should be back in business. On May 9, 2013, at 11:41 AM, richardg

Re: Grouping search results by field returning all search results for a given query

2013-05-09 Thread Jason Hellman
lcguerreroc...@gmail.com wrote: Thank you for the prompt reply jason. The group.offset parameter is working for me, now I can iterate through all items for each company. The problem I'm having right now is pagination. Is there a way how this can be implemented out of the box with solr? Before

Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-04 Thread Jason Hellman
I have to imagine I'm quibbling with the original assertion that Solr 4.x is architected with a dependency on Zookeeper when I say the following: Solr 4.x is not architected with a dependency on Zookeeper. SolrCloud, however, is. As such, if a line of reasoning drives greater concern about

Book text with chapter line number

2013-04-23 Thread Jason Funk
Hello. I'm trying to figure out if Solr is going to work for a new project that I am wanting to build. At its heart it's a book text searching application. Each book is broken into chapters and each chapter is broken into lines. I want to be able to search these books and return relevant

Re: Book text with chapter line number

2013-04-23 Thread Jason Funk
, with fields for book, chapter, page, and line number. -- Jack Krupansky -Original Message- From: Jason Funk Sent: Tuesday, April 23, 2013 5:02 PM To: solr-user@lucene.apache.org Subject: Book text with chapter line number Hello. I'm trying to figure out if Solr is going to work

Re: old index not cleaned up on the slave

2013-01-01 Thread Jason
Hi, Upayavira I know multiple segments are not problem. But I always optimize index on master server before replicate. So just single segment file is on master. File lists of the master server directory are below. Additionally, segments_1 and segments_2 on slave server are deleted by hand.

Re: old index not cleaned up on the slave

2012-12-30 Thread Jason
Hi, Erick I didn't configure anything for index backup. My ReplicationHandler configuration is below. Other setting in solrconfig.xml is almost default. Is there a deletion policy for replication? I know maxNumberOfBackups parameter, but this is for master server. Are there any configuration for

old index not cleaned up on the slave

2012-12-27 Thread Jason
Hi, I'm using master/slave replication on Solr 4.0. Replication is successfully run. But old index not cleaned up. Is that bug or not? My slave index directory is below... $ ls -l solr_kr/krg01/data/index/ total 23472512 -rw-r--r--. 1 tomcat tomcat563722625 Dec 24 21:48 _15.fdt -rw-r--r--.

how to assign dedicated server for indexing and add more shard in SolrCloud

2012-12-04 Thread Jason
I'm using master and slave servers for scaling. The master is dedicated to indexing and the slave to searching. Now I'm planning to move to SolrCloud. It has a leader and replicas. The leader acts like a master and the replicas act like slaves. Is that right? So, I'm wondering about two things. First, How can I assign

IOException occured when talking to server

2012-10-21 Thread Jason
Hi, I'm repeatedly encountering the error below when trying out distributed search. At that time, none of the servers was stalled. Does anyone know what the problem is? 2012-10-18 09:09:54,813 [http-8080-exec-8819] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException:

khugepaged runnging and eating 100% cpu.

2012-10-18 Thread Jason
the khugepaged is and why it's eating 100% cpu and when it's run. please someone explain to me. Thanks, Jason -- View this message in context: http://lucene.472066.n3.nabble.com/khugepaged-runnging-and-eating-100-cpu-tp4014635.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: long query response time in shards search

2012-10-08 Thread Jason
Hi, We're using Solr 4.0 and servicing patent search. Patent search intends to very complex queries including wildcard. I think Ngram or EdgeNgram filter is alternative. But every terms included a query don't have wildcard. So we can't use that filter. If I make empty core and use in main core

long query response time in shards search

2012-10-07 Thread Jason
We're running 10 solr cores (c00,c01,...,c09) in a box and querying like http://x.x.x.x/c00/select?q=test&shards=c00,c01,..,c09 This means all of the results are merged in core c00. Is this not a good use of shards search? When we analyze the log file, query response time in core c00 is often too long. How

Re: long query response time in shards search

2012-10-07 Thread Jason
Hi, Otis Thanks your reply. yes, all cores are in same server. * what do you consider too long? just id(key) query response takes too long. almost id(key) query response takes under 10ms. example - 2012-10-05 16:38:32,078 [http-8080-exec-3979] INFO

SolrCloud AutoSharding?

2012-10-04 Thread Jason Huang
is too big)? thanks! Jason

Re: SolrCloud AutoSharding?

2012-10-04 Thread Jason Huang
! Jason On Thu, Oct 4, 2012 at 1:36 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: SolrCloud doesn't auto-shard at this point. It doesn't split indexes either (there is an open issue for this: https://issues.apache.org/jira/browse/SOLR-3755 ) At this point you need to specify

Re: SolrCloud AutoSharding?

2012-10-04 Thread Jason Huang
Thanks Otis. This starts to make more sense to me. I will go through the links in your signature and dig into it. Still learning but this is a good direction. thanks! Jason On Thu, Oct 4, 2012 at 2:55 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, You could start with one node

Re: Count disctint groups in grouping distributed

2012-09-12 Thread Jason Rutherglen
Distinct in a distributed environment would require de-duplication en-masse, use Hive or MapReduce instead. On Wed, Sep 12, 2012 at 11:53 AM, yriveiro yago.rive...@gmail.com wrote: Hi, Exists the possibility of do a distinct group count in a grouping done using a sharding schema? This issue

Re: Connect to SOLR over socket file

2012-08-10 Thread Jason Axelson
or something like metasearch (I'm using Ruby on Rails). Jason On Thu, Aug 9, 2012 at 5:49 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is it possible to connect to SOLR over a socket file as is possible : with mysql? I've looked around and I get the feeling that I may be : misunderstanding

Re: Connect to SOLR over socket file

2012-08-09 Thread Jason Axelson
). Jason On Tue, Aug 7, 2012 at 9:14 PM, Michael Kuhlmann k...@solarier.de wrote: On 07.08.2012 21:43, Jason Axelson wrote: Hi, Is it possible to connect to SOLR over a socket file as is possible with mysql? I've looked around and I get the feeling that I may be misunderstanding part

Connect to SOLR over socket file

2012-08-07 Thread Jason Axelson
Hi, Is it possible to connect to SOLR over a socket file as is possible with mysql? I've looked around and I get the feeling that I may be misunderstanding part of SOLR's architecture. Any pointers are welcome. Thanks, Jason

java.net.SocketException: Connection reset

2012-07-29 Thread Jason
I frequently get a SocketException (Connection reset). It occurs during distributed search and is logged as below on the requesting server. At first, I thought the cause of the exception was a long GC pause in the JVM, so I changed connectionTimeout of the connector in tomcat server.xml to 6ms.

Re: Re:shard connection timeout

2012-07-10 Thread Jason
Hi Hans yes, that remote server is ok. actually we got this error when remote server is executing garbage collecting and that time is over about 1 minute. remote server is very busy and memory usage is high. -- View this message in context:

Re: shard connection timeout

2012-07-10 Thread Jason
Actually we got this error when remote server is executing garbage collecting and that time is over about 1 minute. Solr server sometimes is frozen during gc and occurred connection refused error. Our gc option is -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+AggressiveOpts Response waiting is

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
Hi Amit, If the caches were per-segment, then NRT would be optimal in Solr. Currently the caches are stored per-multiple-segments, meaning after each 'soft' commit, the cache(s) will be purged. On Fri, Jul 6, 2012 at 9:45 PM, Amit Nithian anith...@gmail.com wrote: Sorry I'm a bit new to the

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
, as with some other Apache licensed Lucene based search engines. On Sat, Jul 7, 2012 at 10:42 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Jul 7, 2012 at 9:59 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Currently the caches are stored per-multiple-segments, meaning after

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
to per-segment? How do I do that? Thanks. From: Jason Rutherglen jason.rutherg...@gmail.com To: solr-user@lucene.apache.org Sent: Saturday, July 7, 2012 11:32 AM Subject: Re: Nrt and caching The field caches are per-segment, which are used for sorting

Re: Grouping and Averages

2012-07-07 Thread Jason Rutherglen
Average should be doable in Solr, maybe not today, not sure. Median is the challenge :) Try Hive. On Sat, Jul 7, 2012 at 3:34 PM, Walter Underwood wun...@wunderwood.org wrote: It sounds like you need a database for analytics, not a search engine. Solr cannot do aggregates like that. It can

Re: Grouping and Averages

2012-07-07 Thread Jason Rutherglen
://LinkedIn.com/in/JeremyBranham http://jeremybranham.wordpress.com/ http://Zeroth.biz -Original Message- From: Jason Rutherglen Sent: Saturday, July 07, 2012 2:45 PM To: solr-user@lucene.apache.org Subject: Re: Grouping and Averages Average should

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
with activity regarding adding this feature to Solr. On Sat, Jul 7, 2012 at 8:32 PM, Andy angelf...@yahoo.com wrote: Jason, If I just use stock Solr 4.0 without modifying the source code, does that mean multi-value faceting will be very slow when I'm constantly inserting/updating documents

Re: Search timeout for Solrcloud

2012-06-05 Thread Jason Rutherglen
There isn't a solution for killing long running queries that works. On Tue, Jun 5, 2012 at 1:34 AM, arin_g arin...@gmail.com wrote: Hi, We use solrcloud in production, and we are facing some issues with queries that take very long, especially deep paging queries; these queries keep our servers

Re: Benchmark Solr vs Elastic Search vs Sensei

2012-04-27 Thread Jason Rutherglen
I think DataStax Enterprise is faster than Solr Cloud with transaction logging turned on. Cassandra has its own fast(er) transaction logging mechanism. Of course it's best to use two HDs when testing, e.g., one for the data, the other for the transaction log. On Fri, Apr 27, 2012 at 12:58 PM,

embedded solr populating field of type LatLonType

2012-04-24 Thread Jason Cunning
Hi, I have a question concerning the spatial field type LatLonType and populating it via an embedded Solr server in Java. So far I've only ever had to index simple types like boolean, float, and string. This is the first complex type. So I'd like to use the following field definition for

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
indices, nodes and aliases on the fly I think there is a way to handle a growing data set with ease. If anyone is interested, such a scenario has been discussed in detail on the ES mailing list. Regards, Lukas On Tue, Apr 17, 2012 at 2:42 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: One

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
not want to run into system X vs system Y flame here...) Regards, Lukas On Wed, Apr 18, 2012 at 2:22 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I'm curious how on-the-fly updates are handled as a new shard is added to an alias. E.g., how does the system know to which shard

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-17 Thread Jason Rutherglen
rearranging the hash 'ring' both logically and physically. In addition, there is the potential for data loss which Cassandra has the technology for. On Tue, Apr 17, 2012 at 1:33 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I think Jason is right - there is no index splitting in ES and SolrCloud

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-16 Thread Jason Rutherglen
One of the big weaknesses of Solr Cloud (and ES?) is the lack of ability to redistribute shards across servers, meaning, as a single shard grows too large, splitting the shard while taking live updates. How do you plan on elastically adding more servers without this feature? Cassandra and HBase

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-15 Thread Jason Rutherglen
This was done in SOLR-1301 going on several years ago now. On Sat, Apr 14, 2012 at 4:11 PM, Lance Norskog goks...@gmail.com wrote: It sounds like you really want the final map/reduce phase to put Solr index files into HDFS. Solr has a feature to do this called 'Embedded Solr'. This packages

Re: Frequent garbage collections after a day of operation

2012-02-16 Thread Jason Rutherglen
One thing that could fit the pattern you describe would be Solr caches filling up and getting you too close to your JVM or memory limit This [uncommitted] issue would solve that problem by allowing the GC to collect caches that become too large, though in practice, the cache setting would need

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-19 Thread Jason Rutherglen
). If you are fine with that, then your statements are contradictory. On Thu, Jan 19, 2012 at 12:31 PM, Steven A Rowe sar...@syr.edu wrote: Jason, If I understand you correctly, you're referring to a thread http://search-lucene.com/m/iMCFOqzcmS1/%22Performance+Monitoring+SaaS+for+Solr%22/v

Re: How to accelerate your Solr-Lucene appication by 4x

2012-01-18 Thread Jason Rutherglen
Steven, If you are going to admonish people for advertising, it should be equally dished out or not at all. On Wed, Jan 18, 2012 at 6:38 PM, Steven A Rowe sar...@syr.edu wrote: Hi Peter, Commercial solicitations are taboo here, except in the context of a request for help that is directly
