Re: Overall large size in Solr across collections

2016-04-19 Thread Zheng Lin Edwin Yeo
Thanks for the information, Shawn. I believe it could be due to the types of files being indexed. Currently, I'm indexing EML files which are in HTML format, and they are richer in content (with inline images and full text), while previously the EML files were in Plain Text format,

Re: Storing different collection on different hard disk

2016-04-19 Thread Zheng Lin Edwin Yeo
Thanks for your info. I tried to set, but Solr is not able to find the indexes, and I get the following error: - *collection1:* org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: java.io.IOException: The filename, directory name, or volume label syntax is

Re: Facet heatmaps: cluster coordinates based on average position of docs

2016-04-19 Thread David Smiley
Hi Anton, Perhaps you should request a more detailed / high-res heatmap, and then work with that, perhaps using some clustering technique? I confess I don't work on the UI end of things these days. p.s. I'm on vacation this week, so I won't respond quickly ~ David On Thu, Apr 7, 2016 at 3:43

how to restrict phrase to appear in same child document

2016-04-19 Thread Yangrui Guo
hello I have a nested document type in my index. Here's the structure of my document: { id: { car: color: } { driver: color: } } However, when I use the query q={!parent which="content_type:parent"}+(black AND driver)={!parent which="content_type:parent"}+(white AND mercedes),

RE: Streaming with facets

2016-04-19 Thread Davis, Daniel (NIH/NLM) [C]
Thanks, Yonik, that makes great sense. My understanding of "many parts of Solr can already stream" is that not all sets of SearchHandler parameters are equal. One set of SearchHandler parameters can be best for classic <1 second web search, one set of SearchHandler parameters may be best for

Re: Streaming with facets

2016-04-19 Thread Yonik Seeley
Part of the difficulty is that "stream" and "streaming" are rather overloaded terms. Many parts of Solr can already stream, with varying degrees of how much state is aggregated / internally collected before "streaming" starts. Faceting can be truly streamed *if* the sort order is by the bucket

Streaming with facets

2016-04-19 Thread Davis, Daniel (NIH/NLM) [C]
So, can someone clarify how faceting works with streaming expressions? I can see how document search can return documents as it finds them, using any particular ordering desired - just a parse tree of query operators with priority queues (or something more complicated) within each query

Re: Return only parent on child query match (w/o block-join)

2016-04-19 Thread Susmit Shukla
Hi Shamik, you could try Solr grouping using the group.query construct. You could discard the child match from the result, i.e. any doc that has a parent_doc_id field, and use a join to fetch the parent record: q=*:*=true=title:title2={!join from=parent_doc_id to=doc_id}parent_doc_id:*=10 Thanks, Susmit

Re: Indexing 700 docs per second

2016-04-19 Thread Jeff Wartes
I have no numbers to back this up, but I’d expect Atomic Updates to be slightly slower than a full update, since the atomic approach has to retrieve the fields you didn't specify before it can write the new (updated) document. On 4/19/16, 11:54 AM, "Tim Robertson"
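
The difference described above can be made concrete with a minimal sketch of an atomic-update payload. It assumes the collection's unique key is `id`; the field names `price` and `status` and the helper `build_atomic_update` are hypothetical, and nothing is sent to Solr here. Only the listed fields carry a `set` modifier, which is why Solr has to fetch the remaining stored fields itself before rewriting the document.

```python
import json


def build_atomic_update(doc_id, changes):
    """Build one atomic-update document that sets only the given fields.

    Assumption: the unique key field is called "id". Field names passed
    in `changes` are illustrative only.
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        # The "set" modifier replaces the field value; unlisted fields
        # are left for Solr to carry over from the stored document.
        doc[field] = {"set": value}
    return doc


# JSON body that could be posted to an update handler (posting not shown).
payload = json.dumps([build_atomic_update("doc-1", {"price": 42, "status": "active"})])
```

A full update would instead send every field of the document, with plain values and no modifiers.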

Re: Indexing 700 docs per second

2016-04-19 Thread Tim Robertson
Hi Mark, We were putting in and updating docs of around 20-25 indexed fields (mainly INTs, but some Strings and multivalue fields) at >1000/sec on far lesser hardware and a total of 600 million docs (batch updates of course) while also serving live queries for a website which had about 30

Re: Overall large size in Solr across collections

2016-04-19 Thread Shawn Heisey
On 4/19/2016 9:28 AM, Zheng Lin Edwin Yeo wrote: > Currently, the searching performance is still doing fine, but it is the > indexing that is slowing down. Not sure if increasing the RAM, or changing > to an SSD hard disk will help with the indexing speed? You need to figure out exactly what is

Return only parent on child query match (w/o block-join)

2016-04-19 Thread Shamik Bandopadhyay
Hi, I have a set of documents indexed which has a pseudo parent-child relationship. Each child document had a reference to the parent document. Due to document availability complexity (and the condition of updating both parent-child documents at the time of indexing), I'm not able to use

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-19 Thread John Bickerstaff
When combining a load balancer with SolrCloud, the handler definitions in solrconfig.xml should set preferLocalShards to true (which Tom mentioned) Thanks Shawn! I was wondering where to set this... Yup - my IT guy is sharp, sharp, sharp -- nice to get this confirmation from the list... On

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-19 Thread John Bickerstaff
@Charlie It's easy to do and wow does it save time and database resources... I've built a Spring Boot Micro-services architecture that also registers in Zookeeper. One micro-service pulls from the original data source and pushes to Kafka. The second micro-service pulls from Kafka into SOLR.

Re: Cannot use Phrase Queries in eDisMax and filtering

2016-04-19 Thread Antoine LE FLOC'H
Thanks Erick, looking at parseFieldBoostsAndSlop(), can you confirm that pf3=1& pf2=1& is not valid and that it has to be a multivalued list of fields? Thank you. On Tue, Apr 19, 2016 at 1:59 AM, Erick Erickson wrote: > bq: I cannot find either the condition on the

Re: add field requires collection reload

2016-04-19 Thread Erick Erickson
bq: In yesterday's test run I actually had only one node I think it's still the same issue. The update happens too fast for the core reload. Don't know that for sure mind you... A cheap solution would be to wait a bit before sending the update. Clumsy but maybe good enough for now? Or put in

Re: Indexing 700 docs per second

2016-04-19 Thread Erick Erickson
Make very sure you batch updates though. Here's a benchmark I ran: https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/ NOTE: it's not entirely clear that you want to put 122M docs on a single shard. Depending on the queries you'll run you may want 2 or more shards, but that
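
As a rough illustration of the batching advice, here is a minimal sketch that groups documents into fixed-size batches, so that each batch could go out as a single update request instead of one request per document. The batch size of 1000 and the helper name `batched` are assumptions for illustration, not figures from the benchmark; sending the batches is not shown.

```python
from itertools import islice


def batched(docs, batch_size=1000):
    """Yield lists of at most `batch_size` documents from any iterable.

    Assumption: 1000 is an arbitrary illustrative default, not a
    recommended value.
    """
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

For example, `list(batched(range(5), 2))` groups five items into batches of two, two, and one.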

Re: Live Podcast on Solr 6 with Yonik and Erik Hatcher (Today, 2pm ET)

2016-04-19 Thread Doug Turnbull
Doh! Thanks Yonik. Yes that's right. Thought I had double checked On Tue, Apr 19, 2016 at 12:24 PM Yonik Seeley wrote: > Hey Doug, > Not sure if the URL matters, but I thought it was this one: > > >

Re: Live Podcast on Solr 6 with Yonik and Erik Hatcher (Today, 2pm ET)

2016-04-19 Thread Yonik Seeley
Hey Doug, Not sure if the URL matters, but I thought it was this one: https://blab.im/matthew-l-overstreet-solr-6-is-available-find-out-about-what-s-new -Yonik On Tue, Apr 19, 2016 at 10:37 AM, Doug Turnbull wrote: > Hey Solristas: > > We do a regular

Re: Overall large size in Solr across collections

2016-04-19 Thread Zheng Lin Edwin Yeo
Hi Shawn, Currently, the searching performance is still doing fine, but it is the indexing that is slowing down. Not sure if increasing the RAM, or changing to an SSD hard disk will help with the indexing speed? Regards, Edwin On 19 April 2016 at 21:57, Shawn Heisey wrote:

Re: Is there any JIRA changed the stored order of multivalued field?

2016-04-19 Thread forest_soup
Thanks! That's very helpful! -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-any-JIRA-changed-the-stored-order-of-multivalued-field-tp4264325p4271312.html Sent from the Solr - User mailing list archive at Nabble.com.

Is there any detailed condition on which the snapshot pull recovery will occur?

2016-04-19 Thread forest_soup
We have a SolrCloud cluster running Solr v5.3.2. collection1 contains 1 shard with 2 replicas, on Solr nodes solr1 and solr2 respectively. In solrconfig.xml, the following updateLog config has been uploaded to ZK and is in effect: ${solr.ulog.dir:} ${solr.ulog.numVersionBuckets:65536} 1000

Live Podcast on Solr 6 with Yonik and Erik Hatcher (Today, 2pm ET)

2016-04-19 Thread Doug Turnbull
Hey Solristas: We do a regular podcast called Search Disco. Today we'll be discussing the recent release of Solr 6 with Solr creator Yonik Seeley and Solr committer Erik Hatcher. *Subscribe to participate live*

Re: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-19 Thread Shawn Heisey
On 4/19/2016 5:00 AM, Rohana Rajapakse wrote: > Found the missing CloudSolrClient ::Builder class in the master branch, and > the code goes a bit further now. Still Solr cloud is not starting up. It is > failing to register Solr servers with Zookeeper. This, combined with the earlier message

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-19 Thread Shawn Heisey
On 4/18/2016 11:22 AM, John Bickerstaff wrote: > So - my IT guy makes the case that we don't really need Zookeeper / Solr > Cloud... > I'm biased in terms of using the most recent functionality, but I'm aware > that bias is not necessarily based on facts and want to do my due > diligence... > >

Re: Overall large size in Solr across collections

2016-04-19 Thread Shawn Heisey
On 4/18/2016 8:50 PM, Zheng Lin Edwin Yeo wrote: > Thanks for your explanation. > > I have set my segment size to 20GB under the TieredMergePolicy > > "maxMergeAtOnce">10 10 "maxMergedSegmentMB">20480 That just controls the maximum size of a segment. This defaults to 5GB. When segments

Re: Indexing 700 docs per second

2016-04-19 Thread Susheel Kumar
It sounds achievable with your machine configuration, and I would suggest trying out atomic updates. Use SolrJ with multi-threaded indexing for a higher indexing rate. Thanks, Susheel On Tue, Apr 19, 2016 at 9:27 AM, Tom Evans wrote: > On Tue, Apr 19, 2016 at 10:25
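
The multi-threaded indexing pattern suggested above might look roughly like the sketch below. This is a Python stand-in for the pattern, not SolrJ code: `send_batch` is a hypothetical callable supplied by the caller that would post one batch of documents, and the default of 4 workers is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def index_in_parallel(batches, send_batch, workers=4):
    """Send batches concurrently and return the results in input order.

    Assumptions: `send_batch` is a caller-supplied (hypothetical) function
    that posts one batch and returns something useful, such as a count.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `batches` in its results.
        return list(pool.map(send_batch, batches))
```

Passing the sender in as a parameter keeps the threading logic separate from the HTTP or client details, so the same function can be exercised with a stub before pointing it at a real server.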

Re: what is opening realtime Searcher

2016-04-19 Thread Yonik Seeley
On Mon, Apr 18, 2016 at 8:02 PM, Erick Erickson wrote: > This is about real-time get. To clarify, it's used to handle real-time get type functionality in general. It's used internally in a couple ways, not just when a user issues a "real-time get". -Yonik

Re: Indexing 700 docs per second

2016-04-19 Thread Tom Evans
On Tue, Apr 19, 2016 at 10:25 AM, Mark Robinson wrote: > Hi, > > I have a requirement to index (mainly updation) 700 docs per second. > Suppose I have a 128GB RAM, 32 CPU machine, with each doc size around 260 > bytes (6 fields out of which only 2 will undergo updation at

restore issue

2016-04-19 Thread Jan Verweij van searchXperts
Hi, Just need to check the following. Currently building a test environment with SolrCloud and 3 nodes. After loading some data into Solr I did the following: 1. create a backup using the replication handler like http://localhost:8983/solr/[SHARDNAME]

RE: MiniSolrCloudCluster usage in solr 7.0.0 - Got It Working!

2016-04-19 Thread Rohana Rajapakse
Thanks to Shawn Heisey and Chris Hostetter for your support. Finally I got it working. As you both pointed out, for the time being, I will start with an empty baseDir. Best, Rohana -Original Message- From: Rohana Rajapakse [mailto:rohana.rajapa...@gossinteractive.com] Sent: 19 April

Re: Storing different collection on different hard disk

2016-04-19 Thread Alexandre Rafalovitch
Have you tried setting dataDir parameter in the core.properties file? https://cwiki.apache.org/confluence/display/solr/Defining+core.properties Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 19 April 2016 at 20:43, Zheng Lin
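
A minimal sketch of what such a core.properties file could contain, generated here from Python for illustration. The core name, the path, and the helper `render_core_properties` are hypothetical. The sketch converts backslashes to forward slashes on the assumption that the file is read as a Java properties file, where a backslash acts as an escape character; whether that relates to any particular error is not established here.

```python
def render_core_properties(name, data_dir):
    """Return core.properties text with `name` and `dataDir` entries.

    Assumptions: only these two properties are needed, and forward
    slashes are an acceptable path separator on the target system.
    """
    normalized = data_dir.replace("\\", "/")
    return f"name={name}\ndataDir={normalized}\n"


# Hypothetical example: keep collection1's index on a separate drive.
text = render_core_properties("collection1", "D:\\solr_data\\collection1")
```

Each core would get its own file with a different `dataDir`, which is how indexes for different collections could end up on different disks.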

RE: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-19 Thread Rohana Rajapakse
Found the missing CloudSolrClient ::Builder class in the master branch, and the code goes a bit further now. Still Solr cloud is not starting up. It is failing to register Solr servers with Zookeeper. Here is the stack trace: java.lang.IllegalStateException: Solr servers failed to register

Storing different collection on different hard disk

2016-04-19 Thread Zheng Lin Edwin Yeo
Hi, I would like to find out whether it is possible to store the index files of different collections on different hard disks. For example, I want to store the indexes of collection1 on Hard Disk 1, and the indexes of collection2 on Hard Disk 2. I am using Solr 5.4.0. Regards, Edwin

Indexing 700 docs per second

2016-04-19 Thread Mark Robinson
Hi, I have a requirement to index (mainly updation) 700 docs per second. Suppose I have a 128GB RAM, 32 CPU machine, with each doc size around 260 bytes (6 fields out of which only 2 will undergo updation at the above rate). This collection has around 122 million docs and that count is pretty much

Re: Wildcard query behavior.

2016-04-19 Thread Modassar Ather
Yes! Wildcards are not analyzed. Thanks Shawn for reminding me. Thanks Erick for your response. Best, Modassar On Mon, Apr 18, 2016 at 8:53 PM, Erick Erickson wrote: > Here's a blog on the subject: > >

RE: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-19 Thread Rohana Rajapakse
After resolving another dependency, now I find that solr-solrj.jar is missing the "Builder" method in CloudSolrClient class. I have checked both solr-solrj-6.0.0 and 7.0.0. java.lang.NoClassDefFoundError: org/apache/solr/client/solrj/impl/CloudSolrClient$Builder at

Re: Can a field be an array of fields?

2016-04-19 Thread Bastien Latard - MDPI AG
Thank you Jack and Daniel, I somehow missed your answers. Yes, I already thought about the JSON possibility, but I was more concerned about having such a structure in the result: "docs":[ { [...] "authors_array": [ [ "given_name":["Bastien"],

Re: what is opening realtime Searcher

2016-04-19 Thread Jaroslaw Rozanski
Hi Erick, Thanks for the info. I was under the impression that we have the extra setting "openSearcher" to control when searchers are opened. From what you are saying, a searcher can be opened not only as a result of a hard or soft commit. What I observe, to follow your example: T0 - everything

Re: Verifying - SOLR Cloud replaces load balancer?

2016-04-19 Thread Charlie Hull
On 18/04/2016 18:22, John Bickerstaff wrote: So - my IT guy makes the case that we don't really need Zookeeper / Solr Cloud... He may be right - we're serving static data (changes to the collection occur only 2 or 3 times a year and are minor) We probably could have 3 or 4 Solr nodes running

RE: MiniSolrCloudCluster usage in solr 7.0.0

2016-04-19 Thread Rohana Rajapakse
Tried again with an empty baseDir, and this time it's a different error. The error is thrown during the execution of the line: msc = new MiniSolrCloudCluster(2, Paths.get("testcluster"), jettyConfig); Here is the full stack trace: org.apache.solr.common.SolrException:

Denormalization and data retrieval

2016-04-19 Thread Bastien Latard - MDPI AG
Hi, What's the correct way to create index(es) using denormalization? 1. something like that? OR even: 2. OR a different index for each SQL table? -> if yes, how can I then retrieve all the needed data (i.e.: intersection)?...JOIN/Streaming exp.? I have more than