Re: backups of analyzingInfixSuggesterIndexDir

2016-05-13 Thread Arcadius Ahouansou
Hello Oakley. I am not familiar with the backup process either. The analyzingInfixSuggesterIndexDir not being in the backup may not be an issue. I would suggest you restore the backup on Solr and see whether it's created automatically for you. If not, there are many options like

Re: Streaming Expression joins not returning all results

2016-05-13 Thread Joel Bernstein
Also the hashJoin is going to read the entire entity table into memory. If that's a large index that could be using lots of memory. 25 million docs should be ok to /export from one node, as long as you have enough memory to load the docValues for the fields for sorting and exporting. Breaking

Re: Streaming Expression joins not returning all results

2016-05-13 Thread Ryan Cutter
Thanks very much for the advice. Yes, I'm running in a very basic single shard environment. I thought that 25M docs was small enough to not require anything special but I will try scaling like you suggest and let you know what happens. Cheers, Ryan On Fri, May 13, 2016 at 4:53 PM, Joel

Re: Does anybody crawl to a database and then index from the database to Solr?

2016-05-13 Thread Erick Erickson
Clayton: I think you've done a pretty thorough investigation, I think you're spot-on. The only thing I would add is that you _will_ reindex your entire corpus multiple times. Count on it. Sometime, somewhere, somebody will say "gee, wouldn't it be nice if we could ". And to support it you'll

Re: Streaming Expression joins not returning all results

2016-05-13 Thread Joel Bernstein
I would try breaking down the second query to see when the problems occur. 1) Start with just a single *:* search from one of the collections. 2) Then test the innerJoin. The innerJoin won't take much memory as it's a streaming merge join. 3) Then try the full thing. If you're running a large

Re: Streaming Expression joins not returning all results

2016-05-13 Thread Ryan Cutter
qt="/export" immediately fixed the query in Question #1. Sorry for missing that in the docs! The second query (with /export) crashes the server so I was going to look at parallelization if you think that's a good idea. It also seems unwise to join into 26M docs so maybe I can reconfigure the

Re: Streaming Expression joins not returning all results

2016-05-13 Thread Joel Bernstein
A couple of other things: 1) Your innerJoin can be parallelized across workers to improve performance. Take a look at the docs on the parallel function for the details. 2) It looks like you might be doing graph operations with joins. You might want to take a look at the gatherNodes function coming in
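A minimal sketch of what a parallelized innerJoin expression might look like, written as a Python string builder so the pieces are easy to see. The join key `type_id`, the other field names, the worker count, and the zkHost address are all hypothetical; only the collection names and the subject_id query come from the thread. It assumes both inner streams are partitioned and sorted on the join key.

```python
def partitioned_search(collection, query, fields, key):
    """Build a search() expression that exports all matches, partitioned on `key`."""
    return (
        f'search({collection}, q="{query}", fl="{fields}", '
        f'sort="{key} asc", qt="/export", partitionKeys="{key}")'
    )


def parallel_inner_join(worker_collection, left, right, key, workers, zk_host):
    """Wrap an innerJoin of two partitioned streams in a parallel() expression."""
    join = f'innerJoin({left}, {right}, on="{key}")'
    return (
        f'parallel({worker_collection}, {join}, workers="{workers}", '
        f'zkHost="{zk_host}", sort="{key} asc")'
    )


# Hypothetical field names and ZooKeeper address, for illustration only.
left = partitioned_search("triple", "subject_id:1656521", "subject_id,type_id", "type_id")
right = partitioned_search("triple_type", "*:*", "type_id,label", "type_id")
expr = parallel_inner_join("triple", left, right, "type_id", 4, "localhost:9983")
print(expr)
```

The resulting string would be sent as the `expr` parameter to a collection's /stream handler; the worker collection needs enough replicas to host the requested number of workers.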

Re: Streaming Expression joins not returning all results

2016-05-13 Thread Joel Bernstein
When doing things that require all the results (like joins) you need to specify the /export handler in the search function. qt="/export" The search function defaults to the /select handler which is designed to return the top N results. The /export handler always returns all results that match
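As a rough illustration of the advice above, the following Python sketch assembles an innerJoin expression in which both search functions read from the /export handler. The field names (`type_id`, `label`) and the join key are assumptions made for the example; the original query in the thread is truncated, so this is not a reconstruction of it.

```python
def export_search(collection, query, fields, sort):
    """Build a search() streaming expression that reads from /export.

    /export streams every matching document, whereas the default
    /select handler only returns the top N rows.
    """
    return (
        f'search({collection}, q="{query}", fl="{fields}", '
        f'sort="{sort}", qt="/export")'
    )


def inner_join(left, right, on):
    """Join two streams that are already sorted on the join key."""
    return f'innerJoin({left}, {right}, on="{on}")'


# Field names below are hypothetical placeholders.
left = export_search("triple", "subject_id:1656521", "subject_id,type_id", "type_id asc")
right = export_search("triple_type", "*:*", "type_id,label", "type_id asc")
expr = inner_join(left, right, "type_id")
print(expr)
```

Note that /export expects the fields in `fl` and `sort` to have docValues enabled, which matches the memory remark made elsewhere in this thread.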

[Solr 6] Migration from Solr 4.10.2

2016-05-13 Thread Alessandro Benedetti
I'm planning a migration from 4.10.2 to 6.0. Because we generate the index on a daily basis from scratch, we don't need to migrate the index but actually only migrate the server instances. With my team we were doing some experiments on some dev machines, basically comparing Solr 4.10.2 and Solr 6.0

Streaming Expression joins not returning all results

2016-05-13 Thread Ryan Cutter
Question #1: triple_type collection has a few hundred docs and triple has 25M docs. When I search for a particular subject_id in triple which I know has 14 results and do not pass in 'rows' params, it returns 0 results: innerJoin( search(triple, q=subject_id:1656521,

Re: Does anybody crawl to a database and then index from the database to Solr?

2016-05-13 Thread John Bickerstaff
I've been working on a less-complex thing along the same lines - taking all the data from our corporate database and pumping it into Kafka for long-term storage -- and the ability to "play back" all the Kafka messages any time we need to re-index. That simpler scenario has worked like a charm. I

Does anybody crawl to a database and then index from the database to Solr?

2016-05-13 Thread Pryor, Clayton J
Question: Do any of you have your crawlers write to a database rather than directly to Solr and then use a connector to index to Solr from the database? If so, have you encountered any issues with this approach? If not, why not? I have searched forums and the Solr/Lucene email archives

RE: Issue with Solr6 CDCR

2016-05-13 Thread Satvinder Singh
Also, I am using an external zookeeper ensemble with 3 nodes. Thanks Satvinder Singh

RE: Issue with Solr6 CDCR

2016-05-13 Thread Satvinder Singh
[http://www.nc4worldwide.com/_signature/nc4.png] Satvinder Singh Security Systems Engineer satvinder.si...@nc4.com 703.682.6000 x276 direct 703.989.8030 cell www.NC4.com

Re: More Like This on not new documents

2016-05-13 Thread Nick D
https://wiki.apache.org/solr/MoreLikeThisHandler Bottom of the page, using context streams. I believe this still works in newer versions of Solr. Although I have not tested it on a new version of Solr. But if you plan on indexing the document anyways then just indexing and then passing the ID to
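A small sketch of the content-stream approach mentioned above: the text of the unindexed document is passed in `stream.body` and the MoreLikeThisHandler is asked for similar documents. It assumes a handler is registered at `/mlt` and that content streams in request parameters are permitted by the server configuration; the host, core name, and field names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def mlt_stream_url(base_url, core, text, fields):
    """Build a MoreLikeThisHandler URL that sends raw text as a content stream."""
    params = {
        "stream.body": text,               # text of the document that is not indexed
        "mlt.fl": ",".join(fields),        # fields used to compute similarity
        "mlt.interestingTerms": "list",    # also report the terms that were used
        "mlt.mintf": 0,                    # accept terms that occur only once
    }
    return f"{base_url}/solr/{core}/mlt?{urlencode(params)}"


# Host, core and field names are placeholders.
mlt_url = mlt_stream_url("http://localhost:8983", "yourCoreName",
                         "electronics memory", ["manu", "cat"])
mlt_params = parse_qs(urlsplit(mlt_url).query)
print(mlt_url)
```

For longer documents a POST body is usually more practical than a query-string parameter.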

Re: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-13 Thread Erick Erickson
I changed the CDCR doc, Oliver could you take a glance and see if it is clear now? All I changed was the sample solrconfig sections https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462 Thanks, Erick On Fri, May 13, 2016 at 6:23 AM, Oliver Rudolph

Re: Error

2016-05-13 Thread Erick Erickson
This is the same problem, you're simply committing too often, either soft commit or hard commit with openSearcher=true. https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ You haven't told us how you're committing, I'd guess either 1> you

Re: Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread John Bickerstaff
I should clarify: http://XXX.XXX.XX.XX:8983/solr/yourCoreName/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=category "yourCoreName" will get built in for you if you use the Solr Admin UI for queries -- On Fri, May 13, 2016 at 9:36 AM, John Bickerstaff wrote: > In case it's helpful

Re: backups of analyzingInfixSuggesterIndexDir

2016-05-13 Thread Erick Erickson
No option that I know of, but I'm not up on the details of backup, maybe someone else can chime in? I kind of doubt it though, the choice of where to put the suggest index is totally arbitrary so I'm not sure how backup/restore would know where to look. On Thu, May 12, 2016 at 8:09 AM, Oakley,

Re: Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread John Bickerstaff
In case it's helpful for a quick and dirty peek at your facets, the following URL (in a browser or curl) will get you basic facets for a field named "category" -- assuming you change the IP address / hostname to match yours. http://XXX.XXX.XX.XX:8983/solr/statdx_shard1_replica3/select
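The kind of facet request described above could be assembled as in the sketch below. The parameter set (match everything, return no rows, facet on one field) is an assumption based on standard Solr faceting parameters, since the URL in the archived message is cut off; host and core name are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def facet_url(base_url, core, field):
    """Build a /select URL that returns only facet counts for `field`."""
    params = [
        ("q", "*:*"),            # match every document
        ("rows", 0),             # no documents needed, only the facet counts
        ("wt", "json"),
        ("indent", "true"),
        ("facet", "true"),
        ("facet.field", field),  # one bucket per distinct value of this field
    ]
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Host and core name are placeholders.
url = facet_url("http://localhost:8983", "yourCoreName", "category")
facet_params = parse_qs(urlsplit(url).query)
print(url)
```

With 200 categories, adding `facet.limit=-1` may be needed, because Solr returns only the top 100 facet values by default.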

RE: dtSearch parser & Introduction

2016-05-13 Thread Allison, Timothy B.
>...and I've just blogged about some of the issues one can run into with this >sort of project, hope this is useful! http://www.flax.co.uk/blog/2016/05/13/old-new-query-parser/ +1 completely non-trivial task to roll your own. I'd add that incorporating multiterm analysis (analysis/normalization

RE: http request to MiniSolrCloudCluster

2016-05-13 Thread Rohana Rajapakse
I am only setting up a MiniSolrCloudCluster with 2 servers like this: JettyConfig jettyConfig = JettyConfig.builder().waitForLoadingCoresToFinish(null).setContext("/solr").build(); MiniSolrCloudCluster miniCluster = new MiniSolrCloudCluster(2,

Re: URL parameters combined with text param

2016-05-13 Thread Ahmet Arslan
Hi, In the first debug query response, special words are also queries so it is not working. Not sure edismax query parser recognizes _query_ field. But lucene query parser does. Try to switch to lucene query parser. Also if you can divide your query words into q and fq below will work:

Re: Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread Joel Bernstein
You may also want to try out the SQL interface in Solr 6.0 which supports SELECT DISTINCT queries. https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface#ParallelSQLInterface-SELECTDISTINCTQueries Joel Bernstein http://joelsolr.blogspot.com/ On Fri, May 13, 2016 at 9:47 AM, GW
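A hedged sketch of how a SELECT DISTINCT statement might be submitted to the SQL interface from Python. It assumes the collection exposes the `/sql` request handler and that the statement is passed in a form-encoded `stmt` parameter; the host and collection name are placeholders, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sql_request(base_url, collection, stmt):
    """Build a POST request carrying a SQL statement for the /sql handler."""
    body = urlencode({"stmt": stmt}).encode("utf-8")
    return Request(f"{base_url}/solr/{collection}/sql", data=body, method="POST")


# Placeholder host and collection; "category" is the field from the question.
req = sql_request(
    "http://localhost:8983",
    "yourCollection",
    "SELECT DISTINCT category FROM yourCollection",
)
print(req.full_url, req.data)
```

Passing `req` to `urllib.request.urlopen` would execute the query against a running SolrCloud cluster.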

Re: Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread GW
Thank you Shawn, I will toy with these over the weekend. Solr/Hadoop/Hbase has been a nasty learning curve for me. It probably would have been a lot easier if I didn't have 30 years of RDBMS stuck in my head. Again, many thanks for your response. On 13 May 2016 at 08:57, Shawn Heisey

Re: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-13 Thread Oliver Rudolph
Hi, I had the same problem. The documentation is kind of misleading here. You must not add a new <updateHandler> element to your config but update the existing <updateHandler>. All you need to do is add the class="solr.CdcrUpdateLog" attribute to the <updateLog> element inside your existing <updateHandler>. Hope this helps! Mit freundlichen
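The edit described above can be sketched as a small script, assuming the elements in question are `<updateHandler>` and `<updateLog>` as in the CDCR reference documentation. The sample configuration is a cut-down, hypothetical solrconfig.xml; the point is that the existing handler is modified in place rather than a second one being added.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal stand-in for a real solrconfig.xml.
SAMPLE_CONFIG = """<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>
</config>"""


def enable_cdcr_update_log(config_xml):
    """Set class="solr.CdcrUpdateLog" on the updateLog of the existing updateHandler."""
    root = ET.fromstring(config_xml)
    handlers = root.findall("updateHandler")
    if len(handlers) != 1:
        raise ValueError(f"expected exactly one updateHandler, found {len(handlers)}")
    update_log = handlers[0].find("updateLog")
    if update_log is None:
        # No transaction log configured yet: create it inside the existing handler.
        update_log = ET.SubElement(handlers[0], "updateLog")
    update_log.set("class", "solr.CdcrUpdateLog")
    return ET.tostring(root, encoding="unicode")


patched = enable_cdcr_update_log(SAMPLE_CONFIG)
print(patched)
```

ElementTree discards XML comments when parsing, so for a real solrconfig.xml a manual edit is probably preferable; the script is mainly a way to show the intended end state.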

Re: dtSearch parser & Introduction

2016-05-13 Thread Charlie Hull
On 13/05/2016 10:41, Charlie Hull wrote: On 12/05/2016 23:50, Brandon Miller wrote: Hello, all! I'm a BloombergBNA employee and need to obtain/write a dtSearch parser for solr (and probably a bunch of other things a little later). I've looked at the available parsers and thought that the

Re: Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread Shawn Heisey
On 5/13/2016 6:48 AM, GW wrote: > Let's say I have 10,000 documents and there is a field named "category" and > lets say there are 200 categories but I do not know what they are. > > My question: Is there a query/filter that can pull a list of distinct > categories? Sounds like a job for faceting

Re: http request to MiniSolrCloudCluster

2016-05-13 Thread Shawn Heisey
On 5/13/2016 2:26 AM, Rohana Rajapakse wrote: > Hmmm. I now get the following errors when trying to access my Mini cluster > over http: > > 09:13:19,611 WARN ~ Exception causing close of session 0x0 due to > java.io.IOException: Len error 1347375956 > 09:13:19,611 INFO ~ Closed socket
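One plausible reading of the "Len error 1347375956" message, offered as a diagnostic sketch rather than a confirmed diagnosis: ZooKeeper expects each request to begin with a 4-byte big-endian length, so if an HTTP client connects to the ZooKeeper port, the first four bytes of the HTTP request line are interpreted as a length. Decoding the number back into bytes shows what those four bytes were.

```python
import struct


def decode_len_prefix(length):
    """Return the four bytes that a big-endian 32-bit length value was read from."""
    return struct.pack(">I", length)


print(decode_len_prefix(1347375956))  # b'POST'
```

If that reading applies, the HTTP request is being sent to the embedded ZooKeeper port instead of to one of the Jetty ports of the mini cluster.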

Is there an equivalent to an SQL "select distinct" in Solr

2016-05-13 Thread GW
Let's say I have 10,000 documents and there is a field named "category" and lets say there are 200 categories but I do not know what they are. My question: Is there a query/filter that can pull a list of distinct categories? Thanks in advance, GW

Re: Fwd: Solr Cloud 6.0.0 hangs when creating large amount of collections and node fails to recover after restart

2016-05-13 Thread Shawn Heisey
On 5/13/2016 2:19 AM, Horváth Péter Gergely wrote: > Thank you for your feedback, I much appreciate your inputs. I don't have strong requirements regarding structuring the data: do you think I could use a single, relatively large collection with some discriminator field instead of

RE: dtSearch parser & Introduction

2016-05-13 Thread Allison, Timothy B.
Depending on your needs, you might want to take a look at my SpanQueryParser (LUCENE-5205/SOLR-5410). It does not offer dtsearch syntax, but if the SurroundQueryParser was close enough, this parser may be of use. If you need modifications to it, let me know. I'm in the process of adding

Re: dtSearch parser & Introduction

2016-05-13 Thread Charlie Hull
On 12/05/2016 23:50, Brandon Miller wrote: Hello, all! I'm a BloombergBNA employee and need to obtain/write a dtSearch parser for solr (and probably a bunch of other things a little later). I've looked at the available parsers and thought that the surround parser may do the trick, but it

More Like This on not new documents

2016-05-13 Thread Vincenzo D'Amore
Hi all, does anybody know if there is a way to use the mlt component with a new document that does not exist in the collection? In other words, if I have a new document, should I always first add it to my collection and only then, using the mlt component, get the list of similar documents? Best

RE: http request to MiniSolrCloudCluster

2016-05-13 Thread Rohana Rajapakse
Hmmm. I now get the following errors when trying to access my Mini cluster over http: 09:13:19,611 WARN ~ Exception causing close of session 0x0 due to java.io.IOException: Len error 1347375956 09:13:19,611 INFO ~ Closed socket connection for client /127.0.0.1:23244 (no session established

Re: Fwd: Solr Cloud 6.0.0 hangs when creating large amount of collections and node fails to recover after restart

2016-05-13 Thread Horváth Péter Gergely
Hi Shawn, Thank you for your feedback, I much appreciate your inputs. I don't have strong requirements regarding structuring the data: do you think I could use a single, relatively large collection with some discriminator field instead of multiple thousands of separate collections? Thanks, Peter

mmseg4j cause error in Solr 6.0.0

2016-05-13 Thread scott.chu
Previously I made a configset with the mmseg4j tokenizer and created a core on Solr 5.4.1 under Win7. That was successful. Today I repeated the same steps under Solr 6.0.0. When I create the collection, it returns the error message: * ERROR: Failed to create collection

Re: URL parameters combined with text param

2016-05-13 Thread Bastien Latard - MDPI AG
Thanks both! I already tried "=true", but it doesn't tell me that much...Or at least, I don't see any problem... Below are the responses... 1. /select?q=hospital AND_query_:"{!q.op=AND v=$a}"=abstract,title=hospital Leapfrog=true 0 280 hospital AND_query_:"{!q.op=AND v=$a}"