Re: Making managed schema unmutable correctly?

2016-03-19 Thread Alexandre Rafalovitch
you can downconfig it: > > zkcli.sh -z localhost:9983 -cmd downconfig -confdir conf -confname > configname > git add conf > git commit > > - This was simplified because I simply didn't successfully index until I'd > defined all the needed field

Re: indexing Free-form text description

2016-03-19 Thread Alexandre Rafalovitch
Well, Solr ships with nearly 10 examples. So, if you go through them, you will know quite a lot. This article (mine) may help you to navigate them: http://blog.outerthoughts.com/2015/11/oh-solr-home-where-art-thou/ More specifically, as Erick said, your question is too generic. One step forward

How is _rest_managed.json used?

2016-03-19 Thread Alexandre Rafalovitch
Hello, What is _rest_managed.json actually for? I can see the mechanics in the Ref Guide and even found where it is managed by source code. But I cannot figure out how it actually fits into a workflow. It seems to be a registry of REST managed components (e.g. synonyms) for when they are NOT

Making managed schema unmutable correctly?

2016-03-16 Thread Alexandre Rafalovitch
So, I am looking at the Solr 5.5 examples with their all-in by-default managed schemas. And I am scratching my head on the workflow users are expected to follow. One example is straight from documentation: "With the above configuration, you can use the Schema API to modify the schema as much as

Re: Retrieving of Field Type

2016-03-08 Thread Alexandre Rafalovitch
The Admin UI does and it uses Javascript. So you know it is possible. Admin UI uses Luke for technical-level info: http://localhost:8983/solr/techproducts/admin/luke You can use Schema API for slightly better one: http://localhost:8983/solr/techproducts/schema You can also use Schema API to get
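A minimal sketch of the Schema API lookups mentioned above, in Python. The host, port and `techproducts` collection are taken from the URLs in the message; the helper names and the example field name are illustrative assumptions, and `fetch_json` needs a running Solr to return anything.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # same host/port as in the message


def field_url(collection, field):
    """URL of the Schema API entry that describes a single field."""
    return f"{SOLR_BASE}/{quote(collection)}/schema/fields/{quote(field)}"


def field_type_url(collection, type_name):
    """URL of the Schema API entry that describes a single field type."""
    return f"{SOLR_BASE}/{quote(collection)}/schema/fieldtypes/{quote(type_name)}"


def fetch_json(url):
    """Fetch a Schema API URL and decode the JSON body (requires a live Solr)."""
    with urlopen(url + "?wt=json") as response:
        return json.load(response)


if __name__ == "__main__":
    # "name" is a hypothetical field used only for illustration.
    print(field_url("techproducts", "name"))
```

The same calls can be made from JavaScript in the browser; Python is used here only to keep the sketch short.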

Re: Custom field using PatternCaptureGroupFilterFactory

2016-03-06 Thread Alexandre Rafalovitch
I don't see the brackets that mark the group you actually want to capture. As per: http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternCaptureGroupTokenFilter.html I am also not sure if you actually need "{0,1}" part. Regards, Alex. Newsletter and

Re: Solr (5.3.1) doesn't delete orphaned child documents

2016-03-03 Thread Alexandre Rafalovitch
I suspect not (starting from 'delete parent only'). I would check this against Solr 5.5 as it fixed a bunch of parent/child related issues. See, for example, SOLR-5211 Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 4 March

Re: Indexing Twitter - Hypothetical

2016-03-03 Thread Alexandre Rafalovitch
I think some of Twitter's need to index in a particular way comes from their real-time need. So, that's part of the decision for the original poster, on how responsive data needs to be. As to the rest, I think the company that shows twitter messages on TV does something similar with Solr.

Re: Indexing books, chapters and pages

2016-03-01 Thread Alexandre Rafalovitch
Here is an - untested - possible approach. I might be missing something by combining these things in too many layers, but. 1) Have chapters as parent documents and pages as children within that. Block index them together. 2) On pages, include page text (probably not stored) as one field. Also
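Step 1 of the approach above (chapters as parents, pages as children, indexed as one block) could be sketched as follows in Python. This is only an illustration under assumptions: the field names `type`, `title`, `page_number` and `text` are hypothetical, and the child id scheme is made up; `_childDocuments_` is the key Solr's JSON update format uses for nested documents.

```python
def chapter_block(chapter_id, title, pages):
    """Build one parent (chapter) document with its pages as child documents.

    `pages` is an iterable of (page_number, page_text) pairs. All field names
    other than "id" and "_childDocuments_" are hypothetical.
    """
    return {
        "id": chapter_id,
        "type": "chapter",
        "title": title,
        "_childDocuments_": [
            {
                "id": f"{chapter_id}-p{number}",  # assumed id scheme
                "type": "page",
                "page_number": number,
                "text": text,
            }
            for number, text in pages
        ],
    }


if __name__ == "__main__":
    block = chapter_block("book1-ch1", "Introduction", [(1, "First page"), (2, "Second page")])
    print(len(block["_childDocuments_"]))
```

Sending the whole structure in a single update request is what keeps the parent and its children together in the index.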

Re: Filter factory to reduce word from plural forms to singular forms correctly?

2016-02-29 Thread Alexandre Rafalovitch
ek > > > > On 3/1/2016 8:13 AM, Alexandre Rafalovitch wrote: >> >> On 29 February 2016 at 20:40, Derek Poh <d...@globalsources.com> wrote: >>> >>> Is there other filter factory that can reduce plural to singular >>> correctly? >> >&

Re: Filter factory to reduce word from plural forms to singular forms correctly?

2016-02-29 Thread Alexandre Rafalovitch
On 29 February 2016 at 20:40, Derek Poh wrote: > Is there other filter factory that can reduce plural to singular correctly? English is not an easy language and most of the heuristic filters have issues. You could try copyField and multiple approaches. Or, if this is a

Re: ExtendedDisMax configuration nowhere to be found

2016-02-29 Thread Alexandre Rafalovitch
On 29 February 2016 at 09:40, wrote: > I have no problem with automatic. It is "automagicall" stuff that I find a > bit hard to like. Ie things that are automatic, but doesn't explain how and > why they are automatic. But Disney Land and Disney World are

Re: Is anybody using Config API/configoverlay.json, useParams/params.json, and/or initParams?

2016-02-26 Thread Alexandre Rafalovitch
February 2016 at 08:42, Erik Hatcher <erik.hatc...@gmail.com> wrote: > data_driven /browse does. And example/files builds upon that a lot more. I > did it that way to personally explore the configset feature. > >Erik > >> On Feb 26, 2016, at 16:12, Alexandre Rafa

Is anybody using Config API/configoverlay.json, useParams/params.json, and/or initParams?

2016-02-26 Thread Alexandre Rafalovitch
Hi, I am creating an explanation of solrconfig.xml for the beginners and want to know whether anybody is actually using overrides and initParams in the wild. Sometimes, features exist for edge cases, but may not be worth spending much attention on in the beginner docs. Any feedback (on the list

(Solr 5.5) How do beginners modify dynamic schema now that it is default?

2016-02-24 Thread Alexandre Rafalovitch
Hi, In Solr 5.5, all the shipped examples now use dynamic schema. So, how are they expected to add new types? We have "add/delete fields" UI in the new Admin UI, but not "add/delete types". Do we expect them to use REST end points and curl? Or to not modify types at all? Or edit the "do not
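For reference, the "REST end points" route for adding a type looks roughly like the sketch below. It only builds the JSON body of a Schema API `add-field-type` command; the type name `text_simple` and the chosen tokenizer and filter are illustrative assumptions, not a recommendation.

```python
import json


def add_field_type_command(name, tokenizer_class="solr.StandardTokenizerFactory"):
    """Build the JSON body for a Schema API 'add-field-type' command."""
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": tokenizer_class},
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        }
    }


if __name__ == "__main__":
    # The body would be POSTed (Content-Type: application/json) to
    # http://localhost:8983/solr/<collection>/schema
    print(json.dumps(add_field_type_command("text_simple"), indent=2))
```

This is the payload a beginner would otherwise have to hand-write for curl, which is the usability concern raised in the message.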

Re: query knowledge graph

2016-02-12 Thread Alexandre Rafalovitch
The last Lucene/Solr Revolution had a number of presentations on relevancy. I would recommend watching them as a first step. They are on YouTube under the Lucidworks channel. There is also an early release book from Manning called Relevant Search which you will find very useful. Regards, Alex.

Re: How is Tika used with Solr

2016-02-09 Thread Alexandre Rafalovitch
Solr uses Tika directly. And not in the most efficient way. It is there mostly for convenience rather than performance. So, for performance, Solr recommendation is also to run Tika separately and only send Solr the processed documents. Regards, Alex. Newsletter and resources for Solr

Re: replicate indexing to second site

2016-02-09 Thread Alexandre Rafalovitch
This issue might be similar to what Apple presented at the closing keynote at Solr Revolution 2014. I believe they used a queue on each of the sites feeding into Solr. The presentation should be online. Regards, Alex. Newsletter and resources for Solr beginners and intermediates:

Re: Securing fields and documents with Shield | Elastic

2016-02-04 Thread Alexandre Rafalovitch
I have not used Shield yet, so this is based just on the document you sent. I would use different Request Handler endpoints for different users and put the restrictions there, in the invariants section. For field restrictions, I would use 'uf' parameter. As for example here (from my old book):

Re: Use SqlEntityProcessor in cached mode to repeat a query for a nested child element

2016-02-04 Thread Alexandre Rafalovitch
Where did cachePrimaryKey come from? The documentation has cacheKey: https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler Regards, Alex. Newsletter and resources for Solr beginners and intermediates:

Re: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-03 Thread Alexandre Rafalovitch
There is a framework to help write them: https://github.com/leonardofoderaro/alba Also, some recent plugins were released at the Revolution conference, maybe they have something useful: https://github.com/DiceTechJobs/SolrPlugins Regards, Alex. Newsletter and resources for Solr beginners

Re: Data Import Handler takes different time on different machines

2016-02-01 Thread Alexandre Rafalovitch
What are you importing from? Are the source and Solr machines colocated in the same fashion on dev and prod? Have you tried running this on a Linux dev machine? Perhaps your prod machine is loaded much more than dev. Regards, Alex. Newsletter and resources for Solr beginners and

Re: Configuring cores to persist in the event of Solr restart

2016-01-10 Thread Alexandre Rafalovitch
Did you by any chance start the first-time with bin/solr start -e And then bin/solr restart? In that case, the solr home was not set after restart and needs to be passed in manually. Or some other unexpected solr.home situation. Just poking in the dark here. Regards, Alex On 10 Jan 2016

Re: apply document filter to solr index

2016-01-04 Thread Alexandre Rafalovitch
Well, you have a crawling and extraction pipeline. You can probably inject a classification algorithm somewhere in there, possibly NLP trained on manual seed. Or just a list of typical words as a start. This is kind of pre-Solr stage though. Regards, Alex On 4 Jan 2016 7:37 pm,

Re: Using post.jr for indexing in Solr 5.4.0

2016-01-01 Thread Alexandre Rafalovitch
Wait? You are trying to clean up text just before indexing? Have you tried an UpdateRequestProcessor to do that? Regards, Alex On 1 Jan 2016 1:14 am, "Zheng Lin Edwin Yeo" wrote: > Yes, I tried using the latest post.jar, and I got the same error. > > I have shortlisted

Re: Memory Usage increases by a lot during and after optimization .

2015-12-31 Thread Alexandre Rafalovitch
Wouldn't collection swapping be a better strategy in that case? Load and optimise in a separate server, then swap it in. On 30 Dec 2015 10:08 am, "Walter Underwood" wrote: > The only time that a force merge might be useful is when you reindex all > content every night or

RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Alexandre Rafalovitch
not. > > -Original Message- > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] > Sent: Thursday, December 31, 2015 11:40 AM > To: solr-user <solr-user@lucene.apache.org> > Subject: Re: Testing Solr configuration, schema, and other fields > > Makes sense

Re: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Alexandre Rafalovitch
and so I've been seeking the > wisdom of the crowd. > > -Original Message- > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] > Sent: Thursday, December 31, 2015 12:42 AM > To: solr-user <solr-user@lucene.apache.org> > Subject: Re: Testing Solr configuration, schema

Re: Teiid with Solr - using any other engine except the SolrDefaultQueryEngine

2015-12-31 Thread Alexandre Rafalovitch
; against Solr indexing set using a join operator. > 5. Cache possible matches in SQL Server for a given record in order for a > human to disposition them. > > From what I read, Carrot is great for Solr clustering, but once you get into > RDBMS, you're out of luck. > > >

Re: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Alexandre Rafalovitch
I might be just confused here, but I am not sure what your bottleneck actually is. You seem to know your critical path already, so how can we help? Starting a new Solr core from a given configuration directory is easy. Catching hard errors from that is probably just grepping logs or a custom logger.

Re: Teiid with Solr - using any other engine except the SolrDefaultQueryEngine

2015-12-30 Thread Alexandre Rafalovitch
Are you trying to do federated search? What about carrot? Not the one that ships with Solr, the parent project. Regards, Alex On 31 Dec 2015 12:21 am, "Mark Horninger" wrote: > I have gotten Teiid and Solr wired up, but it seems like the only way to > query

Re: Changing Solr Schema with Data

2015-12-28 Thread Alexandre Rafalovitch
Does the schema change affect the data you want to keep? Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 29 December 2015 at 01:48, Salman Ansari wrote: > Hi, > > I am facing an issue where I need to change Solr schema

Re: Data import issue

2015-12-25 Thread Alexandre Rafalovitch
Do you have a full stack trace? A bit hard to help without that. On 24 Dec 2015 2:54 pm, "Midas A" wrote: > Hi , > > > Please provide the steps to resolve the issue. > > > com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: > Communications link failure

Re: Multiple Unique Keys

2015-12-23 Thread Alexandre Rafalovitch
No. Whichever one triggers the document override should be your primary key. The rest is application logic. You can make the field required, but that's about it. Regards, Alex On 23 Dec 2015 3:32 pm, "Salman Ansari" wrote: > Hi, > > I am wondering if I can specify

Re: warning while indexing

2015-12-16 Thread Alexandre Rafalovitch
Are you sending documents from one client or many? Looks like an exhaustion of some sort of pool related to Commit within, which I assume you are using. Regards, Alex On 16 Dec 2015 4:11 pm, "Midas A" wrote: > Getting following warning while indexing ..Anybody please

Re: Append fields to a document

2015-12-16 Thread Alexandre Rafalovitch
Dec 16, 2015 7:43 PM, "Alexandre Rafalovitch" <arafa...@gmail.com> wrote: > >> ExternalFileField might be useful in some situations. >> >> But also, is it possible that your Solr schema configuration is not >> best suited for your domain? Is it - for example -

Re: warning while indexing

2015-12-16 Thread Alexandre Rafalovitch
, > > *Only two DIH, indexing different data. * > > On Thu, Dec 17, 2015 at 10:46 AM, Alexandre Rafalovitch < > arafa...@gmail.com> > wrote: > > > How many? On the same node? > > > > I am not sure if running multiple DIH is a popular case. > > >

Re: warning while indexing

2015-12-16 Thread Alexandre Rafalovitch
il.com> wrote: > Alexandre , > > we are running multiple DIH to index data. > > On Thu, Dec 17, 2015 at 12:40 AM, Alexandre Rafalovitch < > arafa...@gmail.com> > wrote: > > > Are you sending documents from one client or many? > > > > Looks like an exhau

Re: Append fields to a document

2015-12-16 Thread Alexandre Rafalovitch
ExternalFileField might be useful in some situations. But also, is it possible that your Solr schema configuration is not best suited for your domain? Is it - for example - possible that the additional data should be in child records? Pure guesswork here, not enough information. But, as

Re: Issues when indexing PDF files

2015-12-16 Thread Alexandre Rafalovitch
They could be using custom fonts and non-Unicode characters. That's probably something to explore with PDF specific tools. On 17 Dec 2015 1:37 pm, "Zheng Lin Edwin Yeo" wrote: > I've checked all the files which has problem with the content in the Solr > index using the Tika

Re: Is DIH going to be removed from Solr future versions?

2015-12-16 Thread Alexandre Rafalovitch
Are you saying to do a local mini-collection and then mirror final result to the real one? What about deletions? Per-entry cleanup statements and so on? DIH does full updates, not just additions. Or did I miss the focus? Regards, Alex On 15 Dec 2015 11:46 pm, "Erik Hatcher"

Re: Providing own _version field in solr doc

2015-12-14 Thread Alexandre Rafalovitch
At first glance, this sounds like a perfect match to https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-DocumentCentricVersioningConstraints Just make sure your "timestamps" are truly atomic and not local clock-based. The drift could cause

Re: Getting a document version back after updating

2015-12-12 Thread Alexandre Rafalovitch
Does "versions=true" flag match what you are looking for? It is described towards the end of: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency Regards, Alex. Newsletter and resources for Solr beginners and
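A small Python sketch of an update request that carries the `versions=true` flag. The base URL, the `techproducts` collection and the document id are placeholders chosen for illustration; the request object is only built here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def versioned_update_request(base_url, collection, docs):
    """Build a POST to /update that asks Solr to echo the new document versions."""
    url = f"{base_url}/{collection}/update?" + urlencode({"versions": "true"})
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = versioned_update_request(
        "http://localhost:8983/solr", "techproducts", [{"id": "aaa"}]
    )
    print(request.full_url)
    # urllib.request.urlopen(request) would send it to a running Solr.
```

With the flag set, the update response is expected to list each added id together with the `_version_` assigned to it, as described in the Optimistic Concurrency section linked above.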

Re: Unstructured/Structured data for indexing

2015-12-09 Thread Alexandre Rafalovitch
Don't think about indexing so much, think about searching. Say you are searching a video? What does that mean? Do you want to match random sequence of binary values that represent inter-frame change? Probably not. When you answer what you want to actually search (title? length? subscripts?), you

Re: Issue with Querying Solr

2015-12-08 Thread Alexandre Rafalovitch
Solr by default only returns 10 rows. SolrNet by default returns many rows. I don't know why that would cause OOM, but that's definitely your difference unless you dealt with it: https://github.com/mausch/SolrNet/blob/master/Documentation/Querying.md#pagination Regards, Alex. Newsletter
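To make the row-count difference concrete, here is a minimal Python sketch of explicit pagination using Solr's `start` and `rows` parameters. The base URL and collection name are placeholders, and the helper names are made up for this example.

```python
from urllib.parse import urlencode


def page_params(query, page, page_size=10):
    """Return select parameters for a zero-based page of results."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {"q": query, "start": page * page_size, "rows": page_size}


def select_url(base_url, collection, query, page, page_size=10):
    """Build a /select URL that requests exactly one page of results."""
    return f"{base_url}/{collection}/select?" + urlencode(
        page_params(query, page, page_size)
    )


if __name__ == "__main__":
    print(select_url("http://localhost:8983/solr", "techproducts", "solr", 2))
```

Setting `rows` explicitly on every request keeps a client library's default from pulling back far more documents than intended.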

Re: Solr Auto-Complete

2015-12-06 Thread Alexandre Rafalovitch
For suffix matches, you copy the text to another field and, in a different field type, add string reversal for both index and query portions. So you are doing the prefix matching algorithm but on reversed strings. I can dig up an example if it is not clear. On 6 Dec 2015 8:06 am, "Salman Ansari"
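The idea can be illustrated outside Solr with a few lines of Python: reverse every term at "index time", reverse the user input at "query time", and a suffix match becomes a prefix match. This is a toy model of the technique, not Solr code; in Solr the reversal would normally be done by an analysis filter such as ReverseStringFilterFactory in the copied field's type.

```python
def reverse_token(token):
    """Reverse a single token, e.g. 'solr' -> 'rlos'."""
    return token[::-1]


def build_reversed_index(terms):
    """Map each reversed, lowercased term back to its original form."""
    return {reverse_token(term.lower()): term for term in terms}


def suffix_matches(index, suffix):
    """Find terms ending with `suffix` by prefix-matching the reversed forms."""
    needle = reverse_token(suffix.lower())
    return sorted(
        original for reversed_term, original in index.items()
        if reversed_term.startswith(needle)
    )


if __name__ == "__main__":
    index = build_reversed_index(["Indexing", "searching", "Solr"])
    print(suffix_matches(index, "ing"))
```

Because both sides are reversed the same way, any prefix-oriented machinery (edge n-grams, prefix queries) can be reused unchanged on the reversed field.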

Re: import file to solr

2015-12-06 Thread Alexandre Rafalovitch
There should be no limit. Try 100K, 50K sizes. Maybe you have an error somewhere. Also check Solr logs, not just DIH messages. On 6 Dec 2015 3:56 pm, "Kate Kas" wrote: > Hi, > > I am trying to import xml files using data import request handler. > > When i import xml file of

Re: Solr Auto-Complete

2015-12-04 Thread Alexandre Rafalovitch
You can see an example of similar use at: http://www.solr-start.com/javadoc/solr-lucene/index.html (search box). The corresponding schema is here: https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24 . It does have some extra special-case stuff

Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-04 Thread Alexandre Rafalovitch
Not that hard to set up a cron and diff job and email when the diff is not empty. A sort-of "is that what you expected" report. But, for myself, I also prefer schema and then managed. I do not like schemaless mode, even for development. Instead, I prefer to do "dynamicField *". P.s. I am thinking

Re: Stop adding content in Solr through /update URL

2015-12-04 Thread Alexandre Rafalovitch
On 4 December 2015 at 19:23, Chris Hostetter wrote: > NotFoundRequestHandler Totally not in either Wiki or Reference Guide. :-( Must be part of the secret committer's lore. Thank you for sharing it with us, pure plebs :-) Newsletter and resources for Solr

Re: Stop adding content in Solr through /update URL

2015-12-03 Thread Alexandre Rafalovitch
You could add 'enable' flag in the solrconfig.xml and then enable/disable it differently on different servers: https://wiki.apache.org/solr/SolrConfigXml#Enable.2Fdisable_components Example:

Re: Difference in query behavior.

2015-11-30 Thread Alexandre Rafalovitch
On 30 November 2015 at 05:45, Modassar Ather wrote: > > I have a query title:(solr lucene api). The mm is set to 100% using q.op as > +(title:solr **title:faceting** title:api)~3 Does it though? solr lucene api => solr faceting api! Is it possible you are staring at the

Re: Facet count mismatch between solr simple facet and Json facet API.

2015-11-27 Thread Alexandre Rafalovitch
This is not quite enough information without seeing real data, I suspect. What do you get in the Admin Schema screen when you load the term counts? As a completely random poke-in-the-dark, do you by any chance get the same value more than once for the same record's multiValued field? I could see

Re: Error on DIH log

2015-11-26 Thread Alexandre Rafalovitch
Where does the BigInteger part come from? Looks like a serialisation mismatch. DIH is seeing just a string. If you can't fix this at the source, you may need a custom transformer or URP to post-process this as a special case. Regards, Alex On 27 Nov 2015 12:42 am, "Midas A"

Re: OT: is Heliosearch discontinued?

2015-11-26 Thread Alexandre Rafalovitch
It is discontinued and most of the features had been rolled into Solr. So, if you did not pay attention to changes in various Solr 5.x releases, some innocent sounding features are actually giant drops from the Heliosearch period. :-) Newsletter and resources for Solr beginners and

Re: Solr UI open source

2015-11-26 Thread Alexandre Rafalovitch
You should not be exposing Solr directly to the user, that's like giving them a database admin account. Unless you REALLY know what you are doing. So, the Javascript UIs are mostly for internal purposes and for people to play with Solr. Therefore, usually, there is a server-side component that

Re: Solr UI open source

2015-11-26 Thread Alexandre Rafalovitch
ttps://github.com/o19s/solr_nginx > > We also have a framework Spyglass if you are interested in Ember > https://github.com/o19s/spyglass > > -Doug > > > On Thu, Nov 26, 2015 at 9:30 AM, Alexandre Rafalovitch <arafa...@gmail.com> > wrote: > >> You should not be ex

Re: Solr UI open source

2015-11-26 Thread Alexandre Rafalovitch
ation block for each handler you'd like to whitelist >> location /solr/collection1/select { >> >> >> On Thu, Nov 26, 2015 at 11:14 AM, Alexandre Rafalovitch < >> arafa...@gmail.com> wrote: >> >>> I am happy to be corrected, but that repository says &

Re: Solr Date Format

2015-11-25 Thread Alexandre Rafalovitch
Solr internally only supports that format. However, it is possible to use an UpdateRequestProcessor to pre-process other formats. That's what is happening when you are using the "schemaless" mode:

Re: Solr 5.2 child documents

2015-11-23 Thread Alexandre Rafalovitch
Do you get a parent doc? If not, maybe you forgot to commit the whole lot. On 23 Nov 2015 6:42 am, "Novin" wrote: > Hi, > > When I query q=*:* I can't get child documents back, Below is > configuration I am using for child Document to index in solr. > > Am I missing something?

Re: RealTimeGetHandler doesn't retrieve documents

2015-11-20 Thread Alexandre Rafalovitch
Actually I think / is a special character as of recent versions of Solr. Can't remember why though. This could be the kind of thing that would trigger an edge case bug. What happens if you request id3,id2,id1? In the opposite order? Are the same documents missing? Or same by request position? If

Re: Search with very large boolean filter

2015-11-20 Thread Alexandre Rafalovitch
I don't know what to do about 30K ids, but you definitely can improve on the ORing the ids with https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser Regards, Alex. Newsletter and resources for Solr beginners and intermediates:
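A minimal Python sketch of building such a filter with the terms query parser instead of a long chain of ORs. The field name `id` and the helper name are illustrative; the comma is the parser's default separator, so values containing it are rejected here rather than escaped.

```python
def terms_filter(field, values, separator=","):
    """Build an fq value of the form {!terms f=<field>}v1,v2,v3."""
    rendered = [str(value) for value in values]
    for value in rendered:
        if separator in value:
            raise ValueError(f"value {value!r} contains the separator {separator!r}")
    return "{!terms f=%s}%s" % (field, separator.join(rendered))


if __name__ == "__main__":
    # Intended to be sent as the fq parameter of a select request.
    print(terms_filter("id", [101, 102, 103]))
```

For very long id lists the request would likely need to be sent as a POST so the parameter does not hit URL length limits.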

Re: adding document with nested document require to set id

2015-11-18 Thread Alexandre Rafalovitch
If you have id listed as a required field (which I believe you need to anyway), what do you actually get when you add a document without nesting? What does the document echo back? Because if you are getting a document back without id field when it is declared required in the schema, that would be

Re: Multiple unique key in Schema

2015-11-17 Thread Alexandre Rafalovitch
When you index into Solr, you are overlapping the definitions into one schema. Therefore, you will need a unified uniqueKey. There are a couple of approaches: 1) Maybe you don't actually store the data as three types of entities. Think about what you will want to find and structure the data to

Re: search for documents where all words of field present in the query

2015-11-17 Thread Alexandre Rafalovitch
Are you sure your original description is not a reverse of your use-case? Now, it seems like you just want mm=100 which means "samsung" will match all entries, but "samsung 32G" will only match 3 of them. https://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29

Re: search for documents where all words of field present in the query

2015-11-17 Thread Alexandre Rafalovitch
This sounds more like a use case for https://github.com/flaxsearch/luwak Or a variation of Ted Sullivan's work: http://lucidworks.com/blog/author/tedsullivan/ I do not think this can be done in Solr directly. If your matched fields were always 2-tokens, you could do complex mm param. If the

Re: Query gives response multiple times

2015-11-17 Thread Alexandre Rafalovitch
h the Solr Admin? > It tells me the numFound but does not give me the all the fields requested > for all the results. Is this usual behaviour? > > Cheers, > > Shane > > > On Mon, Nov 16, 2015 at 7:11 PM, Alexandre Rafalovitch <arafa...@gmail.com> > wrote: >

Re: EdgeNGramFilterFactory not working? Solr 5.3.1

2015-11-17 Thread Alexandre Rafalovitch
Here would be my debugging sequence: 1. Are you actually searching against: dispNamePrefix (and not against the default text field which has its own analyzer stack)? 2. Do you see the field definition in the Schema Browser screen? 3. If you, on that screen, click "Load Term Info" do you see the

Re: Query gives response multiple times

2015-11-17 Thread Alexandre Rafalovitch
d results is dest="catch_all_fields_mt"/>. > > How can I find if I am using a custom request handler? I had assumed I was > using the default as in the Request Handler box it has /select. > > Thanks, > > Shane > > > > > > > > On Tue, Nov

Re: Query gives response multiple times

2015-11-16 Thread Alexandre Rafalovitch
I would check for copyField into that target field or something in UpdateRequestProcessors (in solrconfig.xml) that copies into that field. Barring those two, the field should return what you put into it. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: Solr logging in local time

2015-11-16 Thread Alexandre Rafalovitch
The logging format is defined by log4j properties. Looking at Solr 5.3.1, we are using EnhancedPatternLayout, which apparently supports just putting the timezone in braces after the date format: http://stackoverflow.com/questions/9116425/apache-log4j-logging-with-specific-timezone I'd try that as

Re: Query gives response multiple times

2015-11-16 Thread Alexandre Rafalovitch
the quick responses. > > @Andrea Gazzarini > The field can have one or eight doubles in it. However, the response of > the query has 8 doubles and 64 doubles respectively. The values are > repeated 8 times. > > @Alexandre Rafalovitch > Thanks for the link. I am just getting started

Re: Query gives response multiple times

2015-11-16 Thread Alexandre Rafalovitch
On 16 November 2015 at 17:40, Shane McCarthy wrote: > I am using an instance of Islandora. Ah. This complicates the situation as there is an unknown - to most of us - layer in between. So, it is not clear whether this multiplication is happening in Solr or in Islandora. Your

Re: Document boost in Solr

2015-11-14 Thread Alexandre Rafalovitch
Did you try using debug.explain.other and seeing how it is ranked? On 14 Nov 2015 6:28 am, "Aditya" wrote: > Hi > > My website www.findbestopensource.com provides search over millions of > open > source projects. > > I recently found this issue in my website. Each

Re: HELP!!!!

2015-11-13 Thread Alexandre Rafalovitch
Welcome to the Solr world. Yes, usually you use a client application. If you are working in Java, you use SolrJ or you can look into Spring Data. For other languages, there are libraries too. You can see a reasonable list at: https://wiki.apache.org/solr/IntegratingSolr . Be aware that not all

Re: Arabic analyser

2015-11-10 Thread Alexandre Rafalovitch
If this is for a significant project and you are ready to pay for it, BasisTech has commercial solutions in this area I believe. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 10 November 2015 at 08:46, Mahmoud Almokadem

Re: Simple web interface for queries

2015-11-10 Thread Alexandre Rafalovitch
Solr is not actually designed to be directly exposed to the end-users. It is possible to delete the whole collection, etc. It is supposed to be treated as a database behind a firewall, etc. Just thought I'd mention that in case you did not know it. Regards, Alex. Solr Analyzers,

Re: Costs/benefits of DocValues

2015-11-09 Thread Alexandre Rafalovitch
om/ On 9 November 2015 at 11:57, Yonik Seeley <ysee...@gmail.com> wrote: > On Mon, Nov 9, 2015 at 11:19 AM, Alexandre Rafalovitch > <arafa...@gmail.com> wrote: >> I thought docValues were per segment, so the price of un-inversion was >> effectively paid on each commit for

Re: Costs/benefits of DocValues

2015-11-09 Thread Alexandre Rafalovitch
I thought docValues were per segment, so the price of un-inversion was effectively paid on each commit for all the segments, as opposed to just the updated one. I admit I also find the story around docValues to be very confusing at the moment. Especially on the interplay with "indexed=false". It

Re: Exception in grouping with docValues enable field.

2015-11-08 Thread Alexandre Rafalovitch
SOLR-4647 ? But your name is already in that JIRA, so perhaps something else similar. You also did not mention the version of Solr. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 8 November 2015 at 22:59, Modassar Ather

Re: data import extremely slow

2015-11-07 Thread Alexandre Rafalovitch
Have you thought of just using Solr. Might be faster than troubleshooting DIH for complex scenarios. On 7 Nov 2015 3:39 pm, "Yangrui Guo" wrote: > I found multiple strange things besides the slowness. I performed count(*) > in MySQL but only one-fifth of the records were

Re: Data import handler not indexing all data

2015-11-07 Thread Alexandre Rafalovitch
Just to get the paranoid option out of the way, is 'id' actually the column that has unique ids in your database? If you do "select distinct id from imdb.director" - how many items do you get? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: data import extremely slow

2015-11-07 Thread Alexandre Rafalovitch
<erickerick...@gmail.com> wrote: > Alexandre, did you mean SolrJ? > > Here's a way to get started > https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ > > Best, > Erick > > On Sat, Nov 7, 2015 at 2:22 PM, Alexandre Rafalovitch > <arafa...@gmail.com>

Re: Data import handler not indexing all data

2015-11-07 Thread Alexandre Rafalovitch
se IMDB doesn't have a table > for cast & crew. It puts movie and person and their roles into one huge > table 'cast_info'. Hence there are multiple rows for a director, one row > per his movie. > > On Saturday, November 7, 2015, Alexandre Rafalovitch <arafa...@gmail.com> >

Re: Is it impossible to update an index that is undergoing an optimize?

2015-11-06 Thread Alexandre Rafalovitch
Elasticsearch removed deleteByQuery from the core all together. Definitely an outlier :-) Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 6 November 2015 at 20:18, Yonik Seeley wrote: > On Wed, Nov 4, 2015 at 3:36 PM, Shawn

Re: solr-8983-console.log is huge

2015-11-06 Thread Alexandre Rafalovitch
What about the Garbage Collection output? I think we have the same issue there. Frankly, I don't know how many people know what to do with that in the first place. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 6 November 2015 at 11:11,

Re: Solr Features

2015-11-05 Thread Alexandre Rafalovitch
Well, I've started to answer, but it hit a nerve and turned into a guide. Which is now a blog post with 6 steps (not mentioning step 0 - Admitting you have a problem). I hope this is helpful: http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/ Regards, Alex. Solr

Re: Solr Features

2015-11-05 Thread Alexandre Rafalovitch
On 5 November 2015 at 11:22, Shawn Heisey wrote: > As far as I know, there are no currently available books covering > version 5, but I believe there is at least one on the horizon. Rafal's book is "compatible" with Solr 5: http://solr.pl/solr-cookbook-third-edition/ . But

Re: how to efficiently get sum of an int field

2015-11-05 Thread Alexandre Rafalovitch
Ah, Unix. Isn't it wonderful (it is, but): http://unix.stackexchange.com/questions/3051/how-to-echo-a-bang Try single quotes and backslash before the bang. Or disable history characters. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: tikaparser docx file fails with exception

2015-11-05 Thread Alexandre Rafalovitch
It is quite clear actually that the problem is this: Caused by: java.io.CharConversionException: Characters larger than 4 bytes are not supported: byte 0xb7 implies a length of more than 4 bytes at org.apache.xmlbeans.impl.piccolo.xml.UTF8XMLDecoder.decode(UTF8XMLDecoder.java:162) at

Re: Solr Features

2015-11-05 Thread Alexandre Rafalovitch
n. I will appreciate any comments/feedback regarding > this. > > Regards, > Salman > > On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch <arafa...@gmail.com> > wrote: > >> Well, I've started to answer, but it hit a nerve and turned into a >> g

Re: Managing ZIP files inside ZIP files

2015-11-04 Thread Alexandre Rafalovitch
How are you ingesting them now? I'd probably use Java 8 with SolrJ and use the new Virtual File System approach to read right out of the zip and gzip. http://docs.oracle.com/javase/8/docs/api/java/nio/file/FileSystems.html#newFileSystem-java.nio.file.Path-java.lang.ClassLoader- Tar is a bit harder,
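
A minimal sketch of the zip file system approach mentioned above, assuming the goal is only to pull text out of ZIP files that may themselves contain ZIP files. The class and method names (`NestedZipReader`, `readAll`, `buildSample`) are hypothetical, and the SolrJ indexing step is left out entirely. Copying each nested archive to a temporary file before opening it is an assumption of this sketch, made because opening a zip file system directly on a path inside another zip file system may not be supported on every JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class NestedZipReader {

    /** Returns "entry path -> text" for every regular file, descending into nested .zip entries. */
    static Map<String, String> readAll(Path zipFile) throws IOException {
        Map<String, String> out = new TreeMap<>();
        collect(zipFile, "", out);
        return out;
    }

    private static void collect(Path zipFile, String prefix, Map<String, String> out)
            throws IOException {
        // The cast keeps the call unambiguous on JDKs that also offer newFileSystem(Path, Map).
        try (FileSystem zipFs = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
            List<Path> entries = new ArrayList<>();
            for (Path root : zipFs.getRootDirectories()) {
                try (Stream<Path> walk = Files.walk(root)) {
                    walk.filter(Files::isRegularFile).forEach(entries::add);
                }
            }
            for (Path entry : entries) {
                String name = prefix + entry;
                if (name.toLowerCase().endsWith(".zip")) {
                    // Assumption of this sketch: extract the nested archive, then recurse into it.
                    Path tmp = Files.createTempFile("nested", ".zip");
                    try {
                        Files.copy(entry, tmp, StandardCopyOption.REPLACE_EXISTING);
                        collect(tmp, name + "!", out);
                    } finally {
                        Files.deleteIfExists(tmp);
                    }
                } else {
                    out.put(name, new String(Files.readAllBytes(entry), StandardCharsets.UTF_8));
                }
            }
        }
    }

    /** Builds an in-memory zip archive from "entry name -> bytes" (demo helper). */
    static byte[] zip(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue());
                zos.closeEntry();
            }
        }
        return buf.toByteArray();
    }

    /** Writes a sample archive to a temp file: readme.txt plus inner.zip containing doc.txt. */
    static Path buildSample() throws IOException {
        Map<String, byte[]> inner = new LinkedHashMap<>();
        inner.put("doc.txt", "inner".getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> outer = new LinkedHashMap<>();
        outer.put("readme.txt", "outer".getBytes(StandardCharsets.UTF_8));
        outer.put("inner.zip", zip(inner));
        Path sample = Files.createTempFile("sample", ".zip");
        Files.write(sample, zip(outer));
        return sample;
    }
}
```

Calling `NestedZipReader.readAll(NestedZipReader.buildSample())` yields one map entry per file, with nested entries keyed in the form `/inner.zip!/doc.txt`. In a real pipeline each value would be turned into a Solr document rather than collected in a map, and binary content would need something other than a UTF-8 decode.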

Re: language plugin

2015-11-03 Thread Alexandre Rafalovitch
I wonder what would happen if the DistributedUpdateProcessorFactory is manually added into the chain and the LangDetect definition is moved AFTER it. As per https://wiki.apache.org/solr/UpdateRequestProcessor#Distributed_Updates This would mean that the detection code would be executed on each

Re: Many files /dataImport in same project

2015-11-03 Thread Alexandre Rafalovitch
On 3 November 2015 at 10:38, Gora Mohanty wrote: >> I missed previous discussions, but the DIH config file is given in a >> query parameter. So, if there is a bunch of them on a file system, one >> could probably do >> find . -name "*.dihconf" | xargs curl . > > Sorry, I

Re: SSL on Solr with CA signed certificate

2015-11-02 Thread Alexandre Rafalovitch
I think (not tested) that it should be safe to select Tomcat from the dropdown, as both use keytool (bundled with JDK) to generate the CSR. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 2 November 2015 at 09:53, davidphilip

Re: Many files /dataImport in same project

2015-11-02 Thread Alexandre Rafalovitch
On 2 November 2015 at 11:30, Gora Mohanty wrote: > As per my last > follow-up, there is currently no way to have DIH automatically pick up > different data-config files without manually editing the DIH > configuration each time. I missed previous discussions, but the DIH

[ANN]: Blog article: every Solr home and example in Solr 5.3

2015-11-02 Thread Alexandre Rafalovitch
If you've recently downloaded Solr 5.x and are trying to figure out which example creates a home where, and why the example creation command uses the configset directory but not the configset URL parameter, you may find this useful: http://blog.outerthoughts.com/2015/11/oh-solr-home-where-art-thou/ Regards,

Re: Kate Winslet vs Winslet Kate

2015-11-02 Thread Alexandre Rafalovitch
I just had a thought that perhaps Complex Phrase parser could be useful here: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser You still need to mark that full name to search against specific field, so it may or may not in a more general stream

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Alexandre Rafalovitch
Which is what I believe Ted Sullivan is working on and presented at the latest Lucene/Solr Revolution. His presentation does not seem to be up, but he was writing about it on: http://lucidworks.com/blog/author/tedsullivan/ Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and

Re: language plugin

2015-10-29 Thread Alexandre Rafalovitch
Could you post your full chain definition? It's an interesting problem, but hard to answer without seeing the exact current configuration. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 29 October 2015 at 03:25, Chaushu, Shani
