Re: Solr Code structure Documentation

2015-10-26 Thread Alexandre Rafalovitch
Well, the source code is all there, if you need to know _exactly_. Run it under Debug. Run it under paid IntelliJ with Chronos if you will be doing it a lot. Same with Admin to Solr, just open a developer console in the browser and you have every web call documented just when you want them.

Re: using a custom update for all documents

2015-10-26 Thread Alexandre Rafalovitch
Roxana, You've been asked a couple of times by several people to explain your business needs (level higher than Solr itself). As it is, you are slowly getting deeper and deeper into Solr's internals, where there might be an easier question if we know what you are trying to achieve. It is your

Re: Analytics using Solr

2015-10-25 Thread Alexandre Rafalovitch
It is a very general question. So, the general answer is yes. To get a sample of what's possible, I recommend you check out Solr Revolution presentations from this year and presentations+video from last year. There were at least a couple that you may find interesting.

Re: Should I install 4.x or 5.x? Book recommendations?

2015-10-23 Thread Alexandre Rafalovitch
Definitely 5.x. Lots of new goodies. It is true that some of the startup scripts are different and the example schemas could be slightly confusing if following a book, but I think it is well worth starting on a good foot. Just remember, no "collection1" anymore, all cores/collections are explicit.

Re: Select sibling data via XPathEntityProcessor

2015-10-23 Thread Alexandre Rafalovitch
t me at an example that could get me > started that would be a great help. > > Thanks > > Alan. > > -Original Message- > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] > Sent: 22 October 2015 15:43 > To: solr-user > Subject: Re: Select sibling data v

Re: Get this committed

2015-10-23 Thread Alexandre Rafalovitch
Begging at the Dev list is probably more efficient, though I am sure most of them are hanging around here as well. Regards, Alex. P.s. Sorry, I wish I could help. Not a committer. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 23 October

Re: Select sibling data via XPathEntityProcessor

2015-10-23 Thread Alexandre Rafalovitch
multiple roots (start tag in epilog?). > > Looks like I need to dig a bit deeper > > Regards, > Alan. > > > -Original Message- > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] > Sent: 23 October 2015 12:00 > To: solr-user > Subject: Re: Select siblin

Re: Should I install 4.x or 5.x? Book recommendations?

2015-10-23 Thread Alexandre Rafalovitch
com> wrote: > Hi Alex, > > What's the title of your book? An amazon link would be useful too. > > Thanks! > Rob > > On Fri, Oct 23, 2015 at 2:50 PM, Alexandre Rafalovitch <arafa...@gmail.com> > wrote: > >> Definitely 5.x. Lots of new goodies. It is true t

Re: getting cached terms inside UpdateRequestProcessor...

2015-10-22 Thread Alexandre Rafalovitch
You need to tell the second call which documents to update. Are you doing that? There may also be a wrinkle in the URP order, but let's get the first step working first. On 22 Oct 2015 12:59 pm, "Roxana Danger" wrote: > yes, it's working now... but I can not use

Re: Select sibling data via XPathEntityProcessor

2015-10-22 Thread Alexandre Rafalovitch
I don't think DIH supports siblings. Have you thought of using an XSLT processor before sending the XML to Solr? Or using it instead of DIH during the update (not a well-known part of Solr):

Re: getting cached terms inside UpdateRequestProcessor...

2015-10-22 Thread Alexandre Rafalovitch
You are doing things out of order. It's DIH, URP, then indexer. Any attempt to subvert that order for the record being indexed will end in problems. Have you considered doing a dual path? Index, then update. Of course, your fields all need to be stored for that. Also, perhaps you need to rethink

Re: Index Multiple entity in one collection core

2015-10-22 Thread Alexandre Rafalovitch
When you run a full-import, Solr will try to delete old documents before importing the new ones. If there are several top-level entities, they step on each other's feet. Use preImportDeleteQuery to avoid that (as per

Re: getting cached terms inside UpdateRequestProcessor...

2015-10-22 Thread Alexandre Rafalovitch
olr/reed_jobs/update/details?commit=true > but it returns immediately with status 0 but does not execute the update... > How should the update be called for reindex/update all the imported docs. > with my chain? > > > Best regards, > Roxana > > > On 22 October 2015 at 14:14, A

Re: char filter factory and tokeniser issue in admin Analysis form

2015-10-20 Thread Alexandre Rafalovitch
On 20 October 2015 at 10:26, Lee Carroll wrote: > B*ll*cks, before posting I spent an hour searching for issues, honest. > Soon as I post within seconds I find > > https://issues.apache.org/jira/browse/SOLR-5800 We are always glad to be of help. Including by

Re: Configuration

2015-10-19 Thread Alexandre Rafalovitch
Sounds like a mission impossible given the number of inner joins. However, what are you _actually_ trying to do? Are you trying to reindex the data? Do you actually have the data to reindex? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: Tokenize ShingleFilterFactory results and apply filters to tokens

2015-10-19 Thread Alexandre Rafalovitch
This sounds like an attempt to create an auto-complete using n-grams in text. In which case, Ted Sullivan's writing might be of relevance: http://lucidworks.com/blog/author/tedsullivan/ Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: Lucene Revolution ?

2015-10-18 Thread Alexandre Rafalovitch
It went very well. Lots of interesting talks. I believe you (Mr. Bell) were even mentioned by Ted Sullivan for voting for his Jira proposal on the AutophrasingFilter. The talk was extremely interesting and I intend to follow up on it. :-) The slides are starting to come up already. Mine are at:

Re: Forking Solr

2015-10-16 Thread Alexandre Rafalovitch
I suspect these questions should go to the Lucene Dev list instead. This one is more for those who build on top of standard Solr. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 16 October 2015 at 12:07, Ryan Josal

Re: Grouping facets: Possible to get facet results for each Group?

2015-10-12 Thread Alexandre Rafalovitch
Could you use the new nested facets syntax? http://yonik.com/solr-subfacets/ Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 11 October 2015 at 09:51, Peter Sturge wrote: > Been trying to coerce Group

Re: How to show some (paid) documents ahead of others (non-paid) - fantasy scenario

2015-10-11 Thread Alexandre Rafalovitch
What about Streaming Expressions? Could they be used here? Disclaimer: I have not used them myself yet. https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On

Re: Instant Page Previews

2015-10-07 Thread Alexandre Rafalovitch
I don't think that particular functionality has anything directly to do with Solr. You will have a server component that will index the web page (I am guessing) into Solr. That same component can generate the preview image. Your frontend UI will get the URL/id from Solr and display the related image. Solr

Re: Solr vs Lucene

2015-10-01 Thread Alexandre Rafalovitch
Hi Mark, Have you gone through a Solr tutorial yet? If/when you do, you will see you don't need to code any of this. It is configured as part of the web-facing total offering, which is tweaked by XML configuration files (or REST API calls). And most of the standard pipelines are already

Re: Solr vs Lucene

2015-10-01 Thread Alexandre Rafalovitch
> > At this point, I can't even figure out how to narrow down my confusion so > that I can post concise questions to the group. But I'll get there > eventually, starting with removing the wordbreak checker for the time-being. > Your response was encouraging, at least. > > Mark

Re: entity processing order during updates

2015-09-30 Thread Alexandre Rafalovitch
Have you tried just having two separate endpoints each with its own definition of DIH and URP? Then, you just hit those end-points one at a time in whatever order you need. Seems easier than a custom switching logic. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a

Re: entity processing order during updates

2015-09-30 Thread Alexandre Rafalovitch
file? > Thank you very much, > Roxana > > > > On 30 September 2015 at 14:48, Alexandre Rafalovitch <arafa...@gmail.com> > wrote: > >> Have you tried just having two separate endpoints each with its own >> definition of DIH and URP? Then, you just hit those end-

Re: New Project setup too clunky

2015-09-27 Thread Alexandre Rafalovitch
Mark, Thank you for your valuable feedback. The newbie's views are always appreciated. Admin Admin UI command is designed for creating a collection based on the configuration you already have. Obviously, it makes that point somewhat less than obvious. To create a new collection with

Re: Using a plugin to filter in schema.xml

2015-09-25 Thread Alexandre Rafalovitch
I think (I lost the library link) you would need to build a bridge by doing a custom Analyzer or Tokenizer and then using the library under the covers. Would be a nice contribution to open-source if you managed to achieve that. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and

Re: Different ports for search and upload request

2015-09-25 Thread Alexandre Rafalovitch
How about you do indexing on a completely different node and then swap the index into production using Solr aggregate aliases? https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection The problem here is that deleting existing content is
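A rough sketch of the alias swap, assuming a Solr node at localhost:8983, an alias named `products`, and a freshly built collection named `products_v2` (all hypothetical names). `CREATEALIAS` is the Collections API action the link above describes; issuing it again with the same alias name repoints the alias.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API URL that creates or repoints an alias."""
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        # An alias may cover several collections, given as a comma-separated list.
        "collections": ",".join(collections),
    })
    return f"{base_url}/admin/collections?{params}"


# Hypothetical host, alias and collection names, for illustration only.
alias_url = create_alias_url("http://localhost:8983/solr", "products", ["products_v2"])
print(alias_url)
```

Searches keep using the alias name, so repointing it is what moves production traffic to the new index.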

Re: Different ports for search and upload request

2015-09-24 Thread Alexandre Rafalovitch
But they would still compete for the servlet engine's threads. Putting them on different ports will not change anything. Now, if you wanted to put them on different network interfaces, that could be something. But I do not think it is possible, as the select and update are both just configuration

Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

2015-09-23 Thread Alexandre Rafalovitch
You may find the following articles interesting: http://discovery-grindstone.blogspot.ca/2014/01/searching-in-solr-analyzing-results-and.html ( a whole epic journey) https://dzone.com/articles/indexing-chinese-solr Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a

Re: Solr DataImportHandler is not indexing all data defined

2015-09-17 Thread Alexandre Rafalovitch
Sanity check. Did you restart Solr or reload the core after you updated your schema definition? In the Admin UI, in the Schema Browser, you should be able to see all the fields you defined. Are those fields there? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a

Re: Atomic updates on multiple documents

2015-09-17 Thread Alexandre Rafalovitch
You could probably do this as a custom UpdateRequestProcessor that would take your submitted document, run a query and expand it to a bunch of documents. So, do the ID mapping internally. But you would need the ID/uniqueKeys. Definitely nothing out of the box that I can think of.

Re: Detect term occurrences

2015-09-11 Thread Alexandre Rafalovitch
ces of those terms for any leaflet > document. > Could you give me a clue about how is the best way to perform it? > Perhaps, the best way is (as Walter suggests) to do all the queries every > time, as needed. > Regards, > > Francisco > > El jue., 10 de sept. de 2015 a la(s)

Re: Detect term occurrences

2015-09-10 Thread Alexandre Rafalovitch
Can you tell us a bit more about the business case? Not the current technical one. Because it is entirely possible Solr can solve the higher level problem out of the box without you doing manual term comparisons. In which case, your problem scope is not quite right. Regards, Alex. Solr

Re: Exception using Json Facet API with Multivalue Int field +docValues=true

2015-09-08 Thread Alexandre Rafalovitch
A sanity check question. Was this test done with a completely new index after you enabled docvalues? Not just "delete all" but actually deleted index directory and rebuilt from scratch? If it still happens after such a thorough cleanup, it might be a bug. Regards, Alex. Solr Analyzers,

Re: Exception using Json Facet API with Multivalue Int field +docValues=true

2015-09-08 Thread Alexandre Rafalovitch
Could you make a small index from scratch using a subset of data and see if the problem happens anyway? If yes, you have a test case. If no, you may need to do a full rebuild to be fully assured. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: SOLR DataImportHandler - Problem with XPathEntityProcessor

2015-09-08 Thread Alexandre Rafalovitch
Both versions seem to be painful in that they will retrieve the URL content multiple times. The first version is definitely wrong. The second version is probably wrong because both inner and outer entities have the same name. I would try giving a different name to the inner entity and seeing if

Re: SOLR DataImportHandler - Problem with XPathEntityProcessor

2015-09-08 Thread Alexandre Rafalovitch
What about DIH's own XSL pre-processor? It is XSL param on https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheXPathEntityProcessor No other ideas, unfortunately, I don't

Re: json update handler gone?

2015-09-07 Thread Alexandre Rafalovitch
You can define any number of the handler end-point definitions. Also, you can pass the update chain name as part of the URL parameters. So, it could be different for each call if you want. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
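A minimal sketch of the second point, assuming a core named `mycore` on localhost:8983 and an update chain named `mychain` (all hypothetical). `update.chain` is the request parameter that selects the chain for a single call.

```python
from urllib.parse import urlencode


def update_url(base_url, core, chain, commit=True):
    """Build an update URL that picks an update processor chain per request."""
    params = {"update.chain": chain}
    if commit:
        params["commit"] = "true"
    return f"{base_url}/{core}/update?{urlencode(params)}"


# Hypothetical host, core and chain names, for illustration only.
url = update_url("http://localhost:8983/solr", "mycore", "mychain")
print(url)
```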

Re: Cached fq decreases performance

2015-09-04 Thread Alexandre Rafalovitch
Yes please: http://www.amazon.com/Solr-Troubleshooting-Maintenance-Alexandre-Rafalovitch/dp/1491920149/ :-) Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 4 September 2015 at 10:30, Yonik Seeley <ysee...@gmail.

Re: Cached fq decreases performance

2015-09-04 Thread Alexandre Rafalovitch
Yonik, Is this all visible on query debug level? Would it be effective to ask to run both queries with debug enabled and to share the expanded query value? Would that show up the differences between Lucene implementations you described? (Looking for troubleshooting tips to reuse). Regards,

Re: Position of Document in Listing (Search Result)

2015-09-03 Thread Alexandre Rafalovitch
So, basically for each car, you want to generate a query with the same parameter (e.g. make) and then say where in the results for that query, your particular car would be. Right? I think the only way is to run the query and to see where the car is in the result. So, a custom code of some sort.
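The "custom code" part could be as small as the sketch below. It assumes the client has already run the query and collected the returned document IDs in result order (for example by requesting only the ID field); the IDs shown are made up.

```python
def position_in_results(doc_ids, target_id):
    """Return the 1-based position of target_id in an ordered result list, or None."""
    for position, doc_id in enumerate(doc_ids, start=1):
        if doc_id == target_id:
            return position
    return None


# Made-up IDs standing in for one page of sorted query results.
result_ids = ["car-17", "car-03", "car-42", "car-08"]
print(position_in_results(result_ids, "car-42"))
```

For long result lists the client would have to page through the results until the document is found, so the cost grows with how far down the listing the car sits.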

Re: Position of Document in Listing (Search Result)

2015-09-03 Thread Alexandre Rafalovitch
That's a good point. What is the query sorting on? Shayan, can you give an example of a query with sorting/etc shown. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 3 September 2015 at 16:24, Chris Hostetter

Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Alexandre Rafalovitch
Put the IgnoreCommit on the default handler to stop clients from forcing the commit: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/IgnoreCommitOptimizeUpdateProcessorFactory.html Then have a separate normal handler and send your real commits through that if you

Re: Cached fq decreases performance

2015-09-03 Thread Alexandre Rafalovitch
FQ has to calculate the result bit set for every document to be able to cache it. Q will only calculate it for the documents it matches on and there is some intersection hopping going on. Are you seeing this performance hit on first query only or on every one? I would expect on first query only

Re: String bytes can be at most 32766 characters in length?

2015-09-03 Thread Alexandre Rafalovitch
ntly indexed as > fieldType=text_general. > > > > true > content > false > content > solr.processor.Lookup3Signature > > > > > > Regards, > Edwin > > > On 3 September 2015 at 09:46, Alexandre Rafalovitch <arafa...@gmail.com>

Re: String bytes can be at most 32766 characters in length?

2015-09-02 Thread Alexandre Rafalovitch
And that's because you have an incomplete chain. If you look at the full example in solrconfig.xml, it shows: true id false name,features,cat solr.processor.Lookup3Signature Notice, the last two processors.

Re: which solrconfig.xml

2015-09-02 Thread Alexandre Rafalovitch
Have you looked at the Admin Web UI in detail yet? When you look at the "Overview" page, on the right-hand side, it lists a bunch of directories. You want one that says "Instance". Then, your solrconfig.xml is in the "conf" directory under that. Regards, Alex. P.s. Welcome! Solr Analyzers,

Re: plz help me

2015-09-01 Thread Alexandre Rafalovitch
http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html shows how to keep updates in a separate core. Notice that it is an intermediate-level article for query syntax. For Persian text analysis, there is a pre-built analyzer definition in the techproducts example; start from that.

Re: Sorting parent documents based on a field from children

2015-09-01 Thread Alexandre Rafalovitch
On 1 September 2015 at 08:29, Mikhail Khludnev wrote: > Last check to check, make sure that you don't have deleted document in the > index for a while. You can check in at SolrAdmin. What's the significance of that particular advice? Is something in the join including

Re: Get distinct results in Solr

2015-09-01 Thread Alexandre Rafalovitch
gave >> > >> but put results into a separate string field. Then, you group on that >> > >> field. You cannot actually group on the long text field, that would >> > >> kill any performance. So a signature is your proxy. >> > >> >>

Re: plz help me

2015-09-01 Thread Alexandre Rafalovitch
ch. > and i want some filter for persian. > that pre-built text_fa doesn't satisfied me.have you better perisan filter > than that?or a soulotion to have this filter in persian? > tnx. > > On Tue, Sep 1, 2015 at 5:21 AM, Alexandre Rafalovitch <arafa...@gmail.com> > wrote: &

Re: Sorting parent documents based on a field from children

2015-09-01 Thread Alexandre Rafalovitch
On 1 September 2015 at 09:10, Mikhail Khludnev wrote: >> Not many >> people know about it, may help to disambiguate the syntax. >> > Oh. C'mon! it's announced for ages http://yonik.com/solr/query-syntax/ Not everybody reads and keeps track of every feature of Solr.

Re: Connect and sync two solr server

2015-09-01 Thread Alexandre Rafalovitch
Is this for multi-datacenter? If so, you may want to review Apple's presentation at the last Solr Revolution: https://www.youtube.com/watch?v=_Erkln5WWLw&index=2&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: Get distinct results in Solr

2015-08-31 Thread Alexandre Rafalovitch
search or other functions like > highlighting? > > Yes, the content must be in my index, unless I do a copyField to do > de-duplication on that field.. Will that help? > > Regards, > Edwin > > > On 1 September 2015 at 10:04, Alexandre Rafalovitch <arafa...@gmail.com>

Re: Get distinct results in Solr

2015-08-31 Thread Alexandre Rafalovitch
Can't you just treat it as String? Also, do you actually want those documents in your index in the first place? If not, have you looked at De-duplication: https://cwiki.apache.org/confluence/display/solr/De-Duplication Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a

Re: Indexing Fixed length file

2015-08-28 Thread Alexandre Rafalovitch
If you use DataImportHandler, you can combine LineEntityProcessor with RegexTransformer to split each line into a bunch of fields:
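The regular-expression half of that idea can be tried outside Solr first. The sketch below is plain Python rather than DIH configuration, and the column layout (5-character id, 10-character name, 8-character city) is invented for illustration; the same pattern with capture groups is the kind of expression a regex-based splitter would be given.

```python
import re

# Hypothetical fixed-width layout: 5 chars of id, 10 of name, 8 of city.
LINE_PATTERN = re.compile(r"^(?P<id>.{5})(?P<name>.{10})(?P<city>.{8})$")


def split_fixed_width(line):
    """Split one fixed-width line into trimmed named fields, or None if it does not fit."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


sample_line = "00042" + "Alice".ljust(10) + "Toronto".ljust(8)
print(split_fixed_width(sample_line))
```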

Re: Indexing Fixed length file

2015-08-28 Thread Alexandre Rafalovitch
Erik's version might be better with tabs though to avoid CSV's requirements on escaping commas, quotes, etc. And maybe trim those fields a bit either in awk or in URP inside Solr. But it would definitely work. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: Search to Ignore ,

2015-08-27 Thread Alexandre Rafalovitch
This is both a very specific and a very general question at the same time. The way indexing and search are both done is via analyzer chains, as defined in your schema. So, you need to check what the definition is for the field you search and then play with that. There is an Analysis screen in the Web

Re: Search opening hours

2015-08-25 Thread Alexandre Rafalovitch
Have you seen: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3c1354991310424-4025359.p...@n3.nabble.com%3E https://wiki.apache.org/solr/SpatialForTimeDurations https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/ Regards, Alex. Solr

Re: Bot protection (CAPTCHA)

2015-08-25 Thread Alexandre Rafalovitch
The standard answer is that exposing the API is a REALLY bad idea. To start with, you can issue the delete commands through the API. And they can be escaped in multiple different ways. Plus, you have the Admin UI there as well to manipulate the cores as well as to see the configuration files for

Re: User Authentication

2015-08-24 Thread Alexandre Rafalovitch
Thanks for the email from the future. It is good to start to prepare for 5.3.1 now that 5.3 is nearly out. Joking aside (and assuming Solr 5.2.1), what exactly are you trying to achieve? Solr should not actually be exposed to the users directly. It should be hiding in a backend only visible to

Re: how to index document with multiple words (phrases) and words permutation?

2015-08-24 Thread Alexandre Rafalovitch
These look like requirements for a generic Solr search, maybe with focus on proximity and/or phrase matching. Perhaps some white-listing filter if you have a fixed set of words you care about. E.g. with KeepWordFilter in the analyzer chain.

Re: Using copyField with dynamicField

2015-08-24 Thread Alexandre Rafalovitch
It should work (at first glance). copyField does support wildcards. Do you have a field called text? Also, your field name and field type text have the same name. Not sure it is the best idea. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Alexandre Rafalovitch
Are you by any chance doing stored=true on the fields you want to search? If so, you may want to switch to just indexed=true. Of course, they will then not come back in the results, but do you really want to sling huge content fields around? The other option is to do lazyLoading=true and not

Re: Solr: How to index range-pair fields?

2015-08-22 Thread Alexandre Rafalovitch
Sorry Venkat, this is pushing beyond my immediate knowledge. You'd just need to experiment. But the document still looks a bit wrong, specifically I don't understand where those extra 366 values are coming from. It should be just two-dimensional coordinates, the first one for the start of the range,

Re: Solr: How to index range-pair fields?

2015-08-21 Thread Alexandre Rafalovitch
I can't find the discussion/presentation about it (about 2 years ago), but basically you can use LatLong geographic field to do this. You represent start date/time on the X axis and end date/time on the Y axis. Then, for search you intersect it with a rectangle of your desired check dates. Hopefully
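The geometry behind this can be checked in plain Python before touching any field type. The sketch below is one common way to phrase interval overlap as a rectangle test, not necessarily the exact formulation from the presentation mentioned above; dates are assumed to be already mapped to integers (for example, day of year) and the bounds 1 and 366 are illustrative.

```python
def overlap_rectangle(query_start, query_end, min_value, max_value):
    """Rectangle of (start, end) points whose ranges overlap [query_start, query_end].

    A stored range overlaps the query when start <= query_end and end >= query_start.
    """
    x_range = (min_value, query_end)    # allowed start values (X axis)
    y_range = (query_start, max_value)  # allowed end values (Y axis)
    return x_range, y_range


def point_in_rectangle(point, x_range, y_range):
    """True when the (start, end) point lies inside the rectangle, edges included."""
    x, y = point
    return x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]


# Stored ranges as points: days 10-20 and days 1-5. Query: days 15-30.
x_range, y_range = overlap_rectangle(15, 30, min_value=1, max_value=366)
print(point_in_rectangle((10, 20), x_range, y_range))
print(point_in_rectangle((1, 5), x_range, y_range))
```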

Re: Solr Query + vs AND

2015-08-21 Thread Alexandre Rafalovitch
If you can use + and -, please do so. That's what Lucene uses under the covers (MUST, SHOULD, MUST NOT). Anything else is mapping to that. You can also enable the debug flag on your queries and see exactly how the other forms (e.g. AND) are mapped to the underlying Lucene queries. Regards,
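As a toy illustration of that mapping (not Lucene's actual parser), the sketch below sorts whitespace-separated terms into the three occurrence types by their prefix. It deliberately ignores phrases, field names, grouping and the configured default operator.

```python
def classify_clauses(query):
    """Group bare terms by the occurrence type their +/- prefix implies."""
    occurs = {"MUST": [], "SHOULD": [], "MUST_NOT": []}
    for term in query.split():
        if term.startswith("+") and len(term) > 1:
            occurs["MUST"].append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            occurs["MUST_NOT"].append(term[1:])
        else:
            occurs["SHOULD"].append(term)
    return occurs


print(classify_clauses("+solr lucene -nutch"))
```

For real queries, the debug output mentioned above remains the authoritative view of how a given syntax was interpreted.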

Re: Solr: How to index range-pair fields?

2015-08-21 Thread Alexandre Rafalovitch
On 21 August 2015 at 15:32, vaedama sudheer.u...@gmail.com wrote: presentDays: [ [01 15 366 366], [13, 16, 366, 366], [19, 25, 366, 366] ] This does not look right. Your January 1 2015 should map to a single number, representing 'X' in the coordinates. Your January 15 2015 should map to another

Re: exclude folder in dataimport handler.

2015-08-21 Thread Alexandre Rafalovitch
A transformer on the outer entity will run before the inner entity is invoked. So, you might be able to remove the list of files to ignore before the inner entity starts extracting from them. You could also pre-generate a list of files by doing ls/find with your requirements and then just read

Re: Solr: How to index range-pair fields?

2015-08-21 Thread Alexandre Rafalovitch
These look right. Then, you just play around with mapping. Your dates to coordinates could be as granular as you want as long as they fit into the data type. And with this being a school, your epochs might be smaller (e.g. semesters) and kept as a separate number. Regards, Alex. Solr

Re: How to use DocumentAnalysisRequestHandler in java

2015-08-20 Thread Alexandre Rafalovitch
If this is for a quick test, have you tried just faceting on that field with document ID set through query? Facet returns the indexed/tokenized items. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 20 August 2015 at

Re: Reindexing

2015-08-19 Thread Alexandre Rafalovitch
Reload will get the new schema definitions. But all the indexed content will stay as is and will probably start causing problems if you changed analyzer definitions seriously. You probably will have to reindex from scratch/external source. Sorry. Solr Analyzers, Tokenizers, Filters, URPs

Re: Solr cache for specific field

2015-08-18 Thread Alexandre Rafalovitch
I am not sure I understand the problem statement. Is it speed? Memory usage? Something very specific about SolrCloud? To me it seems the problem is that your 'fq' _are_ getting cached when you may not want them as the list is different every time. You could disable that cache. Or you could try

Re: Solr cache for specific field

2015-08-18 Thread Alexandre Rafalovitch
Have you tried this with cache=false? https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters Because the internal representation of the field value may already be doing what you want. And the caching of non-repeating filters is what is slowing it down. I would just do that as a
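A small sketch of what the request side might look like, using the `{!cache=false}` local parameter described on the wiki page linked above. The query and the filter on a `category` field are made-up examples.

```python
from urllib.parse import urlencode


def uncached_filter(filter_query):
    """Prefix a filter query with the local param that skips the filter cache."""
    return "{!cache=false}" + filter_query


# Made-up query and filter, for illustration only.
query_string = urlencode([
    ("q", "*:*"),
    ("fq", uncached_filter("category:books")),
])
print(query_string)
```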

Re: Index very large number of documents from large number of clients

2015-08-15 Thread Alexandre Rafalovitch
This is beyond my direct area of expertise, but one way to look at this would be: 1) Create new collections offline. Down to each of the 6000 clients having its own private collection (embedded SolrJ/server). Or some sort of mini-hubs, e.g. a server per N clients. 2) Bring those collections into

Re: Solr relevant results

2015-08-15 Thread Alexandre Rafalovitch
on values in other fields. And then just order by it. Is that right? On Fri, Aug 14, 2015 at 10:58 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Clarification: In the client that is doing the _indexing_/sending data to Solr. Not the one doing the querying. And custom URP if you can't change

Re: phonetic filter factory question

2015-08-15 Thread Alexandre Rafalovitch
From the teaching-to-fish category of advice (since I don't know the actual answer): did you try the Analysis screen in the Admin UI? If you check the Verbose output box, you will see all the offsets and can easily confirm the detailed behavior for yourself. Regards, Alex. Solr Analyzers,

Re: Copy fields and appending of values

2015-08-14 Thread Alexandre Rafalovitch
I would not be surprised if the default value is assigned AFTER all the copyField processing is done. That would make a lot more sense. So, you may want to try setting that default value earlier in the indexing process. Specifically, by creating a custom UpdateRequestProcessor chain and using DefaultValue URP:

Re: Solr relevant results

2015-08-14 Thread Alexandre Rafalovitch
What's the search string? Or is the search string irrelevant and that's just your compulsory ordering. Assuming anything that searches has to be returned and has to fit into that order, I would frankly just map your special codes all together to some sort of 'sort order' number. So, Code=C =
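A hedged sketch of that mapping done in the indexing client. The code values, their ranks and the `sort_order` field name are all hypothetical; only the client knows the real business order.

```python
# Hypothetical business ordering: lower rank sorts first.
CODE_RANK = {"C": 1, "A": 2, "B": 3}
UNKNOWN_RANK = len(CODE_RANK) + 1  # unmapped codes sort last


def with_sort_order(doc):
    """Return a copy of the document with a numeric sort_order derived from its code."""
    enriched = dict(doc)
    enriched["sort_order"] = CODE_RANK.get(doc.get("code"), UNKNOWN_RANK)
    return enriched


docs = [with_sort_order(d) for d in (
    {"id": "1", "code": "B"},
    {"id": "2", "code": "C"},
    {"id": "3", "code": "X"},
)]
print([d["sort_order"] for d in docs])
```

Once the number is indexed, the query side only needs to sort ascending on that one field.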

Re: Solr relevant results

2015-08-14 Thread Alexandre Rafalovitch
what you are saying about mapping Code to numbers. But can you help with some examples of actual solr queries on how to do this? Thanks On Fri, Aug 14, 2015 at 2:46 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: What's the search string? Or is the search string irrelevant and that's

Re: Solr relevant results

2015-08-14 Thread Alexandre Rafalovitch
, URPs and even a newsletter: http://www.solr-start.com/ On 14 August 2015 at 23:57, Alexandre Rafalovitch arafa...@gmail.com wrote: My suggestion was to do the mapping in the client, before you hit Solr. Or in a custom UpdateRequestProcessor. Because only your client app knows the order you

Re: Default field for query

2015-08-13 Thread Alexandre Rafalovitch
On 13 August 2015 at 12:19, Scott Derrick sc...@tnstaafl.net wrote: If i specify a search q=foo bar , Is there a way to set a default field if a field is not given? You want the 'df' parameter, unless I misunderstood the question? If you are using default query parser (e.g. not eDisMax, etc),
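For the default query parser, the request might be built as in the sketch below. The default field name `text` is an assumption; use whichever field should receive unqualified terms.

```python
from urllib.parse import urlencode


def select_params(query, default_field):
    """Build query parameters that set a default field for unqualified terms."""
    return urlencode({"q": query, "df": default_field})


# "text" is a hypothetical default field name.
df_query_string = select_params("foo bar", "text")
print(df_query_string)
```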

Re: Indexed stored

2015-08-13 Thread Alexandre Rafalovitch
Correct. In fact, faceting pulls its values normally from the indexed terms anyway. It completely ignores stored. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 13 August 2015 at 19:49, Nagasharath sharathrayap...@gmail.com wrote: If I just

Re: Cluster down for long time after zookeeper disconnection

2015-08-10 Thread Alexandre Rafalovitch
Did you look at release notes for Solr versions after your own? I am pretty sure some similar things were identified and/or resolved for 5.x. It may not help if you cannot migrate, but would at least give a confirmation and maybe workaround on what you are facing. Regards, Alex. Solr

Re: New Solr installation fails to create collection/core

2015-08-10 Thread Alexandre Rafalovitch
Setup new core instance directory: /var/solr/data/demo ... Failed to create core 'demo' due to: Error CREATEing SolrCore 'demo': Unable to create core [demo] Caused by: /var/solr/data/demo/data Was one of these entries typed by hand? Because I see 'data/demo' and 'demo/data'. Which does not

Re: Embedded Solr stopped to index after a while

2015-08-06 Thread Alexandre Rafalovitch
(shooting in the dark) What does your data directory look like? File sizes, etc. And which Operating System? 4GB is where the Windows FAT filesystem has a file size limit, but it really should not be that. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: Embedded Solr now deprecated?

2015-08-05 Thread Alexandre Rafalovitch
I thought the Embedded server was good for a scenario where you wanted quickly to build a core with lots of documents locally. And then, move the core into production and swap it in. So you minimize the network traffic. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a

Re: Initializing core takes very long at times

2015-08-05 Thread Alexandre Rafalovitch
I wonder if that's also something that could be resolved by having a custom Network level handler, on a pure Java level. I seem to vaguely recall it was possible. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 5 August 2015

Re: solr status 404 error

2015-08-04 Thread Alexandre Rafalovitch
What do you get at just http://localhost:8080/ ? My guess would be that you may have already had something else on that port and your Solr instance did not actually start. If in doubt, I would test that by bringing your Solr instance down and trying to revisit the URL. You should get a generic

Re: TrieIntField not working in Solr 4.7 ?

2015-08-04 Thread Alexandre Rafalovitch
Did you re-index and commit completely after the definition switch? Looks like internal representation conflict. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 4 August 2015 at 11:31, wwang525 wwang...@gmail.com wrote: Hi

Re: SOLR 5.3

2015-08-04 Thread Alexandre Rafalovitch
Are you watching lucene-dev list? The discussion is happening there. In short, the preparations have started, but there are things to clean up and no RCs have been out yet. I don't think even a branch has been cut yet. So, a while to go still. Solr Analyzers, Tokenizers, Filters, URPs and

Re: Can Apache Solr Handle TeraByte Large Data

2015-08-03 Thread Alexandre Rafalovitch
That's still a VERY open question. The answer is Yes, but the details depend on the shape and source of your data, and on the searches you are anticipating. Is this a lot of entries with a small number of fields? Or a (relatively) small number of entries with huge field counts? Do you need to

Re: Documentation for: solr.EnglishPossessiveFilterFactory

2015-08-03 Thread Alexandre Rafalovitch
Seems simple enough that the source answers all the questions: https://github.com/apache/lucene-solr/blob/lucene_solr_4_9/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishPossessiveFilter.java#L66 It just looks for a couple of versions of apostrophe followed by s or S.
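The behaviour described above can be sketched in a few lines. This is a Python illustration of the rule as stated in the message, not the Lucene source itself; the exact set of apostrophe characters below is an assumption, so check the linked file for the authoritative list.

```python
# Assumed apostrophe variants: ASCII, right single quote, fullwidth.
APOSTROPHES = {"'", "\u2019", "\uff07"}


def strip_possessive(token):
    """Drop a trailing apostrophe + s/S from a single token, if present."""
    if len(token) >= 2 and token[-2] in APOSTROPHES and token[-1] in ("s", "S"):
        return token[:-2]
    # Anything else passes through unchanged
    return token


for word in ["Alex's", "ALEX\u2019S", "cats"]:
    print(word, "->", strip_possessive(word))
```

Note that only the final two characters are examined, so a plain plural such as "cats" is left alone.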

Re: Can Apache Solr Handle TeraByte Large Data

2015-08-03 Thread Alexandre Rafalovitch
Just to reconfirm, are you indexing file content? Because if you are, you need to be aware that most PDFs do not extract well, as they do not have text flow preserved. If you are indexing PDF files, I would run a sample through Tika directly (that's what Solr uses under the covers anyway) and

Re: Can Apache Solr Handle TeraByte Large Data

2015-08-03 Thread Alexandre Rafalovitch
Well, if it is just file names, I'd probably use SolrJ client, maybe with Java 8. Read file names, split the name into parts with regular expressions, stuff parts into different field names and send to Solr. Java 8 has FileSystem walkers, etc. to make it easier. You could do it with DIH, but it
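The "split the name into parts" step might look roughly like the sketch below. It is written in Python rather than SolrJ purely to keep the example short, and the naming pattern, the field names (project, date, title, ext) and the helper name filename_to_doc are all hypothetical. Sending the resulting documents to Solr is left out.

```python
import re
from pathlib import PurePath

# Hypothetical naming convention: <project>_<YYYY-MM-DD>_<title>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[^_]+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<title>.+)\.(?P<ext>[^.]+)$"
)


def filename_to_doc(path):
    """Turn one file path into a dict of field name -> value, or None."""
    name = PurePath(path).name
    match = NAME_PATTERN.match(name)
    if match is None:
        # Name does not follow the assumed convention; skip it
        return None
    doc = {"id": str(path)}  # use the full path as the unique key
    doc.update(match.groupdict())
    return doc


print(filename_to_doc("/data/acme_2015-08-03_report.pdf"))
```

Each returned dict corresponds to one document whose fields would then be sent to Solr by whichever client is in use.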

Re: Are Solr releases predictable? Every 2 months?

2015-08-02 Thread Alexandre Rafalovitch
They are not that predictable. Somebody has to volunteer to be a release manager and then there is a flurry of cleanups, release candidates, etc. You can see all that on the Lucene-Dev mailing list. For example, a 5.3 has been proposed (as an idea) on July 30th. But not much happened since. But

Re: Search for All CAPS words

2015-07-30 Thread Alexandre Rafalovitch
Have you tried copyField with different field type for different fields yet? That would be my first step. Make the copied field indexed-only, not stored for efficiency. And you can then either search against that copied field directly or use eDisMax against both fields and give that field a

Re: Search for All CAPS words

2015-07-30 Thread Alexandre Rafalovitch
So, what you want is to duplicate a specific token, rename one of the copies, and inject it with the same offset as the original. So GATE = gate, _gate but gate=gate. That, to me, is a custom token filter. You can probably use KeywordRepeatFilterFactory as a base:

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-27 Thread Alexandre Rafalovitch
Thank you for the update. The MSWord format changed significantly from .doc to .docx, so I suspect it has a different parser. I would not be surprised if the old binary-format parser missed something exotic in the documents (e.g. content of text boxes or frames). Regards, Alex. Solr
