Re: Skip Headers & Footers while text extraction using Apache Tika parsing for PPT & PDF formats

2019-09-04 Thread Alexandre Rafalovitch
I think you have to start from the lowest level and then go up the stack. Solr uses Tika if you use the extract handler (and for production you may not want to). Tika uses PDFBox to extract from PDF. Searching for "PDFBox remove headers" gets you:

Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Alexandre Rafalovitch
What about combining: 1) KeywordRepeatFilterFactory, 2) an existing folding filter (need to check that it ignores keyword-marked words), 3) RemoveDuplicatesTokenFilterFactory. That may give what you are after without custom coding. Regards, Alex. On Tue, 3 Sep 2019 at 16:14, Audrey Lorberfeld -
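
The three-filter idea above can be illustrated with a toy simulation. This is a plain Python sketch, not Solr code: `fold` and `analyze` are hypothetical helpers, and the sketch assumes the folding filter leaves keyword-marked tokens untouched, which is exactly the point the message says still needs checking.

```python
import unicodedata


def fold(token):
    """Strip combining accent marks, roughly what a folding filter does."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(tokens):
    """Return, for each position, the original token plus its folded form."""
    positions = []
    for tok in tokens:
        # Keyword-repeat step: emit the token twice, first copy marked as keyword.
        copies = [(tok, True), (tok, False)]
        # Folding step: only the copy that is not keyword-marked is changed.
        folded = [t if is_keyword else fold(t) for t, is_keyword in copies]
        # Remove-duplicates step: drop identical tokens at the same position.
        positions.append(list(dict.fromkeys(folded)))
    return positions
```

With this model an accented word is searchable both with and without its accent, while an unaccented word produces a single token.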

Re: Query number of Lucene documents using Solr?

2019-08-26 Thread Alexandre Rafalovitch
Luke may have it at least for a quick check: https://github.com/dmitrykey/luke (part of last/next? version of Lucene now). Regards, Alex. On Mon, 26 Aug 2019 at 16:20, Bram Van Dam wrote: > > Possibly somewhat unusual question: I'm looking for a way to query the > number of *lucene

Re: Solr indexing for unstructured data

2019-08-22 Thread Alexandre Rafalovitch
In Admin UI, there is schema browsing screen: https://lucene.apache.org/solr/guide/8_1/schema-browser-screen.html That shows you all the fields you have, their configuration and their (tokenized) indexed content. This seems to be a good midpoint between indexing and querying. So, I would check

Re: "Missing" Docs in Solr

2019-08-16 Thread Alexandre Rafalovitch
at shouldn't be a problem as updates to aliases are > atomic as I understand them. GC's also are fine during that period. > > It's really weird > > On Fri, Aug 16, 2019 at 3:51 AM Alexandre Rafalovitch > wrote: > > > I would take the server log for those 10 seconds (

Re: "Missing" Docs in Solr

2019-08-16 Thread Alexandre Rafalovitch
I would take the server log for those 10 seconds (plus buffer) and really try to see if something happens in that period. I am thinking an unexpected commit, index large, alias switch. That may help you to narrow down the kind of error. Another option is whether you got empty result or a

Re: get the position of matched word in the response

2019-08-04 Thread Alexandre Rafalovitch
What happens if they search for "hello monkey" and match against "hello my monkeys"? What should it return? Why does your database not contain "hello" instead of 199? I am saying because if your clients are truly searching for just one word, then Solr may be an overkill for you. Perhaps you are

Re: Dataimport problem

2019-07-31 Thread Alexandre Rafalovitch
> > > Can you please give pointers to look into, We are using DIH for production > > and facing few issues. We need to start phasing out > > > > > > Thanks and Regards, > > Srinivas Kashyap > > > > -Original Message- > > From: Alexandre Rafa

Re: Dataimport problem

2019-07-31 Thread Alexandre Rafalovitch
A couple of things: 1) Solr on Tomcat has not been an option for quite a while. So, you must be running an old version of Solr. Which one? 2) Compare that you have the same Solr config. In Admin UI, there will be all O/S variables passed to the Java runtime, I would check them side-by-side 3) You

Re: Add certain documents right after particular document

2019-07-21 Thread Alexandre Rafalovitch
So, if the recommendations are dynamic and come from outside Solr, why do you need Solr to do anything at this stage? Sounds like the original result list is where Solr responsibility ends. You are not exposing Solr directly to the UI (you should not), so whatever your middleware is, can be coded

Re: Removing message_raw_header prefix from field names

2019-07-15 Thread Alexandre Rafalovitch
That is probably a configuration in solrconfig.xml for the extract handler. If so, you should be able to modify it easily. Regards, Alex On Mon, Jul 15, 2019, 12:46 PM Zheng Lin Edwin Yeo, wrote: > Hi, > > When I index EML files to Solr, I realised that there are certain fields > which

Re: how to use copy filed as only taken after the suffix

2019-07-15 Thread Alexandre Rafalovitch
Hi Uma, You have three options: 1) If you are indexing only that field and not returning it to the user, then you can mark it stored=false and use one of the many filters to transform your content 2) If you are interested in both storing and indexing, then be aware that the stored representation will

Re: How to query against dynamic fields without listing them all?

2019-07-14 Thread Alexandre Rafalovitch
The other option is to use a query field alias: https://lucene.apache.org/solr/guide/8_1/the-extended-dismax-query-parser.html#field-aliasing-using-per-field-qf-overrides (example: https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml#L20 ). This still

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-08 Thread Alexandre Rafalovitch
You may also want to look at the existing systems, such as https://nifi.apache.org/ Regards, Alex. On Mon, 8 Jul 2019 at 08:23, Joseph_Tucker wrote: > > Thanks again. > > I guess I'll have to start researching how to create such custom indexing > scripts and determine which language would be

Re: Solr 6.6.0 - DIH - Multiple entities - Multiple DBs

2019-07-05 Thread Alexandre Rafalovitch
I don't think you should be designing this around DIH. It was never planned for complex scenarios, or to be particularly fault tolerant, which you may need. Either use SolrJ or a third-party tool that integrates with Solr. Regards, Alex On Fri, Jul 5, 2019, 7:43 AM Joseph_Tucker, wrote: >

Re: Add dynamic field to existing index slow

2019-06-30 Thread Alexandre Rafalovitch
able hard commit and > soft commit per 1000 docs > > I am wondering whether any configuration can speed it > > > > > Sent from Yahoo Mail for iPhone > > > On Sunday, June 30, 2019, 10:39 AM, Alexandre Rafalovitch > wrote: > > Indexing new documents is just

Re: Add dynamic field to existing index slow

2019-06-30 Thread Alexandre Rafalovitch
Indexing new documents is just adding additional segments. Adding a new field to a document means: 1) Reading the existing document (may not always be possible, depending on field configuration) 2) Marking the existing document as deleted 3) Creating a new document with the reconstructed plus new fields 4)

Re: Relevance by term position

2019-06-28 Thread Alexandre Rafalovitch
This past thread may be relevant: https://markmail.org/message/aau6bjllkpwcpmro It suggests that using SpanFirst of XMLQueryParser will have automatic boost for earlier matches. The other approach suggested was to use Payloads (which got better since the original thread). Regards, Alex. On

Re: Error 401 and 404 while Solr product search.

2019-06-27 Thread Alexandre Rafalovitch
Those are two different collections. Have you tried solving the easier one (non secure), by accessing /solr/master_doterraUSContent_Index/select in the browser? It seems the collection is not there or is named differently. So, that would be the first step to check. Regards, Alex. On Thu, 27

Re: [EXTERNAL] - Re: Solr not returning stored field

2019-06-27 Thread Alexandre Rafalovitch
documents I > copied the user value from the response and pasted user:value into the > Admin console and get no results. Yet, in my code using SolrQuery I see a > response for the same user:value with the ranking field. > > > > This makes no sense to me. The Admin console is usually

Re: Solr not returning stored field

2019-06-27 Thread Alexandre Rafalovitch
(If no other SolrJ specific advice shows up) Can you divide the problem in the middle and see what happens and whether the issue is on the Solr or the SolrJ side. Specifically, execute the query directly against Solr and see what happens. Also I would triple-check that the documents you are getting back

Re: sample_techproducts tutorial (8.1 guide) has wrong collectioname?

2019-06-27 Thread Alexandre Rafalovitch
Actually, the tutorial does say "Here’s the first place where we’ll deviate from the default options." and the result name should be techproducts. It is the image that is no longer correct and needs to be updated. And perhaps the text should be made clearer. A pull request with updated image

Re: Encrypting Solr Index

2019-06-25 Thread Alexandre Rafalovitch
No index encryption in the box. I am aware of a commercial solution but no details on how good or what the price is: https://www.hitachi-solutions.com/securesearch/ Regards, Alex On Tue, Jun 25, 2019, 11:32 AM Ahuja, Sakshi, wrote: > Hi, > > I am using solr 6.6 and want to encrypt index

Re: Solr 8.0.0 Customized Indexing

2019-06-25 Thread Alexandre Rafalovitch
You have a couple of options to delete: 1) Explicit delete request 2) Expiration management: https://lucene.apache.org/solr/8_1_0//solr-core/org/apache/solr/update/processor/DocExpirationUpdateProcessorFactory.html 3) If you are indexing in clear batches (e.g. monthly, and keep last 3 months), you
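
For the first option, an explicit delete request, a minimal sketch of the JSON update command follows. The helper name and the field `batch_s` are hypothetical; the payload would be sent to the collection's `/update` handler with a JSON content type, followed by a commit.

```python
import json


def delete_by_query_payload(query):
    """Build the JSON body of a Solr delete-by-query update command."""
    return json.dumps({"delete": {"query": query}})


# Hypothetical usage: drop everything indexed in one monthly batch.
payload = delete_by_query_payload("batch_s:2019-03")
```

Deleting by query suits batch cleanup; deleting a single document would use an `id` key instead of `query`.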

Re: Derived Field Solr Schema

2019-06-21 Thread Alexandre Rafalovitch
The easiest way is to do that with Update Request Processors: https://lucene.apache.org/solr/guide/7_7/update-request-processors.html Usually, you would clone a field and then do your transformations. For your specific example, you could use: *) FieldLengthUpdateProcessorFactory - int rather than

Re: Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Alexandre Rafalovitch
If you look at the data files, is any extension suddenly taking way more space? That may give a clue. Also is schema the same? Like you did not enable docvalues on strings by default or similar. Regards, Alex On Thu, Jun 13, 2019, 6:19 AM Markus Jelsma, wrote: > Hello, > > We are

Re: ExtractRequestHandler with url instead of path to file

2019-06-12 Thread Alexandre Rafalovitch
Have you tried enabling remoteStreaming? https://lucene.apache.org/solr/guide/8_0/content-streams.html#remote-streaming Regards, Alex. On Wed, 12 Jun 2019 at 08:59, marotosg wrote: > > Hi, > > I would like to make a request to Solr to index documents hosted as urls. > This works when I send
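
Once remote streaming is enabled, the request parameters for the extract handler might be built as below. This is a sketch under assumptions: the helper name, the document URL, and the id are made up for illustration, and the parameters would be sent to the `/update/extract` handler of the target core.

```python
def extract_from_url_params(document_url, doc_id):
    """Parameters asking Solr to fetch and extract a remote document itself."""
    return {
        "stream.url": document_url,  # Solr fetches this URL instead of a local file
        "literal.id": doc_id,        # unique key assigned to the extracted document
        "commit": "true",
    }


params = extract_from_url_params("http://example.com/docs/report.pdf", "report-1")
```

Passing the dictionary through `urllib.parse.urlencode` takes care of escaping the embedded URL.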

Re: [SPAM] Re: query parsed in different ways in two identical solr instances

2019-06-10 Thread Alexandre Rafalovitch

Re: query parsed in different ways in two identical solr instances

2019-06-10 Thread Alexandre Rafalovitch

Re: query parsed in different ways in two identical solr instances

2019-06-06 Thread Alexandre Rafalovitch
Those two queries look same after sorting the parameters, yet the results are clearly different. That means the difference is deeper. 1) Have you checked that both collections have the same amount of documents (e.g. mismatched final commit). Does basic "query=*:*" return the same counts in the

Re: Loading pre created index files into MiniSolrCloudCluster of test framework

2019-06-05 Thread Alexandre Rafalovitch
Is there something special about parent/child blocks you cannot do through JSON? Or XML? Both Solr XML and Solr JSON support it. New style parent/child mapping is also supported in latest Solr but I think it is done differently. Regards, Alex On Wed, Jun 5, 2019, 6:29 PM Pratik Patel,

Re: Adding Multiple JSON Documents

2019-06-03 Thread Alexandre Rafalovitch
Hi John, This may be useful: https://www.slideshare.net/arafalov/json-in-solr-from-top-to-bottom (there is the video of the session at the end too). Basically, we have two ways to process JSON and sometimes they look very similar and you have to be very deliberate in indicating which one is the

Re: Edit rights to apache solr wiki

2019-05-31 Thread Alexandre Rafalovitch
Actually, the WIKI is going away (for all Apache projects, not just Solr). So, the preferred way now is to contribute to the Solr Reference Guide, which is developed as part of normal Solr process and can be patched as any other Git-based project:

Re: Solr query with long query

2019-05-30 Thread Alexandre Rafalovitch
You can use POST instead of GET. But you may also want to see if you can refactor those 1500 strings somehow. If you don't use it already, maybe Terms query parser could be useful: https://lucene.apache.org/solr/guide/7_7/other-parsers.html#terms-query-parser Also, if at least some of those
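
Both suggestions can be combined: build one Terms query parser expression and send it as a form-encoded POST body. A minimal sketch, assuming a hypothetical helper name and an `id` field; values that themselves contain commas would need a different separator.

```python
from urllib.parse import urlencode


def terms_query_body(field, values, rows=10):
    """Form-encoded POST body using the Terms query parser."""
    # {!terms f=<field>} takes a comma-separated list of raw values.
    query = "{!terms f=%s}%s" % (field, ",".join(values))
    return urlencode({"q": query, "rows": rows})


body = terms_query_body("id", ["doc1", "doc2", "doc3"])
```

Sending `body.encode()` as the `data` argument of `urllib.request.urlopen` issues a POST, so the long value list never appears in the URL.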

Re: Newbie permissions problem running solr

2019-05-30 Thread Alexandre Rafalovitch
It is a Unix ".." - as in parent directory. So the path would be: /usr/local/solr/example/cloud/node2/logs And I am guessing you have installed Solr with one user and are trying to use it with another. So, maybe a sudo is required. Or maybe you could just download a fresh Solr install, unzip it

Re: basic question about updating a docValue

2019-05-07 Thread Alexandre Rafalovitch
Sounds like you had some documents in the index already from before you made this change. You may need to delete, commit, reindex. Rather than trying to overwrite. Regards, Alex On Tue, May 7, 2019, 1:03 AM Jerry Lin, wrote: > Hi, > > I'm new to Solr and am using Solr 8, and the Java API

Re: Reverse-engineering existing installation

2019-05-06 Thread Alexandre Rafalovitch
let makes it straightforward to get the canonical XML. > > It looks like our schema.xml files are rather different from files > like solr/example/solr/collection1/conf/schema.xml > > Any suggestions of sections I should focus on? > > On Sat, May 4, 2019 at 8:11 AM Alexan

Re: Reverse-engineering existing installation

2019-05-04 Thread Alexandre Rafalovitch
XMLStarlet still works just fine. So if you want the fast way, that is the one. Otherwise, some xml editors can do it (not sure which ones) or you can look for XSLT or XQuery examples on the web. XMLStarlet actually just spits out XSLT internally, or even externally if you ask. Regards,

Re: Reverse-engineering existing installation

2019-05-02 Thread Alexandre Rafalovitch
My presentation from 2016 may be interesting as I deconstruct a Solr example, including the tips/commands on how to do so: https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016 The commands start around the slide 20. Hope this helps, Alex.

Re: problem indexing GPS metadata for video upload

2019-05-01 Thread Alexandre Rafalovitch
What happens when you run it against a standalone Tika (recommended option anyway)? Do you see the relevant fields? Not every Tika field is captured, that is configured in solrconfig.xml. So if Tika extracts them, next step is to check the mapping. Regards, Alex On Wed, May 1, 2019, 5:38

Re: multi-level Nested entities in dih

2019-04-30 Thread Alexandre Rafalovitch
> Thanks and Regards, > Srinivas Kashyap > > -----Original Message----- > From: Alexandre Rafalovitch > Sent: 30 April 2019 05:06 PM > To: solr-user > Subject: Re: multi-level Nested entities in dih > > DIH may not be able to do arbitrary nesting. And it is not recom

Re: multi-level Nested entities in dih

2019-04-30 Thread Alexandre Rafalovitch
DIH may not be able to do arbitrary nesting. And it is not recommended for complex production cases. However, in general, you also have to focus on what your _search_ will look like. And only then think about the mapping. For example, does that whole tree get mapped to and returned as a single

Re: Problem while indexing DATE field in SOLR.

2019-04-26 Thread Alexandre Rafalovitch
Though one can insert an UpdateRequestProcessor to convert any date format. See solrconfig.xml for how it is setup (as part of 'schemaless' parsing). Regards, Alex On Fri, Apr 26, 2019, 3:57 AM Nicolas Franck, wrote: > Dates need to be send in UTC format: > > -mm-ddTHH:MM:SSZ > > or
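
If the conversion is done on the client side instead, a small helper can produce the UTC form quoted above. A minimal sketch: `to_solr_date` is a hypothetical name, and it refuses naive datetimes rather than guessing their timezone.

```python
from datetime import datetime, timedelta, timezone


def to_solr_date(dt):
    """Format a timezone-aware datetime as a UTC timestamp with a trailing Z."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 09:57 at UTC+2 is 07:57 in UTC.
local = datetime(2019, 4, 26, 9, 57, 0, tzinfo=timezone(timedelta(hours=2)))
stamp = to_solr_date(local)
```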

Re: "dismax" parameter "bq" filters instead of boosting

2019-04-17 Thread Alexandre Rafalovitch
of bq at least describes it as an "optional" query that only > influences the score, not the result list. > > > > On 16 Apr 2019, at 23:59, Alexandre Rafalovitch wrote: > > > > If you set q.op=OR (and not as 'AND' you defined in your config), you > > will see

Re: "dismax" parameter "bq" filters instead of boosting

2019-04-16 Thread Alexandre Rafalovitch
If you set q.op=OR (and not the 'AND' you defined in your config), you will see the difference between your last two queries. The second last one will show 6 items and the last one still 5. As is, with your custom config, the boost query is added as one more clause in the search. q.op=AND forces it

Re: autoGeneratePhraseQueries not working

2019-04-16 Thread Alexandre Rafalovitch
; and to be honest I don't even remember how I got to the sow parameter... > > and I'm not sure what that means for all other queries I have > > > >Il martedì 16 aprile 2019, 13:09:26 CEST, Alexandre Rafalovitch > > ha scritto: > > > > The issue is t

Re: "dismax" parameter "bq" filters instead of boosting

2019-04-16 Thread Alexandre Rafalovitch
That's a bit "fast" to expect somebody to reproduce this from information given. Or even in general to check the mailing list, given that we are not paid support :-) Could you please 1) Download the latest 8.0 distribution 2) Do one of the basic examples 3) Give the search query that shows

Re: autoGeneratePhraseQueries not working

2019-04-16 Thread Alexandre Rafalovitch
The issue is that the Standard Query Parser does pre-processing of the query and splits it on whitespace beforehand (to deal with all the special syntax). So, if you don't use quoted phrases then by the time the field specific query analyzer chain kicks in, the text is already pre-split and the

Re: Spatial Search using two separate fields for lat and long

2019-04-13 Thread Alexandre Rafalovitch
Specifically, the pre-processing can be done with UpdateRequestProcessors: https://lucene.apache.org/solr/guide/7_2/update-request-processors.html In your case, you probably want to chain *) CloneUpdate:

Re: Real time get - URL size limitation

2019-04-11 Thread Alexandre Rafalovitch
Two quick thoughts without computer access: 1) have you tried post? Usually they do work for all calls. 2) if the list does not change often, you can add it to the request handler definition. Or even as a separate paramset to pass by reference. Either way you would not need to have it in URL every

Re: Solr web crawler with recursive option

2019-04-11 Thread Alexandre Rafalovitch
One of the files that post tool identified as XML is not. Possibly a 404 error or some such. So it is trying to parse the file and sees non-xml content right at start. Or if you are sure it is an XML file, maybe there is a BOM mark. Either way try to isolate the specific file. On a bigger picture

Re: Solr Cache clear

2019-04-08 Thread Alexandre Rafalovitch
You may have warming queries to prepopulate your cache. Check your solrconfig.xml. Regards, Alex On Mon, Apr 8, 2019, 4:16 PM Lewin Joy (TMNA), wrote: > ** PROTECTED 関係者外秘 > How do I clear the solr caches without restarting Solr cluster? > Is there a way? > I tried reloading the

Re: solr tika extraction video creation date problem (hours ahead)

2019-04-05 Thread Alexandre Rafalovitch
Well, Tika would use different libraries to extract different formats. So maybe there is a bug. I would just get a standalone Tika (of matching version to the one in Solr) and see what the output from two sample files is. Then, I would check with the latest Tika, just in case. I would also use

Re: solr tika extraction video creation date problem (hours ahead)

2019-04-04 Thread Alexandre Rafalovitch
Sounds like a timezone normalization issue. Possibly at the Tika stage. Check what your SOLR_TIMEZONE variable is set to. Not sure in which file. Regards, Alex On Thu, Apr 4, 2019, 12:50 AM Where is Where, wrote: > Hello , I was following the instruction > >

Re: dataimport for full-import

2019-03-29 Thread Alexandre Rafalovitch
It is probably autocommit setting in your solrconfig.xml. But you may also want to consider indexing into a new core and then doing a core swap at the end. Or re-aliasing if you are running a multiCore collection. Regards, Alex On Fri, Mar 29, 2019, 2:25 AM 黄云尧, wrote: > when I do the
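
The core swap at the end of a rebuild is a single CoreAdmin call. A sketch of the URL, assuming hypothetical core names and a default local Solr address:

```python
from urllib.parse import urlencode


def core_swap_url(base_url, live_core, rebuilt_core):
    """URL of the CoreAdmin SWAP action that exchanges two cores."""
    params = urlencode({"action": "SWAP", "core": live_core, "other": rebuilt_core})
    return f"{base_url}/admin/cores?{params}"


url = core_swap_url("http://localhost:8983/solr", "products", "products_rebuild")
```

After the swap, queries against `products` see the freshly built index, and the old one remains available under the other name until it is rebuilt or removed.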

Re: Alternative for DIH

2019-01-31 Thread Alexandre Rafalovitch
Apache NiFi may also be something of interest: https://nifi.apache.org/ Regards, Alex. On Thu, 31 Jan 2019 at 11:15, Mikhail Khludnev wrote: > > Hello, > > I did this deck some time ago. It might be useful for choosing one. >

Re: PatternReplaceFilterFactory problem

2019-01-28 Thread Alexandre Rafalovitch
In Admin UI, there is an Analysis screen. You can enter your text and your query there and see what happens to it at every step of the processing pipeline. This should tell you whether the problem is in indexing, query, or somewhere else entirely (e.g. you are querying a different field as Scott

Re: _version_ field missing in schema?

2019-01-23 Thread Alexandre Rafalovitch
d > because it's easier for me to just use xml to define my schema. Is > there > a preferred approach? I don't (want to) use solr cloud, as for our > use > case a single instance of solr is more than enough. > > Thanks for your help, > Aleks > > Alexandre Rafalovitch writes: >

Re: _version_ field missing in schema?

2019-01-22 Thread Alexandre Rafalovitch
What do you mean schema.xml from managed-schema? schema.xml is old non-managed approach. If you have both, schema.xml will be ignored. I suspect you are not running with the schema you think you do. You can check that with API or in Admin UI if you get that far. Regards, Alex On Tue, Jan

Re: Content from EML files indexing from text/html (which is not clean) instead of text/plain

2019-01-14 Thread Alexandre Rafalovitch
I think asking this question on Tika mailing list may give you better answers. Then, if the conclusion is that the behavior is configurable, you can see how to do it in Solr. It may be however, that you need to do the parsing outside of Solr with standalone Tika. Standalone Tika is a production

Re: Question about Solr concept

2019-01-03 Thread Alexandre Rafalovitch
I believe the answer is yes, but specifics depend on whether you mean online or offline index creation (as in when does the content appear) and also why you want to do so. Couple of ideas: 1) If you just want to make sure all updates are visible at once, you can control that with commit

Re: Facing issue while transforming and indexing custom JSON

2018-12-31 Thread Alexandre Rafalovitch
Do you have _src_ field declared in schema? It is just a non-indexed string: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.5.0/solr/server/solr/configsets/sample_techproducts_configs/conf/managed-schema#L169 Regards, Alex. On Mon, 31 Dec 2018 at 04:35, Shubhangi Shinde

Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content

2018-12-31 Thread Alexandre Rafalovitch
"solr.LogUpdateProcessorFactory" /> > > class="solr.RunUpdateProcessorFactory" /> > > > > > Regards, > Edwin > > On Mon, 31 Dec 2018 at 11:29, Alexandre Rafalovitch > wrote: > > > Specifically, a custome Update Requ

Re: Removing words like "FONT-SIZE: 9pt; FONT-FAMILY: arial" from content

2018-12-30 Thread Alexandre Rafalovitch
Specifically, a custom Update Request Processor chain can be used before indexing. Probably with HTMLStripFieldUpdateProcessorFactory Regards, Alex On Sun, Dec 30, 2018, 9:26 PM Vincenzo D'Amore Hi, > > I think this kind of text manipulation should be done before indexing, if > you have

Re: How to debug empty ParsedQuery from Edismax Query Parser

2018-12-27 Thread Alexandre Rafalovitch
echoParams=all may also be helpful to pinpoint differences in params from all sources, including request handler defaults. Regards, Alex On Thu, Dec 27, 2018, 8:25 PM Shawn Heisey On 12/27/2018 10:47 AM, Kay Wrobel wrote: > > Now starting from SOLR version 5+, I receive zero (0) results

Re: Solr schema to parse and search source code

2018-12-16 Thread Alexandre Rafalovitch
https://microsoft.github.io/language-server-protocol/ is probably your best bet. But also, perhaps you can just use or start from SourceGraph? https://sourcegraph.com/start Regards, Alex. On Sat, 15 Dec 2018 at 20:26, Steven White wrote: > > Hi everyone, > > I'm in need of providing a search

Re: Case insensitive query for fetching facets

2018-12-07 Thread Alexandre Rafalovitch
If you are on the latest Solr (7.3+), try switching from TextField to SortableTextField in your string_ci definition above. That type implicitly uses docValues and should return original text for faceting purposes, while still allowing analyzers. Regards, Alex. On Thu, 6 Dec 2018 at 08:26,

Re: Can I use configsets with custom stopwords per collection?

2018-12-03 Thread Alexandre Rafalovitch
stop words, but I do not know the expected behavior. Regards, Alex. On Mon, 3 Dec 2018 at 11:05, Alexandre Rafalovitch wrote: > > I am not sure I fully understand what you are saying. > > When you create a collection based on a configset, all the files > should be copied, includin

Re: Can I use configsets with custom stopwords per collection?

2018-12-03 Thread Alexandre Rafalovitch
I am not sure I fully understand what you are saying. When you create a collection based on a configset, all the files should be copied, including the stopwords. You can also provide an absolute path. Solr also supports variable substitutions (as seen in solrconfig.xml library statements), but

Re: Can I use configsets with custom stopwords per collection?

2018-12-03 Thread Alexandre Rafalovitch
The stopwords are defined at the field type level as part of the analyzer chain. So, you have per-field granularity. Not just per-collection. As stop-words are using files (though we have managed version as well, you can share or not-share as much as you want even across different field type

Re: Solr Request Handler

2018-12-03 Thread Alexandre Rafalovitch
You should not be exposing Solr directly to the client, but treating it more as a database. Given that, why would you not write your processing code in that middle-ware layer? Regards, Alex. On Mon, 3 Dec 2018 at 06:43, Lucky Sharma wrote: > > Hi have one scenario, > where I need to make a

Re: Flatten term frequency

2018-11-29 Thread Alexandre Rafalovitch
a bug? Should I submit an issue? > > On Thu, Nov 29, 2018 at 2:03 PM Doug Turnbull < > dturnb...@opensourceconnections.com> wrote: > > > I think the similarity way (setting k1 to 0) or a constant score query are > > probably the best ways. Omitting term freqs and position w

Re: Enquiry about scheduling for re-indexing

2018-11-29 Thread Alexandre Rafalovitch
Solr does not have a built-in scheduler for triggering indexing. Only for triggering commits and purging auto-expiring records. So, if you want to trigger DIH indexing, you need to use an external scheduling mechanism for that. Regards, Alex. On Thu, 29 Nov 2018 at 01:03, Ma Man wrote: > >
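
An external scheduler such as cron only needs to request the DIH command URL. A sketch of that URL, assuming the handler is registered at the conventional `/dataimport` path and using a hypothetical core name:

```python
from urllib.parse import urlencode


def dih_full_import_url(base_url, core, handler="/dataimport", clean=True):
    """URL that asks the Data Import Handler to start a full import."""
    params = urlencode({"command": "full-import", "clean": str(clean).lower()})
    return f"{base_url}/{core}{handler}?{params}"


url = dih_full_import_url("http://localhost:8983/solr", "products")
```

The scheduled job would fetch this URL at the desired interval; the import itself runs asynchronously inside Solr.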

Re: Flatten term frequency

2018-11-29 Thread Alexandre Rafalovitch
Perhaps constant score would be useful here: http://lucene.apache.org/solr/guide/7_5/the-standard-query-parser.html#constant-score-with Also, all the options like omitTermFreqAndPositions are described here:

PSA: Activate 2018 videos are now available

2018-11-28 Thread Alexandre Rafalovitch
For all those who wanted to be at the conference for the talks :-) but could not: https://www.youtube.com/watch?v=Hm98XL0Mw5c=PLU6n9Voqu_1HW8-VavVMa9lP8-oF8Oh5t (Plug) Mine was: "JSON in Solr: from top to bottom", video at: https://www.youtube.com/watch?v=WzYbTe3-nFI , slides at:

Re: Query regarding Dynamic Fields

2018-11-27 Thread Alexandre Rafalovitch
However, to add to Edward's message, you can with eDismax create synthetic field names that expand to multiple fields under the covers, using per-field 'qf' parameter. See: https://lucene.apache.org/solr/guide/7_5/the-extended-dismax-query-parser.html#field-aliasing-using-per-field-qf-overrides
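
The per-field `qf` override could be expressed with request parameters like these. A minimal sketch: the alias `who` and the fields `first_name` and `last_name` are hypothetical.

```python
def aliased_query_params(user_query):
    """eDismax parameters where the synthetic field 'who' expands to two real fields."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Per-field qf override: 'who:...' searches both underlying fields.
        "f.who.qf": "first_name last_name",
    }


params = aliased_query_params("who:smith")
```

The client only ever refers to `who`, so the list of underlying fields can change in one place without touching the queries.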

Re: Two field phrase search

2018-11-23 Thread Alexandre Rafalovitch
It is not clear how much flexibility you expect in those queries. Can the second word never be full name? Can there be more than 2 words? How do you know the length of the prefix? When you say prefix, do you mean 'jo' is expected to match 'joseph'? So, just generically, I would say why not index

Re: Solr Cloud - Store Data using multiple drives

2018-11-21 Thread Alexandre Rafalovitch
You really have to split your index. The good news is that you can use aliases to search multiple cores at once. That's probably what you are looking for. So, you can start with one index and when it gets close to capacity, add the second one and the third one, etc. But have an alias that you
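
In SolrCloud, such an alias is created through the Collections API. A sketch of the call, assuming hypothetical collection names; re-issuing it with a longer list is how a new index would be added to the alias later.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """URL of the Collections API CREATEALIAS action for a multi-collection alias."""
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),  # comma-separated member list
    })
    return f"{base_url}/admin/collections?{params}"


url = create_alias_url("http://localhost:8983/solr", "all_docs", ["docs_1", "docs_2"])
```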

Re: querying on field of type string doesn't work as expected

2018-11-19 Thread Alexandre Rafalovitch
You can always replace String type with Text type and KeywordAnalyzer definition. That keeps the whole input as one token, but still allows to modify (e.g. normalize spaces with PatternReplaceCharFilterFactory) or even one of the ICU filters (warning: ICU is dark magic...) Regards, Alex. On

Re: Solr Cloud - Store Data using multiple drives

2018-11-19 Thread Alexandre Rafalovitch
This seems very similar to: https://lists.apache.org/thread.html/48b6dcb20058de29936616633b88d21e1b6f6a32bc968d161eae4a21@%3Csolr-user.lucene.apache.org%3E Regards, Alex. On Mon, 19 Nov 2018 at 11:15, Tech Support wrote: > > Hello Solr Team, > > > > I am using Solr 7.5. , Indexed data stored

Re: Extracting important multi term phrases from the text

2018-11-16 Thread Alexandre Rafalovitch
at 8:36 AM David Hastings > wrote: > > > Which function of the SKG are you using? significantTerms? > > > > On Thu, Nov 15, 2018 at 7:09 PM Alexandre Rafalovitch > > wrote: > > > > > I think the underscore actually comes from the Shingles (parameter >

Re: Extracting important multi term phrases from the text

2018-11-15 Thread Alexandre Rafalovitch
I think the underscore actually comes from the Shingles (parameter fillerToken). Have you tried setting it to empty string? Regards, Alex. On Thu, 15 Nov 2018 at 17:16, Pratik Patel wrote: > > Hi Markus, > > Thanks for the reply. I tried using ShingleFilter and it seems to > be working.

Re: How to use multiple data drives?

2018-11-15 Thread Alexandre Rafalovitch
You can configure where your data directory is in core.properties: https://lucene.apache.org/solr/guide/7_5/defining-core-properties.html#defining-core-properties-files Or probably via API. Regards, Alex. On Thu, 15 Nov 2018 at 12:45, John Milton wrote: > > Hi Solr Team, > > I have installed

Re: Include date calculation in field list?

2018-11-13 Thread Alexandre Rafalovitch
field list. deadline:[* TO NOW] was > simple enough to convert to function query syntax. But what about something > more complex? It feels strange to have to convert from one syntax to another. > I can't be the only one thinking like this, surely? :) > > /Jimi > > -Ursprun

Re: Include date calculation in field list?

2018-11-12 Thread Alexandre Rafalovitch
Function query looks like the nearest match to your requirement: https://lucene.apache.org/solr/guide/7_5/function-queries.html#ms-function You can use it in the field list too. Regards, Alex. On Mon, 12 Nov 2018 at 09:12, Hullegård, Jimi wrote: > > Hi, > > Maybe I have been working too
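
Using the function in the field list might look like this. A sketch under assumptions: the date field `deadline` is taken from the thread, while the alias `ms_left` and the helper name are made up.

```python
def deadline_params(query="*:*"):
    """Request parameters returning a computed pseudo-field next to the id."""
    return {
        "q": query,
        # ms(a,b) yields a - b in milliseconds; 'ms_left' is the alias
        # under which the value appears in each returned document.
        "fl": "id,ms_left:ms(deadline,NOW)",
    }


params = deadline_params()
```

A negative `ms_left` then means the deadline has already passed.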

Re: Sql server data import

2018-11-09 Thread Alexandre Rafalovitch
Which version of Solr is it? Because we have not used schema.xml for a very long time. It has been managed-schema instead. Also, have you tried using DIH example that uses database and modifying it just enough to read data from your database. Even if it has a lot of extra junk, this would test

Re: Ingesting/Querying Documents with Nested/Related Documents and extracting Full-text

2018-11-08 Thread Alexandre Rafalovitch
The extract handler is mostly there for prototyping purposes. It uses Tika under the covers and you can use that yourself in the client. Given your merge requirements, it would probably be best to have that separated out. In terms of structuring, you can do nested documents, combined with [child]

Re: Rename of Category.QUERYHANDLER

2018-11-05 Thread Alexandre Rafalovitch
SOLR-9947 perhaps? On Mon, 5 Nov 2018 at 16:08, Shawn Heisey wrote: > > On 11/5/2018 1:09 PM, Furkan KAMACI wrote: > > Solr 6.3.0 had SolrInfoMBean.Category.QUERYHANDLER. However, I cannot see > > it at Solr 6.5.0. > > You might find it at QUERY instead. I seem to remember some confusion > about

Re: Restrict search on term/phrase count in document.

2018-11-05 Thread Alexandre Rafalovitch
That is kind of unusual. What is the business issue you are trying to solve? Perhaps there is a different way to look at this problem. Regards, Alex On Mon, Nov 5, 2018, 5:20 AM Modassar Ather Hi, > > Is there a way to restrict search with a term/phrase occurring n number of > times in it?

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Alexandre Rafalovitch
Simon Hughes' presentation at the just-finished Activate conference may be relevant: https://www.slideshare.net/SimonHughes13/vectors-in-search-towards-more-semantic-matching The video will be available in a couple of weeks, I am guessing on the Lucidworks channel. Related repos: *)

Re: Merging data from different sources

2018-10-30 Thread Alexandre Rafalovitch
Maybe https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory Regards, Alex On Tue, Oct 30, 2018, 7:57 AM Martin Frank Hansen (MHQ), wrote: > Hi, > > I am trying to merge files from different sources and with different > content (except for one

Re: partial update in solr

2018-10-29 Thread Alexandre Rafalovitch
I am not sure. I haven't tried this particular path. Your original question was without using SolrJ. Maybe others have. However, I am also not sure how much sense this makes. This Atomic processor is to make it easier to do the merge when you cannot modify the source documents. But if you are

Re: partial update in solr

2018-10-29 Thread Alexandre Rafalovitch
Maybe this was introduced in a later version of Solr. Check the CHANGES file to compare your version with the release's. Regards, Alex On Mon, Oct 29, 2018, 6:37 AM Zahra Aminolroaya, wrote: > Thanks Alex. I try the following to set the atomic processor: > >

Re: Solr 7.5/skg

2018-10-25 Thread Alexandre Rafalovitch
unless im missing something? > > On Thu, Oct 25, 2018 at 10:41 AM David Hastings < > hastings.recurs...@gmail.com> wrote: > > > Wow, thanks for that. Will do some research and come back with the > > inevitable questions I will have. > > > > On Thu, Oct 25

Re: Solr 7.5/skg

2018-10-25 Thread Alexandre Rafalovitch
> On Thu, Oct 25, 2018 at 10:25 AM Alexandre Rafalovitch > wrote: > > > That's being worked on as well. We've migrated the documentation from > > Confluence to standalone setup, so not all the pieces are in place > > yet. > > > > Regards, > >

Re: Solr 7.5/skg

2018-10-25 Thread Alexandre Rafalovitch
t very > search friendly :) > > On Thu, Oct 25, 2018 at 9:29 AM Alexandre Rafalovitch > wrote: > > > I think you are looking for: > > > > http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs > > > > Or, as a second option, >

Re: Solr 7.5/skg

2018-10-25 Thread Alexandre Rafalovitch
I think you are looking for: http://lucene.apache.org/solr/guide/7_5/json-facet-api.html#semantic-knowledge-graphs Or, as a second option, http://lucene.apache.org/solr/guide/7_5/stream-source-reference.html#significantterms Regards, Alex. On Thu, 25 Oct 2018 at 08:47, David Hastings wrote:
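To make the first link a bit more concrete, here is a hedged sketch of the request parameters for a `relatedness()` facet, following the shape of the example in the JSON Facet API page. The foreground query, the field name, and the facet labels are all made up; only `relatedness($fore,$back)` and the `fore`/`back` parameter convention come from that page.

```python
import json
from urllib.parse import urlencode


def skg_params(foreground, field, limit=5):
    """Build parameters for a terms facet sorted by relatedness()."""
    facet = {
        "related": {  # hypothetical facet label
            "type": "terms",
            "field": field,
            "limit": limit,
            "sort": {"r1": "desc"},
            # Score each bucket against the foreground and background sets
            "facet": {"r1": "relatedness($fore,$back)"},
        }
    }
    return {
        "q": "*:*",
        "rows": 0,
        "fore": foreground,  # foreground query
        "back": "*:*",       # background query
        "json.facet": json.dumps(facet),
    }


# Hypothetical field names, for illustration only
params = skg_params("body:solr", "keywords")
body = urlencode(params)
```

The encoded body can be POSTed to the collection's /query handler.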

Re: partial update in solr

2018-10-24 Thread Alexandre Rafalovitch
You could use something like AtomicUpdateProcessorFactory: https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory Regards, Alex. On Wed, 24 Oct 2018 at 04:48, Zahra Aminolroaya wrote: > > Does Solr have a partial update like elastic? > > Elastic
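For context, this is roughly what a plain atomic (partial) update document looks like when you can shape the request yourself. As I understand it, the processor linked above exists to convert conventional documents into this form for you. The document id and field names below are invented.

```python
import json


def atomic_update(doc_id, set_fields=None, add_fields=None):
    """Build an atomic update document: only the listed fields are changed."""
    doc = {"id": doc_id}
    for field, value in (set_fields or {}).items():
        doc[field] = {"set": value}  # replace the stored value
    for field, value in (add_fields or {}).items():
        doc[field] = {"add": value}  # append to a multiValued field
    return doc


# Hypothetical document and fields, for illustration only
doc = atomic_update("doc1", set_fields={"price": 99}, add_fields={"tags": "sale"})
# JSON body that could be POSTed to /solr/<collection>/update
body = json.dumps([doc])
```

Fields not mentioned in the update keep their current values, which is the "partial update" behaviour asked about.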

Re: Solr filter query on STRING field [Was:Re: solr filter query on text field]

2018-10-24 Thread Alexandre Rafalovitch
The first one treats the space as the end of the operation, so the second keyword is searched against the default field (id). Try putting the whole thing in quotes. Or use the Field Query Parser: https://lucene.apache.org/solr/guide/7_5/other-parsers.html#field-query-parser Regards, Alex. On Wed, Oct 24, 2018,
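A small sketch of the Field Query Parser option. The field name and value are made up; the `{!field f=...}` local-params syntax is the one from the linked page. The whole value, spaces included, goes to the named field.

```python
def field_fq(field, value):
    """Build a filter query that uses the Field Query Parser."""
    # Doubled braces produce literal { and } in an f-string
    return f"{{!field f={field}}}{value}"


# Hypothetical string field and value, for illustration only
fq = field_fq("category", "Foo Bar")
```

This avoids having to quote or escape the space by hand.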

Re: Query to multiple collections

2018-10-22 Thread Alexandre Rafalovitch
Have you tried using aliases: http://lucene.apache.org/solr/guide/7_5/collections-api.html#collections-api You can also - I think - specify a collection of shards/collections directly in the query, but there may be side edge-cases with that (not sure). Regards, Alex. On Mon, 22 Oct 2018 at
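Roughly what creating such an alias could look like through the Collections API. The base URL, alias name, and collection names are assumptions; `CREATEALIAS` with `name` and `collections` is the documented action. Queries sent to the alias then cover all listed collections.

```python
from urllib.parse import urlencode, parse_qs


def create_alias_url(base_url, alias, collections):
    """Build a Collections API URL that creates an alias over several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),  # comma-separated list
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host, alias and collection names, for illustration only
url = create_alias_url("http://localhost:8983/solr", "all_docs",
                       ["docs_2017", "docs_2018"])
```

After that, /solr/all_docs/select can be queried like a single collection.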
